Reading Notes 2023 Sept - Oct


This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.

Causal Inference

Causal Analysis with PyMC: Answering “What If?” With the New Do Operator: An introduction to the PyMC package and how to conduct Bayesian causal inference with it
Demystifying the Applications of Causal Inference in the Industry: Common causal inference techniques, including Instrumental Variables, Propensity Score Matching, and Difference-in-Differences
The Downsides of Experimentation: Things to avoid when you are running an experiment
Introducing the pymatch Python Package: The pymatch package for propensity score matching
Apply Instrumental Variables Method in Causal Analysis: An explanation and walkthrough of the Instrumental Variable method
Causal Inference Part 8: Instrumental Variable Analysis: A Powerful Technique for Causal Inference in Data Science: Another tutorial on Instrumental Variable analysis
Differences between Matching and Regression: How two common causal inference techniques, Matching and Regression, differ
Exploring Counterfactual Insights: From Correlation to Causation in Data Analysis: Using the pgmpy package to conduct counterfactual analysis
Pricing and Promotion for Data Scientists with Causal AI: An introduction to the decisionOS package by causaLens for causal analysis
Go beyond predictive modeling: Accurately compute interventions and counterfactuals with Double Machine Learning: Using the decisionOS package to run Double Machine Learning and learn unbiased estimates of causal effects
Sneaky Science: Data Dredging Exposed: What p-hacking is and why it is bad
A Visual Explanation of Variance, Covariance, Correlation and Causation: An intuitive visual explanation of the concepts

Machine Learning

SHAP for Binary and Multiclass Target Variables: How to interpret SHAP when predicting binary or multiclass target variables
Expedia Group’s Customer Lifetime Value Prediction Model: The modeling approach that Expedia Group employed to build a customer LTV prediction model
Which Features Are Harmful For Your Classification Model?: How to use a quantitative method to find the most harmful feature and recursively remove the feature with the highest error contribution
Boosting Model Accuracy: Techniques I Learned During My Machine Learning Thesis at Spotify (+Code Snippets): A walkthrough of a Machine Learning project at Spotify
Feature Importance Analysis with SHAP I Learned at Spotify (with the Help of the Avengers): A follow-up story on using SHAP values for model interpretation
SHAP vs. ALE for Feature Interactions: Understanding Conflicting Results: Two commonly used feature interpretation techniques, SHAP and ALE, and a deep dive into why they can give different results
An Alternative Approach to Visualizing Feature Relationships in Large Datasets: How to use boxplots to better visualize feature relationships
XGBoost 2.0: Major Update on Tree-based Methods: What’s new in XGBoost 2.0
Bring Your Own Algorithm to Anomaly Detection: The anomaly detection platform at Pinterest that allows data scientists to develop, migrate, and deploy their own Python algorithms easily

Data Career

Why Being a Head of Data Isn’t What You Think It Is: What it means to be a Head of Data
How to be a 10X Data Scientist: Tips for becoming an exceptional Data Scientist
LLM Monitoring and Observability — A Summary of Techniques and Approaches for Responsible AI: How to evaluate and monitor LLM applications
How to Talk About Data and Analysis to Non-Data People: Useful tips to communicate data insights to stakeholders
6 Bad Habits Killing Your Productivity in Data Science: Tips to improve productivity in data science work
Data Engineering at Meta: High-Level Overview of The Internal Tech Stack: What Meta uses for its data warehouse, data discovery, data catalog, and various data workflows
Balancing Urgency vs. Sustainability as an Analytics Team: How to handle ad-hoc requests from stakeholders while ensuring the analytics work stays sustainable
Is Impostor Syndrome the Result of Skipping an Important Step?: A discussion of impostor syndrome and a potential cure

AI and LLM

How To Use ChatGPT: Data Analyst & Data Scientist Use Cases: How to use ChatGPT to benefit your daily work as a data professional
ChatGPT + Tableau= A Match Made in AI Heaven!: How to integrate ChatGPT with Tableau via TabPy to make Tableau conversational :)
How I used ChatGPT to Build a Streamlit Dashboard App: Asking ChatGPT for help building a Streamlit dashboard
Quantifying GPT-4’s Hidden Regressions Over Time: Quantitative measures of GPT-4’s regressions over time
AI Bias: Good Intentions Can Lead to Nasty Results: The kinds of bias that AI can have
The Future of AI Depends on Data Quality: Here’s Why: Why data quality is extremely important for AI quality


Yu Dong
