Reading Notes 2024 Sep - Oct

My Medium Articles!

It has been six months since I started writing on Medium and I am approaching 3k followers now! Below are the articles I posted lately. You can also find a copy of each one on my blog.

Seven Common Causes of Data Leakage in Machine Learning: Data leakage can sabotage even the most well-intentioned machine learning models, leading to inflated results and poor generalization. In this article, I cover seven common mistakes in data preprocessing, feature engineering, and train-test splitting that often lead to leakage—and how to avoid them.
Beyond Line and Bar Charts: 7 Less Common But Powerful Visualization Types: I have explored a wide range of data visualization types in my weekly visualization journey. In this article, I introduce seven lesser-known but powerful visualization types, along with their specific use cases.
Top 5 Principles for Building User-Friendly Data Tables: I break down five principles for building data tables that make data more intuitive and reliable for your team.
From Insights to Impact: Presentation Skills Every Data Scientist Needs: As data scientists, we know that uncovering insights is only half the journey. The real challenge, however, is turning those insights into action and impact. Here I share a proven framework to help data professionals structure, design, and deliver presentations that resonate with stakeholders and drive business results.

Reading List in Past Two Months

Now, let’s talk about the great articles I came across in September and October:

Data Science & Analytics

Recommending for Long-Term Member Satisfaction at Netflix: Netflix talks about their reward engineering efforts to align Netflix recommendations with long-term member satisfaction.
Matching, Weighting, or Regression?: Understanding and comparing different methods for conditional causal inference analysis.
Double Machine Learning for Causal Inference: A Practical Guide: How to use Double Machine Learning to accurately estimate treatment effects.
A Growth Marketer Guide to Designing A/B Tests using Python: A step-by-step guide on A/B tests with Python code examples.
Unlocking the Power of Difference-in-Differences: Estimating Causal Effects from Observational Data: Key assumptions of Difference-in-Differences, and how to use it to calculate the causal effect.
Understanding Instrumental Variables: A practical example of using Instrumental Variables to estimate the effect of newsletter subscription on sales.
Testing Percentiles: How to test a difference in percentiles between two groups, or the difference between a percentile from an observed sample and an expected value.
Mean vs Median Causal Effect: How to test quantiles using the quantile regression method.
Evaluating Uplift Models: How to compare and select the best uplift model.
The stats that tell you what could have been: Counterfactual Learning and Uplift Modeling: How Klaviyo used uplift modeling to optimize email and SMS communications.
Convenient Time Series Forecasting with sktime: A walk-through of how to employ sktime for daily forecasting tasks.
5 Must-Know Techniques for Mastering Time-Series Analysis: How to understand the seasonality and trend of time series data, and how to do feature engineering and cross-validation appropriately.

Data Career

My Weekly Calendar as a Senior Data Science Manager: Practical advice on how to better organize your time as a data science manager.
My 7 Sources of Income as a Data Scientist: Different ways to monetize your experience as a data scientist, including a full-time job, investments, YouTube AdSense, YouTube sponsors, mentoring/consulting, blogging, and affiliates.
A Data Scientist’s Guide to Stakeholders: Advice on how to better collaborate with stakeholders as a data scientist.
The Illusion of Data Democratisation & Self-Service: The challenges and prerequisites of achieving data democratisation and self-service.

AI and LLM

What Nobody Tells You About RAGs: A deep dive into why RAG doesn’t always work as expected: an overview of the business value, the data, and the technology behind it.
The LinkedIn OptOut AI Scandal: LinkedIn automatically opted users in to letting it use their intellectual property to train its automated content creation GenAI systems. The article covers what this means and what the impact is.
Why GenAI is a Data Deletion and Privacy Nightmare: A follow-up to the above article on the challenges GenAI poses for data deletion and privacy compliance.
OpenAI Rolls Out ‘Canvas’ In ChatGPT — A Brand New Writing and Coding Interface: Overview of the new Canvas feature and how it compares with the Artifacts from Claude.
Deep Dive on OpenAI’s MLE-Bench: What OpenAI’s MLE-Bench is, what it means for MLEs, and what it tells us about OpenAI’s future plans.

Yu Dong
