As in 2021, this blog series summarises the best data science-related posts Elise and I came across over the past two months during our Friday and Sunday night reading time :)

Experimentation

  1. Experimentation at Netflix Series: This series covers the basics of A/B tests (Part 1 and Part 2), core statistical concepts (Part 3 and Part 4), how to build confidence in decisions based on A/B test results (Part 5), and experimentation use cases across different domains at Netflix (Part 6). This is by far the best experimentation series I have read: it explains the underlying statistics and methodology clearly while also providing plenty of real-world examples and further reading
  2. Mindful Experimentation: Evaluate Recommendation System Performance using A/B Testing at Headspace: This post walks through the architecture and the various considerations involved in evaluating recommendation system performance at Headspace
  3. Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation: How Pinterest uses the offline replay technique to speed up their experimentation
  4. Multiple Comparison: A Common Pitfall for A/B Testing: Why multiple comparisons are a problem and how to correct for them
  5. Netflix: A Culture of Learning: How Netflix established a culture of experimentation and causal inference
  6. Why It Matters Where You Randomize Users in A/B Experiments: Simulates how the point at which users are split into variants affects test duration
  7. Don’t Use a T-Test for A/B Testing: How to use OLS regression to reduce variance and reach statistical significance faster
  8. How to Double A/B Testing Speed with CUPED: Explains how CUPED (Controlled-experiment Using Pre-Experiment Data) can speed up A/B testing
  9. Improve Your A/B Tests with 9 Lessons from the COVID-19 Vaccine Trials: This post discusses lessons from the COVID-19 vaccine trials and how we could apply them to A/B tests
  10. Estimating the Long-run Value We Give to Our Users through Experiment meta-analysis: How Meta uses a meta-analysis method that combines results from a series of experiments with linear regression to estimate long-run value
  11. Embrace Overlapping A/B Tests and Avoid the Dangers of Isolating Experiments: A great comparison of sequential testing, overlapping testing, isolated testing, A/B/n testing, and multivariate testing
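To make the variance-reduction idea behind items 7 and 8 concrete, here is a minimal sketch of a single-covariate CUPED adjustment on simulated data. It assumes the covariate is the same metric measured before the experiment starts (so it cannot be affected by treatment); the function name `cuped_adjust` and all simulation parameters are hypothetical and not taken from the linked posts. With one covariate, this adjustment is closely related to the OLS regression approach in item 7.

```python
import numpy as np


def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x)).

    y: in-experiment metric per user; x: pre-experiment covariate per user
    (assumed here to be unaffected by the treatment).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = cov(y, x) / var(x) is the coefficient that minimises the
    # variance of the adjusted metric; it is estimated on pooled data.
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Subtracting a mean-zero term keeps the overall mean unchanged.
    return y - theta * (x - x.mean())


# Hypothetical simulation: pre-period metric x, random 50/50 assignment,
# and a true treatment effect of 0.1 on the in-experiment metric y.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(10.0, 2.0, n)
treatment = rng.integers(0, 2, n)
y = x + 0.1 * treatment + rng.normal(0.0, 1.0, n)

adjusted = cuped_adjust(y, x)

# Difference in means on the adjusted metric estimates the treatment effect
# with a much smaller variance than the raw difference in means.
effect = adjusted[treatment == 1].mean() - adjusted[treatment == 0].mean()
print(f"raw variance: {y.var():.2f}, adjusted variance: {adjusted.var():.2f}")
print(f"estimated effect: {effect:.3f}")
```

Because the variance of the adjusted metric shrinks roughly by a factor of 1 - corr(x, y)², a strongly predictive pre-period covariate can substantially cut the sample size needed to detect the same effect.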

Machine Learning

  1. Does Your Recommender System Evaluation Metric Meet Business Goals?: Discusses various metrics for evaluating recommender systems and how they align with business goals
  2. 6 Types of “Feature Importance” Any Data Scientist Should Master: A great summary of different univariate and multivariate feature importance measures
  3. Model Interpretability in Risk Analytics: Interpret risk models with LIME, SHAP, iBreakDown, and Partial Dependence Plot (PDP)
  4. This is Machine Learning at Capital One: A catalog of machine learning-related blog posts at Capital One
  5. Managing Account Risk in Cash Product at Brex: How Brex built its account risk model with active learning and made the model's results more interpretable

Analytics

  1. User Engagement Metrics in a nutshell: This article summarizes common user engagement metrics and their definitions
  2. Monetization metrics in a nutshell: This article summarizes common monetization metrics and their definitions
  3. Retention Analysis Framework: A beginner-level read on retention metrics and how to analyze retention
  4. Quantifying Product/Market Fit: A basic framework to quantify your product/market fit
  5. Improving the Quality of Listings on Faire: A Case Study: A great example of how Faire analyzed the factors impacting listing quality and validated the insights
  6. Targeting Product Growth with Aha Moment Metrics: How to find the ‘Aha Moment’ metrics for startup product growth
  7. At a Startup, a Data Scientist must also be a Product Manager, and More: What the role of the first data scientist at a startup involves
  8. Fixing Performance Regressions Before they Happen: How Netflix uses anomaly and changepoint detection to prevent performance regressions