Reading Notes 2022 Jan - Feb
As in 2021, this blog series summarizes the best data science posts Elise and I came across over the past two months during our Friday and Sunday night reading time :)
Experimentation
- Experimentation at Netflix Series: This series covers the basics of A/B tests (Part 1 and Part 2), core statistical concepts (Part 3 and Part 4), how to build confidence in decisions based on A/B test results (Part 5), and experimentation use cases across different domains at Netflix (Part 6). This is by far the best series on experimentation I have read: it explains the underlying statistics and methodology clearly, while also providing plenty of real-world examples and further reading
- Mindful Experimentation: Evaluate Recommendation System Performance using A/B Testing at Headspace: This post walks through the architecture and the various considerations involved in evaluating recommendation system performance at Headspace
- Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation: How Pinterest uses offline replay to speed up its experimentation
- Multiple Comparison: A Common Pitfall for A/B Testing: Why multiple comparisons are a problem and how to correct for them
- Netflix: A Culture of Learning: How Netflix established a culture of experimentation and causal inference
- Why It Matters Where You Randomize Users in A/B Experiments: Simulates how the point at which users are split into variants affects test duration
- Don’t Use a T-Test for A/B Testing: How to use OLS regression with covariates to reduce variance and reach statistical significance faster
- How to Double A/B Testing Speed with CUPED: Explains how CUPED (Controlled-experiment Using Pre-Experiment Data) reduces metric variance to speed up A/B tests
- Improve Your A/B Tests with 9 Lessons from the COVID-19 Vaccine Trials: Lessons learned from the COVID-19 vaccine trials and how we can apply them to A/B tests
- Estimating the Long-run Value We Give to Our Users through Experiment meta-analysis: How Meta estimates long-run value with a meta-analysis that combines results from a series of experiments with linear regression
- Embrace Overlapping A/B Tests and Avoid the Dangers of Isolating Experiments: A great comparison of sequential testing, overlapping testing, isolated testing, A/B/n testing, and multivariate testing
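Since several of the posts above come back to variance reduction, here is a minimal sketch of the core CUPED adjustment, written from the standard formulation rather than from any of these posts' code. It assumes each user has a pre-experiment covariate `x` (here, the same metric measured before the test) and an in-experiment metric `y`; the function name `cuped_adjust` and the synthetic data are hypothetical. In a real test you would estimate theta on data pooled across both arms and then compare the adjusted means between arms.

```python
import random
import statistics
from typing import List, Sequence


def cuped_adjust(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Return CUPED-adjusted values y_i - theta * (x_i - mean(x)).

    y: the in-experiment metric for each user.
    x: a pre-experiment covariate for each user (assumed here to be the
       same metric measured before the test started).
    """
    if len(y) != len(x) or len(y) < 2:
        raise ValueError("y and x must have the same length, at least 2")
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Sample covariance of (x, y); theta = cov(x, y) / var(x) is the
    # choice that minimizes the variance of the adjusted metric.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(y) - 1)
    theta = cov_xy / statistics.variance(x)
    # Subtracting a zero-mean term keeps the mean unchanged and removes
    # the part of y that the pre-period covariate already explains.
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: the post-period metric is strongly correlated with
    # the pre-period metric, which is when CUPED helps most.
    pre = [rng.gauss(10.0, 3.0) for _ in range(1000)]
    post = [p + rng.gauss(0.0, 1.0) for p in pre]
    adjusted = cuped_adjust(post, pre)
    print(f"mean:     {statistics.fmean(post):.3f} -> {statistics.fmean(adjusted):.3f}")
    print(f"variance: {statistics.variance(post):.3f} -> {statistics.variance(adjusted):.3f}")
```

The adjusted metric keeps the same mean but has a smaller variance, and the reduction grows with the correlation between the pre- and in-experiment values. That smaller variance is what lets a test reach significance with fewer users or less time.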
Machine Learning
- Does Your Recommender System Evaluation Metric Meet Business Goals?: Discusses metrics for evaluating recommender systems and how well they align with business goals
- 6 Types of “Feature Importance” Any Data Scientist Should Master: A great summary of different univariate and multivariate feature importance measures
- Model Interpretability in Risk Analytics: Interpreting risk models with LIME, SHAP, iBreakDown, and Partial Dependence Plots (PDPs)
- This is Machine Learning at Capital One: A catalog of machine learning-related blog posts at Capital One
- Managing Account Risk in Cash Product at Brex: How Brex built its Account Risk model with active learning and made the model's results more interpretable
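As a concrete example of a model-agnostic importance measure, here is a minimal sketch of permutation feature importance. It is one widely used measure but not necessarily one of the six covered in the feature importance post above. The idea: shuffle one feature column at a time and measure how much the model's error increases. The `predict` callable, the toy data, and the use of mean squared error as the score are assumptions for illustration. For real work, scikit-learn ships a maintained version as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average increase in MSE when each feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target
            # while leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            score = mean_squared_error(y, [predict(row) for row in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    rng = random.Random(7)
    # Toy data: the target depends on feature 0 only; feature 1 is noise.
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
    y = [3.0 * row[0] for row in X]

    def model(row: Row) -> float:
        # Hypothetical stand-in for a fitted model that learned y = 3 * x0.
        return 3.0 * row[0]

    print(permutation_importance(model, X, y))
```

Because it only needs predictions, this works for any model. Keep in mind that correlated features can share or mask each other's importance when shuffled one at a time.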
Analytics
- User Engagement Metrics in a nutshell: A summary of common user engagement metrics and their definitions
- Monetization metrics in a nutshell: A summary of common monetization metrics and their definitions
- Retention Analysis Framework: A beginner-level introduction to retention metrics and how to analyze retention
- Quantifying Product/Market Fit: A basic framework to quantify your product/market fit
- Improving the Quality of Listings on Faire: A Case Study: A great example of how Faire analyzed the factors affecting listing quality and validated the insights
- Targeting Product Growth with Aha Moment Metrics: How to find the ‘Aha Moment’ metrics for startup product growth
- At a Startup, a Data Scientist must also be a Product Manager, and More: What the role of the first data scientist at a startup involves
- Fixing Performance Regressions Before they Happen: How Netflix uses anomaly detection and changepoint detection to catch performance regressions before they happen
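To make the retention reading more concrete, here is a minimal sketch of cohort-based day-N retention. It assumes you have a signup date per user and a log of `(user, active_date)` events, and it defines day-N retention as "active exactly N days after signup". That is only one common definition; "on or after day N" and rolling windows are also used. The function name and data shapes are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Hashable, Iterable, Tuple


def day_n_retention(
    signups: Dict[Hashable, date],
    activity: Iterable[Tuple[Hashable, date]],
    n: int,
) -> Dict[date, float]:
    """Share of each signup-day cohort that was active exactly n days after signing up."""
    # Collect users with at least one activity event exactly n days after signup.
    retained_users = set()
    for user, day in activity:
        start = signups.get(user)
        if start is not None and (day - start).days == n:
            retained_users.add(user)

    # Group users into cohorts by signup date and compute the retained share.
    cohort_size = Counter(signups.values())
    cohort_retained = Counter(
        start for user, start in signups.items() if user in retained_users
    )
    return {cohort: cohort_retained[cohort] / size for cohort, size in cohort_size.items()}


if __name__ == "__main__":
    signups = {"a": date(2022, 1, 1), "b": date(2022, 1, 1), "c": date(2022, 1, 2)}
    activity = [("a", date(2022, 1, 8)), ("b", date(2022, 1, 3)), ("c", date(2022, 1, 9))]
    print(day_n_retention(signups, activity, n=7))
    # {datetime.date(2022, 1, 1): 0.5, datetime.date(2022, 1, 2): 1.0}
```

Computing this for a range of N values per cohort gives the familiar retention curve or triangle, which is the usual starting point for the kind of analysis the retention framework post describes.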