This is a summary of the great Medium posts I came across in the past two months. Hope you enjoy it :)

Causal Inference

  1. Causal Forecasting at Lyft (Part I, Part II): The Lyft team introduces their causal forecasting framework with real examples and explanations
  2. Beyond A/B Test: Speeding up Airbnb Search Ranking Experimentation through Interleaving: How Airbnb uses interleaving to test search ranking algorithms, and the benefits of doing so
  3. Don’t Be Seduced by the Allure: A Guide for How (Not) to Use Proxy Metrics in Experiments: A great framework by Meta on when, why, and how to use proxy metrics in experiments
  4. Mean vs Median Causal Effect: How to estimate treatment effects on quantiles using quantile regression
  5. How Product Teams Can Build Empathy Through Experimentation: A great interview with Travis Brooks, Netflix Product Manager for Experimentation Platform, on how to build products that users like through experimentation

Machine Learning

  1. Why SHAP Values Might not be Perfect: How SHAP values lack causal structure, and potential solutions to that problem
  2. SHAP for Categorical Features with CatBoost: How to use SHAP to interpret categorical variables in CatBoost
  3. How to Use UMAP For Much Faster And Effective Outlier Detection: How UMAP can be used to speed up outlier detection
  4. 5 Unusual Ways Bias Can Sneak into Your Models: Common sources of bias when building ML models
  5. Managing Biases in Recommender Systems: Common biases in recommender systems and how to handle them
  6. A Curated List of Important Time Series Forecasting Concepts: Quick refresh on time series concepts
  7. Top Python libraries for Time Series Analysis in 2022: Walks through popular time series packages in Python
  8. Machine Learning for Fraud Detection in Streaming Services: The Netflix team discusses considerations and lessons learned for fraud detection in streaming services
  9. Don’t use One-Hot Encoding Anymore: Alternatives to One-Hot Encoding when dealing with categorical variables
  10. How Instacart Uses Embeddings to Improve Search Relevance: An introduction to the ITEMS (the Instacart Transformer-based Embedding Model for Search) framework for improving search performance
  11. Forecasting Something That Never Happened: How We Estimated Past Promotions Profitability: A very detailed case study on how to estimate the impact of promotions run in the past

Analytics

  1. Why does Self-Service BI Fail and What could Enterprises Do to Turn the Tide?: Common barriers that make self-service BI inefficient or unworkable
  2. Is “Self-Service” Data’s Biggest Lie?: A debate on why self-service analytics does or does not work
  3. Analytics and Product-Market Fit: A great framework to measure the product-market fit of a new product
  4. The 10 Best Data Visualizations of 2022: 10 hand-picked great data visualizations on Reddit this year
  5. Detailed Dashboard Design Guidelines Used by Professionals: Great guidance on how to design dashboards that convey information clearly
  6. Visualization Tools with Python: Popular visualization packages in Python
  7. Not All Data Requests Are Urgent, So Start by Asking These 5 Questions: Five important questions to ask when you get ad-hoc data requests