
This is the first article of the reading notes series in 2023. It summarises the Medium blog posts I read over the past two months. I hope you enjoy them as well.

Machine Learning and Causal Inference

  1. Building a Dynamic Pricing Capability (in under 90 days): Detailed walkthrough of building a Competitive Price Index Elasticity model for dynamic pricing
  2. The Science (and Art) of Estimating Price Elasticities: Different ways to estimate price elasticities
  3. Using Rideshare Data to Evaluate Racial Bias in the Issuance of Speeding Citations: Data scientists at Lyft used rideshare data to estimate racial disparities in the speeding citations issued by police
  4. How to Build a Causal Inference Machine Learning Model to Explore Whether Global Warming is Caused by Human Activity: A case study of using causal inference techniques and the DoWhy package to evaluate whether human activity causes global warming
  5. Causal Machine Learning for Creative Insights: How Netflix used Causal Machine Learning to establish causality between artwork and its success
  6. Understanding Causal Trees: How to use causal trees to estimate heterogeneous treatment effects
  7. Matching, Weighting, or Regression?: When to use matching, weighting, or regression for causal inference
  8. Understanding Meta Learners: How to use meta-learners (S-learner, T-learner and X-learner) to estimate whether a causal effect differs across users
  9. Multi-touch Attribution: The Fundamental to Optimizing Customer Acquisition: An introduction to the multi-touch attribution framework
  10. Using Sklearn Pipelines to Streamline your Machine Learning Process: A clear step-by-step walkthrough of Sklearn pipelines
  11. Learning to Rank Using XGBoost: How to use XGBoost to train a Learning to Rank model
  12. Is There Always a Tradeoff Between Bias and Variance?: What the bias-variance tradeoff is and whether it always holds
  13. Overfitting, Underfitting, and Regularization: An explanation of the basic machine learning concepts of overfitting, underfitting and regularization
  14. Understanding Gradient Boosting: A Data Scientist’s Guide: A clear explanation of gradient boosting and why it works
  15. Scaling Media Machine Learning at Netflix: Netflix talks about their media machine learning framework
  16. Discovering Creative Insights in Promotional Artwork: Netflix talks about top-down and bottom-up approaches to discover creative insights
  17. Grid Search and Random Search Are Outdated. This Approach Outperforms Both.: Introduces Bayesian search and compares its performance with Grid Search and Random Search
  18. A Quick Guide to Design Rigorous Machine Learning Experiments: What to consider when evaluating a domain-specific machine learning approach against a generic machine learning technique
  19. Uncovering the Limitations of Traditional DiD Method: The traditional difference-in-differences (DiD) method can give significantly misleading estimates of treatment effects when there are multiple time periods and variation in treatment timing
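To make the meta-learner idea from item 8 a little more concrete, here is a minimal T-learner sketch. It is not taken from the article. It assumes plain linear outcome models fitted with NumPy least squares and a synthetic dataset whose true effect varies with a single feature. The helper names (`fit_linear`, `predict_linear`, `t_learner_cate`) are hypothetical. The T-learner fits one outcome model on treated units and another on control units, then estimates each unit's effect as the difference between the two predictions.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column (hypothetical helper)
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    # Apply the fitted intercept + slopes to new rows
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef

def t_learner_cate(X, treated, y):
    # T-learner: separate outcome models for treated and control units
    coef_t = fit_linear(X[treated == 1], y[treated == 1])
    coef_c = fit_linear(X[treated == 0], y[treated == 0])
    # Conditional average treatment effect = difference in predicted outcomes
    return predict_linear(coef_t, X) - predict_linear(coef_c, X)

# Synthetic data (assumption): the true effect is 1 + 2*x, so it differs across users
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
treated = rng.integers(0, 2, size=n)
true_effect = 1.0 + 2.0 * X[:, 0]
y = 0.5 * X[:, 0] + treated * true_effect + rng.normal(scale=0.1, size=n)

cate = t_learner_cate(X, treated, y)
print("mean absolute CATE error:", np.abs(cate - true_effect).mean())
```

In practice the linear models would usually be replaced by more flexible learners such as gradient boosting. Linear models are used here only so that the sketch stays dependency-light and easy to check.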

DS Career

  1. The One Metric that All Data Teams Need to Track for Success: A discussion of the best north star metric for data teams
  2. Data ROI: How to Estimate the Value of Your Data & Analytics Projects: Different approaches to estimating the value of data science projects
  3. What I’ve Learned from Interviewing more than 300 Data Scientists: What it takes to stand out as a DS candidate
  4. What’s Next for Analytics in 2023?: Trends to watch in analytics
  5. The UX of Data: How to empower everyone with data
  6. Product Thinking for Data Teams: Use product thinking to drive data projects
  7. Data Storytelling 101: Essential Strategies for Data Scientists and AI Practitioners: An effective framework for storytelling in data science

Others

  1. Can ChatGPT Write Better SQL than a Data Analyst?: An interesting experiment in getting ChatGPT to write SQL
  2. 6 Ways ChatGPT Can Help Your Data & Analytics Team: How ChatGPT can support data science and analytics teams
  3. Data Science and ChatGPT: Five ways ChatGPT can help with day-to-day data science work
  4. The New Google Analytics 4: Differences between the old Universal Analytics tag and the new Google Analytics 4
  5. A Beginner’s Guide to Markov Chains, Conditional Probability, and Independence: A detailed explanation of Markov chain basics
  6. How We Cut ~95% Cost for Analytics Reporting and What We Have Learned: A great case study of how an organization can cut data infra costs by optimizing data storage, pipeline and queries