Reading Notes 2023 Jan - Feb
This is the first article of the reading notes series in 2023. It summarises the Medium blog posts I read over the past two months. I hope you enjoy them as well.
Machine Learning and Causal Inference
- Building a Dynamic Pricing Capability (in under 90 days): A detailed walkthrough of building a Competitive Price Index Elasticity model for dynamic pricing
- The Science (and Art) of Estimating Price Elasticities: Different ways to estimate price elasticities
- Using Rideshare Data to Evaluate Racial Bias in the Issuance of Speeding Citations: Data Scientists at Lyft used rideshare data to estimate the racial inequities in traffic-related police punishment
- How to Build a Causal Inference Machine Learning Model to Explore Whether Global Warming is Caused by Human Activity: A case study of using causal inference techniques and the DoWhy package to evaluate the causal link between human activity and global warming
- Causal Machine Learning for Creative Insights: How Netflix used Causal Machine Learning to establish causality between artwork and its success
- Understanding Causal Trees: How to use causal trees to estimate heterogeneous treatment effects
- Matching, Weighting, or Regression?: Comparing matching, weighting and regression as approaches to causal inference
- Understanding Meta Learners: Using meta-learners (S-learner, T-learner and X-learner) to find out whether a causal effect differs across users
- Multi-touch Attribution: The Fundamental to Optimizing Customer Acquisition: An introduction to the multi-touch attribution framework
- Using Sklearn Pipelines to Streamline your Machine Learning Process: A very clear step-by-step walkthrough of Sklearn pipelines
- Learning to Rank Using XGBoost: How to use XGBoost to train a Learning to Rank model
- Is There Always a Tradeoff Between Bias and Variance?: What the bias-variance tradeoff is and whether it always exists
- Overfitting, Underfitting, and Regularization: Understanding the basic machine learning concepts of overfitting, underfitting and regularization
- Understanding Gradient Boosting: A Data Scientist’s Guide: A clear explanation of gradient boosting and why it works
- Scaling Media Machine Learning at Netflix: Netflix talks about their media machine learning framework
- Discovering Creative Insights in Promotional Artwork: Netflix talks about top-down and bottom-up approaches to discover creative insights
- Grid Search and Random Search Are Outdated. This Approach Outperforms Both.: Introduces Bayesian search and compares its performance with Grid Search and Random Search
- A Quick Guide to Design Rigorous Machine Learning Experiments: Different things to consider when evaluating a domain-specific machine learning approach vs. a generic machine learning technique
- Uncovering the Limitations of Traditional DiD Method: The traditional difference-in-differences (DiD) method can give significantly misleading estimates of treatment effects when there are multiple time periods and variation in treatment timing
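Several of the notes above deal with price elasticity. As a minimal sketch (not the method from any of the linked articles), a constant-elasticity demand curve Q = A * P^b can be fitted by ordinary least squares on log-transformed data, where the slope b is the elasticity. The function name and the synthetic data below are hypothetical and only for illustration.

```python
import math


def log_log_elasticity(prices, quantities):
    """Estimate a constant price elasticity by OLS on ln(quantity) vs ln(price).

    Assumes demand of the form Q = A * P**b, so ln Q = ln A + b * ln P
    and the fitted slope b is the elasticity.
    """
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need at least two (price, quantity) pairs")
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form OLS slope: covariance(x, y) / variance(x)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary to estimate an elasticity")
    return sxy / sxx


# Hypothetical noise-free data generated from Q = 1000 * P**-1.5
prices = [5.0, 7.5, 10.0, 12.5, 15.0]
quantities = [1000 * p ** -1.5 for p in prices]
elasticity = log_log_elasticity(prices, quantities)
print(round(elasticity, 3))  # -1.5
```

An elasticity of -1.5 reads as: a 1% price increase is associated with roughly a 1.5% drop in quantity demanded. On real observational data this naive regression is usually confounded (seasonality, promotions, competitor prices), which is why the articles above bring in causal inference methods.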
DS Career
- The One Metric that All Data Teams Need to Track for Success: A discussion of the best north star metric for data teams
- Data ROI: How to Estimate the Value of Your Data & Analytics Projects: Different approaches to estimating the value of DS projects
- What I’ve Learned from Interviewing more than 300 Data Scientists: What it takes to stand out as a DS candidate
- What’s Next for Analytics in 2023?: Some analytics trends to watch out for
- The UX of Data: How to empower everyone with data
- Product Thinking for Data Teams: Use product thinking to drive data projects
- Data Storytelling 101: Essential Strategies for Data Scientists and AI Practitioner: An effective framework for storytelling in data science
Others
- Can ChatGPT Write Better SQL than a Data Analyst?: An interesting experiment in getting ChatGPT to write SQL
- 6 Ways ChatGPT Can Help Your Data & Analytics Team: Using ChatGPT to empower data science and analytics teams
- Data Science and ChatGPT: Five ways ChatGPT can help with day-to-day data science work
- The New Google Analytics 4: Differences between the old Universal Analytics tag and the new Google Analytics 4
- A Beginner’s Guide to Markov Chains, Conditional Probability, and Independence: A detailed explanation of Markov chain basics
- How We Cut ~95% Cost for Analytics Reporting and What We Have Learned: A great case study of how an organization can cut data infra costs by optimizing data storage, pipeline and queries
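As a small, hedged illustration of the Markov chain basics covered in the beginner’s guide above (a hypothetical two-state weather chain, not an example from the article): the Markov property means the next-state distribution depends only on the current distribution and the transition matrix, so repeatedly applying the matrix converges to a stationary distribution for a well-behaved (irreducible, aperiodic) chain like this one. The function names are illustrative.

```python
def step(dist, transition):
    """Advance a distribution one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stationary(transition, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(transition)
    dist = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        nxt = step(dist, transition)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical chain: state 0 = sunny, state 1 = rainy; each row sums to 1
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print([round(p, 3) for p in pi])  # [0.833, 0.167]
```

The result matches the balance condition 0.1 * pi_sunny = 0.5 * pi_rainy, which gives pi = (5/6, 1/6).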