Reading Notes 2022 Nov - Dec

This is a summary of the great blog posts I read in November and December. It is the last one of the year, and I am glad I kept up the habit of reading blog posts every Friday and Sunday night for another year. I hope you enjoy the reading as well. Happy new year :)

Causal Inference

What to Do When Your Experiment Returns a Non-Statistically Significant Result: Common reasons for non-statistically-significant A/B test results and ways to handle and communicate them
Fooled by Statistical Significance: What statistical significance really means and common misunderstandings about it
Uplift Modeling — A Bridge between Causal Inference, Machine Learning and Personalization: How to use Uplift Modeling to measure the impact of a marketing campaign
We Increased Conversion Rates by Over 20% Doing This: A great case study of increasing conversion rates with great design, user testing, and iterative experiments
Using Causal ML Instead of A/B Testing: Why Causal ML could be more handy than A/B testing in some cases
Why Spillover Effects Bias Your AB Testing Results and Ways to Overcome them: An explanation of spillover effects and common solutions to them
Notifications: why less is more — how Facebook has been increasing both user satisfaction and app usage by sending only a few notifications: The Facebook Notifications Data Science team shares findings from long-term experiments on how notification volume and categories affect user satisfaction and app usage
Experiments on Returns on Investment: How to estimate the impact and confidence interval on ROI using experiments and the Delta Method
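
To make the Delta Method idea in the last entry a bit more concrete, here is a minimal sketch of a confidence interval for a ratio of means. It is my own illustration, not code from the linked post: the function name `ratio_ci_delta`, the paired per-unit inputs, and the normal approximation are all assumptions, and "return divided by spend" is only one of several ways ROI can be defined.

```python
import math
from statistics import NormalDist, mean


def ratio_ci_delta(returns, spend, alpha=0.05):
    """Approximate CI for mean(returns) / mean(spend) via the Delta Method.

    Hypothetical helper: assumes paired per-unit observations and a
    large enough sample for the normal approximation to be reasonable.
    """
    n = len(returns)
    if n != len(spend) or n < 2:
        raise ValueError("need two paired samples with at least 2 observations")

    m_r, m_s = mean(returns), mean(spend)
    ratio = m_r / m_s

    # Sample variances and covariance (n - 1 denominator).
    var_r = sum((r - m_r) ** 2 for r in returns) / (n - 1)
    var_s = sum((s - m_s) ** 2 for s in spend) / (n - 1)
    cov_rs = sum((r - m_r) * (s - m_s) for r, s in zip(returns, spend)) / (n - 1)

    # First-order Delta Method variance of a ratio of sample means.
    var_ratio = (var_r - 2 * ratio * cov_rs + ratio ** 2 * var_s) / (n * m_s ** 2)
    se = math.sqrt(max(var_ratio, 0.0))  # guard against tiny negative rounding error

    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ratio, ratio - z * se, ratio + z * se


# Made-up numbers purely for illustration.
estimate, lower, upper = ratio_ci_delta([1, 2, 3, 4], [2, 2, 2, 2])
print(f"ratio={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The covariance term matters here: return and spend are usually correlated within a unit, and ignoring that correlation tends to misstate the interval width.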

Machine Learning

5 Biggest Trends In Data Science In 2022: Five noticeable trends in DS including Tiny ML, Auto ML, Data-Driven Customer Experience, AIaaS (AI as a Service), and Augmented Analytics
Why is Mean Squared Error (MSE) So Popular?: Why people like MSE and where we should use MSE
What’s the Difference Between a Metric and a Loss Function?: How to differentiate the two and what are the best measures in each case
What’s your computer’s favorite metric?: Still the same series – why MSE is the easiest metric for computers to optimize for
Why is MSE = Bias² + Variance?: A great walkthrough and breakdown of MSE
Difference Between Normalization and Standardization: What are the different normalization and standardization methods and when to use them
D.A.R.T — Your New Weapon Against Overfitting in Boosting Models: How to use D.A.R.T (Dropouts meet Multiple Additive Regression Trees) to avoid overfitting in boosting models
PyCaret 3 is coming… What’s New?: New functions in PyCaret 3
Top Python Packages for Feature Engineering: Three useful Python packages for feature engineering – featuretools, feature-engine, and tsfresh
New Series: Creating Media with Machine Learning: The first post of a blog series highlighting Machine Learning efforts for content creation at Netflix
Match Cutting at Netflix: Finding Cuts with Smooth Visual Transitions: How Netflix uses machine learning techniques to help match the cuts in two shots
Building Airbnb Categories with ML and Human-in-the-Loop: ML and human efforts behind the launch of the Airbnb Categories feature
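
As a quick companion to the normalization vs. standardization entry in this list, here is a small sketch of the two most common variants: min-max normalization and z-score standardization. The function names are my own, and the linked post may cover more methods than these two. I use the population standard deviation here; a sample standard deviation would be an equally valid choice.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Center values at 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


feature = [10, 20, 30]
print(min_max_normalize(feature))
print(standardize(feature))
```

Roughly speaking, min-max scaling keeps a bounded range but is sensitive to outliers, while standardization has no fixed range but is usually a safer default for models that assume centered inputs.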

Data Career

6 Habits to Include in Your Daily Routine for a Long, Happy Career as a Data Scientist: Six pieces of great advice that will benefit your DS career
Top 3 Tools to Promote Your Work in Analytics and Data Science: How to communicate and promote your great DS work to gain more visibility and impact
12 Books to Expand Your Worldview as a Data Professional: A great list of data science-related books
The 3 Stages of Data Maturity in an Organization: Starting with data -> Scaling with data -> Leading with data
Making the case for Analytics Product Managers: What an Analytics Product Manager is and why we need this position
6 Reasons Why Companies Fail at Data Governance: Why data governance initiatives fail in companies

Others

How to Get Actionable Insights from Customer Feedback: A framework to collect and use customer feedback in product development
A Non-Exhaustive List Of ‘Silent’ Mistakes in SQL That Can Ruin Your Analysis: Some common SQL mistakes that you should keep an eye on
SQL Query Optimization: Level Up Your SQL Performance Tuning: A great list of SQL query optimization tips
I Modified An SQL Query From 24 Mins Down To 2 Seconds - A Tale of Query Optimization: A very good real-life case study of optimizing a SQL query
5 SQL Bad Habits You Need to Break: Some common bad ways to write SQL queries
Start Using Google Trends as part of Our Data Analysis: A great example of how to derive some actionable business insights from Google Trends
Introducing ChatGPT!: A great read on what ChatGPT is and its limitations (with a real example)
Creating a Customer Health Index: A Terrific Tool to Measure and Improve Customer Experience and Drive Growth: Why we need a Customer Health Index and how to create one
The Many Layers of Data Lineage: How to better structure data lineage to make it more understandable and useful
Top 10 Data Visualizations of 2022 Worth Looking at!: 10 great visualizations that are worth checking out
BONUS – Five Great Programming and Data Science Memes

Yu Dong
