Reading Notes 2021 Mar - Apr

As mentioned in the last post, this year I have been reading DS & Analytics blogs with my friend Elise every Friday and Sunday night. This is the second post in this series, summarising the great posts we came across. Hope you enjoy it :)

Experimentation

8 Common Pitfalls of Running A/B tests: Summarises common pitfalls of running A/B tests and their solutions
Designing Experimentation Guardrails: How Airbnb implemented a guardrail metrics framework to scale up experiments without harming key metrics
How We Rearchitected Mobile A/B Testing at The New York Times: A real example at NYT on how they debugged an unbalanced mobile experimentation assignment issue and resolved it
What We Can Learn From Google’s Long-term AB Tests: Why we need long-term AB tests and an example at Google
How Duolingo Runs Experiments at Scale: How Duolingo built the experimentation platform and a testing culture
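
An unbalanced assignment like the one in the NYT post is often caught with a sample ratio mismatch (SRM) check. The snippet below is a minimal sketch of that idea, not the method from any of the posts above: the function name `srm_p_value` and the example counts are made up for illustration, and it assumes a two-arm test with a known intended split.

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm assignment split.

    Hypothetical helper for illustration; assumes exactly two arms.
    """
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (treatment_n - expected_treatment) ** 2 / expected_treatment
    )
    # With two arms there is 1 degree of freedom, and the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Illustrative counts only: a 5000 vs 5500 split under an intended 50/50.
p = srm_p_value(5000, 5500)
print(f"SRM p-value: {p:.2g}")
```

A very small p-value suggests the observed split is unlikely under the intended ratio, so the assignment mechanism is worth debugging before reading any metric results.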

Analytics

Mastering User Retention like Amazon, Spotify and Co.: How to measure retention for onboarding and customer loyalty
Doing Key-driver Analysis in Python: An example of how to do key-driver analysis in Python
Causal Inference: Trying to Understand the Question of Why: How to utilize the ‘DoWhy’ library to do causal inference
The Analytical Workflow is Broken: A very fun read on the daily analytics workflow and how things are broken. The charts resonate with me a lot LOL
A Data-Driven Approach to Grow Spotify Radio: Uses the Spotify Radio product as an example to explain the AARRR framework

NLP

Keyword Extraction with BERT: How to utilize BERT to do keyword extraction
Interactive Topic Modeling with BERTopic: Introduces BERTopic, a topic modeling package built on BERT
Sentiment Analysis of COVID-19 Vaccine Tweets: A sentiment analysis of COVID-19 vaccine related tweets using TextBlob

Machine Learning

The 5 Clustering Algorithms Data Scientists Need to Know: Covers five common clustering algorithms and their pros and cons
Beyond Churn: An Introduction to Uplift Modeling: A case study of how to use uplift modeling to identify the customers most likely to respond and act upon receiving a treatment (such as a promotional email)
Modern Recommender Systems: Discusses classic recommendation systems like content-based filtering and collaborative filtering, and modern methodologies like DLRM
KNN is Dead: Introduces ANN (approximate nearest neighbors), a class of nearest-neighbor algorithms that is much faster than KNN at a small cost in accuracy
Comparison of Segmentation Approaches: Different algorithms for segmentation (clustering)
17 types of similarity and dissimilarity measures used in data science: A very detailed introduction to 17 different similarity/distance measures and their pros and cons
How to use Facebook’s NeuralProphet and why it’s so powerful: Introduces NeuralProphet, the neural network version of Facebook's Prophet package, with examples on real data
Synthetic Data Vault (SDV): A Python Library for Dataset Modeling: A Python package that helps build fake data that captures the behavior of the actual data
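
To make the similarity/distance idea concrete, here is a minimal sketch of three widely used measures (Euclidean distance, cosine similarity, Jaccard similarity) in plain Python. I am assuming these are representative of the kind of measures the post covers; the function names and sample inputs are my own, not taken from the article.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(s, t):
    """Size of the intersection over size of the union of two collections."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(s & t) / len(s | t)


print(euclidean_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 2], [2, 4]))
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

Euclidean distance is sensitive to magnitude, cosine similarity only looks at direction, and Jaccard works on sets rather than vectors, which is roughly why the choice of measure matters for clustering and nearest-neighbor search.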

Data Team & Infrastructure

Visualizing Data Timeliness at Airbnb: Talks about how Airbnb built a dashboard to monitor data pipeline SLAs and optimize timeliness
Analytics Engineering at Spotify: How Spotify created the Analytics Engineer position to improve data quality and infrastructure
We Failed to Set Up a Data Catalog 3x. Here’s Why.: A fun read on how data catalog initiatives fail within a company
Why We Need More AI Product Owners, Not Data Scientists: Talks about why companies need AI product owners and how they differ from traditional product owners
How To Estimate The Value of Data Products: Discusses how to estimate the value of data products


Yu Dong
