Reading Notes 2021 Mar - Apr
As mentioned in the last post, this year I have been reading DS & Analytics related blogs with my friend Elise every Friday and Sunday night. This is the second post in this series, summarising the great posts we came across. Hope you enjoy it :)
Experimentation
- 8 Common Pitfalls of Running A/B tests: Summarises common pitfalls of running A/B tests and how to address them
- Designing Experimentation Guardrails: Introduces how Airbnb implemented a guardrail metrics framework to scale up experiments without harming key metrics
- How We Rearchitected Mobile A/B Testing at The New York Times: A real example of how NYT debugged and resolved an unbalanced assignment issue in its mobile experiments
- What We Can Learn From Google’s Long-term AB Tests: Why we need long-term AB tests and an example at Google
- How Duolingo Runs Experiments at Scale: How Duolingo built the experimentation platform and a testing culture
Analytics
- Mastering User Retention like Amazon, Spotify and Co.: How to measure retention for onboarding and customer loyalty
- Doing Key-driver Analysis in Python: An example of how to do key-driver analysis in Python
- Causal Inference: Trying to Understand the Question of Why: How to utilize the ‘DoWhy’ library to do causal inference
- The Analytical Workflow is Broken: A very fun read about the daily analytics workflow and how it is broken. The charts resonate with me a lot LOL
- A Data-Driven Approach to Grow Spotify Radio: Uses the Spotify Radio product as an example to explain the AARRR framework
NLP
- Keyword Extraction with BERT: How to utilize BERT to do keyword extraction
- Interactive Topic Modeling with BERTopic: Introduces BERTopic, a package built on BERT for topic modeling
- Sentiment Analysis of COVID-19 Vaccine Tweets: A sentiment analysis of COVID-19 vaccine-related tweets using TextBlob
Machine Learning
- The 5 Clustering Algorithms Data Scientists Need to Know: Covers 5 common clustering algorithms and their pros and cons
- Beyond Churn: An Introduction to Uplift Modeling: A case study of how to use uplift modeling to identify the customers most likely to respond and act upon receiving a treatment (such as a promotional email)
- Modern Recommender Systems: Discusses classic recommendation systems like content-based filtering and collaborative filtering, and modern methodologies like DLRM
- KNN is Dead: Introduces ANN (approximate nearest neighbors), a class of nearest-neighbor algorithms that is much faster than KNN at a small cost in accuracy
- Comparison of Segmentation Approaches: Different algorithms for segmentation (clustering)
- 17 types of similarity and dissimilarity measures used in data science: A very detailed introduction to 17 different similarity/distance measures and their pros and cons
- How to use Facebook’s NeuralProphet and why it’s so powerful: The new neural-network version of Facebook's Prophet package, with examples on real data
- Synthetic Data Vault (SDV): A Python Library for Dataset Modeling: A Python package that helps build fake data that captures the behavior of the actual data
Data Team & Infrastructure
- Visualizing Data Timeliness at Airbnb: Talks about how Airbnb built a dashboard to monitor data pipeline SLAs and optimize timeliness
- Analytics Engineering at Spotify: Introduces how Spotify created the Analytics Engineer position to improve data quality and infra
- We Failed to Set Up a Data Catalog 3x. Here’s Why.: A fun read on how data catalog initiatives fail within a company
- Why We Need More AI Product Owners, Not Data Scientists: Talks about why companies need AI product owners and how they differ from traditional product owners
- How To Estimate The Value of Data Products: Discusses how to estimate the value of data products