Reading Notes 2021 Nov - Dec

3 minute read

This is my sixth and last post in this series this year, summarizing the great posts Elise and I came across during our Friday and Sunday night reading sessions. Yay, we made it to the end of 2021!
This year we read ~300 data science blog posts and summarized the best 130+ in this reading notes series. Looking back, we have learned lots of new stuff across experimentation, machine learning, analytics, data engineering, product strategy, etc. Hope you enjoy this last post, and see you next year :)

Experimentation

Universal Holdout Groups at Disney Streaming: This article introduces the universal holdout concept, how Hulu uses it, common challenges and solutions
Simulated Bootstrapped A/A Tests: DS at Twitch discusses how they run simulated A/A tests to uncover potential problems with the intended metric
Marketing Incremental Lift Test 101: What a marketing incremental lift test is, and how it compares to product experiments
Bayesian A/B testing — A Practical Exploration with Simulations: Walks through what Bayesian A/B testing is and how to determine test duration, loss threshold, and prior
Ditch p-values. Use Bootstrap Confidence Intervals Instead: Common misunderstandings of p-values and how to construct a bootstrap confidence interval
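
For the bootstrap confidence interval idea in the last post above, here is a minimal sketch of my own, not code from the article. It assumes a simple percentile bootstrap on the difference in means between two groups. The function name `bootstrap_ci`, the made-up data, and the defaults (2,000 resamples, 95% level) are all illustrative choices.

```python
import random
import statistics


def bootstrap_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled lifts.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Made-up data for illustration only (hypothetical metric values).
data_rng = random.Random(42)
control = [data_rng.gauss(10.0, 2.0) for _ in range(500)]
treatment = [data_rng.gauss(10.5, 2.0) for _ in range(500)]

low, high = bootstrap_ci(control, treatment)
print(f"95% CI for the lift: [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, that is roughly the bootstrap analogue of a significant result, with the added benefit that the interval itself shows the plausible size of the lift.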

Machine Learning

Avoiding Data Leakage in Time Series 101: Talks about why data leakage is a severe problem in time series models and how to avoid it
In Defense of Zillow’s Besieged Data Scientists: The Zillow iBuying shutdown was big news in November and sparked discussion about the value of machine learning models and how to use them wisely
Text-based Causal Inference: Tutorial on analyzing voter fraud disinformation by estimating causal effect with text as treatment and confounder
Top 5 techniques for Explainable AI: 5 techniques to make machine learning models more explainable
Explain like I’m five: Artificial neurons: A high-level and easy-to-understand introduction to neural networks
Introduction to BanditPAM: How to combine Multi-Armed Bandit and PAM to make KMedoids algorithms faster
Prophet vs. NeuralProphet: A walkthrough of Prophet and NeuralProphet by Facebook that compares their performance on a dataset
5 Anomaly Detection Algorithms Every Data Scientist Should Know: A quick summary of five anomaly detection algorithms
Can You Trust Your Model When Data Shifts?: A practical example of why data shifts could impact your machine learning model in the context of text classification
Can Consumers’ Ratings Be Considered Equidistant?: Can we treat star ratings as equidistant? This article uses Correspondence Analysis to try to answer this question
How We Built a (Mostly) Automated System to Solve Credit Card Merchant Classification: A post by our DS team at Brex :) It introduces the credit card merchant classification framework combining Google Places API, Amazon Mechanical Turk, and machine learning models

Analytics

Airbnb’s Page Performance Score Part I, Part II, Part III, Part IV: A series of posts on how Airbnb designed and implemented the metrics that measure page performance on web and mobile platforms
The Global Normalcy Index: How The Economist designed the Global Normalcy Index to track the return to normalcy after COVID-19, and what they found

Data Team & Strategy

An experience of a ‘Data Ecosystem’: Discusses various data roles in the product ecosystem, their responsibilities, and how they collaborate
Data Governance Has a Serious Branding Problem: Why many data governance teams/efforts are failing
Data Advantage Matrix: A New Way to Think About Data Strategy: Talks about four types of data advantages and how companies should prioritize them
Credibility of Data Science: Considerations to make data science more credible
FAANG Companies Are Redefining Data Science Archetypes: Data science is a new and evolving field. This article discusses four common data science archetypes nowadays
How to Make Agile Actually Work for Analytics: Agile was originally designed for engineering processes; this post covers how it can be applied to analytics workflows

Others

Determining the optimal Pokemon team for Pokemon Brilliant Diamond and Shining Pearl with Pulp: A fun read on how to use linear programming to determine the best Pokemon team :)
Data Science Experiment in Government: Data science initiatives at the German Ministry of Health


Yu Dong
