Reading Notes 2021 Nov - Dec
This is my sixth and last post in this series this year, summarising the great posts Elise and I came across during our Friday and Sunday night reading sessions. Yay, we made it to the end of 2021!
This year we read ~300 Data Science blog posts and summarized the best 130+ in this reading notes series. Looking back, we have learned lots of new stuff across experimentation, machine learning, analytics, data engineering, product strategy, etc. Hope you enjoy this last post and see you next year :)
Experimentation
- Universal Holdout Groups at Disney Streaming: This article introduces the universal holdout concept, how Hulu uses it, and common challenges and solutions
- Simulated Bootstrapped A/A Tests: DS at Twitch discusses how they run simulated A/A tests to uncover potential problems with the intended metric
- Marketing Incremental Lift Test 101: What a marketing incremental lift test is, and how it differs from and resembles product experiments
- Bayesian A/B testing — A Practical Exploration with Simulations: Walks through what Bayesian A/B testing is and how to determine test duration, loss threshold, and prior
- Ditch p-values. Use Bootstrap Confidence Intervals Instead: Common misunderstandings of p-values and how to construct a bootstrap confidence interval
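
For readers who have not built a bootstrap confidence interval before, here is a minimal sketch of the percentile bootstrap. It is my own illustration, not code from the linked article, and the function name `bootstrap_ci` and its parameters are made up for this example. The idea: resample the data with replacement many times, compute the statistic on each resample, and take the middle 95% of those estimates.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (hypothetical helper).

    Assumes the observations are independent and the sample is non-empty.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Statistic computed on each resample drawn with replacement, then sorted.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    # Cut alpha/2 of the estimates off each tail.
    cut = int(n_resamples * alpha / 2)
    return estimates[cut], estimates[n_resamples - 1 - cut]


# Made-up numbers, purely for illustration.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_ci(data)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The percentile method is the simplest variant; other variants (for example bias-corrected ones) exist, and the interval can be unreliable for very small samples.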
Machine Learning
- Avoiding Data Leakage in Time Series 101: Talks about why data leakage is a severe problem in time series models and how to avoid it
- In Defense of Zillow’s Besieged Data Scientists: The Zillow iBuying shutdown was big news in November and sparked discussion around the value of machine learning models and how to use them wisely
- Text-based Causal Inference: Tutorial on analyzing voter fraud disinformation by estimating causal effect with text as treatment and confounder
- Top 5 techniques for Explainable AI: 5 techniques to make machine learning models more explainable
- Explain like I’m five: Artificial neurons: A high-level and easy-to-understand introduction to neural networks
- Introduction to BanditPAM: How to combine Multi-Armed Bandit and PAM to make KMedoids algorithms faster
- Prophet vs. NeuralProphet: A walkthrough of Prophet and NeuralProphet by Facebook, with a comparison of their performance on a dataset
- 5 Anomaly Detection Algorithms Every Data Scientist Should Know: A quick summary of five anomaly detection algorithms
- Can You Trust Your Model When Data Shifts?: A practical example of why data shifts could impact your machine learning model in the context of text classification
- Can Consumers’ Ratings Be Considered Equidistant?: Can we treat star ratings as equidistant? This article uses Correspondence Analysis to try to answer this question
- How We Built a (Mostly) Automated System to Solve Credit Card Merchant Classification: A post by our DS team at Brex :) It introduces the credit card merchant classification framework combining Google Places API, Amazon Mechanical Turk, and machine learning models
Analytics
- Airbnb’s Page Performance Score Part I, Part II, Part III, Part IV: A series of posts introducing how Airbnb designed and implemented metrics to measure page performance on web and mobile platforms
- The Global Normalcy Index: How The Economist designed the Global Normalcy Index to track the return to normalcy after COVID-19, and what they found
Data Team & Strategy
- An experience of a ‘Data Ecosystem’: Discusses various data roles in the product ecosystem, their responsibilities, and how they collaborate
- Data Governance Has a Serious Branding Problem: Why many data governance teams/efforts are failing
- Data Advantage Matrix: A New Way to Think About Data Strategy: Talks about four types of data advantages and how companies should prioritize them
- Credibility of Data Science: Considerations to make data science more credible
- FAANG Companies Are Redefining Data Science Archetypes: Data science is a new and evolving field. This article discusses four common data science archetypes nowadays
- How to Make Agile Actually Work for Analytics: Agile was initially designed for engineering processes; this post covers how it can be applied to analytics workflows
Others
- Determining the optimal Pokemon team for Pokemon Brilliant Diamond and Shining Pearl with Pulp: A fun read on how to use linear programming to determine the best Pokemon team :)
- Data Science Experiment in Government: Data science initiatives at the German Ministry of Health