Reading Notes 2022 Mar - Apr
This is a summary of the great Medium posts Elise and I read in March and April. Please enjoy :)
Experimentation
- Assign Experiment Variants at Scale in A/B tests: This article walks through a real-world example of flawed randomization, covering how the team found the problem and redesigned the variant assignment
- Can we use difference-in-difference with a Biased A/B test?: A great discussion of whether you can use difference-in-differences (DID) for causal inference when the A/B test is biased
- Democratizing Experimentation: How to build up a good experimentation culture
- A/B Testing is Dead: The title does not mean you should stop A/B testing; the post covers the pitfalls you need to avoid to run effective A/B tests
- Measure A/B Testing Platform Health with Simulated A/A and A/B Tests: How to use simulated A/A tests to measure type I error rates and simulated A/B tests to measure power
- Even Split Increases Power of A/B Tests: Explains why an even split increases the power of an A/B test, and when to run an A/B test with unequal allocations
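
To make the last two ideas concrete, here is a minimal sketch of my own (not code from either article). It assumes a binary conversion metric and a pooled two-proportion z-test; the function names and parameters are hypothetical. Running it with identical rates in both arms gives a simulated A/A test, so the rejection rate estimates the type I error. Running it with different rates gives a simulated A/B test, so the rejection rate estimates power.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-test p-value


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        # No variation at all (all or no users converted): nothing to reject.
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def simulate_rejection_rate(rate_a, rate_b, n_a, n_b,
                            n_sims=1000, alpha=0.05, seed=0):
    """Fraction of simulated experiments where the test rejects at level alpha.

    With rate_a == rate_b this is an A/A simulation (type I error estimate);
    with rate_a != rate_b it is an A/B simulation (power estimate).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    rejections = 0
    for _ in range(n_sims):
        # Draw Bernoulli conversions for each user in each arm.
        conv_a = sum(rng.random() < rate_a for _ in range(n_a))
        conv_b = sum(rng.random() < rate_b for _ in range(n_b))
        if two_proportion_p_value(conv_a, n_a, conv_b, n_b) < alpha:
            rejections += 1
    return rejections / n_sims
```

For example, `simulate_rejection_rate(0.10, 0.10, 500, 500)` should land near the nominal 5%, while comparing `simulate_rejection_rate(0.10, 0.15, 500, 500)` against `simulate_rejection_rate(0.10, 0.15, 900, 100)` shows the power lost by moving from a 50/50 to a 90/10 split with the same total traffic.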
Machine Learning
- What is ROC-AUC and when not to use it: A short and clear explanation of ROC-AUC, how an imbalanced dataset affects it, and the PR-AUC alternative
- Explainable and Accessible AI: Using Push Notifications to Broaden the Reach of ML at Headspace: In this post, the Headspace team talks about how they used explainable AI with push notifications to deliver personalized recommendations
- Stop Using Random Forest Feature Importances. Take This Intuitive Approach Instead: Explains the shortcomings of random forest feature importances and the alternative, permutation feature importance
- You Should Probably Know These 5 Facts About Tree Based Feature Importances: This post is from the same author as the previous one, and it walks through five facts about tree-based feature importances (both good and bad)
- Stop Using SMOTE to Treat Class Imbalance, Take This Intuitive Approach Instead: The same author talks about the shortcomings of SMOTE and alternatives for handling imbalanced datasets
- Personalization and Recommendation with Contextual Bandits: Explains what contextual bandits are and how they combine reinforcement learning and multi-armed bandits
- How Did We Predict Sales for Products with almost No Historical Data (Launches): A great article on how to cluster products based on look-alikes and use the similar products’ historical sales data to predict the early sales of a brand-new product
- Identifying Behavioral Personas with Cluster Analysis: An example of using non-negative matrix factorization (NMF) to do user segmentation
- Customer Lifetime Value Estimation via Probabilistic Modeling: How to model customer lifetime value with the Beta Geometric Negative Binomial Distribution (BG-NBD) model
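
As a small illustration of the ROC-AUC point in the first item of this list, here is a sketch of my own (not taken from the article). It uses average precision as a stand-in summary of the PR curve, and the function names and the toy ranking are assumptions made up for this example.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy imbalanced ranking (hypothetical data): 1000 items with strictly
# decreasing scores, 10 positives placed right after the top 50 negatives.
toy_scores = [1 - i / 1000 for i in range(1000)]
toy_labels = [1 if 50 <= i < 60 else 0 for i in range(1000)]
```

On this toy data `roc_auc(toy_labels, toy_scores)` is about 0.95, which looks excellent, while `average_precision(toy_labels, toy_scores)` is only about 0.10, because every positive is still outranked by 50 negatives. That gap is the reason to look at PR-based metrics when positives are rare.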
Analytics
- Causal Reasoning in Fashion Retail: The author talks about causal inference use cases in the fashion retail industry and the different types of problems
- Using Log-Time Denormalization for Data Wrangling at Meta: In the context of the massive impression data at Meta, this post talks about the pros and cons of the traditional snowflake schema and a fully denormalized schema, and presents log-time denormalization as the solution
- Quantifying the Impact of Remote Work on the Work-life Balance: The data team at Atlassian used Confluence, JIRA and Bitbucket data to analyze the impact of remote work on work efficiency and work-life balance (WLB)
- Campaign Budgets at Pinterest: A quick overview of the Campaign Budget Optimization product at Pinterest
- Game Analytics 101 — Metrics and Frameworks: Basic metrics framework for game analytics
- Growing SEO by Slashing URLs: How Course Hero improved SEO performance by predicting the value of documents and pruning the URLs of less popular content
- How to Explain Channel Conversion Rate Change with Mix-Rate Bridging Analysis: A simple but useful framework to explain changes in channel conversion rate
- Prioritizing Sales Outreach with Account Scoring: Our very own analytics team at Brex talks about our scoring framework for prioritizing sales outreach
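
For the mix-rate idea mentioned above, here is a minimal sketch of one common way to split a change in overall conversion rate into a mix effect (traffic shifting between channels) and a rate effect (conversion changing within channels). This is my own illustration and may not match the exact formulas in the article; the function name, the input layout, and the sample numbers are all hypothetical.

```python
def mix_rate_bridge(before, after):
    """Split the change in overall conversion rate into mix and rate effects.

    `before` and `after` map channel -> (visits, conversions).
    Overall rate = sum over channels of (traffic share * channel rate), so
    after - before = sum((w1 - w0) * r0) + sum(w1 * (r1 - r0)).
    """
    total_before = sum(visits for visits, _ in before.values())
    total_after = sum(visits for visits, _ in after.values())
    mix_effect = 0.0
    rate_effect = 0.0
    for channel in set(before) | set(after):
        v0, c0 = before.get(channel, (0, 0))
        v1, c1 = after.get(channel, (0, 0))
        w0, w1 = v0 / total_before, v1 / total_after  # traffic shares
        r0 = c0 / v0 if v0 else 0.0  # channel conversion rates
        r1 = c1 / v1 if v1 else 0.0
        mix_effect += (w1 - w0) * r0      # share changed, old rate held fixed
        rate_effect += w1 * (r1 - r0)     # rate changed, new share held fixed
    return mix_effect, rate_effect


# Hypothetical numbers: channel rates stay at 10% and 2%, but traffic
# shifts from search to social.
before = {"search": (1000, 100), "social": (1000, 20)}
after = {"search": (500, 50), "social": (1500, 30)}
mix, rate = mix_rate_bridge(before, after)
```

In this example the overall rate falls from 6% to 4%, and the whole 2-point drop lands in `mix` while `rate` is zero: no channel got worse, the traffic just moved toward the lower-converting one.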