This is a summary of the great Medium posts Elise and I read in March and April. Please enjoy :)

Experimentation

  1. Assign Experiment Variants at Scale in A/B tests: This article walks through a real-world example of flawed randomization, how the team found the problem, and how they redesigned the variant assignment
  2. Can we use difference-in-difference with a Biased A/B test?: A great discussion of whether you can use difference-in-differences (DID) for causal inference when the A/B test is biased
  3. Democratizing Experimentation: How to build up a good experimentation culture
  4. A/B Testing is Dead: The point is not that you should stop A/B testing, but that there are pitfalls you need to avoid to run effective tests
  5. Measure A/B Testing Platform Health with Simulated A/A and A/B Tests: How to use simulated A/A tests to measure type I error rates, and simulated A/B tests to measure power
  6. Even Split Increases Power of A/B Tests: Explains why an even split increases the power of an A/B test, and when to run an A/B test with unequal allocations
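
To make items 5 and 6 a bit more concrete, here is a minimal sketch of the general idea, not the code from either article. It assumes normally distributed metrics with unit variance and a plain two-sided z-test; the function name `rejection_rate` and all parameter values are our own illustrative choices.

```python
import random
from statistics import NormalDist


def _mean_and_variance(xs):
    """Sample mean and unbiased sample variance of a list of floats."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return mean, var


def rejection_rate(effect, n_control, n_treatment, n_tests=1000, alpha=0.05, seed=0):
    """Share of simulated tests in which a two-sided z-test rejects the null.

    effect == 0 simulates A/A tests, so the result estimates the type I error rate.
    effect > 0 simulates A/B tests, so the result estimates power.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_tests):
        # Hypothetical metric: standard normal in control, shifted by `effect` in treatment.
        control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        treatment = [rng.gauss(effect, 1.0) for _ in range(n_treatment)]
        mean_c, var_c = _mean_and_variance(control)
        mean_t, var_t = _mean_and_variance(treatment)
        std_err = (var_c / n_control + var_t / n_treatment) ** 0.5
        if abs((mean_t - mean_c) / std_err) > z_crit:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    # A healthy setup should reject in roughly alpha (5%) of A/A tests.
    print("A/A type I error:", rejection_rate(0.0, 200, 200))
    # Same total sample size (400), different allocations.
    print("A/B power, 200/200 split:", rejection_rate(0.3, 200, 200))
    print("A/B power, 40/360 split:", rejection_rate(0.3, 40, 360))
```

Under these assumptions, the A/A rejection rate should land near 5%, and the even split should show noticeably higher power than the 40/360 split, because the standard error of the difference is smallest when both arms are the same size.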

Machine Learning

  1. What is ROC-AUC and when not to use it: A short and clear explanation of ROC-AUC, how an imbalanced dataset affects it, and the PR-AUC alternative
  2. Explainable and Accessible AI: Using Push Notifications to Broaden the Reach of ML at Headspace: The Headspace team talks about how they used explainable AI with push notifications to deliver personalized recommendations
  3. Stop Using Random Forest Feature Importances. Take This Intuitive Approach Instead: Explains the shortcomings of random forest feature importances, and the alternative of permutation feature importance
  4. You Should Probably Know These 5 Facts About Tree Based Feature Importances: This post is from the same author as the previous one, and it walks through five facts about tree-based feature importances (both good and bad)
  5. Stop Using SMOTE to Treat Class Imbalance, Take This Intuitive Approach Instead: The same author talks about the shortcomings of SMOTE and alternatives for handling imbalanced datasets
  6. Personalization and Recommendation with Contextual Bandits: Explains what contextual bandits are and how they combine reinforcement learning and multi-armed bandits
  7. How Did We Predict Sales for Products with almost No Historical Data (Launches): A great article on how to cluster products based on look-alikes and use the similar products’ historical sales data to predict the early-day sales of a brand-new product
  8. Identifying Behavioral Personas with Cluster Analysis: An example of using non-negative matrix factorization (NMF) to do user segmentation
  9. Customer Lifetime Value Estimation via Probabilistic Modeling: How to model customer lifetime value with the Beta Geometric Negative Binomial Distribution (BG-NBD) model
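
As a small illustration of the point in item 1 (our own toy example, not taken from the article), the sketch below computes ROC-AUC and average precision, one common summary of the PR curve, on a made-up dataset with 1% positives. The helper names and the synthetic scores are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k where a positive appears.

    Simplification: assumes scores are distinct (no special handling of ties).
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)


def make_imbalanced_example():
    """Synthetic data: 10 positives, 990 negatives, 20 negatives ranked above every positive."""
    labels = [0] * 20 + [1] * 10 + [0] * 970
    scores = (
        [0.90 + 0.001 * i for i in range(20)]       # false alarms with the highest scores
        + [0.80 + 0.001 * i for i in range(10)]     # the 10 true positives
        + [0.10 + 0.0005 * i for i in range(970)]   # the remaining negatives
    )
    return labels, scores


if __name__ == "__main__":
    labels, scores = make_imbalanced_example()
    print("ROC-AUC:", round(roc_auc(labels, scores), 3))                      # about 0.98
    print("Average precision:", round(average_precision(labels, scores), 3))  # about 0.21
```

In this toy setup, ROC-AUC looks excellent because the positives outrank 970 of the 990 negatives, yet the top 20 predictions are all false alarms, which average precision exposes.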

Analytics

  1. Causal Reasoning in Fashion Retail: The author talks about causal inference use cases in the fashion retail industry and the different types of problems involved
  2. Using Log-Time Denormalization for Data Wrangling at Meta: In the context of the massive impression data at Meta, this post talks about the pros and cons of the traditional snowflake schema and a fully denormalized schema, and presents log-time denormalization as a solution
  3. Quantifying the Impact of Remote Work on the Work-life Balance: The data team at Atlassian used Confluence, JIRA and Bitbucket data to analyze the impact of remote work on work efficiency and work-life balance (WLB)
  4. Campaign Budgets at Pinterest: A quick overview of the Campaign Budget Optimization product at Pinterest
  5. Game Analytics 101 — Metrics and Frameworks: Basic metrics framework for game analytics
  6. Growing SEO by Slashing URLs: How Course Hero improved SEO performance by predicting the value of documents and pruning the URLs of less popular content
  7. How to Explain Channel Conversion Rate Change with Mix-Rate Bridging Analysis: A simple but useful framework to explain changes in channel conversion rate
  8. Prioritizing Sales Outreach with Account Scoring: Our very own analytics team at Brex talks about our scoring framework for prioritizing sales outreach
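
For readers new to the idea behind item 7, here is a rough sketch of one common way to split a change in overall conversion rate into a mix effect (traffic shifting between channels) and a rate effect (conversion changing within channels). This is a generic decomposition we are assuming for illustration; the article's exact formulation may differ. The function name `mix_rate_bridge` and the channel numbers are made up.

```python
def mix_rate_bridge(before, after):
    """Decompose the change in overall conversion rate into mix and rate effects.

    before / after: dict mapping channel -> (visits, conversions).
    Assumes both periods contain the same channels, each with nonzero visits.
    Returns (mix_effect, rate_effect, total_change).
    """
    def shares_and_rates(data):
        total_visits = sum(visits for visits, _ in data.values())
        return {
            ch: (visits / total_visits, conversions / visits)
            for ch, (visits, conversions) in data.items()
        }

    b = shares_and_rates(before)
    a = shares_and_rates(after)
    # Mix effect: change in traffic share, valued at the old conversion rate.
    mix = sum((a[ch][0] - b[ch][0]) * b[ch][1] for ch in b)
    # Rate effect: change in conversion rate, weighted by the new traffic share.
    rate = sum(a[ch][0] * (a[ch][1] - b[ch][1]) for ch in b)
    return mix, rate, mix + rate


if __name__ == "__main__":
    # Hypothetical data: per-channel rates stay at 10% and 30%, but traffic shifts to "paid".
    before = {"paid": (100, 10), "organic": (100, 30)}
    after = {"paid": (300, 30), "organic": (100, 30)}
    mix, rate, total = mix_rate_bridge(before, after)
    print(f"mix effect: {mix:+.3f}, rate effect: {rate:+.3f}, total: {total:+.3f}")
```

In this made-up example the overall rate drops from 20% to 15% even though no channel converts worse, so the whole change is attributed to mix. The two effects always add up to the total change because the sum of new share times new rate minus the sum of old share times old rate can be regrouped into exactly those two sums.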