Reading Notes 2023 May - Jun
This post summarises the Medium blogs I read in the past two months. I hope you enjoy reading them as well.
DS, Analytics
- Creative Fatigue: How advertisers can improve performance by managing repeated exposures: Meta talks about their analysis of creative fatigue and how to control it
- Using Graphs to Model and Analyze the Customer Journey: How the DS team at Microsoft uses graphs to represent the customer journey
- Warden: Real Time Anomaly Detection at Pinterest: How Pinterest uses its real-time anomaly detection tool, Warden, to detect real-time ML model drift and to detect spam
- Innovating Faster on Personalization Algorithms at Netflix Using Interleaving: How Netflix uses interleaving to test personalization algorithms, and why it is faster than traditional A/B testing
- When You Should Prefer “Thompson Sampling” Over A/B Tests: What Thompson Sampling is and why it could be better than A/B tests
- Choosing the Right Path: Churn Models vs. Uplift Models: How to create an uplift model to better handle the churn problem
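For readers new to Thompson Sampling, here is a minimal sketch of the general idea, not code from the linked article. It assumes two variants with made-up conversion rates (0.05 and 0.30) and a uniform Beta(1, 1) prior; the function name and parameters are hypothetical. Each round it draws a plausible conversion rate for every variant from that variant's Beta posterior and shows the variant with the highest draw, so traffic drifts toward the better variant as evidence accumulates.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; return impressions per variant."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its Beta posterior.
        # With no data yet this is Beta(1, 1), i.e. a uniform prior (an assumption).
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: draws[i])
        # Simulate whether the user converts, using the (made-up) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


# Hypothetical rates: variant 0 converts 5% of the time, variant 1 converts 30%.
pulls = thompson_sampling(true_rates=[0.05, 0.30], n_rounds=2000)
print(pulls)
```

Unlike a fixed 50/50 A/B split, most of the 2000 impressions in this simulation should end up on the stronger variant, which is the usual argument for the method when exposure to a weak variant is costly.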
Machine Learning
- Visualizing Shapley Values Over Time: This post introduces several good ways to visualize Shapley values and help with model interpretation
- Why You Should Stop Using the ROC Curve: Detailed explanation of the differences between ROC Curve and PR Curve with examples
- An ML Based Approach to Proactive Advertiser Churn Prevention: How the Pinterest team used GBDT to predict advertisers’ churn likelihood and validated the model with experiments
- From Clusters To Insights; The Next Step: How to detect the driving features behind the cluster labels
- Twitter’s recommendation algorithm is now open source. What does it tell us?: Some observations from the recommendation algorithm that Twitter open sourced
- Representation Online Matters: Practical End-to-end Diversification in Search and Recommender Systems: The Pinterest team walks through how they ensure diversification in search and recommender systems
- 19 Most Elegant Sklearn Tricks I Found After 3 Years of Use: This post talks about some Sklearn methods or tips that are less known but absolutely helpful
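On the ROC versus PR curve item above: one commonly cited difference, which may or may not be the linked article's main argument, is that the false positive rate on the ROC curve can look small on imbalanced data while precision is poor. The arithmetic below is a small sketch with made-up counts (10 positives and 1000 negatives); the function name is hypothetical.

```python
def confusion_rates(tp, fp, fn, tn):
    """Return (TPR, FPR, precision) for one classification threshold."""
    tpr = tp / (tp + fn)        # recall: share of real positives that were caught
    fpr = fp / (fp + tn)        # share of real negatives that were wrongly flagged
    precision = tp / (tp + fp)  # share of flagged items that are real positives
    return tpr, fpr, precision


# Made-up counts: 10 positives (8 caught) and 1000 negatives (50 wrongly flagged).
tpr, fpr, precision = confusion_rates(tp=8, fp=50, fn=2, tn=950)
print(f"TPR (recall) = {tpr:.2f}, FPR = {fpr:.2f}, precision = {precision:.2f}")
# prints: TPR (recall) = 0.80, FPR = 0.05, precision = 0.14
```

The point (0.05, 0.80) sits near the top-left corner of a ROC plot, yet only about 14% of the flagged items are true positives, which is what a PR curve would surface.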
DS Career
- Build More Analyses, Build Less Dashboards: Why and how to change the mindset of building too many dashboards
- What I Am Doing to Stay Relevant as a Data Analyst: Several ways to always keep up with data analytics skills
- The Role of Product Data Science: A good summary of the main responsibilities of a Product DS
- Crossing the Bridge: A Comparison of Data Science in Academia and Industry: This post compares how DS work differs between academia and industry
- 12 Mental Models for Data Science: Important things to keep in mind as a data scientist
- What I Look For in Every Data Analyst Candidate: Important characteristics of a data analyst, from the perspective of a hiring manager
- Why Data Scientists don’t get a seat at the table and what they can do about it: How to get involved in product and strategy conversations as a data scientist
- Transform Your 1:1 Meetings into a Source of Insight: Suggestions on improving 1:1s
LLM
- Pandas AI — The Future of Data Analysis: An interesting new package that uses the OpenAI API to run analytics with natural language
- How GPT Models Work: Explains the algorithm behind GPT models at a high level
- Will Generative AI Replace the Need for Data Analysts?: Discusses the analytics use cases of Generative AI and whether it will actually replace data analysts
- I Used ChatGPT (Every Day) for 5 Months. Here Are Some Hidden Gems That Will Change Your Life: Some great tips on using ChatGPT better
- From Chaos to Clarity: Streamlining Data Cleansing Using Large Language Models: An example that uses an LLM to process and clean messy data
Others
- Metis: Building Airbnb’s Next Generation Data Management Platform: An introduction to Airbnb’s data management platform and how it evolved
- Which Team Should Own Data Quality?: Discusses different options for managing data quality in industry
- Why You Should Become A Data Product Manager In 2023: What a data product manager is and why it could be a good career choice
- The Unforgettable 15: Exploring the Best Data Visualizations of All Time (2023): Great visualizations to check out