Reading Notes 2023 Mar - Apr
This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.
Causal Inference
- How to correctly select your unit of randomization in A/B Tests?: Discusses common randomization units and when to use each
- Every AB Test is Wrong: The title sounds scary, but the post mainly covers the nuances of running A/B tests, such as the novelty effect and stability over time and space
- Market Segmentation for Geo-Testing at Scale: How Expedia uses geo-testing to measure the effect of digital campaigns
Machine Learning
- Meta-learning in Finance: Boosting Models Calibration with Deep Learning: How to use meta-learning in finance
- Building Airbnb Categories with ML & Human in the Loop: The Airbnb team describes its new Categories feature and how it was built with ML and a human in the loop
- Building a Media Understanding Platform for ML Innovations: Netflix talks about its media understanding platform with some interesting use cases
- Gradient-Boosted Trees: To Early Stop or Not to Early Stop?: What early stopping is and whether it works for GBDTs
- What Is Learning to Rank: A Beginner’s Guide to Learning to Rank Methods: An introduction to Learning to Rank models
- Is F1-Score Really Better than Accuracy?: A detailed walkthrough of the differences between accuracy and F1-score, especially when the labels are imbalanced
- The Recommendation System at Lyft: The Lyft team describes several applications of recommendation systems within the company
- Quantifying Efficiency in Ridesharing Marketplaces: How Lyft measures the efficiency of its ridesharing marketplace
- What does Entropy Measure? An Intuitive Explanation: What entropy actually is and why it is defined the way it is
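As a quick illustration of the entropy item above, here is a minimal sketch assuming the standard Shannon entropy in bits, H = -sum(p * log2(p)). The function name `shannon_entropy` is my own choice for illustration, and the linked post may use different notation or a different log base.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution.

    `probs` is assumed to be a list of probabilities summing to 1.
    Zero-probability outcomes are skipped because they contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])    # 1.0 bit: maximally uncertain
biased_coin = shannon_entropy([0.9, 0.1])  # less than 1 bit: more predictable
four_sided = shannon_entropy([0.25] * 4)   # 2.0 bits: four equally likely outcomes

print(fair_coin, biased_coin, four_sided)
```

The more predictable the outcome, the lower the entropy, which is why the biased coin scores below the fair one.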
Data Engineering
- Zero-ETL, ChatGPT, And The Future of Data Engineering: Some recent trends in data engineering
- Pandas 2.0 is Here: Introduces improvements in Pandas 2.0
- The 3 Reasons Why I Have Permanently Switched From Pandas To Polars: Why Polars might be a better package than Pandas for data transformation and manipulation in Python
- Getting Started with the Polars DataFrame Library: Polars code examples for basic data transformation
- Why Is Polars All The Rage: Another post focusing on the advantages of Polars
- Pandas vs Polars vs Pandas 2.0 … ROUND 2: Performance comparison between Pandas, Pandas 2.0, and Polars
ChatGPT
- Breaking: Google Bard and GPT-4: A brief introduction to LLMs, specifically Bard and GPT-4
- Why everyone should try GPT-4, even the CEO: You should try it yourself :)
- Unboxing Google Bard and GPT-4: A side-by-side comparison between Bard and GPT-4
- Will ChatGPT Steal Your Job? Your Boss Should be More Scared than You: Discusses research by OpenAI on how exposed certain jobs or occupations are to LLMs
- Beyond Written Output: Can ChatGPT Help With Analysis?: Examples of how ChatGPT can help with text analytics
- How I Save Over 5 Hours Every Week Using ChatGPT as a Data Scientist: Areas of data science work where ChatGPT could help
- How to Validate OpenAI GPT Model Performance with Text Summarization: Compares text summarization performance across different GPT models
- Bonus: Online Course - ChatGPT Prompt Engineering for Developers: A great new course from DeepLearning.ai on ChatGPT prompt engineering tactics. See my course notes here
Others
- Data ingestion Pipeline with Operation Management: Netflix introduces Annotation Operations, which let teams create data pipelines and write annotations easily without worrying about the access patterns of their data across different applications
- Boosting Conversion Rate in E-Commerce: Three Proven Data Strategies and How to Prioritize Them: Some general ideas for boosting conversions in e-commerce, including mapping your shopper journey, locating your biggest growth opportunities, and aligning your data strategy with your profitability goal
- Understanding a Diverse User Base with Frequency Segmentation at Scale: The team at Canva discusses how they built a user segmentation model
- How to Turn Boring Visualization into Fascinating Data Storytelling: Key tactics to tell stories with data
- Facebook/prophet in 2023 and Beyond: What to expect from the prophet package in the near future
- What Should Your Decision Be When Your p-value = 0.052?: How to choose the p-value threshold
- Four Analytics Best Practices We Adopted — and Why You Should Too: The analytics team at Meta shares four important best practices they use at work
- 6 Ways to Build Best Practices for Data Science Teams: Best practices DS teams should adopt