Reading Notes 2023 Mar - Apr


This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.

How to correctly select your unit of randomization in A/B Tests?: Discusses common randomization units and when to use each
Every AB Test is Wrong: The title sounds scary, but the post mainly covers the nuances of running A/B tests, such as the novelty effect and stability over time and space
Market Segmentation for Geo-Testing at Scale: How Expedia uses geo-testing to measure the effect of digital campaigns

Meta-learning in Finance: Boosting Models Calibration with Deep Learning: How to use meta-learning in Finance
Building Airbnb Categories with ML & Human in the Loop: The Airbnb team talks about their new Categories feature and how it was built with ML and a human in the loop
Building a Media Understanding Platform for ML Innovations: Netflix talks about its media understanding platform with some interesting use cases
Gradient-Boosted Trees: To Early Stop or Not to Early Stop?: What early stopping is and whether it works for GBDTs
What Is Learning to Rank: A Beginner’s Guide to Learning to Rank Methods: An introduction to Learning to Rank models
Is F1-Score Really Better than Accuracy?: Detailed walkthrough of the differences between accuracy and F1-score, especially when the labels are imbalanced
The Recommendation System at Lyft: The Lyft team talks about several applications of recommendation systems at Lyft
Quantifying Efficiency in Ridesharing Marketplaces: How Lyft measures the efficiency of ridesharing marketplaces
What does Entropy Measure? An Intuitive Explanation: What entropy actually is and why it is defined the way it is
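
As a quick illustration of the accuracy vs F1-score point above, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration (they are not taken from the linked article), and the helper names `accuracy` and `f1_score` are my own rather than any library's API. It compares a classifier that always predicts the majority class with one that catches some of the rare positives.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
y_pred_majority = [0] * 100

# Classifier B makes 2 false positives and finds 3 of the 5 positives.
y_pred_model = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

for name, y_pred in [("majority", y_pred_majority), ("model", y_pred_model)]:
    print(name, round(accuracy(y_true, y_pred), 2), round(f1_score(y_true, y_pred), 2))
```

On this toy data the two classifiers have almost the same accuracy, while only F1 shows that the majority-class classifier never finds a positive.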

Zero-ETL, ChatGPT, And The Future of Data Engineering: Some recent trends in data engineering
Pandas 2.0 is Here: Introduces improvements in Pandas 2.0
The 3 Reasons Why I Have Permanently Switched From Pandas To Polars: Why Polars might be a better package than Pandas for data transformation and manipulation in Python
Getting Started with the Polars DataFrame Library: Polars code examples for basic data transformation
Why Is Polars All The Rage: Another post focusing on the advantages of Polars
Pandas vs Polars vs Pandas 2.0 … ROUND 2: Performance comparison between Pandas, Pandas 2.0, and Polars

Breaking: Google Bard and GPT-4: A brief introduction to LLMs, specifically Bard and GPT-4
Why everyone should try GPT-4, even the CEO: You should try it yourself :)
Unboxing Google Bard and GPT-4: A side-by-side comparison between Bard and GPT-4
Will ChatGPT Steal Your Job? Your Boss Should be More Scared than You: Discusses research by OpenAI on how exposed certain jobs and occupations are to LLMs
Beyond Written Output: Can ChatGPT Help With Analysis?: Examples of how ChatGPT can help with text analytics
How I Save Over 5 Hours Every Week Using ChatGPT as a Data Scientist: Areas of data science work where ChatGPT could help
How to Validate OpenAI GPT Model Performance with Text Summarization: Compares text summarization performance across different GPT models
Bonus: Online Course - ChatGPT Prompt Engineering for Developers: A great new course from DeepLearning.ai on ChatGPT prompt engineering tactics. See my course notes here

Data ingestion Pipeline with Operation Management: Netflix introduces its Annotation Operations, which let teams create data pipelines and easily write annotations without worrying about how different applications access their data
Boosting Conversion Rate in E-Commerce: Three Proven Data Strategies and How to Prioritize Them: Some general ideas to boost conversions in e-commerce, including mapping your shopper journey, locating your biggest growth opportunities, and aligning your data strategy with your profitability goal
Understanding a Diverse User Base with Frequency Segmentation at Scale: The team at Canva discusses how they built a user segmentation model
How to Turn Boring Visualization into Fascinating Data Storytelling: Key tactics to tell stories with data
Facebook/prophet in 2023 and Beyond: What to expect from the prophet package in the near future
What Should Your Decision Be When Your p-value = 0.052?: How to choose the p-value threshold
Four Analytics Best Practices We Adopted — and Why You Should Too: The analytics team from Meta shares four important best practices they utilize at work
6 Ways to Build Best Practices for Data Science Teams: Best practices DS teams should adopt
