This post summarises the Medium blog posts I read over the past two months. I hope you enjoy reading them as well.

Causal Inference

  1. How to correctly select your unit of randomization in A/B Tests?: Discusses common randomization units and when to use each
  2. Every AB Test is Wrong: The title sounds alarming, but the post covers practical nuances of running A/B tests, such as novelty effects and temporal and spatial stability
  3. Market Segmentation for Geo-Testing at Scale: How Expedia uses geo-testing to measure the effect of digital campaigns
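The first post above is about choosing the randomization unit. To make that choice concrete, here is a minimal sketch of deterministic, hash-based assignment. This is a common technique rather than the method from any of the linked posts, and the function name `assign_variant` and the experiment/unit IDs are hypothetical. The only thing that changes between user-level and session-level randomization is which ID you hash.

```python
import hashlib


def assign_variant(unit_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a randomization unit to a variant (hypothetical helper).

    Hashing "experiment:unit_id" means the same unit always lands in the same
    arm for a given experiment, while different experiments are shuffled
    independently of each other.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    # Interpret the SHA-256 digest as a large integer and bucket it.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# User-level: every session of user_42 sees the same arm.
print(assign_variant("user_42", "new_checkout"))
# Session-level: the same user may see different arms in different sessions.
print(assign_variant("session_9f3a", "new_checkout"))
```

Hashing instead of storing a random draw keeps assignment stateless and reproducible, which is convenient when several services need to agree on a unit's arm.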

Machine Learning

  1. Meta-learning in Finance: Boosting Models Calibration with Deep Learning: How meta-learning can be applied in finance
  2. Building Airbnb Categories with ML & Human in the Loop: The Airbnb team explains their new Categories feature and how it was built with ML and a human in the loop
  3. Building a Media Understanding Platform for ML Innovations: Netflix describes its media understanding platform, along with some interesting use cases
  4. Gradient-Boosted Trees: To Early Stop or Not to Early Stop?: What early stopping is and whether it works for GBDTs
  5. What Is Learning to Rank: A Beginner’s Guide to Learning to Rank Methods: An introduction to learning-to-rank models
  6. Is F1-Score Really Better than Accuracy?: A detailed walkthrough of how accuracy and F1-score differ, especially when labels are imbalanced
  7. The Recommendation System at Lyft: The Lyft team describes several applications of recommendation systems at the company
  8. Quantifying Efficiency in Ridesharing Marketplaces: How Lyft measures the efficiency of its ridesharing marketplace
  9. What does Entropy Measure? An Intuitive Explanation: What entropy actually is and why it is defined the way it is
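The accuracy vs F1-score post hinges on what happens with imbalanced labels. A minimal sketch with toy data (not taken from the post) shows the core issue. A model that always predicts the majority class can reach high accuracy while its F1-score for the minority class is zero. The helpers below are written from the standard definitions using only the standard library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(f1_score(y_true, y_pred))  # 0.0
```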

Data Engineering

  1. Zero-ETL, ChatGPT, And The Future of Data Engineering: Some recent trends in data engineering
  2. Pandas 2.0 is Here: Introduces improvements in Pandas 2.0
  3. The 3 Reasons Why I Have Permanently Switched From Pandas To Polars: Why Polars might be a better package than Pandas for data transformation and manipulation in Python
  4. Getting Started with the Polars DataFrame Library: Polars code examples for basic data transformation
  5. Why Is Polars All The Rage: Another post focusing on the advantages of Polars
  6. Pandas vs Polars vs Pandas 2.0 … ROUND 2: Performance comparison between Pandas, Pandas 2.0, and Polars

ChatGPT

  1. Breaking: Google Bard and GPT-4: A brief introduction to LLMs, specifically Bard and GPT-4
  2. Why everyone should try GPT-4, even the CEO: You should try it yourself :)
  3. Unboxing Google Bard and GPT-4: A side-by-side comparison between Bard and GPT-4
  4. Will ChatGPT Steal Your Job? Your Boss Should be More Scared than You: Discusses OpenAI research on how exposed different jobs and occupations are to LLMs
  5. Beyond Written Output: Can ChatGPT Help With Analysis?: Examples of how ChatGPT can help with text analytics
  6. How I Save Over 5 Hours Every Week Using ChatGPT as a Data Scientist: Areas where ChatGPT could help with day-to-day data science work
  7. How to Validate OpenAI GPT Model Performance with Text Summarization: Compares text summarization performance across different GPT models
  8. Bonus: Online Course - ChatGPT Prompt Engineering for Developers: A great new course from DeepLearning.ai on prompt engineering tactics for ChatGPT. See my course notes here

Others

  1. Data ingestion Pipeline with Operation Management: Netflix introduces Annotation Operations, which lets teams create data pipelines and write annotations easily without worrying about how different applications access their data
  2. Boosting Conversion Rate in E-Commerce: Three Proven Data Strategies and How to Prioritize Them: General ideas for boosting conversions in e-commerce, including mapping your shopper journey, locating your biggest growth opportunities, and aligning your data strategy with your profitability goals
  3. Understanding a Diverse User Base with Frequency Segmentation at Scale: The Canva team discusses how they built a user segmentation model
  4. How to Turn Boring Visualization into Fascinating Data Storytelling: Key tactics to tell stories with data
  5. Facebook/prophet in 2023 and Beyond: What to expect from the prophet package in the near future
  6. What Should Your Decision Be When Your p-value = 0.052?: How to choose a p-value threshold
  7. Four Analytics Best Practices We Adopted — and Why You Should Too: Meta's analytics team shares four best practices they use at work
  8. 6 Ways to Build Best Practices for Data Science Teams: Best practices DS teams should adopt
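To ground the "p-value = 0.052" question, here is a minimal sketch of where such a number typically comes from in a conversion-rate A/B test. It assumes a two-sided two-proportion z-test with pooled variance; the linked post may use a different test, and the function name and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    Hypothetical helper for illustration; returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100/1000 conversions in control, 130/1000 in treatment.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The test itself only produces the number. Whether a result just above 0.05 should change a decision depends on the threshold you committed to in advance and on the relative cost of false positives and false negatives.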