Reading Notes 2024 Jan - Feb
This is the fourth year since I started reading data science blogs (mostly on Medium) every Friday and Sunday night. This habit has brought me immense self-fulfillment, and it’s a journey I’m eager to continue. To share this passion, I will continue to post a bi-monthly compilation of the great articles I encounter, with the hope that you’ll find as much joy and inspiration in them as I do. :)
Data Science
- How is Causal Inference Different in Academia and Industry?: A comparison of causal inference in academia and industry in terms of speed, methods, feedback loops, metrics, and model efficiency
- Is R on the decline?: R vs. Python has been a classic debate ever since I started studying data science. This article covers some recent trends and the pros and cons of each
- Don’t use loc/iloc with Loops In Python, Instead, Use This!, Don’t use Apply in Python, follow these Best Practices!: Useful pandas tips on using ‘at’ in place of ‘loc’ in for loops, and on alternatives to ‘apply’, with clear efficiency comparisons
- How to Use Causal Inference When A/B Testing Is Not Available: An example of using propensity score matching to measure the impact of contextual ads
- Sequential A/B Testing Keeps the World Streaming Netflix Part 1: Continuous Data: Using sequential testing to quickly and confidently identify any difference in the distribution of play-delay at Netflix
- Customer Attrition: How to Define Churn When Customers Do Not Tell They’re Leaving: Common ways to define churn in the retail and banking industries, where customers don’t explicitly say they are leaving
- Unlocking Insights: Building a Scorecard with Logistic Regression: A detailed walkthrough of using logistic regression and the Weight of Evidence transformation to build a loan application scorecard
- How Predicting Customer Lifetime Value Enables HelloFresh to Optimize Its Marketing Spend: How HelloFresh integrates its Customer Lifetime Value model with Google Ads to achieve a higher return on ad spend
- Find Unusual Segments in Your Data with Subgroup Discovery: How the Patient Rule Induction Method (PRIM) works, with a code example of using it to identify subgroups with unusual churn rates
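For readers who have not met the Weight of Evidence transformation from the scorecard article, here is a minimal sketch of the idea. It is my own illustration, not the article's code: the function name, the income bands, and the counts are all made up, and it assumes the common convention WoE = ln(share of goods in the bin / share of bads in the bin) with no empty bins.

```python
import math

def weight_of_evidence(bins):
    """bins maps a bin label to (good_count, bad_count); returns label -> WoE."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    for label, (good, bad) in bins.items():
        # Share of all goods and of all bads that fall into this bin.
        good_share = good / total_good
        bad_share = bad / total_bad
        # Assumes both counts are non-zero; real scorecards usually smooth or merge bins.
        woe[label] = math.log(good_share / bad_share)
    return woe

# Hypothetical counts of repaid (good) and defaulted (bad) loans per income band.
income_bins = {"low": (100, 50), "mid": (300, 40), "high": (200, 10)}
woe_by_bin = weight_of_evidence(income_bins)
```

A positive value means the bin holds relatively more good loans than bad ones. In a scorecard, these values replace the raw category before fitting the logistic regression.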
Data Career
- What Sets Great Data Analysts Apart: Six key areas one should focus on to become a 10x data analyst
- How Airbnb Turned Itself Into A Data-Driven Company Through Business Intelligence: How Airbnb adopted data science and built a data-driven culture in its early days
- How I Stay Up to Date with AI as a Data Scientist: Useful tips to keep yourself updated in this fast-evolving data science field
- Positioning Your Analytics Team on the Right Projects: DS/DA teams today always get tons of requests from various stakeholders. This post talks about how to prioritize them and find the right “investment thesis”
- The Data Speaker’s Blueprint: Turning Analytics into Applause: Why do we need data speakers and how to get started
- Becoming Data Driven, From First Principles: Covers the keys to a successful Amazon-style Weekly Business Review and the idea of Statistical Process Control, with a brief introduction to the XmR chart
- Get to know: Ronan Bradley, VP of Product Analytics and User Research for Facebook at Meta: An interview with Ronan Bradley about his journey at Meta and his advice for data scientists
- 5 Habits Spotify Senior Data Scientists Use to Boost Their Productivity: Helpful habits to improve DS work productivity
- 3 Types of Delivery Models for Data and Analytics Teams: Differences between “data as a product”, “data platform”, and “data product” in terms of value props, roles, responsibilities and output
- Data Science in 2024 — What Has Changed: A look back at how the data science field has developed since 2020, and the outlook for 2024 given the layoffs and the evolving AI tools
- Self-Service Data Analytics as a Hierarchy of Needs: Using the Hierarchy of Needs to explain the types of self-serve data analytics and how to implement them
- The Sad Reality: Not Enough Actual Data Science: A very cruel reality – when small to mid-size companies hire Data Scientists and expect them to do everything from simple data entry on spreadsheets to building advanced data models and writing documentation, it leaves very little room for Data Scientists to do what they do best: perform analysis.
- How to Make Yourself More Layoff-Proof as a Data Scientist: Though layoffs are in many cases out of our control, there are things we can do to make ourselves a bit more ‘layoff-proof’
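The XmR chart mentioned in the list above is simple enough to sketch in a few lines. This is my own illustration rather than anything from the article: the function name and the weekly numbers are hypothetical, and it uses the standard individuals-chart rule of mean ± 2.66 × average moving range for the natural process limits.

```python
def xmr_limits(values):
    """Return (lower, center, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard scaling constant for individuals charts.
    return center - 2.66 * avg_moving_range, center, center + 2.66 * avg_moving_range

# Hypothetical weekly metric: the first five weeks form the baseline.
weekly_orders = [102, 98, 101, 99, 100, 130]
lower, center, upper = xmr_limits(weekly_orders[:5])
signals = [v for v in weekly_orders if v < lower or v > upper]
```

Points outside the limits are treated as signals worth investigating, while anything inside is routine variation that does not need an explanation in the weekly review.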
AI and LLM
- How to Use the Powerful New Assistants API for Data Analysis: Step-by-step walkthrough of how to build a data analysis assistant
- I Tried Data Analysis ChatGPT Plugin — Every Analyst’s Dream or a Nightmare in Disguise?: An evaluation of the data analysis plugin using different prompts
- Reframing LLM ‘Chat with Data’: Introducing LLM-Assisted Data Recipes: A framework for building LLM-assisted data recipes that let non-technical stakeholders ask questions about the data in natural language and get answers back
- Powerful Data Analysis and Plotting via Natural Language Requests by Giving LLMs Access to Libraries: How can you ask an LLM questions about a dataset in your own words and have it translate them into the maths or scripting required to answer them? A proposed solution with actual code
- QueryGPT — Harnessing Generative AI To Query Your Data With Natural Language: A prototype tool powered by Large Language Models to use natural language to query your databases
- What are: Embeddings? Vector Databases? Vector Search? k-NN? ANN?: Explains these concepts in plain English
- Topic Modelling using ChatGPT API: A comprehensive guide to using the ChatGPT API to identify common topics and classify text by topic
- Performing Customer Analytics with LangChain and LLMs: A step-by-step example of creating a LangChain application that can write queries and output charts for you
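To make the embeddings and k-NN entry above a bit more concrete, here is a toy sketch of exact (brute-force) nearest-neighbour search with cosine similarity. It is my own illustration: the three-dimensional vectors and their labels are invented, whereas real embeddings come from a model and have hundreds of dimensions. ANN methods trade a little accuracy to avoid scoring every vector as this version does.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest(query, vectors, k):
    """Exact k-NN: score every stored vector and keep the k most similar names."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Made-up 3-d "embeddings"; in practice an embedding model produces these.
toy_embeddings = {
    "churn": [0.9, 0.1, 0.0],
    "attrition": [0.8, 0.2, 0.1],
    "pricing": [0.1, 0.9, 0.2],
    "latency": [0.0, 0.2, 0.9],
}
nearest = k_nearest([1.0, 0.1, 0.0], toy_embeddings, k=2)
```

A vector database does essentially this lookup at scale, usually with an approximate index so that a query does not have to touch every stored vector.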