Reading Notes 2023 Sept - Oct
This post summarises the Medium blog posts I read over the past two months. I hope you enjoy reading them as well.
Causal Inference
- Causal Analysis with PyMC: Answering “What If?” With the New Do Operator: An introduction to the PyMC package and how to conduct Bayesian causal inference with it
- Demystifying the Applications of Causal Inference in the Industry: Common causal inference techniques, including Instrumental Variables, Propensity Score Matching, and Difference-in-Differences
- The Downsides of Experimentation: Things to avoid when running experiments
- Introducing the pymatch Python Package: Using the pymatch package to do propensity score matching
- Apply Instrumental Variables Method in Causal Analysis: An explanation and walkthrough of the Instrumental Variable method
- Causal Inference Part 8: Instrumental Variable Analysis: A Powerful Technique for Causal Inference in Data Science: Another tutorial on Instrumental Variable analysis
- Differences between Matching and Regression: How two common causal inference techniques, Matching and Regression, differ
- Exploring Counterfactual Insights: From Correlation to Causation in Data Analysis: Using the pgmpy package to conduct counterfactual analysis
- Pricing and Promotion for Data Scientists with Causal AI: An introduction to the decisionOS package by causaLens for causal analysis
- Go beyond predictive modeling: Accurately compute interventions and counterfactuals with Double Machine Learning: Using the decisionOS package to apply Double Machine Learning and obtain unbiased estimates of causal effects
- Sneaky Science: Data Dredging Exposed: What p-hacking is and why it is harmful
- A Visual Explanation of Variance, Covariance, Correlation and Causation: An intuitive visual explanation of the concepts
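The progression in the last entry (variance, then covariance, then correlation, then the jump to causation) can be sketched in a few lines of plain Python. This is a minimal illustration using the standard sample (n - 1) formulas and made-up toy data, not code from the article; the hypothetical `temperature`, `ice_cream_sales`, and `sunburns` series are constructed so that a hidden driver moves both outcomes.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Sample variance: average squared deviation from the mean (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    # Sample covariance: positive when x and y tend to sit on the same side of their means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    # Pearson correlation: covariance divided by both standard deviations, so it lies in [-1, 1].
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


# Hypothetical toy data: a hidden driver (temperature) moves both series.
temperature = [1, 2, 3, 4, 5]
ice_cream_sales = [10 + 2 * t for t in temperature]
sunburns = [3 * t for t in temperature]

# Perfectly correlated, yet neither series causes the other; by construction temperature does.
print(variance(ice_cream_sales))               # 10.0
print(covariance(ice_cream_sales, sunburns))   # 15.0
print(correlation(ice_cream_sales, sunburns))  # 1.0
```

Correlation is just covariance on a standardized scale, so even a value of 1.0 says nothing about what would happen if you intervened on one variable; answering that question is what the causal techniques listed in this section are for.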
Machine Learning
- SHAP for Binary and Multiclass Target Variables: How to interpret SHAP when predicting binary or multiclass target variables
- Expedia Group’s Customer Lifetime Value Prediction Model: The modeling approach Expedia Group used to build a customer LTV prediction model
- Which Features Are Harmful For Your Classification Model?: How to use a quantitative method to find the most harmful features by recursively removing the feature with the highest error contribution
- Boosting Model Accuracy: Techniques I Learned During My Machine Learning Thesis at Spotify (+Code Snippets): A walkthrough of a Machine Learning project at Spotify
- Feature Importance Analysis with SHAP I Learned at Spotify (with the Help of the Avengers): A follow-up post on using SHAP values for model interpretation
- SHAP vs. ALE for Feature Interactions: Understanding Conflicting Results: A comparison of two commonly used feature interpretation techniques, SHAP and ALE, and a deep dive into why they can give different results
- An Alternative Approach to Visualizing Feature Relationships in Large Datasets: How to use boxplots to better visualize feature relationships
- XGBoost 2.0: Major Update on Tree-based Methods: What’s new in XGBoost 2.0
- Bring Your Own Algorithm to Anomaly Detection: The anomaly detection platform at Pinterest that allows data scientists to develop, migrate, and deploy their own Python algorithms easily
Data Career
- Why Being a Head of Data Isn’t What You Think It Is: What it really means to be a Head of Data
- How to be a 10X Data Scientist: Tips for becoming an exceptional Data Scientist
- LLM Monitoring and Observability — A Summary of Techniques and Approaches for Responsible AI: How to evaluate and monitor LLM applications
- How to Talk About Data and Analysis to Non-Data People: Useful tips to communicate data insights to stakeholders
- 6 Bad Habits Killing Your Productivity in Data Science: Tips to improve productivity in data science work
- Data Engineering at Meta: High-Level Overview of The Internal Tech Stack: What Meta uses for its data warehouse, data discovery, data catalog, and various data workflows
- Balancing Urgency vs. Sustainability as an Analytics Team: How to handle ad-hoc requests from stakeholders while keeping analytics work sustainable
- Is Impostor Syndrome the Result of Skipping an Important Step?: A discussion of impostor syndrome and a potential cure
AI and LLM
- How To Use ChatGPT: Data Analyst & Data Scientist Use Cases: How to use ChatGPT in your daily work as a data professional
- ChatGPT + Tableau= A Match Made in AI Heaven!: How to integrate ChatGPT with Tableau via TabPy to make Tableau conversational :)
- How I used ChatGPT to Build a Streamlit Dashboard App: Asking ChatGPT for help building a Streamlit dashboard
- Quantifying GPT-4’s Hidden Regressions Over Time: Quantitative measures of how GPT-4’s performance has regressed over time
- AI Bias: Good Intentions Can Lead to Nasty Results: The kinds of bias AI systems can exhibit
- The Future of AI Depends on Data Quality: Here’s Why: Why data quality is extremely important for AI quality