
This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.

Causal Inference

  1. Causal Analysis with PyMC: Answering “What If?” With the New Do Operator: An introduction to the PyMC package and how to conduct Bayesian causal inference with it
  2. Demystifying the Applications of Causal Inference in the Industry: Common causal inference techniques, including Instrumental Variables, Propensity Score Matching, and Difference-in-Differences
  3. The Downsides of Experimentation: Things to avoid when running experiments
  4. Introducing the pymatch Python Package: The pymatch package for propensity score matching
  5. Apply Instrumental Variables Method in Causal Analysis: An explanation and walkthrough of the Instrumental Variable method
  6. Causal Inference Part 8: Instrumental Variable Analysis: A Powerful Technique for Causal Inference in Data Science: Another tutorial on Instrumental Variable analysis
  7. Differences between Matching and Regression: How the two common causal inference techniques, Matching and Regression, differ
  8. Exploring Counterfactual Insights: From Correlation to Causation in Data Analysis: Using the pgmpy package to conduct counterfactual analysis
  9. Pricing and Promotion for Data Scientists with Causal AI: An introduction to the decisionOS package by causaLens for causal analysis
  10. Go beyond predictive modeling: Accurately compute interventions and counterfactuals with Double Machine Learning: Using the decisionOS package to run Double Machine Learning and learn unbiased estimates of causal effects
  11. Sneaky Science: Data Dredging Exposed: What p-hacking is and why it is bad
  12. A Visual Explanation of Variance, Covariance, Correlation and Causation: An intuitive visual explanation of the concepts

Machine Learning

  1. SHAP for Binary and Multiclass Target Variables: How to interpret SHAP when predicting binary or multiclass target variables
  2. Expedia Group’s Customer Lifetime Value Prediction Model: The modeling approach that Expedia Group employed to build a customer LTV prediction model
  3. Which Features Are Harmful For Your Classification Model?: How to use a quantitative method to find the most harmful feature and recursively remove the feature with the highest error contribution
  4. Boosting Model Accuracy: Techniques I Learned During My Machine Learning Thesis at Spotify (+Code Snippets): A walkthrough of a Machine Learning project at Spotify
  5. Feature Importance Analysis with SHAP I Learned at Spotify (with the Help of the Avengers): A follow-up story on using SHAP values for model interpretation
  6. SHAP vs. ALE for Feature Interactions: Understanding Conflicting Results: Two commonly used feature interpretation techniques, SHAP and ALE, and a deep dive into why they can give different results
  7. An Alternative Approach to Visualizing Feature Relationships in Large Datasets: How to use boxplots to better visualize feature relationships
  8. XGBoost 2.0: Major Update on Tree-based Methods: What’s new in XGBoost 2.0
  9. Bring Your Own Algorithm to Anomaly Detection: The anomaly detection platform at Pinterest that allows data scientists to develop, migrate, and deploy their own Python algorithms easily

Data Career

  1. Why Being a Head of Data Isn’t What You Think It Is: What it means to be a Head of Data
  2. How to be a 10X Data Scientist: Tips for becoming an exceptional Data Scientist
  3. LLM Monitoring and Observability — A Summary of Techniques and Approaches for Responsible AI: How to evaluate and monitor LLM applications
  4. How to Talk About Data and Analysis to Non-Data People: Useful tips for communicating data insights to stakeholders
  5. 6 Bad Habits Killing Your Productivity in Data Science: Tips to improve productivity in data science work
  6. Data Engineering at Meta: High-Level Overview of The Internal Tech Stack: What Meta uses for data warehousing, data discovery, data cataloging, and various data workflows
  7. Balancing Urgency vs. Sustainability as an Analytics Team: How to handle ad-hoc requests from stakeholders while ensuring the analytics work stays sustainable
  8. Is Impostor Syndrome the Result of Skipping an Important Step?: A discussion of impostor syndrome and a potential cure

AI and LLM

  1. How To Use ChatGPT: Data Analyst & Data Scientist Use Cases: How to use ChatGPT to benefit your daily work as a data professional
  2. ChatGPT + Tableau= A Match Made in AI Heaven!: How to integrate ChatGPT with Tableau via TabPy to make Tableau conversational :)
  3. How I used ChatGPT to Build a Streamlit Dashboard App: Asking ChatGPT for help building a Streamlit dashboard
  4. Quantifying GPT-4’s Hidden Regressions Over Time: Quantitative measures of regressions in GPT-4
  5. AI Bias: Good Intentions Can Lead to Nasty Results: The kinds of bias that AI can have
  6. The Future of AI Depends on Data Quality: Here’s Why: Why data quality matters so much for the quality of AI