This is a summary of the data science blog posts I read over the past two months. I hope you enjoy reading them as well.

Data Science

  1. Statistical Measures Every Analyst Must Know - Part 1: A review of basic statistical measures
  2. Customer Segmentation: More Than Clustering: A customer segmentation framework that covers considerations beyond building the model
  3. Measuring ROI of Creator Content Investments at Meta: The analytics team at Meta talks about how they estimate the impact of Creator Content Investments
  4. Causal Inference Python Implementation: A walkthrough of the CausalImpact Python package for estimating the causal effect of a given intervention
  5. Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform: How Netflix integrates the rule-based classifier with an ML service to remediate configuration errors
  6. Sequential A/B Testing Keeps the World Streaming Netflix Part 2: Counting Processes: What sequential A/B testing is and how Netflix uses it to monitor key platform metrics
  7. A Guide on Estimating Long-Term Effects in A/B Tests: Why long-term and short-term effects may differ and the methods to measure the long-term effects
  8. How Meta tests products with strong network effects: How Meta builds clusters to eliminate network effects in A/B testing
  9. Clustered Standard Errors in AB Tests: What to do when the unit of observation differs from the unit of randomization
  10. Top 10 Data Visualizations of 2023 Worth Looking at!: 10 great data visualization examples from 2023
  11. Explainability of the Features? No! Of the Hyperparameters.: An interesting method that uses SHAP to explain the effect of hyperparameter tuning
  12. Navigating the Netflix Data Deluge: The Imperative of Effective Data Management: Netflix talks about how to manage large-scale data effectively
  13. The Secret to Duolingo’s Exponential Growth: How Duolingo grew quickly through experimentation and other strategies
  14. Why You Should Never Use Cross-Validation: When the standard random cross-validation is not appropriate
  15. Three Rules of Statistical Analysis from Your Statistics Class to Unlearn: How to handle statistical assumptions and outliers appropriately in the real world
  16. A Quantitative Approach to Product Market Fit: Two common frameworks to measure Product Market Fit quantitatively
  17. I Analyzed 100 Dashboards. Here Are the Most Common Data Viz Errors I Saw.: Common data visualization mistakes to avoid
  18. 5 Emerging Trends in Data Visualization in 2024: How data visualization is changing as new techniques emerge

Data Career

  1. Why Data Scientists Get Taken For Granted: Why data scientists’ contributions often go unmentioned at domain-wide meetings and in discussions among executives and other leaders
  2. Crafting a Data Science Portfolio That Will Actually Get You An Interview This 2024: Advice on how to build a good data science portfolio
  3. A Guide To Building a Data Department From Scratch: Guidance on how to build a data department at a startup
  4. How To Advance In Data Science: Advice on how to keep advancing in a data science career
  5. Is Data Science dead?: What the future of data science might look like given the recent advances in Gen AI
  6. How to Become a Freelance Data Analyst in 2024: The author’s experience of becoming a freelance data analyst, along with great advice
  7. 5 Passive Income Ideas For Data Scientists: Ways to earn passive income as a data scientist
  8. 7 Subscriptions That Help Me As A Data Scientist: Subscriptions that might help your data science career
  9. Why Data Scientists and Engineers Quit Their Jobs: The root cause behind the high turnover rate among data professionals
  10. Five Key Trends in AI and Data Science for 2024: How AI might change the data science field

AI and LLM

  1. GPT-4’s Prompting Effectiveness For Python Dashboards: Comparing Dash, Panel & Streamlit: Build Python dashboards with various packages, plus actual GPT prompts and code
  2. Data Dirtiness Score: How to quantify data quality with an LLM
  3. Data Quality Error Detection powered by LLMs: A follow-up to the article above that describes how to identify errors in tabular datasets with an LLM
  4. How we built Text-to-SQL at Pinterest: Pinterest’s great example of building a Text-to-SQL solution with an LLM
  5. How I use Gen AI as a Data Engineer: AI use cases in Data Engineering
  6. RAG 2.0, Finally Getting RAG Right!: What RAG 2.0 is and how it differs from RAG 1.0
  7. Airbnb Brandometer: Powering Brand Perception Measurement on Social Media Data with AI: Using NLP methods to measure Airbnb brand perception on social media