Reading Notes 2024 Mar - Apr
This is a summary of the data science blog posts I read over the past two months. I hope you enjoy reading them as well.
Data Science
- Statistical Measures Every Analyst Must Know - Part 1: A refresher on basic statistical measures
- Customer Segmentation: More Than Clustering: A customer segmentation framework that covers considerations beyond building the model
- Measuring ROI of Creator Content Investments at Meta: The analytics team at Meta talks about how they estimate the impact of Creator Content Investments
- Causal Inference Python Implementation: A walkthrough of the CausalImpact Python package for estimating the causal effect of a given intervention
- Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform: How Netflix integrates a rule-based classifier with an ML service to remediate configuration errors
- Sequential A/B Testing Keeps the World Streaming Netflix Part 2: Counting Processes: What sequential A/B testing is and how Netflix uses it to monitor key platform metrics
- A Guide on Estimating Long-Term Effects in A/B Tests: Why long-term and short-term effects may differ, and methods for measuring long-term effects
- How Meta tests products with strong network effects: How Meta builds clusters to reduce interference from network effects in A/B tests
- Clustered Standard Errors in AB Tests: What to do when the unit of observation differs from the unit of randomization
- Top 10 Data Visualizations of 2023 Worth Looking at!: 10 great data visualization examples from 2023
- Explainability of the Features? No! Of the Hyperparameters.: An interesting method that uses SHAP to explain the effect of hyperparameter tuning
- Navigating the Netflix Data Deluge: The Imperative of Effective Data Management: Netflix talks about how to manage large-scale data effectively
- The Secret to Duolingo’s Exponential Growth: How Duolingo grew fast through experimentation and other strategies
- Why You Should Never Use Cross-Validation: When standard random cross-validation is not appropriate
- Three Rules of Statistical Analysis from Your Statistics Class to Unlearn: How to handle statistical assumptions and outliers appropriately in the real world
- A Quantitative Approach to Product Market Fit: Two common frameworks to measure Product Market Fit quantitatively
- I Analyzed 100 Dashboards. Here Are the Most Common Data Viz Errors I Saw.: Common data visualization mistakes to avoid
- 5 Emerging Trends in Data Visualization in 2024: How data visualization is changing as new techniques emerge
Data Career
- Why Data Scientists Get Taken For Granted: The struggle of having data scientists’ contributions go unmentioned at domain-wide meetings or in discussions among executives and other leaders
- Crafting a Data Science Portfolio That Will Actually Get You An Interview This 2024: Advice on how to build a good data science portfolio
- A Guide To Building a Data Department From Scratch: Guidance on how to build a data department at a startup
- How To Advance In Data Science: Advice on continuing to advance in a data science career
- Is Data Science dead?: What the future of data science might look like given the recent advances in Gen AI
- How to Become a Freelance Data Analyst in 2024: The author’s experience of becoming a freelance data analyst, along with great advice
- 5 Passive Income Ideas For Data Scientists: Ways to earn passive income as a data scientist
- 7 Subscriptions That Help Me As A Data Scientist: Subscriptions that might help your data science career
- Why Data Scientists and Engineers Quit Their Jobs: The root cause behind the high turnover rate among data professionals
- Five Key Trends in AI and Data Science for 2024: How AI may change the data science field
AI and LLM
- GPT-4’s Prompting Effectiveness For Python Dashboards: Comparing Dash, Panel & Streamlit: Building Python dashboards with various packages, including the actual GPT prompts and code
- Data Dirtiness Score: How to quantify data quality with an LLM
- Data Quality Error Detection powered by LLMs: A follow-up to the article above on identifying errors in tabular data sets with an LLM
- How we built Text-to-SQL at Pinterest: Pinterest’s great example of building a Text-to-SQL solution with an LLM
- How I use Gen AI as a Data Engineer: AI use cases in Data Engineering
- RAG 2.0, Finally Getting RAG Right!: What RAG 2.0 is and how it differs from RAG 1.0
- Airbnb Brandometer: Powering Brand Perception Measurement on Social Media Data with AI: How NLP methods are used to measure Airbnb brand perception on social media