Reading Notes 2024 May - Jun
My Medium Articles!
Before I jump into the summary of the articles I read in the past two months, I am excited to announce that I have become a writer on Medium myself!!! So, let me first list my own articles here. You can find all of them here. And don’t worry if you are not a Medium member: you can read all of them for free on my blog here :)
- Topic Summarization and Categorization with GPT: Use GPT-3.5 API for text analytics to categorize and summarize data science blog posts
- Boost Your Data Analysis with the New ChatGPT Desktop App: How to integrate the new ChatGPT desktop App into your daily DS workflow
- Evaluating ChatGPT in Data Science: Churn Prediction Analysis as an Example: Can ChatGPT assist or even replace a data scientist?
- My Five Key Learnings To Be a Better Data Scientist: Reflection on my six-year data science career
- 330 Weeks of Data Visualizations: My Journey and Key Takeaways: How consistent practice in data visualization enhanced my data science skills
- Mastering SQL Optimization: From Functional to Efficient Queries: Six Simple Yet Effective SQL Tips That Helped Me Reduce 50 Hours of Snowflake Query Time Every Day
Reading List in Past Two Months
Now, let’s talk about the great articles I came across in May and June:
Data Science
- Automated Causal Detection: Introduces the new automated causal detection system CR-2
- Difference-in-Difference 101: Basics of the causal inference method Difference-in-Difference (DiD): what it is, how to apply it, and its assumptions
- An Intuitive Explanation for Inverse Propensity Weighting in Causal Inference: A great walkthrough of why Inverse Propensity Weighting (IPW) is a useful causal inference technique
- Feature Engineering that Makes Business Sense: Common categories of feature engineering ideas that make sense to your stakeholders
- Rethinking Statistical Significance: How to correctly understand and interpret statistical significance, whether it is even a fair evaluation criterion, and how to improve it
- Commonly Used Statistical Tests in Data Science: A good summary of the five common statistical tests and when to use them
- Interpretable Outlier Detection: Frequent Patterns Outlier Factor (FPOF): Interpretability could be key to outlier detection algorithms. This article introduces a fast and intuitive one – Frequent Patterns Outlier Factor (FPOF)
- Counts Outlier Detector: Interpretable Outlier Detection: Another article from the same author, introducing a second interpretable outlier detection algorithm, Counts Outlier Detector, with detailed explanations and a discussion of its limitations
- Ten New Tableau Tricks I Learned at the 2024 Conference: 10 great, handy Tableau tricks, some of which I wasn’t aware of
- Behind the metric: how we developed “Time Spent Learning Well”: Duolingo talks about how they arrived at the Time Spent Learning Well (TSLW) metric after weighing other options, including total sessions and time spent learning, and how they move TSLW with A/B testing
- How We Developed Our Addictive and Delightful Widget: I am a big fan of the amusing Duolingo widget. In this post, they talk about how the idea started, what the development process looked like, and how they tested the widget
- Meaningful Metrics: How Data Sharpened the Focus of Product Teams: Another blog post from Duolingo, walking through the details of their Growth Model and how they used data techniques like a Markov model to find the “movable” metrics
- How to Effectively Forecast Time Series with Amazon’s New Time Series Forecasting Model: A walkthrough of Amazon’s new time series model Chronos with code examples and performance comparison
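Since the DiD and IPW articles above both come down to short formulas, here is a minimal Python sketch of each. I wrote it as an illustration, so it is not code taken from either article. It assumes the textbook 2x2 DiD setup (one treated group, one control group, one pre period and one post period) and, for IPW, that the propensity scores have already been estimated by some model. The function names and toy numbers are hypothetical.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """2x2 Difference-in-Difference: the treated group's change minus the control group's change.

    Relies on the parallel-trends assumption: without treatment, both groups
    would have changed by the same amount over the same period.
    """
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def ipw_ate(treated, outcome, propensity):
    """Inverse Propensity Weighting estimate of the average treatment effect (ATE).

    treated:    1 if the unit received treatment, else 0
    outcome:    observed outcome for each unit
    propensity: estimated P(treated = 1 | covariates) per unit, strictly between 0 and 1
    """
    n = len(outcome)
    rows = list(zip(treated, outcome, propensity))
    # Treated units are weighted by 1 / p and control units by 1 / (1 - p),
    # so each group stands in for the whole population.
    treated_mean = sum(t * y / p for t, y, p in rows) / n
    control_mean = sum((1 - t) * y / (1 - p) for t, y, p in rows) / n
    return treated_mean - control_mean


# Hypothetical toy data, only to show the calls.
did = did_estimate(treat_pre=[10, 12], treat_post=[18, 20],
                   control_pre=[9, 11], control_post=[12, 14])
ate = ipw_ate(treated=[1, 0, 1, 0], outcome=[4, 1, 6, 3],
              propensity=[0.8, 0.4, 0.5, 0.2])
print(f"DiD estimate: {did}")
print(f"IPW ATE estimate: {ate:.3f}")
```

In the DiD toy data, the treated group rises by 8 while the control group rises by 3, so the estimate attributes the extra 5 to the treatment. For IPW, if every unit had the same propensity equal to the share of treated units, the estimate would reduce to a plain difference in group means. The weighting only matters when treatment assignment depends on covariates. In practice the propensity scores usually come from a model such as logistic regression, and scores very close to 0 or 1 are often clipped or trimmed to keep the weights stable.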
Data Career
- Data Science Advice I Wish I Knew Sooner: Great advice on harnessing data science technical skills
- One Mindset Shift That Will Make You a Better Data Scientist: Questions to ask to better tie your data science projects to your organization’s priorities
- The Data Analyst Every CEO Wants: Top recommendations for becoming a data analyst who is invaluable to your business
- The Two Documents Every Data Scientist Must Write Before Taking Interviews: How to better prepare for the two unavoidable interview questions “Tell Me About Yourself” and “Tell Me About A Project You’ve Worked On”
- How to Get Promoted in Data Science: Six tips to increase your chances of getting promoted
- Data about Data from 1,000 Conversations with Data Teams: Interesting data points about data teams, including their relative size and the characteristics of the companies that have them
- The Art of Stress Management as a Data Scientist: Some great advice to manage stress at work, especially as a data scientist
- What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics: A good summary of the differences between working at a large established organization vs. a startup
- Should You Join FAANG or a Startup as a Data Scientist?: Another article discussing the differences between working at FAANG and a startup, and factors you should consider when making the decision
- How to Better Communicate as a Data Scientist: Four tips to enhance your communication at work, with data examples
- Is Data Science dead? The Rebuke: How the evolution of AI impacts the data science job market, and whether data scientists are future-proof
- 9 Simple Tips to Take You From “Busy” Data Scientist to Productive Data Scientist in 2024: How to use the Eisenhower Matrix to prioritize your tasks at work and other useful tips on time management
- How to Keep on Developing as a Data Scientist: 10+ pieces of good advice on developing your data science skills and advancing your career
AI and LLM
- Improvements to data analysis in ChatGPT: OpenAI introduces their newest advancements in the data analysis feature in ChatGPT, including interactive tables and charts! Can’t wait to try them out
- Introducing GPT-4o and more tools to ChatGPT free users: Also from OpenAI, the May release of the new GPT-4o model and new tools like the desktop app
- The (lesser known) rising application of LLMs: LLMs could be the perfect tool for structuring unstructured data, for example, into JSON
- From Data to Decisions: Leveraging Generative AI and Data Products: The new era of data products with GenAI
- How to Think About Gen AI Use Cases: What we have learned from traditional AI development, and what that implies for GenAI use cases
- The 5 Prompt Engineering Techniques AI Engineers Need to Know: Five very useful prompt engineering techniques: Few-Shot Prompting, Chain-of-Thought Prompting, Self-Consistency, Prompt Chaining, and Generated Knowledge Prompting
- How I Won Singapore’s GPT-4 Prompt Engineering Competition: A very comprehensive and advanced guide to excelling at prompt engineering
- Why does ChatGPT use “Delve” so much? Mystery Solved.: A theory of how bias introduced by human annotation tasks leads ChatGPT to use uncommon English words like ‘delve’
- Detect AI Text by Just Looking at it: Related to the article above, this one talks about words that AI uses often but real humans don’t, including ‘delve’