Reading Notes 2023 Jul - Aug


This post summarises the Medium blogs I read in the past two months. I hope you enjoy reading them as well.

DS, Analytics

From Analytics to Actual Application: the Case of Customer Lifetime Value: What customer lifetime value is and how it is applied
Three Common Hypothesis Tests All Data Scientists Should Know: A review of three types of common hypothesis tests
Why You Sometimes Need to Break the Rules in Data Viz: General data viz rules and when you might need to break them
The Synthetic Data Field Guide: Common types of synthetic data
Why would you want synthetic data?: A follow-up to the post above on why synthetic data is useful in the real world
AI-Generated Synthetic Data: Why AI-generated synthetic data could be exciting for machine learning
The Pros and Cons of Synthetic Data: Why and why not to use synthetic data, especially AI-generated synthetic data
Stop Using PowerPoint for Your ML Presentations and Try This Instead: An interesting new tool for presenting ML models and their performance
Decoding the Customer Journey with Graph Node Embeddings: How to structure the customer journey data in a graph database and how to analyze it
The Five Types of A/B Test Decisions: Common results of A/B tests and how you should react to each one of them
Why You Need a Knowledge Graph, And How to Build It: When and how a graph database could be more useful than a relational database
New SHAP Plots: Violin and Heatmap: New types of plots introduced in the SHAP package
5 Common Data Governance Pain Points for Analysts & Data Scientists: The lifecycle of data governance and what could go wrong
How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume: Pinterest explains how they incorporate realtime user actions into their transformer encoder recommendation system
How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads: The AutoML infra at Pinterest
Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System: How Netflix consolidates their ML models for better scalability

DS Career

How Poor Stakeholder Management Ruins Analytics: Why stakeholder management is important, and the common problems that arise
Why Data Scientists and Engineers Quit Their Jobs: Common reasons why data scientists and engineers quit
A Spotify Data Scientist’s Guide to Turning Your Insights into Impactful Actions: Maths + Code + Business Acumen + Soft Skills = Data Scientist Formula
Is Decision Science Quietly Becoming the New Data Science?: A look into a new data science job family, the ‘decision scientist’
9 Techniques to Find Stories of Data: A summary of common techniques that can help interpret data

AI and LLM

How to Use ChatGPT to Learn Data Science Faster, Even If You Are Already Advanced: An interesting post on using ChatGPT as a data science tutor
Revolutionizing AI Interactions: Unpacking OpenAI’s Function Calling Capability in the Chat Completions API: The ‘Function Calling’ feature in OpenAI API and its applications
OpenAI Function Calling Examples: Examples of using ‘Function Calling’ to get standard responses
How is AI Disrupting Data Governance?: Potential ways that AI will change data governance
5 ChatGPT plugins That Will put you ahead of 99% of Data Scientists: ChatGPT plugins that are specifically helpful with data science
Thunking vs Thinking: Whose Job Does AI Automate?: A discussion of what kinds of jobs AI automates
Automating Data Analytics with ChatGPT: A walkthrough of an infrastructure for automating analytics with ChatGPT
5 Ways Generative AI Changes How Companies Approach Data (And How It Doesn’t): Potential benefits and challenges brought by Generative AI
A Gentle Introduction to Open Source Large Language Models: A review of recent open-source LLM trends and the most powerful models
Bye-bye ChatGPT: AI Tools As Good As ChatGPT (But Few People Are Using Them): Interesting AI tools beyond ChatGPT, including Auto-GPT, Playground, Jasper, and Quillbot
How to Use LLMs to Build Better Clustering Models: Using embeddings from LLMs to build clustering models, with a comparison to traditional clustering methods
Why Trust and Safety in Enterprise AI Is (Relatively) Easy: Why traditional AI has the reliability advantage over generative AI
Brex’s AI-Powered Engine for Identifying Customer Insights: Last but not least, our very own recent applications of OpenAI API to auto-categorize and analyze customer feedback at Brex!


Yu Dong
