Reading Notes 2023 Jul - Aug
This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.
DS, Analytics
- From Analytics to Actual Application: the Case of Customer Lifetime Value: What customer lifetime value is and how it is applied
- Three Common Hypothesis Tests All Data Scientists Should Know: A review of three types of common hypothesis tests
- Why You Sometimes Need to Break the Rules in Data Viz: General data viz rules and when you might need to break them
- The Synthetic Data Field Guide: Common types of synthetic data
- Why would you want synthetic data?: Following the above post - why synthetic data is useful in the real world
- AI-Generated Synthetic Data: Why AI-generated synthetic data could be exciting for machine learning
- The Pros and Cons of Synthetic Data: Why and why not to use synthetic data, specifically AI-generated synthetic data
- Stop Using PowerPoint for Your ML Presentations and Try This Instead: An interesting new tool for presenting ML models and their performance
- Decoding the Customer Journey with Graph Node Embeddings: How to structure the customer journey data in a graph database and how to analyze it
- The Five Types of A/B Test Decisions: Common results of A/B tests and how you should react to each one of them
- Why You Need a Knowledge Graph, And How to Build It: When and how a graph database could be more useful than a relational database
- New SHAP Plots: Violin and Heatmap: New types of plots that the SHAP package introduces
- 5 Common Data Governance Pain Points for Analysts & Data Scientists: The lifecycle of data governance and what could go wrong
- How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume: Pinterest talks about how they incorporate realtime user actions into their transformer encoder recommendation system
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads: The AutoML infra at Pinterest
- Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System: How Netflix consolidates its ML models for better scalability
DS Career
- How Poor Stakeholder Management Ruins Analytics: Why stakeholder management is important and common problems
- Why Data Scientists and Engineers Quit Their Jobs: Common reasons data scientists and engineers quit
- A Spotify Data Scientist’s Guide to Turning Your Insights into Impactful Actions: Maths + Code + Business Acumen + Soft Skills = Data Scientist Formula
- Is Decision Science Quietly Becoming the New Data Science?: A look into the new data science job family - ‘decision scientist’
- 9 Techniques to Find Stories of Data: A summary of common techniques that can help interpret data
AI and LLM
- How to Use ChatGPT to Learn Data Science Faster, Even If You Are Already Advanced: An interesting post on using ChatGPT as a data science tutor
- Revolutionizing AI Interactions: Unpacking OpenAI’s Function Calling Capability in the Chat Completions API: The ‘Function Calling’ feature in the OpenAI API and its applications
- OpenAI Function Calling Examples: Examples of using ‘Function Calling’ to get standard responses
- How is AI Disrupting Data Governance?: Potential ways that AI will change data governance
- 5 ChatGPT plugins That Will put you ahead of 99% of Data Scientists: ChatGPT plugins that are especially helpful for data science
- Thunking vs Thinking: Whose Job Does AI Automate?: A discussion of what kinds of jobs AI automates
- Automating Data Analytics with ChatGPT: A walkthrough of an infrastructure for automating analytics with ChatGPT
- 5 Ways Generative AI Changes How Companies Approach Data (And How It Doesn’t): Potential benefits and challenges brought by Generative AI
- A Gentle Introduction to Open Source Large Language Models: Review of the recent open source LLM trends and the most powerful ones
- Bye-bye ChatGPT: AI Tools As Good As ChatGPT (But Few People Are Using Them): Interesting AI tools beyond ChatGPT, including Auto-GPT, Playground, Jasper, and Quillbot
- How to Use LLMs to Build Better Clustering Models: Using embeddings from LLMs to build clustering models, and a comparison with traditional clustering methods
- Why Trust and Safety in Enterprise AI Is (Relatively) Easy: Why traditional AI has the reliability advantage over generative AI
- Brex’s AI-Powered Engine for Identifying Customer Insights: Last but not least, our very own recent application of the OpenAI API to auto-categorize and analyze customer feedback at Brex!