
This post summarises the Medium posts I read over the past two months. I hope you enjoy reading them as well.

DS, Analytics

  1. From Analytics to Actual Application: the Case of Customer Lifetime Value: What is customer lifetime value and its applications
  2. Three Common Hypothesis Tests All Data Scientists Should Know: A review of three types of common hypothesis tests
  3. Why You Sometimes Need to Break the Rules in Data Viz: General data viz rules and when you might need to break them
  4. The Synthetic Data Field Guide: Common types of synthetic data
  5. Why would you want synthetic data?: Following the above post, why synthetic data is useful in the real world
  6. AI-Generated Synthetic Data: Why AI-generated synthetic data could be exciting for machine learning
  7. The Pros and Cons of Synthetic Data: Why and why not to use synthetic data, specifically AI-generated synthetic data
  8. Stop Using PowerPoint for Your ML Presentations and Try This Instead: An interesting new tool for presenting ML models and their performance
  9. Decoding the Customer Journey with Graph Node Embeddings: How to structure the customer journey data in a graph database and how to analyze it
  10. The Five Types of A/B Test Decisions: Common results of A/B tests and how to react to each of them
  11. Why You Need a Knowledge Graph, And How to Build It: When and how a graph database could be more useful than a relational database
  12. New SHAP Plots: Violin and Heatmap: New plot types introduced by the SHAP package
  13. 5 Common Data Governance Pain Points for Analysts & Data Scientists: The lifecycle of data governance and what could go wrong
  14. How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume: Pinterest talks about how they incorporate realtime user actions into their transformer encoder recommendation system
  15. How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads: The AutoML infra at Pinterest
  16. Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System: How Netflix consolidates its ML models for better scalability

DS Career

  1. How Poor Stakeholder Management Ruins Analytics: Why stakeholder management is important, and the common problems that come with it
  2. Why Data Scientists and Engineers Quit Their Jobs: Common reasons why data scientists and engineers quit
  3. A Spotify Data Scientist’s Guide to Turning Your Insights into Impactful Actions: Maths + Code + Business Acumen + Soft Skills = Data Scientist Formula
  4. Is Decision Science Quietly Becoming the New Data Science?: A look into the new data science job family, the ‘decision scientist’
  5. 9 Techniques to Find Stories of Data: A summary of common techniques that can help interpret data

AI and LLM

  1. How to Use ChatGPT to Learn Data Science Faster, Even If You Are Already Advanced: An interesting post on using ChatGPT as a data science tutor
  2. Revolutionizing AI Interactions: Unpacking OpenAI’s Function Calling Capability in the Chat Completions API: The ‘Function Calling’ feature in the OpenAI API and its applications
  3. OpenAI Function Calling Examples: Examples of using ‘Function Calling’ to get standard responses
  4. How is AI Disrupting Data Governance?: Potential ways that AI will change data governance
  5. 5 ChatGPT plugins That Will put you ahead of 99% of Data Scientists: ChatGPT plugins that are specifically helpful with data science
  6. Thunking vs Thinking: Whose Job Does AI Automate?: A discussion of what kinds of jobs AI automates
  7. Automating Data Analytics with ChatGPT: A walkthrough of an infrastructure for automating analytics with ChatGPT
  8. 5 Ways Generative AI Changes How Companies Approach Data (And How It Doesn’t): Potential benefits and challenges brought by Generative AI
  9. A Gentle Introduction to Open Source Large Language Models: A review of recent open source LLM trends and the most powerful models
  10. Bye-bye ChatGPT: AI Tools As Good As ChatGPT (But Few People Are Using Them): Interesting AI tools beyond ChatGPT, including Auto-GPT, Playground, Jasper, and Quillbot
  11. How to Use LLMs to Build Better Clustering Models: Using embeddings from LLMs to build clustering models, with a comparison against traditional clustering methods
  12. Why Trust and Safety in Enterprise AI Is (Relatively) Easy: Why traditional AI has the reliability advantage over generative AI
  13. Brex’s AI-Powered Engine for Identifying Customer Insights: Last but not least, our very own recent application of the OpenAI API to auto-categorize and analyze customer feedback at Brex!