Reading Notes 2023 Nov - Dec


This post summarises the Medium blogs I read in the past two months. I have been exploring various LLM applications, including GPTs and semantic search with the OpenAI embeddings API endpoint, so you will find multiple posts on this topic. I have also been reading a lot about data science careers and best practices in data analytics. I hope you enjoy the reading as well.

Data Science

Advanced Dimensionality Reduction Models Made Simple: Explanation of the Curse of Dimensionality, and different methods to reduce dimensions
Semantic Search with Embeddings: Index Anything: A walkthrough of building a semantic search pipeline
Exploring Semantic Search Using Embeddings and Vector Databases with Some Popular Use Cases: Explanation of semantic search and use case examples in e-commerce recommendation, content discovery, enterprise search, medical diagnosis, and legal research
KeyPhrase Extraction Using Sentence Embeddings (Unsupervised Learning): Walks through an architecture for extracting key phrases from a document
Data Quality Score: The Next Chapter of Data Quality at Airbnb: Airbnb talks about the design of their newly invented ‘data quality score’ and its use cases
Why (and When) You Should Keep Non-significant Covariates in Your Regression Model: The different implications of non-significant covariates and when to keep them
Understanding Instrumental Variables: What the instrumental variables method in causal inference is, and its limitations
Discovering Causal Drivers at Scale: CausaLens has productized four approaches that greatly increase the scalability of causal discovery
A Comprehensive Guide on Causal Inference in Retail: Example use cases of causal inference in retail, challenges and assumptions
Methods for Modelling Customer Lifetime Value: The Good Stuff and the Gotchas: Basic and simple methods to model the customer lifetime value
Top 13 Statistics Mistakes Made by Data Scientists, Are You Doing These?: 13 common statistics mistakes that every data scientist should be aware of

Data Career

Would You Become a Data Strategist?: The Data Strategist is a new role on the rise in the data industry. This post covers its responsibilities and how it differs from other data roles
How We Think about Data Pipelines is Changing: Why observability will become more important in data pipelines
The Problem with Data Strategy: Key capabilities of a successful data strategy
What It Takes to Be a Senior IC at Meta: Requirements of a senior IC in data science at Meta
Deciphering the 2023 Data Job Market: Do the Numbers Suggest Oversaturation or Opportunity?: Discusses the data job market using real data
The Art of Making Quality Data Analyses: Key components of a quality data analysis: timeliness, methodology, and digestibility. Introduces the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework
Why Data Projects Fail to Deliver Real-Life Impact: 5 Critical Elements to Watch Out for as an Analytics Manager: Macro elements that affect the success of a data analysis
Revisiting the Death of Data Science: How data scientists should embrace the power of GenAI to advance their careers

AI and LLM

OpenAI Just Released GPTs: Create Your Own ChatGPT And Make Money From It (No Coding Required): A general introduction to GPTs and monetization with GPTs
OpenAI Just Killed an Entire Market in 45 Minutes: What the recent OpenAI announcements mean and their impact on the AI industry
Introducing Text and Code Embeddings: OpenAI's official blog post on text and code embeddings, with a high-level view of how they work, the different models, and use cases
OpenAI GPT-3 Text Embeddings - Really A New state-of-the-art in Dense Text Embeddings?: A detailed evaluation of the output of OpenAI's text embeddings (the initial model version)
GPT-4 Chatbot Guide: Mastering Embeddings and Personalized Knowledge Bases: A great walkthrough of how to build a chatbot that answers questions based on a knowledge base
Large Language Models and Vector Databases for News Recommendations: A specific use case of news recommendations with LLMs and embeddings
Retrieval-Augmented Generation in Snowflake: NLP semantic search, embeddings, vector storage and vector similarity search in Snowflake: How to build a RAG pipeline in Snowflake with the OpenAI API


Yu Dong
