Reading Notes 2023 Nov - Dec
This post summarises the Medium blog posts I read over the past two months. I have been exploring various LLM applications, including GPTs and semantic search with the OpenAI embeddings API endpoint, so you will find multiple posts on that topic. I have also been reading a lot about data science careers and data analytics best practices. I hope you enjoy the reading as well.
Data Science
- Advanced Dimensionality Reduction Models Made Simple: An explanation of the curse of dimensionality and different methods to reduce dimensions
- Semantic Search with Embeddings: Index Anything: Walks through the process of building a semantic search pipeline
- Exploring Semantic Search Using Embeddings and Vector Databases with Some Popular Use Cases: Explanation of semantic search and use case examples in e-commerce recommendation, content discovery, enterprise search, medical diagnosis, and legal research
- KeyPhrase Extraction Using Sentence Embeddings(Unsupervised Learning): Walks through an architecture for extracting key phrases from a document
- Data Quality Score: The Next Chapter of Data Quality at Airbnb: Airbnb talks about the design of their newly invented ‘data quality score’ and its use cases
- Why (and When) You Should Keep Non-significant Covariates in Your Regression Model: The different implications of non-significant covariates and when to keep them
- Understanding Instrumental Variables: What the instrumental variables method in causal inference is, and its limitations
- Discovering Causal Drivers at Scale: CausaLens has productized four approaches that greatly increase the scalability of causal discovery
- A Comprehensive Guide on Causal Inference in Retail: Example use cases of causal inference in retail, challenges and assumptions
- Methods for Modelling Customer Lifetime Value: The Good Stuff and the Gotchas: Simple methods to model customer lifetime value
- Top 13 Statistics Mistakes Made by Data Scientists, Are You Doing These?: 13 common statistics mistakes that every data scientist should be aware of
Data Career
- Would You Become a Data Strategist?: The Data Strategist is a new role on the rise in the data industry. This post covers its responsibilities and how it differs from other data roles
- How We Think about Data Pipelines is Changing: Why observability will become more important in data pipelines
- The Problem with Data Strategy: Key capabilities of a successful data strategy
- What It Takes to Be a Senior IC at Meta: Requirements of a senior IC in data science at Meta
- Deciphering the 2023 Data Job Market: Do the Numbers Suggest Oversaturation or Opportunity?: Discusses the data job market using real data
- The Art of Making Quality Data Analyses: Key components of a quality data analysis: timeliness, methodology, and digestibility. Introduces the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework
- Why Data Projects Fail to Deliver Real-Life Impact: 5 Critical Elements to Watch Out for as an Analytics Manager: Macro elements that affect the success of a data analysis
- Revisiting the Death of Data Science: How data scientists should embrace the power of GenAI to advance their careers
AI and LLM
- OpenAI Just Released GPTs: Create Your Own ChatGPT And Make Money From It (No Coding Required): A general introduction to GPTs and monetization with GPTs
- OpenAI Just Killed an Entire Market in 45 Minutes: What the recent OpenAI announcements mean and their impact on the AI industry
- Introducing Text and Code Embeddings: OpenAI official blog on text and code embeddings, covering at a high level how they work, the different models, and use cases
- OpenAI GPT-3 Text Embeddings - Really A New state-of-the-art in Dense Text Embeddings?: A detailed evaluation of the OpenAI text embeddings output (the initial model version)
- GPT-4 Chatbot Guide: Mastering Embeddings and Personalized Knowledge Bases: A great walkthrough of how to build a chatbot that answers questions based on a knowledge base
- Large Language Models and Vector Databases for News Recommendations: A specific use case of news recommendations with LLMs and embeddings
- Retrieval-Augmented Generation in Snowflake: NLP semantic search, embeddings, vector storage and vector similarity search in Snowflake: How to build a RAG pipeline in Snowflake with the OpenAI API
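Several of the posts above rely on the same core step: rank documents by how similar their embeddings are to the embedding of a query. Below is a minimal sketch of that step in Python, using cosine similarity. The three-dimensional vectors and the document names are made up for illustration; in a real pipeline the vectors would come from an embedding model (for example the OpenAI embeddings endpoint) and would probably be stored in a vector database rather than a dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" (hypothetical values, not real model output).
index = {
    "causal-inference-retail": [0.9, 0.1, 0.0],
    "semantic-search-intro": [0.1, 0.9, 0.2],
    "rag-in-snowflake": [0.0, 0.8, 0.6],
}

# Toy embedding of a query about semantic search.
query = [0.0, 1.0, 0.3]

results = search(query, index)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the two embedding-related documents rank above the causal inference one, which is the behaviour a semantic search pipeline is after.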