
This post summarises the Medium blogs I read in the past two months. I have been exploring various LLM applications, including GPTs and semantic search with the OpenAI embeddings API endpoint, so you will find multiple posts on this topic. I have also been reading a lot about data science careers and data analytics best practices. I hope you enjoy the reading as well.

Data Science

  1. Advanced Dimensionality Reduction Models Made Simple: Explanation of the Curse of Dimensionality, and different methods to reduce dimensions
  2. Semantic Search with Embeddings: Index Anything: Walks through the process of building a semantic search pipeline
  3. Exploring Semantic Search Using Embeddings and Vector Databases with Some Popular Use Cases: Explanation of semantic search and use case examples in e-commerce recommendation, content discovery, enterprise search, medical diagnosis, and legal research
  4. KeyPhrase Extraction Using Sentence Embeddings (Unsupervised Learning): Walks through the architecture for extracting key phrases from a document
  5. Data Quality Score: The Next Chapter of Data Quality at Airbnb: Airbnb talks about the design of their newly invented ‘data quality score’ and its use cases
  6. Why (and When) You Should Keep Non-significant Covariates in Your Regression Model: Different implications of non-significant covariates and when to keep them
  7. Understanding Instrumental Variables: What the instrumental variables method in causal inference is, and its limitations
  8. Discovering Causal Drivers at Scale: CausaLens has productized four approaches that greatly increase the scalability of causal discovery
  9. A Comprehensive Guide on Causal Inference in Retail: Example use cases of causal inference in retail, challenges and assumptions
  10. Methods for Modelling Customer Lifetime Value: The Good Stuff and the Gotchas: Basic and simple methods to model the customer lifetime value
  11. Top 13 Statistics Mistakes Made by Data Scientists, Are You Doing These?: 13 common statistics mistakes that every data scientist should be aware of

Data Career

  1. Would You Become a Data Strategist?: The data strategist is a new role on the rise in the data industry. This post covers its responsibilities and how it differs from other data roles
  2. How We Think about Data Pipelines is Changing: Why observability will be more important in data pipelining
  3. The Problem with Data Strategy: Key capabilities of a successful data strategy
  4. What It Takes to Be a Senior IC at Meta: Requirements of a senior IC in data science at Meta
  5. Deciphering the 2023 Data Job Market: Do the Numbers Suggest Oversaturation or Opportunity?: Discusses the data job market with real data
  6. The Art of Making Quality Data Analyses: Key components of a quality data analysis: timeliness, methodology, and digestibility. Introduces the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework
  7. Why Data Projects Fail to Deliver Real-Life Impact: 5 Critical Elements to Watch Out for as an Analytics Manager: Macro elements that affect the success of a data analysis
  8. Revisiting the Death of Data Science: How data scientists should embrace the power of GenAI to advance their careers

AI and LLM

  1. OpenAI Just Released GPTs: Create Your Own ChatGPT And Make Money From It (No Coding Required): A general introduction to GPTs and monetization with GPTs
  2. OpenAI Just Killed an Entire Market in 45 Minutes: What the recent OpenAI announcements mean and their impact on the AI industry
  3. Introducing Text and Code Embeddings: OpenAI's official blog post on text and code embeddings: how they work at a high level, the different models, and use cases
  4. OpenAI GPT-3 Text Embeddings - Really A New state-of-the-art in Dense Text Embeddings?: A detailed evaluation of the OpenAI text embeddings output (the initial model version)
  5. GPT-4 Chatbot Guide: Mastering Embeddings and Personalized Knowledge Bases: A great walkthrough of how to build a chatbot that answers questions based on a knowledge base
  6. Large Language Models and Vector Databases for News Recommendations: A specific use case of news recommendations with LLMs and embeddings
  7. Retrieval-Augmented Generation in Snowflake: NLP semantic search, embeddings, vector storage and vector similarity search in Snowflake: How to build a RAG pipeline in Snowflake with the OpenAI API