Reading Notes 2024 Jul - Aug
My Medium Articles!
In the past two months, I continued writing data science and AI content on Medium. I am super excited to have more than 1k followers now! Below are the articles I posted recently. You can also find a copy on my blog.
- Building a Standout Data Science Portfolio: A Comprehensive Guide: My tips on how to set up a data science portfolio, its content strategy, and what makes a good portfolio.
- Evaluating ChatGPT’s Data Analysis Improvements: Interactive Tables and Charts: My evaluation of ChatGPT’s new interactive tables and charts feature, along with my predictions for ChatGPT’s future development.
- Navigating Data Science: B2C vs. B2B Analytics: Differences between data science and analytics at B2C and B2B businesses, based on my industry experience.
- ChatGPT vs. Claude vs. Gemini for Data Analysis (Part 1): An evaluation of which AI tool writes the best SQL queries, judged on accuracy, efficiency, formatting, and explanation.
- Build a RAG-Based Chatbot to Retrieve Visualizations in 3 Steps: A step-by-step guide to creating a visualization discovery chatbot with OpenAI API, FAISS, and Streamlit.
- ChatGPT vs. Claude vs. Gemini for Data Analysis (Part 2): Who’s the Best at EDA?: A comparison of how ChatGPT, Claude, and Gemini tackle Exploratory Data Analysis.
- ChatGPT vs. Claude vs. Gemini for Data Analysis (Part 3): Best AI Assistant for Machine Learning: How AI can accelerate your ML projects, from feature engineering to model training.
Reading List in Past Two Months
Now, let’s talk about the great articles I came across in July and August:
Data Science & Analytics
- Rethinking How We Evaluate The New York Times Subscription Performance: An exploration into The New York Times Growth Data team’s process of designing and building a new subscription reporting model.
- Forget Statistical Tests: A/B Testing Is All About Simulations: How to understand A/B testing intuitively with simulations.
- My First Billion (of Rows) in DuckDB: An experiment with DuckDB that showcases its strengths.
- The Ultimate Guide to Finding Outliers in Your Time-Series Data (Part 1): This article explores both visual and statistical methods to identify outliers effectively in time-series data.
- The Ultimate Guide to Finding Outliers in Your Time-Series Data (Part 2): Builds on the previous article to cover machine learning methods for outlier detection.
- 9 Key Differences Between B2B and B2C Marketing: A great summary of marketing in B2B vs. B2C, informing different data strategies.
- Delivering Faster Analytics at Pinterest: Pinterest shares their experience of launching their Analytics app on StarRocks.
- Friendly Introduction to Deep Learning Architectures: A short but easy-to-understand summary of CNN, RNN, GAN, Transformer, and Encoder-Decoder architectures.
- Predictive Marketing Mix Modeling with GLOP: The Perfect Cocktail Shaker: How to use GLOP (Google Linear Optimization Package) to optimize Return on Ad Spend (ROAS).
- Why Polars Destroy Pandas in All Possible Ways for Data Scientists?: Compares Polars and Pandas and explains why Polars performs better.
- Improve Your Next Experiment by Learning Better Proxy Metrics From Past Experiments: Netflix’s new method for learning proxy metrics from past experiments that establish causal relationships with long-term outcomes.
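To make the "A/B testing is all about simulations" idea a bit more concrete, here is a minimal sketch of one common simulation approach, a permutation test. It is my own illustration, not code from the article: the function name `simulated_p_value` and the conversion data in the usage example are hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a lift at least as large as the observed one.

```python
import random


def simulated_p_value(control, treatment, n_sims=10_000, seed=0):
    """Permutation test on 0/1 conversion outcomes (hypothetical helper).

    Returns (observed lift, two-sided simulated p-value).
    """
    rng = random.Random(seed)
    n_t = len(treatment)
    n_c = len(control)
    observed = sum(treatment) / n_t - sum(control) / n_c

    pooled = list(control) + list(treatment)
    extreme = 0
    for _ in range(n_sims):
        # Shuffle labels: under "no real effect", group membership is arbitrary.
        rng.shuffle(pooled)
        sim_diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c
        # Small tolerance so floating-point noise doesn't miscount ties.
        if abs(sim_diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_sims


# Hypothetical data: 10% vs. 13% conversion with 1,000 users per group.
control = [1] * 100 + [0] * 900
treatment = [1] * 130 + [0] * 870
lift, p = simulated_p_value(control, treatment)
print(f"lift={lift:.3f}, simulated p-value={p:.4f}")
```

The p-value here is simply "the share of shuffled worlds that look at least as extreme as ours," which is the intuition the article argues is easier to reason about than a formula-based test.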
Data Career
- A Product Manager’s Guide to Roadmap Prioritization for Data Analytics Team: How to prioritize data analytics work using frameworks like RICE Score Ranking and Stack Ranking.
- The 4 Boring Ways I Doubled My Company’s Revenue In Less Than 30 Days: Simple, data-driven revenue growth strategies.
- Leading by Doing: Lessons Learned as a Data Science Manager and Why I’m Opting for a Return to an Individual Contributor Role: The author’s experience as an IC vs. a manager in data science, and why they decided to return to an IC role.
- How Do I Become Chief Analytics Officer?: What the career path looks like in data science and analytics, and what makes a good Chief Analytics Officer.
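For readers new to the RICE Score Ranking mentioned in the roadmap prioritization article, here is a minimal sketch using the widely used RICE formula, score = Reach × Impact × Confidence / Effort. The request names, units (users per quarter, person-weeks), and numbers below are hypothetical, and the article's exact scales may differ.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical analytics request to be prioritized."""
    name: str
    reach: float       # e.g. stakeholders/users affected per quarter (assumed unit)
    impact: float      # e.g. 0.25 (minimal) to 3 (massive) (assumed scale)
    confidence: float  # 0.0-1.0
    effort: float      # e.g. person-weeks (assumed unit)

    @property
    def rice(self) -> float:
        # RICE score: higher reach/impact/confidence and lower effort rank higher.
        return self.reach * self.impact * self.confidence / self.effort


def rank_by_rice(requests):
    """Return requests sorted from highest to lowest RICE score."""
    return sorted(requests, key=lambda r: r.rice, reverse=True)


backlog = [
    Request("Exec KPI dashboard", reach=500, impact=2, confidence=0.8, effort=4),
    Request("Churn prediction model", reach=2000, impact=1, confidence=0.5, effort=10),
    Request("Ad-hoc pricing analysis", reach=50, impact=3, confidence=1.0, effort=0.5),
]
for r in rank_by_rice(backlog):
    print(f"{r.name}: {r.rice:.1f}")
```

Dividing by effort is the key design choice: a small, high-confidence ad-hoc analysis can outrank a large model build even when its reach is much smaller.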
AI and LLM
- 17 (Advanced) RAG Techniques to Turn Your LLM App Prototype into a Production-Ready Solution: Important techniques to improve the performance of a RAG pipeline.
- Building RAG application using Langchain 🦜, OpenAI 🤖, FAISS: Walks through an example of creating a PDF chatbot with RAG.
- From Data to Visualization with the OpenAI Assistants API and GPT-4o: How to create an AI assistant to conduct data visualization tasks.
- Claude-3: Data Analysts, Prepare for a New Challenger!: A review of Claude-3 models’ capabilities in tasks like writing SQL queries, image analysis, and web page summaries.
- How I Built ‘University Course Finder’ Using RAG: A real example of creating a RAG application with Verba.
- Multimodal RAG — Intuitively and Exhaustively Explained: A brief introduction to RAG that discusses various methods for building a RAG application from different types of data.
- How I Built My First RAG Pipeline: An overview of RAG framework with code examples.
- Let’s Build AI-Powered Case Discovery for Law Firms From Scratch: Using AI to retrieve legal case documents.
- 5 Proven Query Translation Techniques to Boost Your RAG Performance: 5 very practical tips to improve the RAG performance with clear examples.
- Start With Why AI: When AI solutions are appropriate.
- The Evolution of SQL: How to design a text-to-SQL solution with AI.
- Don’t Limit Your RAG Knowledgebase to Just Text: Using images as a data source for your RAG application.
- A busy person’s Intro to AI Agents: The history of AI agents and what AI agents can do.
- Is Prompt Engineering Dead?: An introduction to Anthropic’s prompt generator, a powerful tool designed to simplify creating effective prompts for AI models like Claude.