DeepSeek V3: A New Contender in AI-Powered Data Science
How DeepSeek’s budget-friendly AI model stacks up against ChatGPT, Claude, and Gemini in SQL, EDA, and machine learning
Nvidia’s stock price slumped over 15% on Monday, Jan 27th, after the Chinese startup DeepSeek released its new AI model. The model’s performance is on par with ChatGPT, Llama, and Claude, but at a fraction of the cost. According to Wired, OpenAI spent more than $100 million to train GPT-4, while DeepSeek’s V3 model was trained for just $5.6 million. This cost efficiency is also reflected in the API prices – per 1M input tokens, the deepseek-chat model (V3) costs $0.14 and the deepseek-reasoner model (R1) costs only $0.55 (DeepSeek API Pricing). Meanwhile, the gpt-4o API costs $2.50 / 1M input tokens, and the o1 API costs $15.00 / 1M input tokens (OpenAI API Pricing).
Always intrigued by emerging LLMs and their application in data science, I decided to put DeepSeek to the test. My goal was to see how well its chatbot (V3) model could assist or even replace data scientists in their daily tasks. I used the same criteria from my previous article series, where I evaluated the performance of ChatGPT 4o vs. Claude 3.5 Sonnet vs. Gemini Advanced on SQL queries, Exploratory Data Analysis (EDA), and Machine Learning (ML).
I. First Impressions
Here are some quick observations from my initial exploration of DeepSeek’s web chatbot UI:
- Interface: The chatbot UI looks very similar to ChatGPT’s, with past chats listed on the left and the current conversation in the main panel.
- Chat Labels: ChatGPT usually labels your past chats with a brief summary. By default, however, DeepSeek labels old chats with the first several words of your prompt. For new chats in the same web session, it simply shows ‘New chat’, which can be confusing when there are several. Of course, you can rename any chat for clarity.
- Model Options: In the message input box, you can opt to use the reasoning model (R1) or enable the web search functionality.
- Formatting: The chatbot formats keywords and code snippets neatly, making it easy to read.

II. SQL
1. Problem Solving (3/3)
I started by testing DeepSeek’s problem-solving skills with three challenging LeetCode SQL questions (262, 185, and 1341) that have low acceptance rates. These questions come with clear descriptions and input/output table structures, and are similar to interview questions. DeepSeek aced all three, using aggregation, filters, window functions, etc. It also offered step-by-step breakdowns and clear explanations.
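For readers who want a feel for the pattern these problems test, here is a minimal sketch (my own, not DeepSeek’s output) of the window-function approach that problem 185 (Department Top Three Salaries) calls for, wrapped in Python with an in-memory SQLite database so it runs end to end:

```python
import sqlite3

# A sketch of the window-function pattern behind LeetCode 185 (top three
# unique salaries per department), run against an in-memory SQLite database
# so the example is self-contained. Table and column names follow the
# LeetCode problem statement; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (id INT, name TEXT, salary INT, departmentId INT);
CREATE TABLE Department (id INT, name TEXT);
INSERT INTO Employee VALUES
  (1, 'Joe', 85000, 1), (2, 'Henry', 80000, 2), (3, 'Sam', 60000, 2),
  (4, 'Max', 90000, 1), (5, 'Janet', 69000, 1), (6, 'Randy', 85000, 1),
  (7, 'Will', 70000, 1);
INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
""")

query = """
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
FROM (
    SELECT *,
           DENSE_RANK() OVER (
               PARTITION BY departmentId ORDER BY salary DESC
           ) AS salary_rank
    FROM Employee
) AS e
JOIN Department AS d ON e.departmentId = d.id
WHERE e.salary_rank <= 3;
"""
for row in conn.execute(query):
    print(row)
```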



2. Business Logic (3.5/4)
Next, I uploaded four synthetic datasets to simulate real-world scenarios where table descriptions are often incomplete and you have to infer information from what the data looks like.
Though the total size of the four CSV files was only ~300KB, I got the error message “DeepSeek can only read 40% of all files. Try replacing the attached files with smaller excerpts.” So I cut the total size down to ~30KB by truncating each file to its top 100 rows – the upload error went away, but this time it said “Oops! DeepSeek is experiencing high traffic at the moment. Please check back in a little while.” Eventually, I resorted to uploading screenshots of the top rows of each dataset, which worked.

The datasets include:
- users: User-level data with demographic information.
- products: Product-level data.
- orders: Order-level data with payment information.
- ordered_products: A table linking orders and products.
I asked DeepSeek to write queries for metrics like total order amount by month from US users, monthly new user counts, the top 5 best-selling product categories, and monthly user retention rate – common metrics you would track at an e-commerce company. It generated correct queries for the first three but made an error in the retention rate question (the question other AI tools also struggled with the most). However, after I pointed out the issue, it was able to fix the query.
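Since monthly retention was the metric that tripped up DeepSeek (and the other tools) the most, here is a small pandas sketch of one way to define and compute it – the share of users active in month m who order again in month m+1. Both the retention definition and the column names (user_id, order_date) are my assumptions for illustration and may not match the synthetic schema exactly:

```python
import pandas as pd

# Monthly user retention: share of users who order in month m and order
# again in month m+1. The tiny orders table below is made-up sample data.
orders = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-01-20", "2024-03-02",
        "2024-02-14", "2024-03-01", "2024-03-15",
    ]),
})

orders["order_month"] = orders["order_date"].dt.to_period("M")
active = orders[["user_id", "order_month"]].drop_duplicates()

# Self-join trick: shift next-month activity back by one month so it lines
# up with month m, then keep users present in both frames.
nxt = active.copy()
nxt["order_month"] = nxt["order_month"] - 1
retained = active.merge(nxt, on=["user_id", "order_month"])

retention = (
    retained.groupby("order_month")["user_id"].nunique()
    / active.groupby("order_month")["user_id"].nunique()
).rename("retention_rate")
print(retention)  # the latest month has no "next month" yet, so it shows NaN
```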


3. Query Optimization (2.5/3)
Finally, I tested DeepSeek’s ability to optimize suboptimal SQL queries, using inefficient code examples from my SQL optimization article. It improved the queries by selecting only the necessary columns, moving aggregation steps earlier, avoiding redundant de-duplication operations, etc. What I liked most was that it not only suggested improvements but also provided detailed explanations of everything that could be optimized, including database-specific tips.
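To illustrate the flavor of these rewrites (my own simplified sketch, not DeepSeek’s output), the snippet below contrasts a query that joins full tables before aggregating with one that aggregates first and carries only the columns the result needs, using a pared-down version of the e-commerce schema:

```python
import sqlite3

# Before/after of one optimization pattern: aggregate early and select only
# needed columns instead of joining full tables first. The schema is a
# simplified stand-in for the synthetic e-commerce tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, user_id INT, order_amount REAL);
CREATE TABLE users  (user_id INT, country TEXT, signup_date TEXT);
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 15.0);
INSERT INTO users  VALUES (1, 'US', '2024-01-01'), (2, 'CA', '2024-02-01');
""")

# Before: join the full tables, then aggregate everything at the end.
slow = """
SELECT u.user_id, SUM(o.order_amount) AS total_amount
FROM users u
JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id;
"""

# After: aggregate orders first so the join sees one row per user, and only
# carry the columns the final result actually needs.
fast = """
SELECT u.user_id, o.total_amount
FROM (
    SELECT user_id, SUM(order_amount) AS total_amount
    FROM orders
    GROUP BY user_id
) o
JOIN users u ON u.user_id = o.user_id;
"""

assert sorted(conn.execute(slow).fetchall()) == sorted(conn.execute(fast).fetchall())
print(sorted(conn.execute(fast).fetchall()))
```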




I only encountered an issue with the last question, where I asked it to further optimize the query by adjusting the window function – it generated a query that produced duplicate rows. It quickly corrected the issue after I pointed it out.



SQL Performance Summary
Overall, DeepSeek performed very well in the SQL section, providing clear explanations and suggestions for SQL queries. It made only two small mistakes and fixed both quickly after my prompts. However, its file upload limitations could be an inconvenience for users.

III. Exploratory Data Analysis (EDA)
Now let’s switch gears to Exploratory Data Analysis. Due to DeepSeek’s file upload constraints, I only managed to upload a tiny dataset (2KB) of my Medium article performance. If you are interested, you can find a detailed analysis and review of my Medium journey in my past article.
Here is my prompt:
I have been writing articles on Medium and collected this dataset of my articles' performance. You are a data science professional. Your objective today is to help me conduct a thorough exploratory data analysis (EDA) of this dataset with necessary steps, such as data cleaning, analysis and visualizations, clear insights, and actionable recommendations.
Your EDA will be used to better understand the medium earning and inform future writing strategies.
Below are the rubrics I used to evaluate the EDA capability of AI tools.

1. Completeness (4/5)
DeepSeek’s EDA response was very organized and covered most of the critical components of EDA.
Data inspection: You could click on the uploaded dataset to get a preview. However, the preview was text-based, making it hard to digest, and DeepSeek did not provide any text description of the dataset. Therefore, I consider this step incomplete.

Data cleaning: DeepSeek started its report with data cleaning. It checked for missing values and data types, and removed unnecessary columns. Though it did not display the results, it provided summaries and instructions at each step.
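For concreteness, the cleaning steps it described map roughly to the sketch below, using a tiny synthetic stand-in for my Medium stats (the column names are illustrative, not the exact schema of the uploaded file):

```python
import pandas as pd

# A minimal cleaning sketch: inspect missing values and dtypes, fix types,
# and drop a column with no analytical value. All values are synthetic.
df = pd.DataFrame({
    "title":          ["Post A", "Post B", "Post C"],
    "published_date": ["2024-01-05", "2024-02-10", None],
    "views":          [1200, 800, 450],
    "reads":          [300, 210, None],
    "earnings":       [35.2, 18.7, 5.4],
    "article_url":    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
})

print(df.isna().sum())   # missing values per column
print(df.dtypes)         # data type of each column

df["published_date"] = pd.to_datetime(df["published_date"], errors="coerce")
df["reads"] = df["reads"].fillna(0).astype(int)
df = df.drop(columns=["article_url"])  # keeps only columns useful for analysis
```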

Univariate analysis: DeepSeek examined the distribution of earnings and other columns, providing Python code to run the analysis and generate visualizations. It did not plot the charts in the UI, so I ran them manually in my Jupyter Notebook to validate.
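Here is the kind of univariate check I reran locally – again with synthetic numbers standing in for the real earnings column:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Distribution of per-article earnings. The values are synthetic stand-ins
# for the real dataset, purely to make the example runnable.
earnings = pd.Series([35.2, 18.7, 5.4, 2.1, 60.3, 12.8, 0.9, 8.6])

fig, ax = plt.subplots(figsize=(6, 4))
earnings.plot(kind="hist", bins=8, ax=ax, edgecolor="black")
ax.set_xlabel("Earnings per article ($)")
ax.set_ylabel("Number of articles")
ax.set_title("Distribution of article earnings (synthetic example)")
plt.tight_layout()
plt.show()

print(earnings.describe())  # summary statistics alongside the plot
```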


Bivariate and multivariate analysis: DeepSeek explored the relationship between earnings and many other variables to understand the drivers of Medium earnings.
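The bivariate step boils down to code along these lines, looking at how earnings move with views and how they differ for boosted articles (synthetic values and illustrative column names, not DeepSeek’s exact output):

```python
import matplotlib.pyplot as plt
import pandas as pd

# How earnings relate to views, and how they differ by boost status.
# All values and column names are made-up stand-ins for the real dataset.
df = pd.DataFrame({
    "views":    [1200,  800,  450, 3000,  150,  950],
    "earnings": [35.2, 18.7,  5.4, 96.0,  0.9, 22.5],
    "boosted":  [False, False, False, True, False, True],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["views"], df["earnings"])
ax.set_xlabel("Views")
ax.set_ylabel("Earnings ($)")
ax.set_title("Earnings vs. views (synthetic example)")
plt.tight_layout()
plt.show()

print(df.groupby("boosted")["earnings"].mean())  # average earnings by boost status
```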


Insights and recommendations: DeepSeek also provided actionable insights based on its analysis.

2. Accuracy (3/4)
I reviewed the Python script DeepSeek generated and ran it manually. While most of the code worked well, the correlation matrix section threw an error due to non-numeric columns being included. I reported the error message back, and it corrected the issue by adding df.select_dtypes(include=[np.number]) to filter on numeric columns only.
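For reference, the shape of the failure and the fix looked roughly like this (a stand-in DataFrame, not DeepSeek’s full script):

```python
import numpy as np
import pandas as pd

# On recent pandas versions, DataFrame.corr() raises when non-numeric
# columns are present; filtering with select_dtypes keeps numeric ones only.
# The tiny DataFrame below is a synthetic stand-in for the real dataset.
df = pd.DataFrame({
    "title":    ["Post A", "Post B", "Post C"],
    "views":    [1200, 800, 450],
    "reads":    [300, 210, 90],
    "earnings": [35.2, 18.7, 5.4],
})

numeric_df = df.select_dtypes(include=[np.number])  # the added filter
print(numeric_df.corr())
```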
This minor error resulted in a one-point deduction.

3. Visualization (2/4)
DeepSeek did not display the visualizations in its UI, only the Python code. While the code generated accurate visualizations (except for the correlation matrix error above), the overall experience was less user-friendly compared to other tools like ChatGPT. Therefore, I deducted two points for this limitation.
4. Insightfulness (4/4)
DeepSeek provided valuable insights and actionable recommendations based on its analysis. It covered content strategy, publication selection, the power of “Boost”, etc.

5. Reproducibility and Documentation (3/3)
DeepSeek structured its EDA report logically, from data cleaning to analysis and insights. The report is also well formatted, with bullet points, code blocks, and highlighted keywords.
EDA Performance Summary
DeepSeek delivered a logically structured EDA report with functional code and clear insights. However, its inability to display visualizations in the UI was a notable drawback – users have to take the extra step of running the code locally and adjusting the charts manually.

IV. Machine Learning (ML)
I used the same dataset to evaluate how DeepSeek could assist in Machine Learning projects. Here are my rubrics.

1. Feature Engineering (3/3)
I first asked it to conduct feature engineering with the below prompt:
I have been writing articles on Medium and collected this dataset of my articles' performance. You are a data scientist professional. I would like you to help me build a machine learning model to forecast article earnings and understand how to improve earnings.
Let's do the task step by step. First, please focus on feature engineering.
Can you suggest some feature engineering techniques that could help improve the performance of my model?
Please consider transformations, interactions between features, and any domain-specific features that might be relevant.
Provide a brief explanation for each suggested feature or transformation.
DeepSeek suggested 10 feature engineering techniques. Most were pretty reasonable – for example, applying a log transformation to right-skewed variables, calculating engagement-per-view ratios, and adding temporal features.
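Three of those suggestions translate into code along these lines (my own sketch with illustrative column names, not DeepSeek’s exact output):

```python
import numpy as np
import pandas as pd

# Sketch of three suggested techniques: log-transforming the skewed target,
# engagement-per-view ratios, and temporal features. The columns mirror the
# kind of fields in my Medium dataset, not its exact schema.
df = pd.DataFrame({
    "published_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-18"]),
    "views":    [1200, 800, 450],
    "reads":    [300, 210, 90],
    "claps":    [150, 60, 20],
    "earnings": [35.2, 18.7, 5.4],
})

# 1) Log transform for the right-skewed earnings target.
df["log_earnings"] = np.log1p(df["earnings"])

# 2) Engagement ratios per view.
df["read_ratio"] = df["reads"] / df["views"]
df["claps_per_view"] = df["claps"] / df["views"]

# 3) Temporal features from the publication date.
df["publish_month"] = df["published_date"].dt.month
df["publish_dayofweek"] = df["published_date"].dt.dayofweek

print(df.head())
```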



2. Model Selection (3/3)
Next, I asked DeepSeek to recommend the most suitable model: “Can you recommend the most suitable machine learning models for this task? For each recommended model, provide a brief explanation of why it is appropriate and mention any important considerations for using it effectively”. DeepSeek listed eight model options, from Linear Regression and its variations, to Random Forest and other tree-based models, to Neural Networks. It provided clear pros and cons for each model, ending with a summary and actionable next steps.
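A natural next step after such a shortlist is a quick cross-validated comparison of a couple of the recommended model families. The sketch below does that on synthetic data, purely to illustrate the workflow:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Sanity-check two recommended model families on the same data before
# committing to one. Synthetic features stand in for the engineered dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.5, size=200)

for name, model in [("Ridge", Ridge(alpha=1.0)),
                    ("RandomForest", RandomForestRegressor(n_estimators=200, random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>12}: mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```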



3. Model Training and Evaluation (3.5/4)
Lastly, let’s look at its capability in model training and evaluation. My prompt was:
Can you provide the code to train a ridge regression model? Please ensure that it includes steps like splitting the data into training and testing sets and performing cross-validation. Please also suggest the appropriate evaluation metrics and potential hyperparameters tunning opportunities.
DeepSeek’s code had a clear structure and comments. It ran well and output the regression coefficients. It also offered a reasonable strategy for picking the right evaluation metrics and for hyperparameter tuning. When I followed up with a question on how to interpret the coefficients of a Ridge Regression model, it explained the interpretation methodology and the challenges posed by multicollinearity. However, it only provided the basic code without incorporating any of its feature engineering ideas from earlier in the same thread. When I asked it to add those features in, it kept erroring out with the message “The server is busy. Please try again later.” I finally got the output after four retries. I had noticed this in earlier threads as well – DeepSeek’s server did not seem very reliable and errored out often, especially in longer chats. Therefore, I deducted 0.5 points for the server reliability issue.
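For readers who want a concrete picture of that workflow – train/test split, cross-validated tuning of the regularization strength, and standard regression metrics – here is a self-contained sketch on synthetic data (not DeepSeek’s actual script):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Train/evaluate/tune structure for a ridge regression model. Feature names
# echo the engineered features above; values are synthetic stand-ins.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)),
                 columns=["log_views", "read_ratio", "claps_per_view", "publish_month"])
y = 2.5 * X["log_views"] + 1.2 * X["read_ratio"] + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Cross-validated search over the regularization strength alpha.
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="r2")
grid.fit(X_train, y_train)

best = grid.best_estimator_
pred = best.predict(X_test)
print("Best alpha:", grid.best_params_["alpha"])
print("Test MAE:", mean_absolute_error(y_test, pred))
print("Test R^2:", r2_score(y_test, pred))
print("Coefficients:", dict(zip(X.columns, best.coef_)))
```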



ML Performance Summary
For Machine Learning use cases, similar to the other AI tools, DeepSeek excelled at suggesting feature engineering ideas, brainstorming models, and writing code. However, it required human expertise to provide guidance, ask follow-up questions, and make final calls.

Summary and Final Thoughts
When it comes to Data Science projects, DeepSeek V3’s performance is very much on par with ChatGPT-4o, Claude 3.5 Sonnet, and Gemini Advanced. This is especially impressive given its much lower training costs.

- DeepSeek excels at coding but misses the key functionality of executing Python code and displaying visualizations directly in the UI. This is very similar to my observation of Claude 3.5 Sonnet back in August last year, when it was also missing an interactive visualization function. However, Claude has since added its analysis tool (though via JavaScript and React instead of Python), overcoming that drawback. DeepSeek might follow the trend and add such a feature as well.
- Its server currently seems less reliable than the other tools – it feels like the early version of ChatGPT when it first launched. The file upload limitations could also pose a challenge to using the chatbot meaningfully in data science workflows.
- However, its free chatbot access and more affordable API costs give it a significant competitive edge, particularly for users in China and for small businesses worldwide. It could democratize access to advanced AI tools, enabling smaller companies and individual developers to leverage powerful models at a much lower cost.
- DeepSeek’s rise will surely incentivize more AI innovation globally, both from AI giants like OpenAI and Anthropic and from smaller AI startups. Super excited to see how this space will evolve in the coming year!