About DeepSeek v3

DeepSeek V3 is a large language model (LLM) developed by DeepSeek AI.

It was trained by a company, but the model is shared with all of us under the MIT License.

There are various theories out there about how it got its knowledge…

…but what matters is that we can self-host DeepSeek v3 and try it out.

And that way, you are not forced to send your queries to any third party.

Key Strengths and Focus Areas:

  • Coding Prowess: DeepSeek V3 is particularly noted for its strong performance in code generation, understanding, and debugging. It’s likely trained on a massive dataset of code in various programming languages. Benchmarks and comparisons often highlight its competitive edge in coding tasks.
  • Mathematical Reasoning: DeepSeek V3 is also designed to handle mathematical problems and logical reasoning challenges. This suggests that its training data included a substantial amount of mathematical text and code.
  • Multilingual Capabilities: While its primary focus might be on English and code-related languages, it’s likely that DeepSeek V3 has some level of multilingual ability, although the extent of this is not always explicitly stated.
  • Performance: DeepSeek AI generally emphasizes the efficiency and performance of their models. DeepSeek V3 is likely optimized for speed and memory usage, although the specifics are usually kept confidential.

What We Can Infer:

  • Transformer Architecture: Like most modern LLMs, DeepSeek V3 is almost certainly based on the transformer architecture. This architecture is known for its ability to handle long-range dependencies in text and is the foundation of models like GPT, BERT, and others.
  • Large Dataset Training: LLMs require massive amounts of data for training. DeepSeek V3 was likely trained on a very large and diverse dataset, including text from the internet, books, code repositories, and potentially other sources.
  • Scaling: The performance of LLMs often improves with scale (more parameters, more data). DeepSeek V3 is likely a very large model, although the exact number of parameters might not be publicly disclosed.

How it Compares (General Trends):

It’s difficult to make direct comparisons without access to detailed benchmarks and specifications. However, DeepSeek V3 is positioned as a competitor to other leading LLMs. Generally, newer models tend to push the boundaries in:

  • Performance: Improved accuracy and efficiency on various NLP tasks.
  • Specialized Skills: Focusing on particular domains like coding or math.
  • Efficiency: Reducing the computational resources required for inference.

DeepSeek v3 Source

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. In other words, only about 5.5% of the parameters take part in any single forward pass, which keeps per-token compute far below what a dense 671B model would need.

Running DeepSeek v3 Locally

I'm going to show you how to use DeepSeek v3 locally (a minimal setup sketch follows the list), thanks to:

  1. Containers: Docker or Podman!
  2. Ollama
  3. Open WebUI (formerly Ollama WebUI)
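
If you don't already have these containers running, here is a minimal sketch of the setup, assuming Docker, the official ollama/ollama and ghcr.io/open-webui/open-webui:main images, and their default ports (swap docker for podman if you prefer; the container names and published ports are just my choices):

# Start the Ollama container (add --gpus=all if you have an NVIDIA GPU set up)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Start Open WebUI and let it reach the Ollama API running on the host
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

Open WebUI should then be reachable in your browser at http://localhost:3000.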

Thanks to Ollama, we can do:

# Open a shell inside the running Ollama container
docker exec -it ollama /bin/bash

# Pull and run the model: https://ollama.com/library/deepseek-v3
ollama run deepseek-v3
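
Once the model is pulled, you can also talk to it through Ollama's REST API instead of the interactive prompt. A minimal sketch, assuming the container publishes the default port 11434 as in the setup above (the prompt text is just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-v3",
  "prompt": "Why is self-hosting an LLM interesting?",
  "stream": false
}'

This returns a single JSON response; Open WebUI talks to this same API under the hood, so anything you can do here you can also do from the web interface.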