Videos
- How was DeepSeek-R1 built; For dummies
- Notes on Deepseek r1: Just how good it is compared to OpenAI o1
Over the weekend I wanted to learn how DeepSeek-R1 was trained and what was so revolutionary about it. So I ended up reading the paper and wrote down my thoughts. The linked article is (hopefully) written in a way that makes it easy for everyone to understand -- no PhD required!
Here's a "quick" summary:
1/ DeepSeek-R1-Zero is trained with pure reinforcement learning (RL), without using labeled data. It's the first time someone tried this and succeeded -- that we know of, at least; the o1 report didn't reveal much.
2/ Traditional RL frameworks (like PPO) have something like an 'LLM coach' or critic that tells the model whether an answer was good or bad, based on given examples (labeled data). DeepSeek uses GRPO, a pure-RL framework that skips the critic: it samples a group of answers from the LLM, scores them with predefined rules, and compares each answer against the group average.
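The group-scoring idea can be sketched in a few lines. This is a toy illustration of the group-relative advantage only -- the full GRPO objective (clipped policy ratios, KL penalty) is omitted, and the function name is mine:

```python
import statistics

def grpo_advantages(rewards):
    """Score each sampled answer relative to its own group: the
    advantage is the reward's z-score within the group, so no
    learned critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All answers scored the same: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four sampled answers to the same prompt, scored by rule-based rewards:
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Answers above the group mean get a positive advantage and are reinforced; answers below it get a negative one and are discouraged.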
3/ But how can you evaluate performance without labeled data to test against? With this framework, the rules aren't perfect -- they're just a best guess at what "good" looks like. The RL process tries to optimize for things like:
Does the answer make sense? (Coherence)
Is it in the right format? (Completeness)
Does it match the general style we expect? (Fluency)
For example, for mathematical tasks, the DeepSeek-R1-Zero model could be rewarded for producing outputs that align with mathematical principles or logical consistency.
It makes sense... and it works -- to some extent!
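As a toy sketch of what such rule-based rewards might look like (the `<think>` tag and `\boxed{}` conventions, and the weights, are illustrative guesses, not the paper's exact implementation):

```python
import re

def rule_based_reward(output: str, reference_answer: str) -> float:
    """Score an answer with programmatic checks instead of a
    learned reward model. Patterns and weights are made up."""
    reward = 0.0
    # Format check: reasoning wrapped in <think>...</think> tags.
    if re.search(r"<think>.+?</think>", output, re.DOTALL):
        reward += 0.5
    # Accuracy check (verifiable tasks like math): compare the
    # final boxed answer against the known result.
    match = re.search(r"\\boxed\{([^}]*)\}", output)
    if match and match.group(1).strip() == reference_answer:
        reward += 1.0
    return reward

good = rule_based_reward("<think>2 + 2 = 4</think> \\boxed{4}", "4")
bad = rule_based_reward("The answer is four.", "4")
```

Because the checks are purely programmatic, they scale to millions of samples without any human labeling -- which is exactly what lets pure RL work here.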
4/ This model (R1-Zero) had issues with poor readability and language mixing -- something you'd expect from pure RL. So the authors moved to a multi-stage training process, doing something that feels like stacking various training methods:
5/ The DeepSeek-R1 model goes through this list of training methods, each serving a different purpose:
(i) the cold-start data lays a structured foundation, fixing issues like poor readability;
(ii) pure RL develops reasoning almost on auto-pilot;
(iii) rejection sampling + SFT provides top-tier training data that improves accuracy; and
(iv) a final RL stage ensures an additional level of generalization.
And with that, they perform as well as or better than the o1 models.
Lmk if you have any questions (I might be able to answer them).
Finally, there is a model worthy of the hype it has been getting since Claude 3.6 Sonnet. Deepseek has released something hardly anyone expected: a reasoning model on par with OpenAI's o1, within a month of the v3 release, with an MIT license and at 1/20th of o1's cost.
This is easily the best release since GPT-4. It's wild; the general public seems excited about this, while the big AI labs are probably scrambling. It feels like things are about to speed up in the AI world. And it's all thanks to this new DeepSeek-R1 model and how they trained it.
Some key details from the paper
- Pure RL (GRPO) on v3-base to get r1-zero (no Monte Carlo Tree Search or Process Reward Modelling).
- The model uses "Aha moments" as pivot tokens to reflect on and reevaluate answers during CoT.
- To overcome r1-zero's readability issues, v3 was SFT'd on cold-start data.
- Distillation works: small models like Qwen and Llama trained on r1-generated data show significant improvements.
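The distillation recipe in that last point boils down to: sample reasoning traces from the big model, keep the verified ones, and fine-tune a small model on them. A minimal sketch of the data-collection step, where `teacher_generate`, `is_correct`, and the record format are hypothetical stand-ins, not real APIs:

```python
def build_distillation_set(prompts, references, teacher_generate, is_correct):
    """Collect teacher (r1) completions that pass a correctness
    check, producing an SFT dataset for a small student model."""
    dataset = []
    for prompt, ref in zip(prompts, references):
        completion = teacher_generate(prompt)   # teacher's CoT + answer
        if is_correct(completion, ref):         # keep verified samples only
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset

# Tiny usage example with stand-in callables:
ds = build_distillation_set(
    prompts=["What is 2 + 2?"],
    references=["4"],
    teacher_generate=lambda p: "<think>2 + 2 = 4</think> Answer: 4",
    is_correct=lambda completion, ref: ref in completion,
)
```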
Here's the overall r1-zero pipeline:
- v3 base + RL (GRPO) → r1-zero
The r1 training pipeline:
- DeepSeek-V3 Base + SFT (Cold Start Data) → Checkpoint 1
- Checkpoint 1 + RL (GRPO + Language Consistency) → Checkpoint 2
- Checkpoint 2 used to Generate Data (Rejection Sampling)
- DeepSeek-V3 Base + SFT (Generated Data + Other Data) → Checkpoint 3
- Checkpoint 3 + RL (Reasoning + Preference Rewards) → DeepSeek-R1
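The "Language Consistency" reward in the second stage can be pictured as the share of chain-of-thought words that are in the target language, added on top of the reasoning reward to curb r1-zero's language mixing. A toy sketch -- the classifier is caller-supplied, and the paper's exact formula may differ:

```python
def language_consistency_reward(cot_words, is_target_lang):
    """Fraction of chain-of-thought words the classifier accepts
    as target-language; 1.0 means no language mixing at all."""
    if not cot_words:
        return 0.0
    return sum(1 for w in cot_words if is_target_lang(w)) / len(cot_words)

# Crude stand-in classifier: treat pure-ASCII words as English.
mixed = ["the", "answer", "是", "four"]
score = language_consistency_reward(mixed, lambda w: w.isascii())
```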
We know the benchmarks, but just how good is it?
Deepseek r1 vs OpenAI o1.
So, for this, I tested r1 and o1 side by side on complex reasoning, math, coding, and creative-writing problems -- the kind of questions that, until now, only o1 could solve, or that no model could.
Here's what I found:
- Reasoning: much better than any previous SOTA model before o1. It is better than o1-preview but a notch below o1, which the ARC-AGI bench also shows.
- Mathematics: same story here; r1 is a killer, but o1 is better.
- Coding: I didn't get to play with it much, but at first look it's up there with o1, and the fact that it costs 20x less makes it the practical winner.
- Writing: this is where r1 takes the lead. It gives the same vibes as early Opus. It's free, less censored, has much more personality, is easy to steer, and is very creative compared to the rest, even o1-pro.
What interested me most was how free the model sounded -- its thought traces read like a human internal monologue. Perhaps this is due to less stringent RLHF than in the US models.
The most surprising part was that you can get r1 from v3 via pure RL.
For in-depth analysis, commentary, and remarks on the Deepseek r1, check out this blog post: Notes on Deepseek r1
What are your experiences with the new Deepseek r1? Did you find the model useful for your use cases?