Hi Leo Chow,
Currently, the DeepSeek-R1 model is in Preview mode, and it supports a maximum context length of 128k tokens. This extended context length enables the model to excel at complex reasoning tasks, including language understanding, scientific reasoning, and coding.
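
For reference, here is a minimal sketch of calling a DeepSeek-R1 deployment through the azure-ai-inference Python SDK; the endpoint URL, key variable names, and deployment name are placeholders for your own Azure AI Foundry setup, not fixed values.

# Minimal sketch: calling a DeepSeek-R1 deployment via the azure-ai-inference SDK.
# Endpoint, key, and model/deployment name below are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # your deployment's endpoint URL
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    model="DeepSeek-R1",  # deployment name as configured in your project
    messages=[
        SystemMessage(content="You are a careful reasoning assistant."),
        UserMessage(content="Summarize the key findings of the attached report."),
    ],
    max_tokens=4096,  # output cap; the 128k figure refers to the context window
)

print(response.choices[0].message.content)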

Hope this helps. Do let us know if you have any further queries.



Answer from Pavankumar Purilla on learn.microsoft.com
🌐
DeepSeek
api-docs.deepseek.com › models & pricing
Models & Pricing | DeepSeek API Docs
The prices listed below are per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark. Billing is based on the total number of input and output tokens processed by the model.
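
As a rough illustration of how that billing adds up, here is a small sketch using the cache-miss input and output prices quoted in the DeepSeek-R1 release note further down this page; treat the figures as examples only, not current pricing.

# Illustrative cost estimate; prices are the cache-miss figures from the
# R1 release note below, not a quote of current pricing.
INPUT_PRICE_PER_M = 0.55   # USD per 1M input tokens (cache miss)
OUTPUT_PRICE_PER_M = 2.19  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Billing is based on total input plus output tokens."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 7,209-token prompt with a 2,000-token answer:
print(f"${request_cost(7_209, 2_000):.4f}")  # ≈ $0.0083
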
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1. You can chat with ...
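
For readers unfamiliar with the metric, pass@1 estimated from multiple sampled responses is just the mean fraction of samples that pass a correctness check; the sketch below assumes a hypothetical grading function and is not DeepSeek's evaluation code.

# Sketch: estimating pass@1 from n sampled responses per query.
# `is_correct` is a placeholder grader, not part of any published harness.
from statistics import mean

def estimate_pass_at_1(samples_per_query, is_correct):
    # Average, over queries, of the fraction of sampled answers judged correct.
    return mean(
        sum(map(is_correct, samples)) / len(samples)
        for samples in samples_per_query
    )

# e.g. 64 responses per query sampled with temperature=0.6, top_p=0.95:
# score = estimate_pass_at_1(all_samples, grader)
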
🌐
AWS
docs.aws.amazon.com › amazon bedrock › user guide › amazon bedrock foundation model information › inference request parameters and response fields for foundation models › deepseek models
DeepSeek models - Amazon Bedrock
For optimal response quality with DeepSeek-R1, limit the max_tokens parameter to 8,192 tokens or fewer. While the API accepts up to 32,768 tokens, response quality significantly degrades above 8,192 tokens.
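
A minimal sketch of following that recommendation with boto3's Converse API is shown below; the modelId string and region are placeholders, so check the model catalog for your account before using them.

# Sketch: invoking DeepSeek-R1 on Amazon Bedrock, capping maxTokens at the
# 8,192 recommended above. modelId and region are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="us.deepseek.r1-v1:0",  # placeholder; verify the ID in your region
    messages=[{"role": "user", "content": [{"text": "Explain chain-of-thought prompting."}]}],
    inferenceConfig={"maxTokens": 8192, "temperature": 0.6, "topP": 0.95},
)

print(response["output"]["message"]["content"][0]["text"])
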
🌐
GitHub
github.com › langgenius › dify › issues › 13491
Deepseek R1 Max Token cannot be set to more than 4092 when using OpenAI-compatible model to set up API · Issue #13491 · langgenius/dify
Published Feb 10, 2025
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of ... You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button ...
Starred by 91.6K users
Forked by 11.8K users
🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
⚙️ Use DeepSeek-R1 by setting model=deepseek-reasoner · 💰 $0.14 / million input tokens (cache hit) 💰 $0.55 / million input tokens (cache miss) 💰 $2.19 / million output tokens ·
🌐
Stack Overflow
stackoverflow.com › questions › 79406917 › how-can-i-accurately-count-tokens-for-llama3-deepseek-r1-prompts-when-groq-api-r
python - How can I accurately count tokens for Llama3/DeepSeek r1 prompts when Groq API reports “Request too large”? - Stack Overflow
When I build my prompt, the deepseek-ai/DeepSeek-R1-Distill-Llama-70B and meta-llama/Meta-Llama-3-8B tokenizers gave me the same token count of 7209 tokens. Then the GPT-2 tokenizer gives me a token count of around 21,204 tokens, and I also did ...
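
The usual fix is to count tokens with the model's own tokenizer rather than an unrelated one such as GPT-2; a minimal sketch with the Hugging Face transformers library:

# Sketch: counting prompt tokens with the model's own tokenizer, since
# different tokenizers can report very different counts for the same text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

prompt = "..."  # your full prompt text
token_count = len(tokenizer.encode(prompt))
print(f"{token_count} tokens")  # compare against the provider's request limit
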
🌐
oTTomator
thinktank.ottomator.ai › bolt.diy › [bolt.diy] general discussion
Can the DeepSeek R1/V3 API Output Token Limit Be Increased Beyond 8,000 Tokens?
January 26, 2025 - Is there a way to configure the DeepSeek R1 and V3 API endpoints to allow output token limits higher than 8,000? On bolt.new, I was able to create a website with 100,000 tokens, but bolt.diy currently restricts usage to …
🌐
DataCamp
datacamp.com › blog › deepseek-r1
DeepSeek-R1: Features, o1 Comparison, Distilled Models & More | DataCamp
June 4, 2025 - For example, the API’s deepseek-reasoner model supports a maximum output length of 64,000 tokens, which includes reasoning steps (Chain-of-Thought) and the final answer. Running DeepSeek-R1 or its distilled models locally requires high-performance ...
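
A minimal sketch of calling that model through DeepSeek's OpenAI-compatible endpoint follows; per DeepSeek's API docs the reasoner model returns a separate reasoning_content field alongside the final answer, and the prompt and max_tokens value here are only examples.

# Sketch: calling deepseek-reasoner via DeepSeek's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=8192,  # output cap for this request; the documented maximum is 64K
)

print(resp.choices[0].message.reasoning_content)  # chain-of-thought (per DeepSeek docs)
print(resp.choices[0].message.content)            # final answer
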
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528
deepseek-ai/DeepSeek-R1-0528 · Hugging Face
For all our models, the maximum ... to 64K tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 16 responses per query to estimate pass@1. Note: We use Agentless framework to evaluate ...
🌐
Artificial Analysis
artificialanalysis.ai › models › deepseek-r1
DeepSeek R1 0528 - Intelligence, Performance & Price Analysis
Analysis of DeepSeek's DeepSeek R1 0528 (May '25) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.
🌐
APIpie
apipie.ai › deepseek
DeepSeek AI by APIpie: Overview & Key Features | APIpie
Extended Token Capacity: Models support context lengths from 32K to 131K tokens for handling various text processing needs. Multi-Provider Availability: Accessible across platforms like OpenRouter, Together, and Amazon Bedrock.
🌐
Prompthub
prompthub.us › models › deepseek-reasoner-r1
DeepSeek Reasoner (R1) Model Card
‍DeepSeek Reasoner (R1) supports a context window of up to 64,000 tokens.
🌐
Reddit
reddit.com › r/llmdevs › how to use deepseek r1 via groq: a step-by-step guide
r/LLMDevs on Reddit: How to Use Deepseek R1 via Groq: A Step-by-Step Guide
February 6, 2025 -

Deepseek R1 is a powerful AI model, and with Groq’s high-speed inference, you can get lightning-fast responses. If you're looking to integrate a Deepseek R1 distill model with Groq, here's how you can do it.

Direct model link: https://console.groq.com/playground?model=deepseek-r1-distill-llama-70b

Set Up the API Request

You need to send a POST request to Groq’s API endpoint:

📌 URL:
https://api.groq.com/openai/v1/chat/completions

📌 Headers:

  • Authorization: Bearer <your-api-key>
  • Content-Type: application/json

📌 Request Body (JSON format):

{
  "messages": [
    {
      "role": "system",
      "content": "Please answer in English only"
    },
    {
      "role": "user",
      "content": "Deepseek R1 vs OpenAI O1"
    }
  ],
  "model": "deepseek-r1-distill-llama-70b",
  "temperature": 0.6,
  "max_completion_tokens": 4096,
  "top_p": 0.95,
  "stream": false,
  "stop": null
}

👉 Replace <your-api-key> with your actual API key.
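
Putting the pieces above together, here is a minimal sketch of the same request sent from Python with the requests library; it mirrors the URL, headers, and body shown above and assumes the key is stored in a GROQ_API_KEY environment variable.

# Sketch: the Groq request above, sent from Python with `requests`.
import os
import requests

url = "https://api.groq.com/openai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
        {"role": "system", "content": "Please answer in English only"},
        {"role": "user", "content": "Deepseek R1 vs OpenAI O1"},
    ],
    "temperature": 0.6,
    "max_completion_tokens": 4096,
    "top_p": 0.95,
    "stream": False,
    "stop": None,
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])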

Why Use Groq for Deepseek R1?

Faster Inference – Groq’s hardware accelerates LLM responses significantly.
Easy API Integration – Works seamlessly with OpenAI-style API requests.
High Token Limit – Supports long responses of up to 131,072 tokens.

💡 Pro Tip: Adjust the temperature and top_p parameters to fine-tune response randomness and creativity.

Have you tried using Deepseek R1 via Groq? Share your experiences in the comments! 🚀

Download the n8n template: https://drive.google.com/file/d/1ImStl41g32DD7RdcKP0YYAqO4q18jhWI/view?usp=download

🌐
GitHub
github.com › unslothai › unsloth › issues › 1591
How to run DeepSeek-R1 IQ1_S 1.58bit at 140 Token/Sec · Issue #1591 · unslothai/unsloth
January 28, 2025 - Following the blog post Run DeepSeek R1 Dynamic 1.58-bit I tried to reproduce the 140 token/second when running DeepSeek-R1-UD-IQ1_S i.e. 1.58-bit / 131GB / IQ1_S. My setup was to offload to gpu all layers: ./llama.cpp/build/bin/llama-cl...
Published Jan 28, 2025
🌐
NextBigFuture
nextbigfuture.com › home › artificial intelligence › open source deepseek r1 runs at 200 tokens per second on raspberry pi
Open Source DeepSeek R1 Runs at 200 Tokens Per Second on Raspberry Pi | NextBigFuture.com
January 26, 2025 - Experimenters have had overnight tests confirming they have OPEN SOURCE DeepSeek R1 running at 200 tokens per second on a NON-INTERNET connected Raspberry Pi.