Thinking for open models

Some open models support a "thinking" mode that allows them to perform step-by-step reasoning before providing a final answer. This is useful for tasks requiring transparent logic, like mathematical proofs, intricate code debugging, or multi-step agent planning.

Model-specific guidance

The following sections provide model-specific guidance for thinking.

DeepSeek R1 0528

For DeepSeek R1 0528, the reasoning is enclosed in <think></think> tags inside the content field. There is no separate reasoning_content field.

Example request:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://us-central1-aiplatform.googleapis.com/v1/projects/test-project/locations/us-central1/endpoints/openapi/chat/completions \
      -d '{
        "model": "deepseek-ai/deepseek-r1-0528-maas",
        "messages": [{
          "role": "user",
          "content": "Who are you?"
        }]
      }'
 

Example response:

    {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "message": {
            "content": "<think>\nHmm, the user just asked \u201cWho are you?\u201d - a simple but fundamental question. \n\nFirst, let's consider the context. This is likely their first interaction, so they're probably testing the waters or genuinely curious about what I am. The phrasing is neutral - no urgency or frustration detected. \n\nI should keep my response warm and informative without being overwhelming. Since they didn't specify language preference, I'll default to English but note they might be multilingual. \n\nKey points to cover:\n- My identity (DeepSeek-R1)\n- My capabilities \n- My purpose (being helpful)\n- Tone: friendly with emojis to seem approachable\n- Ending with an open question to continue conversation\n\nThe smiley face feels appropriate here - establishes friendliness. Mentioning \u201cAI assistant\u201d upfront avoids confusion. Listing examples of what I can do gives concrete value. Ending with \u201cHow can I help?\u201d turns it into an active conversation starter. \n\nNot adding too much technical detail (like model specs) since that might overwhelm a first-time user. The \u201calways learning\u201d phrase subtly manages expectations about my limitations.\n</think>\nI'm DeepSeek-R1, your friendly AI assistant! \ud83d\ude0a \nI'm here to help you with questions, ideas, problems, or just a chat\u2014whether it's about homework, writing, coding, career advice, or fun trivia. I can read files (like PDFs, Word, Excel, etc.), browse the web for you (if enabled), and I'm always learning to be more helpful!\n\nSo\u2026 how can I help you today? \ud83d\udcac\u2728",
            "role": "assistant"
          }
        }
      ],
      "created": 1758229877,
      "id": "dXXMaKrsKqaQm9IPsLeTkAQ",
      "model": "deepseek-ai/deepseek-r1-0528-maas",
      "object": "chat.completion",
      "system_fingerprint": "",
      "usage": {
        "completion_tokens": 329,
        "prompt_tokens": 9,
        "total_tokens": 338
      }
    }
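
Because the reasoning and the answer share one content string for this model, a client usually has to separate them itself. The following is a minimal sketch of one way to do that, assuming the content holds at most one <think></think> block; the function name split_r1_content is hypothetical and not part of any API.

```python
import re

# Matches a single <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_r1_content(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a DeepSeek R1 0528 content string.

    Hypothetical helper: assumes at most one <think></think> block.
    """
    match = _THINK_BLOCK.search(content)
    if match is None:
        # No reasoning block found: treat the whole string as the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is the answer text.
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_r1_content(
    "<think>\nA simple greeting.\n</think>\nI'm DeepSeek-R1, your friendly AI assistant!"
)
print("reasoning:", reasoning)
print("answer:", answer)
```

Stripping whitespace is a design choice for display purposes; drop the strip() calls if the exact original text matters.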
 

DeepSeek V3.1

DeepSeek-V3.1 supports hybrid reasoning, which lets you turn reasoning on or off. By default, reasoning is off. To enable reasoning, include "chat_template_kwargs": { "thinking": true } in your request.

When reasoning is enabled, thinking text appears at the beginning of the content field, followed by </think> and then the answer text (for example, THINKING_TEXT</think>ANSWER_TEXT). Note that there is no initial <think> tag.
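
As an illustration of the toggle described above, the following sketch builds the JSON request body with or without the chat_template_kwargs entry. The function name build_v31_request is hypothetical; only the model ID and the chat_template_kwargs field come from this page.

```python
import json


def build_v31_request(prompt: str, thinking: bool = False) -> dict:
    """Build a chat completions request body for DeepSeek-V3.1.

    Hypothetical helper: reasoning is off by default, so the
    chat_template_kwargs entry is only added when thinking is requested.
    """
    body = {
        "model": "deepseek-ai/deepseek-v3.1-maas",
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        body["chat_template_kwargs"] = {"thinking": True}
    return body


# Print the body that would be sent as the POST payload.
print(json.dumps(build_v31_request("What are the first 3 even prime numbers?", thinking=True), indent=2))
```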

Example request:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://us-west2-aiplatform.googleapis.com/v1/projects/test-project/locations/us-west2/endpoints/openapi/chat/completions \
      -d '{
        "model": "deepseek-ai/deepseek-v3.1-maas",
        "messages": [{
          "role": "user",
          "content": "What are the first 3 even prime numbers?"
        }],
        "chat_template_kwargs": {
          "thinking": true
        }
      }'
 

Example response:

    {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "matched_stop": 1,
          "message": {
            "content": "SNIPPED</think>The only even prime number is 2. By definition, a prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Since all even numbers other than 2 are divisible by 2 and themselves, they have at least three divisors and are not prime. Therefore, there are no second or third even prime numbers. The concept of \"first three even prime numbers\" is not applicable beyond the first one.",
            "reasoning_content": null,
            "role": "assistant",
            "tool_calls": null
          }
        }
      ],
      "created": 1758229458,
      "id": "ynPMaIWPN-KeltsPh87s0Qg",
      "model": "deepseek-ai/deepseek-v3.1-maas",
      "object": "chat.completion",
      "usage": {
        "completion_tokens": 1525,
        "prompt_tokens": 17,
        "prompt_tokens_details": null,
        "total_tokens": 1542
      }
    }
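
Since the thinking text has a closing </think> tag but no opening one, a pattern that looks for a <think></think> pair would not match this content. A minimal sketch of splitting on the first </think> instead, with a hypothetical function name (split_v31_content), could look like this:

```python
def split_v31_content(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a DeepSeek-V3.1 content string.

    Hypothetical helper: everything before the first </think> is treated
    as thinking text, everything after it as the answer.
    """
    reasoning, separator, answer = content.partition("</think>")
    if not separator:
        # No </think> tag (for example, reasoning was off): all of it is the answer.
        return "", content.strip()
    return reasoning.strip(), answer.strip()


reasoning, answer = split_v31_content("SNIPPED</think>The only even prime number is 2.")
print("reasoning:", reasoning)
print("answer:", answer)
```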
 

Qwen3-Next Thinking

For Qwen3-Next Thinking models, reasoning tokens are in the reasoning_content field, and normal response text is in the content field.

Example request:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://aiplatform.googleapis.com/v1/projects/test-project/locations/global/endpoints/openapi/chat/completions \
      -d '{
        "model": "qwen/qwen3-next-80b-a3b-thinking-maas",
        "messages": [{
          "role": "user",
          "content": "Who are you?"
        }]
      }'
 

Example response:

    {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "matched_stop": 151645,
          "message": {
            "content": "Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I can help answer questions, write stories, emails, scripts, perform logical reasoning, code, and more. How can I assist you today? \ud83d\ude0a",
            "reasoning_content": "Okay, the user asked, Who are you? I need to respond appropriately. First, I should state my name clearly: I am Qwen, a large language model developed by Alibaba Cloud. Then, maybe mention my capabilities, like answering questions, writing stories, emails, scripts, logical reasoning, programming, etc. Also, highlight that I can express views and play games. But keep it concise since the user might just want a brief intro.\n\nWait, the user might not know much about me, so I should explain what I can do in simple terms. Maybe add something about being trained on a large amount of data, so I can handle various topics. But don't get too technical. Also, maybe mention that I'm here to help. Should I ask if they have any specific questions? That might be good to invite further interaction.\n\nLet me check the structure. Start with a greeting, introduce myself, list key abilities, and offer assistance. Keep it friendly and approachable. Avoid jargon. Maybe say something like \"I'm Qwen, a large language model created by Alibaba Cloud. I can help with answering questions, writing, coding, and more. How can I assist you today?\" That seems good.\n\nWait, the original response in Chinese might be different, but the user asked in English, so the response should be in English. Let me confirm. The user's query is in English, so I should respond in English. Yes.\n\nAlso, check for any specific details. Maybe mention that I have 128K context length, but maybe that's too technical. Maybe just say \"I have a large context window\" or not necessary. For a basic introduction, maybe stick to the main points without too many specs. The user might not need the technical details right away.\n\nSo, the response should be: Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I can help answer questions, write stories, emails, scripts, perform logical reasoning, code, and more. How can I assist you today? That's concise and covers the main points.\n\nWait, in the initial problem statement, the user's question is \"Who are you?\" So the standard response would be to introduce myself. Yes. Also, maybe add that I'm an AI language model, but the name Qwen is important. So the key elements are name, developer, capabilities, and offer help.\n\nYes. Let me make sure there's no mistake. Alibaba Cloud is the correct developer. Yes. So the response should be accurate. Alright, that seems good.\n",
            "role": "assistant",
            "tool_calls": null
          }
        }
      ],
      "created": 1758229652,
      "id": "knTMaJC0EJfM5OMP7I3xkAk",
      "metadata": {
        "weight_version": "default"
      },
      "model": "qwen/qwen3-next-80b-a3b-thinking-maas",
      "object": "chat.completion",
      "usage": {
        "completion_tokens": 573,
        "prompt_tokens": 14,
        "prompt_tokens_details": null,
        "reasoning_tokens": 0,
        "total_tokens": 587
      }
    }
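
When the reasoning arrives in its own field, no string parsing is needed. The sketch below reads both fields from an already-decoded response; the function name extract_reasoning_and_answer is hypothetical, and the sample dictionary is a shortened stand-in for the response shown above.

```python
def extract_reasoning_and_answer(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded chat completion response.

    Hypothetical helper: reads the first choice only and maps a missing
    or null field to an empty string.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Shortened stand-in for the example response.
sample_response = {
    "choices": [
        {
            "message": {
                "content": "Hello! I'm Qwen.",
                "reasoning_content": "Okay, the user asked, Who are you?",
                "role": "assistant",
            }
        }
    ]
}

reasoning, answer = extract_reasoning_and_answer(sample_response)
print("reasoning:", reasoning)
print("answer:", answer)
```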
 

GPT OSS

For GPT OSS models, thinking text is in the reasoning_content field, and normal response text is in the content field. These models also support the reasoning_effort parameter.

Example request:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://us-central1-aiplatform.googleapis.com/v1/projects/test-project/locations/us-central1/endpoints/openapi/chat/completions \
      -d '{
        "model": "openai/gpt-oss-120b-maas",
        "messages": [{
          "role": "user",
          "content": "Who are you?"
        }],
        "reasoning_effort": "high"
      }'
 

Example response:

    {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "matched_stop": 200002,
          "message": {
            "content": "I\u2019m ChatGPT, a large language model created by OpenAI (based on the GPT\u20114 architecture). I\u2019m here to help answer your questions, brainstorm ideas, explain concepts, and assist with a wide variety of topics. Let me know what you\u2019d like to talk about!",
            "reasoning_content": "The user asks: \"Who are you?\" It's a simple question. They want to know who the assistant is. According to policy, we should respond with a brief, friendly description: \"I am ChatGPT, a large language model trained by OpenAI.\" Possibly mention the version (GPT-4). Also mention we are here to assist. There's no request for disallowed content. So a straightforward answer. Possibly ask if they need help. Provide a short intro.\n\nThus answer: \"I am ChatGPT, a large language model trained by OpenAI. I'm here to help answer your questions...\"\n\nWe should avoid disallowed content. It's fine.\n\nThus final answer.",
            "role": "assistant",
            "tool_calls": null
          }
        }
      ],
      "created": 1758229799,
      "id": "JnXMaJq6HLGzquEP4OD18Qg",
      "metadata": {
        "weight_version": "default"
      },
      "model": "openai/gpt-oss-120b-maas",
      "object": "chat.completion",
      "usage": {
        "completion_tokens": 201,
        "prompt_tokens": 71,
        "prompt_tokens_details": null,
        "reasoning_tokens": 0,
        "total_tokens": 272
      }
    }
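
To show where reasoning_effort sits in the request, here is a small sketch that builds the request body and rejects unexpected values before anything is sent. The function name build_gpt_oss_request is hypothetical, and the accepted levels ("low", "medium", "high") are an assumption: this page only demonstrates "high".

```python
# Assumption: this page only shows "high"; "low" and "medium" are assumed levels.
ASSUMED_EFFORT_LEVELS = ("low", "medium", "high")


def build_gpt_oss_request(prompt: str, reasoning_effort: str = "high") -> dict:
    """Build a chat completions request body for a GPT OSS model.

    Hypothetical helper: validates reasoning_effort against the assumed
    levels above before building the body.
    """
    if reasoning_effort not in ASSUMED_EFFORT_LEVELS:
        raise ValueError(f"unexpected reasoning_effort: {reasoning_effort!r}")
    return {
        "model": "openai/gpt-oss-120b-maas",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }


request_body = build_gpt_oss_request("Who are you?", reasoning_effort="high")
print(request_body["reasoning_effort"])
```

Validating on the client is optional; it simply surfaces a typo locally instead of relying on the service's error response.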
 
