ModelPriceLab
Major · DeepSeek · New Launch · deepseek-v4-pro · Verified

DeepSeek launched V4 Pro and V4 Flash with thinking mode controls

DeepSeek added deepseek-v4-pro and deepseek-v4-flash to its API model catalog on April 24, 2026. Both models support thinking and non-thinking modes, 1M token context, 384K max output, JSON output, tool calls, and FIM in non-thinking mode. Official pricing lists V4 Flash at $0.14 cache-miss input / $0.028 cache-hit input / $0.28 output per 1M tokens, and V4 Pro at $1.74 cache-miss input / $0.145 cache-hit input / $3.48 output per 1M tokens.

Apr 24, 2026 · Effective: Apr 24, 2026
At a glance
Decide impact, cost movement, and the next step before reading the full diff.
Affected
DeepSeek · deepseek-v4-pro
New Launch
Cost impact
High · 5/5
Estimated from pricing and window changes
Migration pressure
Medium · 3/5
No explicit deadline yet
Next step
Add explicit routing entries for deepseek-v4-pro and deepseek-v4-flash instead of relying only on deepseek-chat or deepseek-reasoner aliases.
Do this first, then review field-level details

Impact summary

Use explicit levels instead of abstract charts to judge this change.

  • Cost: High · Score 5/5 (100%)
  • Quality: High · Score 5/5 (100%)
  • Migration: Medium · Score 3/5 (60%)
  • Reliability: Medium · Score 2/5 (40%)
  • Compliance: Low · Score 1/5 (20%)
What changed
  • New API models: deepseek-v4-flash and deepseek-v4-pro
  • Both support thinking and non-thinking modes; thinking defaults to enabled
  • OpenAI-format controls: thinking.type enabled/disabled and reasoning_effort high/max
  • Anthropic-format effort control: output_config.effort high/max
  • Context window: 1M tokens; max output: 384K tokens
  • Features: JSON output, tool calls, chat prefix completion, and FIM in non-thinking mode
  • Pricing: V4 Flash $0.14 cache-miss input / $0.028 cache-hit input / $0.28 output per 1M tokens
  • Pricing: V4 Pro $1.74 cache-miss input / $0.145 cache-hit input / $3.48 output per 1M tokens
  • deepseek-chat and deepseek-reasoner remain compatibility aliases for V4 Flash modes and are marked for future deprecation
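The OpenAI-format controls above can be sketched as request payloads. This is a minimal sketch based on the parameter names in this article (thinking.type, reasoning_effort); the helper function and exact payload wiring are illustrative assumptions, not the official SDK.

```python
# Build OpenAI-format chat-completion payloads with DeepSeek V4 thinking
# controls. Parameter names follow this article; build_request is a
# hypothetical helper, not part of any official client.

def build_request(model: str, prompt: str, thinking: bool, effort: str = "high") -> dict:
    """Return a chat-completion payload with thinking-mode controls set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # thinking.type toggles thinking vs non-thinking mode.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if thinking:
        # reasoning_effort accepts "high" or "max" per the article.
        payload["reasoning_effort"] = effort
    return payload

# Non-thinking V4 Flash for a latency-sensitive call:
fast = build_request("deepseek-v4-flash", "Summarize this log.", thinking=False)

# Max-effort V4 Pro for agentic coding:
deep = build_request("deepseek-v4-pro", "Refactor this module.", thinking=True, effort="max")
```

In the Anthropic format, the article instead describes an output_config.effort field taking the same high/max values.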
Recommended actions
  • Add explicit model IDs instead of relying on deepseek-chat or deepseek-reasoner aliases.
  • For agentic coding and complex reasoning, benchmark V4 Pro with reasoning_effort=max against your current frontier route.
  • For high-volume and latency-sensitive workloads, test V4 Flash with thinking disabled and enabled separately.
  • If using tool calls in thinking mode, persist and pass back reasoning_content on subsequent requests to avoid 400 responses.
  • Update budget rules to account for cache-hit and cache-miss input pricing separately.
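The last action, splitting budget rules by cache state, can be sketched with the rates published in this article. The rate table copies the figures above; the function and its example traffic mix are illustrative assumptions.

```python
# Per-request cost estimate that separates cache-hit from cache-miss input,
# using the USD-per-1M-token rates listed in this article.

RATES = {
    # model: (cache-miss input, cache-hit input, output), USD per 1M tokens
    "deepseek-v4-flash": (0.14, 0.028, 0.28),
    "deepseek-v4-pro": (1.74, 0.145, 3.48),
}

def estimate_cost(model: str, miss_tokens: int, hit_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one request's token mix."""
    miss, hit, out = RATES[model]
    return (miss_tokens * miss + hit_tokens * hit + output_tokens * out) / 1_000_000

# Example mix: 100K cache-miss input, 900K cache-hit input, 50K output on V4 Flash.
flash_cost = estimate_cost("deepseek-v4-flash", 100_000, 900_000, 50_000)
```

Because cache-hit input is priced at a fraction of cache-miss input on both tiers, a dashboard that lumps all input tokens together will systematically misestimate spend for prompt-caching-heavy workloads.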

Changes

Review changed fields first, then decide whether you need full before/after values.

7 fields changed
Pro fields: models, compatibility_aliases, thinking_mode, context_window, max_output, features, api_pricing
Upgrade to Pro for full field-level diffs, before/after values, and migration guidance.

Recommended Actions

Only actionable items stay prominent. Incomplete actions fall back to a compact list.

Do this first
1. Migrate: Add explicit routing entries for deepseek-v4-pro and deepseek-v4-flash instead of relying only on deepseek-chat or deepseek-reasoner aliases.

2. Validate: Benchmark V4 Pro with reasoning_effort=max for agentic coding and complex reasoning before promoting it to a frontier route.

3. Update: For thinking-mode tool calls, store assistant reasoning_content with tool_calls and pass it back in subsequent requests.

4. Monitor: Update cost dashboards to distinguish cache-hit input, cache-miss input, and output tokens for both V4 tiers.
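The tool-call step above can be sketched as a history-management helper. This is a minimal sketch assuming the message shapes described in this article (an assistant message carrying reasoning_content alongside tool_calls); the helper name and exact field layout are assumptions, so check them against the official thinking-mode docs.

```python
# Keep reasoning_content attached to the assistant turn when replaying a
# thinking-mode tool-call round. Dropping it is what the article says
# triggers 400 responses on the follow-up request.

def append_tool_round(history: list, assistant_msg: dict, tool_outputs: dict) -> list:
    """Return history extended with the assistant turn (reasoning_content
    intact) plus one tool message per tool call."""
    # Store the assistant message as-is rather than stripping fields.
    new_history = history + [assistant_msg]
    for call in assistant_msg.get("tool_calls", []):
        new_history.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": tool_outputs[call["id"]],
        })
    return new_history
```

The next request then sends new_history verbatim as the messages array, so the model sees its own prior reasoning alongside the tool results.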

Sources

Each source focuses on what it confirms, without repeating the whole article.

social · DeepSeek V4 announcement on X

Official DeepSeek social announcement URL for the V4 Pro and V4 Flash launch.
pricing · DeepSeek Models & Pricing

DeepSeek lists deepseek-v4-flash and deepseek-v4-pro with 1M context, 384K max output, cache-hit input, cache-miss input, and output token pricing.
docs · DeepSeek Thinking Mode

DeepSeek documents thinking mode toggles, reasoning effort controls, reasoning_content behavior, and tool-call handling for deepseek-v4-pro.
model_card · DeepSeek V4 Pro model card

DeepSeek describes V4 Pro and V4 Flash as MoE models with 1M token context and configurable non-think, high, and max reasoning modes.
