DeepSeek launched V4 Pro and V4 Flash with thinking mode controls
DeepSeek added deepseek-v4-pro and deepseek-v4-flash to its API model catalog on April 24, 2026. Both models support thinking and non-thinking modes, 1M token context, 384K max output, JSON output, tool calls, and FIM in non-thinking mode. Official pricing lists V4 Flash at $0.14 cache-miss input / $0.028 cache-hit input / $0.28 output per 1M tokens, and V4 Pro at $1.74 cache-miss input / $0.145 cache-hit input / $3.48 output per 1M tokens.
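The thinking controls described above can be sketched as an OpenAI-format request. The model ID and the `thinking` / `reasoning_effort` fields follow the announcement, but the exact request schema and endpoint path are assumptions, not confirmed API details.

```python
# Sketch of a thinking-mode request against an OpenAI-compatible DeepSeek
# endpoint. The `thinking` and `reasoning_effort` fields follow the launch
# notes; treat the precise body shape as an assumption.
import json
import os
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"  # assumed path

def build_payload(prompt: str, *, thinking: bool = True, effort: str = "high") -> dict:
    """Assemble the request body with explicit thinking-mode controls."""
    payload = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        # OpenAI-format toggle: thinking.type is "enabled" or "disabled".
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if thinking:
        payload["reasoning_effort"] = effort  # "high" or "max"
    return payload

def ask(prompt: str, **kwargs) -> dict:
    """POST the payload; requires DEEPSEEK_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, **kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```

Keeping the effort field out of the body when thinking is disabled avoids sending a reasoning control the non-thinking mode cannot honor.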
Impact summary
- New API models: deepseek-v4-flash and deepseek-v4-pro
- Both support thinking and non-thinking modes; thinking defaults to enabled
- OpenAI-format controls: thinking.type enabled/disabled and reasoning_effort high/max
- Anthropic-format effort control: output_config.effort high/max
- Context window: 1M tokens; max output: 384K tokens
- Features: JSON output, tool calls, chat prefix completion, and FIM in non-thinking mode
- Pricing: V4 Flash $0.14 cache-miss input / $0.028 cache-hit input / $0.28 output per 1M tokens
- Pricing: V4 Pro $1.74 cache-miss input / $0.145 cache-hit input / $3.48 output per 1M tokens
- deepseek-chat and deepseek-reasoner remain compatibility aliases for V4 Flash modes and are marked for future deprecation
- Add explicit model IDs instead of relying on deepseek-chat or deepseek-reasoner aliases.
- For agentic coding and complex reasoning, benchmark V4 Pro with reasoning_effort=max against your current frontier route.
- For high-volume and latency-sensitive workloads, test V4 Flash with thinking disabled and enabled separately.
- If using tool calls in thinking mode, persist and pass back reasoning_content on subsequent requests to avoid 400 responses.
- Update budget rules to account for cache-hit and cache-miss input pricing separately.
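The reasoning_content requirement for thinking-mode tool calls can be sketched as a small helper that persists the assistant's reasoning alongside its tool calls before appending tool results. The field names follow the announcement; the message shapes are assumptions based on OpenAI-format tool calling.

```python
# Sketch of the reasoning_content round-trip for thinking-mode tool calls.
# Message shapes are assumed from OpenAI-format tool calling conventions.

def append_tool_turn(messages: list, assistant_msg: dict, tool_results: list) -> list:
    """Persist the assistant turn (including reasoning_content) plus its
    tool results, so the follow-up request is not rejected."""
    turn = {
        "role": "assistant",
        "content": assistant_msg.get("content"),
        "tool_calls": assistant_msg["tool_calls"],
    }
    # Dropping reasoning_content here is what triggers the 400 responses
    # called out above.
    if "reasoning_content" in assistant_msg:
        turn["reasoning_content"] = assistant_msg["reasoning_content"]
    messages.append(turn)
    for call, result in zip(assistant_msg["tool_calls"], tool_results):
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages
```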
Recommended actions
- Add explicit routing entries for deepseek-v4-pro and deepseek-v4-flash instead of relying only on the deepseek-chat or deepseek-reasoner aliases.
- Benchmark V4 Pro with reasoning_effort=max for agentic coding and complex reasoning before promoting it to a frontier route.
- For thinking-mode tool calls, store the assistant's reasoning_content alongside its tool_calls and pass it back in subsequent requests.
- Update cost dashboards to distinguish cache-hit input, cache-miss input, and output tokens for both V4 tiers.
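The cache-aware cost split can be sketched as a minimal helper using the per-1M-token prices listed above:

```python
# USD per 1M tokens, per the launch pricing:
# (cache-miss input, cache-hit input, output)
PRICES = {
    "deepseek-v4-flash": (0.14, 0.028, 0.28),
    "deepseek-v4-pro": (1.74, 0.145, 3.48),
}

def request_cost(model: str, miss_tokens: int, hit_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, splitting input by cache status."""
    miss, hit, out = PRICES[model]
    return (miss_tokens * miss + hit_tokens * hit + output_tokens * out) / 1_000_000
```

Tracking the three token counts separately matters because cache-hit input on V4 Pro is twelve times cheaper than cache-miss input, so a blended input rate would badly misprice cache-heavy workloads.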
Sources
- Official DeepSeek social announcement of the V4 Pro and V4 Flash launch.
- DeepSeek model catalog listing deepseek-v4-flash and deepseek-v4-pro with 1M context, 384K max output, and cache-hit input, cache-miss input, and output token pricing.
- DeepSeek documentation of thinking mode toggles, reasoning effort controls, reasoning_content behavior, and tool-call handling for deepseek-v4-pro.
- DeepSeek description of V4 Pro and V4 Flash as MoE models with 1M token context and configurable non-think, high, and max reasoning modes.