On May 22, DeepSeek made the call the rest of the industry was dreading: the 75% discount on V4-Pro — previously set to expire May 31 — is now the permanent price1.
In numbers: $0.435 per million input tokens (cache miss), $0.87 per million output tokens, and $0.003625 per million cached input tokens2. The crossed-out original prices of $1.74/$3.48 stay on the page as decoration.
This is the fourth price adjustment in under a month: V4 launched April 24, 75% off on April 25, cache-hit prices dropped to one-tenth on April 26 (permanent), and now the discount becomes the list price34. Each time, they said “this is it.” Each time, it wasn’t.
The numbers, side by side
Table 1: Frontier model API pricing per million tokens 2567
| Model | Input | Output | Cache hit |
|---|---|---|---|
| DeepSeek V4-Pro | $0.435 | $0.87 | $0.0036 |
| DeepSeek V4-Flash | $0.14 | $0.28 | $0.0028 |
| GPT-5.5 | $5.00 | $30.00 | Not published |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 |
| Gemini 3.5 Pro | $2.00 | $12.00 | Tiered |
V4-Pro output is one-thirty-fourth of GPT-5.5 and one-thirtieth of Claude Opus 4.756. Cached input — the number that matters for agents — is 138x cheaper than Anthropic’s27.
For developers, this price gap has moved past “which is more cost-effective” into “what use cases are now economically viable that weren’t before?” Tasks that looked too expensive to automate on GPT-5.5 or Claude Opus might cost pocket change on V4-Pro.
Model labs can’t follow — and they know it
Since early 2026, AI labs on both sides of the Pacific had reached a quiet consensus: bigger models cost more, and users will pay3. OpenAI went from GPT-4o to GPT-5.5 with a 4x output price hike. Anthropic went from Sonnet 3.5 to Opus 4.7 with a 5x increase. Chinese firms like Zhipu, Alibaba, and Tencent followed.
DeepSeek nailing prices to the floor tells every competitor: your pricing structure doesn’t hold.
But the labs won’t match it anytime soon. OpenAI runs on Microsoft Azure H200 clusters. Anthropic runs on AWS and GCP8. US GPU inference has a marginal cost floor that makes DeepSeek’s prices unprofitable at equivalent hardware economics. DeepSeek’s cost structure is a different sport: a custom inference stack, CSA+HCA hybrid attention that needs only 27% of V3’s compute at 1M context, and domestic Ascend chips9. The architecture gap isn’t a few percentage points — it’s an order of magnitude.
The likely response from Silicon Valley is differentiation over price matching. OpenAI leans on ChatGPT, Codex, and Microsoft integration. Anthropic bets on safety certifications, government contracts, and enterprise compliance (HIPAA, SOC 2, FedRAMP)8. The value proposition shifts from tokens per dollar to trust per dollar.
The hosting providers are the ones really squeezed
This is where the story gets less obvious — and more consequential.
OpenAI and Anthropic are closed models. If you want Claude, you pay Anthropic. DeepSeek’s pricing pressures their margins, but it doesn’t offer a direct substitute through the same API endpoint.
DeepSeek is MIT-licensed9. Anyone can download it, deploy it, and charge for hosting it. That’s exactly what Together AI, Fireworks AI, DeepInfra, Novita, and others are doing. And that’s where the real squeeze lands.
Table 2: Same model, different host, different price 101112
| Provider | V4-Pro input/M | V4-Pro output/M | What you get |
|---|---|---|---|
| DeepSeek (native) | $0.435 | $0.87 | 1M context, 500 concurrency cap2 |
| Together AI | $2.10 | $4.40 | 512K context, enterprise SLA10 |
| Fireworks AI | $1.74 | $3.48 | 167 t/s throughput, 1M context12 |
| DeepInfra | $1.74 | $3.48 | FP4 quant, 66K context12 |
| Novita AI | $1.74 | $3.48 | 1M context12 |
Together AI charges 5x the native DeepSeek price. Fireworks and DeepInfra charge 4x101112. The original pitch for hosting providers was straightforward: “we run open-source models so you don’t have to.” When the model maker’s own API is this cheap, that value proposition needs rethinking.
The cards hosting providers have left: higher concurrency (DeepSeek caps at 500)2, raw throughput (Fireworks hits 167 tokens/second)12, and compliance certifications with data residency guarantees. Notably, Together AI only supports 512K context10 — shorter than DeepSeek native’s 1M — which makes the “we offer more” pitch harder to sustain. Whether the remaining advantages justify a 4-5x premium is a question every hosting customer is now running the numbers on.
The problem compounds: providers are also competing against each other. Fireworks, DeepInfra, and Novita all sit within 1.2x of each other at $2.17/1M blended12. Margins in this band are thin and heading thinner.
Why $0.0036 cached input changes the math
This might be the most underrated number in the whole announcement.
$0.003625 per million cached input tokens2. Agent products, coding assistants, customer support systems, document-heavy workflows — in all of these, the same system prompts and reference documents get sent on every call. Over 95% of input tokens hit cache3. The effective cost of input is essentially zero.
For agent developers, this is the key number. Every agent call resends the entire context. When cached input is nearly free, the per-call cost drops to output tokens only. At $0.87/M output, a medium-complexity agent task runs for less than a cup of coffee per day.
As 36Kr put it: this isn’t winning a price war by a few percent — it’s driving costs to a level where matching them means losing money3.
The Ascend 950 wildcard for the second half
DeepSeek left a note at V4 launch: Pro throughput is currently limited by compute availability, and prices should drop further once Huawei’s Ascend 950 supernodes ship at scale in the second half of 2026413.
If 950 production hits schedule, there’s another leg down. Goldman Sachs sees the same trajectory: domestic chip supply expansion enables further price compression13. The “permanent” price announced this week might not be the floor — it might be the next stepping stone.
Everyone tracking this space should hold onto one thought: don’t treat today’s pricing as the endpoint. DeepSeek has already proven that its “permanent” prices are built to be broken.
References
Conclusion
DeepSeek turning a 75% discount into the permanent price isn’t about another price cut. It’s a statement that this price is sustainable — and here to stay.
For OpenAI and Anthropic, the pressure is manageable for now — closed models still have trust and compliance moats. The real corner is for hosting providers. When the model maker’s own API undercuts third-party hosting by 4-5x, the “we run it for you” business needs a new foundation. Hosters must compete on service (concurrency, throughput, SLA), on price, or find a new value anchor entirely.
If Ascend 950 ships as planned in the second half, this round of cuts might not be the bottom. DeepSeek has already demonstrated that its “permanent” prices are designed to be broken — from above.
Footnotes
-
DeepSeek official announcement — V4-Pro 75% discount becomes permanent price after May 31 https://api-docs.deepseek.com/quick_start/pricing/ ↩
-
DeepSeek pricing page — V4-Pro permanent pricing: $0.435/M input, $0.87/M output, $0.0036/M cached https://api-docs.deepseek.com/quick_start/pricing/ ↩ ↩2 ↩3 ↩4 ↩5 ↩6
-
36Kr — “DeepSeek chose to clear the field amid a price hike wave” — technology-gap-driven structural pricing analysis https://36kr.com/p/3785921785076998 ↩ ↩2 ↩3 ↩4
-
Startup Fortune — “DeepSeek is making its 75 percent API discount permanent” — four price cuts in one month timeline https://startupfortune.com/deepseek-is-making-its-75-percent-api-discount-permanent/ ↩ ↩2
-
OpenAI pricing — GPT-5.5: $5.00/M input, $30.00/M output https://openai.com/api/pricing/ ↩ ↩2
-
Anthropic pricing — Claude Opus 4.7: $5.00/M input, $25.00/M output https://anthropic.com/pricing ↩ ↩2
-
Yahoo Hong Kong — DeepSeek V4-Pro cached input 138x cheaper than Anthropic https://hk.news.yahoo.com/deepseek-v4-pro-%E5%AE%A3%E4%BD%88%E6%B0%B8%E4%B9%85%E9%99%8D%E5%83%B9-75-175922880.html ↩ ↩2
-
TechFastForward — Anthropic structurally most exposed: API-concentrated revenue, can’t match DeepSeek pricing https://techfastforward.com/articles/deepseek-v4-pro-matches-claude-at-86-percent-off-frontier-ai-economics-2026 ↩ ↩2
-
DeepSeek V4 technical report — 1.6T param MoE, 49B active, CSA+HCA hybrid attention at 27% compute, MIT license https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro ↩ ↩2
-
Together AI — V4-Pro hosted: $2.10/M input, $4.40/M output, 512K context https://www.together.ai/blog/deepseek-v4-pro-now-available-on-together-ai ↩ ↩2 ↩3 ↩4
-
LLMReference — Fireworks AI V4-Pro pricing matches DeepSeek original list price https://www.llmreference.com/model/deepseek-v4-pro/fireworks-ai ↩ ↩2
-
DeepInfra — 6-provider benchmark: Fireworks 167 t/s, Together 0.99s TTFT, blended $2.17 https://deepinfra.com/blog/deepseek-v4-pro-max-api-benchmarks-latency-throughput-cost ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7
-
HKCNA / Goldman Sachs — Ascend 950PR mass production H2 2026, V4-Pro expected to drop further https://www.hkcna.hk/h5/docDetail.jsp?channel=2808&id=101306406 ↩ ↩2