May 23, 2026

DeepSeek Made Its 75% Discount Permanent — The Squeeze Doesn't Stop at OpenAI

DeepSeek turned its temporary V4-Pro discount into a permanent price reset. $0.435/M input, $0.87/M output. But the real pressure lands on third-party inference providers charging 4-5x more for the same open-weight model.

On May 22, DeepSeek made the call the rest of the industry was dreading: the 75% discount on V4-Pro — previously set to expire May 31 — is now the permanent price¹.

In numbers: $0.435 per million input tokens (cache miss), $0.87 per million output tokens, and $0.003625 per million cached input tokens². The crossed-out original prices of $1.74/$3.48 stay on the page as decoration.

This is the fourth price adjustment in under a month: V4 launched April 24, 75% off on April 25, cache-hit prices dropped to one-tenth on April 26 (permanent), and now the discount becomes the list price³⁴. Each time, they said “this is it.” Each time, it wasn’t.

The numbers, side by side

Table 1: Frontier model API pricing per million tokens ²⁵⁶⁷

Model	Input	Output	Cache hit
DeepSeek V4-Pro	$0.435	$0.87	$0.0036
DeepSeek V4-Flash	$0.14	$0.28	$0.0028
GPT-5.5	$5.00	$30.00	Not published
Claude Opus 4.7	$5.00	$25.00	$0.50
Gemini 3.5 Pro	$2.00	$12.00	Tiered

V4-Pro output is one-thirty-fourth of GPT-5.5 and one-thirtieth of Claude Opus 4.7⁵⁶. Cached input — the number that matters for agents — is 138x cheaper than Anthropic’s²⁷.

For developers, this price gap has moved past “which is more cost-effective” into “what use cases are now economically viable that weren’t before?” Tasks that looked too expensive to automate on GPT-5.5 or Claude Opus might cost pocket change on V4-Pro.

Model labs can’t follow — and they know it

Since early 2026, AI labs on both sides of the Pacific had reached a quiet consensus: bigger models cost more, and users will pay³. OpenAI went from GPT-4o to GPT-5.5 with a 4x output price hike. Anthropic went from Sonnet 3.5 to Opus 4.7 with a 5x increase. Chinese firms like Zhipu, Alibaba, and Tencent followed.

DeepSeek nailing prices to the floor tells every competitor: your pricing structure doesn’t hold.

But the labs won’t match it anytime soon. OpenAI runs on Microsoft Azure H200 clusters. Anthropic runs on AWS and GCP⁸. US GPU inference has a marginal cost floor that makes DeepSeek’s prices unprofitable at equivalent hardware economics. DeepSeek’s cost structure is a different sport: a custom inference stack, CSA+HCA hybrid attention that needs only 27% of V3’s compute at 1M context, and domestic Ascend chips⁹. The architecture gap isn’t a few percentage points — it’s an order of magnitude.

The likely response from Silicon Valley is differentiation over price matching. OpenAI leans on ChatGPT, Codex, and Microsoft integration. Anthropic bets on safety certifications, government contracts, and enterprise compliance (HIPAA, SOC 2, FedRAMP)⁸. The value proposition shifts from tokens per dollar to trust per dollar.

The hosting providers are the ones really squeezed

This is where the story gets less obvious — and more consequential.

OpenAI and Anthropic are closed models. If you want Claude, you pay Anthropic. DeepSeek’s pricing pressures their margins, but it doesn’t offer a direct substitute through the same API endpoint.

DeepSeek is MIT-licensed⁹. Anyone can download it, deploy it, and charge for hosting it. That’s exactly what Together AI, Fireworks AI, DeepInfra, Novita, and others are doing. And that’s where the real squeeze lands.

Table 2: Same model, different host, different price ¹⁰¹¹¹²

Provider	V4-Pro input/M	V4-Pro output/M	What you get
DeepSeek (native)	$0.435	$0.87	1M context, 500 concurrency cap²
Together AI	$2.10	$4.40	512K context, enterprise SLA¹⁰
Fireworks AI	$1.74	$3.48	167 t/s throughput, 1M context¹²
DeepInfra	$1.74	$3.48	FP4 quant, 66K context¹²
Novita AI	$1.74	$3.48	1M context¹²

Together AI charges 5x the native DeepSeek price. Fireworks and DeepInfra charge 4x¹⁰¹¹¹². The original pitch for hosting providers was straightforward: “we run open-source models so you don’t have to.” When the model maker’s own API is this cheap, that value proposition needs rethinking.

The cards hosting providers have left: higher concurrency (DeepSeek caps at 500)², raw throughput (Fireworks hits 167 tokens/second)¹², and compliance certifications with data residency guarantees. Notably, Together AI only supports 512K context¹⁰ — shorter than DeepSeek native’s 1M — which makes the “we offer more” pitch harder to sustain. Whether the remaining advantages justify a 4-5x premium is a question every hosting customer is now running the numbers on.

The problem compounds: providers are also competing against each other. Fireworks, DeepInfra, and Novita all sit within 1.2x of each other at $2.17/1M blended¹². Margins in this band are thin and heading thinner.

Why $0.0036 cached input changes the math

This might be the most underrated number in the whole announcement.

$0.003625 per million cached input tokens². Agent products, coding assistants, customer support systems, document-heavy workflows — in all of these, the same system prompts and reference documents get sent on every call. Over 95% of input tokens hit cache³. The effective cost of input is essentially zero.

For agent developers, this is the key number. Every agent call resends the entire context. When cached input is nearly free, the per-call cost drops to output tokens only. At $0.87/M output, a medium-complexity agent task runs for less than a cup of coffee per day.

As 36Kr put it: this isn’t winning a price war by a few percent — it’s driving costs to a level where matching them means losing money³.

The Ascend 950 wildcard for the second half

DeepSeek left a note at V4 launch: Pro throughput is currently limited by compute availability, and prices should drop further once Huawei’s Ascend 950 supernodes ship at scale in the second half of 2026⁴¹³.

If 950 production hits schedule, there’s another leg down. Goldman Sachs sees the same trajectory: domestic chip supply expansion enables further price compression¹³. The “permanent” price announced this week might not be the floor — it might be the next stepping stone.

Everyone tracking this space should hold onto one thought: don’t treat today’s pricing as the endpoint. DeepSeek has already proven that its “permanent” prices are built to be broken.

References

Conclusion

DeepSeek turning a 75% discount into the permanent price isn’t about another price cut. It’s a statement that this price is sustainable — and here to stay.

For OpenAI and Anthropic, the pressure is manageable for now — closed models still have trust and compliance moats. The real corner is for hosting providers. When the model maker’s own API undercuts third-party hosting by 4-5x, the “we run it for you” business needs a new foundation. Hosters must compete on service (concurrency, throughput, SLA), on price, or find a new value anchor entirely.

If Ascend 950 ships as planned in the second half, this round of cuts might not be the bottom. DeepSeek has already demonstrated that its “permanent” prices are designed to be broken — from above.

DeepSeek official announcement — V4-Pro 75% discount becomes permanent price after May 31 https://api-docs.deepseek.com/quick_start/pricing/ ↩
DeepSeek pricing page — V4-Pro permanent pricing: $0.435/M input, $0.87/M output, $0.0036/M cached https://api-docs.deepseek.com/quick_start/pricing/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
36Kr — “DeepSeek chose to clear the field amid a price hike wave” — technology-gap-driven structural pricing analysis https://36kr.com/p/3785921785076998 ↩ ↩² ↩³ ↩⁴
Startup Fortune — “DeepSeek is making its 75 percent API discount permanent” — four price cuts in one month timeline https://startupfortune.com/deepseek-is-making-its-75-percent-api-discount-permanent/ ↩ ↩²
OpenAI pricing — GPT-5.5: $5.00/M input, $30.00/M output https://openai.com/api/pricing/ ↩ ↩²
Anthropic pricing — Claude Opus 4.7: $5.00/M input, $25.00/M output https://anthropic.com/pricing ↩ ↩²
Yahoo Hong Kong — DeepSeek V4-Pro cached input 138x cheaper than Anthropic https://hk.news.yahoo.com/deepseek-v4-pro-%E5%AE%A3%E4%BD%88%E6%B0%B8%E4%B9%85%E9%99%8D%E5%83%B9-75-175922880.html ↩ ↩²
TechFastForward — Anthropic structurally most exposed: API-concentrated revenue, can’t match DeepSeek pricing https://techfastforward.com/articles/deepseek-v4-pro-matches-claude-at-86-percent-off-frontier-ai-economics-2026 ↩ ↩²
DeepSeek V4 technical report — 1.6T param MoE, 49B active, CSA+HCA hybrid attention at 27% compute, MIT license https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro ↩ ↩²
Together AI — V4-Pro hosted: $2.10/M input, $4.40/M output, 512K context https://www.together.ai/blog/deepseek-v4-pro-now-available-on-together-ai ↩ ↩² ↩³ ↩⁴
LLMReference — Fireworks AI V4-Pro pricing matches DeepSeek original list price https://www.llmreference.com/model/deepseek-v4-pro/fireworks-ai ↩ ↩²
DeepInfra — 6-provider benchmark: Fireworks 167 t/s, Together 0.99s TTFT, blended $2.17 https://deepinfra.com/blog/deepseek-v4-pro-max-api-benchmarks-latency-throughput-cost ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
HKCNA / Goldman Sachs — Ascend 950PR mass production H2 2026, V4-Pro expected to drop further https://www.hkcna.hk/h5/docDetail.jsp?channel=2808&id=101306406 ↩ ↩²