Disclaimer: All benchmark data in this article comes from official model cards, technical reports, pricing pages, and publicly available results from Artificial Analysis. None of it is from our own independent testing. Scores from different evaluation environments are not directly comparable and should be treated as directional only.
On April 30, 2026, xAI opened Grok 4.3 to full API access[1], two weeks after it appeared quietly in the model selector for SuperGrok Heavy subscribers at $300/month[2]. Elon Musk confirmed that the live checkpoint lands at roughly 0.5 trillion parameters, with a 1T version roughly five days from completing training at the time of the April 17 beta launch[2].
Meanwhile, three Chinese open-source models have shipped back-to-back: DeepSeek V4 Flash (April 24)[3], MiniMax M2.7 (March 18)[4], and Xiaomi’s MiMo 2.5 (April 22)[5]. All three follow a common playbook: Mixture-of-Experts architectures with 10–15B active parameters, pushing toward frontier-level reasoning at a fraction of the cost of closed models. Grok 4.3 takes a different approach: larger scale, always-on reasoning, and a clear focus on vertical precision in legal and financial domains.
Specs at a Glance
| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Developer | xAI | DeepSeek | MiniMax | Xiaomi |
| Released | Apr 17 (Beta) / Apr 30 (GA)[1][2] | Apr 24, 2026[3] | Mar 18, 2026[4] | Apr 22, 2026[5] |
| Total params | ~0.5T (Musk confirmed)[2] | 284B[3] | 229B[4] | 310B[5] |
| Active params | Undisclosed | 13B[3] | 10B[4] | 15B[5] |
| Architecture | Undisclosed | MoE + CSA/HCA hybrid attention[3] | MoE[4] | MoE + hybrid sliding-window attention[5] |
| Context window | 1M (API) / 2M (App)[1][2] | 1M[3] | 200K[4] | 1M[5] |
| Reasoning control | Always on, cannot disable[1] | 3 tiers: Non-Think / High / Max[3] | Toggleable[4] | Toggleable[5] |
| Modalities | Text + image + video[2] | Text only[3] | Text only[6] | Text + image + audio + video[5] |
| Open source | Closed[2] | MIT license[3] | Open (license unclear)[7] | MIT license[8] |
| Document generation | PDF / PPTX / Excel[2] | Not supported | Not supported | Not supported |
Grok 4.3 is a clear half-step ahead in feature completeness—native video understanding and structured document output are capabilities the open-source trio can’t fully match. MiMo 2.5 comes closest with its multimodal support, but it generates no document output.
Benchmarks: Grok Leads, But Not by Much
All data below is sourced from official disclosures and Artificial Analysis third-party evaluations:
| Benchmark | Grok 4.3 | DeepSeek V4 Flash (Max) | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.1%[1] | 88.1%[3] | 87.4%[9] | 84.9%[10] |
| HLE (Humanity’s Last Exam) | 35.0%[1] | 34.8%[3] | 28.1%[9] | 25.2%[10] |
| AA Intelligence Index (composite) | 53.2[1] | 47[11] | 49.6[9] | 49.0[10] |
| SciCode | 47.3%[1] | — | 47.0%[9] | — |
| τ²-Bench | 97.7%[1] | 95.6% (High)[3] | 84.8%[9] | 90.6%[10] |
| IFBench | 81.3%[1] | 79.2% (Max)[3] | 75.7%[9] | 67.1%[10] |
| GDPval-AA (ELO) | ~1500[12] | 1395[3] | 1495[4] | 1578–1581[13] |
| Output speed (tok/s) | 225.4[1] | ~80[11] | ~42[9] | 100–150[14] |
| TTFT (time to first token) | 13.13 s[1] | 1.03 s (Non-Think)[11] | 1.75–2.31 s[9] | N/A |
A few things stand out:
Composite intelligence belongs to Grok 4.3, but the gap is narrow. AA Intelligence Index 53.2 vs 47–49.6 for the open-source trio—a 4–6 point spread. On GPQA Diamond, Grok’s 90.1% leads DeepSeek V4 Flash’s 88.1% by just two percentage points. The real separation happens on HLE (Humanity’s Last Exam), where Grok 4.3 pulls 6.9 points ahead of MiniMax M2.7.
Agent and tool-use benchmarks tell a messier story. On τ²-Bench, Grok 4.3 leads at 97.7%, but DeepSeek V4 Flash is within about 2 points at 95.6%. On GDPval-AA, the picture flips: MiMo 2.5 leads at 1578–1581 ELO, ahead of Grok 4.3’s ~1500[13][12], and MiniMax M2.7 at 1495 is essentially tied. VentureBeat noted that Grok 4.3’s ~300 ELO jump over Grok 4.20 was a major improvement, but one that mostly pulled the model up to parity with the open-source pack rather than ahead of it[12].
Grok 4.3 is the fastest generator but the slowest starter. Its 225.4 tok/s output speed is more than 5× MiniMax M2.7’s, but 13.13 seconds to first token is dead last[1]. That is the cost of always-on reasoning: it thinks before every answer. DeepSeek V4 Flash clocks about 1 second to first token in Non-Think mode but slows dramatically in Max mode[11].
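The throughput-versus-TTFT tradeoff reduces to simple arithmetic: total latency ≈ TTFT + output_tokens ÷ tok/s. A quick sketch using the table’s figures, assuming constant TTFT and flat generation speed (which real serving stacks only approximate):

```python
def total_latency(ttft_s: float, tok_per_s: float, output_tokens: int) -> float:
    """End-to-end latency: time to first token plus token generation time."""
    return ttft_s + output_tokens / tok_per_s

# Table figures: Grok 4.3 (13.13 s TTFT, 225.4 tok/s) vs
# DeepSeek V4 Flash Non-Think (1.03 s TTFT, ~80 tok/s).
grok_s = lambda n: total_latency(13.13, 225.4, n)
flash_s = lambda n: total_latency(1.03, 80.0, n)

# Output length at which Grok's throughput pays back its slow start:
break_even = (13.13 - 1.03) / (1 / 80.0 - 1 / 225.4)
print(round(break_even))  # roughly 1,500 tokens
```

In other words, under these assumptions Grok 4.3 only wins end-to-end once a response runs past roughly 1,500 output tokens; for short chat turns, the Non-Think open models feel far snappier.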
Where Grok 4.3 genuinely dominates: law and finance. It ranks #1 on CaseLaw v2 at 79.3% accuracy and #1 on CorpFin[12]. The three open-source models have published no comparable scores on these vertical benchmarks.
Pricing: Grok Sits in the Middle, Open Source Polarized
| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Input ($/1M tokens) | $1.25[1] | $0.14[3] | $0.30[4] | ~$0.50[13] |
| Output ($/1M tokens) | $2.50[1] | $0.28[3] | $1.20[4] | ~$1.50[13] |
| Cache read ($/1M) | $0.20[1] | $0.0028[3] | $0.06[15] | N/A |
| Tiered pricing | 2× above 200K tokens[1] | Flat[3] | Flat[4] | Flat |
| Reasoning token billing | Same as output tokens[1] | Same as output tokens[3] | Included in output[4] | Included |
| Consumer plans | $30/mo SuperGrok / $300/mo Heavy[2] | Pay-as-you-go only[3] | Token Plans available[4] | Token Plans available[13] |
Grok 4.3’s input price of $1.25/M is nearly 9× DeepSeek V4 Flash’s, and its $2.50/M output is a similar multiple. Against MiniMax M2.7 ($1.20/M output) and MiMo 2.5 (~$1.50/M), though, the gap shrinks to roughly 1.7–2×.
The hidden cost is reasoning tokens. Grok 4.3 always thinks, and each internal reasoning step generates billable output tokens. DeepSeek V4 Flash in Max mode incurs similar overhead. In practice, the gap between sticker price and real cost is larger than the pricing page suggests.
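As a sketch of how those line items compose into a per-request bill: the function below applies the table’s rates, the 2× multiplier above 200K input tokens (assumed here to apply to both input and output, which the sources don’t spell out), and reasoning tokens billed at the output rate. It is an illustrative estimator, not an official calculator.

```python
def grok_request_cost(input_tokens: int, visible_output_tokens: int,
                      reasoning_tokens: int, cache_read_tokens: int = 0,
                      input_rate: float = 1.25, output_rate: float = 2.50,
                      cache_rate: float = 0.20) -> float:
    """Estimate one Grok 4.3 request's cost in USD.

    Rates are $/1M tokens from the pricing table. Assumptions: the 2x
    tier kicks in above 200K input tokens and covers output too, and
    hidden reasoning tokens bill exactly like visible output tokens.
    """
    tier = 2.0 if input_tokens > 200_000 else 1.0
    m = 1_000_000
    cost = (input_tokens - cache_read_tokens) / m * input_rate * tier
    cost += cache_read_tokens / m * cache_rate
    cost += (visible_output_tokens + reasoning_tokens) / m * output_rate * tier
    return cost

# 10K tokens in, 1K visible tokens out -- plus 4K hidden reasoning tokens:
sticker = grok_request_cost(10_000, 1_000, reasoning_tokens=0)
real = grok_request_cost(10_000, 1_000, reasoning_tokens=4_000)
print(f"${sticker:.4f} sticker vs ${real:.4f} real")  # $0.0150 vs $0.0250
```

With 4K reasoning tokens on a 1K-token answer, the real bill lands about 67% above the naive estimate, which is exactly the sticker-versus-real gap described above.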
What Grok 4.3 Has That Nobody Else Does
Three things are unique to Grok 4.3:
- Vertical domain precision: #1 on CaseLaw v2 (legal reasoning, 79.3%) and #1 on CorpFin (financial analysis)[12] translates into real engineering value in compliance review, contract analysis, and financial modeling. The open-source models have no published scores on these benchmarks.
- Native document generation: produces formatted PDF, PPTX, and Excel files directly from conversation[2], useful for competitive analysis, due diligence reports, and anything that currently requires manual formatting.
- Deep X platform integration: Grok 4.3 can search X posts, user profiles, and threads in real time[1]. That data channel is unavailable to any other model family.
Where the Open-Source Trio Fights Back
- Price moat: DeepSeek V4 Flash at $0.28/M output means that for chat, classification, extraction, and high-volume API workloads, total cost can be a fraction of Grok 4.3 even if performance is slightly lower.
- Deployment freedom: MIT-licensed models run on private cloud or on-premise—no data compliance risk. Grok 4.3 is API-only.
- Agent capability isn’t weaker: On GDPval-AA, MiMo 2.5 actually scores higher than Grok 4.3, and MiniMax M2.7 ties it. These aren’t second-tier alternatives for agentic workflows.
What to Watch For
Grok 4.3’s release signals two things about the state of the market.
First, medium-scale models still have headroom. Grok 4.3 hits a 53.2 AA Intelligence Index with ~0.5T parameters, impressive but not unattainable for teams with sufficient compute. Musk’s mention of a 1T version still in training[2] suggests the ceiling at this scale isn’t settled yet.
Second, reasoning × cost × vertical precision is the 2026 battleground. Grok 4.3 picked “always reason, mid-range price.” DeepSeek V4 Flash picked “dirt cheap, turn reasoning off when you don’t need it.” MiniMax M2.7 picked “agentic balance.” MiMo 2.5 picked “multimodal + office agents.” Nobody spans all dimensions.
For developers building model routing in mid-2026, this means you probably won’t tie yourself to one model. Route legal and financial analysis to Grok 4.3. Route cost-sensitive high-throughput workloads to DeepSeek V4 Flash. Route multimodal applications to MiMo 2.5. Run head-to-head evaluations of MiniMax M2.7 and Grok 4.3 on your specific agent tasks.
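A minimal routing layer for that split might look like the following; the task labels and model identifiers are illustrative placeholders, not official API model names:

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str

# Hypothetical routing table following the split described above.
ROUTES = {
    "legal": Route("grok-4.3", "#1 on CaseLaw v2 / CorpFin"),
    "finance": Route("grok-4.3", "#1 on CaseLaw v2 / CorpFin"),
    "bulk_extraction": Route("deepseek-v4-flash", "lowest $/token at volume"),
    "multimodal": Route("mimo-2.5", "image/audio/video input"),
}

def route(task_type: str) -> Route:
    # Agentic work is a deliberate fall-through: per the advice above,
    # benchmark MiniMax M2.7 against Grok 4.3 on your own tasks before
    # hard-coding either one.
    return ROUTES.get(task_type, Route("minimax-m2.7", "default pending eval"))

print(route("legal").model)       # grok-4.3
print(route("agent_task").model)  # minimax-m2.7
```

The point of the sketch is the shape, not the entries: a routing table makes the per-workload tradeoffs explicit and cheap to revise as new checkpoints ship.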
If the 1T version of Grok 4.3 ships by end of Q2 with comparable pricing, it could push the “medium-scale” capability ceiling higher. Whether open-source models in the 10–20B active parameter class can close that gap will be the most interesting technical story of H2 2026.
References
1. Artificial Analysis / Easy Benchmarks / xAI Docs — Grok 4.3 third-party benchmark compilation: AA Intelligence Index 53.2, GPQA Diamond 90.1%, HLE 35.0%, output speed 225.4 tok/s, pricing $1.25/$2.50. https://easy-benchmarks.com/models/grok-4-3 and https://docs.x.ai/docs/models
2. Awesome Agents — Grok 4.3 full parameter confirmation (~0.5T), video input, document generation, Musk’s confirmation of 1T checkpoint in training. https://awesomeagents.ai/models/grok-4-3/
3. DeepSeek Hugging Face — DeepSeek-V4-Flash official model card: full specs (284B/13B), benchmarks, MIT license. https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
4. DataLearnerAI — MiniMax M2.7 official specs (229B/10B), benchmarks, and pricing. https://www.datalearner.com/en/ai-models/pretrained-models/minimax-m2-7
5. Xiaomi MiMo Official — MiMo-V2.5 release page: 310B/15B, 1M context, ClawEval scores. https://mimo.xiaomi.com/mimo-v2-5/
6. DocsBot — MiniMax M2.7 model specs, confirmed text-only modality. https://docsbot.ai/models/minimax-m2-7
7. RemoteOpenClaw Blog — MiniMax M2.7 model comparison, open-source status, parameter analysis. https://www.remoteopenclaw.com/blog/best-minimax-models-for-openclaw
8. VentureBeat — Xiaomi MiMo-V2.5 series open-sourced under MIT license. https://venturebeat.com/technology/open-source-xiaomi-mimo-v2-5-and-v2-5-pro-are-among-the-most-efficient-and-affordable-at-agentic-claw-tasks
9. Ufuk Ozen (Artificial Analysis data) — MiniMax M2.7: Intelligence Index 49.6, Coding Index 41.9, speed 42 tok/s. https://ufukozen.com/model/minimax-minimax-m2.7
10. LLMBase.ai — MiMo-V2.5 vs Gemini 2.5 Flash benchmark comparison: GPQA 84.9%, HLE 25.2%, τ²-Bench 90.6%. https://llmbase.ai/compare/gemini-2-5-flash-preview-09-2025-reasoning,mimo-v2-5-0424/
11. Codersera — DeepSeek V4 Flash deep dive: AA Intelligence Index 47, speed ~80 tok/s, TTFT 1.03s (Non-Think). https://codersera.com/blog/deepseek-v4-flash-deep-dive/
12. VentureBeat — Grok 4.3 launch report: GDPval-AA ~300 ELO jump, CaseLaw v2 79.3%, CorpFin #1. https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite
13. Agmazon — MiMo-V2.5 complete guide: Terminal-Bench 56.1%, GDPval-AA 1578–1581, pricing ~$0.50/$1.50. https://agmazon.com/blog/articles/technology/202604/mimo-v2-5-complete-guide-en.html
14. Yahoo Tech — MiMo 2.5 speed 100–150 tok/s report. https://tech.yahoo.com/ai/articles/xiaomis-mimo-2-5-pro-204235330.html
15. Price Per Token — MiniMax M2.7 cache read price $0.06/M. https://pricepertoken.com/pricing-page/model/minimax-minimax-m2.7