
Grok 4.3 Lands: How xAI's New Flagship Stacks Up Against Three Chinese Open-Source Models

xAI's Grok 4.3 (~0.5T params, $1.25/$2.50 per 1M tokens) goes head-to-head with DeepSeek V4 Flash, MiniMax M2.7, and MiMo 2.5. Benchmarks, pricing, and where each model actually wins.

Disclaimer: All benchmark data in this article comes from official model cards, technical reports, pricing pages, and publicly available results from Artificial Analysis. None of it is from our own independent testing. Scores from different evaluation environments are not directly comparable and should be treated as directional only.

On April 30, 2026, xAI opened Grok 4.3 to full API access[1], two weeks after it quietly appeared in the model selector for SuperGrok Heavy subscribers at $300/month[2]. Elon Musk confirmed the live checkpoint weighs in at roughly 0.5 trillion parameters, with a 1T version about five days from completing training at the time of the April 17 beta launch[2].

Meanwhile, three Chinese open-source models have been shipping back-to-back: DeepSeek V4 Flash (April 24)[3], MiniMax M2.7 (March 18)[4], and Xiaomi's MiMo 2.5 (April 22)[5]. All three share a common playbook: Mixture-of-Experts architectures with 10–15B active parameters, pushing toward frontier-level reasoning at a fraction of the cost of closed models. Grok 4.3 takes a different approach: larger scale, always-on reasoning, and a clear focus on vertical precision in legal and financial domains.

Specs at a Glance

| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Developer | xAI | DeepSeek | MiniMax | Xiaomi |
| Released | Apr 17 (Beta) / Apr 30 (GA)[1][2] | Apr 24, 2026[3] | Mar 18, 2026[4] | Apr 22, 2026[5] |
| Total params | ~0.5T (Musk confirmed)[2] | 284B[3] | 229B[4] | 310B[5] |
| Active params | Undisclosed | 13B[3] | 10B[4] | 15B[5] |
| Architecture | Undisclosed | MoE + CSA/HCA hybrid attention[3] | MoE[4] | MoE + hybrid sliding-window attention[5] |
| Context window | 1M (API) / 2M (App)[1][2] | 1M[3] | 200K[4] | 1M[5] |
| Reasoning control | Always on, cannot disable[1] | 3 tiers: Non-Think / High / Max[3] | Toggleable[4] | Toggleable[5] |
| Modalities | Text + image + video[2] | Text only[3] | Text only[6] | Text + image + audio + video[5] |
| Open source | Closed[2] | MIT license[3] | Open (license unclear)[7] | MIT license[8] |
| Document generation | PDF / PPTX / Excel[2] | Not supported | Not supported | Not supported |

Grok 4.3 is a clear half-step ahead in feature completeness: native video understanding and structured document output are capabilities the open-source trio can't fully match. MiMo 2.5 comes closest with its multimodal support, but it offers no document generation.

Benchmarks: Grok Leads, But Not by Much

All data below is sourced from official disclosures and Artificial Analysis third-party evaluations:

| Benchmark | Grok 4.3 | DeepSeek V4 Flash (Max) | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.1%[1] | 88.1%[3] | 87.4%[9] | 84.9%[10] |
| HLE (Humanity’s Last Exam) | 35.0%[1] | 34.8%[3] | 28.1%[9] | 25.2%[10] |
| AA Intelligence Index (composite) | 53.2[1] | 47[11] | 49.6[9] | 49.0[10] |
| SciCode | 47.3%[1] | — | 47.0%[9] | — |
| τ²-Bench | 97.7%[1] | 95.6% (High)[3] | 84.8%[9] | 90.6%[10] |
| IFBench | 81.3%[1] | 79.2% (Max)[3] | 75.7%[9] | 67.1%[10] |
| GDPval-AA (ELO) | ~1500[12] | 1395[3] | 1495[4] | 1578–1581[13] |
| Output speed (tok/s) | 225.4[1] | ~80[11] | ~42[9] | 100–150[14] |
| TTFT (time to first token) | 13.13 s[1] | 1.03 s (Non-Think)[11] | 1.75–2.31 s[9] | N/A |

A few things stand out:

Composite intelligence belongs to Grok 4.3, but the gap is narrow. AA Intelligence Index 53.2 vs 47–49.6 for the open-source trio—a 4–6 point spread. On GPQA Diamond, Grok’s 90.1% leads DeepSeek V4 Flash’s 88.1% by just two percentage points. The real separation happens on HLE (Humanity’s Last Exam), where Grok 4.3 pulls 6.9 points ahead of MiniMax M2.7.

Agent and tool-use benchmarks tell a messier story. On τ²-Bench, Grok 4.3 leads at 97.7%, with DeepSeek V4 Flash just 2.1 points behind at 95.6%. On GDPval-AA, the picture flips: MiMo 2.5 leads at 1578–1581 ELO, ahead of Grok 4.3's ~1500[13][12]. MiniMax M2.7, at 1495, is essentially tied. VentureBeat noted that Grok 4.3's ~300 ELO jump over Grok 4.20 was a major improvement, but one that mostly pulled the model up to parity with the open-source pack rather than ahead of it[12].

Grok 4.3 is the fastest generator but the slowest starter. Its 225.4 tok/s output speed is more than 5× MiniMax M2.7's, but its 13.13-second time to first token is dead last[1]. That is the cost of always-on reasoning: it thinks before every answer. DeepSeek V4 Flash clocks about 1 second to first token in Non-Think mode but slows dramatically in Max mode[11].
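To make the trade-off concrete, end-to-end latency is roughly time-to-first-token plus output length divided by throughput. A minimal sketch using the reported figures (real-world latency will vary with load and prompt size):

```python
# Approximate wall-clock time for one completion:
# time-to-first-token plus token generation time.

def latency_seconds(ttft_s: float, output_tokens: int, tok_per_s: float) -> float:
    return ttft_s + output_tokens / tok_per_s

# For a 1,000-token answer, using the figures reported above:
grok = latency_seconds(13.13, 1_000, 225.4)   # ~17.6 s
deepseek = latency_seconds(1.03, 1_000, 80)   # ~13.5 s (Non-Think mode)
```

Under this rough model, Grok's 5× generation speed only pays off once answers grow long enough (around 1,500 tokens) to amortize the 13-second thinking delay; for short replies, the faster-starting models feel much snappier.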

Where Grok 4.3 genuinely dominates: law and finance. It ranks #1 on CaseLaw v2 at 79.3% accuracy and #1 on CorpFin[12]. The three open-source models have published no comparable scores on these vertical benchmarks.

Pricing: Grok Sits in the Middle, Open Source Polarized

| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Input ($/1M tokens) | $1.25[1] | $0.14[3] | $0.30[4] | ~$0.50[13] |
| Output ($/1M tokens) | $2.50[1] | $0.28[3] | $1.20[4] | ~$1.50[13] |
| Cache read ($/1M) | $0.20[1] | $0.0028[3] | $0.06[15] | N/A |
| Tiered pricing | 2× above 200K tokens[1] | Flat[3] | Flat[4] | Flat |
| Reasoning token billing | Same as output tokens[1] | Same as output tokens[3] | Included in output[4] | Included |
| Consumer plans | $30/mo SuperGrok / $300/mo Heavy[2] | Pay-as-you-go only[3] | Token Plans available[4] | Token Plans available[13] |

Grok 4.3’s input price of $1.25/M is nearly 9× DeepSeek V4 Flash, with output at $2.50/M similarly 9×. But against MiniMax M2.7 ($1.20/M output) and MiMo 2.5 (~$1.50/M), the gap shrinks to 1.7–2×.

The hidden cost is reasoning tokens. Grok 4.3 always thinks—each internal reasoning step generates billable output tokens. DeepSeek V4 Flash in Max mode incurs similar overhead. In practice, the gap between sticker price and real cost is larger than it looks on the pricing page.
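A back-of-the-envelope estimator makes the hidden cost visible. This sketch encodes Grok 4.3's pricing as described above ($1.25/$2.50 per 1M tokens, doubled above 200K input tokens, reasoning billed at the output rate); the token counts in the example are hypothetical:

```python
# Effective-cost sketch for Grok 4.3 pricing as described in the article.
# Token counts in the example call are hypothetical.

def grok43_cost_usd(input_tok: int, output_tok: int, reasoning_tok: int) -> float:
    in_rate, out_rate = 1.25, 2.50              # $ per 1M tokens
    if input_tok > 200_000:                     # long-context tier: 2x both rates
        in_rate, out_rate = 2 * in_rate, 2 * out_rate
    billable_output = output_tok + reasoning_tok  # reasoning billed as output
    return (input_tok * in_rate + billable_output * out_rate) / 1_000_000

# A 5K-token prompt, a 1K-token answer, and 3K tokens of hidden reasoning:
print(grok43_cost_usd(5_000, 1_000, 3_000))   # 0.01625
```

In this hypothetical call, reasoning tokens quadruple the billable output, so the real per-request cost is well above what the visible answer length suggests.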

What Grok 4.3 Has That Nobody Else Does

Three things are unique to Grok 4.3:

  • Vertical domain precision: ranking #1 on CaseLaw v2 (legal reasoning, 79.3%) and #1 on CorpFin (financial analysis)[12] translates into real engineering value in compliance review, contract analysis, and financial modeling. The open-source models have no published scores on these benchmarks.
  • Native document generation: produces formatted PDFs, PPTX, and Excel files directly from conversation[2], useful for competitive analysis, due diligence reports, and anything that currently requires manual formatting.
  • Deep X platform integration: Grok 4.3 can search X posts, user profiles, and threads in real time[1]. That data channel is unavailable to any other model family.

Where the Open-Source Trio Fights Back

  • Price moat: DeepSeek V4 Flash at $0.28/M output means that for chat, classification, extraction, and high-volume API workloads, total cost can be a fraction of Grok 4.3 even if performance is slightly lower.
  • Deployment freedom: MIT-licensed models run on private cloud or on-premise—no data compliance risk. Grok 4.3 is API-only.
  • Agent capability isn’t weaker: On GDPval-AA, MiMo 2.5 actually scores higher than Grok 4.3, and MiniMax M2.7 ties it. These aren’t second-tier alternatives for agentic workflows.

What to Watch For

Grok 4.3’s release signals two things about the state of the market.

First, medium-scale models still have headroom. Grok 4.3 hits a 53.2 AA Intelligence Index with ~0.5T parameters, impressive but not unattainable for teams with sufficient compute. Musk's mention of a 1T version still in training[2] suggests the ceiling at this scale isn't settled yet.

Second, reasoning × cost × vertical precision is the 2026 battleground. Grok 4.3 picked “always reason, mid-range price.” DeepSeek V4 Flash picked “dirt cheap, turn reasoning off when you don’t need it.” MiniMax M2.7 picked “agentic balance.” MiMo 2.5 picked “multimodal + office agents.” Nobody spans all dimensions.

For developers building model routing in mid-2026, this means you probably won’t tie yourself to one model. Route legal and financial analysis to Grok 4.3. Route cost-sensitive high-throughput workloads to DeepSeek V4 Flash. Route multimodal applications to MiMo 2.5. Run head-to-head evaluations of MiniMax M2.7 and Grok 4.3 on your specific agent tasks.
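That split can be expressed as a simple dispatch table. A minimal sketch; the task labels and model identifiers below are illustrative placeholders, not real API names:

```python
# Minimal task-category router following the split suggested above.
# Task labels and model IDs are illustrative, not actual API identifiers.

ROUTES = {
    "legal": "grok-4.3",                  # CaseLaw v2 / CorpFin leader
    "finance": "grok-4.3",
    "high_volume": "deepseek-v4-flash",   # lowest cost per token
    "multimodal": "mimo-2.5",             # image / audio / video input
    "agentic": "minimax-m2.7",            # worth evaluating head-to-head vs Grok
}

def pick_model(task: str) -> str:
    # Default to the cheapest model when the task category is unknown.
    return ROUTES.get(task, "deepseek-v4-flash")
```

In production this table would be driven by your own evaluation results rather than published benchmarks, but the shape of the decision is the same.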

If the 1T version of Grok 4.3 ships by end of Q2 with comparable pricing, it could push the “medium-scale” capability ceiling higher. Whether open-source models in the 10–20B active parameter class can close that gap will be the most interesting technical story of H2 2026.

References


  1. Artificial Analysis / Easy Benchmarks / xAI Docs — Grok 4.3 third-party benchmark compilation: AA Intelligence Index 53.2, GPQA Diamond 90.1%, HLE 35.0%, output speed 225.4 tok/s, pricing $1.25/$2.50. https://easy-benchmarks.com/models/grok-4-3 and https://docs.x.ai/docs/models

  2. Awesome Agents — Grok 4.3 full parameter confirmation (~0.5T), video input, document generation, Musk's confirmation of 1T checkpoint in training. https://awesomeagents.ai/models/grok-4-3/

  3. DeepSeek Hugging Face — DeepSeek-V4-Flash official model card: full specs (284B/13B), benchmarks, MIT license. https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

  4. DataLearnerAI — MiniMax M2.7 official specs (229B/10B), benchmarks, and pricing. https://www.datalearner.com/en/ai-models/pretrained-models/minimax-m2-7

  5. Xiaomi MiMo Official — MiMo-V2.5 release page: 310B/15B, 1M context, ClawEval scores. https://mimo.xiaomi.com/mimo-v2-5/

  6. DocsBot — MiniMax M2.7 model specs, confirmed text-only modality. https://docsbot.ai/models/minimax-m2-7

  7. RemoteOpenClaw Blog — MiniMax M2.7 model comparison, open-source status, parameter analysis. https://www.remoteopenclaw.com/blog/best-minimax-models-for-openclaw

  8. VentureBeat — Xiaomi MiMo-V2.5 series open-sourced under MIT license. https://venturebeat.com/technology/open-source-xiaomi-mimo-v2-5-and-v2-5-pro-are-among-the-most-efficient-and-affordable-at-agentic-claw-tasks

  9. Ufuk Ozen (Artificial Analysis data) — MiniMax M2.7: Intelligence Index 49.6, Coding Index 41.9, speed 42 tok/s. https://ufukozen.com/model/minimax-minimax-m2.7

  10. LLMBase.ai — MiMo-V2.5 vs Gemini 2.5 Flash benchmark comparison: GPQA 84.9%, HLE 25.2%, τ²-Bench 90.6%. https://llmbase.ai/compare/gemini-2-5-flash-preview-09-2025-reasoning,mimo-v2-5-0424/

  11. Codersera — DeepSeek V4 Flash deep dive: AA Intelligence Index 47, speed ~80 tok/s, TTFT 1.03s (Non-Think). https://codersera.com/blog/deepseek-v4-flash-deep-dive/

  12. VentureBeat — Grok 4.3 launch report: GDPval-AA ~300 ELO jump, CaseLaw v2 79.3%, CorpFin #1. https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite

  13. Agmazon — MiMo-V2.5 complete guide: Terminal-Bench 56.1%, GDPval-AA 1578–1581, pricing ~$0.50/$1.50. https://agmazon.com/blog/articles/technology/202604/mimo-v2-5-complete-guide-en.html

  14. Yahoo Tech — MiMo 2.5 speed 100–150 tok/s report. https://tech.yahoo.com/ai/articles/xiaomis-mimo-2-5-pro-204235330.html

  15. Price Per Token — MiniMax M2.7 cache read price $0.06/M. https://pricepertoken.com/pricing-page/model/minimax-minimax-m2.7