Disclaimer: All benchmark data in this article comes from official model cards, technical reports, pricing pages, and publicly available results from Artificial Analysis. None of it is from our own independent testing. Scores from different evaluation environments are not directly comparable and should be treated as directional only.
On April 30, 2026, xAI opened Grok 4.3 to full API access[1], two weeks after it appeared quietly in the model selector for SuperGrok Heavy subscribers at $300/month[2]. Elon Musk confirmed that the live checkpoint lands at roughly 0.5 trillion parameters, with a 1T version roughly five days from completing training at the time of the April 17 beta launch[2].
Meanwhile, three Chinese open-source models have shipped back-to-back: DeepSeek V4 Flash (April 24)[3], MiniMax M2.7 (March 18)[4], and Xiaomi’s MiMo 2.5 (April 22)[5]. All three follow a common playbook: Mixture-of-Experts architectures with 10–15B active parameters, pushing toward frontier-level reasoning at a fraction of the cost of closed models. Grok 4.3 takes a different approach: larger scale, always-on reasoning, and a clear focus on vertical precision in legal and financial domains.
Specs at a Glance
| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Developer | xAI | DeepSeek | MiniMax | Xiaomi |
| Released | Apr 17 (Beta) / Apr 30 (GA)[1][2] | Apr 24, 2026[3] | Mar 18, 2026[4] | Apr 22, 2026[5] |
| Total params | ~0.5T (Musk confirmed)[2] | 284B[3] | 229B[4] | 310B[5] |
| Active params | Undisclosed | 13B[3] | 10B[4] | 15B[5] |
| Architecture | Undisclosed | MoE + CSA/HCA hybrid attention[3] | MoE[4] | MoE + hybrid sliding-window attention[5] |
| Context window | 1M (API) / 2M (App)[1][2] | 1M[3] | 200K[4] | 1M[5] |
| Reasoning control | Always on, cannot disable[1] | 3 tiers: Non-Think / High / Max[3] | Toggleable[4] | Toggleable[5] |
| Modalities | Text + image + video[2] | Text only[3] | Text only[6] | Text + image + audio + video[5] |
| Open source | Closed[2] | MIT license[3] | Open (license unclear)[7] | MIT license[8] |
| Document generation | PDF / PPTX / Excel[2] | Not supported | Not supported | Not supported |
Grok 4.3 is a clear half-step ahead in feature completeness—native video understanding and structured document output are capabilities the open-source trio can’t fully match. MiMo 2.5 comes closest with its multimodal support, but it generates no document output.
Benchmarks: Grok Leads, But Not by Much
All data below is sourced from official disclosures and Artificial Analysis third-party evaluations:
| Benchmark | Grok 4.3 | DeepSeek V4 Flash (Max) | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.1%[1] | 88.1%[3] | 87.4%[9] | 84.9%[10] |
| HLE (Humanity’s Last Exam) | 35.0%[1] | 34.8%[3] | 28.1%[9] | 25.2%[10] |
| AA Intelligence Index (composite) | 53.2[1] | 47[11] | 49.6[9] | 49.0[10] |
| SciCode | 47.3%[1] | — | 47.0%[9] | — |
| τ²-Bench | 97.7%[1] | 95.6% (High)[3] | 84.8%[9] | 90.6%[10] |
| IFBench | 81.3%[1] | 79.2% (Max)[3] | 75.7%[9] | 67.1%[10] |
| GDPval-AA (ELO) | ~1500[12] | 1395[3] | 1495[4] | 1578–1581[13] |
| Output speed (tok/s) | 225.4[1] | ~80[11] | ~42[9] | 100–150[14] |
| TTFT (time to first token) | 13.13 s[1] | 1.03 s (Non-Think)[11] | 1.75–2.31 s[9] | N/A |
A few things stand out:
Composite intelligence belongs to Grok 4.3, but the gap is narrow. AA Intelligence Index 53.2 vs 47–49.6 for the open-source trio—a 4–6 point spread. On GPQA Diamond, Grok’s 90.1% leads DeepSeek V4 Flash’s 88.1% by just two percentage points. The real separation happens on HLE (Humanity’s Last Exam), where Grok 4.3 pulls 6.9 points ahead of MiniMax M2.7.
Agent and tool-use benchmarks tell a messier story. On τ²-Bench, Grok 4.3 leads at 97.7%, but DeepSeek V4 Flash is within about 2 points at 95.6%. On GDPval-AA, the picture flips: MiMo 2.5 leads at 1578–1581 ELO, ahead of Grok 4.3’s ~1500[13][12], and MiniMax M2.7 at 1495 is essentially tied. VentureBeat noted that Grok 4.3’s ~300 ELO jump over Grok 4.20 was a major improvement, but one that mostly pulled the model up to parity with the open-source pack rather than ahead of it[12].
Grok 4.3 is the fastest generator but the slowest starter. Its 225.4 tok/s output speed is more than 5× MiniMax M2.7’s, but 13.13 seconds to first token is dead last[1]. That is the cost of always-on reasoning: it thinks before every answer. DeepSeek V4 Flash clocks about 1 second to first token in Non-Think mode but slows dramatically in Max mode[11].
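The throughput-versus-TTFT tradeoff reduces to simple arithmetic: total latency ≈ TTFT + output_tokens ÷ tok/s. A quick sketch using the table’s figures, assuming constant TTFT and flat generation speed (which real serving stacks only approximate):

```python
def total_latency(ttft_s: float, tok_per_s: float, output_tokens: int) -> float:
    """End-to-end latency: time to first token plus token generation time."""
    return ttft_s + output_tokens / tok_per_s

# Table figures: Grok 4.3 (13.13 s TTFT, 225.4 tok/s) vs
# DeepSeek V4 Flash Non-Think (1.03 s TTFT, ~80 tok/s).
grok_s = lambda n: total_latency(13.13, 225.4, n)
flash_s = lambda n: total_latency(1.03, 80.0, n)

# Output length at which Grok's throughput pays back its slow start:
break_even = (13.13 - 1.03) / (1 / 80.0 - 1 / 225.4)
print(round(break_even))  # roughly 1,500 tokens
```

In other words, under these assumptions Grok 4.3 only wins end-to-end once a response runs past roughly 1,500 output tokens; for short chat turns, the Non-Think open models feel far snappier.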
Where Grok 4.3 genuinely dominates: law and finance. It ranks #1 on CaseLaw v2 at 79.3% accuracy and #1 on CorpFin[12]. The three open-source models have published no comparable scores on these vertical benchmarks.
Pricing: Grok Sits in the Middle, Open Source Polarized
| Category | Grok 4.3 | DeepSeek V4 Flash | MiniMax M2.7 | MiMo 2.5 |
|---|---|---|---|---|
| Input ($/1M tokens) | $1.25[1] | $0.14[3] | $0.30[4] | ~$0.50[13] |
| Output ($/1M tokens) | $2.50[1] | $0.28[3] | $1.20[4] | ~$1.50[13] |
| Cache read ($/1M) | $0.20[1] | $0.0028[3] | $0.06[15] | N/A |
| Tiered pricing | 2× above 200K tokens[1] | Flat[3] | Flat[4] | Flat |
| Reasoning token billing | Same as output tokens[1] | Same as output tokens[3] | Included in output[4] | Included |
| Consumer plans | $30/mo SuperGrok / $300/mo Heavy[2] | Pay-as-you-go only[3] | Token Plans available[4] | Token Plans available[13] |
Grok 4.3’s input price of $1.25/M is nearly 9× DeepSeek V4 Flash’s, and its $2.50/M output is a similar multiple. Against MiniMax M2.7 ($1.20/M output) and MiMo 2.5 (~$1.50/M), though, the gap shrinks to roughly 1.7–2×.
The hidden cost is reasoning tokens. Grok 4.3 always thinks, and each internal reasoning step generates billable output tokens. DeepSeek V4 Flash in Max mode incurs similar overhead. In practice, the gap between sticker price and real cost is larger than the pricing page suggests.
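As a sketch of how those line items compose into a per-request bill: the function below applies the table’s rates, the 2× multiplier above 200K input tokens (assumed here to apply to both input and output, which the sources don’t spell out), and reasoning tokens billed at the output rate. It is an illustrative estimator, not an official calculator.

```python
def grok_request_cost(input_tokens: int, visible_output_tokens: int,
                      reasoning_tokens: int, cache_read_tokens: int = 0,
                      input_rate: float = 1.25, output_rate: float = 2.50,
                      cache_rate: float = 0.20) -> float:
    """Estimate one Grok 4.3 request's cost in USD.

    Rates are $/1M tokens from the pricing table. Assumptions: the 2x
    tier kicks in above 200K input tokens and covers output too, and
    hidden reasoning tokens bill exactly like visible output tokens.
    """
    tier = 2.0 if input_tokens > 200_000 else 1.0
    m = 1_000_000
    cost = (input_tokens - cache_read_tokens) / m * input_rate * tier
    cost += cache_read_tokens / m * cache_rate
    cost += (visible_output_tokens + reasoning_tokens) / m * output_rate * tier
    return cost

# 10K tokens in, 1K visible tokens out -- plus 4K hidden reasoning tokens:
sticker = grok_request_cost(10_000, 1_000, reasoning_tokens=0)
real = grok_request_cost(10_000, 1_000, reasoning_tokens=4_000)
print(f"${sticker:.4f} sticker vs ${real:.4f} real")  # $0.0150 vs $0.0250
```

With 4K reasoning tokens on a 1K-token answer, the real bill lands about 67% above the naive estimate, which is exactly the sticker-versus-real gap described above.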
What Grok 4.3 Has That Nobody Else Does
Three things are unique to Grok 4.3:
- Vertical domain precision: #1 on CaseLaw v2 (legal reasoning, 79.3%) and #1 on CorpFin (financial analysis)[12] translates into real engineering value in compliance review, contract analysis, and financial modeling. The open-source models have no published scores on these benchmarks.
- Native document generation: produces formatted PDF, PPTX, and Excel files directly from conversation[2], useful for competitive analysis, due diligence reports, and anything that currently requires manual formatting.
- Deep X platform integration: Grok 4.3 can search X posts, user profiles, and threads in real time[1]. That data channel is unavailable to any other model family.
Where the Open-Source Trio Fights Back
- Price moat: DeepSeek V4 Flash at $0.28/M output means that for chat, classification, extraction, and high-volume API workloads, total cost can be a fraction of Grok 4.3 even if performance is slightly lower.
- Deployment freedom: MIT-licensed models run on private cloud or on-premise—no data compliance risk. Grok 4.3 is API-only.
- Agent capability isn’t weaker: On GDPval-AA, MiMo 2.5 actually scores higher than Grok 4.3, and MiniMax M2.7 ties it. These aren’t second-tier alternatives for agentic workflows.
What to Watch For
Grok 4.3’s release signals two things about the state of the market.
First, medium-scale models still have headroom. Grok 4.3 hits a 53.2 AA Intelligence Index with ~0.5T parameters, impressive but not unattainable for teams with sufficient compute. Musk’s mention of a 1T version still in training[2] suggests the ceiling at this scale isn’t settled yet.
Second, reasoning × cost × vertical precision is the 2026 battleground. Grok 4.3 picked “always reason, mid-range price.” DeepSeek V4 Flash picked “dirt cheap, turn reasoning off when you don’t need it.” MiniMax M2.7 picked “agentic balance.” MiMo 2.5 picked “multimodal + office agents.” Nobody spans all dimensions.
For developers building model routing in mid-2026, this means you probably won’t tie yourself to one model. Route legal and financial analysis to Grok 4.3. Route cost-sensitive high-throughput workloads to DeepSeek V4 Flash. Route multimodal applications to MiMo 2.5. Run head-to-head evaluations of MiniMax M2.7 and Grok 4.3 on your specific agent tasks.
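A minimal routing layer for that split might look like the following; the task labels and model identifiers are illustrative placeholders, not official API model names:

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str

# Hypothetical routing table following the split described above.
ROUTES = {
    "legal": Route("grok-4.3", "#1 on CaseLaw v2 / CorpFin"),
    "finance": Route("grok-4.3", "#1 on CaseLaw v2 / CorpFin"),
    "bulk_extraction": Route("deepseek-v4-flash", "lowest $/token at volume"),
    "multimodal": Route("mimo-2.5", "image/audio/video input"),
}

def route(task_type: str) -> Route:
    # Agentic work is a deliberate fall-through: per the advice above,
    # benchmark MiniMax M2.7 against Grok 4.3 on your own tasks before
    # hard-coding either one.
    return ROUTES.get(task_type, Route("minimax-m2.7", "default pending eval"))

print(route("legal").model)       # grok-4.3
print(route("agent_task").model)  # minimax-m2.7
```

The point of the sketch is the shape, not the entries: a routing table makes the per-workload tradeoffs explicit and cheap to revise as new checkpoints ship.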
If the 1T version of Grok 4.3 ships by end of Q2 with comparable pricing, it could push the “medium-scale” capability ceiling higher. Whether open-source models in the 10–20B active parameter class can close that gap will be the most interesting technical story of H2 2026.
References
1. Artificial Analysis / Easy Benchmarks / xAI Docs — Grok 4.3 third-party benchmark compilation: AA Intelligence Index 53.2, GPQA Diamond 90.1%, HLE 35.0%, output speed 225.4 tok/s, pricing $1.25/$2.50. https://easy-benchmarks.com/models/grok-4-3 and https://docs.x.ai/docs/models
2. Awesome Agents — Grok 4.3 full parameter confirmation (~0.5T), video input, document generation, Musk’s confirmation of 1T checkpoint in training. https://awesomeagents.ai/models/grok-4-3/
3. DeepSeek Hugging Face — DeepSeek-V4-Flash official model card: full specs (284B/13B), benchmarks, MIT license. https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
4. DataLearnerAI — MiniMax M2.7 official specs (229B/10B), benchmarks, and pricing. https://www.datalearner.com/en/ai-models/pretrained-models/minimax-m2-7
5. Xiaomi MiMo Official — MiMo-V2.5 release page: 310B/15B, 1M context, ClawEval scores. https://mimo.xiaomi.com/mimo-v2-5/
6. DocsBot — MiniMax M2.7 model specs, confirmed text-only modality. https://docsbot.ai/models/minimax-m2-7
7. RemoteOpenClaw Blog — MiniMax M2.7 model comparison, open-source status, parameter analysis. https://www.remoteopenclaw.com/blog/best-minimax-models-for-openclaw
8. VentureBeat — Xiaomi MiMo-V2.5 series open-sourced under MIT license. https://venturebeat.com/technology/open-source-xiaomi-mimo-v2-5-and-v2-5-pro-are-among-the-most-efficient-and-affordable-at-agentic-claw-tasks
9. Ufuk Ozen (Artificial Analysis data) — MiniMax M2.7: Intelligence Index 49.6, Coding Index 41.9, speed 42 tok/s. https://ufukozen.com/model/minimax-minimax-m2.7
10. LLMBase.ai — MiMo-V2.5 vs Gemini 2.5 Flash benchmark comparison: GPQA 84.9%, HLE 25.2%, τ²-Bench 90.6%. https://llmbase.ai/compare/gemini-2-5-flash-preview-09-2025-reasoning,mimo-v2-5-0424/
11. Codersera — DeepSeek V4 Flash deep dive: AA Intelligence Index 47, speed ~80 tok/s, TTFT 1.03s (Non-Think). https://codersera.com/blog/deepseek-v4-flash-deep-dive/
12. VentureBeat — Grok 4.3 launch report: GDPval-AA ~300 ELO jump, CaseLaw v2 79.3%, CorpFin #1. https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite
13. Agmazon — MiMo-V2.5 complete guide: Terminal-Bench 56.1%, GDPval-AA 1578–1581, pricing ~$0.50/$1.50. https://agmazon.com/blog/articles/technology/202604/mimo-v2-5-complete-guide-en.html
14. Yahoo Tech — MiMo 2.5 speed 100–150 tok/s report. https://tech.yahoo.com/ai/articles/xiaomis-mimo-2-5-pro-204235330.html
15. Price Per Token — MiniMax M2.7 cache read price $0.06/M. https://pricepertoken.com/pricing-page/model/minimax-minimax-m2.7