MiniMax M3 brings frontier AI performance to open weights — at a fraction of the cost

Published: June 02, 2026, 01:28 UTC

Chinese AI startup MiniMax released M3 on June 1, an open-weights model that matches or beats proprietary leaders like GPT-5.5 and Gemini 3.1 Pro on key coding and agentic benchmarks — at 5% to 10% of their API cost.

A new architecture for sparse attention

M3 is built on MiniMax Sparse Attention (MSA), a custom architecture that replaces full self-attention with KV-block selection. Standard attention computes every token’s relationship to every other token — quadratic complexity that becomes prohibitive at very long contexts. MSA selects only the most relevant key-value blocks for each query, dramatically reducing per-token compute while preserving the model’s ability to retrieve information across long sequences. This is what enables M3’s 1 million token context window without the latency blowup that plagues dense attention at that scale.
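
To make the block-selection idea concrete, here is a minimal Python sketch for a single query. It is an illustration under stated assumptions, not MiniMax's implementation. The article does not say how MSA scores or picks blocks, so ranking each block by the mean of its keys is an assumed rule, and the names `block_sparse_attention`, `block_size`, and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend one query over only the top_k highest-scoring KV blocks.

    Assumption: a block is scored by the dot product of the query with the
    mean of the block's keys. The real MSA scoring rule is not described
    in the article.
    """
    # Split token positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(block):
        mean_key = [
            sum(keys[i][d] for i in block) / len(block)
            for d in range(len(query))
        ]
        return dot(query, mean_key)

    # Rank blocks by score and keep the best top_k, in sequence order.
    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_k])
    token_ids = [i for b in selected for i in blocks[b]]

    # Ordinary scaled dot-product attention, restricted to selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, token_ids))
        for d in range(len(values[0]))
    ]
    return output, selected


if __name__ == "__main__":
    # Toy data: 8 tokens in 4 blocks of 2; blocks 1 and 3 align with the query.
    query = [1.0, 0.0]
    keys = [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [5.0, 0.0],
            [-1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    values = [[float(i)] for i in range(8)]
    out, picked = block_sparse_attention(query, keys, values,
                                         block_size=2, top_k=2)
    print("selected blocks:", picked, "output:", out)
```

In this sketch the final attention step touches only `top_k * block_size` tokens instead of the full sequence, which is where the compute saving comes from. Setting `top_k` to the total number of blocks reduces it to ordinary dense attention.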

The model supports text, image, and video inputs with text output — native multimodality in a single weight set rather than a gated mixture of experts.

Benchmarks that beat the incumbents

MiniMax reports M3 scores of 59% on SWE-bench Pro (software engineering), 66% on Terminal-Bench 2.1 (agentic terminal tasks), 74.2% on MCP Atlas (tool-use orchestration), and 83.5% on BrowseComp (web-based agent tasks). These figures edge out Anthropic’s Claude Opus 4.7 and OpenAI’s GPT-5.5 on the same evaluations, according to VentureBeat’s Carl Franzen.

On Terminal-Bench 2.1, M3’s 66% places it ahead of most closed-source frontier models — a surprising result given that terminal-based agentic coding has been a weakness for open-weight systems. The MSA architecture appears to be the differentiator: sparse attention lets M3 maintain coherent reasoning across the multi-step terminal sessions that trip up dense-attention models at scale.

Pricing that undercuts the market

M3 is available via the MiniMax API at a promotional price of $0.30 per million input tokens and $1.20 per million output tokens for the first week. After that, pricing settles at $0.60 per million input tokens and $2.40 per million output tokens, still 8% to 20% of what GPT-5.5 or Gemini 3.1 Pro costs on a per-token basis.

MiniMax has also launched subscription token plans starting at $20 per month, and the company plans to release open weights under a permissive license within 10 days, allowing enterprise customers to self-host and fine-tune the model.

For comparison, DeepSeek V4 Flash costs $0.14/$0.28 per million tokens and Xiaomi’s MiMo-V2.5 Flash comes in at $0.10/$0.30, but neither matches M3’s benchmark profile on agentic and coding tasks.
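
As a rough way to see what these per-token prices mean in practice, the sketch below totals the cost of a workload at each quoted rate. The prices are the ones given in this article. The workload of 50 million input and 10 million output tokens is a hypothetical example, and `workload_cost` is an illustrative helper, not part of any vendor's SDK.

```python
# USD per million tokens as (input, output), as quoted in this article.
PRICES = {
    "MiniMax M3 (promo)": (0.30, 1.20),
    "MiniMax M3 (full)": (0.60, 2.40),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "MiMo-V2.5 Flash": (0.10, 0.30),
}


def workload_cost(input_tokens, output_tokens, price):
    """Return the USD cost of a workload at a given (input, output) price."""
    input_price, output_price = price
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 10M output tokens.
    input_tokens, output_tokens = 50_000_000, 10_000_000
    for model, price in PRICES.items():
        cost = workload_cost(input_tokens, output_tokens, price)
        print(f"{model}: ${cost:.2f}")
```

Because output tokens cost several times more than input tokens at every listed rate except DeepSeek's, the ranking for a given workload depends on its input-to-output ratio as well as the headline prices.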

The open-weight frontier bends

M3 arrives at a moment when the gap between open and closed models has been narrowing rapidly. DeepSeek’s V4 series, the Qwen3 family from Alibaba, and now MiniMax’s M3 have each demonstrated that open-weight systems can compete with proprietary flagships on specific benchmarks — but M3 is the first to combine frontier-level coding, million-token context, and native multimodality in a single open-weights release.

The question isn’t whether open weights can match proprietary quality anymore. It’s whether the ecosystem around self-hosted models — toolchains, observability, security practices — can mature fast enough to make self-hosting the default choice for enterprises that can afford the infrastructure.

Sources: VentureBeat (June 1, 2026); MiniMax API Docs (June 1, 2026); Lushbinary Developer Guide (June 1, 2026); OpenRouter Benchmarks (June 1, 2026)
