The $8M trillion: when frontier-grade training falls out of the lab

May 30, 2026 · 9 min read

In December 2024, DeepSeek-V3 was trained for roughly $5.576 million on a 2048-GPU H800 cluster in 55 days. Sixteen months later, DeepSeek shipped V4-Pro at 1.6 trillion total parameters (49B active) - trained on Nvidia Hoppers, but engineered with native Huawei Ascend inference support and priced at $1.74 per million input tokens against GPT-5.5-class rates. In the same window, OpenAI announced the $500 billion Stargate joint venture with SoftBank and Oracle. The gap between the cheapest frontier-grade training run and the most expensive one is now roughly four orders of magnitude, and it will close in one direction, not the other.

This is the contrarian piece. I think the frontier-lab capex story is broadly correct and mostly irrelevant for everyone except the four companies running the largest training clusters. Everyone else is about to discover that the model is the commodity, and that the moat they spent 2024 and 2025 building - "we have a custom-trained model" - was a moat against the wrong river.

The KPI block: anchor these five numbers

The argument lives or dies on whether you accept this trajectory. Internalise it.

| Reference point | Public estimate | Source |
| --- | --- | --- |
| Frontier final-run cost growth, 2016-2024 | 2.4x per year (90% CI: 2.0-2.9x) | Cottier et al / Epoch AI, 2024 |
| Frontier development cost composition | Hardware 47-67%, staff 29-49%, energy 2-6% | Epoch AI |
| OpenAI 2024 compute spend | $5B R&D + $1.8B inference | Epoch AI |
| DeepSeek-V3 final pretraining run, Dec 2024 | $5.6M (671B MoE, 55 days, 2048 H800) | Maginative |
| DeepSeek-V4-Pro, April 2026 | 1.6T total / 49B active; trained on Nvidia Hopper, inference on Nvidia + Ascend | ChinaTalk |

The Epoch numbers and the DeepSeek numbers are measuring different things on purpose. Epoch counts amortised hardware plus energy for the largest declared runs. DeepSeek counts only the final pretraining pass. Both are honest. The interesting fact is that the spread between "biggest lab final-run cost" and "fast follower final-run cost" has gone from ~3x in 2022 to roughly 20-40x today. The frontier is sprinting; the trail behind it is paving itself.

Note: Treat DeepSeek's $5.6M as a marginal cost, not a total cost of ownership. SemiAnalysis estimated DeepSeek's actual server capex at roughly $1.3 billion once you count clusters, staff, and ablations. The right reading is: the marginal cost of one more frontier-grade model on existing infrastructure is now under $10M, and that is the number that determines what your competitors can ship next quarter.

The cost curve is not the story. The decoupling is.

Cottier et al's Epoch AI study puts the headline number at a 2.4x-per-year growth in frontier final-run cost since 2016. At that rate, the largest 2027 runs will exceed a billion dollars. That part of the literature is well-rehearsed and broadly believed. What is less well-rehearsed: the floor under "what counts as frontier-grade" is collapsing at roughly the same rate the ceiling is climbing.
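
The compounding behind that projection is easy to check by hand. Here is a minimal sketch, assuming the trend stays purely exponential at the 2.4x-per-year rate quoted above. The $100M base cost for 2024 is a hypothetical placeholder picked for illustration, not a figure from the Epoch study; swap in whichever estimate you trust.

```python
import math

GROWTH_PER_YEAR = 2.4    # headline growth rate quoted in the text
BASE_YEAR = 2024
BASE_COST_USD = 100e6    # ASSUMPTION: illustrative base cost, not a sourced figure


def projected_cost(year: int, base_cost: float = BASE_COST_USD,
                   growth: float = GROWTH_PER_YEAR,
                   base_year: int = BASE_YEAR) -> float:
    """Cost of the largest run in `year` under constant exponential growth."""
    return base_cost * growth ** (year - base_year)


def years_to_reach(target: float, base_cost: float = BASE_COST_USD,
                   growth: float = GROWTH_PER_YEAR) -> float:
    """Years after the base year at which the trend line crosses `target`."""
    return math.log(target / base_cost) / math.log(growth)


if __name__ == "__main__":
    for year in range(2024, 2028):
        print(year, f"${projected_cost(year) / 1e6:,.0f}M")
    print(f"Years to reach $1B: {years_to_reach(1e9):.2f}")
```

Under that assumed base, the trend line crosses a billion dollars a little under three years out. The crossing date is only logarithmically sensitive to the base cost, which is why the headline survives a lot of disagreement about any single run.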

Two clocks are running.

  • The frontier clock measures what the leading labs can do if they spend everything. Stargate, Anthropic's Amazon-backed buildouts, Google's TPU v6, Meta's 350K-GPU fleet. Cost: tens of billions per year per lab, and rising.
  • The fast-follower clock measures what a well-organised team can replicate using last year's techniques on last year's hardware once the recipe is in the open literature. Cost: $5-15M per training run, falling.

The frontier clock determines who has GPT-6 first. The fast-follower clock determines what your competitor's startup will be selling at $0.28 per million output tokens by Q4. If you are building a product, the second clock is the one that matters.

What compounds when the floor keeps dropping

Run the trajectory forward 24 months under three reasonable assumptions: H100 rental rates stabilise in the $1.49-$6.98/hr band across cloud providers (they fell from $8 in early 2024 to under $2 on marketplaces in 2026), open-weights recipes continue to ship within 12 months of the closed-weights frontier, and at least one non-Nvidia path (Ascend, Trainium, TPU rental, MI300X) reaches inference production maturity.

Under those assumptions, here is what gets cheap:

  • Trillion-parameter MoE training for any well-funded company. DeepSeek did 1.6T on Hoppers in April 2026 and engineered it to run on Ascend for inference. Mistral, Cohere, xAI, Alibaba, and at least two state-backed labs in the UAE and Saudi Arabia can do equivalent work by Q1 2027 if they choose to. The technical recipe is no longer the bottleneck. The data is.
  • Domain-specialised frontier models. A $10M training budget is now in the range where a vertical SaaS company with a serious data moat (Bloomberg, Epic, ICE/NYSE, Schwab) can plausibly justify training their own. Not fine-tuning. Pretraining. The economics start to work when the avoided inference markup over three years exceeds the training capex, which it now does for any company shipping over 100B tokens per month.
  • Inference, not training, becomes the binding constraint. OpenAI's own 2024 compute breakdown was $5B on R&D and $1.8B on inference, and the inference line is the one growing. By 2027, inference will be the larger line for everyone except the labs still chasing the frontier. Capacity, latency, and chip diversity matter more than parameter count.
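
The break-even test in the second bullet can be sketched in a few lines. This is a rough model under stated assumptions, not a finance-grade one: it ignores staff, data, and retraining costs, and both per-token prices below are hypothetical placeholders (a $3.00 blended API price and a $0.50 self-serve cost per million tokens). Only the $10M budget and the three-year horizon come from the text.

```python
def avoided_markup_usd(tokens_per_month: float, api_price_per_m: float,
                       self_serve_cost_per_m: float, months: int = 36) -> float:
    """Total markup avoided over the horizon by serving your own model."""
    markup_per_m = api_price_per_m - self_serve_cost_per_m
    return tokens_per_month / 1e6 * markup_per_m * months


def break_even_tokens_per_month(training_capex: float, api_price_per_m: float,
                                self_serve_cost_per_m: float,
                                months: int = 36) -> float:
    """Monthly token volume at which avoided markup equals training capex."""
    markup_per_m = api_price_per_m - self_serve_cost_per_m
    if markup_per_m <= 0:
        raise ValueError("no markup to avoid: self-serving is not cheaper")
    return training_capex / (markup_per_m * months) * 1e6


if __name__ == "__main__":
    threshold = break_even_tokens_per_month(
        training_capex=10e6,          # budget named in the text
        api_price_per_m=3.00,         # ASSUMPTION: hypothetical blended API price
        self_serve_cost_per_m=0.50,   # ASSUMPTION: hypothetical self-serve cost
    )
    print(f"Break-even volume: {threshold / 1e9:.0f}B tokens per month")
```

The threshold is inversely proportional to the price spread, so run it with your own numbers: halve the spread and the break-even volume doubles.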

The contrarian prediction

Here is the bet, stated specifically enough to be wrong: by end of 2027, at least one production-grade open-weights model trained for under $15M will match GPT-5.5 on the agentic coding and tool-use benchmarks that actually correlate with revenue, and the marginal price of intelligence (tokens served per dollar) will drop by another 8-12x from May 2026 levels. The labs will keep climbing the frontier and the frontier will keep being worth climbing, but the commercial gap between "best closed model" and "good-enough open model" will be 6-9 months and shrinking. Pricing power for general-purpose chat collapses. Pricing power for verticalised agents and proprietary-data systems holds.
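
To see how aggressive that is, convert the 8-12x range into an annualised rate. A minimal sketch, assuming a 19-month window (May 2026 through December 2027) and a smooth exponential decline. Applying the drop to the $1.74 V4-Pro input price is my illustration of the arithmetic, not a forecast for that specific product.

```python
START_PRICE_PER_M = 1.74   # V4-Pro input price quoted earlier in the post
MONTHS = 19                # ASSUMPTION: May 2026 through December 2027


def annualised_factor(total_factor: float, months: int = MONTHS) -> float:
    """Per-year price-drop factor implied by a total drop over `months`."""
    return total_factor ** (12 / months)


def implied_price(start_price: float, total_factor: float) -> float:
    """Price per million tokens after the full drop has played out."""
    return start_price / total_factor


if __name__ == "__main__":
    for factor in (8, 12):
        print(f"{factor}x total -> {annualised_factor(factor):.1f}x per year, "
              f"${implied_price(START_PRICE_PER_M, factor):.3f} per million tokens")
```

Both ends of the range imply prices falling faster per year than frontier training cost is rising, which is the decoupling argument in one number.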

A reasonable disagreement: the frontier labs will hit a regime of returns-to-scale that the followers structurally cannot match, because the next jump requires the kind of pretraining-data, RL-environment, and tool-use infrastructure that only $50B-a-year capex can build. That is the OpenAI / Anthropic thesis. I think it is half right - on the frontier - and half wrong - on what the median enterprise will pay for. Reasonable people can disagree; that is the point.

What the cost collapse does to the AI investment stack

If you accept the floor-collapse trajectory, the implications for where money should sit reorder cleanly.

| Layer | 2024 thesis | 2026-2027 thesis |
| --- | --- | --- |
| Frontier labs | Winner-take-most | Winner-take-frontier; commodity beneath |
| Inference clouds | Margin compression coming | Structural shortage; pricing power |
| Model-as-product (general) | Defensible | Commodity within 9 months of release |
| Vertical agents on proprietary data | Underpriced | The actual moat |
| Eval / observability / safety tooling | Nice-to-have | Required infrastructure |
| Custom silicon (non-Nvidia) | Speculative | Production-grade for inference by 2027 |

The investments that look smartest in 2027 will be the ones that assumed the model layer was commoditising and put their effort upstream (data acquisition, eval, distribution) or downstream (inference economics, agent reliability, domain integration). Investments that assumed "our fine-tune is the moat" will look like the 2011 mobile-app companies that thought their splash screen was the moat.

Note: This is the part that should make you uncomfortable if you are running a foundation-model startup that is neither a frontier lab nor a vertical specialist. The middle is being squeezed from both ends. You have 18 months to pick a side.

A side note on Huawei, Ascend, and the export-control question

The most under-discussed datapoint of 2026 is what [DeepSeek did with V4 inference](https://www.theregister.com/2026/04/24/deepseek_v4/). Training itself stayed on Nvidia Hoppers - the ecosystem still wins for that workload - but V4 was engineered for native Ascend inference, with MXFP4 quantisation and custom non-CUDA kernels designed to break the Nvidia inference-stack lock-in. That is the actual chip-diversification story: training is still hard to move; inference is becoming portable. Five years of US export controls were premised on the assumption that frontier-grade *capability* required Nvidia end to end. Training, yes for now. Serving, no, already. Once Chinese (and Middle Eastern, and European) labs can train wherever and serve domestically, the calculus on chip-export policy looks different than it did in 2023. This deserves its own essay.

The second-order consequences nobody is pricing in

A few that compound from the floor-collapse:

  • The cost of being wrong drops too. When training a frontier-grade model costs $10M, you can run three competing architectures and ship the winner. When it costs $300M, you ship the one your VP signed off on and pray. Cheap training is implicitly a tax on bureaucracy, and the labs with the most bureaucracy lose the most.
  • Open-weights becomes the default release strategy. Not because anyone discovered ideology. Because if you do not open-weight, someone in Hangzhou or Abu Dhabi will replicate you within 9 months and capture your distribution. Releasing open is the new "release-as-loss-leader."
  • The eval problem becomes the dominant problem. If every six months a new contender ships a model claiming frontier-grade performance at one-tenth the cost, the only way an enterprise can keep up is automated, continuous evaluation pipelines on their own data. Eval is now infrastructure, not a side project.
  • Talent re-clusters. The post-training, RL, and agentic-systems researchers concentrate at the frontier labs because that work still requires the compute. The pretraining researchers diffuse outward because their craft is now reproducible. Expect a wave of pretraining-heavy founders leaving the big labs in 2026-2027.

What to bet on this quarter

If you take the floor-collapse seriously, here are five concrete moves worth starting this quarter:

  1. Audit every "our fine-tune is the moat" claim in your portfolio or your roadmap. Replace it with a moat that lives upstream (data, distribution) or downstream (eval, integration, agent reliability). If you cannot, you do not have a moat.
  2. Price-stress your inference economics against $0.50 per million input tokens by end of 2027. If your unit economics break, your product is mis-positioned, not your provider.
  3. Run one production workload on a non-Nvidia path (Trainium, TPU, Ascend if accessible, MI300X) this quarter. Optionality on silicon is now strategy, not a science project.
  4. If you have proprietary, high-value, low-volume data (legal, medical, financial, industrial), model out a $10-25M pretraining budget against your avoided inference costs. For an increasing number of companies, the answer flips to "yes."
  5. Treat eval as a P0 capability. The half-life of "we tested it last month" is now weeks. Build the pipeline that lets you swap the underlying model in 48 hours without breaking production.
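
Move 5 is the one most easily turned into code. What follows is a minimal sketch of a swap gate, assuming you already score both the production model and a candidate on the same cases drawn from your own data. Every name and threshold here (`EvalCase`, `gate`, the 0.01 and 0.05 tolerances) is hypothetical and chosen for illustration, not taken from any particular eval framework.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    baseline_score: float    # model currently in production
    candidate_score: float   # model you are considering swapping in


def gate(cases: list[EvalCase], max_mean_drop: float = 0.01,
         regression_margin: float = 0.05,
         max_regressions: int = 0) -> tuple[bool, float, list[str]]:
    """Return (passed, mean_drop, regressed_case_ids) for a candidate model.

    The candidate passes only if its mean score has not dropped by more than
    `max_mean_drop` AND at most `max_regressions` individual cases got worse
    by more than `regression_margin`.
    """
    if not cases:
        raise ValueError("no eval cases supplied")
    mean_drop = (mean(c.baseline_score for c in cases)
                 - mean(c.candidate_score for c in cases))
    regressions = [c.case_id for c in cases
                   if c.baseline_score - c.candidate_score > regression_margin]
    passed = mean_drop <= max_mean_drop and len(regressions) <= max_regressions
    return passed, mean_drop, regressions


if __name__ == "__main__":
    cases = [EvalCase("contract-summary", 0.90, 0.92),
             EvalCase("ticket-triage", 0.80, 0.60)]
    passed, drop, regressed = gate(cases)
    print("swap approved" if passed else f"blocked, regressions: {regressed}")
```

Checking per-case regressions as well as the mean matters because a cheaper model can hold its average while quietly failing the handful of cases your customers care about most.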

The cost collapse is not a story about DeepSeek. DeepSeek is the proof. The story is that training a frontier-grade model is going from a sovereign-scale capability to a Series-B-scale capability inside one product cycle, and almost every assumption in the 2024 AI strategy deck was written before that became true. Update the deck.