Cerebras (CBRS) IPO: the first pure-play inference bet hits the tape
Cerebras (CBRS) closed day one +68% above offer on its $5.55B IPO. Wafer-scale silicon vs NVIDIA, 86% UAE customer concentration, $24.6B backlog. The bull and bear cases.
$CBRS priced its IPO at $185 on May 13, opened at $350, and closed day one at $311.07, up 68.1% from offer. It sold 30M shares to raise $5.55B, 2026's largest IPO so far, at a $48.8B fully diluted valuation against $510M in 2025 revenue. The book was reportedly 20× oversubscribed; the range walked from $115-125 → $150-160 → $185 over the roadshow.
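The deal arithmetic is easy to check. A minimal Python sketch using only the figures quoted above; the variable names are ours, and the valuation is taken as quoted rather than recomputed from a share count the piece doesn't give:

```python
# Figures as quoted in the paragraph above (USD).
offer_price = 185.00
open_price = 350.00
close_price = 311.07
shares_sold = 30_000_000
valuation = 48.8e9       # fully diluted, as quoted
revenue_2025 = 510e6


def pct_change(new, old):
    """Percentage change from old to new."""
    return (new / old - 1) * 100


gross_proceeds = offer_price * shares_sold
open_pop = pct_change(open_price, offer_price)
close_pop = pct_change(close_price, offer_price)
price_to_sales = valuation / revenue_2025

print(f"Gross proceeds:           ${gross_proceeds / 1e9:.2f}B")
print(f"Open vs offer:            {open_pop:+.1f}%")
print(f"Close vs offer:           {close_pop:+.1f}%")
print(f"Valuation / 2025 revenue: {price_to_sales:.0f}x")
```

The last line is the one to sit with: the valuation multiple on trailing revenue is what the rest of this piece has to justify.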
That's the headline. The actually interesting thing about Cerebras is that it's the first specialist inference accelerator to trade publicly, which makes it the first listed name where you can express a view on the inference layer without going through $NVDA. That's a clean carve-out the public market didn't previously offer.
This piece walks through what the chip actually does, why the customer concentration is the central risk, where CBRS sits in our Compute / GPUs bubble taxonomy, and five things the tape isn't yet pricing.
The chip: wafer-scale, layer-by-layer, inference-tuned
The Wafer-Scale Engine 3 (WSE-3) is one die the size of a dinner plate. 4 trillion transistors. 900,000 cores. 21 PB/s of on-wafer memory bandwidth — roughly 2,600× a single NVIDIA B200. The CS-3 system that wraps it delivers 125 PFLOPS of AI compute, draws 23 kW, occupies 15U.
For inference, that bandwidth advantage pays off:
- On Llama 3 70B reasoning workloads, Cerebras claims ~21× faster inference than B200 at ~32% lower total cost of ownership.
- Independent Artificial Analysis benchmarks on Llama 4 Maverick (400B params): CS-3 delivers ~2,500 tokens/sec/user, vs NVIDIA DGX B200 ~1,000, SambaNova ~794, Groq ~549.
- Single-chip inference: 1,200–2,000 tok/s on WSE-3 vs ~100–150 on a single H100.
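To put the Artificial Analysis numbers on a common footing, here is a small Python sketch that converts the quoted tokens/sec/user into per-token latency and a CS-3 speedup multiple. It uses only the four figures in the list above; the dictionary and variable names are ours.

```python
# Tokens/sec/user on Llama 4 Maverick, as quoted above.
tokens_per_sec = {
    "Cerebras CS-3": 2500,
    "NVIDIA DGX B200": 1000,
    "SambaNova": 794,
    "Groq": 549,
}

cs3 = tokens_per_sec["Cerebras CS-3"]

# How many times faster the CS-3 figure is than each alternative.
speedups = {
    name: cs3 / rate
    for name, rate in tokens_per_sec.items()
    if name != "Cerebras CS-3"
}

# Average time per generated token, in milliseconds.
ms_per_token = {name: 1000 / rate for name, rate in tokens_per_sec.items()}

for name, rate in tokens_per_sec.items():
    print(
        f"{name:<16} {rate:>5} tok/s  "
        f"{ms_per_token[name]:.2f} ms/token  "
        f"CS-3 multiple: {cs3 / rate:.2f}x"
    )
```

On these independent numbers the gap to a DGX B200 is a low single-digit multiple, well short of the vendor's ~21× claim on a different model and workload, which is worth keeping in mind when the two figures get quoted side by side.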
The architectural reason is structural, not marketing. Cerebras runs a "layer-by-layer" dataflow: the entire wafer computes one layer of the model for all in-flight data, then the next. This eliminates the cross-chip memory synchronization that dominates multi-GPU inference latency. Token generation is sequential, latency-bound, and bandwidth-bound — exactly the workload where wafer-scale wins.
Cerebras's own S-1 is explicit about what this doesn't do: it doesn't challenge NVIDIA on training, doesn't displace general-purpose compute, doesn't pursue the broad CUDA-ecosystem moat. The pitch is narrower and sharper: latency-critical inference for frontier models, where every additional millisecond of time to first token (TTFT) is a UX cost.
This narrowness is the bull case AND the bear case. If inference becomes a workload-segmented market (training on NVIDIA, latency-sensitive inference on specialist silicon), Cerebras owns a real lane. If hyperscaler inference consolidates back onto GB200 NVL72 racks because the tooling is already there, the lane gets squeezed.
Revenue trajectory: real, but concentrated
The growth curve looks like an inflection: $24.6M (2022) → $78.7M (2023) → $290.3M (2024) → $510M (2025), +76% YoY. GAAP net income $237.8M in 2025 — but the GAAP operating loss was $145.9M, so the bottom-line print is driven by non-operating items (valuation marks, deferred tax). The operating business is still burning cash; the headline profit is an artifact.
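The growth rates and the gap between the two income lines can be reproduced in a few lines of Python. This is a sketch on the quoted figures only; "below the line" is our shorthand for everything between operating income and net income, which the piece does not itemize.

```python
# Revenue in $M, as quoted above.
revenue_musd = {2022: 24.6, 2023: 78.7, 2024: 290.3, 2025: 510.0}
net_income_2025 = 237.8         # GAAP net income, $M
operating_income_2025 = -145.9  # GAAP operating loss, $M

years = sorted(revenue_musd)

# Year-over-year growth, keyed by the later year.
yoy = {
    cur: (revenue_musd[cur] / revenue_musd[prev] - 1) * 100
    for prev, cur in zip(years, years[1:])
}

# Compound annual growth rate across the whole span.
span = years[-1] - years[0]
cagr = ((revenue_musd[years[-1]] / revenue_musd[years[0]]) ** (1 / span) - 1) * 100

# Net contribution of everything below the operating line.
below_the_line = net_income_2025 - operating_income_2025

for year, growth in yoy.items():
    print(f"{year}: {growth:+.1f}% YoY")
print(f"{years[0]}-{years[-1]} CAGR: {cagr:.1f}%")
print(f"Implied below-the-line items: ${below_the_line:.1f}M")
```

The bridge is the point: the implied below-the-line contribution is larger than the reported net income itself, which is why the profit print says little about the operating business.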
The concentration disclosed in the S-1 is the part that matters more than the curve:
- G42 (UAE): 24% of 2025 revenue (was 85% of 2024)
- MBZUAI (Mohamed bin Zayed University of AI): 62% of 2025 revenue
- Together: ~86% of 2025 revenue, with both entities flagged in the filing as related parties to each other
In practice the apparent diversification away from G42 in 2025 was reallocation between connected Abu Dhabi entities, not new customer acquisition. The total UAE-linked share didn't move — it just got two columns instead of one.
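A short Python sketch of the "two columns instead of one" point, treating the two related parties as a single cluster. It assumes G42's 85% was the whole UAE-linked share in 2024, since no separate 2024 MBZUAI figure is given here; the names are ours.

```python
revenue_2025_musd = 510.0

# Share of revenue by customer, as quoted above.
shares_2025 = {"G42": 0.24, "MBZUAI": 0.62}
uae_share_2024 = 0.85  # assumption: G42 alone stands in for the 2024 cluster

uae_share_2025 = sum(shares_2025.values())
other_share_2025 = 1 - uae_share_2025

# Change in the cluster's share, in percentage points.
shift_pp = (uae_share_2025 - uae_share_2024) * 100

print(f"UAE-linked share 2025: {uae_share_2025:.0%}")
print(f"Everyone else 2025:    {other_share_2025:.0%} "
      f"(~${other_share_2025 * revenue_2025_musd:.0f}M)")
print(f"Change vs 2024:        {shift_pp:+.0f} pp")
```

Viewed as one counterparty, the cluster's share is essentially flat year over year, and the revenue from all other customers combined is small in absolute dollars.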
The backlog tells the same story with the next leg on top:
- $1.43B in long-term commitments from G42
- $10B / 750 MW OpenAI deal signed January 2026, running through 2028
- Total disclosed backlog: ~$24.6B, of which ~80% is OpenAI
So on a revenue basis the customer base is roughly 86% UAE today, and on a backlog basis it pivots hard to OpenAI for 2026–2028. Three customers explain the entire forward business. Two of them are related-party Abu Dhabi entities; the third is a private company whose own runway depends on Microsoft's compute pricing and Stargate's actual build pace.
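The backlog figures can be cross-checked against each other. The sketch below takes the numbers exactly as quoted and makes no attempt to reconcile them; in particular, the OpenAI dollars implied by an ~80% share and the $10B headline deal size are two different numbers, and the text does not say what accounts for the difference. Variable names are ours.

```python
# All figures in $B, as quoted above.
total_backlog = 24.6
openai_share_quoted = 0.80     # "~80% is OpenAI"
openai_deal_headline = 10.0    # "$10B / 750 MW" deal
g42_commitments = 1.43
revenue_2025 = 0.510

# OpenAI dollars implied by the quoted share of backlog.
openai_implied = total_backlog * openai_share_quoted

# Share of backlog the headline deal size alone would represent.
headline_share = openai_deal_headline / total_backlog

g42_share = g42_commitments / total_backlog

# Backlog expressed in years of 2025 revenue.
years_of_revenue = total_backlog / revenue_2025

print(f"OpenAI implied by ~80% share: ${openai_implied:.1f}B")
print(f"Headline deal as % of backlog: {headline_share:.0%}")
print(f"G42 commitments as % of backlog: {g42_share:.0%}")
print(f"Backlog / 2025 revenue: {years_of_revenue:.0f}x")
```

Either way the conclusion holds: G42's long-term commitments are a small slice of the backlog, and the forward book is dominated by a single counterparty.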
The CFIUS overhang, and why it still matters
Cerebras filed its first S-1 in September 2024 and was forced to withdraw after the Committee on Foreign Investment in the United States (CFIUS) opened a review of G42's minority stake. The review concluded in October 2025 after G42's holding was restructured to non-voting shares. That cleared the path for the May 2026 listing.
The legal entanglement is resolved. The economic one isn't. ~86% of 2025 revenue still flows from a foreign-government-linked customer cluster, and the export-control posture for Middle East AI compute has been an active, bipartisan US policy file since 2023. If the next administration tightens the H20/Blackwell-equivalent rules to cover wafer-scale specialty silicon — a category that didn't exist when the current regime was written — the customer concentration becomes a regulatory tail.
This isn't a base-case bear. But it's the kind of risk the prospectus prices in once via boilerplate and the market re-prices repeatedly as headlines arrive.
Where CBRS slots in the bubble taxonomy
For readers familiar with our 12 editorial AI bubbles framework: CBRS belongs in the Semiconductors / Compute bloc on paper — same demand signal (AI capex), same end-use (model inference) — but the residualized correlation will almost certainly print lower than NVDA/AMD/AVGO within that bloc, and the reason is the same dilution argument we ran on Hyperscalers in reverse.
NVDA's stock responds to: data-center revenue, gaming, automotive, Mellanox, software licensing, China export-rule news, broad AI capex sentiment. The AI inference thesis is one of seven drivers.
CBRS's stock will respond to: G42 renewal cadence, OpenAI deployment milestones, AWS Bedrock ramp, one CFIUS headline, one Stargate timeline update. The AI inference thesis is essentially the whole stock.
That's the Quantum failure mode in reverse: pure-thesis exposure with nothing to dilute it. Net result: CBRS's residualized return won't track NVDA tightly even though both are "AI compute." It'll trade more like a pre-revenue Quantum name with one mega-customer added — episodic, headline-driven, sized off backlog announcements.
We'll add CBRS to the live Semiconductors / Compute bubble dashboard after the first 30 sessions of post-IPO trading and report what the residualized correlation actually prints. Our prior: it joins the bloc but doesn't tighten it — and the within-bloc residualization for the GPU pure-plays may improve once CBRS pulls the inference-specialist exposure out.
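For readers who want the mechanics, here is a minimal sketch of one common way to compute a residualized correlation: strip a single market factor out of each return series with ordinary least squares, then correlate what's left. It is an illustration on synthetic data, not necessarily the exact procedure behind the dashboard; the factor choice, betas, and noise levels are all hypothetical.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    """Pearson correlation."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))


def residualize(returns, factor):
    """Remove the part of `returns` explained by `factor` (one-variable OLS)."""
    beta = cov(returns, factor) / cov(factor, factor)
    alpha = mean(returns) - beta * mean(factor)
    return [r - alpha - beta * f for r, f in zip(returns, factor)]


# Synthetic daily returns: a market factor, a shared theme, and own noise.
rng = random.Random(0)
n = 250
market = [rng.gauss(0, 0.01) for _ in range(n)]
theme = [rng.gauss(0, 0.01) for _ in range(n)]
stock_a = [1.2 * m + 0.8 * t + rng.gauss(0, 0.01) for m, t in zip(market, theme)]
stock_b = [1.5 * m + 0.8 * t + rng.gauss(0, 0.01) for m, t in zip(market, theme)]

raw_corr = corr(stock_a, stock_b)
resid_corr = corr(residualize(stock_a, market), residualize(stock_b, market))

print(f"Raw correlation:          {raw_corr:.2f}")
print(f"Residualized correlation: {resid_corr:.2f}")
```

The raw number mixes market beta with the shared theme; the residualized number is what remains once the market is taken out. For CBRS, the question is how much of that remainder lines up with the rest of the bloc.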
The AWS read-through
March 13, 2026: AWS announced it would be the first cloud provider to offer Cerebras's disaggregated inference, delivered through Amazon Bedrock. The architecture pairs AWS Trainium silicon with WSE for "5× more high-speed token capacity in the same hardware footprint."
This matters in two directions:
- For Cerebras, it's validation that the largest hyperscaler is willing to put specialist silicon next to its own custom accelerators rather than route inference through NVIDIA. That's the strategic asset the customer-concentration disclosure doesn't capture: distribution leverage in the one channel that owns the inference demand.
- For NVIDIA, the read is more nuanced than "Cerebras is taking market share." NVDA's response was to acquire $20B of Groq assets in December 2025 and announce Groq-architecture-based products months later. The frontier inference market is partitioning into a specialist tier, and NVIDIA is buying into it rather than ceding it. The competitive pressure on NVDA gross margin from this carve-out is real but narrow: frontier-latency inference is a small fraction of total AI silicon revenue today, even if it grows fastest.
The lockup math
CBRS has an unusually compressed lockup schedule. Over 60 million shares unlock by the Q2 2026 earnings release — that's roughly 2× the IPO float, hitting the market less than 90 days after the debut.
Concrete implication: the IPO supply on the tape today (30M shares from the offer) is structurally light. The flow that matters for the medium-term price is what happens at the lockup expiry. With 20× oversubscription in the book, the secondary supply will land into demand that already missed the initial allocation. But with insiders sitting on a paper gain of roughly 68% over the IPO offer price as of the day-one close, the supply incentive is also high.
This is the textbook setup for a sharp post-lockup volatility spike, either direction. The first 90 days of CBRS as a public stock are not a representative sample of where it trades long-term. The fair-value question gets a clean read only after the lockup absorbs and the holder base normalizes.
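The supply side of that setup, in numbers. A rough Python sketch that treats "over 60 million" as exactly 60M (so a lower bound) and marks the unlocked shares at the day-one close; it says nothing about how many holders actually sell.

```python
ipo_shares = 30_000_000
unlock_shares = 60_000_000   # "over 60 million": treated as a lower bound
offer_price = 185.00
close_price = 311.07         # day-one close

# Unlocking supply relative to the shares sold in the IPO.
unlock_multiple = unlock_shares / ipo_shares

# Tradable shares once the lockup expires, ignoring any other changes.
float_after_unlock = ipo_shares + unlock_shares

# Dollar value of the unlocking shares at the day-one close.
unlock_value = unlock_shares * close_price

# Paper gain over the offer price at the day-one close.
gain_vs_offer = (close_price / offer_price - 1) * 100

print(f"Unlock vs IPO shares:    {unlock_multiple:.1f}x")
print(f"Float after unlock:      {float_after_unlock / 1e6:.0f}M shares")
print(f"Unlock value at close:   ${unlock_value / 1e9:.2f}B")
print(f"Gain vs offer at close:  {gain_vs_offer:+.1f}%")
```

At the day-one close, the unlocking stock is worth several times the cash the IPO itself raised, which is the scale of supply the holder base has to absorb.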
Five things the tape isn't yet pricing
In rough order of conviction:
- Customer-concentration normalization is gradual, not instant. UAE share won't drop below 50% before 2027 even on the most generous OpenAI ramp. Anyone modeling CBRS like a "diversified AI infra" name is mispricing the next four quarters' political risk surface.
- The OpenAI deal is dollars and compute capacity locked, not committed revenue. $10B / 750 MW through 2028 is a capacity-purchase agreement. Actual revenue recognition depends on OpenAI's own datacenter buildout pace, which is gated by power and zoning more than chips. The backlog converts at a pace OpenAI controls, not Cerebras.
- Inference TAM is harder to size than training TAM, because per-query economics scale with model size × token volume × latency requirements. The bull case for inference-TAM expansion assumes frontier models stay big AND latency stays a moat AND on-device inference doesn't eat the long tail. Each of those three is contestable, and the TAM estimates you'll see cited span anywhere from $50B to $200B by 2030, a range too wide to call a consensus.
- The "21× B200" benchmark is workload-specific. Llama 3 70B reasoning is the use case Cerebras is optimized for. On vanilla embedding workloads, image generation, or training, the margin compresses or inverts. The all-in TCO advantage shrinks materially outside the latency-bound inference lane.
- The GAAP profitability print is non-operating. The 2025 net income line is not a sustainable operating result. The company is still investing through its cost line. Forward EBITDA modeling that anchors off the $237.8M net-income figure rather than the $145.9M operating loss is reading the wrong field.
The live Semiconductors / Compute bubble dashboard tracks NVDA, AMD, AVGO, TSM, INTC, MRVL, MU, QCOM. CBRS will be added after 30 post-IPO sessions, with the residualized-correlation result published whichever way the data prints.
The deeper read: CBRS is the first listed name in the AI stack where the entire investment thesis is "inference specialization wins a structural lane against general-purpose GPUs." That thesis is empirically testable in a way most AI-infrastructure names aren't, because the customer base is small enough to track quarter by quarter and the workload (frontier-model inference) is concentrated enough to read off public benchmarks.
It's a clean trade — for or against — in a sector where most trades are dirty.
For taxonomy context: The 12 AI bubbles, ranked by empirical realness. For the related methodological piece on why concentrated single-thesis names cluster tightly: What is residualized correlation?.