Hyperscalers: the failed AI bubble (and why we publish failures)

We thought "AI-Capex Hyperscalers" was going to be one of the strongest bubbles in our taxonomy. The thesis looked overwhelming: $MSFT, $GOOGL, $AMZN, $META and $ORCL are committing the bulk of the multi-hundred-billion-dollar AI infrastructure spend in 2026. They're the demand side of the AI buildout. Surely they trade as a bloc.

We measured. They don't.

This article is the methodological-honesty piece. It walks through why the most obvious "AI bubble" in retail intuition fails the empirical test — and why we publish that result on the platform instead of papering over it.

The setup: what the editorial group looks like

The five names and their projected 2026 capex:

$MSFT — ~$120B in projected 2026 capex, primarily Azure / OpenAI compute
$GOOGL — ~$185B, Google Cloud + TPU buildout
$AMZN — ~$200B, AWS infrastructure + Anthropic deals
$META — ~$130B, AI training infrastructure + Llama serving
$ORCL — ~$50B, OCI catching up

Combined: ~$685B of AI infrastructure spend in a single year. Real money, real demand signal, real economic theme. Through a fundamentals lens, they share one exposure: every dollar of AI capex either flows through these companies or runs into them as a constraint.

The first measurement — raw correlation looks compelling

Compute the pairwise correlation matrix of the five names' raw daily returns over a 252-day window (roughly one trading year). The pairwise values cluster between 0.55 and 0.70, with an average of ~0.65.

In retail-facing dashboards, this is where the analysis stops: "These stocks all correlate at 0.65. Strong bloc." Thematic ETF marketing slides lean on exactly this kind of number. The story sells itself.

If we stopped here, we'd publish a Hyperscalers bubble page declaring it validated, ship a daily basket-tracking widget, and charge for a bubble-correlation alert. This is what most thematic dashboards do.

We don't stop here.

The residualization step — and the collapse

Residualization, in 90 seconds: every stock's daily return can be split into "what the market did" plus "what was specific to this stock." Raw correlation conflates the two. Residualized correlation (the correlation between the residuals left after regressing each stock's returns on $SPY's returns) measures co-movement in the idiosyncratic component only. That is the part that says "these stocks are responding to something specific the market isn't."

For Hyperscalers, the residualized correlation is approximately 0.05.

Effectively zero.

The 0.65 raw correlation was driven almost entirely by shared market exposure. These are mega-caps with similar, substantial betas to the S&P 500. When the S&P moves, they move together because they all ride the same market factor, not because they're responding to AI capex news. Strip the market exposure out and essentially no idiosyncratic co-movement is left among the five.

A residualized correlation of 0.05 is what you'd get if you randomly picked five mega-cap stocks from any sector. There is no AI-thematic signal in this bloc. It's SPY in a costume.
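A one-factor model is a quick way to check that market exposure alone can produce a raw correlation near 0.65. Suppose two stocks have market betas β_i and β_j, market volatility σ_m, and uncorrelated idiosyncratic volatilities σ_i and σ_j. Their raw correlation is then β_i·β_j·σ_m² / sqrt((β_i²σ_m² + σ_i²)(β_j²σ_m² + σ_j²)), while their residualized correlation is zero. The parameters below are illustrative assumptions, not estimates fitted to the five names: β = 1.1 for both, 1% daily market volatility, and 0.8% daily idiosyncratic volatility.

```python
import math


def one_factor_corr(beta_i: float, beta_j: float, mkt_vol: float,
                    idio_vol_i: float, idio_vol_j: float) -> float:
    """Raw correlation of two stocks whose only common driver is the market.

    Assumes a one-factor model with uncorrelated idiosyncratic returns, so the
    residualized correlation is zero by construction.
    """
    cov = beta_i * beta_j * mkt_vol ** 2  # shared market component
    var_i = beta_i ** 2 * mkt_vol ** 2 + idio_vol_i ** 2
    var_j = beta_j ** 2 * mkt_vol ** 2 + idio_vol_j ** 2
    return cov / math.sqrt(var_i * var_j)


rho = one_factor_corr(1.1, 1.1, 0.01, 0.008, 0.008)
print(round(rho, 2))  # prints 0.65
```

Changing the idiosyncratic volatility moves the raw figure, but the residualized figure stays at zero. That is why raw correlation alone can't tell a thematic bloc apart from a set of high-beta mega-caps.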

Why this happens — the dilution problem

The structural reason Hyperscalers fail the test: AI is too small a fraction of each company's revenue for the AI thesis to dominate the stock's idiosyncratic moves.

Rough breakdown by 2026 revenue mix:

$MSFT: AI-attributable revenue ~10-15% of total. Office, legacy enterprise, gaming, etc. dominate.
$GOOGL: ~10% AI-attributable. Search ads still ~75% of revenue.
$AMZN: AI-related AWS ~5-8% of consolidated revenue. Retail, AWS-non-AI, advertising dominate.
$META: Most "AI" investment is internal (recommendation systems, ad targeting). Externally monetized AI revenue is minimal.
$ORCL: OCI is growing fast but core database business is still the majority.

When the AI narrative shifts, each stock's reaction is filtered through 80-90% of revenue that has nothing to do with AI. The signal-to-noise ratio is poor. Idiosyncratic returns are dominated by company-specific factors (Microsoft's enterprise renewal cycle, Google's ad pricing, Amazon's retail margins) rather than the shared AI exposure.

Compare this with our Quantum bubble, which holds a 0.76 residualized correlation. Those are pre-revenue companies whose entire stock price is the quantum thesis. Pure exposure → tight residualized clustering. Hyperscalers are diluted exposure → noise.

Where to actually express AI capex

The deeper point isn't that AI capex isn't real — it obviously is. It's that Hyperscalers are the wrong vehicle to express it.

If you want pure AI-capex exposure, look at where the capex is actually flowing:

Semi Equipment (residualized 0.82) — $ASML, $AMAT, $LRCX, $KLAC. The cleanest single expression. AI capex flows through chip-fab capacity buildouts, and these are the picks-and-shovels.
Memory / HBM (residualized ~0.71) — $MU + SK Hynix + Samsung. Memory is the pacing constraint on AI compute; the bloc captures pure exposure to "AI compute volume × HBM ASP."
Datacenter Power (residualized ~0.55) — $VST, $CEG, $NRG, etc. The next-layer constraint. AI eats electricity and the utilities supplying AI campuses have small but pure exposure to the demand signal.

Each of these blocs has tighter residualized clustering than Hyperscalers because the AI thesis is not diluted by 80-90% of unrelated revenue.

Why we publish the failure

The methodological lesson generalizes beyond Hyperscalers. Editorial taxonomies that group by story ("companies investing in AI") fail when the underlying revenue exposures don't cohere. The story can be true at the macro level — there is an AI capex bubble — without the story being expressible at the equity level through the names that allocate the capex.

The discipline we hold: build editorial bubbles, test empirically, publish whichever answer the data gives.

If we only published the bubbles that validated, the platform would be a thematic-investing dashboard that always confirms whatever narrative is fashionable. That's a marketing tool, not research.

If we publish the failures alongside the successes, the platform is methodologically honest. When we say a bubble is real (Quantum, Semi Equipment, Memory), the claim has weight because we're not afraid to say which bubbles aren't.

The Hyperscalers page on the platform stays public with the failed-validation result intact. Same for AI Software (~0.10 residualized) and Robotics (~0.20 residualized). Honest "we measured this and it doesn't work" is more credible than padded "this thematic narrative is real because [analyst convention]."

Receipts > narratives. The platform's job is to surface the empirical truth about thematic flows, not to validate every story that happens to feel right. Half the editorial AI bubbles aren't real blocs. We say so.
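The ranking and the failed-validation flag can be sketched from the residualized figures quoted in this article, which cover seven of the twelve bubbles. The article doesn't state the cut-off behind the flag. `VALIDATION_THRESHOLD = 0.30` below is a hypothetical value used only for illustration, and `rank_bubbles` is likewise a hypothetical helper.

```python
# Residualized correlations as quoted in this article (7 of the 12 bubbles).
RESIDUALIZED = {
    "Semi Equipment": 0.82,
    "Quantum": 0.76,
    "Memory / HBM": 0.71,
    "Datacenter Power": 0.55,
    "Robotics": 0.20,
    "AI Software": 0.10,
    "Hyperscalers": 0.05,
}

# Hypothetical cut-off for illustration only; the article does not give one.
VALIDATION_THRESHOLD = 0.30


def rank_bubbles(scores: dict[str, float],
                 threshold: float) -> list[tuple[str, float, bool]]:
    """Sort bubbles by residualized correlation, highest first, and flag each."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score >= threshold) for name, score in ranked]


for name, score, validated in rank_bubbles(RESIDUALIZED, VALIDATION_THRESHOLD):
    status = "validated" if validated else "failed validation"
    print(f"{name:<18}{score:>6.2f}  {status}")
```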

Practical takeaways

For traders trying to express an AI thesis:

1. Don't buy Hyperscalers as a basket and call it AI exposure. You're effectively buying a high-beta SPY. Whatever AI thesis you have is being diluted by 85%+ of these companies' revenue that has nothing to do with AI. If you want SPY exposure, buy SPY at lower fees.

2. If you want one of the names individually, fine — but it's an idiosyncratic bet. A long position in $MSFT is a bet on Microsoft's enterprise execution + Azure AI growth + Office Copilot adoption. Buy it on those merits. Don't pretend it's a "bloc" play.

3. The actual AI capex blocs are upstream. $ASML orders are the cleanest single signal that AI capex is real and accelerating. $VRT backlog is the cleanest signal that it's hitting the rack-deployment phase. Hyperscaler stock prices are not in this category.
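A one-factor model makes point 1 concrete. In an equal-weight basket, uncorrelated company-specific moves partly cancel, so what remains is mostly market. The sketch below assumes β = 1.1, 1% daily market volatility, 0.8% daily idiosyncratic volatility per name, and zero residual correlation. These are illustrative values, not fitted ones. Under them, a five-name basket has a beta of 1.1, and about 90% of its daily variance is explained by the market.

```python
def basket_stats(betas: list[float], mkt_vol: float,
                 idio_vols: list[float]) -> tuple[float, float]:
    """Beta and R^2 vs the market of an equal-weight basket.

    Assumes a one-factor model with uncorrelated idiosyncratic returns.
    """
    n = len(betas)
    basket_beta = sum(betas) / n
    market_var = (basket_beta * mkt_vol) ** 2
    # With weights 1/n, uncorrelated residual variances add with weight 1/n^2.
    idio_var = sum(v ** 2 for v in idio_vols) / n ** 2
    return basket_beta, market_var / (market_var + idio_var)


beta, r2 = basket_stats([1.1] * 5, 0.01, [0.008] * 5)
print(round(beta, 2), round(r2, 3))  # prints 1.1 0.904
```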

Browse all 12 bubbles ranked by empirical realness for the full taxonomy. Or read the methodology piece on residualized correlation for the technique that produced the Hyperscalers result above.

The live Hyperscalers bubble page shows the daily-updated raw and residualized correlations for the five names, with the failed-validation flag intact. Refreshed nightly after the US close.
