What is residualized correlation? (And why most thematic ETFs lie)
The market beta hidden in 'AI bubbles' makes correlation matrices useless until you strip it. Worked example: why Hyperscalers fail the test and Quantum passes.
When you read that "AI stocks" all moved together this morning — $NVDA up 2%, $MSFT up 1.8%, $GOOGL up 2.1%, $AMZN up 1.7% — your first instinct is to call this a thematic move. They share an exposure. They're a bloc. The "AI bubble" is real.
That instinct is almost always wrong.
The reason these names moved together this morning isn't because they share a thematic AI exposure. It's because the market was up 1.6% and they're all $1T+ market-cap names with high beta. Strip the market move out and you'll find the residual co-movement — what's left after subtracting "what the market did today" — is essentially zero. The bloc is a costume.
This is the single most important methodological insight in thematic investing, and almost no one publishes the residualized data. So most "AI bubble" charts are misleading by construction.
Two parts of every stock return
Every daily return on a stock can be decomposed into two parts:
- Market beta — what the broad market did today, scaled by how much this stock typically moves with the market.
- Idiosyncratic — what's specific to this stock, after subtracting the market move.
When the S&P is up 1%, a stock with a beta of 1.5 is "expected" to be up 1.5% just because of market gravity. If it's actually up 1.5%, the idiosyncratic return for the day is zero — nothing happened to this stock specifically. Whatever it did was 100% explained by the market.
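The arithmetic in that example can be sketched in a few lines. This is a minimal illustration, not the platform's code: the function name `split_return` is hypothetical, returns are expressed in percent, and any regression intercept (alpha) is ignored for simplicity.

```python
def split_return(stock_return_pct, market_return_pct, beta):
    """Split one day's return (in percent) into market and idiosyncratic parts.

    Simplification: no intercept term, so idiosyncratic = actual - beta * market.
    """
    market_part = beta * market_return_pct
    idiosyncratic = stock_return_pct - market_part
    return market_part, idiosyncratic


# The case from the text: S&P up 1%, beta of 1.5, stock up 1.5%.
market_part, idio = split_return(1.5, 1.0, 1.5)
print(f"market part: {market_part:.2f}%, idiosyncratic: {idio:.2f}%")
# prints "market part: 1.50%, idiosyncratic: 0.00%"

# Same market day, but the stock is up 2.5%: one point is stock-specific.
market_part, idio = split_return(2.5, 1.0, 1.5)
print(f"market part: {market_part:.2f}%, idiosyncratic: {idio:.2f}%")
# prints "market part: 1.50%, idiosyncratic: 1.00%"
```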
Now scale this up to a basket of stocks. If you compute their pairwise correlation using raw returns — the actual daily price changes — you'll find every mega-cap stock correlates with every other mega-cap stock at 0.5–0.8. That's not because they share a theme. It's because they all move with the market, and they're all in the market.
The residualization procedure
The fix is straightforward and well understood in academic finance, but it rarely shows up in retail-facing dashboards:
- For each stock, regress its daily returns on the market's daily returns (we use $SPY) over the same window.
- Save the residuals — what's left after the regression. These are the daily returns the market couldn't explain.
- Compute the correlation between residuals across stocks instead of between raw returns.
What you get is the correlation that reflects only the idiosyncratic component. If two stocks still cluster after this, they share a real exposure beyond market beta. That's a thematic bloc. That's what we mean by a "bubble" in the QuantAbundancia taxonomy.
The headline test: residualized correlation strips the market beta and keeps the thematic flow. If the residualized number stays high, the bloc is real. If it collapses to near zero, the apparent co-movement was just everyone going up together because the tide came in.
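The three steps above can be sketched as follows. This is a sketch under stated assumptions rather than the platform's implementation: it uses a plain one-factor least-squares regression with an intercept, the helper names (`ols_residuals`, `correlation`, `residualized_correlation`) are hypothetical, and the return series are synthetic random numbers, not real $SPY or stock data. It sticks to the standard library so it runs anywhere; a production version would more likely use numpy or pandas.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def ols_residuals(y, x):
    """Step 1 and 2: regress y on x (with intercept) and return the residuals."""
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var_x
    alpha = my - beta * mx
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]


def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def residualized_correlation(returns_a, returns_b, market_returns):
    """Step 3: correlate the residuals instead of the raw returns."""
    res_a = ols_residuals(returns_a, market_returns)
    res_b = ols_residuals(returns_b, market_returns)
    return correlation(res_a, res_b)


# Synthetic illustration: two stocks that share ONLY market exposure.
rng = random.Random(0)
market = [rng.gauss(0, 0.01) for _ in range(252)]
stock_a = [1.2 * m + rng.gauss(0, 0.01) for m in market]
stock_b = [1.1 * m + rng.gauss(0, 0.01) for m in market]

print("raw:", round(correlation(stock_a, stock_b), 2))
print("residualized:", round(residualized_correlation(stock_a, stock_b, market), 2))
```

Because the two synthetic stocks have nothing in common beyond the market factor, the raw figure should come out clearly positive while the residualized figure should sit near zero.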
Worked example #1 — Hyperscalers (the failed bubble)
Take the natural editorial grouping for "AI capex hyperscalers": $MSFT, $GOOGL, $AMZN, $META, $ORCL. They're the five companies committing the bulk of the multi-hundred-billion-dollar AI infrastructure spend in 2026. Surely they share a thesis.
Run the correlation matrix on raw daily returns over a 252-day window and the picture looks compelling: pairwise correlations cluster around 0.55–0.70. If you stop here you'll conclude this is a coherent bloc, build a basket, and call it your AI capex trade.
Then residualize. Strip $SPY beta from each member. Re-run the correlation matrix on the residuals.
The numbers collapse to near zero.
What was happening: these are five of the largest market-cap names on the exchange. They have similar betas. When the S&P moves, they move together — but for S&P reasons, not for AI reasons. The raw correlation was being driven entirely by their shared market exposure, which is a property of being mega-cap, not a property of being AI-exposed.
The honest read: if you "play AI capex by buying hyperscalers," you're effectively buying $SPY with extra steps. Whatever AI thesis you have is being diluted by the 80–90% of these companies' revenue that has nothing to do with AI infrastructure spend.
This is why we publish the Hyperscalers bubble page with the failed-validation result rather than papering over it. Receipts beat narratives.
Worked example #2 — Quantum (the strongest validated bubble)
Now run the same procedure on Quantum: $QBTS, $RGTI, $IONQ, $ARQQ, $QUBT.
Raw 252-day correlation is high — around 0.78. Already an interesting starting point.
Residualize: 0.76.
Almost no drop.
These stocks share something the market can't explain. They co-move on news that has nothing to do with broad market sentiment. The signal is: when one quantum-computing name reports a benchmark, the others rip. When government quantum funding gets cut, the entire bloc tanks. The shared exposure is specific to the thematic story, not to market beta.
This is what a real bubble looks like in our framework. The Hyperscalers had a 0.65 → ~0.05 collapse under residualization. Quantum has a 0.78 → 0.76 hold. One of these is a tradeable bloc; the other is a marketing label.
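To see how both outcomes can arise, here is a small simulation. Everything in it is an assumption made for illustration: the data is synthetic, the "theme" factor and all volatilities and betas are invented, and the helper names are hypothetical. It does not reproduce the Hyperscalers or Quantum figures above; it only shows that a bloc driven by market beta alone collapses under residualization while a bloc with a shared non-market factor holds.

```python
import itertools
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _corr(a, b):
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def _residuals(y, x):
    """Residuals of a least-squares regression of y on x with intercept."""
    mx, my = _mean(x), _mean(y)
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def mean_pairwise_corr(series):
    """Average correlation over all distinct pairs of series."""
    pairs = list(itertools.combinations(series, 2))
    return sum(_corr(a, b) for a, b in pairs) / len(pairs)


def raw_vs_residualized(members, market):
    raw = mean_pairwise_corr(members)
    resid = mean_pairwise_corr([_residuals(r, market) for r in members])
    return raw, resid


rng = random.Random(42)
n_days = 252
market = [rng.gauss(0, 0.01) for _ in range(n_days)]
theme = [rng.gauss(0, 0.03) for _ in range(n_days)]  # invented shared theme factor

# Bloc with market exposure only: beta * market + stock-specific noise.
beta_only = [[1.2 * m + rng.gauss(0, 0.008) for m in market] for _ in range(5)]
# Bloc with market exposure AND a shared theme factor.
themed = [[1.5 * m + t + rng.gauss(0, 0.015) for m, t in zip(market, theme)]
          for _ in range(5)]

for label, bloc in [("beta-only bloc", beta_only), ("themed bloc", themed)]:
    raw, resid = raw_vs_residualized(bloc, market)
    print(f"{label}: raw {raw:.2f} -> residualized {resid:.2f}")
```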
Why this matters in practice
Most thematic ETFs are sold using raw correlation matrices in their marketing decks. That's not necessarily dishonest — it just reflects the industry standard for "see, these stocks move together." But for a trader trying to express a specific thesis, the raw matrix is systematically misleading.
Three concrete consequences:
- Position sizing: A bloc with 0.76 residualized correlation means you can't really "diversify" within it; if one name falls 20% on negative thematic news, the rest tend to fall with it. So your position size in any single name should be driven by your thesis budget for the bloc, not the name. A bloc that disappears under residualization (Hyperscalers) means treating each name as a separate idiosyncratic bet.
- Hedging: You can hedge thematic exposure with the bloc's primary ETF if the residualized correlation is high. You can't hedge a phantom bloc.
- Catalyst response: When a new piece of thematic news drops, a real bloc moves uniformly. A fake bloc moves heterogeneously, with one dominant name (typically the largest) absorbing most of the move while smaller names lag.
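One rough way to put a number on the position-sizing point is the textbook "effective number of independent bets" for an equal-weight basket: with N names of equal volatility and average pairwise correlation ρ, basket variance scales as (1 + (N − 1)ρ) / N, which gives N_eff = N / (1 + (N − 1)ρ). This is a simplification offered for intuition, not the platform's sizing rule; the function name is hypothetical, and because the inputs are residualized correlations the result describes only the idiosyncratic part of the basket.

```python
def effective_bets(n_names, avg_corr):
    """Effective number of independent bets in an equal-weight, equal-vol basket."""
    return n_names / (1 + (n_names - 1) * avg_corr)


# Five names at the residualized correlations quoted in the text.
print(f"rho = 0.76: {effective_bets(5, 0.76):.2f} effective bets")
# prints "rho = 0.76: 1.24 effective bets"
print(f"rho = 0.05: {effective_bets(5, 0.05):.2f} effective bets")
# prints "rho = 0.05: 4.17 effective bets"
```

On the thematic component, five names at 0.76 behave like little more than one position, while five names at roughly 0.05 behave like about four separate ones.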
The methodology behind the platform
This isn't a one-off computation we ran for a single thread. The QuantAbundancia platform recomputes residualized correlations daily, across all 12 editorial bubbles, with a 252-day rolling window (roughly one year of trading days). Every bubble page on the site shows both the raw and the residualized number side by side. When the residualized number collapses, we say so publicly. Failed bubble pages stay published.
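A rolling recomputation of this kind could look roughly like the sketch below. It is an assumption about the general shape of such a job, not a description of the platform's pipeline: the function name `rolling_residualized_corr` is hypothetical, the data is synthetic, and the regression is refit from scratch for every window, which is simple but not the most efficient approach.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _residuals(y, x):
    """Residuals of a least-squares regression of y on x with intercept."""
    mx, my = _mean(x), _mean(y)
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def _corr(a, b):
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def rolling_residualized_corr(a, b, market, window=252):
    """One residualized correlation per trailing window of `window` days."""
    out = []
    for end in range(window, len(market) + 1):
        start = end - window
        res_a = _residuals(a[start:end], market[start:end])
        res_b = _residuals(b[start:end], market[start:end])
        out.append(_corr(res_a, res_b))
    return out


# Synthetic series: 300 days, two stocks sharing an invented theme factor.
rng = random.Random(7)
n_days = 300
market = [rng.gauss(0, 0.01) for _ in range(n_days)]
theme = [rng.gauss(0, 0.02) for _ in range(n_days)]
a = [1.3 * m + t + rng.gauss(0, 0.01) for m, t in zip(market, theme)]
b = [1.1 * m + t + rng.gauss(0, 0.01) for m, t in zip(market, theme)]

series = rolling_residualized_corr(a, b, market)
print(len(series), "windows; latest value:", round(series[-1], 2))
```

With 300 days of data and a 252-day window, the function returns one value for each of the 49 possible window end dates.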
The deeper point is methodological: we'd rather kill an editorial bubble we like than keep a number that doesn't reproduce. The platform exists to publish that distinction transparently — not to validate every thematic story that happens to feel right.
That's the test. That's why we use 252 days. That's why every chart on the site shows the residualized line.
The narrative says "AI bubble." The data says: about half of them are real.
Want the data live? Every bubble page on the platform shows the latest residualized correlation, member-stock breakdown, and daily flow context. Free, refreshed nightly after the US close.