Why correlation > narrative in thematic investing
Editorial taxonomies tell you what stocks should trade together. Correlation tells you what stocks actually trade together. Most of the time, those are different lists. The gap is the alpha.
Every thematic narrative — "AI stocks," "the green energy trade," "infrastructure plays," "the obesity drug bubble" — is a story. Stories are how the financial media talks. Stories are how ETF marketing works. Stories are how retail decides what to buy.
Mechanics are different. Mechanics are how money actually flows. When a sector ETF takes a billion in inflows on Tuesday, the fund mechanically buys its constituents in proportion to their weight, regardless of whether they share a story. When a hyperscaler signals a capex revision, the upstream demand chain responds, regardless of editorial label.
Stories and mechanics overlap sometimes. Often, they don't. The traders who survive thematic cycles are the ones who measure the mechanics instead of trusting the stories.
What "AI stocks" actually contains
Pick any five well-known "AI stocks": $NVDA, $AMD, $MSFT, $ASML, $PLTR.
The narrative says: these all benefit from the AI supercycle. Buy them as a basket and you've expressed your AI thesis.
The data says: these are five stocks responding to five different primary factors.
- $NVDA moves on its own idiosyncratic story (margin extension, MI300 vs Blackwell competitive dynamics, Blackwell shipping).
- $AMD moves with NVDA secondarily but has its own data-center vs PC mix dynamics.
- $MSFT moves with the broader software factor — its AI revenue is 10-15% of total; the rest is Office, gaming, legacy enterprise. Most of MSFT's daily idiosyncratic move is unrelated to AI.
- $ASML moves with the semi-equipment bloc ($AMAT, $LRCX, $KLAC) — same buyer, same demand signal. Different bloc from NVDA.
- $PLTR is its own animal — political risk + government contracts dominate. The AI association is largely incidental.
Run a residualized correlation matrix on these five. The pairwise correlations between them are mostly low. The structure that emerges is: NVDA + AMD cluster with each other (Compute / GPU bloc), ASML clusters with the Semi Equipment bloc (which doesn't include NVDA or AMD tightly), MSFT clusters with the broader software factor (closer to $CRM than to NVDA), and PLTR clusters with no one.
Five "AI stocks." Three different blocs and one loner. The editorial label is doing no analytical work.
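For readers who want to try this themselves, here is a minimal sketch of one way to compute a residualized correlation matrix. It assumes "residualized" means regressing each stock's returns on one or more common factors (for example a broad market index) and correlating what is left; the platform's actual factor set is described in the methodology article and may differ. The function name `residualized_corr` and the synthetic return series are illustrative, not market data.

```python
import numpy as np

def residualized_corr(returns, factors):
    """Correlate what is left after regressing each stock on common factors.

    returns: (T, N) array of periodic returns, one column per stock.
    factors: (T,) or (T, K) array of factor returns (assumed: a market index).
    """
    returns = np.asarray(returns, dtype=float)
    factors = np.asarray(factors, dtype=float).reshape(len(returns), -1)
    # Design matrix with an intercept so the residuals are mean-zero.
    X = np.column_stack([np.ones(len(factors)), factors])
    # One least-squares fit per stock, solved jointly.
    betas, *_ = np.linalg.lstsq(X, returns, rcond=None)
    residuals = returns - X @ betas
    return np.corrcoef(residuals, rowvar=False)

# Synthetic illustration: stocks a and b share a bloc driver, c does not.
rng = np.random.default_rng(0)
T = 2000
market = rng.normal(0, 0.01, T)
shared = rng.normal(0, 0.01, T)
a = 1.2 * market + shared + rng.normal(0, 0.005, T)
b = 0.9 * market + shared + rng.normal(0, 0.005, T)
c = 1.0 * market + rng.normal(0, 0.011, T)
rets = np.column_stack([a, b, c])

raw = np.corrcoef(rets, rowvar=False)     # inflated by the shared market beta
resid = residualized_corr(rets, market)   # only bloc-level co-movement survives
```

In this toy setup all three raw correlations look meaningful because every series carries market beta. After residualizing, only the a/b pair stays high, which is the pattern the five-stock example describes.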
The market trades by exposure, not by theme
The reason this happens isn't that the editorial taxonomy is "wrong" — it's that the editorial taxonomy is grouping by story (these companies all benefit from AI) while the market trades by exposure (each stock responds to its dominant revenue driver, which may or may not be AI-related).
When $SOXX takes a billion in inflows, every constituent gets bid in the same hour because the ETF mechanically buys in proportion to weight. That's exposure-driven. NVDA may or may not visibly move with that ETF flow: it's in the index, but its idiosyncratic factor often dominates.
When a thematic ETF like $BOTZ ("robotics") takes inflows, the constituents respond to that flow but the constituents as a group don't co-move on robotics-narrative news. They respond to flow, then they respond to their own factors.
This is why Semi Equipment is the strongest empirically-validated bloc in our taxonomy (0.82 residualized correlation): the four members share a single dominant exposure (capex from the same handful of foundries) and almost nothing else. Story and mechanics align.
It's also why Hyperscalers is the canonical failed bloc (~0.05 residualized): the five members nominally share an AI capex story, but their dominant revenue exposures are wildly different (Office, ads, retail, etc.). Story and mechanics diverge.
Worked example: the ARKQ "robotics" basket
A useful concrete case. $ARKQ is the "Autonomous Technology & Robotics" ETF. The marketing narrative: own one ticker, get exposure to the robotics theme.
Look at the constituent residualized correlations and you find a basket of stocks tied to roughly five distinct factors:
- $TSLA dominates the basket weight and moves on FSD news + EV demand. The "robotics" angle (Optimus humanoid) is a small fraction of the stock's idiosyncratic moves.
- ABB, Fanuc trade with the Asian factory automation cycle.
- $SYM moves with Walmart's quarterly capex guidance (its single largest customer).
- $NDSN moves with the precision-automation industrial cycle.
- Smaller names trade idiosyncratically.
The residualized correlation across the basket is approximately 0.20. Translation: there is no robotics bloc that owns a coherent share of these stocks' idiosyncratic moves. Buying ARKQ thinking you're long a robotics theme is buying a basket of unrelated bets that happen to be in the same ETF.
That doesn't mean ARKQ is a bad ETF; it might be a fine basket with reasonable risk-adjusted returns. It just means it's not a thematic expression of robotics in any meaningful empirical sense. The thematic label is a marketing label.
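A basket-level number like the one quoted above can be produced by collapsing a correlation matrix into a single score. The sketch below assumes the score is the plain average of the off-diagonal entries; the article does not say how the platform aggregates, so treat that as an assumption. The `bloc_verdict` helper and its 0.5 cutoff are hypothetical, and the matrix is made up for illustration.

```python
import numpy as np

def mean_pairwise(corr):
    """Average of the off-diagonal entries of a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    upper = corr[np.triu_indices(n, k=1)]  # each pair counted once
    return float(upper.mean())

def bloc_verdict(corr, threshold=0.5):
    """Hypothetical pass/fail flag; the 0.5 cutoff is an illustrative assumption."""
    score = mean_pairwise(corr)
    return score, ("bloc" if score >= threshold else "no bloc")

# Made-up residualized correlations for a three-name basket.
example = np.array([
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
score, verdict = bloc_verdict(example)
```

A simple mean hides structure. A basket with one tight pair and several unrelated names can post the same average as a basket that is uniformly lukewarm, so it is worth looking at the full matrix alongside the score.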
Why this matters for sizing
Three concrete consequences for trade construction:
1. Position sizing is different for real blocs vs. fake blocs.
For a real bloc (Semi Equipment, Quantum), constituent risk is highly substitutable — your sizing budget is for the bubble, not the name. You can pick names within the bloc on liquidity or beta preference without changing your thesis exposure much. A 3% position in the bloc spread across 4-5 names is a single thesis bet sized at 3%.
For a fake bloc (Hyperscalers, AI Software, Robotics), each constituent is its own idiosyncratic bet. A 3% position in $MSFT is not "1/5 of a Hyperscalers position" — it's a full thesis bet on Microsoft. The implication: don't size as if you're diversifying within a thematic basket, because you aren't.
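One rough way to put a number on "substitutable" versus "its own bet" is the effective number of independent bets. The sketch below uses the standard result for equal-weighted names with equal volatility and a single common pairwise correlation; those simplifications are assumptions, and since the correlations quoted in this article are residualized, the result speaks to idiosyncratic risk only. The function name `effective_bets` is illustrative.

```python
def effective_bets(n_names, avg_corr):
    """Equivalent number of uncorrelated, equal-sized bets.

    Assumes equal weights, equal volatilities, and one common pairwise
    correlation, so portfolio variance scales with (1 + (n - 1) * rho) / n.
    """
    if n_names < 1:
        raise ValueError("n_names must be at least 1")
    return n_names / (1.0 + (n_names - 1) * avg_corr)

# Bloc sizes and correlations as quoted in the text.
semi_equipment = effective_bets(4, 0.82)  # 4 / 3.46, a little over one bet
hyperscalers = effective_bets(5, 0.05)    # 5 / 1.20, a little over four bets
```

Under these assumptions, four Semi Equipment names behave like roughly one bet, while five Hyperscalers behave like roughly four. That is the sizing point in numbers.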
2. Hedging works for real blocs.
If you're long the Semi Equipment bloc, you can hedge with $SOXX (roughly 60% of which, by weight, is the bloc). The residualized correlation is high enough that the hedge tracks the position.
You can't meaningfully hedge a fake bloc. Long Hyperscalers cannot be hedged with anything tighter than SPY itself, because the "bloc" is essentially SPY exposure to begin with.
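To make "the hedge tracks the position" concrete, here is a sketch of the textbook minimum-variance hedge. The hedge ratio is cov(position, hedge) / var(hedge), and the share of variance the hedge removes equals the squared correlation between the two series. The function name `hedge_stats` and the synthetic series are assumptions for illustration, not the platform's tooling or real price data.

```python
import numpy as np

def hedge_stats(position, hedge):
    """Minimum-variance hedge ratio and the share of variance it removes."""
    position = np.asarray(position, dtype=float)
    hedge = np.asarray(hedge, dtype=float)
    cov = np.cov(position, hedge)                     # 2x2 sample covariance
    ratio = cov[0, 1] / cov[1, 1]                     # hedge units to short per unit long
    rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return ratio, rho ** 2                            # variance removed equals rho squared

# Synthetic illustration: a position that is 1.5x the hedge plus its own noise.
rng = np.random.default_rng(1)
h = rng.normal(0, 0.01, 5000)
p = 1.5 * h + rng.normal(0, 0.005, 5000)
ratio, removed = hedge_stats(p, h)
```

The squared-correlation rule is why the distinction matters. If a position's correlation to its hedge were 0.82, the hedge would remove about 67% of the variance; at 0.05 it would remove about 0.25%, which is effectively nothing.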
3. Catalysts move blocs differently.
When a major thematic catalyst hits — say, a hyperscaler announces a capex revision — a real upstream bloc moves uniformly (Semi Equipment all rips together, with similar magnitudes). A fake bloc moves heterogeneously, with idiosyncratic dominators (the largest name) absorbing most of the move and smaller names lagging.
The trade implication: real blocs let you express thematic catalyst views with confidence in the bloc-level response. Fake blocs require predicting which constituent will absorb most of the move, which is a much harder forecasting problem.
Our discipline
The QuantAbundancia approach:
- Build editorial bubbles — start with the narrative (the story we're testing).
- Compute pairwise residualized correlations — measure whether the story expresses at the equity level.
- Publish whichever answer the data gives — including, especially, the failures.
We've published 12 editorial bubbles. About half pass the validation test cleanly. The rest stay published with their failed-validation flags intact. The pages don't get taken down because the result that the bubble doesn't trade as a bloc is itself information — actionable for any trader who would otherwise have constructed a basket on the editorial label.
The deeper point isn't "use correlation, not narrative." Both have value. Narrative tells you what story is being told and where the marketing dollars flow. Correlation tells you whether the story actually expresses at the equity level. Use both — but don't let either substitute for the other.
If you want to see this discipline in action, every bubble page on the platform shows raw + residualized correlation side by side. The gap is the lesson.
Read "What is residualized correlation?" for the methodology, or "The 12 AI bubbles, ranked" for the full taxonomy with verdicts.