Research May 2026

We stress-tested our alpha claim. Here's what we found.

Our headline backtest result is that companies flagged by FilingDrift return a median of −22.4% relative to the S&P 500 at 36 months, across 6059 flag events. Before using that number in public, we wanted to know whether it was real signal or an artifact of how we built the test. So we ran four robustness checks.

The short version: the number holds up. But the checks are interesting on their own, so here's what we did and what we found.

The concern: benchmark composition bias

The S&P 500 is cap-weighted. From 2020–2024, the Magnificent 7 (Apple, Microsoft, Nvidia, Alphabet, Amazon, Meta, Tesla) accounted for a large fraction of index returns. The median S&P 500 component underperformed the index by 8–15% over that period — not because it was distressed, but because it wasn't one of seven mega-cap stocks.

Our corpus is small-cap heavy. If we're measuring flagged company returns against a cap-weighted index dominated by mega-cap tech, our −22.4% could decompose into "~−12% for being small-cap" + "~−9% for actually being distressed." That's a very different claim.

Test 1: Equal-weighted benchmark

To directly remove the Magnificent 7 composition effect, we reran all 6059 flag events using RSP (Invesco S&P 500 Equal Weight ETF) as the benchmark instead of SPY. RSP weights every S&P component equally — it represents the average S&P stock, not the index dominated by the largest seven.

Horizon    vs. SPY (cap-weighted)    vs. RSP (equal-weighted)    Composition effect
1 year     −8.6%                     −8.3%                       −0.3%
2 years    −14.8%                    −15.1%                      +0.3%
3 years    −22.4%                    −21.9%                      −0.5%

The Magnificent 7 composition effect is essentially zero for flagged companies: the alpha at 36 months is −21.9% against RSP and −22.4% against SPY, a difference of 0.5 percentage points. Flagged companies underperform whether you use a cap-weighted or equal-weighted benchmark.
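A minimal sketch of this kind of comparison, assuming "alpha" here means the median across events of stock return minus benchmark return over the same window. The event values below are hypothetical placeholders, not figures from the backtest, and the function and key names are illustrative.

```python
from statistics import median


def median_excess_return(events, benchmark_key):
    """Median of (stock return - benchmark return) across flag events."""
    return median(e["stock"] - e[benchmark_key] for e in events)


# Hypothetical 36-month total returns (as fractions) for three flag events.
events = [
    {"stock": -0.10, "spy": 0.30, "rsp": 0.25},
    {"stock": 0.05, "spy": 0.20, "rsp": 0.22},
    {"stock": -0.40, "spy": 0.10, "rsp": 0.08},
]

alpha_spy = median_excess_return(events, "spy")  # vs. cap-weighted benchmark
alpha_rsp = median_excess_return(events, "rsp")  # vs. equal-weighted benchmark

# The composition effect is the part of the SPY-relative number that
# disappears when the benchmark is equal-weighted.
composition_effect = alpha_spy - alpha_rsp
```

If the composition effect were large relative to `alpha_spy`, most of the headline number would be a benchmark artifact rather than signal.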

Test 2: Are we flagging companies that are already declining?

A different kind of concern: maybe we're just identifying companies that equity markets have already priced for distress. If flagged companies are already down 30% before we flag them, we're confirming what the market already knows — useful for some purposes, but not "early warning."

We measured the stock return in the 6 months before each of the 6059 flag events.

Median stock return in the 6 months before a flag: +4.4%
80% of flagged companies are NOT already down >20% at the flag date.
40.7% have any negative 6-month return. The majority are flat or positive.
20% are already declining significantly — for those, the signal is confirming, not leading.
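The statistics above can be sketched as follows, assuming the pre-flag return is a simple price return over the trailing 6 months. The sample values and function names are hypothetical and only show the shape of the calculation.

```python
from statistics import median


def trailing_return(price_then, price_now):
    """Simple price return between two dates, as a fraction."""
    return price_now / price_then - 1.0


def pre_flag_summary(returns, drawdown_cutoff=-0.20):
    """Summarise 6-month pre-flag returns across flag events."""
    n = len(returns)
    return {
        "median": median(returns),
        "share_negative": sum(r < 0 for r in returns) / n,
        "share_down_over_cutoff": sum(r < drawdown_cutoff for r in returns) / n,
    }


# Hypothetical 6-month pre-flag returns for ten flag events.
pre_flag_returns = [0.12, 0.05, -0.03, 0.08, -0.25, 0.01, 0.00, -0.10, 0.15, 0.04]
summary = pre_flag_summary(pre_flag_returns)
example_return = trailing_return(200.0, 210.0)
```

A median at or above zero combined with a small `share_down_over_cutoff` is the pattern that supports a "leading" rather than "confirming" reading.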

The median flagged company is near its price high when we flag it. SVB at $267 on its filing date is the archetypal case. Bed Bath & Beyond was flagged while the meme rally was still running. Party City was flagged while appearing to recover from COVID restructuring.

The 20% that were already declining represent a different use case — the language is quantifying and contextualizing a decline that equity markets had begun to price in. That's still valuable for sizing a position or timing an exit. But the primary signal is in the 80% where the filing language diverges before the stock does.

Test 3: Has the market already priced in default risk?

Even if the stock price is near its high, credit markets could be quietly pricing in distress. If market-implied default probability is already elevated at flag time, our signal is confirming what sophisticated traders already know.

We tested this using the Merton Distance-to-Default model, the standard structural model of default risk, introduced in Merton (1974) and implemented following Bharath & Shumway (2008). Equity is modeled as a call option on firm assets; Distance-to-Default measures how many standard deviations of asset value separate the firm from the default boundary. A healthy firm has D-t-D > 3; a firm in distress has D-t-D near zero or negative.
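For readers who want to see the mechanics, here is a minimal sketch of the simplified ("naive") Distance-to-Default from Bharath & Shumway (2008), which avoids solving the option-pricing equations iteratively. Whether the backtest used this naive variant or the full iterative one is not stated here, so treat the choice as an assumption; the input values are hypothetical.

```python
import math


def naive_distance_to_default(equity, debt, sigma_e, mu, horizon=1.0):
    """Naive Distance-to-Default in the style of Bharath & Shumway (2008).

    equity  : market value of equity
    debt    : face value of debt (the default boundary)
    sigma_e : annualised equity volatility
    mu      : expected asset return, proxied by the prior-year stock return
    horizon : forecast horizon in years
    """
    # Naive debt volatility: a base level plus a fraction of equity volatility.
    sigma_d = 0.05 + 0.25 * sigma_e
    total = equity + debt
    # Asset volatility as a value-weighted blend of equity and debt volatility.
    sigma_v = (equity / total) * sigma_e + (debt / total) * sigma_d
    drift = (mu - 0.5 * sigma_v ** 2) * horizon
    return (math.log(total / debt) + drift) / (sigma_v * math.sqrt(horizon))


# Hypothetical firms: one lightly levered, one close to the default boundary.
dd_healthy = naive_distance_to_default(equity=800.0, debt=200.0, sigma_e=0.40, mu=0.05)
dd_distressed = naive_distance_to_default(equity=50.0, debt=950.0, sigma_e=1.00, mu=-0.50)
```

The first firm lands well above the "healthy" range described above, while the second comes out negative, which is the contrast the test relies on.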

Merton Distance-to-Default (n=1,565 events)

Metric                                               Value
Median D-t-D at flag date                            4.71
Companies with D-t-D > 2 (market sees as healthy)    82%
Median D-t-D change (6m pre → flag)                  +0.006 (flat)

Across 1,565 events, the equity market prices 82% of flagged companies as far from default at flag time. The Merton D-t-D is essentially flat going into the flag (+0.006 median change over the prior 6 months), consistent with the +4.4% pre-flag equity return from Test 2. Our language signal is ahead of both the stock price and the market-implied default probability. That's the cleanest possible "leading indicator" result.

Test 4: Does the signal survive factor adjustment?

Separately from these tests, we ran the full backtest through a Fama-French 3-factor regression. FF3 explicitly controls for market beta, small-cap factor (SMB), and value factor (HML) — so the intercept (alpha) is the return unexplained by standard factor exposure.

Result: 61 bps/month Q1–Q5 long-short alpha, t-stat 5.41 across 292 months. That's above the Lazy Prices benchmark (Cohen, Malloy & Nguyen 2020: 18–45 bps/month, t-stat 3–5). The signal survives factor adjustment.
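To make the regression concrete, here is a self-contained sketch of estimating a 3-factor alpha and its t-statistic by ordinary least squares. Everything in it is an assumption for illustration: the data are synthetic, the factor loadings and noise level are made up, and the hand-rolled solver stands in for whatever regression library would normally be used.

```python
import math
import random


def ols(X, y):
    """OLS via the normal equations. Returns (coefficients, t_stats).

    X is a list of rows; put 1.0 first in each row for the intercept.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = [b / math.sqrt(s2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_stats


# Synthetic monthly data: 292 months of factor returns and a long-short
# portfolio return built with a known alpha of 61 bps/month (hypothetical).
rng = random.Random(0)
true_alpha = 0.0061
rows, y = [], []
for _ in range(292):
    mkt = rng.gauss(0.006, 0.045)  # market excess return
    smb = rng.gauss(0.002, 0.030)  # size factor
    hml = rng.gauss(0.003, 0.030)  # value factor
    rows.append([1.0, mkt, smb, hml])
    y.append(true_alpha + 0.1 * mkt + 0.4 * smb - 0.2 * hml + rng.gauss(0.0, 0.005))

beta, t_stats = ols(rows, y)
alpha_bps_per_month = beta[0] * 1e4  # intercept expressed in basis points
alpha_t_stat = t_stats[0]
```

The intercept `beta[0]` is the alpha: the average monthly return left over after the market, size, and value exposures are accounted for. Because a long-short portfolio is self-financing, the sketch regresses its raw return without subtracting a risk-free rate.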

This is the cleanest institutional-grade claim. The −22.4% simple SPY alpha is easier to communicate; the 61 bps/month FF3 alpha is harder to dismiss. They're both measuring the same underlying phenomenon from different angles.

Summary

Concern                                        Finding
Cap-weighted vs equal-weighted S&P             −21.9% vs RSP, −22.4% vs SPY. Composition effect: 0.5 percentage points (negligible).
Lagging equity markets                         Median pre-flag 6m return: +4.4%. Leading, not confirming.
Market-implied default probability (Merton)    Median D-t-D = 4.71 at flag. 82% appear healthy to markets.
Factor exposure (size, value)                  FF3 alpha 61 bps/month, t-stat 5.41. Survives adjustment.

Four independent tests. The claim holds on all of them. The Merton result is particularly clean: at the moment we flag a company, equity markets are pricing it as 4.71 standard deviations from default. The language is moving before both the stock price and market-implied credit quality.

Full methodology: /about and /faq.

Disclaimer: Research tool, not investment advice. Past performance of the backtest does not guarantee future accuracy.
