Research May 2026

We analyzed language drift in SEC 10-K filings from 4923 public companies over 10 years. Here's what the data shows.

SVB's 2022 annual report contained a sentence that no other bank in our corpus was writing at the time. Our system flagged it in January 2023. The FDIC arrived in March.

That's a striking data point. But it's one case. The question worth asking is: does this pattern hold at scale? We ran the numbers across 4923 public companies and 7069 flag events. Here's what we found.

The signal

Each year a company files a 10-K. We compare that filing's language against two things simultaneously: the company's own prior filings and the filings of its peers.

The score is high when a company is simultaneously writing things unusual for itself and unusual for its peers. That double divergence is the signal. It is computed using sentence embeddings and is fully deterministic: the same input always produces the same score.

This is different from keyword search (which can't distinguish industry-wide language shifts from company-specific ones) and different from LLM summarization (which can't do cross-sectional comparison across thousands of filings).
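To make the double-divergence idea concrete, here is a minimal sketch, not our production scoring code. It assumes each filing has already been reduced to a single embedding vector, and it combines the two divergences by taking their minimum, a hypothetical rule chosen only because it is high only when both divergences are high. The function names are illustrative.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def double_divergence(current, own_prior, peer_current):
    """Hypothetical combination rule: the smaller of the two divergences.

    current      -- embedding of this year's filing
    own_prior    -- embedding of the same company's previous filing
    peer_current -- embeddings of peer companies' filings for the same year
    """
    self_div = cosine_distance(current, own_prior)
    peer_div = cosine_distance(current, centroid(peer_current))
    return min(self_div, peer_div)

# Toy 3-dimensional "embeddings", made up for illustration.
prior = [1.0, 0.0, 0.0]
new_language = [0.0, 1.0, 0.0]

# Peers shifted the same way: an industry-wide change, so the score stays low.
industry_shift = double_divergence(new_language, prior, [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])

# Peers did not shift: a company-specific change, so the score is high.
company_specific = double_divergence(new_language, prior, [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

With the minimum as the combination rule, an industry-wide vocabulary change scores near zero even though the company's own language moved, which is the distinction keyword search cannot make.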

What the backtest shows

We ran a forward-return backtest across every flag event in the full corpus: 7069 events from 4923 companies (excluding the 2007–2011 macro crisis era, which would inflate the numbers for any distress signal).

Horizon    Median alpha vs. S&P 500    % events with negative alpha
1 year     -8.6%                       58%
2 years    -14.8%                      61%
3 years    -22.4%                      63%

n=7069 flag events, 4923 companies. Excluding 2007–2011 macro crisis era. Alpha = company return minus SPY return over the same period.

To be clear about what this means: across 7069 flag events, companies that crossed the distress ceiling underperformed SPY by a median of 8.6 percentage points at 1 year (a median alpha of -8.6%). Of those events, 58% had negative alpha, versus the roughly 50% you'd expect from random flagging. That's a real directional signal in a noisy market, not a perfect predictor.
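For readers who want to reproduce this kind of summary on their own data, the arithmetic is small. The sketch below uses the alpha definition stated above (company return minus SPY return over the same period). The function names and the sample returns are made up for illustration and are not drawn from our corpus.

```python
from statistics import median

def alpha(company_return, spy_return):
    """Alpha as defined in this post: company return minus SPY return."""
    return company_return - spy_return

def summarize(events):
    """Return (median alpha, share of events with negative alpha).

    events -- list of (company_return, spy_return) pairs as decimal fractions,
              both measured over the same horizon after the flag date.
    """
    alphas = [alpha(c, s) for c, s in events]
    negative_share = sum(a < 0 for a in alphas) / len(alphas)
    return median(alphas), negative_share

# Illustrative returns only, not real flag events.
sample_events = [
    (-0.10, 0.08),
    (0.25, 0.10),
    (0.02, 0.12),
    (-0.30, 0.05),
    (0.15, 0.09),
]

median_alpha, negative_share = summarize(sample_events)
print(f"median alpha: {median_alpha:.1%}, negative share: {negative_share:.0%}")
```

Reporting the median rather than the mean keeps a handful of extreme outcomes, such as a single bankruptcy or a single multi-bagger, from dominating the summary.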

Specific cases

Companies we flagged before widely-known distress events:

Company              Event                       Lead time
SVB                  Bank collapse Mar 2023      14 days (final filing)
Party City           Bankruptcy Apr 2023         1,137 days
Nikola               Bankruptcy Nov 2023         1,336 days
Bed Bath & Beyond    Bankruptcy Apr 2023         731 days
Rite Aid             Bankruptcy Oct 2023         167 days

What it missed — and why

Three notable misses worth documenting:

These aren't buried in a footnote. The signal requires multi-year filing history to work. Companies with few historical pairs have lower signal reliability, and we flag this on the company page.

Known methodological issues

Two problems we know about and haven't solved:

The binomial false-positive problem. The ceiling is set at the 95th percentile of pair scores from labeled stable companies. But if a company has 10 scored pairs in its history (roughly a decade of filings), the probability that at least one of them exceeds the 95th percentile by chance alone is 1 - 0.95^10 ≈ 40%, assuming the pairs are independent. Companies with long histories therefore have a structurally higher false-positive rate. We're working on adaptive per-company thresholds.
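The arithmetic behind that 40% figure, and one textbook way to counteract it, can be sketched as follows. The second function applies a Šidák correction, a standard multiple-comparisons adjustment that we name here only as an illustration. It is not necessarily the adaptive threshold we will ship, and both functions assume the pairs are independent, which consecutive filings probably are not.

```python
def prob_any_false_flag(n_pairs, per_pair_rate=0.05):
    """Chance that at least one of n independent pairs exceeds the ceiling."""
    return 1.0 - (1.0 - per_pair_rate) ** n_pairs

def sidak_per_pair_rate(n_pairs, family_rate=0.05):
    """Per-pair exceedance rate that keeps the whole-history rate at family_rate."""
    return 1.0 - (1.0 - family_rate) ** (1.0 / n_pairs)

for n in (1, 5, 10, 20):
    unadjusted = prob_any_false_flag(n)
    adjusted_rate = sidak_per_pair_rate(n)
    print(
        f"{n:>2} pairs: {unadjusted:.1%} chance of a false flag at a fixed 5% rate; "
        f"a per-pair rate of {adjusted_rate:.2%} would hold the total at 5%"
    )
```

Under this adjustment, a company with 10 pairs would need a ceiling near the 99.5th percentile rather than the 95th, which trades fewer false positives for lower sensitivity.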

Coarse peer groups. We use EDGAR SIC codes for peer comparison. These are imprecise — a healthcare device company and a biotech might share a code despite very different filing vocabularies. Tighter industry classifications would improve the cross-sectional comparison.

The tool

We built FilingDrift to make this signal accessible. Free tier covers our labeled company set (the cases above and more). Individual, Professional, and Institutional plans add watchlist alerts, API access, and the full 4923-company corpus.

The live demo shows SVB's full score history with annotations. The methodology page has the technical detail and the full validation analysis.


Questions about the methodology or specific tickers? Email hello@filingdrift.com
