Honest answers to the questions we'd ask ourselves.
FilingDrift is a tool that reads SEC 10-K filings and scores language change — how much the wording has shifted year-over-year, and how unusual that shift looks compared to peer companies in the same year. We flag the outliers. You decide what they mean.
We are not financial analysts, economists, or credit rating agencies. We are engineers who built a language model over a corpus of public filings, and we make the output available so you can add it to your own research process.
We don't use ChatGPT, Claude, or any large language model. This is worth being direct about because it changes the reliability properties entirely.
FilingDrift scores filings using a deterministic algorithm — the same filing always produces the same score. There is no text generation, no prompting, no summarization that might be inaccurate. The output is a number, computed from the actual language in the document.
ChatGPT is good for summarizing things you already understand. FilingDrift is for detecting the drift you wouldn't otherwise notice.
You could read SVB's 2022 10-K and notice the phrase "unrealized losses" appears frequently. What you can't easily do: know that no other bank mentioned it as often that year, that SVB's usage increased 4× year-over-year, and that the sentence embeddings of those paragraphs place them semantically closer to distress language than anything JPMorgan filed.
The value isn't reading one filing. It's knowing where that filing sits relative to 40 other companies, at the same moment in time. That's the peer comparison you can't do by hand.
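To make the mechanics concrete, here is a minimal Python sketch of the two measurements described above: year-over-year phrase escalation and rank against peers. This is illustrative only, not our production code; the function names and the per-1,000-words normalization are assumptions made for the example.

```python
import re

def phrase_frequency(text: str, phrase: str) -> float:
    """Occurrences of a phrase per 1,000 words of filing text."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = len(re.findall(re.escape(phrase.lower()), text.lower()))
    return 1000 * hits / max(len(words), 1)

def escalation_ratio(this_year: str, last_year: str, phrase: str) -> float:
    """Year-over-year change in phrase frequency (inf if the phrase is new)."""
    prev = phrase_frequency(last_year, phrase)
    curr = phrase_frequency(this_year, phrase)
    return curr / prev if prev > 0 else float("inf")

def peer_rank(company_freq: float, peer_freqs: list) -> float:
    """Fraction of peer companies this company exceeds on the phrase."""
    return sum(company_freq > p for p in peer_freqs) / len(peer_freqs)
```

A 4× escalation combined with a peer rank near 1.0 is the shape of signal described above: unusual for the company *and* unusual for the sector at the same time.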
8 out of 10 tracked crisis companies scored above the 95th percentile of healthy companies in the year before their collapse. We are honest about the 2 that didn't.
The misses have patterns: M&A distortions (a surviving acquirer inheriting target language), sector-wide shocks (where every peer also spiked, diluting the relative signal), and regulatory disclosures that read as distressed but aren't (banks under formal agreements use atypical language by mandate).
This is a signal, not a verdict. An 80% detection rate is useful as one layer of a larger research process. It is not useful as the only thing you look at.
Sentiment analysis assigns a positive/negative score to a piece of text. "We face significant liquidity risks" is negative. That's useful but shallow — most companies use cautious legal boilerplate, so everything scores slightly negative all the time.
Semantic drift is different. We're not asking "is this sentence negative?" We're asking: "Is this sentence semantically different from what this company said last year, and from what every peer company said this year?" A company that shifts from standard risk-factor boilerplate to language structurally similar to sentences found in bankruptcy filings — that's drift. The sentiment score might be the same. The semantic position has moved.
The other key difference: drift is relative. SVB mentioning "unrealized losses" is only meaningful because they mentioned it 4× more than last year and more than any peer bank that year. Sentiment analysis looks at each sentence in isolation.
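A toy sketch of what "semantic position has moved" means, assuming sentence embeddings produced by some external model (the vectors below are placeholders, not output from our pipeline):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def centroid(vectors):
    """Mean vector of a set of sentence embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def semantic_drift(this_year_vecs, last_year_vecs):
    """1 minus cosine similarity between the two years' centroids:
    0 means the language points the same way; higher means more drift."""
    return 1 - cosine(centroid(this_year_vecs), centroid(last_year_vecs))
```

Two filings can have identical sentiment scores and still show large drift under a measure like this, which is exactly the boilerplate-to-distress shift described above.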
Probably not. Annual 10-K filings are — as the name suggests — filed once per year. A company that filed in February 2025 won't file again until February 2026. For most tracked companies, there is genuinely nothing to alert you about for 11 months of the year.
The value of the subscription isn't frequent notifications. It's not missing the signal when it does arrive. When SVB filed on February 24, 2023, its above-ceiling score was visible that day. Without an alert, you would have had to check manually. With one, you'd have known within hours of the filing landing on EDGAR.
If you want to check the current score of any company you're watching, the dashboard is always live. If you believe a company has filed and you haven't received an alert, email us at support@filingdrift.com.
Two honest reasons.
First: we're engineers, not traders. Building a reliable short position on a company requires more than a signal — it requires position sizing, risk management, broker relationships, and a thesis on timing. A company can have elevated language in its filing and still take 18 months to collapse. Being right about the direction doesn't tell you when, and "when" is what determines whether a trade makes money.
Second: this signal is not sufficient alone. FilingDrift scored SVB above the ceiling in its final filing. It also scored RTX above the ceiling in 2020 — because of a merger that generated distress-adjacent language with no actual distress. The false positive rate is low but real. A signal this uncertain, without other confirming indicators, doesn't make a good sole basis for trading.
What we provide is one research layer — a heads-up that the language has changed in a statistically unusual way. What you do with that, in combination with your own analysis, is entirely your call. We explicitly do not provide investment advice.
It's a fair challenge. We selected crisis companies after the fact — SVB, BBBY, Rite Aid, Party City — because they had documented collapse events with known dates. That selection process can't introduce bias into the scoring algorithm itself (which is deterministic and has no knowledge of the outcome), but it absolutely could bias how we report results. We've tried to address this three ways.
First, we didn't cherry-pick the crisis companies — we include every case we analyzed, including the misses (Silvergate, Countrywide, PG&E). Second, we include 30 healthy control companies and report the false positive rate: 6 of 30 healthy companies exceeded the ceiling at some point. We don't hide that. Third, we're adding more companies continuously, rather than hand-selecting the most favorable set.
The deeper version of the question is: "If I had been watching a random set of 500 companies in 2022, would FilingDrift's elevated scores have been actionable, or would they have been drowned out by false positives?" That's the right test — and it's what we're building toward as the corpus grows to 1,000+ companies.
We were careful about this but you're right to ask. The algorithm has two components: a phrase escalation score and a semantic drift score. The phrase escalation score is entirely blind to outcomes — it measures frequency change and cross-company rarity, which are properties of the text itself, not labels we applied.
The semantic component uses an "anchor" set of distress-adjacent sentences drawn from confirmed crisis filings to define a "distress direction." This is where lookahead risk exists: if we tuned the anchor set to maximize scores for known failures, the results would be circular. In practice, we built the anchor set before running the full analysis, and we use the same anchors across all companies — we didn't iterate to improve detection on specific cases.
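One way to picture the "distress direction" idea in code (a hedged sketch with made-up helper names, not the exact construction we use): take the centroid of the anchor sentences, subtract the centroid of neutral boilerplate, and score each new sentence by how far its embedding extends along that axis.

```python
import math

def unit(v):
    """Normalize a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def distress_direction(anchor_vecs, neutral_vecs):
    """Unit vector pointing from neutral boilerplate toward the anchor set."""
    mean = lambda vs: [sum(v[i] for v in vs) / len(vs) for i in range(len(vs[0]))]
    a, n = mean(anchor_vecs), mean(neutral_vecs)
    return unit([ai - ni for ai, ni in zip(a, n)])

def distress_projection(sentence_vec, direction):
    """How far a sentence embedding extends along the distress direction."""
    return sum(s * d for s, d in zip(sentence_vec, direction))
```

The lookahead risk discussed above lives entirely in how `anchor_vecs` is chosen; the projection itself is the same fixed arithmetic for every company.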
The honest answer is: the approach was developed with some knowledge that certain companies had failed, so we can't claim it's a fully out-of-sample test. What we can say is that the algorithm has no company-specific tuning — SVB's score is computed the same way as JPMorgan's. The right validation is prospective: watching how it performs on new filings from companies not in the training set. We'll report on that as the corpus grows.
No. We analyze language in public SEC filings. We don't predict stock prices, recommend trades, or guarantee any outcome. Past detection of distress events does not mean future detections will be accurate.
We built a linguistic measurement tool. What you do with the measurements is entirely your call.
The score combines two components: a phrase escalation score, which measures year-over-year frequency change and cross-company rarity, and a semantic drift score, which measures how far the filing's sentence embeddings have moved.
Both components are calibrated against 30 healthy companies. The 95th percentile of that group is the control ceiling — scores above it are flagged. See the About page for more.
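In code, the control-ceiling idea reduces to a percentile threshold over the healthy group's scores. A minimal sketch (illustrative; the `percentile` here is a simple linear-interpolation version, not necessarily the exact estimator we use):

```python
def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def control_ceiling(healthy_scores, p=95):
    """The 95th percentile of the healthy control group's scores."""
    return percentile(healthy_scores, p)

def flagged(score, ceiling):
    """A filing is flagged when its score exceeds the control ceiling."""
    return score > ceiling
```

Note that by construction roughly 5% of healthy scores sit above a 95th-percentile ceiling, which is why a nonzero false positive rate is expected, not a bug.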
We check EDGAR daily for new 10-K filings. When a tracked company files, we process it and update the score within 24 hours. Pro subscribers get an email alert when this happens.
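For readers who want to replicate the monitoring step themselves, here is a hedged sketch against EDGAR's public per-company submissions feed. The User-Agent string and polling logic are illustrative assumptions, not our implementation; the SEC asks automated clients to identify themselves and respect rate limits.

```python
import json
from urllib.request import Request, urlopen

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"

def fetch_submissions(cik: int) -> dict:
    """Fetch a company's recent filings index from EDGAR."""
    req = Request(
        SUBMISSIONS_URL.format(cik=cik),
        headers={"User-Agent": "YourProject you@example.com"},  # placeholder contact
    )
    with urlopen(req) as resp:
        return json.load(resp)

def new_10ks(submissions: dict, since: str) -> list:
    """(accession number, filing date) for 10-Ks filed after `since` (YYYY-MM-DD)."""
    recent = submissions["filings"]["recent"]
    return [
        (acc, date)
        for form, acc, date in zip(
            recent["form"], recent["accessionNumber"], recent["filingDate"]
        )
        if form == "10-K" and date > since
    ]
```

Running `new_10ks` once a day against a watchlist of CIKs is the essence of the daily check described above.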
Most large-cap companies file once a year. The interesting moment is the 24–72 hours after filing, when the document is public but most people haven't read it. That's the window we're designed for.
Currently 1,890+ companies: a mix of verified crisis events (SVB, Lehman, Enron, Bed Bath & Beyond, Party City, Revlon, and others) and healthy control companies (large-cap banks, retailers, consumer staples) used to calibrate the baseline.
Pro subscribers can request coverage of any ticker. Pro+ subscribers get watchlists up to 200 tickers.
Everything in Free, plus: email alerts when a company you follow files a new 10-K with an elevated score, a portfolio watchlist to track any ticker, CSV export for your own models, and API access for programmatic queries.
See the pricing page for current rates.
People who read SEC filings as part of their job or research — independent investors, credit analysts, short sellers, journalists covering corporate distress, and students studying the 2008 crisis or COVID bankruptcies.
It is probably not for casual retail investors looking for a stock-picking signal. The tool is most useful when you already have a view on a company and want to know if the language is confirming or contradicting it.
FilingDrift is a small independent product operated by Latent Systems SAS, a French software company. We are not a hedge fund, not a financial advisory firm, and not affiliated with any broker-dealer.
We built this because we noticed that nobody was doing systematic language-change scoring on SEC filings at the sentence level, with peer comparison. The SVB story validated the approach. We're sharing it. See the About page for more.
Have a question that's not here? Email us at support@filingdrift.com.
This site uses essential cookies to function. See our Privacy Policy.