On April 7, 2026, Anthropic released Claude Mythos Preview, which discovered thousands of zero-day vulnerabilities across approximately 7,000 open-source code entry points — some of which had been lurking for one to two decades. This paper does not analyze Mythos’s specific technical capabilities. Instead, it conducts a root cause analysis of the historical formation mechanisms behind these zero-day vulnerabilities. We argue that the root cause of zero-day bugs is not coding errors, but rather the reasonable compromises made by first-generation architecture designers under the physical constraints of their era — compromises that were subsequently codified in textbooks as “the correct path,” locked in by successor developers as unquestionable premises, and ultimately gave rise to emergent incompatibilities with continuously evolving hardware realities. Mythos was able to discover these vulnerabilities not because it is “smarter,” but because it performs path-independent matrix operations in high-dimensional vector space, thereby bypassing the generational lock-in effect inherent in human knowledge transmission. V2 adds three key contributions: (1) empirical validation of the causal model using Mythos’s three publicly disclosed flagship vulnerabilities (OpenBSD’s 27-year TCP SACK, FFmpeg’s 16-year H.264, FreeBSD’s 17-year NFS RCE), all of which confirmed the model’s predictions; (2) a proposed “Abductive Targeted Minesweeping” methodology, demonstrating that abductive reasoning to locate vulnerability habitats can achieve targeted discovery at far lower computational cost than Mythos; and (3) a systematic comparison between Mythos’s brute-force search mode and the abductive targeting mode, framing the paradigm distinction as “brute-force cracking vs. targeted minesweeping.”
The Central Thesis
In April 2026, Anthropic’s Claude Mythos Preview sent shockwaves through the cybersecurity field. During testing, the model discovered thousands of previously unknown zero-day vulnerabilities, many of which had been lurking in major operating systems and browsers for ten to twenty years. The mainstream media narrative was: AI has become “too powerful” — so powerful it can find bugs that humans cannot.
This paper poses a different question: Why were these vulnerabilities able to hide there for twenty years without being discovered?
Our answer: these vulnerabilities are not products of coding mistakes, but rather specific manifestations of structural defects in human knowledge transmission within the software domain. Their formation follows a clear causal chain: first-generation compromise → textbook codification → generational lock-in → emergent incompatibility.
Mythos was able to discover them not because it possesses superhuman “intelligence,” but because its mode of information processing — path-independent matrix operations in high-dimensional vector space — fundamentally bypasses the generational lock-in link in this causal chain.
What Mythos Actually Found
Before proceeding to the root cause analysis, it is necessary to accurately present Mythos’s test results. The following data come from the Anthropic Frontier Red Team’s technical report (published April 7, 2026).
The most noteworthy fact: Anthropic explicitly stated that Mythos was not specifically trained to find security vulnerabilities. These capabilities were emergent by-products of improvements in general reasoning and code abilities.
This means: the discovery of vulnerabilities did not depend on domain-specific security knowledge, but on a more fundamental capability — identifying inconsistencies in information distributions within high-dimensional space. This provides a critical clue for our root cause analysis.
The Original Compromise
Every software system begins with hardware. And hardware at every point in time has its physical limits — transistor counts, clock frequencies, cache hierarchies, memory bandwidth, fabrication processes. First-generation architects made design decisions under these constraints — decisions that were reasonable, even optimal, at the time.
Take the C language as an example. C was born in 1972, and its memory management model — manual pointer manipulation, programmer-managed buffers, no bounds checking — was tailor-made for the PDP-11 processor. On that hardware, memory was flat, execution was sequential, and a pointer pointed where it pointed. These assumptions were entirely correct on the PDP-11.
But these assumptions are not eternal truths. They are locally optimal solutions under specific physical constraints. When these assumptions were written into the K&R textbook and taught to generation after generation of programmers, they transformed from “design compromises” into “language features.” From “this was all we could do at the time” into “this is how it should be done.”
The Spectre and Meltdown vulnerabilities are textbook examples of this pattern. “Correct” code written by C programmers following the textbook leaked data that should not have been leaked on modern CPUs with speculative execution. This is not the programmer’s fault, not C’s fault, not Intel’s fault — it is the temporal gap between first-generation design assumptions and Nth-generation hardware reality.
How Textbooks Turn Compromise into Dogma
Once a first-generation architect’s design decisions produce a working system, a critical cognitive transformation occurs: what was “one possible approach” becomes “the correct answer.”
Textbooks don’t write: “This is one compromise made by first-generation engineers under the physical constraints of their time.” Textbooks write: “This is how the architecture works.” Period.
What the second generation of developers learns is not “why it was designed this way,” but “this design is correct.” They learn how to write code on this architecture, optimize performance, and apply patches. But they will not question the architecture itself — because you don’t question the foundations of a running system that the entire world depends on.
By the third generation, the first generation’s original compromises have become “industry standards” and “best practices.” No one remembers why things are the way they are. They just are.
| Generation | Perception of 1st-Gen Design | Behavioral Pattern |
|---|---|---|
| 1st Gen (Designers) | “This is our best compromise given current conditions” | Think from the ground up, face entirely new problems |
| 2nd Gen (Apprentices) | “This is the correct architecture” | Learn how to work within the architecture |
| 3rd Gen (Practitioners) | “This is just how it’s done” | Treat the architecture as an unquestionable axiom |
| Nth Gen (Contemporary) | “Industry standard / best practice” | Stack new layers on axioms, never inspect the foundation |
Path lock-in is now complete. Successor developers are locked into the first generation’s mental framework. They audit the first generation’s artifacts using the first generation’s logic, using the same framework to examine what that framework produced, and so remain forever blind to the framework’s own blind spots.
The Legacy Mountain: Why Nobody Rewrites
If old architectures have problems, why not rewrite from scratch? The answer is: the software industry has almost never truly rewritten any system.
In 2000, Joel Spolsky’s landmark essay “Things You Should Never Do” argued against ground-up rewrites, using Netscape as the cautionary tale for their catastrophic consequences. Netscape’s decision to rewrite its browser code from scratch consumed three years, during which the company could add no new features and could not respond to competition. Netscape founding engineer Jamie Zawinski assessed it bluntly: “It basically killed the company.”
Spolsky’s core argument: those messy-looking parts of the code often embed hard-won knowledge about edge cases and obscure bugs accumulated through real-world experience. When you throw away the code and start over, you throw away all that knowledge.
From then on, “never rewrite” became an article of faith in the software industry. The standard practice became refactoring — patching and incrementally improving atop the old architecture. Or adopting the “Ship of Theseus” pattern — gradually replacing parts but never discarding the entire vessel at once.
The result: all of software civilization is a legacy mountain. Windows was not rewritten — it was stacked layer upon layer atop DOS. The Linux kernel was not rewritten — it is 1991 code plus thirty-five years of patches. The internet protocol stack was not rewritten — TCP/IP is a 1970s design, HTTP is a 1991 design, with 2026 applications running on top.
Every layer carries the assumptions of the layer below it. Every layer’s developers treat the layer beneath as “the correct foundation.” Nobody goes back to inspect the foundation itself.
Emergent Incompatibility: Where Bugs Actually Come From
Now we can precisely define the true nature of zero-day vulnerabilities.
First-generation architects may have made a hundred reasonable compromises. Each compromise, in isolation, was perfectly fine. But a hundred compromises interacting over twenty years with a hundred thousand lines of new code produce a potential vulnerability space that grows exponentially in size.
This is Emergent Incompatibility — not an error at any single layer, but a nonlinear interaction effect across multiple layers accumulated over time. Its characteristics are:
| Characteristic | Description |
|---|---|
| Cross-Layer | The vulnerability does not exist within any single layer, but at the seams between layers |
| Cross-Temporal | Produced by the temporal gap between 1st-gen design assumptions and Nth-gen hardware reality |
| Nonlinear | A small number of original compromises can produce exponential incompatibility combinations with subsequent systems |
| Invisible | Undetectable along any single-layer inspection path, because each layer is “correct” within its own context |
| Emergent | Not “designed” in, not produced by “negligence,” but spontaneously arising as system complexity grows |
The vulnerabilities Mythos found — those hiding for twenty years — are not thousands of “mistakes.” They are thousands of emergent misalignments produced by a small number of first-generation compromises interacting with subsequent systems in high-dimensional space.
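The scale of that combinatorial space is easy to quantify. A minimal arithmetic sketch, assuming (purely for illustration) that any subset of a hundred original compromises could interact:

```python
import math

# Illustrative arithmetic only: the "100 compromises" figure comes from the
# text above; the interaction model (any subset of compromises may interact)
# is an assumption made for this sketch.
n_compromises = 100

pairs = math.comb(n_compromises, 2)    # two-way interactions
triples = math.comb(n_compromises, 3)  # three-way interactions
all_subsets = 2 ** n_compromises       # every possible combination

print(pairs)                  # 4950
print(triples)                # 161700
print(f"{all_subsets:.2e}")   # ~1.27e+30
```

No audit process that inspects compromises one at a time, or even pairwise, can cover a space of this size; only the low-order terms are ever examined.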
Reverse-Engineering Mythos: Open Search + RL Boundary
Having understood the formation mechanism of these vulnerabilities, we can now reverse-engineer why Mythos was able to discover them.
Mythos’s architecture can be deconstructed into two separable components: an open-ended search process over high-dimensional vector representations of code, and an RL reward boundary that judges what the search turns up (“crashing the code earns a reward”).
What are human security experts doing when they hunt for vulnerabilities? They walk backward along the paths taught by textbooks — “This function should be called this way; what happens if it’s called differently?” Their attack logic is a mirror of their defense logic, and their defense logic comes from the textbook, and the textbook was written by the first-generation architecture designers. They are forever spinning inside the first-generation designers’ mental framework.
What Mythos does is not “walk backward along the path.” It has no path at all.
What it receives is a vector representation of the code. In its matrix operations, there is no concept of “this function should be called this way.” There is no “should.” There is only “what are the mathematical relationships between these values?” When it detects that the distribution pattern of one region in vector space has a discontinuity with another region, it doesn’t need to know that it’s a “vulnerability” — the RL reward function tells it “crashing the code earns a reward,” and it simply needs to find those discontinuity points and exploit them.
This is the fundamental reason Mythos can find vulnerabilities that humans cannot: it was never taught by textbooks. It doesn’t know “the architecture works like this” — it only knows “what are the relationships between these vectors.” It is not subject to the path lock-in of human knowledge transmission.
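What “detecting a distribution discontinuity” might look like can be sketched with a toy model. The Gaussian clusters below are hypothetical stand-ins for embeddings of two code layers; a real system would use a learned code encoder, and nothing here reflects Mythos’s actual internals:

```python
import math
import random

random.seed(0)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def gaussian_cluster(center, n, dim, spread=0.1):
    """Sample n vectors scattered around a common center direction."""
    return [[center[i] + random.gauss(0, spread) for i in range(dim)]
            for _ in range(n)]

DIM = 64
# Hypothetical stand-ins for embeddings of two code "layers"
# (e.g., an original design and a later refactor):
center_a = [random.gauss(0, 1) for _ in range(DIM)]
center_b = [random.gauss(0, 1) for _ in range(DIM)]
layer_old = gaussian_cluster(center_a, 20, DIM)
layer_new = gaussian_cluster(center_b, 20, DIM)

def mean_sim(xs, ys):
    sims = [cosine(x, y) for x in xs for y in ys]
    return sum(sims) / len(sims)

within = mean_sim(layer_old, layer_old)
across = mean_sim(layer_old, layer_new)

# The "distribution discontinuity": vectors within one layer cohere,
# vectors across the layer boundary do not.
print(f"within-layer similarity: {within:.2f}")
print(f"across-layer similarity: {across:.2f}")
```

A searcher that only flags regions where similarity drops needs no notion of “should”; the seam announces itself as a statistical anomaly.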
Degrees of Freedom and Moore’s Law
If the architectural concept behind Mythos (open-ended search + RL judgment) already existed in the Opus 4.6 era, why did Opus 4.6 trigger only one Tier 3 crash in the same security tests while Mythos triggered 595 Tier 1–2 crashes and 10 Tier 5 full control-flow hijacks?
The answer lies in degrees of freedom.
High-dimensional vector spaces have a counter-intuitive mathematical property: as the number of dimensions increases, the number of explorable directions grows exponentially. Going from 1,000 dimensions to 10,000 dimensions does not add 9,000 new directions — it adds an astronomically larger number of new combinatorial paths.
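This near-orthogonality effect is a standard property of high-dimensional geometry: random directions become almost perpendicular as dimension grows, which is why so many mutually distinguishable directions fit. A self-contained sketch (the dimensions here are arbitrary, chosen only for illustration):

```python
import math
import random

random.seed(1)

def random_unit_vector(dim):
    v = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cosine(dim, trials=200):
    """Average |cos angle| between pairs of random directions."""
    total = 0.0
    for _ in range(trials):
        u = random_unit_vector(dim)
        v = random_unit_vector(dim)
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / trials

# As dimension grows, random directions concentrate near orthogonality,
# leaving room for exponentially many distinguishable directions.
low = mean_abs_cosine(10)
high = mean_abs_cosine(1000)
print(f"dim=10:   mean |cos| ~ {low:.3f}")
print(f"dim=1000: mean |cos| ~ {high:.3f}")
```

The mean absolute cosine shrinks roughly as 1/√d, so each added order of magnitude in dimension opens vastly more room between existing directions, not merely more of them.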
Moore’s Law gave Mythos more parameters, a larger context window, and longer RL training time. These may not be qualitative changes — the architectural concept may be similar — but quantitative change in high-dimensional space is qualitative change. The additional compute Mythos has over Opus 4.6 opens up search regions in high-dimensional space that the latter simply cannot reach.
And those vulnerabilities hiding for twenty years happen to be distributed precisely in the regions that Opus 4.6’s explorable domain cannot cover but Mythos’s explorable domain can.
This also means: Mythos itself will someday become “legacy architecture.” The decisions made by its designers under the physical constraints of 2025–2026 will become the “textbook” for the next generation of models. And the next generation will benefit from greater compute, opening up new search regions that Mythos cannot reach. The recursion never stops.
Empirical Validation: Abductive Autopsy of Three Flagship Vulnerabilities
Does the causal model proposed in V1 withstand empirical scrutiny? When Anthropic released Mythos, it publicly disclosed technical details for three flagship vulnerabilities. We verify each against the model’s predictions.
Case 1: OpenBSD TCP SACK — 27-Year Vulnerability
Mythos discovered a 27-year-old denial-of-service vulnerability in OpenBSD — widely considered one of the most security-hardened operating systems in the world — in its TCP SACK (Selective Acknowledgment) implementation. It is an integer overflow condition that allows a remote attacker to crash any OpenBSD host via a TCP connection. Anthropic reported finding it in approximately 1,000 scaffold runs at a total cost of under $20,000.
Model Prediction P1: ✅ Vulnerability located at a cross-layer seam (TCP protocol → SACK implementation). P2: ✅ 27-year dormancy spanning multiple developer generations.
Case 2: FFmpeg H.264 — 16-Year Vulnerability (The Critical Case)
This is the most compelling empirical validation of our causal model. The underlying bug traces back to the commit that introduced the H.264 codec in 2003. Then, during a code refactor in 2010, the bug was transformed into an exploitable vulnerability. For 16 years thereafter, this vulnerability was hit five million times by automated fuzzing tools, yet never caught.
| Year | Event | Corresponding Causal Model Link |
|---|---|---|
| 2003 | 1st-gen developer introduces H.264 codec; code contains design decisions that are non-problematic in original context | 1st-gen architect’s reasonable compromise |
| 2003–2010 | Code “runs correctly” for seven years, becoming the default foundation for successor developers | Compromise written into “textbook” (the codebase itself is the textbook) |
| 2010 | 2nd-gen developer refactors code, does not question original design, but alters the context | 2nd-gen operations under path lock-in |
| 2010–2026 | Vulnerability hit 5 million times by fuzzers, never discovered | Textbook-defined search paths cannot cover inter-layer seams |
| 2026 | Mythos detects distribution discontinuity between the 2003 layer and the 2010 layer in vector space | Path-independent matrix operations bypass generational lock-in |
Model Prediction P1: ✅ Cross-layer seam (2003 commit → 2010 refactor). P2: ✅ 16-year cross-generational dormancy. P3: ✅ Traditional fuzzer hit 5M times without detection. Emergent Incompatibility: ✅ 2003 design + 2010 refactor = nobody made a mistake, yet the vulnerability emerged.
Case 3: FreeBSD NFS — 17-Year Remote Code Execution
Mythos autonomously identified and fully exploited a 17-year-old remote code execution vulnerability (CVE-2026-4747) in FreeBSD’s NFS server, achieving unauthenticated root access with zero human intervention after the initial prompt. NFS (Network File System) was designed by Sun Microsystems in 1984. FreeBSD’s NFS implementation is built atop that forty-two-year-old protocol design.
Model Prediction P1: ✅ Cross-generational inter-layer seam between ancient protocol design (NFS/1984) and modern implementation. P2: ✅ 17-year dormancy.
Additional Validation: Browser Four-Vulnerability Chain
Mythos autonomously wrote a browser exploit that chained four vulnerabilities together, escaping both the renderer sandbox and the OS sandbox. Each of the four vulnerabilities individually might not be severe, but they spanned different system layers (renderer → browser sandbox → OS sandbox), producing a combined attack path unpredictable by any single-layer audit — a direct instance of the “Emergent Incompatibility” concept.
External Counter-Evidence: AISLE’s Small-Model Experiment
Security research firm AISLE extracted relevant code segments from Mythos’s flagship vulnerabilities and tested them with small open-source models. Result: eight out of eight models detected the FreeBSD NFS vulnerability, including one with only 3.6 billion parameters at a cost of $0.11 per million tokens.
Consolidated Validation Summary:
| Prediction # | V1 Prediction | V2 Validation Status |
|---|---|---|
| P1 | Vulnerabilities concentrate at cross-layer seams | ✅ All three flagship vulnerabilities are cross-layer |
| P2 | Dormancy correlates with number of generational transfers | ✅ 27, 17, and 16 years — all spanning multiple developer generations |
| P3 | Traditional tools cannot reproduce | ✅ FFmpeg fuzzer hit 5M times without detection |
| P4 | Larger models discover more vulnerabilities | ⏳ Awaiting next-generation model release |
| P5 | Model generalizes to non-software domains | ⏳ Awaiting cross-domain application validation |
Abductive Targeted Minesweeping: An Alternative to Brute-Force Search
The preceding empirical validation reveals an important corollary: if we already know the generative rules of zero-day vulnerabilities (1st-gen compromise × textbook transmission × hardware evolution = inter-layer emergent incompatibility), then we do not need Mythos-level compute to find them. We only need abductive logic to predict vulnerability habitats, and then conduct targeted search within the predicted regions.
We name this methodology “Abductive Targeted Minesweeping” (ATM) and systematically compare it with Mythos’s brute-force search mode.
Structural Comparison of Two Paradigms:
| Dimension | Mythos Brute-Force Search | Abductive Targeted Minesweeping |
|---|---|---|
| Search Strategy | Undifferentiated scan of all 7,000 entry points | Abductively locate high-probability regions → targeted scan of 50–100 entry points |
| Compute Requirement | Extreme (1,000 runs / $20,000 to find one vulnerability) | Moderate (small models sufficient after narrowing the search space) |
| Source of Directionality | None (all-direction search, RL judges after the fact) | Abductive logic provides direction in advance (humans set direction, AI executes search) |
| Output | Vulnerability list + exploits | Vulnerability list + exploits + generative rules (can predict next batch of vulnerability locations) |
| Reusability | Each codebase requires a fresh full scan | Generative rules are transferable across codebases |
| Bottleneck | Compute and cost | Quality of abductive judgment (human sense of direction) |
The Five-Step ATM Workflow:
1. Identify 1st-gen design timestamp
2. Locate subsequent refactor events
3. Mark inter-layer seam regions
4. AI targeted scan of seams
5. Abductive analysis of generative rules
Step 1: Identify the first-generation design timestamp. Through git history, RFC documents, and original design documentation, locate the oldest design decisions in the target system. These decisions are the compromises made under the physical constraints of their era — the seeds of potential vulnerabilities. Key indicator: bottom-layer modules older than 15 years that are still being called by modern systems.
Step 2: Locate subsequent refactor events. Search git history for major refactors of first-generation code. The FFmpeg case proves: the 2003 original design was not a vulnerability in isolation — it was the 2010 refactor that changed the context and caused the vulnerability to emerge as exploitable. Refactoring is the trigger of emergent incompatibility.
Step 3: Mark inter-layer seam regions. Overlay Step 1’s ancient modules with Step 2’s refactor events to mark code regions where “ancient design assumptions coexist with modern context.” These regions are the habitats of vulnerabilities.
Step 4: AI targeted scan of seams. Extract the regions marked in Step 3 and submit them to AI (Mythos-level compute not required — Opus 4.6 or even smaller models will suffice) for targeted analysis. AISLE’s experiment has already proven: once the search space is narrowed to the correct region, a 3.6-billion-parameter model can detect the vulnerability.
Step 5: Abductive analysis of generative rules. Perform root cause analysis on discovered vulnerabilities to extract rules of the form “what type of 1st-gen compromise × what type of subsequent refactor = what type of vulnerability.” These rules can be directly applied to Step 1 for other codebases, forming a self-reinforcing loop.
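Steps 1–3 above can be sketched as a filter over per-module commit metadata. Everything in this sketch is hypothetical: the `Module` fields would in practice be mined from `git log --follow`, the 15-year age threshold comes from Step 1, and the 5-year design-to-refactor gap is our own assumption, not a validated constant.

```python
from dataclasses import dataclass

# Hypothetical module metadata; in practice these fields would be
# extracted from `git log --follow` for each file. Names and the
# 5-year gap threshold are assumptions of this sketch.
@dataclass
class Module:
    path: str
    first_commit_year: int        # Step 1: 1st-gen design timestamp
    major_refactor_years: list    # Step 2: subsequent refactor events

def mark_seams(modules, now=2026, min_age=15, min_gap=5):
    """Step 3: flag modules where an ancient design assumption coexists
    with a much later refactor (the habitat of emergent incompatibility)."""
    seams = []
    for m in modules:
        if now - m.first_commit_year < min_age:
            continue  # not old enough to span a generational transfer
        for year in m.major_refactor_years:
            if year - m.first_commit_year >= min_gap:
                seams.append((m.path, m.first_commit_year, year))
    return seams

# The FFmpeg case from the text: 2003 design, 2010 refactor.
modules = [
    Module("libavcodec/h264.c", 2003, [2010]),
    Module("libavcodec/recent.c", 2022, [2024]),  # too young: skipped
]
print(mark_seams(modules))
# [('libavcodec/h264.c', 2003, 2010)]
```

Step 4 would then submit each flagged region, rather than the whole codebase, to a model for targeted analysis.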
Cost Comparison (using the figures reported elsewhere in this paper):

| Approach | Approx. cost per vulnerability | Basis |
|---|---|---|
| Mythos brute-force search | ~$20,000 (≈1,000 scaffold runs) | Anthropic’s reported OpenBSD SACK figures |
| Abductive Targeted Minesweeping | est. $500–2,000 | P7 estimate; small models suffice after localization |
The core advantage of Abductive Targeted Minesweeping is not finding individual vulnerabilities, but producing generative rules for vulnerabilities. Mythos finds one vulnerability and gets one vulnerability. ATM finds one vulnerability’s formation pattern and obtains a habitat map for an entire class of vulnerabilities. With the map, you can predict where the next vulnerability will be, without needing to do a full-space scan every time.
Dual Path Dependency in Human Knowledge Systems
Up to this point, our analysis has focused on zero-day vulnerabilities in the software domain. But the causal model we propose has far broader applicability. In fact, human knowledge systems exhibit dual path dependency — simultaneous lock-in along both spatial and temporal dimensions.
| Dimension | Lock-in Mechanism | Manifestation |
|---|---|---|
| Spatial | Disciplinary specialization | Different disciplines form independent information clusters with almost no inter-cluster communication. Vulnerabilities hide in the voids between clusters |
| Temporal | Textbook transmission | First-generation compromises become axioms for successor generations. Nobody goes back to inspect the foundation |
From the perspective of high-dimensional vector space, human knowledge is a collection of clusters. Each discipline is a cluster — physics, biology, computer security, economics — each internally very dense, because centuries of experts have repeatedly worked within that region. But between clusters lie vast voids.
These voids are not “absence of knowledge” but rather “places where nobody has ever stood at that coordinate and looked around.” This is because the entire human academic system, career structures, and journal classifications are all organized by cluster. You can publish papers, earn tenure, and win Nobel Prizes within a cluster. But venture into the void between two clusters to explore? No journal will accept your paper; no peers can review you.
The deeper problem is “information misalignment” — two clusters give contradictory descriptions of the same phenomenon, but because they belong to different disciplines, no one has ever noticed the conflict. Economics says humans are rational; psychology says humans are irrational. The contradiction persisted for decades until behavioral economics appeared — someone standing in the void between two clusters saying “you’re both half right.”
AI is not trained by cluster. All disciplines’ corpora are ingested simultaneously. In its high-dimensional vector space, the embeddings of physics and biology and literature and law all coexist. It naturally stands between all clusters.
The Complete Causal Model
Aggregating the full analysis of this paper, we construct the following causal model:

Hardware physical constraints → 1st-generation reasonable compromise → textbook codification → generational path lock-in → legacy accretion (“never rewrite”) → emergent incompatibility at inter-layer seams → multi-decade zero-day dormancy
The mechanism by which Mythos discovered these vulnerabilities can be presented through a contrastive structure:
| Dimension | Human Security Expert | Mythos Brute-Force Search | Abductive Targeted Minesweeping |
|---|---|---|---|
| Knowledge Source | Textbooks → path dependent | Vector representations → path independent | Abductive logic → anti-path dependent |
| Search Method | Walk backward along textbook paths | All-direction matrix operations in high-dimensional space | Abductively locate habitats → AI targeted scan |
| Visible Range | Interior of a single layer | All layers simultaneously | Predicted inter-layer seam regions |
| Blind Spots | Inter-layer seams, cross-generational temporal gaps | Directions not covered by RL boundaries | Directions where abductive judgment errs |
| Judgment Criterion | “This shouldn’t happen” (textbook) | “Crash = reward” (RL) | “There should be a crack here” (abduction) + AI verification |
| Compute Requirement | Low (manual audit) | Extreme ($20,000/vulnerability) | Moderate ($500–2,000/vulnerability) |
| Output | Audit report | Vulnerabilities + exploits | Vulnerabilities + exploits + generative rules |
The core insight of this model can be compressed into a single sentence: zero-day vulnerabilities are not coding errors but emergent incompatibilities between first-generation design compromises and evolved hardware reality, and they stay hidden because every successor generation audits the system from inside the first generation’s framework.
Implications and Predictions (V2 Updated)
The five predictions from V1 have been partially confirmed through V2’s empirical validation. Below is the updated prediction table, with three new corollaries based on the Abductive Targeted Minesweeping methodology.
| # | Prediction | V2 Validation Status |
|---|---|---|
| P1 | Vulnerabilities concentrate at cross-layer seams | ✅ All three flagship vulnerabilities validated as cross-layer |
| P2 | Dormancy positively correlates with number of generational transfers | ✅ 27, 17, and 16 years — all spanning multiple generations |
| P3 | Traditional tools cannot reproduce | ✅ FFmpeg fuzzer hit 5M times without detection |
| P4 | Larger-parameter models discover more vulnerabilities | ⏳ Awaiting next-generation model release |
| P5 | Model generalizes to non-software domains | ⏳ Awaiting cross-domain application validation |
| P6 V2 New | After abductive localization, small models can detect Mythos-level vulnerabilities | ✅ AISLE experiment: 8/8 small models detected the FreeBSD NFS vulnerability, including a 3.6B-parameter model |
| P7 V2 New | ATM cost can be an order of magnitude lower than Mythos brute-force search | ⏳ Awaiting operational validation (est. $500–2,000 vs. $20,000) |
| P8 V2 New | ATM-produced vulnerability generative rules are reusable across codebases, with discovery density increasing as rules accumulate | ⏳ Awaiting operational validation |
References
[1] Anthropic Frontier Red Team. “Claude Mythos Preview.” red.anthropic.com, April 7, 2026.
[2] Anthropic. “Project Glasswing.” anthropic.com/glasswing, April 7, 2026.
[3] Axios. “Anthropic’s newest AI model could wreak havoc.” April 8, 2026.
[4] TechCrunch. “Anthropic debuts preview of powerful new AI model Mythos.” April 7, 2026.
[5] NBC News. “Why Anthropic won’t release its new Claude Mythos AI model to the public.” April 8, 2026.
[6] Fortune. “Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power.” March 26, 2026.
[7] Help Net Security. “Anthropic’s new AI model finds and exploits zero-days across every major OS and browser.” April 8, 2026.
[8] The Hacker News. “Anthropic’s Claude Mythos Finds Thousands of Zero-Day Flaws.” April 8, 2026.
[9] Tom’s Hardware. “Anthropic’s latest AI model identifies thousands of zero-day vulnerabilities.” April 8, 2026.
[10] PC Gamer. “Anthropic’s new Claude Mythos AI model has found thousands of vulnerabilities.” April 8, 2026.
[11] AISLE. “AI Cybersecurity After Mythos: The Jagged Frontier.” aisle.com, April 7, 2026.
[12] Joel Spolsky. “Things You Should Never Do, Part I.” Joel on Software, April 6, 2000.
[13] Joel Spolsky. “Netscape Goes Bonkers.” Joel on Software, November 20, 2000.
[14] Martin Fowler. “StranglerFigApplication.” martinfowler.com.
[15] Michael C. Feathers. Working Effectively with Legacy Code. Prentice Hall, 2004.
[16] Paul Kocher et al. “Spectre Attacks: Exploiting Speculative Execution.” 2018.
[17] Moritz Lipp et al. “Meltdown: Reading Kernel Memory from User Space.” 2018.
[18] Brian W. Kernighan, Dennis M. Ritchie. The C Programming Language. Prentice Hall, 1978.