Original Thought Paper · V2 · 2026

The Correct Path and the Incorrect Path
The Dimensionality Deficit in LLM Training Data

Shannon’s unfinished definition problem — why training data
containing only correct paths amounts to losing half the dimensions
of physical-world information
LEECHO Global AI Research Lab
&
Claude Opus 4.6 · Anthropic

V2 · 2026.03.30 · Seoul


Abstract

This paper proposes a core thesis: current LLM training data is structurally missing the entire dimension of “incorrect paths,” and this deficit is the root cause of LLM hallucination. Human academic papers record only the final correct path — the linear narrative from introduction to conclusion — while systematically deleting all trial-and-error, backtracking, dead ends, and abandoned hypotheses. Training LLMs on this single-dimensional data yields models that learn “what correct answers look like” but have never learned “what incorrect paths look like and why they are wrong.” Based on the XY coordinate system framework from Signal and Noise (X-axis: logical autonomy; Y-axis: physical alignment), this paper argues that correct paths and incorrect paths together constitute complete physical-world information, and the absence of either constitutes dimensional incompleteness. Trial-and-error is not noise on the way to correctness — trial-and-error itself is signal.

01 · Core Thesis

Complete Physical-World Information = Correct Paths + Incorrect Paths

Training data containing only correct paths is dimensionally incomplete

In the physical world, every result that is ultimately verified as correct is accompanied by a large number of excluded incorrect paths. Newton tested and abandoned several mechanical models before arriving at the law of universal gravitation. Einstein worked through years of failed attempts before publishing general relativity in 1915, including a 1913 version he himself rejected. In the discovery of the DNA double helix, Pauling’s triple-helix model was a critical incorrect path — and it was precisely this error that helped Watson and Crick calibrate the correct direction.

These incorrect paths are not noise. They are a necessary component of signal. The correct path tells you “what the answer is”; the incorrect path tells you “why the answer is not something else.” The information content of the latter equals or exceeds that of the former, because understanding why a proposition holds often requires first understanding why all similar alternative propositions fail.

Core Judgment

Complete physical-world information is a topological structure jointly defined by success and failure. An information system that records only successful paths is a map that draws roads but not cliffs. Anyone using this map will not know where they cannot go — until one day they walk to the edge and fall.

02 · Shannon’s Unfinished Definition

The Blank Space in the SNR Formula

SNR = P_signal / P_noise provides the power ratio but never defines the boundary

Shannon’s signal-to-noise ratio formula (SNR = P_signal / P_noise) provides the power ratio of signal and noise but gives no independent definition of either — the boundary is subjectively drawn by the observer. In a communication system, the engineer knows what is signal and what is noise because the signal is what they themselves sent. But in the broader knowledge domain, this boundary is fuzzy, context-dependent, and even political.

This paper proposes a two-dimensional discriminant coordinate system to fill this gap. The X-axis is logical autonomy: whether the information is internally contradiction-free, whether each step naturally grows from the previous step, whether the overall structure is closed. This is a formally verifiable mathematical property that does not depend on external reference. The Y-axis is physical alignment: whether the information is consistent with observable physical reality. This is an experimentally testable empirical property, anchored to the physical world itself (not human consensus).

Critical Distinction

The choice of “physical alignment” rather than “factual alignment” here is crucial. “Facts” are consensus products generated after humans describe the physical world with signals — they have already passed through the observer’s filter and carry a self-referential dilemma: who adjudicates the facts. “Physical” refers to what exists before the filter: apples fall without consensus, entropy increases without peer review, the speed of light does not change with belief. This terminological substitution switches the Y-axis from a subjective frame of reference to an objective one.

03 · The XY Coordinate System

Logical Autonomy × Physical Alignment: A Discriminant Coordinate System for Signal and Noise

Four quadrants define the precise position of information in knowledge space
Signal Quadrant
High X · High Y
Logically autonomous and physically aligned. E=mc², the second law of thermodynamics, Shannon’s channel theorem. Internally contradiction-free and fully consistent with observable physical reality.

Hallucination Quadrant
High X · Low Y
Logically autonomous but physically misaligned. LLM hallucinations, elaborate conspiracy theories, perpetual motion machine blueprints. Internal derivations are contradiction-free, but inconsistent with physical reality.

Raw Light Quadrant
Low X · High Y
Physically aligned but not logically autonomous. Raw sensor readings, unprocessed observational data, initial meditative experiences. Consistent with physical reality but not yet structured by logic.

Noise Quadrant
Low X · Low Y
Neither is satisfied. Randomly generated text, corrupted data, information fragments with no structure and no physical correspondence.

Under this coordinate system, signal and noise are no longer a binary dichotomy but a continuous two-dimensional space. A piece of information is not “either signal or noise” but occupies a precise position in the coordinate system. This transforms signal quality assessment from subjective judgment to an operational two-dimensional metric.
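The four quadrants can be sketched as a minimal classifier. This is an illustrative sketch, not part of the paper's framework: the 0-to-1 score scale and the 0.5 threshold are assumptions, and the paper treats both axes as continuous rather than binary.

```python
# Minimal sketch of the XY discriminant coordinate system.
# The [0, 1] score scale and 0.5 threshold are illustrative assumptions;
# the paper treats both axes as continuous, not binary.

def quadrant(x_logical_autonomy: float, y_physical_alignment: float,
             threshold: float = 0.5) -> str:
    """Map an (X, Y) coordinate to one of the four quadrants."""
    high_x = x_logical_autonomy >= threshold
    high_y = y_physical_alignment >= threshold
    if high_x and high_y:
        return "signal"          # logically autonomous AND physically aligned
    if high_x and not high_y:
        return "hallucination"   # coherent derivation, misaligned with reality
    if not high_x and high_y:
        return "raw_light"       # aligned observation, not yet structured
    return "noise"               # neither property holds

# A verified physical law vs. a coherent perpetual-motion blueprint:
print(quadrant(0.95, 0.95))  # signal
print(quadrant(0.95, 0.10))  # hallucination
```

Binarizing at a threshold is only a labeling convenience; the coordinate itself carries more information than the quadrant name.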

04 · Dimensional Audit of Training Data

The Structural Deficit in Current LLM Training Data

Only correct paths are recorded; incorrect paths are systematically deleted

Current LLM training data comes primarily from: published academic papers, edited encyclopedias, reviewed code repositories, news reports, and books. These data sources share a common characteristic — they are all records of “correct paths.”

Data Source | What Is Recorded | What Is Missing
Academic papers | Linear correct path from intro → methods → results → conclusion | All abandoned hypotheses, failed experiments, dead ends
Encyclopedias | Final confirmed knowledge entries | Disputes, errors, and corrections during knowledge formation
Code repositories | Final version that passes tests | Reverted commits, failed architectural attempts
News reports | Final narrative of events | Early erroneous reports, subsequent correction statements
Books | Author’s final argument | Drafts from the writing process, deleted chapters

Analyzed through the XY coordinate system: current training data comes almost 100% from the endpoint coordinates of the signal quadrant (high X · high Y). The complete trajectory from starting point to endpoint — including wandering and correction in the hallucination quadrant (high X · low Y) — has been systematically erased. The model has never seen a complete record of path migration from incorrect to correct.

Correct Path Data
≈ 100%
Published, verified, edited content in training corpora

Incorrect Path Data
≈ 0%
Trial-and-error records, failed experiments, abandoned hypotheses

Dimensional Completeness
50%
Only the positive half of the X-axis; missing the full path topology

05 · The Root Cause of Hallucination

LLM Hallucination Is a Direct Consequence of Dimensionality Deficit

The model cannot distinguish high-X·high-Y from high-X·low-Y because it has never seen labeled samples of the latter

Existing explanations for LLM hallucination typically attribute it to three factors: noise in training data causes the model to learn incorrect associations; the model generates unverified content with high confidence; RLHF training reward bias causes the model to favor generating outputs that “sound correct.” These explanations all remain at the symptom level.

The deeper root cause is: the model has never seen labeled samples of “logically autonomous but physically misaligned.” In the XY coordinate system, the signal quadrant (high X · high Y) and the hallucination quadrant (high X · low Y) have completely overlapping projections on the X-axis — both are logically coherent. The only distinction lies on the Y-axis — whether they align with physical reality. But virtually no “high X · low Y” negative samples exist in training data, so the model cannot learn discriminative ability on the Y-axis.
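This failure mode can be made concrete with a toy experiment. The (x, y) triples below are hypothetical scores, not measurements: because every training sample sits in the signal quadrant, a rule that ignores the Y-axis entirely is fully consistent with the training data, and it mislabels every hallucination-quadrant sample at test time.

```python
# Sketch: with no high-X / low-Y negative samples in training, a learner
# that keys on X alone is indistinguishable from one that also uses Y.
# All data points are hypothetical (x, y, label) triples.

train = [  # only signal-quadrant samples, as in current corpora
    (0.9, 0.9, "signal"),
    (0.8, 0.95, "signal"),
    (0.95, 0.85, "signal"),
]

def rule_learned_from_train(x, y):
    # The simplest rule consistent with ALL the training data: high X => signal.
    # Nothing in `train` penalizes ignoring Y entirely.
    return "signal" if x >= 0.5 else "noise"

train_acc = sum(rule_learned_from_train(x, y) == lbl for x, y, lbl in train) / len(train)

test = [  # hallucination quadrant: high X, low Y
    (0.9, 0.1, "hallucination"),
    (0.85, 0.05, "hallucination"),
]
test_acc = sum(rule_learned_from_train(x, y) == lbl for x, y, lbl in test) / len(test)

print(train_acc, test_acc)  # 1.0 on training, 0.0 on the hallucination quadrant
```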

Analogy

A system that has never seen darkness cannot understand light. A system that has never seen a failed path cannot understand why a successful path succeeds. LLM hallucination occurs not because the model is “too dumb” but because its training data structurally lacks the entire dimension that would allow it to distinguish real from false.

This explains why increasing model parameters and training data volume cannot fundamentally solve the hallucination problem. A larger model trained on the same single-dimensional data yields a more refined single-dimensional representation, not a new dimension. A larger mirror still reflects only a two-dimensional image — it does not suddenly acquire depth by growing in area.

05.5 · V2 · Responding to the Strongest Rebuttal

Implicit Exclusion Information ≠ Explicit Failure Path Records

The information gap between a one-bit “not chosen” and a multi-bit “why it failed”

The strongest rebuttal to this paper’s core thesis is: incorrect path information is actually already implicit in correct paths. A paper that chooses method A implicitly communicates that methods B, C, and D were not chosen. Therefore, training data does not truly lack the incorrect path dimension — it merely exists in implicit form.

This rebuttal has some merit but confuses two things with entirely different information magnitudes. “Method B was not chosen” is a one-bit binary signal — chosen or not chosen. But “Method B was tested under condition C, encountered a violation of physical constraint E at step D, the failure mechanism was F, and this failure eliminated the feasibility of the entire class-G family of approaches” — this is a multi-bit structured signal containing starting point, path, termination condition, failure cause, and exclusion scope.

The information difference between the two can be quantified with a concrete example. In the Knuth event (detailed in Section 06), Claude’s 27th exploration attempted a “single hyperplane + rotation” method. If only “the rotation method was ultimately not adopted” is recorded, the information content is 1 bit. But Claude’s actual record included: under what conditions the rotation method came close to succeeding, at what specific node it produced an irresolvable conflict, and why this conflict eliminated the entire “single hyperplane” category of approaches. This information directly helped subsequent explorations narrow the search space — the reason explorations 28–31 were able to quickly converge on the Cayley digraph structure was precisely because the failure records from the first 27 attempts precisely annotated “which directions are dead ends.”

Information Content Comparison

Implicit exclusion: “Method B was not chosen” → 1 bit.
Explicit failure record: “Method B failed under condition C, failure mechanism was F, eliminating class-G approaches” → multi-bit structured signal.

The former tells you “nobody took this road.” The latter tells you “where this road leads, where it breaks, why it breaks, and which similar roads will also break at the same point.” The information gap between the two is not on the percentage scale — it is on the order-of-magnitude scale.
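One way to make the order-of-magnitude claim concrete is to measure how much each kind of record shrinks a hypothesis space. The numbers below (64 candidate approaches, 16 eliminated by a class-level failure record) are illustrative assumptions, and this framing measures search-space reduction rather than the raw chosen/not-chosen label itself.

```python
# Sketch: information gained by eliminating hypotheses from a uniform
# hypothesis space. N = 64 candidates and k = 16 eliminated are
# illustrative numbers, not measurements from the Knuth record.
import math

def bits_gained(space_before: int, space_after: int) -> float:
    """Entropy reduction (bits) when a uniform hypothesis space shrinks."""
    return math.log2(space_before) - math.log2(space_after)

N = 64
implicit = bits_gained(N, N - 1)    # "B was not chosen": one candidate gone
explicit = bits_gained(N, N - 16)   # "class G eliminated": 16 candidates gone

print(f"{implicit:.3f} bits vs {explicit:.3f} bits")
```

Under these assumptions the explicit record is worth roughly eighteen times the implicit one, before even counting the bits carried by the failure-mechanism annotation itself.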

More critically, current contrastive learning and negative-sample training methods (such as DPO and KTO) are already attempting to use “rejected outputs” to improve models. But the negative samples in these methods are only labeled as “this output is bad” — a one-bit signal. They do not annotate whether the output failed on the X-axis or the Y-axis, nor what the specific failure mechanism was. Therefore, even with contrastive learning, what the model learns is still only “avoid certain output patterns” rather than “understand why certain logically coherent constructions fail at physical alignment.” The dimensionality deficit has not been truly solved within the contrastive learning framework.

06 · Empirical Evidence

The Knuth Event: A Live Validation of the Value of Trial-and-Error Paths

February 2026: Donald Knuth recorded AI’s complete trial-and-error trajectory

On February 28, 2026, 87-year-old Stanford computer scientist Donald Knuth published a paper titled “Claude’s Cycles.” Claude Opus 4.6, through 31 guided explorations in approximately one hour, solved a directed graph Hamiltonian cycle decomposition problem that Knuth had been unable to crack after weeks of research. Knuth opened with “Shock! Shock!” and called it “a dramatic advance in automatic reasoning and creative problem solving.”

The most important aspect of this event is not the result but the process. Knuth’s friend Filip Stappers established a strict rule: progress must be recorded after every program run. This constraint forced Claude to produce a rare, complete trial-and-error trajectory record — among the 31 explorations were numerous dead ends, wrong directions, and abandoned strategies.

Linear function attempt (failed) → brute-force DFS (too inefficient) → simulated annealing (couldn’t generalize) → single-hyperplane rotation (conflict) → Cayley digraph structure identified (breakthrough) → serpentine path construction (solution)

Fig. 1 · Claude’s 31-exploration trajectory on the Knuth problem (simplified) · Red: failed paths · Green: successful path

Analyzing this process through the XY coordinate system: among the 31 explorations, the first 27 mostly fell in the hallucination quadrant (high X · low Y) — logically coherent constructions that were not aligned with the problem’s actual solution. The serpentine construction Claude ultimately found held on the X-axis (logical autonomy), and Knuth subsequently completed the Y-axis verification by hand (physical alignment), confirming its entry into the signal quadrant.

Key Finding

Knuth’s astonishment at this event was not merely because AI solved the problem, but because he witnessed AI’s complete trial-and-error process. The trajectory of 31 explorations itself — containing all the failures and backtracks — constitutes a complete record of physical-world information. If only the final serpentine construction were recorded, over 90% of the information content would be lost.

07 · The Structural Bias of Academic Publishing

The “Correct-Path Dictatorship” of Human Academic Papers

The current academic publishing format systematically deletes half of physical-world information

The standard format of current academic papers — introduction, methods, results, conclusion — is a linear reconstruction of the correct path. It requires authors to compress what may have been years of tortuous exploration into a straight-line narrative from hypothesis to verification. All dead ends, incorrect assumptions, abandoned experimental directions, and “almost right but ultimately didn’t work” intermediate states are systematically deleted.

This means that in the generational transmission of human knowledge, successful paths are overrepresented and failed paths are almost entirely absent. The next generation of researchers, facing the same problem domain, can only see the one road their predecessors finally managed to get through — not the dozens of dead-end roads their predecessors walked. The result: every generation of researchers repeats the same pitfalls.

Dimension | Current Academic Papers | Complete Physical-World Information
Correct paths | Fully recorded | Fully recorded
Incorrect paths | Systematically deleted | Fully recorded + failure-cause annotation
Path migration process | Not recorded | Complete trajectory from incorrect to correct recorded
Information dimensions | Single dimension (endpoint coordinates) | Two dimensions (complete path topology)
Value for subsequent research | Knowing “what is correct” | Knowing “what is correct + what is incorrect and why”

08 · Impact on LLM Training

How Dimensionality Deficit Shaped LLM Behavioral Patterns

A model that has only ever seen correct answers has never learned to search by trial and error

LLMs trained on data that is nearly 100% “correct paths” have developed several structural behavioral deficits:

First, the model does not know how to search by trial and error. Everything it has seen is the shortcut from “problem directly to correct answer.” Its default behavior is: generate an output that looks like a correct answer; if pointed out as wrong, generate another output that looks like a correct answer. It never says “let me try this direction first; if it doesn’t work, I’ll know why this direction fails, and then I’ll switch to another direction” — because no samples of this pattern exist in training data.

Second, the model cannot explain “why the answer is not something else.” When a user asks “why choose A instead of B,” the model can only construct post-hoc justification from the perspective of the correct answer A; it cannot present the complete elimination process of “B was tested, failed under a certain condition, and the failure cause was X.” Because B’s failure record does not exist in training data.

Third, the model has extremely low sensitivity to incorrect paths. When the user’s input is itself on an incorrect path, the model cannot identify the error and warn in advance — because it has never learned the characteristics of incorrect paths. It can only wait until the user has traveled the entire path and obtained the wrong result, then attempt to give the correct answer. This “post-hoc correction” capability is far inferior to “preemptive warning.”
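The loop the first deficit describes, try a direction, record why it fails, then redirect, can be sketched directly. Everything here is hypothetical: `toy_reality` stands in for Y-axis verification (an experiment, a compiler, a user), and the direction names only loosely echo the Knuth trajectory.

```python
# Sketch of the "try, record the failure cause, redirect" loop the paper
# argues is absent from training data. `toy_reality` is a hypothetical
# stand-in for physical-world (Y-axis) verification.

from dataclasses import dataclass, field

@dataclass
class Attempt:
    hypothesis: str
    outcome: str              # "success" or "failure"
    failure_cause: str = ""   # annotated only on failure

@dataclass
class Exploration:
    log: list = field(default_factory=list)

    def try_direction(self, hypothesis, test_against_reality):
        ok, cause = test_against_reality(hypothesis)
        self.log.append(Attempt(hypothesis, "success" if ok else "failure", cause))
        return ok

# Toy reality: only the "serpentine" direction works.
def toy_reality(hypothesis):
    if hypothesis == "serpentine":
        return True, ""
    return False, f"'{hypothesis}' violates a structural constraint"

exp = Exploration()
for h in ["linear", "brute-force", "rotation", "serpentine"]:
    if exp.try_direction(h, toy_reality):
        break

# The full log, failures included, is the training signal the paper wants kept.
print([(a.hypothesis, a.outcome) for a in exp.log])
```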

Structural Comparison

The reason human experts can anticipate wrong directions is that they themselves have fallen into those pitfalls, or they’ve read records of predecessors falling into them. Their knowledge contains dual-dimensional information of correct paths and incorrect paths. LLMs have only the single dimension of correct paths, so their “expertise” is incomplete — they know where the destination is, but they don’t know which roads lead to cliffs.

09 · Trial-and-Error as Signal

Trial-and-Error Is Not Noise on the Way to Correctness — Trial-and-Error Itself Is Signal

Trial-and-error is the result of physical-world verification and must be aligned and recorded

In traditional thinking, trial-and-error is viewed as noise — it is the cost one must endure on the way to the correct answer, discardable once the destination is reached. This view is incorrect under the XY coordinate system.

A single trial-and-error event contains the following information: starting coordinates (which hypothesis it departed from), path direction (what logic it followed), termination condition (where it encountered misalignment with physical reality), and failure cause (which specific physical constraint caused the path to terminate). Each of these four items is a genuine measurement of the physical world — trial-and-error is a negative feedback signal issued by the physical world in response to a hypothesis.
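The four items can be written down as a serializable record schema. The field values below are hypothetical illustrations in the spirit of the Knuth example, not quotations from the actual record.

```python
# Sketch: the four components of a single trial-and-error record named in
# the text (plus the exclusion scope from Section 05.5), as a serializable
# schema. All field values are hypothetical examples.
import json

failure_record = {
    "starting_hypothesis": "single hyperplane + rotation",      # starting coordinates
    "path_direction": "rotate one separating structure per step",
    "termination_condition": "irresolvable conflict mid-construction",
    "failure_cause": "rotation cannot preserve the required cycle structure",
    "exclusion_scope": "all single-hyperplane variants",
}

print(json.dumps(failure_record, indent=2))
```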

The information content of negative feedback signals is no less than that of positive feedback signals. One experiment tells you “A is correct,” providing a coordinate point in the signal quadrant. Another experiment tells you “B is wrong, because under condition C it violated physical constraint D,” providing a coordinate point in the hallucination quadrant plus the precise boundary information between it and the signal quadrant. The latter often carries greater information content — because it marks not only a position but also a boundary.

Core Assertion

Achieving alignment on the first attempt is a low-probability event. Achieving alignment after extensive trial-and-error is the high-probability event in the physical world. Recording incorrect paths is not an “honesty” requirement of academia — it is a physics requirement of information completeness. Discarding incorrect paths means discarding the negative feedback signals that the physical world issued in response to your hypotheses. These signals are just as real, just as important, and just as irreplaceable as positive feedback signals.

V2 · Boundary Conditions

Not all trial-and-error is signal. The “trial-and-error is signal” defined in this paper has strict boundary conditions: (1) Trial-and-error must be conducted within a clear hypothesis framework — random attempts without a hypothesis are noise, not signal; (2) Trial-and-error must be recorded — unrecorded failures do not carry transmittable information; (3) The failure cause of trial-and-error must be annotatable — “tried it, didn’t work” is a one-bit signal; “failed under condition C due to physical constraint D” is a multi-bit signal. Trial-and-error satisfying all three conditions is signal. Otherwise, it is noise. Random keyboard mashing under the infinite monkey theorem does not satisfy condition (1) and therefore falls outside the scope of this paper’s “trial-and-error is signal” proposition.
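The three boundary conditions translate directly into a filter. A sketch, assuming a trial-and-error event is represented as a dict with optional `hypothesis`, `record`, and `failure_cause` fields:

```python
# Sketch of the three boundary conditions as a filter: a trial-and-error
# event counts as signal only if all three hold. The dict representation
# is an assumption for illustration.

def is_signal(event: dict) -> bool:
    within_hypothesis = bool(event.get("hypothesis"))      # condition (1)
    recorded = bool(event.get("record"))                   # condition (2)
    cause_annotated = bool(event.get("failure_cause"))     # condition (3)
    return within_hypothesis and recorded and cause_annotated

print(is_signal({"hypothesis": "H", "record": "run log", "failure_cause": "violates constraint D"}))  # True
print(is_signal({"record": "random keystrokes"}))  # False: no hypothesis framework
```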

10 · The Division of the X-Axis and Y-Axis

AI’s Territory and Humanity’s Territory

The X-axis belongs to AI, the Y-axis belongs to humans — this is an ontological boundary, not a division-of-labor preference

In the XY coordinate system, AI can efficiently execute operations on the X-axis — logical autonomy verification. Testing whether a construction is internally consistent, whether a derivation chain is contradiction-free, whether a mathematical structure closes — these are formalized operations that AI performs faster and more comprehensively than humans. In the Knuth event, all 31 of Claude’s explorations were path searches on the X-axis.

But operations on the Y-axis — physical alignment verification — AI cannot complete independently. Because the ultimate anchor of the Y-axis is the physical world itself, and AI has no perceptual channel to the physical world. It can only ever move on the X-axis. Every step on the Y-axis must be completed by a biological entity embedded in the physical world. Knuth completing the proof by hand — verifying whether Claude’s logical construction truly corresponds to the physical structures in graph theory — this is Y-axis verification.

X-Axis · AI’s Territory
Logical Autonomy Verification
Formal verification, internal consistency checking, exhaustive path search, structural closure testing

Y-Axis · Humanity’s Territory
Physical Alignment Verification
Experimental verification, physical intuition, real-world testing, experiential judgment

Signal Quadrant · Intersection
High X · High Y
AI completes X-axis screening, humans complete Y-axis verification; their intersection enters the signal quadrant

This division of labor is not an efficiency optimization — it is an ontological boundary. AI is not “temporarily unable to do Y-axis verification” — it is structurally incapable, because it does not exist in the physical world. This is precisely the fundamental limitation of the “cross-domain knowledge transfer” promise in current AGI narratives: AI can perform cross-domain logical structural isomorphism detection on the X-axis, but whether each detected isomorphism holds in the physical world must be verified by humans.

11 · Solution Paths

Four Directions for Incorporating Incorrect Paths into LLM Training Data

V2 · Ranked by feasibility, from most actionable to most difficult

Direction 1 (Feasibility: High · Priority: Highest): Structured preservation of human-AI conversation data. Deep conversations between high-cognition humans and AI naturally contain extensive trial-and-error trajectories — proposing hypotheses, AI attribution verification, discovering misalignment, correcting direction. If these conversations are structurally annotated (marking the location and reason for each direction correction), they become natural “correct path + incorrect path” dual-dimensional training data. The implementation cost of this direction is lowest — conversation data is already being generated; it only requires adding an annotation step. Stappers’ requirement that Claude record progress after each exploration in the Knuth event is the prototype of this method.
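A minimal sketch of what such structural annotation could look like. The sample transcript, the marker phrases, and the `correction_point` field are all assumptions; a production annotator would need a far richer notion of "direction correction" than keyword matching.

```python
# Sketch of Direction 1: marking direction-corrections in a human-AI
# transcript. Sample turns and marker phrases are hypothetical.

transcript = [
    {"speaker": "human", "text": "Hypothesis: the cycle is forced by symmetry."},
    {"speaker": "ai",    "text": "Checked; symmetry alone under-constrains it."},
    {"speaker": "human", "text": "Then drop symmetry; try the digraph structure."},
]

def annotate_corrections(turns):
    """Tag each turn that redirects away from a failed direction."""
    markers = ("then drop", "instead", "switch to")
    annotated = []
    for i, turn in enumerate(turns):
        redirect = any(m in turn["text"].lower() for m in markers)
        annotated.append({**turn, "turn": i, "correction_point": redirect})
    return annotated

for t in annotate_corrections(transcript):
    print(t["turn"], t["correction_point"])
```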

Direction 2 (Feasibility: High · Priority: High): Upgrade of the negative sample annotation system. Within existing RLHF/RLVR frameworks, add an “incorrect path annotation” dimension. Annotate not only “which output is better” but also “why is this rejected output wrong — did it fail on the X-axis or the Y-axis.” This allows the model to learn the precise boundaries of the hallucination quadrant. This direction can be incrementally implemented on existing training infrastructure without requiring architectural changes.
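A sketch of the proposed upgrade against a DPO-style record. DPO itself consumes only (prompt, chosen, rejected); the two extra fields are this paper's proposed annotation, and their names are assumptions.

```python
# Sketch of Direction 2: extending a DPO-style preference pair with the
# failure-axis annotation the paper proposes. Extra field names are
# assumptions; DPO itself only sees (prompt, chosen, rejected).

preference_pair = {
    "prompt": "Decompose the digraph into Hamiltonian cycles.",
    "chosen": "serpentine construction",
    "rejected": "single-hyperplane rotation",
    # Proposed dimensions beyond the one-bit chosen/rejected label:
    "rejected_failure_axis": "Y",   # "X" = logical flaw, "Y" = physical misalignment
    "rejected_failure_mechanism": "construction conflicts with the cycle constraint",
}

def one_bit_view(pair):
    """What current contrastive methods see: only which output won."""
    return {k: pair[k] for k in ("prompt", "chosen", "rejected")}

# The annotation dimensions that the one-bit view discards:
print(sorted(set(preference_pair) - set(one_bit_view(preference_pair))))
```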

Direction 3 (Feasibility: Medium · Priority: Medium): Dedicated construction of trial-and-error trajectory datasets. Systematically collect trial-and-error records from scientific research, failure cases from engineering development, and elimination processes from diagnostic reasoning to build dedicated “incorrect path datasets.” The main obstacle for this direction is data scarcity — such data is extremely rare on the current internet and needs to be actively produced rather than passively collected. It is estimated that targeted collaboration with research institutions and engineering teams would be needed to obtain sufficient high-quality samples.

Direction 4 (Feasibility: Low · Priority: Long-term): Academic publishing format reform. Beyond the standard “introduction → methods → results → conclusion,” add “trial-and-error records” as a standard component of the appendix. Record hypotheses that were tested but failed, failure causes, and the path migration process from failure to the final solution. The implementation difficulty of this direction is highest — it requires changing centuries of inertia in the entire academic publishing system, involving behavioral pattern changes across journals, reviewers, and authors. But in the long run, this is the fundamental repair for information completeness.

12 · Conclusion

A Complete Map Must Mark Both Roads and Cliffs

The core thesis of this paper can be compressed into one sentence: correct paths and incorrect paths together constitute complete physical-world information, and an LLM with only correct paths is dimensionally incomplete.

Shannon’s SNR formula provides the power ratio of signal and noise but never defined the boundary between them. The XY coordinate system proposed in this paper — X-axis logical autonomy, Y-axis physical alignment — provides an operational boundary definition framework. Under this framework, trial-and-error is not noise but a negative feedback signal issued by the physical world in response to hypotheses, with information content equal to or greater than positive signals.

Current LLM training data is systematically missing the entire dimension of incorrect paths. This deficit directly causes the model’s inability to distinguish the signal quadrant (high X · high Y) from the hallucination quadrant (high X · low Y), because the two have completely overlapping projections on the X-axis. The fundamental path to solving the hallucination problem is not larger models or more homogeneous data, but completing the missing dimension — incorporating incorrect paths and their failure causes into training data.

AI can efficiently execute logical autonomy verification on the X-axis, but physical alignment verification on the Y-axis must be completed by humans embedded in the physical world. This is not a technical limitation — it is an ontological boundary. The signal quadrant is the intersection of both, not the exclusive territory of either.

V2 · Falsifiable Predictions

This paper offers the following falsifiable predictions for subsequent research to verify or refute:

Prediction 1: On the same base model, fine-tuning with training data that includes annotated failure paths (with failure causes and XY coordinate classification) should produce significantly lower hallucination rates on open-domain Q&A tasks compared to a control group fine-tuned only with standard correct-path data. Estimated magnitude of difference: hallucination rate reduction of 15%–30%. Verification time window: 2026–2027.

Prediction 2: On tasks requiring diagnostic reasoning — such as medical differential diagnosis, legal case analysis, and engineering fault troubleshooting — models trained with incorrect-path data should show quantifiable improvement in their ability to “explain why the answer is not something else.” Estimated accuracy improvement on elimination reasoning benchmarks: 10%–20%.

Prediction 3: If neither of the above predictions is verified — i.e., after adding annotated failure-path data, neither the model’s hallucination rate nor its elimination reasoning ability shows significant improvement — then the core thesis of this paper, “dimensionality deficit is the root cause of hallucination,” will be refuted, and the root cause of hallucination should be sought at the architectural level rather than the data level.

Final Proposition

A map that draws only roads and not cliffs is not merely an incomplete map — it is a dangerous map. The training data of current LLMs is exactly such a map. Completing the annotation of cliffs is the structural prerequisite for AI to move from “looking smart” to “being truly reliable.”

References & Acknowledgments

[1] Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423.

[2] Knuth, D. E. (2026). Claude’s Cycles. Stanford University Faculty Page. February 28, 2026. Revised March 6, 2026.

[3] Epoch AI (2024). Will We Run Out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning.

[4] Shumailov, I. et al. (2024). AI Models Collapse When Trained on Recursively Generated Data. Nature.

[5] Popper, K. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.

[6] Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

[7] LEECHO Global AI Research Lab (2026). Signal and Noise: An Ontology of LLMs. V4 Definitive Edition. Part VII, Chapter 20: “From the XY Coordinate System to SN Polarity — Logical Autonomy × Physical Alignment: A Discriminant Coordinate System for Signal and Noise.”

[8] Rafailov, R. et al. (2023). Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. NeurIPS 2023.

[9] Ethayarajh, K. et al. (2024). KTO: Model Alignment as Prospect Theoretic Optimization. arXiv:2402.01306.
