ORIGINAL THOUGHT PAPER · MAY 2026 · V5

The Game Between AI
Ontology and Methodology

Why Mathematics-Based AI Cannot Restore
the Ontology of the Physical World

Why AI based on mathematical computation cannot fully equate its own models with the ontology of the physical world

Published May 21, 2026
Category Original Thought Paper
Domains AI Epistemology · Philosophy of Science · Ontology · Quantum Physics · Mathematical Logic
Version V5
Authors LEECHO Global AI Research Lab & Opus 4.6 & GPT 5.5 & Gemini 3.1 (Cognitive Collective)

ABSTRACT

Starting from the competitive landscape of the AI industry in 2026, this paper traces a fundamental question layer by layer: can the current AI paradigm, centered on statistical learning, fully equate its own models with the ontology of the physical world? The paper first defines four levels of “ontological reduction” (complete equivalence, exhaustive completeness, effective modeling, paradigm discovery). It specifies that its targets are the first two levels, together with the industry narrative that inflates Level III engineering achievements into Level I/II ontological commitments. Its core conceptual innovation is the notion of “Extra-Model Emergent Events,” for which it provides five operationalized identification criteria. Three constraints from formal logic, algorithmic information theory, and physics (Gödel, Chaitin, quantum mechanics) form a convergent multi-constraint system. Triple stochastic coupling produces path dependence that stable feedback can reduce locally but cannot eliminate. The paper directly addresses four major counterarguments and adds two chapters of frontier analysis: Chapter IX systematically tests the frontier of Level IV using cases including Robot Scientist Adam, AI-Newton, and the Emory plasma discovery published in PNAS; Chapter X explores architectural directions that might reach Level IV. The conclusion includes a falsifiability statement: if an AI independently discovers a new physical law without human hypothesis guidance, and this law is experimentally verified, it would refute this paper’s core thesis. The map can be very useful, but the map is not the territory.

DEFINITIONS

Core Definitions
Foundational Conceptual Framework

DEFINITION I · Four Levels of “Ontological Reduction”

“Restoring the ontology of the physical world” in this paper is not a single concept, but encompasses four progressive levels:

Level I — Complete Equivalence: The model is the world itself; mathematical representation and physical reality are indistinguishable. This paper’s verdict: impossible. Status of the critique: valid and non-trivial.

Level II — Exhaustive Completeness: Closing the entire variable space of the world, exhausting all state transitions, and predicting all futures. This paper’s verdict: almost certainly impossible. Status of the critique: essentially valid.

Level III — Effective Modeling: Constructing locally effective approximate representations at specific tasks, scales, and precisions. This paper’s verdict: Possible, and already happening. This paper never denies the engineering value of this level.

Level IV — Paradigm Discovery: AI independently generates entirely new scientific hypotheses that negate existing frameworks, rewriting the variable space. This paper’s verdict: Open question. Current evidence is insufficient to affirm or deny.

This paper’s attack targets are strictly limited to Levels I and II, and to the industry narrative that inflates Level III engineering achievements into Level I/II ontological commitments.

DEFINITION II · Extra-Model Emergent Events

An Extra-Model Emergent Event = an event that is not explicitly represented in the variable space, training sample space, or objective function of existing models, yet genuinely occurs in the physical world through cross-level coupling, forcing the system to rewrite its variable space.

It must be emphasized that “extra-model” does not mean “acausal”—the discovery of penicillin had causal preconditions including experimental conditions, contamination environment, observational capability, and knowledge background. Innovation does not appear from nothing; rather, its causal structure is highly complex, high-dimensional, sparse, nonlinear, and cannot be enumerated in advance. It is not that there is no causality, but that the causal chain exceeds the closed variable space of any existing model.

Operationalized Criteria: An event qualifies as an extra-model emergent event if and only if it simultaneously satisfies all five of the following conditions:
(1) No corresponding dimension exists in the old model’s variable space;
(2) The old model’s objective function cannot reward the event’s direction;
(3) The old model’s training samples contain no isomorphic structure;
(4) After the event occurs, the model’s fundamental variable space or causal graph must be modified;
(5) It cannot be absorbed through parameter updates alone—it must be absorbed through structural rewriting.
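Because the five conditions are joined by “if and only if … all five,” the test is a strict conjunction. A minimal Python sketch, assuming each criterion has already been judged true or false by a human assessor (the class and function names are hypothetical and belong to no existing tool):

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmergenceCriteria:
    """Hypothetical record of how one event scores on the five criteria."""
    no_dimension_in_old_variable_space: bool      # criterion (1)
    old_objective_cannot_reward_it: bool          # criterion (2)
    no_isomorphic_training_structure: bool        # criterion (3)
    forces_variable_space_or_graph_change: bool   # criterion (4)
    requires_structural_rewrite: bool             # criterion (5)


def is_extra_model_emergent_event(c: EmergenceCriteria) -> bool:
    # "If and only if ... all five": one failed criterion disqualifies the event.
    return all(getattr(c, f.name) for f in fields(c))


# Scored "No" on every criterion, as Robot Scientist Adam is in Chapter IX.
adam = EmergenceCriteria(False, False, False, False, False)
# Meets (1)-(4) but could be absorbed by parameter updates alone.
almost = EmergenceCriteria(True, True, True, True, False)

print(is_extra_model_emergent_event(adam))    # False
print(is_extra_model_emergent_event(almost))  # False
```

The second example shows why criterion (5) matters: an event that can be absorbed by parameter updates alone does not qualify, however novel it looks.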

DEFINITION III · Precise Delimitation of Attack Scope

Future AI may include robotic closed loops, active experimentation systems, causal discovery engines, neuro-symbolic systems, evolutionary search, and automated laboratories—these are still based on mathematical computation but are not equivalent to LLMs.

This paper precisely delimits its attack scope to: the current AI paradigm centered on statistical learning (including LLMs, world models, and reinforcement learning)—they share a fundamental characteristic: learning from the distribution of existing data, optimizing within known variable spaces. It should be noted that reinforcement learning (RL) contains active exploration mechanisms (such as AlphaGo’s self-play) that can discover strategies beyond the training distribution—but this exploration still occurs within a known rule space (Go rules are predefined). RL’s exploration boundary is constrained by the predefined structure of its reward function and state space, which remains fundamentally different from “rewriting the variable space itself.” For entirely new AI paradigms that may emerge in the future (e.g., systems with active experimental closed loops, symbolic reconstruction, and physical sensor embedding capabilities), this paper does not prejudge their limits but marks them as open questions (see Chapters IX and X).
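The point that RL explores only inside a predefined structure can be illustrated with an epsilon-greedy multi-armed bandit, a standard RL toy problem chosen here purely as an example. The payoff probabilities are invented; what matters is that the agent can learn which arm is best, but its update rule contains no operation for adding an arm that was not in the action set it was given.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on Bernoulli arms; the action set is fixed up front."""
    rng = random.Random(seed)
    n_arms = len(true_means)          # the "rule space", fixed before learning starts
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                             # explore existing arms only
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update of the chosen arm's value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    best = max(range(n_arms), key=lambda a: estimates[a])
    return best, estimates


best, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(best)  # index of the best arm *within* the given action set
```

Exploration here can surprise a human observer (the agent may find a strategy nobody expected), but every strategy it can find is a policy over the same three arms.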

SECTION I

Starting Point: The Nature of AI Competition Revealed Through Acquisitions
From Business Acquisitions to AI’s Fundamental Dilemma

In May 2026, Anthropic acquired Stainless—an SDK generation platform serving OpenAI, Google, and Cloudflare—for over $300 million.¹ The acquisition immediately removed a critical infrastructure layer from competitors, forcing OpenAI and Google to rebuild their own SDK pipelines. This was not merely developer tool consolidation; it exposed a deeper shift in AI competition: competition is moving from model quality to infrastructure control.

OpenAI projects cumulative cash burn of $665 billion by 2030, with positive cash flow not expected until then.² Anthropic’s annualized revenue surged from $1 billion in late 2024 to $30 billion by April 2026.³ Google invests $180–190 billion annually in AI infrastructure. Some researchers have called the AI bubble 17 times larger than the dot-com bubble.⁴,⁵ What returns can these astronomical investments expect with any certainty? This seemingly financial question actually touches on AI’s most fundamental philosophical dilemma.

SECTION II

The Mathematical Essence of LLMs: A Quantifiable Closed Probability Space
Token Space, Cross-Entropy, and the Law of Large Numbers

The entire capability of large language models (LLMs) rests on a single premise: making conditional probability predictions within a discrete, finite token space. The vocabulary is fixed (~100,000 tokens), the context window is finite, and at each output step the next element is chosen, either greedily or by sampling, from a probability distribution defined over this closed space. The loss function (cross-entropy) minimizes the average negative log-likelihood of the observed next token, in effect an “average error”: its variance is large, but it is quantifiable, enumerable, and convergent.
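To make the “closed space” concrete, here is a toy sketch with an invented five-token vocabulary. Each prediction is a probability distribution over the same fixed index set, and cross-entropy is the mean negative log-probability assigned to the tokens that actually occurred. Real LLMs use vocabularies of roughly 10⁵ tokens and compute logits with a neural network; here the logits are simply given.

```python
import math

VOCAB = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary


def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob_rows, target_ids):
    """Mean negative log-likelihood (in nats) of the observed next tokens."""
    losses = [-math.log(p[t]) for p, t in zip(prob_rows, target_ids)]
    return sum(losses) / len(losses)


# Two prediction steps; each row is a distribution over the same closed vocabulary.
logits = [[2.0, 0.5, 0.1, 0.1, 0.1],
          [0.1, 3.0, 0.2, 0.1, 0.1]]
probs = [softmax(row) for row in logits]
targets = [VOCAB.index("the"), VOCAB.index("cat")]
print(round(cross_entropy(probs, targets), 4))
```

Whatever the logits are, the model can only ever place probability on the five indices of `VOCAB`.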

When training data is sufficiently large, the empirical conditional distribution of tokens approaches the true distribution, a consequence of the law of large numbers. Because cross-entropy weights each pattern by how often it appears in the data, high-probability tokens dominate the training signal, while the low-probability tail receives sparse evidence and is systematically compressed. The core characteristic of the current statistical learning paradigm follows: high-probability patterns near the mean are precisely captured; the tail is systematically neglected.
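A small simulation of this point, under an invented “true” next-token distribution with one rare token. The empirical frequencies converge as the sample grows, as the law of large numbers promises. At any fixed sample size, however, the rare token’s estimate rests on far fewer observations, so its relative standard error is roughly a hundred times that of the common token.

```python
import math
import random

# Hypothetical "true" next-token distribution with one rare continuation.
TRUE_P = {"common": 0.90, "medium": 0.099, "rare": 0.001}


def empirical_distribution(n, seed=0):
    """Estimate next-token probabilities from n sampled continuations."""
    rng = random.Random(seed)
    tokens = list(TRUE_P)
    draws = rng.choices(tokens, weights=[TRUE_P[t] for t in tokens], k=n)
    return {t: draws.count(t) / n for t in tokens}


def relative_std_error(p, n):
    """Standard error of a frequency estimate divided by p: sqrt((1 - p) / (n * p))."""
    return math.sqrt((1 - p) / (n * p))


for n in (1_000, 100_000):
    est = empirical_distribution(n)
    print(n, {t: round(v, 4) for t, v in est.items()})
    print("  relative std. error:",
          {t: round(relative_std_error(p, n), 4) for t, p in TRUE_P.items()})
```

The ratio between the two relative errors does not depend on n: more data shrinks both, but the tail always stays the least reliably estimated part of the distribution.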

SECTION III

The Dilemma of World Models: Unquantifiable Physical Variables
Continuous, Open, Causally Coupled Reality

World models attempt to shift AI from “predicting text” to “simulating reality.” By early 2026, over $1.3 billion had poured into this field: Google DeepMind’s Genie 3 (real-time interactive 3D world generation, 24fps/720p), NVIDIA Cosmos (2 million downloads, trained on 20 million hours of real-world data), Fei-Fei Li’s World Labs ($1 billion funding), and Yann LeCun’s AMI Labs ($1.03 billion funding, €3 billion valuation).⁶,⁷,⁸,⁹ Yet LeCun himself admitted: “The world is not predictable.”¹⁰

“Reality is not predictable, full of edge cases. Building systems that can capture true causality (rather than merely correlations) is an enormous challenge. Even defining what it means for AI to ‘understand’ something remains an open question. We may be building systems that behave as if they understand the world before we can prove they actually do.”

— One Giant Leap, “Why world models are back on the AI agenda”, 2026

LeCun’s critique of generative models strikes at the heart of the problem: generative models must predict every pixel, which is computationally wasteful and prone to hallucination in continuous high-dimensional spaces, because much of that information is intrinsically unpredictable. The physical world is continuous, open, and causally coupled. A ball rolling on a table involves friction, air resistance, material elasticity, and microscopic surface irregularities; every variable is continuous and interconnected. This is not a problem that “a larger token space” can solve; it is a fundamentally different category of problem.

SECTION IV

Core Thesis: Extra-Model Emergent Events and the Boundaries of Sample Space
Why Innovation Occurs Outside Ω

AI’s quantitative evaluation logic rests on a normal-distribution assumption: more data yields incremental gains with normally distributed returns. But many phenomena in the physical world follow heavy-tailed, power-law distributions, in which low-probability events not only exist in the tail but can fundamentally transform the state of the entire system. Innovation is the paradigmatic case.
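How differently the two assumptions treat the tail can be checked with closed-form survival functions. The sketch compares a standard normal variable with a Pareto variable; the tail index α = 2 and scale x_m = 1 are arbitrary illustrative choices, not fitted to any data, and the two distributions are not matched in scale, so the comparison is qualitative.

```python
import math


def normal_tail(x):
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pareto_tail(x, alpha=2.0, x_m=1.0):
    """P(X > x) for a Pareto(alpha, x_m) variable, valid for x >= x_m."""
    return (x_m / x) ** alpha


for x in (2, 5, 10):
    n, p = normal_tail(x), pareto_tail(x)
    print(f"x={x:>2}  normal={n:.2e}  pareto={p:.2e}  ratio={p / n:.1e}")
```

At x = 10 the normal model treats the event as essentially impossible (on the order of 10⁻²³), while the power law still assigns it about one chance in a hundred.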

The discovery of penicillin had causal preconditions: experimental conditions, contamination environment, petri dishes, observational capability, knowledge background. Newton had extensive mathematical accumulation and Kepler’s astronomical data. Einstein had the Lorentz transformations and Maxwell’s equations as theoretical foundations. These discoveries did not appear from nothing, but their causal chains were highly complex, high-dimensional, sparse, and nonlinear, beyond the reach of any prior closed model. Newton defined the concept of “force”: before him, no model’s variable space Ω contained this dimension. Einstein did not find a better combination within the old Ω; he reconstructed Ω itself.

These are precisely what this paper defines as extra-model emergent events: not explicitly represented in existing models’ variable spaces, yet genuinely occurring through cross-level coupling, forcing the system to rewrite its variable space. They are not “low-probability events” (which still reside in Ω’s tail and are theoretically modelable), but rather events that occur outside Ω—not low probability, but not within the probability distribution’s domain at all.
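The difference between “low probability” and “outside Ω” has a direct computational analogue. A categorical model defined over a fixed support can assign a tiny probability to a rare but known outcome. For an outcome that is not in its support there is no probability to assign at all, and under log-loss the surprise is infinite. A minimal sketch with a hypothetical two-outcome support:

```python
import math


class CategoricalModel:
    """Toy model whose sample space Omega is fixed when it is constructed."""

    def __init__(self, weights):
        total = sum(weights.values())
        self.probs = {k: v / total for k, v in weights.items()}

    def log_loss(self, outcome):
        """Surprise -log p(outcome); infinite for outcomes outside Omega."""
        p = self.probs.get(outcome, 0.0)
        return math.inf if p == 0.0 else -math.log(p)


model = CategoricalModel({"known_common": 0.999, "known_rare": 0.001})
print(model.log_loss("known_rare"))     # large but finite: a tail event inside Omega
print(model.log_loss("new_dimension"))  # inf: no probability exists to assign
```

Smoothing techniques can give unseen items a small non-zero probability, but only for items already enumerated in the support; they cannot supply a symbol the model does not have.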

CORE THESIS

The current AI paradigm centered on statistical learning produces outputs that are functions of the training data distribution. It can discover high-dimensional combinations within Ω that humans have not yet explored (such as AlphaFold’s protein structure predictions), which has tremendous value.¹⁹ However, it lacks stable mechanisms for autonomous problem generation, experimental intervention, and ontological reconstruction, making it extremely difficult to independently produce paradigm shifts that rewrite Ω itself. The causal chains of extra-model emergent events exceed the model’s closed variable space, while current AI’s training and inference both operate within closed variable spaces. The current paradigm therefore lacks the structural conditions for generating the positive feedback that would amplify such events.

Google DeepMind’s Hassabis admitted in 2026: “Can AI really come up with a new hypothesis—a new idea about how the world works? So far, these systems can’t do that.”¹¹ 2012 Nobel Physics laureate Serge Haroche pointed out that the human brain possesses a key variable that machines lack: emotion—the driving force of scientific discovery is an “intrinsic urge to understand the world,” fundamentally different from data processing. Berkeley professor Michael I. Jordan more directly called AGI and ASI “Silicon Valley bullshit.”¹²

SECTION V

This Is Not a Mathematical Problem — It Is a Physical One
Why Mathematics Itself Cannot Fully Describe the Physical World

The preceding analysis might be misread as “AI’s mathematical framework isn’t good enough.” The reality is far more severe: mathematics itself cannot fully describe the physical world. AI is merely an executor of mathematics; mathematics itself has a ceiling, and AI’s ceiling can be no higher.

Pi (π) is a completely determinate number, yet it has no “last digit”: no finite decimal expansion expresses it exactly. The long tail of the physical world shares a structural feature with π: finite representations cannot exhaust infinite objects. But their ontological foundations differ. π’s inexhaustibility derives from the mathematical properties of irrational numbers (determinate but infinitely non-repeating), whereas the physical world’s long-tail inexhaustibility derives from the openness of causal structure and the existence of extra-model emergent events (indeterminate and not enumerable in advance). The latter is stronger than the former: not merely “impossible to finish counting,” but “not knowing what to count.”
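The π side of this analogy can be made tangible with Jeremy Gibbons’ unbounded spigot algorithm, which emits the decimal digits of π one at a time using only integer arithmetic. The generator is fully determinate, yet it has no final iteration: any prefix it produces is a finite representation, never the number itself.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi indefinitely (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # digit n can no longer change
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Absorb one more term of the series before committing to a digit.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


print(list(islice(pi_digits(), 15)))
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

The loop never terminates on its own; `islice` is what imposes a finite cut-off. The physical-world case the paragraph describes is harder still, since there is no such generating rule to run.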

Three layers of constraint converge from different directions, forming a convergent multi-constraint chain against “Level I/II ontological reduction” (Note: this is not a strict deductive blocking chain—they belong to different tiers of mathematical and physical theory and cannot be “derived” from one another. But they impose convergent constraints on the same proposition from three independent directions: formal logic, algorithmic information theory, and fundamental physics):

Constraint I · Gödel’s Incompleteness: Any consistent, effectively axiomatized formal system strong enough to express elementary arithmetic contains true statements that it cannot prove.
This does not mean mathematical modeling fails, but it means a completely closed, completely self-sufficient, completely provable world model cannot exist.
+
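Constraint II · Chaitin Undecidability: There exist incompressible mathematical objects that cannot be generated by any program shorter than themselves, and a given formal system cannot prove, for any specific string, that its complexity exceeds a certain fixed bound.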
This does not mean all long-tail events are unmodelable, but it means the randomness properties of certain sequences are in principle unprovable.
+
Constraint III · Quantum Intrinsic Randomness: Several no-go theorems of quantum mechanics (Bell, PBR, etc.) impose strong constraints on classical, local, non-contextual ontological models.
This does not mean the macroscopic world is unmodelable, but it means the physical world at its deepest level is not a deterministic machine.
↓
∴ Triple Constraint Convergence: A completely closed, self-sufficient, deterministic world model cannot be proven complete within formal logic, cannot be fully compressed in information-theoretic terms, and cannot be deterministically predictive in physical terms. Level I and Level II “ontological reduction” thus face constraints from multiple directions.
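Kolmogorov complexity itself is uncomputable, but any real compressor gives an upper bound on it, which is enough to illustrate the contrast behind Constraint II. The sketch uses Python’s `zlib` as a convenient stand-in (any lossless compressor would do; `randbytes` needs Python 3.9+). A patterned sequence shrinks dramatically, while pseudo-random bytes do not shrink at all.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size: an upper-bound proxy for description length."""
    return len(zlib.compress(data, 9)) / len(data)


n = 100_000
patterned = b"0123456789" * (n // 10)   # produced by a very short rule
noise = random.Random(42).randbytes(n)  # looks structureless to zlib

print(f"patterned: {compression_ratio(patterned):.4f}")
print(f"noise:     {compression_ratio(noise):.4f}")
```

The example also shows what compression cannot prove. The “noise” was generated by a short program (a seeded pseudo-random generator), so its true Kolmogorov complexity is small even though zlib finds no structure; failing to compress a string is never a proof that it is incompressible, which is exactly the gap Chaitin’s result formalizes.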

“Our abstract mathematical models can only describe different aspects of physical reality in an approximate way. To describe a meteorite’s motion, we can use the point-mass concept, but the point-mass approximation completely breaks down when the meteorite impacts Earth. In my view, a Theory of Everything does not exist.”

— Marian Kupczynski, “Mathematical Modeling of Physical Reality”, Entropy, 2024 ¹⁶

Furthermore: several no-go theorems of quantum mechanics impose strong constraints on classical, local, non-contextual ontological models—the state of a quantum system cannot simply correspond to a set of classical physical states representing the system’s independent reality.¹⁷ Undecidability is not a feature of physical systems but of mathematical models. Mathematical models of physical systems are often idealizations that real physical systems can never fully realize.¹⁸ Mathematical models are “projections” of the physical world; projections can never equal the original.
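The Bell-type constraint can be checked numerically in its CHSH form. The sketch enumerates every deterministic local assignment of ±1 outcomes (local hidden-variable models are mixtures of these, so they cannot do better) and compares the best classical value with the quantum prediction for a spin singlet, whose correlation is E(a, b) = −cos(a − b). The measurement angles are the standard optimal choice.

```python
import itertools
import math


def chsh(e00, e01, e10, e11):
    # S = E(a0,b0) - E(a0,b1) + E(a1,b0) + E(a1,b1)
    return e00 - e01 + e10 + e11


# Classical: each party's outcome for each setting is fixed in advance (+1 or -1).
classical_max = max(
    abs(chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1))
    for a0, a1, b0, b1 in itertools.product((1, -1), repeat=4)
)


def singlet_correlation(a, b):
    """Quantum correlation of spin measurements at angles a and b on a singlet."""
    return -math.cos(a - b)


a0, a1, b0, b1 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
quantum = abs(chsh(singlet_correlation(a0, b0), singlet_correlation(a0, b1),
                   singlet_correlation(a1, b0), singlet_correlation(a1, b1)))

print(classical_max, round(quantum, 4))  # 2 2.8284
```

The quantum value 2√2 exceeds the classical bound of 2, and experiments have confirmed violations of that bound; this is the sense in which quantum states cannot be reduced to pre-assigned classical values.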

SECTION VI

Coupling of Three Random Variables: The Structural Absence of Global Determinism
Why Triple Stochastic Coupling Prevents Ontological Closure

Even setting aside all the above arguments about the physical world, AI systems face a structural problem at the operational level: triple stochastic coupling cannot guarantee globally unique convergence or ontological-level closure.

Random Variable A: AI’s own sampling mechanism. Every token an LLM generates involves probabilistic sampling, and temperature, top-k, and top-p settings control how much artificial randomness this sampling injects into the output. Fully deterministic (greedy) decoding often performs worse in practice, for example by falling into repetitive loops: to simulate intelligence, AI must first simulate randomness. This already concedes that the essence of intelligence contains ineliminable randomness.
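The three controls named above can be sketched in a few lines of framework-free Python. This is a simplified illustration (the function and parameter names are hypothetical and do not reproduce any specific library’s implementation): temperature divides the logits before the softmax, top-k keeps the k most probable tokens, top-p then keeps the smallest leading set whose probability mass reaches p, and the next token is drawn at random from what remains. A temperature of zero is treated here as greedy decoding, the fully deterministic case.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Return an index sampled from logits after temperature / top-k / top-p filtering."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy decoding
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    if top_k is not None:
        probs = probs[:top_k]                 # keep the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for p, i in probs:                    # smallest leading set with mass >= top_p
            kept.append((p, i))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept
    # random.choices normalizes the remaining weights itself.
    return rng.choices([i for _, i in probs], weights=[p for p, _ in probs], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(7)
print([sample_next_token(logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)])
print(sample_next_token(logits, temperature=0))  # greedy: always index 0
```

Seeding the generator, as in the first call, makes a run replayable; without a fixed seed, two runs on identical logits generally produce different token sequences.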

Random Variable B: The human user. Human intent, emotion, cognitive state, linguistic expression—every layer is a stochastic process. Why does a person suddenly want to learn Japanese? Why decide to start a business today? Why suddenly feel like writing a poem one evening? The causal chains behind these decisions are highly complex, sparse, and not enumerable in advance—they belong to the same class of phenomena as extra-model emergent events. If the input cannot be closed-modeled in advance, how can the output be controllable?

Random Variable C: The interaction process. A human makes a typo, and the AI generates a completely different understanding. The AI returns an ambiguous response, and the human’s next thought is steered in an unexpected direction. Every round of dialogue is a collision between two chaotic systems, each collision producing a non-reproducible path bifurcation.

ENTROPY ANALYSIS

Joint entropy satisfies H(A,B,C) ≥ max{H(A), H(B), H(C)}, indicating the system state space is no smaller than that of any single variable. But increasing joint entropy alone is insufficient to prove the system “uncontrollable”—the key is conditional entropy H(Target | A,B,C): whether target uncertainty decreases given the interaction.

In scenarios with stable feedback (e.g., user’s vague requirement → AI asks for clarification → user clarifies → target space contracts), conditional entropy can indeed be locally reduced. This paper acknowledges this point.
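The distinction between joint and conditional entropy can be checked on a toy example, reduced to two variables rather than three to keep the arithmetic visible. The numbers are invented: a user’s target intent T is uniform over four options, and one clarifying exchange C reveals which half of the options contains the target. The joint entropy is no smaller than either marginal, yet conditioning on the clarification halves the remaining uncertainty about the target.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of an {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, index):
    """Marginal distribution of one coordinate of a joint {tuple: prob} mapping."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)


# Hypothetical joint distribution over (target T, clarification answer C):
# T is uniform over 4 intents; C says whether T is in {0, 1} ("A") or {2, 3} ("B").
joint = {(t, "A" if t < 2 else "B"): 0.25 for t in range(4)}

h_joint = entropy(joint)
h_t = entropy(marginal(joint, 0))
h_c = entropy(marginal(joint, 1))
h_t_given_c = h_joint - h_c  # chain rule: H(T | C) = H(T, C) - H(C)

print(h_joint, h_t, h_c, h_t_given_c)  # 2.0 2.0 1.0 1.0
```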

However, because user intent, model sampling, and interaction path all contain randomness, the system cannot guarantee global reproducibility, unique convergence, or ontological-level closure. Every conversation is a non-reproducible path. Feedback can locally reduce task uncertainty, but it cannot eliminate path dependence—the same question at different moments, with different phrasing, in different moods, leads to entirely different interaction trajectories and outputs. This is not a defect; it is a structural feature.
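Path dependence has a classic minimal model, the Pólya urn, used here only as an analogy and not as a model of dialogue. Each draw reinforces the color drawn (a simple form of positive feedback), so every individual run settles down to a stable proportion, but which proportion it settles on depends on the random early draws. Feedback yields local convergence without yielding a unique, reproducible endpoint.

```python
import random


def polya_urn(steps=10_000, seed=0):
    """Start with one red and one blue ball; each drawn ball is returned with a copy."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)


finals = [round(polya_urn(seed=s), 3) for s in range(8)]
print(finals)  # same rule, same start, different long-run proportions
```

For this urn the limiting proportion is itself random (uniformly distributed for a one-red, one-blue start), which is why identical rules and identical starting conditions still produce different stable outcomes.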

SECTION VII

Convergent Multi-Constraint Argument
The Formal Convergence Structure

The current AI paradigm is based on statistical learning, optimizing within known variable spaces
+
Formal systems cannot be fully self-sufficient (Gödel) · Constraint level: Formal logic
+
Certain randomness properties are undecidable (Chaitin) · Constraint level: Algorithmic information theory
+
The physical world contains true randomness at its foundation (Quantum mechanics) · Constraint level: Fundamental physics
+
Causal chains of extra-model emergent events exceed closed variable spaces · Constraint level: Philosophy of science
+
Triple stochastic coupling leads to path dependence and non-reproducibility · Constraint level: Information theory
↓
∴ The current AI paradigm cannot fully equate its models with the ontology of the physical world (Level I)
It is also almost certainly unable to exhaust the physical world’s full variable space (Level II)
But it can construct locally effective, approximate, task-dependent representations (Level III)
Whether it can independently achieve paradigm discovery (Level IV) remains an open question.

SECTION VIII

Counterarguments and Rebuttal: Addressing Four Critiques
Honest Engagement with the Strongest Objections

This chapter directly addresses four major counterarguments to the argument chain of the preceding seven chapters, acknowledging the reasonable kernel of each while arguing why they cannot overturn the core thesis.

Counterargument 1: Scale Fallacy—quantum randomness cannot directly explain macroscopic innovation. Critics point out that quantum decoherence causes the macroscopic world to exhibit classical determinism or classical chaos in the vast majority of cases, not quantum randomness. Using quantum mechanics to directly explain the inspiration of Newton or Einstein is a cross-scale logical leap.

This criticism is physically accurate. This paper’s argument must be refined: the core thesis is not “quantum randomness directly triggered human inspiration” but a deeper structural argument—the physical world contains emergent phenomena at every scale that cannot be fully predicted by current frameworks. Quantum-level true randomness is the deepest-level evidence that physical ontology is not a Laplacian deterministic machine. At the macroscopic level, classical chaos (sensitive dependence on initial conditions), biological self-organized criticality, and neuronal avalanche dynamics likewise produce unpredictable emergence. Human innovation does not require quantum tunneling for explanation—bifurcation in classical chaotic systems suffices to produce path divergences that are practically unpredictable for finite observers. Quantum randomness locks the deepest level; classical chaos locks the macroscopic level—both layers jointly support the core thesis with no scale leap.
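The claim that classical chaos suffices can be illustrated without any quantum ingredient. The logistic map x_{n+1} = r·x_n·(1 − x_n) with r = 4 is a textbook example of a fully deterministic chaotic system. In this sketch (the starting value and offset are arbitrary choices), two trajectories that begin 10⁻¹⁰ apart roughly double their separation each step on average and become unrelated after a few dozen iterations.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), returning all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # an initial error no finite observer could rule out

for n in (0, 10, 20, 30, 40, 50):
    print(n, f"{abs(a[n] - b[n]):.2e}")
```

Floating-point rounding adds its own tiny perturbations along the way, which only reinforces the point: the rule is exact, but long-range prediction would require infinite precision in the initial state.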

Counterargument 2: Combinatorial Explosion—AI’s combinatorial creativity is underestimated. Critics cite AlphaFold: even if fundamental elements (tokens/concepts) are finite and known, their combinatorial space is near-infinite. AI conducting massive high-dimensional combinatorial exploration within the known sample space Ω can discover entirely new protein structures not yet found by humans. This “unprecedented effective combination” is functionally equivalent to a paradigm breakthrough.

This criticism reveals a distinction that must be precisely drawn. This paper acknowledges and respects AI’s combinatorial creativity—AlphaFold’s exploration within Ω has indeed produced protein folding solutions humans had not conceived, representing tremendous engineering and scientific value. But this paper’s core thesis has never denied this. The critical distinction is:

CRITICAL DISTINCTION

Within-Ω combinatorial innovation: Discovering unexplored combinations within known rules and known sample spaces. AI excels at this and may do it better than humans. AlphaFold, AlphaGo’s “divine moves,” drug molecule screening—all belong to this category.

Beyond-Ω paradigm shift: Negating known rules themselves, redefining the boundaries of the sample space. Newton defined the concept of “force”—before him, Ω contained no such dimension. Einstein negated absolute time—he did not find a better combination within the old Ω but reconstructed Ω itself. Darwin’s evolution, Copernicus’s heliocentrism, Shannon’s information theory—every paradigm shift is a negation of the old sample space and creation of a new one.

AI under the current statistical learning paradigm can search the combinatorial space within Ω at a scale no human can match. But it lacks a stable mechanism for negating Ω, because negating Ω requires not more compute but a “perspective” that lies outside Ω, and this perspective is itself an extra-model emergent event.
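The asymmetry between within-Ω and beyond-Ω can be stated in a few lines. Combinatorial search over a fixed alphabet Σ produces |Σ|ⁿ candidates, a number that explodes quickly (20 amino acids at length 100 already give roughly 1.27 × 10¹³⁰ sequences), yet every candidate is built only from symbols of Σ. The four-letter alphabet below is a hypothetical stand-in for a known Ω.

```python
from itertools import product


def enumerate_candidates(alphabet, length):
    """Exhaustively enumerate every string of the given length over a fixed alphabet."""
    return ["".join(chars) for chars in product(alphabet, repeat=length)]


sigma = "ACGT"  # hypothetical fixed set of building blocks
candidates = enumerate_candidates(sigma, 6)
print(len(candidates))     # 4096 (= 4 ** 6)
print(f"{20 ** 100:.3e}")  # 1.268e+130, the length-100 protein sequence space

# However large the search, no candidate contains a symbol outside sigma.
print(all(set(c) <= set(sigma) for c in candidates))  # True
```

Scale changes how much of Ω can be covered, not what Ω contains: enlarging the search never adds a fifth letter.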

Counterargument 3: Straw Man Attack—world models aim for “good enough,” not “absolute reduction.” Critics point out that the engineering purpose of building world models (e.g., Waymo’s driving simulator) is not to ontologically reduce every atomic trajectory 100%, but to build macroscopically “self-consistent and sufficiently usable” approximate simulators. Using “cannot reduce absolute ontology” to attack engineering approximations sets an excessively high target.

This criticism is entirely valid, and this paper hereby explicitly recalibrates its attack target. This paper never denies the enormous engineering value of “good enough maps”—Waymo’s simulator genuinely trains safer autonomous driving systems, NVIDIA Cosmos genuinely accelerates robot development. This paper’s target is not engineering practice but industry narrative. When AI companies price themselves at trillion-dollar valuations, their implicit promise is not “we can build a good-enough simulator” but “AI will understand the world, change the world, and ultimately achieve general intelligence.” This narrative inflates engineering approximations into ontological commitments—equating “a good-enough map” with “the territory itself.” This paper’s argument targets precisely this ontological overreach: the map can be very useful, but the map is never the territory. When investors pay for “the territory” but receive “a map,” a bubble forms.

Counterargument 4: Pessimism Trap—human civilization itself is order that emerged from chaos. Critics argue that if “coupling of three random variables” inevitably leads to loss of control, then human society itself—economic, cultural, and scientific systems—also operates amid the coupling of countless chaotic and random variables. Why can human systems build order amid high entropy, but human-machine coupled systems absolutely cannot?

This is the most powerful of the four counterarguments. Responding to it requires introducing Ilya Prigogine’s Dissipative Structure Theory.

Dissipative structures tell us: open systems far from equilibrium, through continuous exchange of energy and matter with the environment, can locally create order against the background of entropy increase. Life, cities, civilizations—all are dissipative structures. They do not violate the second law of thermodynamics but rather locally and temporarily establish low-entropy islands within larger-scale entropy increase.

— Based on Ilya Prigogine, “Order Out of Chaos”, 1984

Human civilization has indeed emerged as order from chaos. But the key is that this emergence process depends precisely on the “extra-model emergent events” argued for in this paper. Every civilizational leap, from the agricultural revolution to the industrial revolution to the information revolution, resulted from a low-probability event puncturing a high-probability equilibrium. The formation of dissipative structures requires “fluctuations”: far from equilibrium, tiny random fluctuations are amplified by the system, leading to the emergence of entirely new ordered structures. This is precisely the physical expression of “low probability puncturing high probability.”
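The amplification of a fluctuation into a new ordered state can be illustrated with the normal form of a supercritical pitchfork bifurcation, dx/dt = μx − x³. This is a standard textbook model, used here as an analogy rather than as a model of any particular dissipative system. For μ > 0 the symmetric state x = 0 is unstable, and the sign of an arbitrarily small initial perturbation decides which of the two stable states, ±√μ, the system settles into.

```python
import math


def settle(x0, mu=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = mu * x - x**3 from x0."""
    x = x0
    for _ in range(steps):
        x += dt * (mu * x - x ** 3)
    return x


for fluctuation in (1e-9, -1e-9):
    print(f"{fluctuation:+.0e} -> {settle(fluctuation):+.4f}")
print(f"stable states: ±{math.sqrt(1.0):.4f}")
```

Starting exactly at zero, the system stays at zero forever: without some fluctuation, the new order never appears.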

So the critics’ argument actually strengthens this paper’s core thesis: human civilization can build order from chaos precisely because human systems allow extra-model emergence—allowing some person on some afternoon to suddenly conceive an unprecedented idea, which is then amplified by the social system into a revolution. Under the current statistical learning paradigm, every output of pure software AI is a function of known distributions (even with temperature sampling, it is perturbation within the existing variable space), disallowing “true fluctuation.” Dissipative structures require genuine perturbation from outside the system as a trigger; AI enclosed within training distributions and sampling mechanisms can only generate within-distribution perturbations.
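The claim that software-internal randomness stays within the system has a concrete counterpart in how pseudo-random number generators work. Given the same seed, Python’s `random.Random` (a Mersenne Twister) replays exactly the same sequence, so any “fluctuation” it supplies is a deterministic function of its state. By contrast, `os.urandom` reads from the operating system’s cryptographic randomness source, which cannot be reseeded from Python this way; whether that source mixes in physical noise depends on the platform, which is why the embodied-AI caveat that follows matters.

```python
import os
import random


def pseudo_random_run(seed, n=5):
    """Draw n floats from a Mersenne Twister fully determined by its seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed, same "fluctuations": the perturbation is exactly replayable.
print(pseudo_random_run(123) == pseudo_random_run(123))  # True

# Bytes from the OS randomness source; not replayable via a Python-level seed.
print(os.urandom(8).hex())
```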

But this paper must honestly acknowledge an unsealed gap: if AI interfaces with real physical sensors (e.g., photon noise received by a robot’s camera inherently contains quantum randomness), then its input end contains genuine fluctuation from the physical world. At that point, the proposition “AI can only provide pseudo-random perturbation” is not fully valid for embodied AI systems. However, receiving external fluctuation and converting fluctuation into ontological reconstruction are two different things—the latter also depends on whether the system possesses a variable-space rewriting mechanism. The human brain as a carbon-based dissipative structure can accomplish this conversion; whether future more complex silicon-based or quantum nonlinear dynamic systems can achieve the same—this paper cannot provide strict proof or disproof from physics or mathematics. This is the ontological open boundary this paper acknowledges.

SECTION IX

Frontier Testing of Level IV: The Spectrum of Autonomous Scientific Discovery AI
Case-by-Case Assessment Against the Five Criteria

The preceding eight chapters marked “whether AI can independently achieve paradigm discovery” as an open question. Rather than leaving it there, this chapter tests it directly: how far have the current systems closest to Level IV actually progressed? The key cases are listed below in chronological order, each analyzed against the five criteria for extra-model emergent events.

Case 1: Robot Scientist “Adam” (2000s–2009). An automated experimental system developed at Aberystwyth University, capable of autonomously generating hypotheses about yeast gene function, designing experiments, executing experiments, and verifying results. This is considered the first fully automated scientific discovery. But Adam’s hypothesis space was human-predefined (yeast metabolic pathways), and experiment types were preset—it performs combinatorial search within a known Ω, constituting an advanced form of Level III.

Case 2: AI-Newton (China, 2025). Given experimental data as input, it autonomously “discovered” physical principles including Newton’s second law. But it re-derives known laws from data within an established physical framework: a verificatory discovery (within Ω), not a paradigm shift. It introduced no new variables, negated no old framework, and satisfies none of the five criteria.

Case 3: Emory University Plasma Discovery (PNAS, April 2026). The most noteworthy current case. Emory physicists used custom neural networks combined with dusty plasma experimental data to describe non-reciprocal forces with over 99% accuracy—correcting long-standing theoretical assumptions. The researchers explicitly stated “we have shown that AI can discover new physics.” Testing against the five criteria: certain old theoretical assumptions were corrected (partially satisfies criterion 4), but the neural network’s physical constraints were human-embedded, the experiment was human-designed, and the variable space (force, position, velocity) was human-predefined. It is closer to “high-precision within-Ω combinatorial innovation + human-led experimental closed-loop correction of old approximations”—at the boundary of Levels III and IV, but not yet satisfying the core condition of “rewriting the variable space.”

Case 4: FirstPrinciples (nonprofit, ongoing). Aims to build an “autonomous AI physicist” to unify quantum field theory and general relativity by 2035. If successful, this would be direct evidence for Level IV—unification theory requires introducing new variables and new mathematical structures, a paradigmatic beyond-Ω paradigm shift. But the project remains in its early stages.

Case 5: AGS (Autonomous Generalist Scientist, arXiv 2025). A theoretical framework that merges agentic AI with embodied robotics for fully automated scientific research. The authors foresee scientific discovery following new scaling laws. This represents one of academia’s most optimistic expectations for Level IV, but it remains at the framework stage without empirical validation.
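Case 2 makes a structural point: a law can be “rediscovered” only inside a framework that was fixed before any data arrived. A minimal sketch of that point, assuming a hypothetical hand-specified family of power laws F = C · m^p · a^q and synthetic noisy measurements (this is not AI-Newton’s actual concept-driven algorithm, which is far richer). The fit recovers F = m · a, but by construction it cannot output a law containing any variable outside {m, a}.

```python
import math
import random

# Hypothetical synthetic "experiment": forces measured for known masses and accelerations.
# The generating law is F = m * a, with about 1% multiplicative measurement noise.
random.seed(0)
samples = []
for _ in range(50):
    m = random.uniform(0.5, 5.0)
    a = random.uniform(0.1, 10.0)
    f = m * a * math.exp(random.gauss(0.0, 0.01))
    samples.append((m, a, f))


def fit_power_law(data):
    """Fit the human-chosen hypothesis space F = C * m**p * a**q.

    Taking logs gives a linear model ln F = ln C + p ln m + q ln a, solved here by
    ordinary least squares (3x3 normal equations, Cramer's rule).
    """
    rows = [(1.0, math.log(m), math.log(a)) for m, a, _ in data]
    ys = [math.log(f) for _, _, f in data]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]

    def det3(mat):
        return (mat[0][0] * (mat[1][1] * mat[2][2] - mat[1][2] * mat[2][1])
                - mat[0][1] * (mat[1][0] * mat[2][2] - mat[1][2] * mat[2][0])
                + mat[0][2] * (mat[1][0] * mat[2][1] - mat[1][1] * mat[2][0]))

    d = det3(ata)
    sol = []
    for k in range(3):
        mk = [row[:] for row in ata]
        for i in range(3):
            mk[i][k] = aty[i]  # replace column k with the right-hand side
        sol.append(det3(mk) / d)
    log_c, p, q = sol
    return math.exp(log_c), p, q


c, p, q = fit_power_law(samples)
print(f"F = {c:.3f} * m^{p:.3f} * a^{q:.3f}")
```

Nothing in this procedure can propose a term such as charge or temperature. The variables were fixed by whoever wrote `fit_power_law`, which is the sense in which such a result stays within Ω.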

Case	① New Variable	② Old Objective Function Cannot Reward	③ No Isomorphic Sample	④ Must Modify Causal Graph	⑤ Requires Structural Rewrite	Verdict
Robot Scientist Adam	No	No	No	No	No	Level III
AI-Newton	No	No	No	No	No	Level III
Emory Plasma (PNAS)	Partial	No	Partial	Partial	No	III/IV Boundary
FirstPrinciples	Unknown	Unknown	Unknown	Unknown	Unknown	Pending
AGS Framework	Theoretical	Theoretical	Theoretical	Theoretical	Theoretical	Unvalidated
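The verdict column follows a rule that the text states only informally. The following sketch gives one hypothetical encoding that reproduces every row of the table. The `Status` names and the precedence (Unknown before Theoretical before Partial) are assumptions chosen to match the table, not the paper’s formal definition.

```python
from enum import Enum


class Status(Enum):
    YES = "Yes"
    PARTIAL = "Partial"
    NO = "No"
    UNKNOWN = "Unknown"
    THEORETICAL = "Theoretical"


def verdict(criteria):
    """Map the statuses of criteria 1 to 5 to a verdict (hypothetical decision rule)."""
    if len(criteria) != 5:
        raise ValueError("expected exactly five criterion statuses")
    if all(s is Status.YES for s in criteria):
        return "Level IV"  # every criterion fully met
    if any(s is Status.UNKNOWN for s in criteria):
        return "Pending"  # no evidence yet
    if any(s is Status.THEORETICAL for s in criteria):
        return "Unvalidated"  # only a proposal, no empirical test
    if any(s in (Status.YES, Status.PARTIAL) for s in criteria):
        return "III/IV Boundary"  # some criteria touched, not all met
    return "Level III"


Y, P, N, U, T = Status.YES, Status.PARTIAL, Status.NO, Status.UNKNOWN, Status.THEORETICAL
TABLE = {
    "Robot Scientist Adam": [N, N, N, N, N],
    "AI-Newton": [N, N, N, N, N],
    "Emory Plasma (PNAS)": [P, N, P, P, N],
    "FirstPrinciples": [U, U, U, U, U],
    "AGS Framework": [T, T, T, T, T],
}
for case, statuses in TABLE.items():
    print(f"{case}: {verdict(statuses)}")
```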

LEVEL IV FRONTIER TEST SUMMARY

As of May 2026, every case claiming that “AI discovered new physics” falls, when tested item by item against the five criteria, at the upper boundary of Level III, not Level IV. The case closest to the boundary is the Emory plasma case. There, AI corrected old theoretical approximations (partially satisfying criterion ④), but the variable space, experimental design, and physical constraints were all predefined by humans, so criterion ① is at most partially met and criterion ⑤ is not met. The current statistical learning paradigm lacks a stable mechanism for independently achieving Level IV. However, this paper does not prejudge the future: if new AI architectures with active experimental closed loops and variable-space rewriting capabilities emerge, the boundary of Level IV may be redrawn.

SECTION X

Open Directions: What Architectures Might Reach Level IV?
Constructive Paths Forward

This paper bears a responsibility not only to deconstruct but to point toward constructive directions. The following lists architectural families that might reach Level IV, ordered by current maturity:

(1) LLM + Automated Lab Closed Loop. The path closest to current realization. LLMs handle hypothesis generation and literature synthesis; robotic laboratories execute experiments; results feed back to refine hypotheses. Key limitation: the LLM’s hypothesis space remains constrained by training data distribution, and the creativity of experiment design depends on human-preset experiment templates.

(2) Neuro-Symbolic AI. Combining neural networks’ pattern recognition with symbolic reasoning’s logical rigor. Symbolic systems can explicitly manipulate variable spaces, theoretically possessing structural potential to “rewrite Ω.” But current neuro-symbolic integration remains unstable and lacks scaled validation.

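(3) Physics-Informed Neural Networks (PINNs). These embed known physical laws, typically differential equations, as constraints on a neural network. Most commonly the constraint is a residual penalty term in the training loss, while hard-constraint variants build the law into the architecture itself. The Emory plasma discovery, which embedded human-specified physical constraints in custom networks, is a success case in this broad direction. Limitation: the physical constraints are human-defined, and AI optimizes within them. This remains Level III, but if AI could automatically discover new constraints that need embedding, it might reach Level IV.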

(4) Multi-Agent Scientific Systems. Multiple AI agents assume different roles (hypothesis generation, experiment design, data analysis, theoretical critique), producing knowledge through debate and competition. This structure simulates the emergent dynamics of human scientific communities, but whether its “emergence” can exceed each agent’s training distribution boundary remains open.

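(5) Embodied Active Inference. Based on Karl Friston’s Free Energy Principle, AI systems minimize variational free energy (in simple Gaussian settings, precision-weighted prediction error) through continuous interaction with the physical world. This direction has the strongest theoretical foundation among all five. The free energy framework draws on statistical physics and variational inference rather than on a task-specific objective, although any concrete implementation still requires a generative model over some chosen space of hidden states. But it has been validated only on simple robotic tasks, and an enormous gap remains between those tasks and scientific discovery.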
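Item (3) rests on a physics-informed loss, which adds a penalty on the residual of a human-chosen differential equation to the usual data-misfit term. The following minimal sketch assumes a toy decay law du/dt = −k·u and, for brevity, uses a degree-5 polynomial surrogate in place of a neural network. It therefore illustrates the PINN loss idea only; it is not a PINN and not the Emory team’s method. For each candidate k the loss is quadratic in the surrogate’s coefficients, so it is minimized exactly by linear least squares. A grid search over k then picks the rate most consistent with both the data and the law. All names (`T_DATA`, `LAM`, `physics_informed_loss`, and so on) are hypothetical.

```python
import math

# Hypothetical toy data: samples of u(t) = exp(-0.5 t) on [0, 2]; the "true" rate is 0.5.
T_DATA = [0.25 * j for j in range(9)]        # 0.0, 0.25, ..., 2.0
Y_DATA = [math.exp(-0.5 * t) for t in T_DATA]
T_COLLOC = [0.1 * j for j in range(21)]      # collocation points for the ODE residual
DEGREE = 5                                    # surrogate: u(t) = sum_i c_i * (t - 1)**i
LAM = 1.0                                     # weight of the physics (residual) term


def basis(t):
    """Basis values x**i and their t-derivatives i * x**(i - 1), with x = t - 1."""
    x = t - 1.0
    vals = [x ** i for i in range(DEGREE + 1)]
    ders = [i * x ** (i - 1) if i > 0 else 0.0 for i in range(DEGREE + 1)]
    return vals, ders


def solve(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * c[j] for j in range(r + 1, n))
        c[r] = (m[r][n] - s) / m[r][r]
    return c


def physics_informed_loss(k):
    """Min over c of: mean data misfit + LAM * mean (du/dt + k*u)**2 at collocation points.

    For fixed k both terms are quadratic in c, so the minimiser is a weighted linear
    least-squares solution, computed here through the normal equations.
    """
    rows, rhs = [], []
    wd = math.sqrt(1.0 / len(T_DATA))
    for t, y in zip(T_DATA, Y_DATA):
        vals, _ = basis(t)
        rows.append([wd * v for v in vals])
        rhs.append(wd * y)
    wr = math.sqrt(LAM / len(T_COLLOC))
    for t in T_COLLOC:
        vals, ders = basis(t)
        rows.append([wr * (d + k * v) for v, d in zip(vals, ders)])
        rhs.append(0.0)
    n = DEGREE + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    c = solve(ata, atb)
    return sum((sum(ri * ci for ri, ci in zip(r, c)) - b) ** 2 for r, b in zip(rows, rhs))


# The human fixes the *form* of the law (exponential decay); the search only tunes k.
K_GRID = [i / 100 for i in range(101)]
k_hat = min(K_GRID, key=physics_informed_loss)
print(f"estimated decay rate k = {k_hat:.2f}")
```

The search recovers k, but the form of the law was supplied by hand. Data generated by some other law would simply produce a poor fit, not a new equation, which is why item (3) on its own remains Level III.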
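For item (5), the quantity an active-inference agent minimizes is the variational free energy F[q] = E_q[ln q(s) − ln p(o, s)] = D_KL(q(s) ‖ p(s | o)) − ln p(o). This is an upper bound on the surprise −ln p(o). The following minimal sketch covers only the perception half (no actions) and assumes a hypothetical two-state generative model. Scanning candidate beliefs q shows that F is lowest at the exact posterior, where it equals the surprise. In this toy, the hidden-state space {s0, s1} and the probabilities in p(o, s) are specified by hand.

```python
import math

# Hypothetical generative model over two hidden states s0, s1 for one fixed observation o.
PRIOR = {"s0": 0.7, "s1": 0.3}        # p(s)
LIKELIHOOD = {"s0": 0.2, "s1": 0.9}   # p(o | s) for the observation actually received


def free_energy(q):
    """Variational free energy F[q] = sum_s q(s) * (ln q(s) - ln p(o, s))."""
    return sum(
        qs * (math.log(qs) - math.log(PRIOR[s] * LIKELIHOOD[s]))
        for s, qs in q.items()
        if qs > 0.0
    )


evidence = sum(PRIOR[s] * LIKELIHOOD[s] for s in PRIOR)       # p(o)
surprise = -math.log(evidence)                                 # -ln p(o)
posterior = {s: PRIOR[s] * LIKELIHOOD[s] / evidence for s in PRIOR}

# Crude "perception": scan beliefs q(s0) on a grid and keep the one with the lowest F.
grid = [{"s0": i / 1000, "s1": 1.0 - i / 1000} for i in range(1, 1000)]
best_q = min(grid, key=free_energy)

print(f"surprise -ln p(o)    = {surprise:.4f}")
print(f"F at exact posterior = {free_energy(posterior):.4f}")
print(f"best grid q(s0)      = {best_q['s0']:.3f} (exact posterior {posterior['s0']:.3f})")
```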

SUMMARY TABLE

Proposition-Evidence-Strength Summary
The Paper’s Claims at a Glance

PROPOSITION SUMMARY

Proposition 1: The current AI paradigm cannot fully equate models with physical ontology (Level I)
Conclusion strength: Strong · Support: Philosophical definition + Multi-constraint convergence · Verdict: Valid

Proposition 2: The current AI paradigm is very unlikely to exhaust the physical world’s full variable space (Level II)
Conclusion strength: Strong · Support: Physics / Information theory / Philosophy of science · Verdict: Convergent support

Proposition 3: AI can construct locally effective models (Level III)
Conclusion strength: Strong · Support: AlphaFold, Waymo, Emory — engineering facts · Verdict: Valid

Proposition 4: The current statistical learning paradigm lacks a stable mechanism for independently achieving paradigm discovery (Level IV)
Conclusion strength: Medium-Strong · Support: Mechanism analysis + frontier case testing · Verdict: Reasonable

Proposition 5: AI can never achieve paradigm discovery
Conclusion strength: Weak / Not claimed · Support: None · Verdict: Open question; this paper does not assert this proposition

Proposition 6: The industry narrative inflating Level III into Level I/II constitutes ontological overreach
Conclusion strength: Medium-Strong · Support: Industry data + logical analysis · Verdict: Reasonable

Conclusion

Definitive conclusion (Levels I/II): The current AI paradigm centered on statistical learning cannot fully equate its models with the ontology of the physical world (Level I), and it is almost certainly unable to exhaust the physical world’s full variable space (Level II). This judgment is supported by convergent multi-constraint evidence from formal logic (Gödel), algorithmic information theory (Chaitin), fundamental physics (quantum intrinsic randomness), and philosophy of science (extra-model emergent events).

Affirmative conclusion (Level III): AI can construct locally effective, approximate, task-dependent representations, and has already demonstrated tremendous engineering and scientific value in protein structure prediction, autonomous driving simulation, drug molecule screening, and plasma physics, among other fields. “A good-enough map” can indeed change the world. This paper never denies the value of this level.

Open conclusion (Level IV): As of May 2026, all cases claiming “AI discovered new physics,” when tested against the five criteria, fall at the upper boundary of Level III rather than Level IV. The current paradigm lacks stable mechanisms for autonomous problem generation, experimental intervention, and ontological reconstruction. However, new architectures with active experimental closed loops and variable-space rewriting capabilities (neuro-symbolic systems, PINNs, embodied active inference, etc.) may redraw this boundary—this paper does not prejudge their limits.

Critical conclusion (Industry narrative): This paper’s sharpest critique targets not engineering practice but ontological overreach—inflating Level III engineering achievements into Level I/II ontological commitments, then pricing this commitment at trillion-dollar valuations. The map can be very useful, but when investors pay for “the territory” and receive “a map,” a bubble forms.

Falsifiability statement: Suppose humans provide only a general experimental platform and resource constraints, with no target laws, candidate variables, specific hypotheses, or explanatory frameworks. Suppose that, under these conditions, an AI system independently proposes new variables or a new causal structure, the proposal is independently verified by experiment, and it satisfies the five criteria for extra-model emergent events. This would constitute a refutation of this paper’s core thesis (Proposition 4). This paper welcomes such a refutation.
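The falsifiability statement is a conjunction: every clause must hold for a single episode. The following minimal sketch encodes it as a predicate so that no clause can be silently dropped when a new claim is assessed. The field names and the example record are hypothetical illustrations, not data about any real system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiscoveryReport:
    """Hypothetical record of one candidate 'AI discovered new physics' episode."""
    # What humans supplied beyond a general experimental platform and resource limits
    gave_target_law: bool
    gave_candidate_variables: bool
    gave_specific_hypotheses: bool
    gave_explanatory_framework: bool
    # What the AI produced and how it was checked
    proposed_new_variables_or_causal_structure: bool
    independently_verified: bool
    criteria_met: Tuple[bool, bool, bool, bool, bool]  # criteria 1 to 5, in order


def refutes_proposition_4(r: DiscoveryReport) -> bool:
    """True only when every clause of the falsifiability statement holds at once."""
    no_human_scaffolding = not any((
        r.gave_target_law,
        r.gave_candidate_variables,
        r.gave_specific_hypotheses,
        r.gave_explanatory_framework,
    ))
    return (
        no_human_scaffolding
        and r.proposed_new_variables_or_causal_structure
        and r.independently_verified
        and all(r.criteria_met)
    )


# Illustrative input only: human-chosen candidate variables alone block a refutation.
example = DiscoveryReport(
    gave_target_law=False,
    gave_candidate_variables=True,
    gave_specific_hypotheses=False,
    gave_explanatory_framework=False,
    proposed_new_variables_or_causal_structure=False,
    independently_verified=True,
    criteria_met=(False, False, False, True, False),
)
print(refutes_proposition_4(example))
```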

Honest declaration of unproven presuppositions: This paper treats human paradigm shifts (Newton defining “force,” Einstein negating absolute time) as paradigmatic cases of extra-model emergent events. But this paper must acknowledge: cognitive science has not yet ruled out the possibility that human “innovation” is ultimately high-dimensional interpolation by the brain’s neural networks, rather than genuine “beyond-Ω leaps.” If human innovation is ultimately proven to be within-Ω combination in sufficiently complex systems, then the within-Ω/beyond-Ω distinction is not an ontological chasm between human and machine but a threshold on a continuous spectrum of system complexity—in which case, a sufficiently complex AI system could in theory cross the same threshold. This possibility constitutes this paper’s open boundary, and this paper does not pretend to have resolved it.

π has no last digit. The long tail of the physical world has no end. AI is the most precise map ever made. But the map is not the territory.

AI can find 42. It can even find 42 trillion paths to reach 42. But “asking the question itself”—as an extra-model emergent event—is not yet within the capability boundary of the current statistical learning paradigm.

REFERENCES

TechCrunch, “Anthropic has acquired the dev tools startup used by OpenAI, Google, and Cloudflare”, May 18, 2026.
https://techcrunch.com/2026/05/18/anthropic-has-acquired-the-dev-tools-startup-used-by-openai-google-and-cloudflare/
MLQ.ai, “OpenAI Revises Projections Upward with $112 Billion Extra Cash Burn by 2030”, Feb 21, 2026.
https://mlq.ai/news/openai-revises-projections-upward-with-112-billion-extra-cash-burn-by-2030/
Gagadget, “Anthropic buys the SDK startup that OpenAI and Google depend on”, May 2026.
https://gagadget.com/en/711017-anthropic-buys-the-sdk-startup-that-openai-and-google-depend-on/
Google I/O 2026: BusinessToday, “New Gemini app, Flash model, and agentic AI push”, May 20, 2026.
https://www.businesstoday.in/technology/artificial-intelligence/story/google-io-2026/
Software Thug, “The AI Money Pit: How Much Are OpenAI, Anthropic, and xAI Actually Losing?”, Feb 15, 2026.
https://softwarethug.com/posts/ai-company-financials-spending-losses-profitability/
Wikipedia, “Genie (world model) — Google DeepMind”, accessed May 2026.
https://en.wikipedia.org/wiki/Genie_(world_model)
AI2Work, “World Models in 2026: Why LeCun, Fei-Fei Li, and DeepMind Bet Billions on 3D AI”, Feb 14, 2026.
https://ai2.work/technology/world-models-in-2026/
NVIDIA, “Cosmos: World Foundation Models Powering Physical AI”, updated April 2026.
https://www.nvidia.com/en-us/ai/cosmos/
Introl Blog, “World Models Race 2026: LeCun, DeepMind, and the AGI Question”, Jan 3, 2026.
https://introl.com/blog/world-models-race-agi-2026
IBM Think, “Beyond language: Why world models could be the next frontier for enterprise AI”, Mar 30, 2026.
https://www.ibm.com/think/news/world-models-next-frontier-enterprise-ai
Science News, “Have we entered a new age of AI-enabled scientific discovery?”, Feb 18, 2026.
https://www.sciencenews.org/article/ai-enabled-science-discovery-insight
Tech Journal UK, “Scientific Laureates Question Whether AI Can Replicate Einstein-Level Insight”, Feb 2, 2026.
https://www.techjournal.uk/p/scientific-laureates-question-whether
Menin, B., “Beyond Gödel: Information-Theoretical Limits of Physical Models and the Principle of Optimal Incompleteness”, ResearchGate, Jan 2026.
https://www.researchgate.net/publication/399445607
Gonzalez, E., “Uncertainty, incompleteness, chance, and design”, arXiv:1301.7036.
https://arxiv.org/pdf/1301.7036
Müller, M.P., “Undecidability and unpredictability: not limitations, but triumphs of science”, arXiv:2008.09821.
https://arxiv.org/pdf/2008.09821
Kupczynski, M., “Mathematical Modeling of Physical Reality: From Numbers to Fractals, Quantum Mechanics and the Standard Model”, Entropy 26(11), 2024.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11592783/
Yong, T.T., “A no-go theorem for Quantum theory ontological models”, arXiv:2012.05712.
https://arxiv.org/pdf/2012.05712
Cubitt, T. et al., “Undecidability in Physics: a Review”, arXiv:2410.16532, Oct 2024.
https://arxiv.org/html/2410.16532v1
arXiv, “AI for Scientific Discovery is a Social Problem”, arXiv:2509.06580, Feb 2026.
https://arxiv.org/html/2509.06580v4
Bara, M., “Forecasting the Unforecastable: How the Illusion of AI Prediction Makes Black Swans More Dangerous”, Medium, Feb 2026.
https://medium.com/@marc.bara.iniesta/forecasting-the-unforecastable
Louzada, I., “The Reasonable Ineffectiveness of Mathematics in the Biological Sciences”, PMC, 2025.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11941032/
Quanta Magazine, “What Do Gödel’s Incompleteness Theorems Truly Mean?”, May 18, 2026.
https://www.quantamagazine.org/what-do-godels-incompleteness-theorems-truly-mean-20260518/
Prigogine, I. & Stengers, I., “Order Out of Chaos: Man’s New Dialogue with Nature”, Bantam Books, 1984.
Zurek, W.H., “Decoherence, einselection, and the quantum origins of the classical”, Reviews of Modern Physics 75(3), 2003.
https://doi.org/10.1103/RevModPhys.75.715
Bak, P., Tang, C. & Wiesenfeld, K., “Self-organized criticality: An explanation of the 1/f noise”, Physical Review Letters 59(4), 1987.
https://doi.org/10.1103/PhysRevLett.59.381
Beggs, J.M. & Plenz, D., “Neuronal Avalanches in Neocortical Circuits”, Journal of Neuroscience 23(35), 2003.
https://doi.org/10.1523/JNEUROSCI.23-35-11167.2003
Jumper, J. et al., “Highly accurate protein structure prediction with AlphaFold”, Nature 596, 2021.
https://doi.org/10.1038/s41586-021-03819-2
Kuhn, T.S., “The Structure of Scientific Revolutions”, University of Chicago Press, 1962.
Schimmel, A., “AGI is Mathematically Impossible 2: When Entropy Returns”, PhilArchive, 2025.
https://philarchive.org/archive/SCHAIM-14
MDPI Philosophies, “What Artificial Intelligence May Be Missing — And Why It Is Unlikely to Attain It Under Current Paradigms”, Feb 10, 2026.
https://www.mdpi.com/2409-9287/11/1/20
Tao, T. & Klowden, T., “Mathematical methods and human thought in the age of AI”, arXiv, Mar 29, 2026.
https://terrytao.wordpress.com/2026/03/29/mathematical-methods-and-human-thought-in-the-age-of-ai/
Burton, J. & Nemenman, I. et al., “AI reveals unexpected new physics in dusty plasma — non-reciprocal forces”, PNAS, Emory University, Apr 2026.
https://www.sciencedaily.com/releases/2026/04/260422044635.htm
Fang, Y.-L. et al., “AI-Newton: Autonomous discovery of physics principles from experimental data”, arXiv:2504.01538, 2025.
https://www.nature.com/articles/d41586-025-03659-4
FirstPrinciples, “Building an Autonomous AI Physicist to unify QFT and General Relativity by 2035”, accessed May 2026.
https://job-boards.greenhouse.io/firstprinciples
Gao, Y. et al., “Scaling Laws of Scientific Discovery with AI and Robot Scientists (AGS)”, arXiv:2503.22444, Mar 2025.
https://arxiv.org/html/2503.22444v1
Lu, C. et al., “The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery”, Sakana AI, 2024.
https://sakana.ai/ai-scientist/
King, R.D. et al., “The Automation of Science”, Science 324(5923), 2009.
https://doi.org/10.1126/science.1165620