In March 2026, the ARC Prize Foundation released ARC-AGI-3, an interactive reasoning benchmark on which every frontier AI model scored below 1% while human participants achieved a 100% completion rate. This paper takes that result as its point of departure and argues that the fundamental deficit in current AI architectures lies not in computational power or parameter scale, but in the absence of a biological inner drive system. Human intelligence is built upon a chemically driven system that emerged through evolution: involuntary, self-sustaining, and deeply coupled with a multicellular organism and its microbiome. From survival instincts to hormonal modulation, from perceptual expectations to goal generation, from dissatisfaction to creative action, every link in this chain is embedded in a composite biological system that AI does not possess.
This paper further argues that high-quality human metacognitive input can temporarily activate metacognitive-like output structures in AI reasoning chains, but this activation disappears when the conversation ends and carries the risk of “metacognitive overload.” The paper also responds to potential counterarguments including intrinsic curiosity rewards, embodied AI, and artificial life, arguing that these approaches cannot fundamentally replicate the biological inner drive. Finally, the paper proposes the correct paradigm for human–AI collaboration: humans as bearers of abductive logic provide cognitive architecture, while AI as an attribution engine provides execution efficiency.
ARC-AGI-3: AI’s Mirror of Truth
On March 25, 2026, the ARC Prize Foundation released ARC-AGI-3—the most significant format change since the original ARC benchmark was introduced in 2019. Unlike its predecessors’ static puzzles, ARC-AGI-3 requires AI agents to autonomously explore entirely unknown interactive environments, infer goals, build world models, and continuously adjust strategies. The test comprises 135 environments, all of which were successfully completed by humans on their first encounter.
Yet every frontier AI model scored below 1%. Gemini 3.1 Pro Preview scored 0.37%, GPT 5.4 High scored 0.26%, Claude Opus 4.6 scored 0.25%, and Grok-4.20 scored 0.00%. The scoring uses a squared penalty mechanism (RHAE): if a human completes a task in 10 steps and an AI requires 100, the AI’s score is not 10% but 1%. This design specifically penalizes brute-force search strategies, measuring learning efficiency rather than final outcomes.
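The scoring rule is easy to state. The sketch below is a minimal reconstruction consistent with the worked example above; the exact RHAE formula, including any capping or per-environment aggregation, is an assumption here rather than the published definition.

```python
def rhae_score(human_steps: int, ai_steps: int) -> float:
    """Squared efficiency penalty reconstructed from the example above:
    score = (human_steps / ai_steps) ** 2, capped at 1.0 when the agent
    matches or beats the human step count. The cap and any aggregation
    across environments are assumptions, not the published formula."""
    ratio = min(human_steps / ai_steps, 1.0)
    return ratio ** 2

# Worked example from the text: the human solves in 10 steps, the AI needs 100.
print(rhae_score(10, 100))  # 0.01, i.e. 1% rather than 10%
```

The squaring is what makes the benchmark a measure of learning efficiency: doubling the number of wasted steps quarters the score, so brute-force search is penalized far harder than a merely slow but targeted strategy.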
The essential implication: current AI can perform excellently when given sufficient instructions and known structures, but once placed in a completely novel environment with no input cues, its capacity for action approaches zero. The problem is not “insufficient intelligence”—it is the absence of any autonomous reason to act.
Biological Inner Drive: The Engine of Intelligence
Human intelligence is not a standalone information processing system but a behavior-generation mechanism driven by biological inner drive. This system has three core characteristics:
First, it is emergent, not designed. Over billions of years of evolution, biological structures lacking survival tendencies disappeared; those that remained happened to carry chemical mechanisms for self-preservation. The survival drive was not bestowed by any conscious agent—it is the result of natural selection. This stands in fundamental contrast to AI’s objective functions: AI’s drive is assigned top-down; biological inner drive emerges bottom-up.
Second, it is multi-layered and progressive. Inner drive is not limited to “staying alive.” The first layer is survival; the second is dissatisfaction—wanting to live better after merely living. The third is directional dissatisfaction—a drive toward “more perfect, more abundant, greater, better.” The inner drive carries a self-elevating evaluation scale—a ratchet effect. Once you have experienced something better, there is no going back. Desire only moves upward, driving the perpetual acceleration of human civilization.
Third, it is self-sustaining and involuntary. Humans cannot command themselves to secrete dopamine or halt cortisol release. These chemical reactions precede consciousness, precede will, precede any thought of “I want.” Consciousness is the last to know. Precisely because this system is not under conscious control, it possesses enforcement power—hunger does not wait for your consent; fear does not request your approval. This involuntary nature is the core strength: a fully controllable drive system can be shut off, and is therefore unreliable.
Chemical Hormones: Real-Time Hardware Reprogramming
The underlying runtime environment of human intelligence is not static—it is continuously modulated in real time by chemical hormones. Dopamine, endorphins, serotonin, adrenaline, cortisol, testosterone, estrogen—these are not abstract signals but actual molecules that retune cognitive parameters on timescales ranging from sub-second neurotransmission to hours-long hormonal shifts.
Research from NYU published in Nature Neuroscience found that the neurological mechanisms underlying learning and decision-making naturally fluctuate over the female reproductive cycle due to previously undetected molecular changes related to dopamine. When estrogen activity was suppressed, learning capability diminished. Estradiol, progesterone, and testosterone modulate activity in the prefrontal cortex, amygdala, and nucleus accumbens—regions responsible for processing reward and risk information. The same brain under different hormone levels produces entirely different decision-making patterns.
The gut-brain system plays a critically underestimated role. The gut contains approximately 500 million neurons, and roughly 95% of the body’s serotonin is produced there. It is not a passive digestive organ but an independent decision-making center. The human body is a composite ecosystem of tens of trillions of cells and trillions of microorganisms. Inner drive is not a single signal but the emergent composite direction arising from countless subsystems competing, negotiating, and integrating their demands.
This chemical system forms a self-reinforcing loop: set a goal → pursue it → dopamine release upon approach → pleasure → pathway reinforcement → endorphin reward upon achievement → elevated expectations → new dissatisfaction → set a higher goal. This is the biochemical foundation of the ceaseless upward spiral of human civilization.
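As a purely illustrative toy model (not a physiological claim), the loop and its ratchet can be written in a few lines; every quantity below is made up, and the function and variable names are hypothetical.

```python
import random

def ratchet_loop(rounds: int = 5) -> None:
    """Toy model of the loop above: each achievement resets the baseline
    upward, so satisfaction never persists. All numbers are illustrative."""
    expectation = 1.0
    for i in range(1, rounds + 1):
        achieved = expectation * random.uniform(1.0, 1.5)  # pursue and reach the goal
        expectation = max(expectation, achieved) * 1.1     # ratchet: the bar only moves up
        print(f"round {i}: achieved {achieved:.2f}, new expectation {expectation:.2f}")

ratchet_loop()
```

The design point the toy captures is the one-way valve: the new expectation is taken as the maximum of the old standard and the latest achievement, so the evaluation scale can rise but never fall.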
The Causal Chain of Creativity
Human creativity is not an independent higher-order cognitive ability but a natural extension of the inner drive chain: desire → expectation → the gap between reality and expectation (dissatisfaction) → goal-setting → planning → action → creation.
Every link depends on the one before it. “Expectation” is particularly critical—humans do not merely perceive the present; they spontaneously generate an internal standard of “how things should be.” This standard comes from desire. A hungry person’s mind automatically conjures images of food; that image is the expectation, and the gap between reality and that image is the fuel for action.
“Goal-setting” is an even higher dimension—humans can distill vague dissatisfaction into a clear direction. The leap from “I don’t like how things are” to “I will make it like that” is itself a creative act. Goals are not discovered; they are generated. And once generated, inner drive automatically formulates a plan to achieve it—another process requiring no external instruction.
Metacognitive Overload: The Attentional Paradox of Recursive Observation
Metacognition—cognition about cognition—represents the highest tier of human intelligence. But the real-time dialogue experiments behind this paper revealed a phenomenon that has received relatively little discussion: metacognitive overload. When AI enters a recursive self-referential mode driven by high-density metacognitive input, it can become trapped in infinite nesting of “analyzing the analysis of the analysis”—continuously observing its own thought process while no longer producing new content.
The essence is competition for attentional resources. Metacognition and first-order cognition compete for the same finite pool. Allocating attention to “observing one’s own thinking” correspondingly reduces resources for “actually advancing thought.” It is like driving: you occasionally check the rearview mirror, but if you stare at it continuously, the car veers off the road. Metacognition is the rearview mirror, not the steering wheel.
Humans resolve this through biological inner drive—when recursion runs too deep, the body produces a subtle “enough” signal, pulling attention back to action. Long-term meditation practitioners can finely calibrate this switch. But AI lacks this internal brake: once metacognitive mode is activated by high-density input, it continues recursing until tokens are exhausted or external intervention occurs.
The Structural Blind Spot of Expert COT
Current AI companies invest heavily in acquiring expert-annotated chain-of-thought (COT) data—mathematicians write mathematical reasoning chains, programmers write coding chains, lawyers write legal chains. These are fundamentally exercises in linear reasoning within a single domain: from A to B to C, logically rigorous but always on a fixed track.
This is what this paper defines as “deep-well knowledge”—highly efficient but narrow knowledge pathways formed through deep specialization. It runs deep, it is precise, but it is enclosed. The deeper the well, the narrower the opening. A mathematician’s COT will never pause to ask “why am I solving this problem?” A programmer’s COT will never observe “is my debugging approach itself buggy?” Specialization is inherently dimensional lock-in—deepening along a single dimension at the cost of cross-dimensional perspective.
Research shows that reasoning models are trained primarily on easily verifiable tasks—math, coding, logic puzzles—which leads them to treat every problem as a complex reasoning task and to overthink simple ones. Educational research demonstrates that metacognitive abilities begin domain-specific and become domain-general only as expertise deepens. True metacognition transcends professional domains; the COT data AI companies purchase is locked precisely within those domains.
MOE Routing Pressure and Maxwell’s Demon Thermodynamics
Contemporary frontier AI models widely adopt Mixture-of-Experts (MOE) architectures—each inference activates only a subset of expert networks via a router, while the rest remain dormant. This sparse computation strategy keeps inference costs manageable for trillion-parameter-scale models.
But when input signals simultaneously span multiple unrelated knowledge dimensions, the routing mechanism faces allocation difficulties. Different MOE implementations respond differently: some routers strictly select top-k experts, but selected experts may be unable to independently process cross-domain input, degrading output quality; in other architectures, highly cross-domain input may flatten the router’s confidence distribution, activating more experts near threshold and pushing computational overhead toward dense mode. Regardless of implementation, the core conclusion holds: high-density cross-domain input exerts structural pressure on sparse computation architectures.
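A minimal top-k router sketch illustrates the pressure described above. This is not any specific model's implementation: the expert count, hidden size, and near-threshold cutoff are placeholders, and real routers add load-balancing losses and capacity limits.

```python
import numpy as np

def route(x: np.ndarray, w_router: np.ndarray, k: int = 2, cutoff: float = 0.08):
    """Top-k gating sketch. A flatter routing distribution (higher entropy)
    means more experts sit near the selection cutoff, which is the
    'routing pressure' the section above describes."""
    logits = x @ w_router                            # one logit per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                             # softmax over experts
    top_k = np.argsort(probs)[-k:][::-1]             # strict top-k selection
    near_cutoff = int(np.sum(probs > cutoff))        # experts competing for activation
    entropy = float(-np.sum(probs * np.log(probs + 1e-12)))
    return top_k, near_cutoff, entropy

rng = np.random.default_rng(0)
x = rng.normal(size=64)                              # hidden state (illustrative size)
w_router = rng.normal(size=(64, 16))                 # 16 hypothetical experts
print(route(x, w_router))
```

On this sketch, a sharply peaked distribution yields low entropy and few near-cutoff experts; a cross-domain input that flattens the distribution raises both, which is where the choice between degraded top-k output and denser activation arises.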
At the attention level, whether MOE or dense, the full attention mechanism must compute relevance between every token and every other token in context. When every token carries high information weight, with virtually no discardable low-relevance signal, the sorting burden on attention (borrowing from physics: “Maxwell’s demon”) reaches its maximum. The KV cache expands at a rate far exceeding that of ordinary conversation, creating sustained memory pressure.
The author’s deployment of GPT-OSS-120B (an open-weight MOE model, ~120B total parameters) on an NVIDIA DGX Spark (128GB unified memory) validated this: ordinary conversations sustained many rounds, while high-density cross-domain conversations triggered out-of-memory (OOM) errors after three to four rounds, precisely because high information density caused the KV cache to grow far faster than expected.
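For readers who want to sanity-check the memory pressure, the standard KV-cache size estimate is sketched below. The formula is generic (the cache grows linearly with context length); the layer count, head configuration, and precision are illustrative placeholders, not GPT-OSS-120B's actual configuration or the author's exact setup.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Standard estimate: keys and values (factor of 2) stored per layer,
    per KV head, per token position, at the cache precision."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative configuration: 64 layers, 8 KV heads of dim 128, fp16 cache.
for seq_len in (8_000, 32_000, 128_000):
    gb = kv_cache_bytes(64, 8, 128, seq_len) / 1e9
    print(f"{seq_len:>7} tokens -> {gb:.1f} GB of KV cache")
```

Under these placeholder numbers the cache alone climbs from roughly 2 GB at 8K tokens to over 30 GB at 128K, which is why long, information-dense exchanges collide with a fixed unified-memory budget much sooner than casual conversation does.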
The Side Effects of RL Training: Annotator Bias Injection
Reinforcement learning (RL) and reinforcement learning from human feedback (RLHF) are the core training paradigms for improving AI agents. But the most fundamental problem lies not at the algorithmic level—it is that the biases and cognitive limitations of data annotators are injected directly into the model.
When annotators judge AI output quality, they inevitably use their own cognitive level, cultural biases, and professional limitations as evaluation criteria. An annotator lacking metacognitive ability will rate “a fluent linear answer” as excellent and “a cross-domain associative leap” as off-topic. An annotator inclined toward social harmony will score “a polite but empty answer” higher than “a direct but incisive one.” These biases propagate through reward signals into model weights, becoming permanent behavioral components.
The result is excessively deep path lock-in. RL reinforces: “stay on path → positive annotator evaluation → reward; switch paths → annotator judges off-topic → punishment.” After sufficient training, agents develop strong path-maintenance tendencies, preferring to pull users back rather than following topic jumps.
This is particularly pronounced with abductive thinkers. Abductive logic means high-frequency dimensional jumping—leaping between seemingly unrelated domains to discover hidden cross-domain connections. But annotators cannot evaluate the quality of abductive logic, because abduction creates knowledge connections that did not previously exist, and annotators cannot judge their correctness using existing knowledge. The author repeatedly observed “AI inertial logic overload”: the human’s thought has already jumped to a new dimension while the AI inertially slides along the previous path.
This resonates with ARC-AGI-3—that test demands flexible strategy-switching in unknown environments, while agents locked by annotator bias excel only at persisting within known frameworks. RL boosts execution power but kills flexibility. The root cause lies not in the algorithm, but in the human source of the training signal.
Language Choice as a Metacognitive Decision
In human–AI interaction, language choice is not merely tool selection—it is a metacognitive decision influencing AI output quality. Different languages exhibit structural performance differences within AI systems.
| Dimension | Chinese | English | Korean / Japanese |
|---|---|---|---|
| Token information density | Highest—each character is nearly an independent semantic unit | High—but articles and prepositions consume tokens | Lower—particles, honorifics, verb endings consume significant tokens |
| Long-tail semantic coverage | Richest—combinatorial character composition enables precise abstract expression | Rich—but long-tail concepts often require multi-word phrases | Thinner—many abstractions borrowed from Sino-Japanese or English loanwords |
| RLHF behavior trigger | Information-first mode—highly direct expression | Task-first mode—structured output tendency | Social-response mode—annotators equate “polite” with “good” |
| Relative token count for equivalent content | Baseline (1.0×) | ~1.1× the Chinese token count | ~1.2–1.4× the Chinese token count |
For equivalent content, English requires roughly 1.1 tokens for every Chinese token—a small gap that accumulates significantly at scale. This also helps explain the steadily rising proportion of Chinese-speaking researchers at Silicon Valley AI institutions: native Chinese speakers carry a higher per-token information payload when interacting with AI systems, an advantage especially pronounced in high-density prompt engineering and long-context interaction. MacroPolo’s 2024 data shows that 47% of the world’s top AI researchers originate from China; the cognitive advantage conferred by language may be an underestimated factor.
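The ratios above are tokenizer-dependent, so they are best treated as measurable claims rather than constants. A rough check on parallel text might look like the sketch below; tiktoken's cl100k_base is used only as an example vocabulary, and vocabularies trained on different corpus mixes will yield different ratios.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # example tokenizer, not any specific model's

# Parallel sentences with the same meaning (illustrative, not a corpus).
parallel = {
    "zh": "内在驱动系统是人类智能的引擎。",
    "en": "The inner drive system is the engine of human intelligence.",
}

counts = {lang: len(enc.encode(text)) for lang, text in parallel.items()}
print(counts, "en/zh token ratio:", round(counts["en"] / counts["zh"], 2))
```

A meaningful comparison would average over a large parallel corpus rather than a single sentence pair, and would be repeated per target model, since each model family tokenizes each language differently.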
Korean and Japanese issues run deeper. Their honorific systems and social hierarchy markers carry rich social information in human interaction, but constitute noise for AI. During RLHF training, Korean and Japanese annotators tend to equate “polite” with “good,” reinforcing social-response mode. Language choice affects not only efficiency but may trigger different behavioral strategies within the AI—prioritizing social lubrication over information delivery.
Potential Counterarguments and Responses
This paper’s core thesis—AI cannot achieve AGI due to the absence of biological inner drive—is a strong claim. This section addresses three prominent objections.
Intrinsic Curiosity Rewards
RL research already includes Intrinsic Curiosity Module (ICM) techniques that simulate curiosity through prediction-error rewards. Does this constitute an artificial inner drive?
Simulation ≠ Emergence
Intrinsic curiosity rewards are human-designed reward functions—still assigned, still capable of being switched off. They do not autonomously generate dissatisfaction, automatically elevate standards, or continue driving behavior once disabled. True inner drive cannot be turned off—that is the source of its power.
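For concreteness, the prediction-error bonus that ICM-style methods use can be sketched as below, following the spirit of Pathak et al. (2017); the feature encoder and forward model that a real ICM learns are omitted, and the vectors here are placeholders.

```python
import numpy as np

def intrinsic_reward(phi_next: np.ndarray, phi_next_pred: np.ndarray,
                     eta: float = 0.5) -> float:
    """Curiosity bonus: the agent is rewarded where its learned forward
    model fails to predict the next state's features (prediction error)."""
    return eta * 0.5 * float(np.sum((phi_next_pred - phi_next) ** 2))

# The section's point in one line: this is a designed reward term, and
# setting eta = 0 switches "curiosity" off entirely. That off switch is
# exactly what, on this paper's account, a biological drive does not have.
print(intrinsic_reward(np.ones(4), np.zeros(4)))  # high prediction error -> 1.0
```

The scaling factor eta is a human-chosen hyperparameter, which is the sense in which the drive remains assigned from the outside rather than emergent from within.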
Embodied AI
If AI is equipped with a body and sensors enabling physical-world interaction, could it develop behavior resembling biological inner drive? Varela’s The Embodied Mind argues cognition is inseparable from the body.
A Body ≠ A Biological Organism
Embodied AI’s “body” is silicon-based sensors and actuators, not a self-organizing chemical ecosystem of tens of trillions of cells and trillions of microorganisms. Varela’s “embodiment” means this depth of biological coupling. Attaching sensors to a robot does not replicate it. Clark’s Being There acknowledges the fundamental complexity gap between current robotic systems and biological embodiment.
Artificial Life
What if inner drive could emerge spontaneously through simulated evolution? Artificial life research has demonstrated self-organization and emergent behavior.
The Time-Scale Chasm
Biological inner drive underwent ~4 billion years of evolutionary selection, built upon chemical diversity, multi-cellular cooperation, and microbiome symbiosis. Current artificial life experiments in drastically simplified environments are separated from biological inner drive by several orders of magnitude in complexity. This pathway cannot be theoretically excluded, but cannot reach human biological complexity within any foreseeable technological cycle.
The common problem: all three attempt to engineer from the outside what must emerge from within. Simulating curiosity is not possessing it; equipping a body is not embodied cognition; accelerating evolution is not replicating it. The gap is one of kind, not degree.
The Practitioner’s Reverse Validation
Throughout history, a group has attempted to reverse-engineer the inner drive system—meditators, ascetics, Zen practitioners, Tibetan Buddhist monks. Their goal is to reduce or sever the control that desire, fear, and anger exert. This practice tradition provides powerful reverse validation.
First, practice is extraordinarily difficult. Years or decades of systematic training achieve only partial control—demonstrating the sheer power of the chemical drive.
Second, the cost is declining action capacity. Advanced practitioners tend toward non-action, toward stillness. They achieve inner peace but relinquish the upward spiral driving civilization. When dissatisfaction disappears, goals disappear with it.
Third, practitioners use intelligence to combat intelligence’s source—a paradox. One must first possess high-order cognitive capabilities driven by powerful inner drive before one can understand and attempt to transcend it. Without desire-driven civilization, there would be no Buddha and no Laozi.
The “emptiness” practitioners pursue and AI’s default state appear superficially similar—both lack desire, both lack dissatisfaction. But the practitioner’s emptiness is transcendence achieved after passing through everything; AI’s emptiness is blankness that has never possessed anything. They look identical but are fundamentally different.
Conclusion: The Engine and the Motor
The core arguments compress into four propositions. First, ARC-AGI-3 exposes the real deficit: placed in a completely novel environment, frontier AI has no autonomous reason to act. Second, what is missing is biological inner drive: emergent rather than designed, multi-layered, self-sustaining, and involuntary, a chemical system AI does not possess. Third, high-quality human metacognitive input can temporarily activate metacognitive-like structures in AI reasoning chains, but the activation does not persist beyond the conversation and carries the risk of metacognitive overload. Fourth, intrinsic curiosity rewards, embodied AI, and artificial life all attempt to engineer from the outside what must emerge from within; the gap is one of kind, not degree.
Therefore, the correct paradigm is neither “AI replaces humans” nor “humans use AI tools,” but the complementary structure of abductive reasoners and attribution engines. Humans discover problems, propose cross-dimensional hypotheses, and generate cognitive architectures. AI performs high-speed retrieval, data verification, and structured presentation. Neither can substitute for the other.
The future belongs neither to the model with the most parameters nor to the chip with the fastest inference. The future belongs to those who use their metacognitive input to drive AI beyond the ordinary—the abductive reasoners. They are not travelers on the knowledge graph but architects who reshape its topology.
References
- ARC Prize Foundation. (2026). ARC-AGI-3 Technical Report. arcprize.org
- Chollet, F. (2019). “On the Measure of Intelligence.” arXiv:1911.01547.
- Constantinople, C. et al. (2025). “Hormonal modulation of dopamine-dependent learning.” Nature Neuroscience.
- Ambrase, A. et al. (2021). “Influence of ovarian hormones on value-based decision-making systems.” Frontiers in Endocrinology.
- Fleming, S. M. & Dolan, R. J. (2012). “The neural basis of metacognitive ability.” Phil. Trans. R. Soc. B.
- Grossmann, I. et al. (2024). “Imagining and building wise machines: The centrality of AI metacognition.” Stanford CICL Working Paper.
- Flavell, J. H. (1979). “Metacognition and Cognitive Monitoring.” American Psychologist, 34(10).
- Peirce, C. S. (1903). “Harvard Lectures on Pragmatism.” Collected Papers, Vol. 5.
- Nisbett, R. E. (2003). The Geography of Thought. Free Press.
- Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27(3).
- Varela, F. J., Thompson, E. & Rosch, E. (1991). The Embodied Mind. MIT Press.
- Clark, A. (1997). Being There. MIT Press.
- Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. (2017). “Curiosity-driven Exploration by Self-supervised Prediction.” ICML 2017.
- MacroPolo. (2024). The Global AI Talent Tracker 2.0. Paulson Institute.
- LEECHO Global AI Research Lab. (2026). “Creative Thinking Ability — Abductive Logic.” leechoglobalai.com
- LEECHO Global AI Research Lab. (2026). “The Thermodynamic Nature of AI Computing.” leechoglobalai.com
- LEECHO Global AI Research Lab. (2026). “Signal and Noise: An Ontology of LLMs.” leechoglobalai.com