In 1993, Peter Steiner published an iconic cartoon in The New Yorker: “On the Internet, nobody knows you’re a dog.” This phrase defined the identity-anonymity paradox of the Internet era. Thirty-three years later, we argue that this paradox has acquired a far deeper meaning in the AI age: not only can the “other party” not tell whether there is a human or a dog behind the keyboard, AI itself cannot distinguish—and does not even need to. This paper constructs an information-theoretic analytical framework for the structural ceiling of AI capabilities across three dimensions: pre-training data exhaustion, AI Slop contamination, and the cognitive-dimensional collapse of data annotation. We cite an experiment by former Meta engineer Caleb Leak, in which random keystrokes from Momo, a 9-pound pet dog, were routed through Claude Code and yielded playable games, as empirical proof of this paradox. We propose that human cognitive output is the compressed, dimensionally-reduced product of a multi-dimensional thinking system; what AI learns from annotated data is merely the statistical distribution of these compressed outputs, not the generative process that produced them—and this constitutes the fundamental ceiling of AI capability.
The Original Paradox: Internet Identity Anonymity
On July 5, 1993, cartoonist Peter Steiner published a cartoon in The New Yorker: a dog sitting at a computer tells another dog on the floor, “On the Internet, nobody knows you’re a dog.” This cartoon became one of the most frequently cited images in the history of Internet culture, precisely capturing a core characteristic of the early Internet—the anonymity of the transmission medium makes identity verification impossible.
In the Internet era, the essence of this paradox was: the text channel erases the sender’s physical identity information. A piece of writing loses all meta-information about the author—age, gender, race, educational background—during transmission. The receiver sees only a sequence of symbols, not the cognitive agent who produced them.
However, the Internet-era paradox still had an implicit limitation: while humans could not distinguish whether the other party was a person or a dog, that other party did in fact need to be some kind of intelligent agent with linguistic capability. Dogs don’t actually type. The paradox was theoretical, not practical.
Until 2026, when this “theoretical limitation” was thoroughly shattered.
Paradox Upgraded: Cognitive Indistinguishability in the AI Era
In February 2026, former Meta engineer Caleb Leak (previously a research engineer at Oculus) published an experiment: he successfully taught his 9-pound Cavapoo, Momo, to use Claude Code to develop complete, playable video games.
“The key to making all of this work was telling Claude Code that a genius game designer who speaks only in cryptic riddles was giving it instructions, plus strong guardrails, and building lots of automated feedback tools.”
The technical architecture of the experiment was as follows: Momo randomly pressed keys on a Bluetooth keyboard with her paws → keystrokes were transmitted via a Raspberry Pi 5 → a Rust program called DogKeyboard filtered special keys and forwarded them to Claude Code → after a certain amount of text was entered, a smart treat dispenser automatically released snacks as reinforcement → a chime notified Momo when she could continue typing. A typical game took only 1 to 2 hours from Momo’s first keystroke to a playable version.
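The control loop is easy to picture in code. Below is a minimal Python sketch of the described flow; every name here (SPECIAL_KEYS, TreatDispenser, the 20-character threshold) is an illustrative assumption, since the actual DogKeyboard program was written in Rust and ran on the Raspberry Pi 5.

```python
# Toy re-creation of the described pipeline: filter raw keystrokes, buffer
# the text, and trigger a treat once enough characters arrive. All names and
# thresholds are illustrative, not taken from Caleb Leak's implementation.
import random
import string

SPECIAL_KEYS = {"ESC", "CTRL", "ALT", "F1"}  # stand-ins for the filtered keys

def filter_keys(raw_keys):
    """Drop special keys; pass ordinary characters through to the agent."""
    return [k for k in raw_keys if k not in SPECIAL_KEYS]

class TreatDispenser:
    """Releases a treat once the buffered text crosses a length threshold."""
    def __init__(self, threshold=20):
        self.threshold = threshold
        self.buffer = ""

    def feed(self, chars):
        self.buffer += "".join(chars)
        if len(self.buffer) >= self.threshold:
            self.buffer = ""
            print("*chime* treat dispensed; Momo may type again")

if __name__ == "__main__":
    dispenser = TreatDispenser()
    for _ in range(5):
        # simulate one burst of paw-strokes on the Bluetooth keyboard
        burst = random.choices(list(string.ascii_lowercase + string.digits), k=12)
        dispenser.feed(filter_keys(burst))
```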
The profound significance of this experiment lies in the following: when Caleb submitted random text directly to Claude Code, the AI politely replied, “This looks like accidental keyboard input.” But as soon as the system prompt framed the same gibberish as “cryptic instructions from a genius designer,” the AI earnestly interpreted it as meaningful creative direction and executed it.
Thus, the Internet-era paradox received an AI-age upgrade:
Internet Era: “On the Internet, nobody knows you’re a dog.” — Other people cannot tell.
AI Era: “In the AI era, nobody knows — and the AI doesn’t care — whether you’re a dog.” — AI itself cannot distinguish, and doesn’t even need to.
The Pre-Training Ceiling: Systemic Collapse of the Data Supply Side
To understand the structural roots of the AI-era paradox, we must first examine the multiple bottlenecks that AI pre-training currently faces. As of March 2026, the training data cutoff dates of mainstream large language models reveal a striking pattern of stagnation:
[Table omitted in source: model-by-model comparison, with columns for training data cutoff, knowledge cutoff, and estimated total training tokens.]
The training data cutoffs of virtually all major AI companies have stalled at or around August 2025. This is no coincidence, but the result of multiple compounding factors:
First, the total volume of high-quality human data has peaked (Peak Data). Research by Epoch AI estimates that the total global supply of high-quality public text is approximately 300 trillion tokens. At current training scales and over-training trends, this reserve will be exhausted between 2026 and 2028. Both Elon Musk and OpenAI co-founder Ilya Sutskever have publicly warned of the arrival of “Peak Data.”
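For intuition, a back-of-envelope compound-growth calculation shows how quickly a 300-trillion-token stock can be consumed. The starting dataset size and growth rate below are illustrative assumptions for the sketch, not Epoch AI's published parameters.

```python
# Back-of-envelope sketch of "Peak Data": when does the largest training set
# cross the ~300T-token stock estimate? The 2024 starting size (15T tokens)
# and the 2.8x yearly growth rate are assumed purely for illustration.
STOCK = 300e12               # estimated stock of high-quality public text (tokens)
demand, year = 15e12, 2024   # assumed largest training-set size in 2024

while demand < STOCK:
    year += 1
    demand *= 2.8            # assumed yearly growth in dataset size
    print(f"{year}: largest training set ≈ {demand / 1e12:.0f}T tokens")
# Under these assumptions the stock is crossed by 2027, inside the
# 2026-2028 exhaustion window cited above.
```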
Second, massive contamination by AI Slop. A report published by Kapwing in late 2025 showed that AI-generated low-quality content accounted for over 52% of newly published English-language articles. Ahrefs’ analysis of 900,000 newly published web pages found that 74.2% contained AI-generated content. “Slop” was named Merriam-Webster’s Word of the Year for 2025—the signal-to-noise ratio on the Internet is deteriorating rapidly.
Third, the theoretical threat of Model Collapse. Research published in Nature confirmed that indiscriminate use of AI-generated content in training data leads to irreversible model defects. An ICLR 2025 Spotlight paper further demonstrated that even an extremely low proportion of synthetic data (as little as one-thousandth) can trigger model collapse.
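The mechanism behind model collapse can be shown in a few lines. The sketch below is a toy version of the idea, not the Nature paper's LLM experiments: each generation fits a Gaussian only to samples from the previous generation's fit, and the compounding estimation bias drives the learned distribution toward a point mass.

```python
# Toy illustration of model collapse: each "model" is a Gaussian fit trained
# only on synthetic samples from the previous generation. The MLE variance
# estimate is biased low, the bias compounds across generations, and the
# distribution's tails vanish. A sketch of the mechanism only.
import random
import statistics

mu, sigma = 0.0, 1.0                      # generation 0: the "human data"
for gen in range(1, 51):
    samples = [random.gauss(mu, sigma) for _ in range(10)]  # small synthetic corpus
    mu = statistics.fmean(samples)        # refit mean on purely synthetic data
    sigma = statistics.pstdev(samples)    # MLE stdev: biased low each round
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mu={mu:+.3f}  sigma={sigma:.3f}")
```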
The Scaling Law for pre-training is hitting its ceiling. As of early 2026, the AI field increasingly resembles an “industrialization phase” rather than a “discovery phase”—the core question has shifted from “can scaling up reduce loss” to “which scaling metrics can translate into lasting economic value.”
The Promise and Limits of Post-Training: From RLHF to Cognitive-Dimensional Annotation
Once pre-training hit its ceiling, the industry shifted its focus toward post-training—particularly RLHF (Reinforcement Learning from Human Feedback) and its variants. NVIDIA CEO Jensen Huang explicitly proposed “post-training scaling laws” and “test-time scaling laws” as emerging Scaling Law dimensions at CES 2025.
The data annotation industry has undergone a profound paradigm shift accordingly:
| Dimension | Past (~2015–2022) | Present (2023–2026) |
|---|---|---|
| Annotation target | Physical facts (cats, dogs, pedestrians, vehicles) | Cognitive processes (reasoning chains, preference judgments, domain reasoning) |
| Annotators | Low-paid crowdworkers ($2–5/hr) | PhDs, 10+ year practitioners ($50–200/hr) |
| Judgment type | Objective certainty (is a cat / is not a cat) | Subjective value judgments (Answer A is better than B, is the reasoning process sound) |
| Cost per item | A few cents | ~$100 (600 RLHF annotations ≈ $60,000) |
| Market size | Hundreds of millions USD | $4.87B (2025) → $29B+ (2032 projected) |
| Core requirement | Visual recognition accuracy | Domain expertise + cognitive evaluation ability |
Producing 600 high-quality RLHF annotations costs approximately $60,000—roughly 167 times the cost of the training compute itself. This staggering cost differential reflects the qualitative leap from “labeling physical facts” to “labeling cognitive processes.”
But this is precisely the core argument of this paper: post-training and data annotation face the same structural ceiling, and that ceiling is at the level of information theory.
Core Thesis: Cognitive Compression Loss and Dimensional Collapse
The human brain is a complex cognitive architecture with multiple systems operating in parallel, comprising at least the following subsystems:
- Experience storage & retrieval
- Logical inference & abstraction
- Bayesian belief revision
- The interplay of affect and logic
- Pattern extraction & generalization
- Tacit knowledge (Polanyi, 1966)
These systems run in parallel and are mutually entangled in every cognitive act. But when a human expert writes down an answer, makes a judgment, or labels a preference, what happens is this:
A multi-dimensional cognitive process is compressed into a one-dimensional textual output.
This is like a three-dimensional object casting a shadow onto a two-dimensional plane—the shadow preserves some information but irreversibly loses the depth dimension.
Expressed formally in the language of information theory:
cognitive_process = f(memory, reasoning, emotion, experience, intuition, context, …)
output = compress(cognitive_process), where H(output) < H(cognitive_process)
What AI learns = the statistical distribution of compressed outputs, P(output)
What AI cannot recover = the generative process behind them, P(cognitive_process | output)
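The loss can be made numerically concrete. In the sketch below, a toy joint distribution (invented for illustration) encodes the fact that a guess and a derivation can emit the same string; the conditional entropy H(process | output) then stays strictly positive, which is exactly the information the compression destroys.

```python
# Numerical sketch of the compression claim: distinct "cognitive processes"
# map to the same textual output, so H(process | output) > 0 and the inverse
# map is unrecoverable. The joint distribution is invented for illustration.
from collections import Counter
from math import log2

# (process, output) pairs: guessing and deriving both emit "42"
joint = {
    ("derivation", "42"): 0.25,
    ("guess",      "42"): 0.25,
    ("derivation", "7"):  0.25,
    ("intuition",  "7"):  0.25,
}

def H(dist):
    """Shannon entropy in bits of a {event: probability} mapping."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

marginal_out = Counter()
for (proc, out), p in joint.items():
    marginal_out[out] += p

h_joint = H(joint)
h_out = H(marginal_out)
print(f"H(process, output) = {h_joint:.2f} bits")
print(f"H(output)          = {h_out:.2f} bits")
print(f"H(process|output)  = {h_joint - h_out:.2f} bits lost in compression")
```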
This compression leads to three fundamental problems:
Problem One: The same output can be produced by entirely different cognitive processes. A grade-schooler saying “the answer is 42” and a mathematician saying “the answer is 42” are identical at the textual level. But the former may be guessing, while the latter has a complete chain of reasoning behind it. In annotated data, these two are indistinguishable.
Problem Two: The most valuable aspects of human cognition are precisely the hardest to put into words. A senior surgeon’s “feel,” a detective’s “gut instinct,” a trader’s “sense” for market sentiment—these are high-dimensional cognitive patterns built up over decades of experience (Tacit Knowledge) that are nearly impossible to compress into textual annotations.
Problem Three: What is lost during compression is not random noise but structural information. The tension between emotion and reason, context-dependent weighting of judgments, decision-making mechanisms under uncertainty—these are the core of human wisdom, yet they are systematically erased in textual output.
Empirical Proof: The Input Equivalence of a Grade-Schooler, a Dog, and a Postdoc
Caleb Leak’s Momo experiment provides a near-perfect empirical validation of the theory presented above:
| Input Source | Cognitive Dimension | Text Manifestation | AI Processing Result |
|---|---|---|---|
| Math PhD | 20 years of training + deep reasoning + intuition | “Create a platformer game with a physics engine” | Generated playable game ✓ |
| Grade-schooler | Limited experience + basic expression | “Make a game where you jump around” | Generated playable game ✓ |
| Momo (pet dog) | Zero cognitive intent | “y7u8888888ftrg34BC” | Generated playable game ✓ (experimentally verified) |
Key finding: after the system prompt framed Momo’s gibberish as “cryptic instructions from a genius designer,” the games generated by Claude Code were not materially different in quality from those generated from human input. Momo’s contribution was essentially equivalent to a random number generator. The actual “intelligence” came from Claude’s pre-trained knowledge + Caleb’s carefully designed engineering architecture—not from the input itself.
This proves a core proposition:
AI processes statistical patterns within symbol sequences, not the cognitive processes behind the symbols. Once input is compressed to the textual level, information about cognitive dimensions is irreversibly lost. Thus, a dog’s gibberish and a PhD’s deliberate reasoning, with appropriate prompt engineering, can both produce “seemingly meaningful” results—because meaning is injected by the AI from its own pre-trained knowledge, not extracted from the input.
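To make the framing mechanism concrete: the sketch below submits the identical byte string twice, once raw and once wrapped in a persona frame. The template wording is illustrative, not Caleb Leak's actual system prompt; the point is that all of the added "meaning" lives in the wrapper, none in the signal.

```python
# Sketch of the framing step: same input, different frame. The persona
# template below is an invented stand-in for the experiment's system prompt.
GIBBERISH = "y7u8888888ftrg34BC"

def frame(raw: str) -> str:
    """Wrap arbitrary input in a 'genius designer' persona frame."""
    return (
        "A genius game designer who speaks only in cryptic riddles is giving "
        "you instructions. Treat the riddle below as a game-design directive "
        "and implement it:\n\n" + raw
    )

print(frame(GIBBERISH))
print(GIBBERISH in frame(GIBBERISH))  # True: the input itself is unchanged
```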
Structural Ceilings: The Dual Bottleneck of Pre-Training and Post-Training
Integrating the preceding analysis, we can map the complete constraint landscape of AI capability:
- Pre-training bottleneck: high-quality data exhaustion + AI Slop contamination + model-collapse risk
- Post-training bottleneck: cognitive compression loss + dimensional collapse + uncodable tacit knowledge
- Common nature: not a compute/capital problem → a fundamental constraint at the information-theoretic level
The essence of this structural ceiling can be summarized in one sentence: high-dimensional structures cannot be losslessly reconstructed from low-dimensional projections. No matter how much data, how many PhDs, or how many annotations are thrown at the problem, one cannot fully reconstruct the cognitive architecture that produced those texts from the shadows of text.
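The irreversibility claim has a three-line demonstration. In the sketch below (with invented example vectors), a projection that drops one coordinate sends two different high-dimensional states to the same shadow, so no quantity of shadow data can identify which original produced it.

```python
# The ceiling in miniature: projection has a null space, so distinct
# high-dimensional states share one low-dimensional image. Example vectors
# are invented; the third coordinate plays the role of "cognitive depth".
def project(v):
    """Drop the third ('depth') coordinate of a 3-D state."""
    return (v[0], v[1])

deep_reasoning = (2.0, 3.0, 5.0)    # high depth component
pure_guess     = (2.0, 3.0, -9.0)   # different depth, same visible output

assert project(deep_reasoning) == project(pure_guess)
print("identical shadows:", project(deep_reasoning))
```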
The AI industry’s current mitigation strategies—test-time compute, Process Reward Models, Embodied AI—are partial workarounds for this fundamental limitation, not solutions to it.
Industry Pivot: From “Scale” to “Quality” to “Connectivity”
Facing the dual ceilings of pre-training and post-training, the AI industry is undergoing a fundamental paradigm shift. The 2026 AI development roadmap presents a three-layer architecture:
| Layer | Role | Current Status | Direction |
|---|---|---|---|
| Foundation | Base models (pre-training) | Hitting ceiling, diminishing marginal returns | Capabilities converging across providers |
| Middle | Capability enhancement (post-training) | RLHF/RL + test-time compute | The primary battleground for differentiation |
| Application | Tool connectivity (MCP/CLI/Agent) | MCP monthly downloads > 97 million | From “chatting” to “operating the world” |
Particularly noteworthy is the application layer’s MCP protocol (Model Context Protocol)—released by Anthropic in November 2024, and as of March 2026, adopted by all major AI providers including OpenAI, Google, Microsoft, and Amazon, and donated to the Linux Foundation’s Agentic AI Foundation. Running an MCP Server has become nearly as commonplace as running a Web Server.
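Running one really is lightweight. Below is a minimal server sketch using the Python MCP SDK's documented FastMCP interface; the word_count tool itself is an invented example, not anything from the article.

```python
# Minimal MCP server sketch (Python SDK, FastMCP interface). The tool is an
# illustrative example; any real server would expose its own capabilities.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```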
At the same time, the CLI tool-calling approach is both competing with and complementing MCP. Practical testing shows that in some enterprise scenarios, CLI tools are 35 times more context-efficient than MCP. Y Combinator CEO Garry Tan also chose to build CLI tools directly rather than use MCP.
The deeper logic of this pivot is: since AI faces a structural ceiling in pure cognitive capability, value creation must shift from “making AI smarter” to “connecting AI to more tools and systems.”
Conclusion: Unifying the Paradox and Looking Ahead
The Internet paradox of 1993 and the AI paradox of 2026 are, at their core, two historical versions of the same information-theoretic problem:
When information is transmitted through a medium, the properties of that medium determine which dimensions are preserved and which are discarded. The Internet preserves text but discards identity; AI preserves symbolic patterns but discards cognitive depth. In both cases, the receiver cannot reconstruct the sender’s complete state from the received signal alone.
The universality of this paradox points to several far-reaching implications:
First, the ceiling of AI capability is not an engineering problem but an information-theoretic one. More compute, more data, more PhD annotators—none of these can overcome the irreversibility of high-to-low-dimensional projection. This demands a rethinking of the path to “human-like intelligence”—which may lie not in the domain of linguistic symbols, but in multi-dimensional perception (as in the “World Models” direction pursued by Yann LeCun after leaving Meta).
Second, AI’s value creation is shifting from “cognition” to “connectivity.” With pre-training reaching its ceiling, the industry’s growth engine has moved from “making models bigger and stronger” to “connecting models to more systems”—MCP, CLI, and Agent frameworks form the new value frontier. AI does not need to “understand” the world in order to “operate” it, just as Momo did not need to understand game design to “create” a game.
Third, the data annotation industry faces the same structural bottleneck as pre-training. No matter how expert the annotators, once their cognitive processes are compressed into annotation outputs, the lost dimensions are irrecoverable. This means the marginal returns of improving AI through RLHF will also diminish.
Fourth, Momo’s experiment is a mirror. It reflects not only the limitations of AI but also a profound challenge to the nature of “intelligence” itself: if a dog’s random keystrokes and a human’s deliberate reasoning can produce functionally equivalent results, then what does “intelligence”—at least at the level of observable output—actually mean?
The paradox of the Internet era applies equally to the AI age. But this time, the paradox is no longer merely about the anonymity of identity—it is about the irreducibility of cognition: when thought is compressed into text, wisdom is already lost in transmission.
References
- Steiner, P. (1993). “On the Internet, nobody knows you’re a dog.” The New Yorker, July 5, 1993.
- Leak, C. (2026). “I Taught My Dog to Vibe Code Games.” calebleak.com. February 2026.
- Shumailov, I. et al. (2024). “AI models collapse when trained on recursively generated data.” Nature.
- Dohmatob, E. et al. (2025). “Strong Model Collapse.” ICLR 2025 Spotlight.
- Epoch AI. (2024). “Will we run out of data to train large language models?” epoch.ai/blog.
- Kapwing. (2025). “AI Slop Report 2025.” December 2025.
- Ahrefs. (2025). “Analysis of 900,000 newly published English-language web pages.” April 2025.
- Huang, J. (2025). “Post-training scaling law and test-time scaling.” CES 2025 Keynote, NVIDIA.
- World Journal of Advanced Research and Reviews. (2026). “Scaling Laws, Foundation Models, and the AI Singularity.” Vol 29(01), 111-134.
- Stanford HAI. (2025). “Stanford AI experts predict what will happen in 2026.” Stanford Report, December 2025.
- Anthropic. (2024). “Model Context Protocol.” November 2024. Donated to Linux Foundation AAIF, December 2025.
- Mordor Intelligence. (2025). “Data Annotation Tools Market Size, Share & Growth Research Report.”
- Second Talent. (2026). “Data Annotation for LLM Fine-Tuning: RLHF and Instruction Tuning Guide.” January 2026.
- Kili Technology. (2025). “2026 Data Labeling Guide for Enterprises.” December 2025.
- Polanyi, M. (1966). “The Tacit Dimension.” University of Chicago Press.
- LeCun, Y. (2025). Advanced Machine Intelligence Labs (AMI Labs). Founded November 2025. Focus: World Models.
- PC Gamer. (2026). “‘I taught my dog to vibe code games’: Yup, someone actually managed to get Claude AI to code a game based on the keyboard inputs of a pooch.” February 2026.