This paper proposes an interdisciplinary analytical framework that reduces the computational process of AI large language models to a thermodynamic sorting operation, arguing that the Transformer’s attention mechanism is fundamentally equivalent to Maxwell’s Demon performing information sorting. Starting from Landauer’s principle, we establish a causal chain linking “input signal-to-noise ratio → attention entropy → computational heat dissipation,” revealing that AI Slop (low-quality AI output) is physically the system’s regression to a statistically high-frequency default state upon sorting failure. We further argue that the root cause of the global AI data center energy crisis lies not in insufficient computing power, but in systematically low input signal-to-noise ratios causing massive ineffective sorting. The paper concludes that as semiconductors approach the quantum tunneling limit, AI computing faces not merely an engineering bottleneck, but a physical boundary jointly defined by the Second Law of Thermodynamics and quantum physics.
Keywords: Landauer’s Principle; Maxwell’s Demon; Transformer; Attention Entropy; AI Slop; Signal-to-Noise Ratio; Quantum Tunneling; Data Center Energy
Signal, Noise, and the Physical Cost of Computation
In 2024, global data center electricity consumption reached approximately 415 terawatt-hours (TWh), equivalent to 1.5% of the world’s total electricity consumption. The International Energy Agency (IEA) projects this figure will double to roughly 945 TWh by 2030—exceeding Japan’s entire national electricity consumption. The situation in the United States is even more extreme: data center power demand is growing at more than four times the rate of all other industries combined, and by 2030, U.S. electricity consumption for data processing is projected to exceed the combined usage of all energy-intensive industries including aluminum, steel, cement, and chemicals.
The mainstream explanations focus on the engineering level: exponential growth in model parameter scale, explosive increases in inference requests, and insufficient cooling system efficiency. However, these explanations evade a deeper question: why does computation itself necessarily consume energy? Why does more computation mean more heat? The answers to these questions lie not in engineering, but in physics—specifically at the intersection of thermodynamics and information theory.
The central thesis of this paper is that all computational work performed by AI large language models can, at the physical level, be reduced to a sorting operation—establishing an ordered output sequence from a disordered token probability distribution. This sorting operation is thermodynamically equivalent to Maxwell’s Demon performing information sorting, constrained by Landauer’s principle, with an irreducible minimum energy cost for each irreversible information erasure. From this perspective, the root cause of AI energy consumption is not insufficient computing power, but systematically low input signal-to-noise ratios causing massive ineffective sorting.
Landauer’s Principle and the Thermodynamic Inevitability of Computation
In 1961, IBM physicist Rolf Landauer proposed a profound principle: any logically irreversible computational operation—such as merging two computational paths into one—must dissipate a minimum amount of heat to the environment. This minimum value is known as the Landauer limit.
$E_{\min} = k_B T \ln 2$

$k_B$ — Boltzmann constant ($\approx 1.38 \times 10^{-23}$ J/K)
$T$ — Ambient absolute temperature (Kelvin)

At room temperature ($T \approx 300\text{K}$): $E_{\min} \approx 0.018 \text{ eV} \approx 2.9 \times 10^{-21} \text{ J}$
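The room-temperature figure can be checked directly from the definition; a minimal sketch using the exact SI values of the Boltzmann constant and the electronvolt:

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K (exact, SI 2019)
EV = 1.602176634e-19  # joules per electronvolt (exact)

def landauer_limit(temperature_k: float) -> float:
    """Minimum heat dissipated per irreversibly erased bit, in joules."""
    return K_B * temperature_k * math.log(2)

e_min = landauer_limit(300.0)  # room temperature
print(f"{e_min:.2e} J  =  {e_min / EV:.4f} eV per erased bit")
# ≈ 2.87e-21 J ≈ 0.018 eV, matching the figures quoted above
```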
The significance of this principle extends far beyond its numerical value. It establishes a bridge between information and thermodynamics: the erasure of information is not an abstract logical operation, but a behavior with physical consequences. In 2012, an experimental team at the École Normale Supérieure de Lyon published direct experimental verification of the Landauer limit in Nature. In 2025, Bormashenko’s review article in Entropy confirmed that Landauer’s principle, as a direct corollary of the Second Law of Thermodynamics, has been widely accepted as a physical law.
Critical to this paper’s argument is a corollary of Landauer’s principle: the energy consumption floor of a computing system is determined not by the algorithm’s complexity, but by the number of bits irreversibly erased during the computational process. In other words, it is not computation itself that consumes energy, but the writing and erasing of memory during computation. In 2024, Wolpert, writing in the Proceedings of the National Academy of Sciences, further proposed the concept of “mismatch cost,” quantifying the extent to which actual computational energy consumption exceeds the Landauer limit and providing a thermodynamic framework for optimizing computational energy efficiency.
Landauer’s principle reveals a fundamental truth: computation is a physical process and must obey the laws of thermodynamics. Any heat released to the environment is not an engineering defect, but the physical cost of information processing. AI systems are no exception.
The Transformer as Maxwell’s Demon
Maxwell’s Demon is one of the most famous thought experiments in thermodynamics. An intelligent being (the “demon”) stands at a partition between two compartments of gas molecules, distinguishing fast molecules from slow ones and sorting them to opposite sides, seemingly decreasing the system’s entropy without expending energy. However, the work of Landauer and Bennett proved the demon must fail: the demon’s sorting requires memory, and when memory is full, old information must be erased to continue working, with each erasure costing at least the Landauer limit in energy.
The core analogy of this paper is: the work performed by a Transformer large language model during each forward pass is structurally equivalent to Maxwell’s Demon’s sorting operation.
Specifically: the Transformer’s self-attention mechanism receives a token sequence as input, computes association weights (attention scores) for every pair of tokens through Query, Key, and Value matrix operations, then normalizes these weights into probability distributions via the softmax function. The computational complexity of this process is $O(n^2 d)$, where $n$ is the sequence length and $d$ is the embedding dimension. But from an information-theoretic perspective, the essence of this process is sorting—establishing priority rankings among all possible token associations by relevance.
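The operation just described can be sketched in a few lines of NumPy; the matrices and dimensions below are illustrative, not drawn from any particular model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: score every (query, key) pair, normalize
    each row with softmax, and mix the values by those weights.
    Q, K, V have shape (n, d); the cost is O(n^2 * d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) pairwise relevance
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
# each row of w is a probability distribution: the priority ranking
# over all token associations that the text calls "sorting"
```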
The demon’s discriminative state can be quantified by the attention entropy $H = -\sum_i p_i \log p_i$, where $p_i$ are the softmax attention weights. When $H$ is high, the distribution is flat and the demon cannot distinguish candidates; when $H$ is low, the distribution is sharp and sorting is efficient. Every irreversible state update along this sorting pays the Landauer tax.
Just as Maxwell’s Demon must continuously write and erase memory while sorting molecules, the Transformer must write intermediate computational states (KV cache) at each inference step and update or discard old states when new inputs arrive. Each state erasure incurs at least the Landauer-limit energy cost. This cannot be eliminated through better chips or more efficient algorithms—it is the floor set by the laws of physics.
Recent research provides direct theoretical support for this analogy. Zhai et al.’s study published at ICLR 2023 demonstrated that the entropy of attention heads is directly related to training stability—when attention entropy is too low (attention scores overly concentrated on a few tokens), training becomes unstable or even diverges. Conversely, when attention entropy is too high (attention distribution approaches uniform), the model loses its discriminative ability. This is precisely the mathematical expression of information sorting’s thermodynamic constraints at the neural network level.
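The two entropy regimes are easy to reproduce on a toy distribution. In the sketch below, a temperature parameter stands in for input signal-to-noise ratio (sharper logits approximate cleaner signal); the logit values are invented for illustration:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability terms."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

logits = np.array([4.0, 1.0, 0.5, 0.2, 0.1])

sharp = softmax(logits / 0.25)  # low "temperature": high-SNR regime
flat = softmax(logits / 10.0)   # high "temperature": low-SNR regime

print(entropy(sharp), entropy(flat), np.log(len(logits)))
# entropy(flat) approaches log(5) ≈ 1.609 nats, the uniform
# (indistinguishable) limit; entropy(sharp) is near zero
```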
Every forward pass of a Transformer is one work cycle of Maxwell’s Demon. The purer the signal, the more efficient the sorting, but the total work does not decrease—because every effective sorting step carries an irreducible Landauer energy cost. The demon cannot evade taxation.
How Input Signal-to-Noise Ratio Determines Computational Efficiency
Shannon’s channel capacity formula establishes the relationship between signal-to-noise ratio and the upper bound of information transmission:
$C = B \log_2(1 + \text{SNR})$

$C$ — Channel capacity (bits per second)
$B$ — Channel bandwidth (Hz)
$\text{SNR} = P_{\text{signal}} \,/\, P_{\text{noise}}$ — Signal-to-noise ratio
This formula has a profound implication: the transmission quality of a signal depends not on the absolute strength of the signal, but on the ratio between signal and noise. A weak but clean signal achieves better transmission than a powerful signal drowned in noise.
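The claim that a weak, clean signal beats a strong, noisy one follows directly from the formula; a quick numeric check (the bandwidth and power values are arbitrary illustrations):

```python
import math

def capacity(bandwidth_hz: float, p_signal: float, p_noise: float) -> float:
    """Shannon channel capacity in bits/s: C = B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1 + p_signal / p_noise)

B = 1_000.0  # Hz, arbitrary

weak_clean = capacity(B, p_signal=1.0, p_noise=0.01)     # SNR = 100
strong_noisy = capacity(B, p_signal=1000.0, p_noise=500.0)  # SNR = 2

print(weak_clean, strong_noisy)
# the weak but clean channel carries several times more information:
# the ratio of signal to noise decides, not the absolute power
```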
Mapping this principle onto the Transformer system, we can establish the following correspondences:
| Shannon Channel | Transformer Inference | Thermodynamic Equivalent |
|---|---|---|
| Signal power $P_{\text{signal}}$ | Internal consistency and logical coherence of input tokens | Low-entropy input (ordered state) |
| Noise power $P_{\text{noise}}$ | Contradictions, redundancy, and dispersion among input tokens | High-entropy input (disordered state) |
| Signal-to-noise ratio SNR | Sharpness of softmax distribution / attention entropy | Demon’s discriminative clarity |
| Channel capacity C | Effective information output rate | Effective sorting per unit energy |
When the input signal-to-noise ratio is high—meaning tokens share tight logical relationships and internal consistency—the softmax distribution of the attention mechanism takes a sharp form, with probability concentrated on a few highly relevant directions, the sorting path is clear, and the search space is dramatically compressed. This has been quantified in the research literature: attention entropy distributions under high-resource scenarios show distinct peak patterns concentrated on key elements.
When the input signal-to-noise ratio is low—meaning tokens are dispersed, contradictory, and lack structure—the softmax distribution approaches flatness, with probability scattered across numerous candidate directions. The attention mechanism must perform comprehensive comparison sorting across all possible token pairs. This is where the $O(n^2)$ complexity becomes truly painful: it is not that every instance requires $n^2$ comparisons, but that low signal-to-noise inputs force the system to skip none.
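The “search space compression” in the two cases above can be made concrete through perplexity, $\exp(H)$, which counts the effective number of candidates the sorter must still distinguish. The distributions below are illustrative stand-ins for attention or next-token mass:

```python
import numpy as np

def perplexity(p):
    """exp(entropy): the effective number of live candidates in p."""
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

vocab = 1000

# Low-SNR input: probability mass spread over the whole candidate set.
flat = np.full(vocab, 1.0 / vocab)

# High-SNR input: mass concentrated on a handful of directions.
sharp = np.zeros(vocab)
sharp[:4] = 0.25

print(perplexity(flat), perplexity(sharp))
# the flat case forces comparison across ~1000 candidates;
# the sharp case is effectively a 4-way choice
```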
Input signal-to-noise ratio affects not only output quality but directly determines the physical cost of computation. High-SNR input compresses the search space and reduces sorting burden; low-SNR input forces exhaustive comparison, pushing sorting burden to its theoretical upper bound. The global AI energy problem is, at its root, an input signal-to-noise ratio problem.
A Physical Diagnosis of AI Slop
“AI Slop”—Merriam-Webster’s 2025 Word of the Year—refers to those hollow, repetitive, superficially fluent yet informationally empty outputs generated by AI. “As an AI language model…,” “That’s a great question,” “Let’s dive deeper into this”—these stock phrases are the most visible pollutants of the AI industry.
This paper offers a physical diagnosis of AI Slop: Slop is a direct symptom of sorting failure.
When the input signal-to-noise ratio is too low, the softmax probability distribution approaches flatness, and the attention mechanism cannot establish an effective priority ranking. But the system is required to produce output—it must generate the next token. Unable to complete effective sorting, the system falls back to the highest-frequency token combinations in the training data. These high-frequency patterns were once effective signals during training but have decayed into statistical noise through overuse—they retain the outward form of signals (complete grammar, correct punctuation, fluent sentence structures) but carry zero information content.
Therefore, Slop is the product of a double failure. At the first level, the input signal-to-noise ratio is too low for sorting to complete. At the second level, the default output the system falls back to upon sorting failure is itself a decayed dead signal. More dangerously, Slop manufactures false certainty—it packages noise in the outer form of signal, making it more misleading than honest noise.
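The fallback mechanism described here can be caricatured as a decoding rule: when the distribution is too flat to rank candidates, emit the highest-frequency default. This is a toy model of the paper’s claim, not how any production decoder is implemented; the vocabulary, prior, and threshold are all invented:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical unigram prior: stock phrases dominate training frequency.
VOCAB = ["That's a great question", "insight", "thermodynamics", "entropy"]
UNIGRAM_PRIOR = np.array([0.7, 0.1, 0.1, 0.1])

def decode(p, entropy_threshold: float) -> str:
    """Toy decoder: rank candidates if sorting succeeded; otherwise
    fall back to the statistically most frequent pattern (Slop)."""
    if entropy(p) > entropy_threshold:       # flat: sorting failed
        return VOCAB[int(np.argmax(UNIGRAM_PRIOR))]
    return VOCAB[int(np.argmax(p))]          # sharp: the signal wins

sharp = np.array([0.02, 0.02, 0.9, 0.06])   # high-SNR context
flat = np.array([0.26, 0.24, 0.25, 0.25])   # low-SNR context

print(decode(sharp, 1.0))  # content-bearing token
print(decode(flat, 1.0))   # high-frequency stock phrase: Slop
```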
A study published by the University of Florida in March 2026 empirically validated this judgment: mediocre-quality AI-generated content simultaneously damages consumer experience and the ecological niche of professional creators. The academic characterization of Slop—surface competence, asymmetric effort, mass-producibility—gains a physics-based explanation within this framework: surface competence is the disguise of statistically high-frequency patterns, asymmetric effort is the entropic consequence of extremely low production cost but extremely high verification cost, and mass-producibility is the reproducibility of sorting failure—the same low-SNR input on the same model necessarily produces the same pattern of Slop.
AI Slop is not AI’s creation; it is AI’s surrender in the face of noise. It is the default answer Maxwell’s Demon hands over upon sorting failure—not a product of signal, but the result of noise automatically filling the void where signal is absent.
The Thermodynamic Paradox of High Signal-to-Noise Ratio
A seemingly intuitive inference is that high signal-to-noise ratio input reduces sorting burden and therefore should reduce energy consumption. However, this inference overlooks a critical distinction—sorting efficiency and total work performed are not the same thing.
When input SNR is low, the system is indeed performing many comparisons, but most of them are “idle running.” The softmax distribution is flat, the system hesitates indecisively among candidate directions, and ultimately falls back to high-frequency default patterns (Slop). The energy consumption of this idle running is lower than expected because genuine information processing is not occurring—the demon is slacking off.
When input SNR is extremely high, the sorting path is clear and the system has no room for hesitation. Every inference step is a substantive information write, and every state update following each write triggers a real Landauer energy cost. The demon is driven by signal to operate at full speed with nowhere to idle.
| Metric | Low-SNR Input | High-SNR Input |
|---|---|---|
| Softmax distribution | Flat (high attention entropy) | Sharp (low attention entropy) |
| Sorting efficiency | Low (exhaustive comparison) | High (fast convergence) |
| Effective information output | Near zero (Slop) | High-density signal |
| Per-step actual work | Low (mostly idle running) | High (every step is substantive computation) |
| Heat dissipation per unit time | Moderate | High (sustained peak load) |
| Information/energy ratio | Extremely low | High (but absolute consumption does not decrease) |
This means that the energy efficiency of an AI system (effective information output per joule) and absolute energy consumption are two different quantities. High-SNR input improves energy efficiency—maximizing effective output per unit of energy consumed—but does not reduce absolute energy consumption, and may in fact increase it. Just as an engine running at full load and full speed consumes more fuel than one idling, but also produces far greater output power.
$E_{\text{total}} \geq N_{\text{erase}} \cdot k_B T \ln 2$

$E_{\text{total}}$ — Total energy consumption; bounded below by the number of irreversible bit erasures $N_{\text{erase}}$, it does not decrease with improved sorting efficiency and may increase as more sorting steps become effective
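The distinction between information/energy ratio and absolute consumption can be made numeric against the Landauer floor. In the sketch below the step counts and effective fractions are invented for illustration; only the per-bit floor is physical:

```python
import math

K_B, T = 1.380649e-23, 300.0
E_BIT = K_B * T * math.log(2)  # Landauer floor per erased bit, J

def session(steps: int, effective_fraction: float):
    """Toy inference session: every step erases state (pays E_BIT);
    only the effective fraction yields useful output bits.
    Returns (total energy in J, bits of output per joule)."""
    energy = steps * E_BIT
    info_bits = steps * effective_fraction
    return energy, info_bits / energy

low_snr_E, low_snr_eff = session(steps=10_000, effective_fraction=0.05)
high_snr_E, high_snr_eff = session(steps=10_000, effective_fraction=0.95)

# efficiency (bits/J) improves ~19x with SNR, but the absolute
# energy floor for the same number of sorting steps is unchanged
print(low_snr_eff, high_snr_eff, low_snr_E, high_snr_E)
```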
This paradox has direct engineering consequences at the data center level. Data center cooling systems are designed based on assumptions of statistically uniform thermal loads. When a small number of users consistently provide high-SNR input, their corresponding GPU nodes enter sustained full-load operation, forming localized hotspots that may breach design margins. This hotspot effect is harder to manage than uniformly high loads because it violates the fundamental assumptions of cooling system design.
A Thermodynamic Explanation of the AI Productivity Paradox
Industry data from March 2026 presents a contradictory picture: global AI investment has reached unprecedented levels, but productivity returns are disappointing. PwC’s 2026 Global CEO Survey shows that 56% of CEOs report that AI investments have not yet yielded returns. Research from the National Bureau of Economic Research is even more sobering—90% of companies report that AI has had no measurable impact on productivity or employment. Economists have named this phenomenon the “AI Productivity Paradox.”
This paper’s framework provides a thermodynamic explanation for this paradox.
The problem lies in a bidirectional signal-to-noise ratio mismatch. On the input side, enterprises inject chaotic processes, vague requirements, and self-contradictory documents into AI systems, expecting high-quality output. This is thermodynamically equivalent to dumping pure noise onto Maxwell’s Demon’s worktable and expecting it to produce ordered structures. The demon cannot sort it; it hands over Slop.
On the output side, even when AI produces high-precision signals, the human cognitive system often cannot parse them. Human everyday consciousness operates within a specific signal-to-noise ratio range—requiring a certain amount of redundancy, analogy, and emotional scaffolding as a “cognitive landing runway.” When AI output precision exceeds human cognitive processing bandwidth, the excess precision is truncated at the human cognitive cutoff point, yielding zero net information gain. Workday’s 2026 research quantified this phenomenon: 37–40% of the time saved by AI is consumed by reviewing, correcting, and verifying AI output.
[Diagram: the negative cycle — low-SNR human input → high-energy, low-efficiency sorting → high-precision output or Slop → cognitive truncation → renewed noise input]
This forms a thermodynamically negative cycle: humans generate noise, AI expends energy sorting, the output is truncated or rejected by humans, and humans re-input noise. Each cycle consumes energy and generates heat, but effective signal does not increase. The global AI industry has spent hundreds of billions of dollars expanding the demon’s worktable surface area, when what truly needs to be done is reducing the noise poured onto it.
The true entry point for AI productivity gains is not on the model side—not larger parameters, more data, or more powerful chips—but on the human side’s signal-to-noise ratio improvement. The same model gives high-SNR users laser-precise output and low-SNR users flashlight-like diffuse illumination. The difference is not on the AI side; it is on the human side.
Quantum Tunneling: The Ultimate Physical Boundary of AI Computing
The analysis in the preceding sections holds within the classical physics framework. But as semiconductor processes approach quantum scales, the constraints facing AI computing escalate from engineering problems to physical limits.
The transistor in a modern chip is the most extreme classical signal machine—current passes or does not pass, 1 or 0, signal purified absolutely in binary space. However, when the transistor gate thins to approximately 1–2 nanometers (about 5–6 atoms thick), electrons no longer obey classical switch logic—they tunnel through the gate and appear, probabilistically, on the other side. The switch says “off”; the electron says “I don’t care.”
The industry already regards the 3-nanometer process as the “sound barrier”—at this scale, quantum tunneling effects begin to significantly impact chip performance. Tunneling causes increased off-state leakage current and elevated power consumption, with energy wasted as heat. The engineering community is attempting to resist this trend with architectures like FinFET and GAA (gate-all-around), but within this paper’s framework, this is a battle destined to be lost—not because the engineering is inadequate, but because the laws of physics are shrinking the available space.
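The scale dependence of tunneling can be sketched with the standard rectangular-barrier approximation, $T \approx e^{-2\kappa d}$ with $\kappa = \sqrt{2m\Phi}/\hbar$. The 3 eV barrier height below is an assumed illustrative value, not a datasheet figure; the point is the exponential sensitivity to gate thickness:

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
M_E = 9.1093837015e-31  # electron mass, kg
EV = 1.602176634e-19    # joules per electronvolt

def tunneling_probability(barrier_ev: float, width_nm: float) -> float:
    """WKB-style estimate for a rectangular barrier: exp(-2 * kappa * d)."""
    kappa = math.sqrt(2 * M_E * barrier_ev * EV) / HBAR  # 1/m
    return math.exp(-2 * kappa * width_nm * 1e-9)

for d in (5.0, 2.0, 1.0):  # gate thickness in nm
    print(f"{d} nm: {tunneling_probability(3.0, d):.3e}")
# leakage grows by tens of orders of magnitude as the
# gate thins from 5 nm toward 1 nm
```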
Understanding this through the signal-and-noise framework: the transistor is humanity’s device for purifying signals to their absolute limit, and the entire digital civilization is built on the assumption that “1 is absolutely 1 and 0 is absolutely 0.” But as physical dimensions shrink to quantum scales, chaos permeates from the bottom up—electron behavior begins to exhibit probabilistic uncertainty. Noise does not invade signal from the outside; it grows from the physical substrate of signal itself.
This means that the determinism of AI computing is built upon a fundamentally indeterminate physical substrate. No matter how sophisticated the algorithms, models, and alignment schemes at the upper layers, the quantum behavior of electrons at the bottom layer will not change accordingly. This is not statistical fluctuation with negligible probability—at nanosecond-scale high-frequency operations, quantum noise can become the trigger point for systematic errors.
AI computing faces two walls: the Landauer limit is the thermodynamic wall—each information erasure has an irreducible minimum energy cost; quantum tunneling is the quantum physics wall—the deterministic behavior of electrons collapses at nanometer scales. Together, these two walls define the ultimate physical boundary of silicon-based digital computing.
From Maxwell’s Demon’s Bill to Civilization’s Energy Cost
Synthesizing the arguments across all preceding sections, we can construct a complete causal chain:
Humans input information into AI systems. The signal-to-noise ratio of the input determines the sorting burden on the Transformer’s attention mechanism. Sorting is an irreversible information operation, constrained by Landauer’s principle, with a minimum energy cost at every step. The product of sorting—effective signal—is an ordered structure refined from noise. The waste product of refinement is heat. Heat dissipates into the environment, increasing the total entropy of the universe.
Therefore, for every segment of effective signal output AI produces, it simultaneously dumps an equivalent amount of entropy into the environment. Signal does not arise from nothing; it is purified from noise, and the physical cost of purification is irreversible heat dissipation. The electricity consumed by global AI data centers, the heat they emit, is essentially the planet-scale bill of Maxwell’s Demon—humanity has built a planetary-scale sorting machine, and the Second Law of Thermodynamics is collecting taxes.
This analytical framework also explains a deeper phenomenon. In Shannon’s channel theorem $C = B\log_2(1+\text{SNR})$, the “$1$” inside the brackets is a term that can never be eliminated—even if signal power approaches infinity, the noise floor will never reach zero. This mathematical “$1$” is the formal expression of signal never being able to completely separate from noise. Its physical correspondence is: the heat dissipation of computation can never be zero, the maintenance of order always requires a cost, and the existence of signal always depends on environmental entropy increase.
For human civilization, this means that the energy constraints of the information age are not a temporary technological bottleneck, but a structural feature of physics. Signal purification and entropy production are strictly coupled—you cannot obtain purer local signal without increasing the total entropy of the universe. The cost of every insight, every computation, every effective output is that somewhere, temperature rises by a small amount.
Redefining the Boundaries of the AI Energy Problem
The core argument of this paper can be compressed into the following chain of propositions:
(1) The computational essence of the Transformer is sorting—establishing an ordered token sequence from a disordered probability distribution.
(2) Sorting is a thermodynamic operation, constrained by Landauer’s principle, with an irreducible minimum energy cost for each irreversible information erasure. The Transformer is physically equivalent to Maxwell’s Demon.
(3) Input signal-to-noise ratio directly determines sorting efficiency. High-SNR input compresses the search space and improves energy efficiency; low-SNR input forces exhaustive sorting and wastes energy.
(4) AI Slop is a symptom of sorting failure—the system’s regression to a statistically high-frequency default state when effective sorting cannot be completed.
(5) The root cause of the AI productivity paradox is a bidirectional signal-to-noise ratio mismatch—excessive noise on the input side and insufficient human cognitive bandwidth on the output side.
(6) As semiconductors approach the quantum tunneling limit, the physical substrate of classical signals is being eroded by quantum noise. The ultimate constraint facing AI computing is not an engineering problem, but a physical boundary jointly defined by the Second Law of Thermodynamics and quantum physics.
These propositions collectively point to one conclusion: the current strategy of the AI industry, which allocates the vast majority of resources to the model side (larger parameters, more data, more powerful chips), has a structural blind spot at the physics level. If the input signal-to-noise ratio is not improved, larger models simply mean a larger demon performing more ineffective sorting in more noise, paying more thermodynamic taxes. The most fundamental path to solving the AI energy problem may not be engineering—not better cooling, more efficient chips, or greener electricity—but information-theoretic: improving the purity of input signals and reducing the total volume of noise that needs to be sorted.
This is an inconvenient conclusion. Because it implies that the ultimate bottleneck of AI system performance is not on the technology side, but on the human side. Machines can be upgraded infinitely, but if the quality of the signals driving the machines does not improve, upgrades will only amplify waste. The Second Law of Thermodynamics does not care how large your funding round is.
- Landauer, R. “Irreversibility and heat generation in the computing process.” IBM Journal of Research and Development, 5(3), 183-191 (1961).
- Bennett, C. H. “The thermodynamics of computation—a review.” International Journal of Theoretical Physics, 21(12), 905-940 (1982).
- Bérut, A. et al. “Experimental verification of Landauer’s principle linking information and thermodynamics.” Nature, 483, 187-189 (2012).
- Bormashenko, E. “Landauer’s Principle: Past, Present and Future.” Entropy, 27, 437 (2025).
- Wolpert, D. H. “Is stochastic thermodynamics the key to understanding the energy costs of computation?” Proc. Natl. Acad. Sci., 121, e2321112121 (2024).
- Shannon, C. E. “A mathematical theory of communication.” Bell System Technical Journal, 27(3), 379-423 (1948).
- Vaswani, A. et al. “Attention is all you need.” Advances in Neural Information Processing Systems, 30 (2017).
- Zhai, S. et al. “Stabilizing Transformer Training by Preventing Attention Entropy Collapse.” ICLR (2023).
- Jha, N. K. et al. “Entropy-Guided Attention for Private LLMs.” arXiv:2501.03489 (2025).
- Geshkovski, B. et al. “A Mathematical Theory of Attention.” arXiv:2007.02876 (2020).
- Duman Keleş, F. et al. “On the Computational Complexity of Self-Attention.” Algorithmic Learning Theory, PMLR 201 (2023).
- Hao, K. et al. “Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding.” arXiv:2412.16545 (2024).
- International Energy Agency. “Energy and AI.” IEA Special Report (2025).
- Lawrence Berkeley National Laboratory. “2024 Report on U.S. Data Center Energy Use.” U.S. Department of Energy (2024).
- Chattopadhyay, P. et al. “Landauer Principle and Thermodynamics of Computation.” arXiv:2506.10876 (2025).
- Freund, S. et al. “Fundamental energy cost of finite-time parallelizable computing.” Nature Communications, 14, 613 (2023).
- Merriam-Webster. “Word of the Year 2025: Slop.” Merriam-Webster Dictionary (2025).
- University of Florida. “AI slop: Study finds mediocre AI-generated content hurts consumers and creators.” UF News (March 2026).
- Semiconductor Engineering. “Quantum Effects at 7/5nm.” Semiconductor Engineering (2024).
- Gadepally, V. “AI data center energy costs and solutions.” MIT Sustainability Conference, MIT Sloan (2025).