Original Thought Paper · V4 · March 2026

The Bidirectional Black Box
in AI Systems Demands
an Evaluation Framework — Now

Why the AI Productivity Paradox Is Not About Technology —
It’s About Unmeasured Human Input

이조글로벌인공지능연구소 LEECHO Global AI Research Lab
& Claude Opus 4.6 · Anthropic

Published March 25, 2026 · Classification: Original Thought Paper · Version V4
Domains: AI Epistemology · Human-Machine Interaction · Industrial Economics · Evaluation Systems Engineering

Abstract

The AI industry faces a widely overlooked structural problem: not only is the AI model’s output a probabilistic “black box,” but the human user’s input is equally an unpredictable, unstable, and subjectively influenced “black box.” When two black boxes are connected, the total system uncertainty grows superlinearly. This paper proposes the “Bidirectional Black Box” (BBB) theoretical framework, provides a formal definition grounded in information theory, and uses 2026 enterprise practice data, employee feedback, and customer complaints as empirical evidence to argue the root cause of the current AI productivity paradox. The paper engages directly with the J-Curve hypothesis, identifying its explanatory blind spots as precisely where BBB theory contributes. On the capital-versus-bubble debate, this paper does not conclude for the reader but presents both bearish and bullish arguments side by side, leaving the reader to find their own equilibrium. Finally, the paper proposes exploratory directions for a Bidirectional Real-Time Evaluation System (BRTES) — emphasizing that this is a framework still under development, not a finished solution. The value of this paper lies in raising questions and presenting bidirectional evidence, not in delivering ultimate answers.


01 · The Problem

Solow’s Paradox Returns: $2.5 Trillion Can’t Buy Productivity

The Most Quoted Sentence in Technology Economics Is Relevant Again

In 1987, Nobel laureate Robert Solow wrote the most famous sentence in the history of technology economics: “You can see the computer age everywhere but in the productivity statistics.” In February 2026, Apollo’s Chief Economist Torsten Slok nearly repeated it verbatim: “AI is everywhere except in the incoming macroeconomic data.”

90% · NBER survey of 6,000 executives: zero AI impact on productivity
95% · MIT report: GenAI pilots fail beyond the experimental stage
56% · PwC survey: CEOs report “nothing” from AI adoption
$2.5T · Global AI spending in 2026

More ironically, a randomized controlled trial by METR on experienced open-source developers found that AI tools actually reduced their productivity by 19% — yet the developers themselves believed they were 20% faster. A gap of nearly 40 percentage points between perception and reality. ManpowerGroup’s 2026 global survey showed that AI usage frequency increased 13%, while confidence in AI’s utility plummeted 18%.

Core Problem

Mainstream explanations converge on two frameworks: the technology is immature (bubble thesis), or J-Curve delayed effects (optimist thesis). This paper proposes a third explanation: the root cause lies neither in AI technology itself nor merely in time lags, but in a structural defect overlooked across the entire human-machine interaction system — the Bidirectional Black Box problem.


02 · Theoretical Framework

Formal Definition of the Bidirectional Black Box (BBB)

From Metaphor to Theory

The AI field’s “black box” discussion focuses almost entirely on the model side — input goes in, internal computation is opaque. But this is only half the problem. The human input side is equally a black box.

The user’s intent is encoded into natural language before reaching the AI. This encoding process involves severe information loss: linguistic ambiguity, implicit assumptions, context dependency, and the user’s current cognitive state and emotional fluctuations. Using an information-theoretic framework, we can provide a formal definition:

H_total(System) ≥ H(Input) + H(Model) + ρ · H(Input) · H(Model)

where H(Input) is the information entropy (uncertainty) of the human input side, H(Model) is the model-side entropy, and ρ is the coupling coefficient between the two sides. When uncertainties cross-couple, total system uncertainty grows superlinearly: not additive, but multiplicative or even exponential.
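As a worked illustration of the coupling inequality, the sketch below evaluates the lower bound for a few hypothetical entropy values. All numbers and the value of ρ are made up purely to show the shape of the claim; they are not measurements.

```python
# Illustrative arithmetic for the BBB inequality: a hypothetical lower
# bound on total system entropy. The H values (in bits) and the coupling
# coefficient rho are invented numbers, chosen only to show the shape.

def bbb_total_entropy(h_input: float, h_model: float, rho: float) -> float:
    """Lower bound H_total >= H(Input) + H(Model) + rho * H(Input) * H(Model)."""
    return h_input + h_model + rho * h_input * h_model

# With no coupling (rho = 0) the two entropies merely add:
uncoupled = bbb_total_entropy(3.0, 4.0, 0.0)   # 7.0 bits
# Any positive coupling adds a multiplicative cross-term:
coupled = bbb_total_entropy(3.0, 4.0, 0.5)     # 13.0 bits
# Doubling both sides more than doubles the coupled bound (superlinear):
doubled = bbb_total_entropy(6.0, 8.0, 0.5)     # 38.0 bits
```

The cross-term is what distinguishes the BBB claim from a simple "two noisy components" story: as either side's uncertainty grows, the other side's uncertainty is amplified rather than merely added.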

[Figure: the BBB pipeline. Human Intent (H₁: unobservable) → Language Encoding (lossy compression, ΔH₂) → AI Probabilistic Inference (H₃: model black box) → Output Result (H₄ ≥ H₂ × H₃) → Human Evaluation (H₅: subjective bias)]

Definition 1 (Input-Side Black Box): The conversion from human intent to natural language is a lossy compression process. Information loss is governed by uncontrollable variables including the user’s cognitive level, emotional state, domain knowledge, and linguistic precision, producing an unpredictable input signal.

Definition 2 (Output-Side Black Box): Large language models generate output by probabilistically predicting the next token. Alignment training (RLHF, etc.) raises the probability of certain behaviors without locking them in, leaving irreducible randomness in outputs.

Definition 3 (Bidirectional Black Box System): When the input-side and output-side black boxes are connected in series, the total system uncertainty is not a simple sum but is superlinearly amplified through coupling effects, forming a “Bidirectional Black Box System” (BBB System).

Definition 4 (Evaluation-Side Bias): The evaluator of a BBB system (the human user) is simultaneously the producer of the input-side black box. Their psychological defaults (external attribution bias, loss-aversion instinct) make it impossible to objectively assess their own responsibility in the system, causing evaluations to systematically skew toward blaming the model side.

Theoretical Significance

BBB theory reveals an overlooked epistemological dilemma: in human-machine interaction systems, no deterministic communication channel exists. Human language fed to a large model, no matter how precise or programmatic, is “downgraded” to a probabilistic tendency. This is not a problem prompt engineering can solve — it is an architectural ceiling of the entire human-machine interaction paradigm.


03 · The Enterprise User Voice

Frontline Complaints: Real Experiences Erased by Aggregated Data

Six Categories of Enterprise AI Frustration, 2025–2026

Macro survey data tends to reduce enterprise AI dissatisfaction to a single percentage. But frontline user feedback is the most direct empirical evidence for BBB theory. Below are the six most concentrated categories of complaints from the enterprise field in 2025–2026:

Complaint 1: AI increased workloads instead of reducing them.

ActivTrak 2026 Behavioral Data Report

After AI adoption, employee email processing time increased 104%, instant messaging rose 145%, and management tool usage grew 94%. Not a single work category saw time savings from AI. The report states: “The data is unambiguous — AI does not reduce workloads.” Employees had to sacrifice deep thinking time to handle the increased daily tasks.

Complaint 2: Management and employees have completely opposite perceptions.

Checkr 2026 Survey · MetLife 2026 Report

40% of managers trust AI outputs; among employees, only 9%. 59% of employees say they rarely or never trust AI outputs at work. MetLife found 83% of HR managers say AI makes employees faster, but 67% simultaneously admit AI is “creating new points of friction and mistrust.” Management sees efficiency numbers on dashboards; employees see more garbage to process daily.

Complaint 3: Using more than four AI tools causes brain overload.

Boston Consulting Group 2026 Study

Employees using three or fewer AI tools self-reported improved efficiency; those using four or more saw efficiency collapse. Researchers call this “AI brain fry” — cognitive load exceeding brain processing capacity. Employee feedback: “Things were moving too fast, and I didn’t have the cognitive ability to process all the information and make all the decisions.”

Complaint 4: AI customer service is worse than no customer service.

Glance 2026 CX Report · Consumer Feedback

75% of consumers report frustration with AI customer service. Zomato users received coupons from AI support during emergencies. Eventbrite users filed mass complaints on Trustpilot. After deploying AI customer service, what customers experienced was more loops, dead ends, and repeat explanations — trust continued to decline.

Complaint 5: The greatest risk comes from employees’ own usage behavior.

Optro 2026 Risk Intelligence Report

Over the past 12 months, 40% of organizations reported inaccurate AI outputs, 33% experienced policy violations, and 28% received AI-related customer complaints. The primary risk source isn’t the model — 34% of respondents cited employees inputting sensitive data into AI tools as the top risk behavior. 21% attributed it to insufficient training, 21% to “pressure to move quickly.”

Complaint 6: People who use AI face a social penalty.

Duke University Study · Gallup 2026

Workers who use AI in the workplace are perceived by colleagues as “cutting corners.” 64% of U.S. adults plan to avoid AI “as long as possible.” 31% of employees are actively working against their company’s AI initiatives. 45% of CEOs admit most employees are resistant or openly hostile to AI.

Interpretation Through BBB Framework

These six complaints are not isolated phenomena — they are different symptoms of BBB system malfunction. Complaints 1 and 3 are direct consequences of output-side noise amplification. Complaint 2 is a manifestation of evaluation-side bias. Complaint 4 is the inevitable result of low-quality input (ambiguous customer queries) meeting probabilistic output. Complaint 5 is evidence of input-side loss of control. Complaint 6 is a social-psychological reaction on the evaluation side. All complaints point to the same systemic root cause: no one is measuring and managing bidirectional quality in human-machine interaction.


04 · Dialogue with the J-Curve

What the J-Curve Explains — And What It Cannot

Where BBB Theory Fills the Gap

The “Productivity J-Curve” proposed by Brynjolfsson and colleagues is the dominant academic framework for explaining the AI productivity paradox. It argues that when a General Purpose Technology (GPT) is introduced, measured productivity temporarily declines because firms redirect resources from production toward accumulating intangible assets (process redesign, data governance, organizational learning) — investments not captured by traditional GDP statistics but which release as productivity gains in the “harvest phase.”

MIT Sloan’s micro-level study of U.S. manufacturing firms confirmed the J-Curve: average productivity declined 1.33 percentage points after AI adoption; correcting for selection bias, the short-run negative impact was approximately 60 percentage points. However, among firms persisting for four or more years, over 60% ultimately achieved productivity improvements exceeding 25%. Younger firms recovered faster than older ones.

Dimension | J-Curve Can Explain | J-Curve Cannot Explain
Productivity Decline | Temporary dip from organizational adjustment costs | Why AI usage effects vary enormously between employees within the same firm
Time Dimension | Historically, GPTs take 20–40 years to appear in macro data | Why employee confidence drops as AI usage increases
Firm Differences | Digitally mature firms recover faster | Why output quality fluctuates wildly for the same tool, same firm, different employees
ROI | Intangible asset accumulation takes time to monetize | Why workloads increased 104% instead of decreasing
Evaluation | Traditional GDP metrics miss intangible investments | Why managers and employees assess the same tool in completely opposite ways

BBB Theory’s Complementary Explanatory Power: The J-Curve attributes productivity decline to organizational adjustment costs — a supply-side explanation. BBB theory adds a demand-side/usage-side explanation: even after organizational adjustment is complete, as long as input-side quality is neither managed nor quantified, system output quality cannot stabilize. The J-Curve assumes the “harvest phase” will arrive naturally; BBB theory argues that without solving the bidirectional quality problem, the harvest phase may be indefinitely postponed — firms may forever circle the bottom of the J-Curve.

Theoretical Positioning

BBB theory does not negate the J-Curve — it explains what the J-Curve cannot: why some firms climb out of the J-Curve’s trough while others cannot. The difference lies not in investment scale or time horizon, but in whether a mechanism for managing input-side quality was established. The 12% of firms that gained both cost and revenue benefits (PwC data) were precisely those that undertook deep organizational transformation — essentially, they unconsciously and partially solved the input-side black box problem.


05 · The Missing Evaluation Framework

Subjective “Feeling” Reports: Wrong Ruler, Wrong Target

A Methodological Critique of Current AI Productivity Surveys

Current global AI productivity surveys suffer from a fatal methodological flaw: they ask users to evaluate a tool, but the users themselves are part of the variable. The logical essence of these reports is measuring an undefined metric with an uncalibrated instrument.

Missing Dimension | Specific Problem | Consequence
Input Quality Baseline | No standard for what constitutes a good prompt | Cannot distinguish AI problems from user problems
Controlled Variables | Input variance between employees on the same tasks goes unmeasured | Cannot quantify input quality’s effect on output
Attribution Separation | No breakdown of how much output quality comes from model vs. input | No valid causal attribution possible
Human Baseline | Pre-AI quality variance range for purely human work unknown | No comparison reference; AI effect indeterminable
Bidirectional KPIs | No metric simultaneously quantifies input and output | Entire evaluation system is one-sided

The result: every AI effectiveness survey amounts to “subjective use + subjective evaluation + subjective survey” — a purely subjective “human feeling” report. The evaluator is simultaneously a variable within the evaluated system, and the human instinct toward self-preservation makes it impossible to objectively assess one’s own responsibility. At a deeper level: who designs this evaluation framework? Humans. The designer carries all the same biases, and including human-side variables amounts to admitting “our people aren’t good enough” — nearly impossible in organizational politics.


06 · Industry Data

The Paying User Ceiling and the Token Black Hole

Where Capital Burns Without Returns

900M · ChatGPT weekly active users
~5% · Free-to-paid conversion rate
$700B · Big Four 2026 AI infrastructure spend
4% · Quarterly user growth, decelerating

95% of users refuse to pay. European ChatGPT spending stagnated since May 2025. The Big Four’s 2026 AI infrastructure spending approaches $700 billion, yet Microsoft’s AI revenue target is only $25 billion — the gap between input and output continues widening.

The Token Black Hole: Massive compute is consumed on ineffective scenarios — free users chatting, low-quality input generating low-quality output, repeat queries, AI Slop content. The model doesn’t refuse a bad question; it earnestly expends expensive compute on requests not worth processing. In the Chinese market, ByteDance’s Doubao has repeatedly banned accounts for mass pornographic messaging. xAI’s Grok went to the opposite extreme: after it opened up adult content, only 2 of 12 co-founders remained, 35 state attorneys general demanded remediation, multiple countries blocked or investigated the service, and enterprise clients fled entirely.

These are not isolated incidents — they are direct manifestations of the BBB system at the commercial level: can’t use it well → won’t pay → model lacks quality private data → can’t improve → can use it even less well. Compute is subsidizing human boredom and desire, not creating commercial value.


07 · The Layoff Paradox

AI Cost-Cutting’s Self-Referential Loop

AI Optimizes AI, But Who Optimizes the Users?

In Q1 2026, over 45,000 tech workers were laid off, approximately 20% explicitly attributed to AI. But layoffs are highly concentrated within AI-related and technology companies: Block cut 4,000 (40%), Atlassian cut 1,600 (10%), Meta plans to cut 20%. Oxford Economics analysts suspect some firms are using AI as “wrapping paper” for restructuring.

Where AI genuinely cuts costs is in AI R&D itself — writing code with AI, optimizing models with AI, replacing junior programmers with AI. This is a closed loop: the AI industry optimizes the AI industry with AI. But on the non-AI enterprise usage side, success stories are scarce. BCG found that failing firms pursue an average of 6.1 AI use cases simultaneously versus 3.5 for leaders, yet leaders expect 2.1× higher ROI. Fewer than one-third of companies have upskilled even a quarter of their workforce. Most critically — most enterprises don’t track financial KPIs for their AI initiatives at all.


08 · Bubble or Dawn? — Bidirectional Evidence

Letting the Reader Find Their Own Equilibrium

This Paper Does Not Judge — It Presents Both Sides

This paper’s core claim is that one-directional information produces one-directional judgment. Bubble theorists see only deployment failure data; optimists see only technology progress curves. Both are blind to half the picture. Since this paper critiques one-sided evaluation, the paper itself should not conclude for the reader. Below, the core arguments of both sides are presented bidirectionally.

Evidence 1: Capital Structure

What Bubble Theorists See

Circular investment exists in the AI ecosystem — Microsoft invests in OpenAI, OpenAI buys compute from CoreWeave, CoreWeave leases NVIDIA GPUs, NVIDIA reinvests revenue in OpenAI. Morgan Stanley notes the same dollar is counted across multiple balance sheets.

Big Four 2026 AI capex approaches $700B, but J.P. Morgan warns $650B in annual revenue is needed for a mere 10% return. OpenAI pledges $1.4T in data centers over 8 years on $13B annual revenue.

What Optimists See

Current AI investors are Microsoft ($160B annual cash flow), Google, Amazon, Meta — they can absorb losses, and absorb them for years. In 2000, dotcom investors were retail traders and small VCs; three funding rounds burned through meant death. Capital resilience is fundamentally different.

NVIDIA FY2026 revenue $215.9B (+65% YoY), market cap $4.3T. S&P 500 forward P/E ~23×, far below the Nasdaq’s ~60× in 2000. Today’s valuations are backed by real earnings.

Evidence 2: Revenue Growth

What Bubble Theorists See

ChatGPT has 900M weekly active users but only ~5% pay. European spending stagnated since May 2025. Quarterly growth down to 4%. Enterprise GenAI spend is $37B but 95% of firms see zero return. API price wars with 40–70% cuts — a classic signal of growth ceiling.

What Optimists See

Anthropic’s annualized revenue grew from $1B (Dec 2024) to $19B (Mar 2026) — 19× in 14 months, unprecedented in B2B software history. Claude Code reached $2.5B ARR in 9 months from zero. Eight Fortune 10 companies are Claude customers. 1 in 5 businesses on Ramp now pay Anthropic, up from 1 in 25 a year ago.

Evidence 3: Productivity

What Bubble Theorists See

NBER survey of 6,000 executives: 90% report zero AI productivity impact. METR trial: AI made developers 19% slower. UK Copilot trial: Excel slower and worse quality; PPT faster but lower accuracy. ActivTrak data: email time +104%, zero time savings in any category.

What Optimists See

Brynjolfsson estimates 2025 U.S. productivity growth at ~2.7%, nearly double the decade average. MIT manufacturing study confirms J-Curve: initial −1.33pp, but 60%+ of firms persisting 4 years achieve 25%+ gains. A small cohort of “power users” has automated end-to-end workflows. 4% of GitHub public commits are by Claude Code.

Evidence 4: Historical Parallels

What Bubble Theorists See

Barron’s 2000 cover story: 74% of 207 internet companies had negative cash flows; 51 projected to run out of money within 12 months. The Nasdaq then crashed 78%, erasing $5T in market value. Latest Bank of America survey: most global fund managers call AI stocks “a bubble.”

What Optimists See

Only 14% of dotcoms were profitable in 2000; today’s major AI investors are the world’s most profitable companies. Janus Henderson identifies at least 8 structural differences (Y2K effect, audit standards, demand visibility, etc.). Even if a correction occurs, GPU clusters and data centers — like 2000’s fiber optics — will later underpin the next wave.

Authors’ Position Statement

This paper does not adjudicate whether AI is a bubble or a dawn. Both sides have data-supported arguments and both have blind spots. However, we observe one variable both sides overlook: whether or not AI is in a bubble, the Bidirectional Black Box problem is real and will not disappear automatically with abundant capital or technological breakthroughs. Capital can keep AI alive; technology can improve model capabilities; but unless input-side quality is managed and evaluation frameworks are built, system output quality cannot stabilize. This is a structural problem independent of the bubble debate. Transparency disclosure: This paper’s AI collaborator is an Anthropic product (Claude Opus 4.6). Anthropic growth data cited herein has been cross-verified against Bloomberg, Sacra, and Epoch AI, but readers should be aware of this potential conflict of interest.

The Nonlinear Nature of Scientific Progress

Regardless of whether the reader leans toward the bubble thesis or optimism, one fact both sides should acknowledge: scientific progress is nonlinear. At least five technology pathways are advancing in parallel within AI, each at a different stage of development:

Technology Pathway | Current Stage | Potential Impact
Multimodal Fusion | Early commercialization (GPT-4o, Gemini deployed) | Breaking text-image-video-audio barriers
Reasoning Enhancement | Rapid iteration (o1/o3 series) | Transforming deep thinking and multi-step reasoning
Agent Architecture | PoC → early commercialization transition | Redefining interaction: from conversation to delegation
Embodied Intelligence | Lab → prototype stage | Bridging digital and physical worlds
Novel Inference Chips | Early commercialization (Groq, Cerebras) | Redefining inference cost structure

A breakthrough in any one pathway could reshape the entire industry’s cost structure and usage paradigm. But “could reshape” is not “will reshape.” Linear extrapolation from current data is a methodological error — whether that extrapolation points toward collapse or prosperity.


09 · Exploratory Directions

BRTES: A Framework Under Development — Not a Finished Solution

Honest About What We Don’t Know

Based on BBB theory and enterprise-side empirical data, we propose exploratory directions for a Bidirectional Real-Time Evaluation System (BRTES). To be clear: this is a conceptual framework still under development, not a mature solution. We have not completed validation, nor do we pretend to have found answers. What follows is directional thinking, offered for joint exploration by industry and academia.

Layer 1: Input Quality Measurement Layer

Real-time quantitative evaluation of human Input, across five core measurement dimensions:

Dimension | Definition | Implementation
Intent Clarity | Ambiguity score of input instructions | Model confidence back-inference, polysemy detection
Signal-to-Noise Ratio | Ratio of effective information to redundancy/noise | Information density analysis, keyword extraction rate
Context Completeness | Coverage of information required for task execution | Prerequisite checking, missing-information prompting
State Stability | Cross-session input quality variance for the same user | Historical baseline comparison, variance tracking
Domain Relevance | Alignment between input and target task domain | Task classifier, domain knowledge validation

Key design principle: Real-time feedback, not post-hoc scoring. When the AI detects an ambiguous instruction, it proactively tells the user: “Your input has a low signal-to-noise ratio; there’s a 70% chance I’ll go off-track. Would you like to supplement these details?” Transforming post-hoc evaluation into a real-time calibration feedback loop.
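To make the Layer 1 idea concrete, here is a toy scorer for two of the five dimensions. The dimension names come from the table above; the filler-word list, string heuristics, and required-fields mechanism are illustrative assumptions, not the authors’ BRTES implementation.

```python
# Hypothetical sketch of Layer 1 input quality scoring. Only the dimension
# names come from the paper; every heuristic below is an invented placeholder.

def score_input(prompt: str, required_fields: list[str]) -> dict[str, float]:
    """Toy scorer for two of the five Layer 1 dimensions."""
    words = [w.strip(",.!?").lower() for w in prompt.split()]
    filler = {"um", "uh", "like", "basically", "just", "stuff", "things"}
    # Signal-to-noise: fraction of words carrying content rather than filler.
    snr = sum(w not in filler for w in words) / max(len(words), 1)
    # Context completeness: coverage of the fields the task requires.
    lowered = prompt.lower()
    completeness = sum(f in lowered for f in required_fields) / max(len(required_fields), 1)
    return {"signal_to_noise": round(snr, 2),
            "context_completeness": round(completeness, 2)}

scores = score_input(
    "Summarize the Q3 report, um, just the revenue stuff",
    required_fields=["report", "revenue", "audience", "length"],
)
# A low completeness score is what would trigger the real-time prompt
# described above, e.g. asking the user to specify audience and length.
```

In a real system these heuristics would be replaced by model-based estimators (the table’s “model confidence back-inference”), but even this toy version shows how an input can be scored before any tokens are spent on generation.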

Layer 2: Output Quality Measurement Layer

Objective, quantified evaluation of AI Output, with bidirectional comparison against the input side:

Dimension | Definition | Implementation
Task Completion | Degree to which output matches input intent | Intent alignment scoring, requirement coverage
Factual Accuracy | Correctness rate of output content | Knowledge-base verification, citation validation
Consistency | Output stability for the same input across different times | Repeat testing, variance analysis
Value Density | Ratio of useful output information to total tokens | Information-per-token ratio calculation
Constraint Compliance | Execution rate against user-specified constraints | Constraint checking, deviation-rate calculation
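Two of the Layer 2 dimensions lend themselves to a quick sketch. “Value density” is reduced here to a toy proxy (unique content words per token) and “consistency” to variance across repeated runs of the same input; both heuristics are illustrative assumptions, not a validated design.

```python
# Illustrative sketch of two Layer 2 metrics; placeholder heuristics only.
from statistics import pvariance

def value_density(output: str) -> float:
    """Unique non-stopword tokens per total token (higher = denser)."""
    tokens = [t.strip(",.").lower() for t in output.split()]
    stop = {"the", "a", "an", "of", "to", "and", "is", "in"}
    content = {t for t in tokens if t not in stop}
    return len(content) / max(len(tokens), 1)

def consistency(scores_across_runs: list[float]) -> float:
    """Population variance of quality scores when one input is re-run
    several times; lower variance means a more stable output."""
    return pvariance(scores_across_runs)
```

The point of pairing these with Layer 1 is that a low value-density score is only diagnostic once input quality is known: verbose output to a vague prompt and verbose output to a precise prompt call for different attributions.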

Layer 3: Bidirectional Comparison & Attribution Layer

Lateral comparison of input and output measurement data, enabling responsibility attribution:

[Diagram: Input Quality Score → Attribution Engine ← Output Quality Score]
Result Quality = f(Input Quality, Model Capability, Task Complexity)
When output quality is low, the attribution engine automatically determines:
• Low input quality + Normal model capability → User-side issue → Trigger input improvement suggestions
• High input quality + Anomalous model capability → Model-side issue → Log model defect
• Low input quality + Anomalous model capability → Dual-side issue → Flag systemic risk
Enterprises can use this data to objectively assess employee AI usage competency and AI tool effectiveness.
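The three attribution rules above can be sketched as a minimal decision function. The 0.5 threshold and the enum names are assumptions for illustration; the paper gives only the qualitative decision table.

```python
# Minimal sketch of the Layer 3 attribution rules. Thresholds and names
# are illustrative assumptions, not the authors' implementation.
from enum import Enum

class Attribution(Enum):
    USER_SIDE = "trigger input improvement suggestions"
    MODEL_SIDE = "log model defect"
    SYSTEMIC = "flag systemic risk"
    OK = "no action needed"

def attribute(input_quality: float, model_capability: float,
              threshold: float = 0.5) -> Attribution:
    """Map (input quality, model capability) scores to a responsibility call."""
    input_low = input_quality < threshold
    model_anomalous = model_capability < threshold
    if input_low and model_anomalous:
        return Attribution.SYSTEMIC      # dual-side issue
    if input_low:
        return Attribution.USER_SIDE     # user-side issue
    if model_anomalous:
        return Attribution.MODEL_SIDE    # model-side issue
    return Attribution.OK
```

Even in this toy form, the design choice is visible: attribution requires both scores, which is exactly what current one-sided evaluation frameworks cannot supply.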

Unsolved Key Challenge: Who Will Build This? Who Has the Incentive?

BRTES’s greatest obstacle is not technical but organizational-political and market-incentive driven. LLM companies won’t voluntarily build it — admitting output instability hurts valuations. Enterprises won’t voluntarily build it — admitting input-side problems amounts to admitting “our people aren’t good enough.” Media won’t push for it — extreme narratives get more traffic than systematic analysis. This means BRTES’s builder may only be: independent third-party evaluation bodies, academia-industry consortia, or government standards organizations. But none of these have mobilized yet.

The deeper resistance is psychological: having AI evaluate human cognitive ability and expressive precision is an enormous challenge — psychologically and organizationally. People accept AI evaluating tools as natural, but the reverse — AI evaluating human input quality — triggers instinctive resistance. This resistance won’t disappear no matter how elegant the technical solution.

Honesty Statement

BRTES is currently an exploratory direction, not a validated solution. The authors’ real-time evaluation system is still under development. How the attribution engine works for open-ended tasks, how input quality metrics achieve cross-scenario generalizability, how to break through organizational-political resistance — these remain unanswered questions. We choose to honestly present these unknowns rather than pretend we’ve found the answers.


10 · Open Reflections

We Raise Questions — We Don’t Pretend to Have All the Answers

The Value of This Paper Is in the Questions, Not the Conclusions

AI maximalism and AI bubble theory commit the same error — the parable of the blind men and the elephant. One side sees the upper bound of model capability; the other sees the lower bound of deployment results. Both have only partial information, and both lack scrutiny of the human input-side variable. This paper attempts to fill that overlooked dimension.

This paper’s contribution lies not in providing answers but in raising a set of overlooked questions:

Theoretical level — BBB theory reveals the superlinear stacking of uncertainty on both sides of human-machine interaction systems. Is this sufficient to explain the portion of the productivity paradox that the J-Curve cannot cover? More empirical testing is needed.

Empirical level — Six categories of enterprise complaints demonstrate that BBB system malfunction is a daily reality. But do these complaints truly point to input-side quality issues, or are there other explanations? First-hand controlled experiments are needed.

Engineering level — BRTES’s three-layer architecture is a direction, far from a mature solution. Can the attribution engine work for open-ended tasks? Would input quality scoring lead to “KPI gamification”? We cannot yet answer these questions.

Capital level — We presented bubble and optimist evidence bidirectionally. Where the equilibrium lies is not for us to adjudicate. But regardless of whether AI is in a bubble, the BBB problem is real — this point is unaffected by market cycles.

Limitations of this paper: The BBB theory’s entropy formula is a qualitative descriptive model requiring rigorous information-theoretic derivation; the BRTES three-layer architecture is unvalidated empirically; Chinese market case depth is insufficient; BBB theory’s applicability across different policy and cultural environments requires further study; enterprise complaint data mostly originates from industry survey reports rather than primary experiments; this paper’s AI collaborator is an Anthropic product, and related data citations have been cross-verified but readers should be aware of the potential conflict of interest. These limitations are simultaneously directions for future research.

Final Reflection · For the Reader

This paper’s value does not lie in telling you whether AI will succeed or fail. Its value lies in pointing out a structural variable overlooked by both bubble theorists and optimists — the quality of human input. If you are an enterprise decision-maker, before evaluating AI tools, first examine what your team is feeding the AI. If you are an AI practitioner, before optimizing models, first consider why your users can’t use them well. If you are an investor, before judging AI valuations, first consider whether the industry has built infrastructure that makes AI value quantifiable. A transparent protocol layer is needed between two black boxes. That protocol does not yet exist. Who builds it, how to build it, whether it can be built — these are open questions, beyond this paper’s ability to answer. But we can be certain of one thing: without raising this question, no one will ever answer it.

References
  1. NBER Working Paper (2026). Survey of 6,000 executives on AI impact on employment and productivity. NBER
  2. METR (2026). Randomized controlled trial on AI tools and developer productivity: −19% actual vs. +20% perceived. PEER-REVIEWED
  3. MIT GenAI Divide Report (2026). 95% failure rate of GenAI pilots beyond experimental phase. ACADEMIC
  4. PwC 2026 Global CEO Survey. 56% report zero gains; 12% report dual cost-revenue benefits. INDUSTRY
  5. Duke University / Federal Reserve CFO Survey (2026). Perceived vs. actual AI productivity gap. ACADEMIC
  6. UK Department for Business and Trade. Microsoft 365 Copilot controlled trial — no productivity evidence. GOVERNMENT
  7. ManpowerGroup 2026 Global Talent Barometer. AI use +13%, confidence −18%. INDUSTRY
  8. ActivTrak 2026 Behavioral Data Report. Email time +104%, messaging +145%, zero time savings. INDUSTRY
  9. Checkr / Pollfish Survey (Feb 2026). Manager AI trust 40% vs. employee trust 9%. INDUSTRY
  10. MetLife 2026 Workplace Report. 83% say faster, 67% say “new friction and mistrust.” INDUSTRY
  11. Boston Consulting Group (2026). “AI brain fry” — cognitive overload beyond 4+ tools. INDUSTRY
  12. Glance 2026 CX Trends Report. 75% consumer frustration with AI customer service. INDUSTRY
  13. Optro 2026 Risk Intelligence Report. 40% inaccurate outputs, 34% cite employee data input as top risk. INDUSTRY
  14. Duke University (2025). Social penalty study — 4,400 participants, AI users perceived as “cutting corners.” ACADEMIC
  15. Gallup (2026). 64% of U.S. adults plan to avoid AI “as long as possible.” INDUSTRY
  16. Brynjolfsson, Rock & Syverson. “The Productivity J-Curve: How Intangibles Complement GPTs.” NBER WP 25148. FOUNDATIONAL
  17. MIT Sloan / McElheran et al. (2025/2026). AI adoption J-curve in U.S. manufacturing. −1.33pp initial, 60% see 25%+ gains after 4yr. ACADEMIC
  18. Apollo Chief Economist Torsten Slok (2026). “AI is everywhere except in the incoming macroeconomic data.” INDUSTRY
  19. BCG (2026). Failing firms average 6.1 use cases vs. 3.5 for leaders; <1/3 upskilled 25% of workforce. INDUSTRY
  20. Deutsche Bank Research Institute (2025). European ChatGPT spending stagnation since May 2025. INDUSTRY
  21. Sacra Research (2026). OpenAI $25B ARR, Anthropic $19B ARR, paying user and revenue estimates. INDUSTRY
  22. RationalFX / TNGlobal (2026). 45,000+ tech layoffs Q1 2026, 20.4% AI-attributed. MEDIA
  23. Oxford Economics / Revelio Labs (2026). AI as pretext for corporate restructuring analysis. INDUSTRY
  24. CNN / TechCrunch / CNBC (2026). xAI co-founder departures (10/12 left) and Grok controversy. MEDIA
  25. NY State AG + 34 AG coalition (2026). Demand for xAI action on nonconsensual content. GOVERNMENT
  26. Robert Solow (1987). “You can see the computer age everywhere but in the productivity statistics.” FOUNDATIONAL
  27. Erik Brynjolfsson, FT (2026). “The AI productivity take-off is finally visible” — U.S. productivity ~2.7% in 2025. ACADEMIC
  28. Cornell University / Zitek (2024). AI surveillance reduces employee autonomy and productivity. ACADEMIC
  29. Anthropic (2026). Series G: $30B raised at $380B valuation. ARR $1B→$19B in 14 months. Claude Code $2.5B ARR in 9 months. INDUSTRY
  30. Epoch AI (2026). Anthropic 10×/year growth vs. OpenAI 3.4×/year; crossover projected mid-2026 at ~$43B. ACADEMIC
  31. SaaStr / Alex Clayton (2026). “We’ve looked at 200+ public software company IPOs — this growth rate has never happened.” INDUSTRY
  32. IntuitionLabs (2025/2026). Data-driven comparison: AI bubble vs. dot-com bubble — capex, valuations, capital structure. INDUSTRY
  33. Janus Henderson (2025). “8 reasons the AI wave is different” — Y2K, fraud standards, demand visibility, geopolitics. INDUSTRY
  34. Barron’s (2000). “Burning Up” cover: 74% of 207 internet companies had negative cash flows. FOUNDATIONAL
  35. Simply Wall St (2026). Dotcom P/E ~60× vs. current S&P 500 ~23×; today’s multiples supported by real earnings. INDUSTRY
  36. NVIDIA FY2026. Revenue $215.9B (+65% YoY); market cap ~$4.3T; P/E ~47× vs. Cisco 2000 peak ~472×. INDUSTRY

“Between two black boxes, a transparent protocol layer is needed.
It doesn’t exist yet. But the question has been raised.”

이조글로벌인공지능연구소 · LEECHO Global AI Research Lab & Claude Opus 4.6 · Anthropic
March 25, 2026 · Original Thought Paper · V4
