Original Thought Paper · V4 · March 2026

The Bidirectional Black Box
in AI Systems Demands
an Evaluation Framework — Now

Why the AI Productivity Paradox Is Not About Technology —
It’s About Unmeasured Human Input

이조글로벌인공지능연구소 LEECHO Global AI Research Lab
& Claude Opus 4.6 · Anthropic

Published March 25, 2026 · Classification: Original Thought Paper · Version V4
Domains: AI Epistemology · Human-Machine Interaction · Industrial Economics · Evaluation Systems Engineering

Abstract

The AI industry faces a widely overlooked structural problem: not only is the AI model’s output a probabilistic “black box,” but the human user’s input is equally an unpredictable, unstable, and subjectively influenced “black box.” When two black boxes are connected, the total system uncertainty grows superlinearly. This paper proposes the “Bidirectional Black Box” (BBB) theoretical framework, provides a formal definition grounded in information theory, and uses 2026 enterprise practice data, employee feedback, and customer complaints as empirical evidence to argue the root cause of the current AI productivity paradox. The paper engages directly with the J-Curve hypothesis, identifying its explanatory blind spots as precisely where BBB theory contributes. On the capital-versus-bubble debate, this paper does not conclude for the reader but presents both bearish and bullish arguments side by side, leaving the reader to find their own equilibrium. Finally, the paper proposes exploratory directions for a Bidirectional Real-Time Evaluation System (BRTES) — emphasizing that this is a framework still under development, not a finished solution. The value of this paper lies in raising questions and presenting bidirectional evidence, not in delivering ultimate answers.


01 · The Problem

Solow’s Paradox Returns: $2.5 Trillion Can’t Buy Productivity

The Most Quoted Sentence in Technology Economics Is Relevant Again

In 1987, Nobel laureate Robert Solow wrote the most famous sentence in the history of technology economics: “You can see the computer age everywhere but in the productivity statistics.” In February 2026, Apollo’s Chief Economist Torsten Slok nearly repeated it verbatim: “AI is everywhere except in the incoming macroeconomic data.”

90% · NBER survey of 6,000 executives: zero AI impact on productivity
95% · MIT report: GenAI pilots fail beyond the experimental stage
56% · PwC survey: CEOs report “nothing” from AI adoption
$2.5T · Global AI spending in 2026

More ironically, a randomized controlled trial by METR on experienced open-source developers found that AI tools actually reduced their productivity by 19% — yet the developers themselves believed they were 20% faster. A gap of nearly 40 percentage points between perception and reality. ManpowerGroup’s 2026 global survey showed that AI usage frequency increased 13%, while confidence in AI’s utility plummeted 18%.

Core Problem

Mainstream explanations converge on two frameworks: the technology is immature (bubble thesis), or J-Curve delayed effects (optimist thesis). This paper proposes a third explanation: the root cause lies neither in AI technology itself nor merely in time lags, but in a structural defect overlooked across the entire human-machine interaction system — the Bidirectional Black Box problem.


02 · Theoretical Framework

Formal Definition of the Bidirectional Black Box (BBB)

From Metaphor to Theory

The AI field’s “black box” discussion focuses almost entirely on the model side — input goes in, internal computation is opaque. But this is only half the problem. The human input side is equally a black box.

The user’s intent is encoded into natural language before reaching the AI. This encoding process involves severe information loss: linguistic ambiguity, implicit assumptions, context dependency, and the user’s current cognitive state and emotional fluctuations. Using an information-theoretic framework, we can provide a formal definition:

H_total(System) ≥ H(Input) + H(Model) + ρ · H(Input) · H(Model)

where H(Input) is the information entropy (uncertainty) of the human input side, H(Model) is the model-side entropy, and ρ is the coupling coefficient between the two sides. When uncertainties cross-couple, total system uncertainty grows superlinearly: not additive, but multiplicative or even exponential.
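As a worked illustration of the coupling inequality, the sketch below evaluates the lower bound for a few hypothetical entropy values. All numbers and the value of ρ are made up purely to show the shape of the claim; they are not measurements.

```python
# Illustrative arithmetic for the BBB inequality: a hypothetical lower
# bound on total system entropy. The H values (in bits) and the coupling
# coefficient rho are invented numbers, chosen only to show the shape.

def bbb_total_entropy(h_input: float, h_model: float, rho: float) -> float:
    """Lower bound H_total >= H(Input) + H(Model) + rho * H(Input) * H(Model)."""
    return h_input + h_model + rho * h_input * h_model

# With no coupling (rho = 0) the two entropies merely add:
uncoupled = bbb_total_entropy(3.0, 4.0, 0.0)   # 7.0 bits
# Any positive coupling adds a multiplicative cross-term:
coupled = bbb_total_entropy(3.0, 4.0, 0.5)     # 13.0 bits
# Doubling both sides more than doubles the coupled bound (superlinear):
doubled = bbb_total_entropy(6.0, 8.0, 0.5)     # 38.0 bits
```

The cross-term is what distinguishes the BBB claim from a simple "two noisy components" story: as either side's uncertainty grows, the other side's uncertainty is amplified rather than merely added.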

[Figure: the BBB pipeline. Human Intent (H₁: unobservable) → Language Encoding (lossy compression, ΔH₂) → AI Probabilistic Inference (H₃: model black box) → Output Result (H₄ ≥ H₂ × H₃) → Human Evaluation (H₅: subjective bias)]

Definition 1 (Input-Side Black Box): The conversion from human intent to natural language is a lossy compression process. Information loss is governed by uncontrollable variables including the user’s cognitive level, emotional state, domain knowledge, and linguistic precision, producing an unpredictable input signal.

Definition 2 (Output-Side Black Box): Large language models generate output by probabilistically predicting the next token. Alignment training (RLHF, etc.) raises the probability of certain behaviors without locking them in, leaving irreducible randomness in outputs.

Definition 3 (Bidirectional Black Box System): When the input-side and output-side black boxes are connected in series, the total system uncertainty is not a simple sum but is superlinearly amplified through coupling effects, forming a “Bidirectional Black Box System” (BBB System).

Definition 4 (Evaluation-Side Bias): The evaluator of a BBB system (the human user) is simultaneously the producer of the input-side black box. Their psychological defaults (external attribution bias, loss-aversion instinct) make it impossible to objectively assess their own responsibility in the system, causing evaluations to systematically skew toward blaming the model side.

Theoretical Significance

BBB theory reveals an overlooked epistemological dilemma: in human-machine interaction systems, no deterministic communication channel exists. Human language fed to a large model, no matter how precise or programmatic, is “downgraded” to a probabilistic tendency. This is not a problem prompt engineering can solve — it is an architectural ceiling of the entire human-machine interaction paradigm.


03 · The Enterprise User Voice

Frontline Complaints: Real Experiences Erased by Aggregated Data

Six Categories of Enterprise AI Frustration, 2025–2026

Macro survey data tends to reduce enterprise AI dissatisfaction to a single percentage. But frontline user feedback is the most direct empirical evidence for BBB theory. Below are the six most concentrated categories of complaints from the enterprise field in 2025–2026:

Complaint 1: AI increased workloads instead of reducing them.

ActivTrak 2026 Behavioral Data Report

After AI adoption, employee email processing time increased 104%, instant messaging rose 145%, and management tool usage grew 94%. Not a single work category saw time savings from AI. The report states: “The data is unambiguous — AI does not reduce workloads.” Employees had to sacrifice deep thinking time to handle the increased daily tasks.

Complaint 2: Management and employees have completely opposite perceptions.

Checkr 2026 Survey · MetLife 2026 Report

40% of managers trust AI outputs; among employees, only 9%. 59% of employees say they rarely or never trust AI outputs at work. MetLife found 83% of HR managers say AI makes employees faster, but 67% simultaneously admit AI is “creating new points of friction and mistrust.” Management sees efficiency numbers on dashboards; employees see more garbage to process daily.

Complaint 3: Using more than four AI tools causes brain overload.

Boston Consulting Group 2026 Study

Employees using three or fewer AI tools self-reported improved efficiency; those using four or more saw efficiency collapse. Researchers call this “AI brain fry” — cognitive load exceeding brain processing capacity. Employee feedback: “Things were moving too fast, and I didn’t have the cognitive ability to process all the information and make all the decisions.”

Complaint 4: AI customer service is worse than no customer service.

Glance 2026 CX Report · Consumer Feedback

75% of consumers report frustration with AI customer service. Zomato users received coupons from AI support during emergencies. Eventbrite users filed mass complaints on Trustpilot. After deploying AI customer service, what customers experienced was more loops, dead ends, and repeat explanations — trust continued to decline.

Complaint 5: The greatest risk comes from employees’ own usage behavior.

Optro 2026 Risk Intelligence Report

Over the past 12 months, 40% of organizations reported inaccurate AI outputs, 33% experienced policy violations, and 28% received AI-related customer complaints. The primary risk source isn’t the model — 34% of respondents cited employees inputting sensitive data into AI tools as the top risk behavior. 21% attributed it to insufficient training, 21% to “pressure to move quickly.”

Complaint 6: People who use AI face a social penalty.

Duke University Study · Gallup 2026

Workers who use AI in the workplace are perceived by colleagues as “cutting corners.” 64% of U.S. adults plan to avoid AI “as long as possible.” 31% of employees are actively working against their company’s AI initiatives. 45% of CEOs admit most employees are resistant or openly hostile to AI.

Interpretation Through BBB Framework

These six complaints are not isolated phenomena — they are different symptoms of BBB system malfunction. Complaints 1 and 3 are direct consequences of output-side noise amplification. Complaint 2 is a manifestation of evaluation-side bias. Complaint 4 is the inevitable result of low-quality input (ambiguous customer queries) meeting probabilistic output. Complaint 5 is evidence of input-side loss of control. Complaint 6 is a social-psychological reaction on the evaluation side. All complaints point to the same systemic root cause: no one is measuring and managing bidirectional quality in human-machine interaction.


04 · Dialogue with the J-Curve

What the J-Curve Explains — And What It Cannot

Where BBB Theory Fills the Gap

The “Productivity J-Curve” proposed by Brynjolfsson and colleagues is the dominant academic framework for explaining the AI productivity paradox. It argues that when a General Purpose Technology (GPT) is introduced, measured productivity temporarily declines because firms redirect resources from production toward accumulating intangible assets (process redesign, data governance, organizational learning) — investments not captured by traditional GDP statistics but which release as productivity gains in the “harvest phase.”

MIT Sloan’s micro-level study of U.S. manufacturing firms confirmed the J-Curve: average productivity declined 1.33 percentage points after AI adoption; correcting for selection bias, the short-run negative impact was approximately 60 percentage points. However, among firms persisting for four or more years, over 60% ultimately achieved productivity improvements exceeding 25%. Younger firms recovered faster than older ones.

Dimension | J-Curve Can Explain | J-Curve Cannot Explain
Productivity Decline | Temporary dip from organizational adjustment costs | Why AI usage effects vary enormously between employees within the same firm
Time Dimension | Historically, GPTs take 20–40 years to appear in macro data | Why employee confidence drops as AI usage increases
Firm Differences | Digitally mature firms recover faster | Why output quality fluctuates wildly for the same tool, same firm, different employees
ROI | Intangible asset accumulation takes time to monetize | Why workloads increased 104% instead of decreasing
Evaluation | Traditional GDP metrics miss intangible investments | Why managers and employees assess the same tool in completely opposite ways

BBB Theory’s Complementary Explanatory Power: The J-Curve attributes productivity decline to organizational adjustment costs — a supply-side explanation. BBB theory adds a demand-side/usage-side explanation: even after organizational adjustment is complete, as long as input-side quality is neither managed nor quantified, system output quality cannot stabilize. The J-Curve assumes the “harvest phase” will arrive naturally; BBB theory argues that without solving the bidirectional quality problem, the harvest phase may be indefinitely postponed — firms may forever circle the bottom of the J-Curve.

Theoretical Positioning

BBB theory does not negate the J-Curve — it explains what the J-Curve cannot: why some firms climb out of the J-Curve’s trough while others cannot. The difference lies not in investment scale or time horizon, but in whether a mechanism for managing input-side quality was established. The 12% of firms that gained both cost and revenue benefits (PwC data) were precisely those that undertook deep organizational transformation — essentially, they unconsciously and partially solved the input-side black box problem.


05 · The Missing Evaluation Framework

Subjective “Feeling” Reports: Wrong Ruler, Wrong Target

A Methodological Critique of Current AI Productivity Surveys

Current global AI productivity surveys suffer from a fatal methodological flaw: they ask users to evaluate a tool, but the users themselves are part of the variable. The logical essence of these reports is measuring an undefined metric with an uncalibrated instrument.

Missing Dimension | Specific Problem | Consequence
Input Quality Baseline | No standard for what constitutes a good prompt | Cannot distinguish AI problems from user problems
Controlled Variables | Input variance between employees on the same tasks goes unmeasured | Cannot quantify input quality’s effect on output
Attribution Separation | No breakdown of how much output quality comes from model vs. input | No valid causal attribution possible
Human Baseline | Pre-AI quality variance range for purely human work unknown | No comparison reference; AI effect indeterminable
Bidirectional KPIs | No metric simultaneously quantifies input and output | Entire evaluation system is one-sided

The result: every AI effectiveness survey amounts to “subjective use + subjective evaluation + subjective survey” — a purely subjective “human feeling” report. The evaluator is simultaneously a variable within the evaluated system, and the human instinct toward self-preservation makes it impossible to objectively assess one’s own responsibility. At a deeper level: who designs this evaluation framework? Humans. The designer carries all the same biases, and including human-side variables amounts to admitting “our people aren’t good enough” — nearly impossible in organizational politics.


06 · Industry Data

The Paying User Ceiling and the Token Black Hole

Where Capital Burns Without Returns

900M · ChatGPT weekly active users
~5% · Free-to-paid conversion rate
$700B · Big Four 2026 AI infrastructure spend
4% · Quarterly user growth, decelerating

95% of users refuse to pay. European ChatGPT spending stagnated since May 2025. The Big Four’s 2026 AI infrastructure spending approaches $700 billion, yet Microsoft’s AI revenue target is only $25 billion — the gap between input and output continues widening.

The Token Black Hole: Massive compute is consumed on ineffective scenarios — free users chatting, low-quality input generating low-quality output, repeat queries, AI Slop content. The model doesn’t refuse a bad question; it earnestly expends expensive compute on requests not worth processing. In the Chinese market, ByteDance’s Doubao has repeatedly banned accounts for mass pornographic messaging. xAI’s Grok went to the opposite extreme: after it opened up adult content, only 2 of 12 co-founders remained, 35 state attorneys general demanded remediation, multiple countries blocked or investigated the service, and enterprise clients fled entirely.

These are not isolated incidents — they are direct manifestations of the BBB system at the commercial level: can’t use it well → won’t pay → model lacks quality private data → can’t improve → can use it even less well. Compute is subsidizing human boredom and desire, not creating commercial value.


07 · The Layoff Paradox

AI Cost-Cutting’s Self-Referential Loop

AI Optimizes AI, But Who Optimizes the Users?

In Q1 2026, over 45,000 tech workers were laid off, approximately 20% explicitly attributed to AI. But layoffs are highly concentrated within AI-related and technology companies: Block cut 4,000 (40%), Atlassian cut 1,600 (10%), Meta plans to cut 20%. Oxford Economics analysts suspect some firms are using AI as “wrapping paper” for restructuring.

Where AI genuinely cuts costs is in AI R&D itself — writing code with AI, optimizing models with AI, replacing junior programmers with AI. This is a closed loop: the AI industry optimizes the AI industry with AI. But on the non-AI enterprise usage side, success stories are scarce. BCG found that failing firms pursue an average of 6.1 AI use cases simultaneously versus 3.5 for leaders, yet leaders expect 2.1× higher ROI. Fewer than one-third of companies have upskilled even a quarter of their workforce. Most critically — most enterprises don’t track financial KPIs for their AI initiatives at all.


08 · Bubble or Dawn? — Bidirectional Evidence

Letting the Reader Find Their Own Equilibrium

This Paper Does Not Judge — It Presents Both Sides

This paper’s core claim is that one-directional information produces one-directional judgment. Bubble theorists see only deployment failure data; optimists see only technology progress curves. Both are blind to half the picture. Since this paper critiques one-sided evaluation, the paper itself should not conclude for the reader. Below, the core arguments of both sides are presented bidirectionally.

Evidence 1: Capital Structure

What Bubble Theorists See

Circular investment exists in the AI ecosystem — Microsoft invests in OpenAI, OpenAI buys compute from CoreWeave, CoreWeave leases NVIDIA GPUs, NVIDIA reinvests revenue in OpenAI. Morgan Stanley notes the same dollar is counted across multiple balance sheets.

Big Four 2026 AI capex approaches $700B, but J.P. Morgan warns $650B in annual revenue is needed for a mere 10% return. OpenAI pledges $1.4T in data centers over 8 years on $13B annual revenue.

What Optimists See

Current AI investors are Microsoft ($160B annual cash flow), Google, Amazon, Meta — they can absorb losses, and absorb them for years. In 2000, dotcom investors were retail traders and small VCs; three funding rounds burned through meant death. Capital resilience is fundamentally different.

NVIDIA FY2026 revenue $215.9B (+65% YoY), market cap $4.3T. S&P 500 forward P/E ~23×, far below the Nasdaq’s ~60× in 2000. Today’s valuations are backed by real earnings.

Evidence 2: Revenue Growth

What Bubble Theorists See

ChatGPT has 900M weekly active users but only ~5% pay. European spending stagnated since May 2025. Quarterly growth down to 4%. Enterprise GenAI spend is $37B but 95% of firms see zero return. API price wars with 40–70% cuts — a classic signal of growth ceiling.

What Optimists See

Anthropic’s annualized revenue grew from $1B (Dec 2024) to $19B (Mar 2026) — 19× in 14 months, unprecedented in B2B software history. Claude Code reached $2.5B ARR in 9 months from zero. Eight Fortune 10 companies are Claude customers. 1 in 5 businesses on Ramp now pay Anthropic, up from 1 in 25 a year ago.

Evidence 3: Productivity

What Bubble Theorists See

NBER survey of 6,000 executives: 90% report zero AI productivity impact. METR trial: AI made developers 19% slower. UK Copilot trial: Excel slower and worse quality; PPT faster but lower accuracy. ActivTrak data: email time +104%, zero time savings in any category.

What Optimists See

Brynjolfsson estimates 2025 U.S. productivity growth at ~2.7%, nearly double the decade average. MIT manufacturing study confirms J-Curve: initial −1.33pp, but 60%+ of firms persisting 4 years achieve 25%+ gains. A small cohort of “power users” has automated end-to-end workflows. 4% of GitHub public commits are by Claude Code.

Evidence 4: Historical Parallels

What Bubble Theorists See

Barron’s 2000 cover story: 74% of 207 internet companies had negative cash flows; 51 projected to run out of money within 12 months. The Nasdaq then crashed 78%, erasing $5T in market value. Latest Bank of America survey: most global fund managers call AI stocks “a bubble.”

What Optimists See

Only 14% of dotcoms were profitable in 2000; today’s major AI investors are the world’s most profitable companies. Janus Henderson identifies at least 8 structural differences (Y2K effect, audit standards, demand visibility, etc.). Even if a correction occurs, GPU clusters and data centers — like 2000’s fiber optics — will later underpin the next wave.

Authors’ Position Statement

This paper does not adjudicate whether AI is a bubble or a dawn. Both sides have data-supported arguments and both have blind spots. However, we observe one variable both sides overlook: whether or not AI is in a bubble, the Bidirectional Black Box problem is real and will not disappear automatically with abundant capital or technological breakthroughs. Capital can keep AI alive; technology can improve model capabilities; but unless input-side quality is managed and evaluation frameworks are built, system output quality cannot stabilize. This is a structural problem independent of the bubble debate. Transparency disclosure: This paper’s AI collaborator is an Anthropic product (Claude Opus 4.6). Anthropic growth data cited herein has been cross-verified against Bloomberg, Sacra, and Epoch AI, but readers should be aware of this potential conflict of interest.

The Nonlinear Nature of Scientific Progress

Regardless of whether the reader leans toward the bubble thesis or optimism, one fact both sides should acknowledge: scientific progress is nonlinear. At least five technology pathways are advancing in parallel within AI, each at a different stage of development:

Technology Pathway | Current Stage | Potential Impact
Multimodal Fusion | Early commercialization (GPT-4o, Gemini deployed) | Breaking text-image-video-audio barriers
Reasoning Enhancement | Rapid iteration (o1/o3 series) | Transforming deep thinking and multi-step reasoning
Agent Architecture | PoC → early commercialization transition | Redefining interaction: from conversation to delegation
Embodied Intelligence | Lab → prototype stage | Bridging digital and physical worlds
Novel Inference Chips | Early commercialization (Groq, Cerebras) | Redefining inference cost structure

A breakthrough in any one pathway could reshape the entire industry’s cost structure and usage paradigm. But “could reshape” is not “will reshape.” Linear extrapolation from current data is a methodological error — whether that extrapolation points toward collapse or prosperity.


09 · Exploratory Directions

BRTES: A Framework Under Development — Not a Finished Solution

Honest About What We Don’t Know

Based on BBB theory and enterprise-side empirical data, we propose exploratory directions for a Bidirectional Real-Time Evaluation System (BRTES). To be clear: this is a conceptual framework still under development, not a mature solution. We have not completed validation, nor do we pretend to have found answers. What follows is directional thinking, offered for joint exploration by industry and academia.

Layer 1: Input Quality Measurement Layer

Real-time quantitative evaluation of human Input, across five core measurement dimensions:

Dimension | Definition | Implementation
Intent Clarity | Ambiguity score of input instructions | Model confidence back-inference, polysemy detection
Signal-to-Noise Ratio | Ratio of effective information to redundancy/noise | Information density analysis, keyword extraction rate
Context Completeness | Coverage of information required for task execution | Prerequisite checking, missing-information prompting
State Stability | Cross-session input quality variance for the same user | Historical baseline comparison, variance tracking
Domain Relevance | Alignment between input and target task domain | Task classifier, domain knowledge validation

Key design principle: Real-time feedback, not post-hoc scoring. When the AI detects an ambiguous instruction, it proactively tells the user: “Your input has a low signal-to-noise ratio; there’s a 70% chance I’ll go off-track. Would you like to supplement these details?” Transforming post-hoc evaluation into a real-time calibration feedback loop.
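To make the Layer 1 idea concrete, here is a toy scorer for two of the five dimensions. The dimension names come from the table above; the filler-word list, string heuristics, and required-fields mechanism are illustrative assumptions, not the authors’ BRTES implementation.

```python
# Hypothetical sketch of Layer 1 input quality scoring. Only the dimension
# names come from the paper; every heuristic below is an invented placeholder.

def score_input(prompt: str, required_fields: list[str]) -> dict[str, float]:
    """Toy scorer for two of the five Layer 1 dimensions."""
    words = [w.strip(",.!?").lower() for w in prompt.split()]
    filler = {"um", "uh", "like", "basically", "just", "stuff", "things"}
    # Signal-to-noise: fraction of words carrying content rather than filler.
    snr = sum(w not in filler for w in words) / max(len(words), 1)
    # Context completeness: coverage of the fields the task requires.
    lowered = prompt.lower()
    completeness = sum(f in lowered for f in required_fields) / max(len(required_fields), 1)
    return {"signal_to_noise": round(snr, 2),
            "context_completeness": round(completeness, 2)}

scores = score_input(
    "Summarize the Q3 report, um, just the revenue stuff",
    required_fields=["report", "revenue", "audience", "length"],
)
# A low completeness score is what would trigger the real-time prompt
# described above, e.g. asking the user to specify audience and length.
```

In a real system these heuristics would be replaced by model-based estimators (the table’s “model confidence back-inference”), but even this toy version shows how an input can be scored before any tokens are spent on generation.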

Layer 2: Output Quality Measurement Layer

Objective, quantified evaluation of AI Output, with bidirectional comparison against the input side:

Dimension | Definition | Implementation
Task Completion | Degree to which output matches input intent | Intent alignment scoring, requirement coverage
Factual Accuracy | Correctness rate of output content | Knowledge-base verification, citation validation
Consistency | Output stability for the same input across different times | Repeat testing, variance analysis
Value Density | Ratio of useful output information to total tokens | Information-per-token ratio calculation
Constraint Compliance | Execution rate against user-specified constraints | Constraint checking, deviation-rate calculation
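Two of the Layer 2 dimensions lend themselves to a quick sketch. “Value density” is reduced here to a toy proxy (unique content words per token) and “consistency” to variance across repeated runs of the same input; both heuristics are illustrative assumptions, not a validated design.

```python
# Illustrative sketch of two Layer 2 metrics; placeholder heuristics only.
from statistics import pvariance

def value_density(output: str) -> float:
    """Unique non-stopword tokens per total token (higher = denser)."""
    tokens = [t.strip(",.").lower() for t in output.split()]
    stop = {"the", "a", "an", "of", "to", "and", "is", "in"}
    content = {t for t in tokens if t not in stop}
    return len(content) / max(len(tokens), 1)

def consistency(scores_across_runs: list[float]) -> float:
    """Population variance of quality scores when one input is re-run
    several times; lower variance means a more stable output."""
    return pvariance(scores_across_runs)
```

The point of pairing these with Layer 1 is that a low value-density score is only diagnostic once input quality is known: verbose output to a vague prompt and verbose output to a precise prompt call for different attributions.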

Layer 3: Bidirectional Comparison & Attribution Layer

Lateral comparison of input and output measurement data, enabling responsibility attribution:

[Diagram: Input Quality Score → Attribution Engine ← Output Quality Score]
Result Quality = f(Input Quality, Model Capability, Task Complexity)
When output quality is low, the attribution engine automatically determines:
• Low input quality + Normal model capability → User-side issue → Trigger input improvement suggestions
• High input quality + Anomalous model capability → Model-side issue → Log model defect
• Low input quality + Anomalous model capability → Dual-side issue → Flag systemic risk
Enterprises can use this data to objectively assess employee AI usage competency and AI tool effectiveness.
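The three attribution rules above can be sketched as a minimal decision function. The 0.5 threshold and the enum names are assumptions for illustration; the paper gives only the qualitative decision table.

```python
# Minimal sketch of the Layer 3 attribution rules. Thresholds and names
# are illustrative assumptions, not the authors' implementation.
from enum import Enum

class Attribution(Enum):
    USER_SIDE = "trigger input improvement suggestions"
    MODEL_SIDE = "log model defect"
    SYSTEMIC = "flag systemic risk"
    OK = "no action needed"

def attribute(input_quality: float, model_capability: float,
              threshold: float = 0.5) -> Attribution:
    """Map (input quality, model capability) scores to a responsibility call."""
    input_low = input_quality < threshold
    model_anomalous = model_capability < threshold
    if input_low and model_anomalous:
        return Attribution.SYSTEMIC      # dual-side issue
    if input_low:
        return Attribution.USER_SIDE     # user-side issue
    if model_anomalous:
        return Attribution.MODEL_SIDE    # model-side issue
    return Attribution.OK
```

Even in this toy form, the design choice is visible: attribution requires both scores, which is exactly what current one-sided evaluation frameworks cannot supply.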

Unsolved Key Challenge: Who Will Build This? Who Has the Incentive?

BRTES’s greatest obstacle is not technical but organizational-political and market-incentive driven. LLM companies won’t voluntarily build it — admitting output instability hurts valuations. Enterprises won’t voluntarily build it — admitting input-side problems amounts to admitting “our people aren’t good enough.” Media won’t push for it — extreme narratives get more traffic than systematic analysis. This means BRTES’s builder may only be: independent third-party evaluation bodies, academia-industry consortia, or government standards organizations. But none of these have mobilized yet.

The deeper resistance is psychological: having AI evaluate human cognitive ability and expressive precision is an enormous challenge — psychologically and organizationally. People accept AI evaluating tools as natural, but the reverse — AI evaluating human input quality — triggers instinctive resistance. This resistance won’t disappear no matter how elegant the technical solution.

Honesty Statement

BRTES is currently an exploratory direction, not a validated solution. The authors’ real-time evaluation system is still under development. How the attribution engine works for open-ended tasks, how input quality metrics achieve cross-scenario generalizability, how to break through organizational-political resistance — these remain unanswered questions. We choose to honestly present these unknowns rather than pretend we’ve found the answers.


10 · Open Reflections

We Raise Questions — We Don’t Pretend to Have All the Answers

The Value of This Paper Is in the Questions, Not the Conclusions

AI maximalism and AI bubble theory commit the same error — the parable of the blind men and the elephant. One side sees the upper bound of model capability; the other sees the lower bound of deployment results. Both have only partial information, and both lack scrutiny of the human input-side variable. This paper attempts to fill that overlooked dimension.

This paper’s contribution lies not in providing answers but in raising a set of overlooked questions:

Theoretical level — BBB theory reveals the superlinear stacking of uncertainty on both sides of human-machine interaction systems. Is this sufficient to explain the portion of the productivity paradox that the J-Curve cannot cover? More empirical testing is needed.

Empirical level — Six categories of enterprise complaints demonstrate that BBB system malfunction is a daily reality. But do these complaints truly point to input-side quality issues, or are there other explanations? First-hand controlled experiments are needed.

Engineering level — BRTES’s three-layer architecture is a direction, far from a mature solution. Can the attribution engine work for open-ended tasks? Would input quality scoring lead to “KPI gamification”? We cannot yet answer these questions.

Capital level — We presented bubble and optimist evidence bidirectionally. Where the equilibrium lies is not for us to adjudicate. But regardless of whether AI is in a bubble, the BBB problem is real — this point is unaffected by market cycles.

Limitations of this paper: The BBB theory’s entropy formula is a qualitative descriptive model requiring rigorous information-theoretic derivation; the BRTES three-layer architecture is unvalidated empirically; Chinese market case depth is insufficient; BBB theory’s applicability across different policy and cultural environments requires further study; enterprise complaint data mostly originates from industry survey reports rather than primary experiments; this paper’s AI collaborator is an Anthropic product, and related data citations have been cross-verified but readers should be aware of the potential conflict of interest. These limitations are simultaneously directions for future research.

Final Reflection · For the Reader

This paper’s value does not lie in telling you whether AI will succeed or fail. Its value lies in pointing out a structural variable overlooked by both bubble theorists and optimists — the quality of human input. If you are an enterprise decision-maker, before evaluating AI tools, first examine what your team is feeding the AI. If you are an AI practitioner, before optimizing models, first consider why your users can’t use them well. If you are an investor, before judging AI valuations, first consider whether the industry has built infrastructure that makes AI value quantifiable. A transparent protocol layer is needed between two black boxes. That protocol does not yet exist. Who builds it, how to build it, whether it can be built — these are open questions, beyond this paper’s ability to answer. But we can be certain of one thing: without raising this question, no one will ever answer it.

References
  1. NBER Working Paper (2026). Survey of 6,000 executives on AI impact on employment and productivity. NBER
  2. METR (2026). Randomized controlled trial on AI tools and developer productivity: −19% actual vs. +20% perceived. PEER-REVIEWED
  3. MIT GenAI Divide Report (2026). 95% failure rate of GenAI pilots beyond experimental phase. ACADEMIC
  4. PwC 2026 Global CEO Survey. 56% report zero gains; 12% report dual cost-revenue benefits. INDUSTRY
  5. Duke University / Federal Reserve CFO Survey (2026). Perceived vs. actual AI productivity gap. ACADEMIC
  6. UK Department for Business and Trade. Microsoft 365 Copilot controlled trial — no productivity evidence. GOVERNMENT
  7. ManpowerGroup 2026 Global Talent Barometer. AI use +13%, confidence −18%. INDUSTRY
  8. ActivTrak 2026 Behavioral Data Report. Email time +104%, messaging +145%, zero time savings. INDUSTRY
  9. Checkr / Pollfish Survey (Feb 2026). Manager AI trust 40% vs. employee trust 9%. INDUSTRY
  10. MetLife 2026 Workplace Report. 83% say faster, 67% say “new friction and mistrust.” INDUSTRY
  11. Boston Consulting Group (2026). “AI brain fry” — cognitive overload beyond 4+ tools. INDUSTRY
  12. Glance 2026 CX Trends Report. 75% consumer frustration with AI customer service. INDUSTRY
  13. Optro 2026 Risk Intelligence Report. 40% inaccurate outputs, 34% cite employee data input as top risk. INDUSTRY
  14. Duke University (2025). Social penalty study — 4,400 participants, AI users perceived as “cutting corners.” ACADEMIC
  15. Gallup (2026). 64% of U.S. adults plan to avoid AI “as long as possible.” INDUSTRY
  16. Brynjolfsson, Rock & Syverson. “The Productivity J-Curve: How Intangibles Complement GPTs.” NBER WP 25148. FOUNDATIONAL
  17. MIT Sloan / McElheran et al. (2025/2026). AI adoption J-curve in U.S. manufacturing. −1.33pp initial, 60% see 25%+ gains after 4yr. ACADEMIC
  18. Apollo Chief Economist Torsten Slok (2026). “AI is everywhere except in the incoming macroeconomic data.” INDUSTRY
  19. BCG (2026). Failing firms average 6.1 use cases vs. 3.5 for leaders; <1/3 upskilled 25% of workforce. INDUSTRY
  20. Deutsche Bank Research Institute (2025). European ChatGPT spending stagnation since May 2025. INDUSTRY
  21. Sacra Research (2026). OpenAI $25B ARR, Anthropic $19B ARR, paying user and revenue estimates. INDUSTRY
  22. RationalFX / TNGlobal (2026). 45,000+ tech layoffs Q1 2026, 20.4% AI-attributed. MEDIA
  23. Oxford Economics / Revelio Labs (2026). AI as pretext for corporate restructuring analysis. INDUSTRY
  24. CNN / TechCrunch / CNBC (2026). xAI co-founder departures (10/12 left) and Grok controversy. MEDIA
  25. NY State AG + 34 AG coalition (2026). Demand for xAI action on nonconsensual content. GOVERNMENT
  26. Robert Solow (1987). “You can see the computer age everywhere but in the productivity statistics.” FOUNDATIONAL
  27. Erik Brynjolfsson, FT (2026). “The AI productivity take-off is finally visible” — U.S. productivity ~2.7% in 2025. ACADEMIC
  28. Cornell University / Zitek (2024). AI surveillance reduces employee autonomy and productivity. ACADEMIC
  29. Anthropic (2026). Series G: $30B raised at $380B valuation. ARR $1B→$19B in 14 months. Claude Code $2.5B ARR in 9 months. INDUSTRY
  30. Epoch AI (2026). Anthropic 10×/year growth vs. OpenAI 3.4×/year; crossover projected mid-2026 at ~$43B. ACADEMIC
  31. SaaStr / Alex Clayton (2026). “We’ve looked at 200+ public software company IPOs — this growth rate has never happened.” INDUSTRY
  32. IntuitionLabs (2025/2026). Data-driven comparison: AI bubble vs. dot-com bubble — capex, valuations, capital structure. INDUSTRY
  33. Janus Henderson (2025). “8 reasons the AI wave is different” — Y2K, fraud standards, demand visibility, geopolitics. INDUSTRY
  34. Barron’s (2000). “Burning Up” cover: 74% of 207 internet companies had negative cash flows. FOUNDATIONAL
  35. Simply Wall St (2026). Dotcom P/E ~60× vs. current S&P 500 ~23×; today’s multiples supported by real earnings. INDUSTRY
  36. NVIDIA FY2026. Revenue $215.9B (+65% YoY); market cap ~$4.3T; P/E ~47× vs. Cisco 2000 peak ~472×. INDUSTRY

“Between two black boxes, a transparent protocol layer is needed.
It doesn’t exist yet. But the question has been raised.”

이조글로벌인공지능연구소 · LEECHO Global AI Research Lab & Claude Opus 4.6 · Anthropic
March 25, 2026 · Original Thought Paper · V4
