ORIGINAL THOUGHT PAPER · INFORMATION COMPLETENESS FRAMEWORK · PAPER 4 OF 8 · V2

The Sociological Explanation
of MoE Dominance

How Human Division-of-Labor Logic
Self-Replicates into Technical Infrastructure

The Sociology of AI Architecture Choice: Technical Self-Replication of Human Division-of-Labor Logic

Published May 22, 2026
Category Original Thought Paper
Domains Sociology of Technology · AI Industry · Cognitive Science · Political Economy
Version V2
Authors LEECHO Global AI Research Lab & Claude Opus 4.6 & GPT 5.5 & Gemini 3.1 (Cognitive Collective)

ABSTRACT

AI architecture choices are typically viewed as pure engineering tradeoffs. This paper argues that this explanation omits a deeper causal structure: the fundamental reason Mixture-of-Experts (MoE) prevails in the market is that human society itself is an MoE system. The division-of-labor regime remodels individuals into narrow-domain experts, and the cognitive demands of narrow-domain experts naturally favor narrow-domain tools. This paper distinguishes three levels (technical-architecture MoE, product-market MoE, and social-cognitive MoE) and argues that they are isomorphic and amplify one another through resonance. It names this causal chain the “Division-Architecture Self-Replication Spiral” (DASRS), analyzes its specific mechanisms in markets, investment, and benchmark systems, and identifies internal cracks in the spiral (test-time compute, Dense controller research). It further argues that Dense research is a public good that markets systematically undervalue, and proposes five testable predictions. This paper does not deny MoE’s engineering advantages; it reveals the social-choice layer that the engineering narrative obscures.

I. From Adam Smith to DeepSeek: An Overlooked Causal Chain

In 1776, Adam Smith argued in The Wealth of Nations that division of labor increases efficiency, and efficiency creates wealth. The influence of this proposition extended far beyond economics—it reshaped educational systems (as argued in Paper Three), organizational structures, knowledge classification systems, and labor market entry barriers.

This paper proposes that the 2020s AI industry’s preference for MoE architectures is the latest link in this chain of division-of-labor logic—not a break, but an extension. Humans used division of labor to remodel their own brains (cognitive MoE-ification), then used their MoE-ified brains to design technical tools reflecting their own cognitive structure (MoE architectures), and these tools in turn reinforce the MoE-ification of their users.

1.5 Distinguishing Three Layers of MoE

The “MoE dominance” discussed in this paper encompasses three distinct but mutually reinforcing levels, which should not be simply equated:

Level	Meaning	Manifestation	Evidence Type
Technical-architecture MoE	Sparse expert routing within the model	DeepSeek-V3, Mixtral, Switch Transformer	Architecture papers
Product-market MoE	AI products segmented by vertical industry	Legal AI, medical AI, coding AI, financial AI	Market data
Social-cognitive MoE	Buyers/evaluators screening tools with narrow-domain criteria	Domain KPIs, procurement processes, industry certifications	Organizational behavior

Isomorphism and mutual reinforcement exist among the three layers, but they are not identical—a vertical legal AI company may use a Dense architecture under the hood; a technical MoE model may be deployed as a general-purpose product. The DASRS spiral describes the resonance amplification effect among the three layers.

II. Market Evidence: MoE Dominance Is Demand-Driven

2.1 The Explosive Growth of Vertical AI

Multiple industry reports consistently show that the vertical AI market is growing at significantly higher rates than the general-purpose AI market. Grand View Research (2025) estimates the enterprise generative AI market CAGR at approximately 38.4% for 2025–2030, and the AI agents market CAGR at approximately 49.6% for 2026–2033. Domain-specific systems are widely reported to outperform general-purpose systems on specific benchmarks, though the exact magnitude of improvement varies by task, dataset, and evaluation method.
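To make the cited growth rates concrete, the sketch below compounds each CAGR over its stated window. It assumes one compounding period per calendar year between the start and end year (5 periods for 2025–2030, 7 for 2026–2033); the function name `growth_multiple` is introduced here for illustration and does not come from the cited reports.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by compounding `cagr` once per year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 + cagr) ** years


# CAGR figures are the ones quoted in the text; the period counts are an assumption.
enterprise_genai = growth_multiple(0.384, 2030 - 2025)
ai_agents = growth_multiple(0.496, 2033 - 2026)

print(f"Enterprise generative AI: about {enterprise_genai:.1f}x over the window")
print(f"AI agents: about {ai_agents:.1f}x over the window")
```

Under these assumptions the enterprise generative AI market grows roughly fivefold and the AI agents market grows by well over an order of magnitude. The arithmetic says nothing about which architecture serves that demand.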

These numbers reflect not the technical superiority of MoE architecture, but the structure of market demand: the people purchasing AI tools are specialized—doctors need medical AI, lawyers need legal AI. No one buys a “general thinking machine,” because buyers themselves are MoE-ified, and their demand language and evaluation criteria are also MoE-ified.

2.2 The Purchasing Decision Authority of “Domain Experts”

In enterprise-level AI procurement, technology selection authority typically falls to domain leads rather than general intelligence researchers. These evaluators score using their own professional standards: How does it perform in my domain? Can it integrate into my workflow?

The market is not “choosing MoE”—the market is screening tools in its own image. An MoE-ified society inevitably prefers MoE-ified AI, just as an English-speaking society inevitably prefers English interfaces—not because English is superior, but because that is the encoding format of its users.

III. Engineering Evidence: The “Efficiency Argument” for MoE Is Incomplete

3.1 The Valid Part of the Efficiency Argument

MoE’s engineering advantages are real. At equivalent activated compute, an MoE model can hold roughly 6–64× the total parameters of a Dense model. Switch Transformer (2022) demonstrated that simple top-1 routing can substantially outperform compute-matched Dense models on scaling curves. Mixtral 8x7B achieved performance approaching LLaMA-2 70B with only 12.9B activated parameters.
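The gap between total and activated parameters can be sketched with simple accounting. The example below is a simplified illustration, not the parameter layout of any model named above: it assumes every token uses a block of shared parameters plus `top_k` of `num_experts` equally sized experts, and all numbers in `toy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical, simplified parameter budget for a sparse MoE model."""
    shared_params: float      # attention, embeddings, etc., used by every token
    params_per_expert: float  # parameters in one expert, summed over layers
    num_experts: int          # experts available to the router
    top_k: int                # experts activated per token


def total_params(cfg: MoEConfig) -> float:
    """Parameters that must be stored."""
    return cfg.shared_params + cfg.num_experts * cfg.params_per_expert


def activated_params(cfg: MoEConfig) -> float:
    """Parameters actually used to process one token."""
    return cfg.shared_params + cfg.top_k * cfg.params_per_expert


def capacity_ratio(cfg: MoEConfig) -> float:
    """Total parameters per activated parameter."""
    return total_params(cfg) / activated_params(cfg)


# Toy numbers chosen for readability, not taken from any published model.
toy = MoEConfig(shared_params=1e9, params_per_expert=1e9, num_experts=8, top_k=2)
print(f"total={total_params(toy):.1e}, activated={activated_params(toy):.1e}, "
      f"ratio={capacity_ratio(toy):.1f}")
```

In this simplified accounting the ratio approaches `num_experts / top_k` as the shared block shrinks, which is why adding experts raises stored knowledge capacity without raising per-token compute.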

This paper does not deny these engineering causal factors: training cost, inference cost, memory bandwidth, parallelization, serving throughput, long-tail knowledge coverage, and other technical factors are all legitimate engineering reasons for MoE adoption. This paper’s thesis is that beneath these engineering factors lies a social-choice layer obscured by the engineering narrative: “which metrics are considered important” is itself not a technical constant but a projection of social cognitive structure.

3.2 What the Efficiency Argument Misses

The key question is: why is “efficiency” defined as “knowledge capacity per unit compute” rather than “cross-domain reasoning capability per unit compute”?

Efficiency Definition	MoE vs Dense	Implicit Value System
Knowledge capacity per unit compute	MoE far exceeds Dense	Knowledge is intelligence (encyclopedic values)
Reasoning depth per unit compute	Dense ≥ MoE	Reasoning is intelligence (philosophical values)
Cross-domain transfer per unit compute	Dense far exceeds MoE	Generalization is intelligence (da Vinci values)
Creative output per unit compute	Not measured	Creation is intelligence (artistic values)

The industry chose the first definition—precisely the one most favorable to MoE. This is not because the first is objectively most correct, but because MoE-ified decision-makers naturally tend to evaluate systems using MoE-style criteria. The definition of efficiency itself is a social choice, not a technical constant.
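The claim that the efficiency definition decides the winner can be illustrated with a weighted-scoring sketch. The scores below are invented placeholders on a 0 to 1 scale that only encode the ordinal claims of the table above (MoE ahead on knowledge capacity, Dense ahead on reasoning depth and cross-domain transfer); they are not measurements. Creative output is left out because the table marks it as not measured.

```python
from typing import Dict

# Illustrative placeholder scores per unit compute, NOT empirical results.
SCORES: Dict[str, Dict[str, float]] = {
    "MoE":   {"knowledge": 0.9, "reasoning": 0.5, "transfer": 0.3},
    "Dense": {"knowledge": 0.4, "reasoning": 0.6, "transfer": 0.8},
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of metric scores under a given value system."""
    total_weight = sum(weights.values())
    return sum(scores[metric] * w for metric, w in weights.items()) / total_weight


def preferred(weights: Dict[str, float]) -> str:
    """Architecture that ranks first under the given weights."""
    return max(SCORES, key=lambda name: weighted_score(SCORES[name], weights))


# Two hypothetical evaluators with different definitions of "efficiency".
knowledge_first = {"knowledge": 0.8, "reasoning": 0.1, "transfer": 0.1}
transfer_first = {"knowledge": 0.1, "reasoning": 0.3, "transfer": 0.6}

print("knowledge-first evaluator prefers:", preferred(knowledge_first))
print("transfer-first evaluator prefers:", preferred(transfer_first))
```

The same two systems are ranked in opposite order once the weights change. In this toy setting the ranking is decided by the choice of weights rather than by the architectures alone.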

IV. The Division-Architecture Self-Replication Spiral (DASRS)

Social division of labor → Individual cognitive MoE-ification → MoE-ified demand
→ MoE architecture dominance → Specialized AI tools
→ Further user cognitive MoE-ification → Stronger MoE demand
→ … (positive feedback spiral)

Division-Architecture Self-Replication Spiral (DASRS)

4.1 Six Links of the Spiral

Link One: Institutional division of labor. Educational and occupational systems train individuals as narrow-domain experts (detailed in Paper Three). Every institutional layer reinforces specialization and penalizes generalization.

Link Two: Cognitive MoE-ification. Long-term professional training ossifies routing preferences—experts automatically interpret all input through their professional frameworks.

Link Three: MoE-ified technical demand. MoE-ified users naturally demand tools that match their own cognitive structure: “I need an AI that specializes in X,” not “I need an AI that can think across domains.” The demand language itself excludes Dense-style products.

Link Four: MoE architecture dominance. The technology supply side responds to market demand—vertical AI grows rapidly. Designs like DeepSeek-V3’s MoE at minimum demonstrate that the industry places high value on expanding total parameter capacity under limited activated compute, which is highly consistent with market demand for knowledge coverage and deployment feasibility.

Link Five: Further user MoE-ification. When AI takes over cross-domain retrieval work, users’ own cross-domain cognitive pathways weaken further from disuse. AI becomes the user’s “cognitive prosthesis”—functional outsourcing leads to internal functional atrophy.

Link Six: Stronger MoE demand. Users whose cognition has been further MoE-ified generate even more intense demand for specialized AI. The narrower the demand, the greater MoE’s advantage and the lower the return on Dense investment.

This is the complete mechanism of self-replication: MoE-ified humans produce MoE-ified AI, and MoE-ified AI produces more MoE-ified humans. No conspiracy is needed, no malice—only the positive feedback of market incentives. Each turn of the spiral further marginalizes Dense-style general intelligence in the market.
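The positive-feedback character of the six links can be sketched as a toy difference equation. The dynamics below are an assumption made purely for illustration: specialization is collapsed into a single number between 0 and 1, and each turn of the spiral adds an increment proportional to both the current level and the remaining headroom. The paper itself specifies no functional form, and `feedback_gain` is a hypothetical parameter.

```python
from typing import List


def simulate_spiral(initial_specialization: float,
                    feedback_gain: float,
                    turns: int) -> List[float]:
    """Iterate s <- s + g * s * (1 - s), a toy model of self-reinforcing specialization."""
    if not 0.0 <= initial_specialization <= 1.0:
        raise ValueError("initial_specialization must lie in [0, 1]")
    if not 0.0 <= feedback_gain <= 1.0:
        raise ValueError("feedback_gain must lie in [0, 1]")
    levels = [initial_specialization]
    for _ in range(turns):
        s = levels[-1]
        # Demand grows with existing specialization and saturates near 1.
        levels.append(s + feedback_gain * s * (1.0 - s))
    return levels


trajectory = simulate_spiral(initial_specialization=0.2, feedback_gain=0.3, turns=20)
for turn in (0, 5, 10, 20):
    print(f"turn {turn:2d}: specialization = {trajectory[turn]:.3f}")
```

With any positive gain the level rises on every turn and never reverses on its own, which mirrors the claim that no conspiracy is needed. Setting `feedback_gain` to zero freezes the level, the toy analogue of cutting one link in the chain.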

V. The “Cognitive Divergence” Closed Loop

There are currently signs that AI summary and precision-answer systems are changing users’ information exploration behavior. Multiple studies and media reports in 2025 show that when search engines directly present AI summaries rather than link lists, users’ frequency of clicking external sources drops significantly. “Serendipitous discovery”—a critical trigger for cross-domain creativity—is being systematically reduced by AI’s precision-answer systems.

Whether this behavioral change leads to an actual long-term decline in cross-domain cognitive capability must be verified by longitudinal studies (see Prediction Three in Chapter X). But the direction of the trend is consistent with the DASRS spiral’s prediction: AI replaces exploratory behavior → exploratory capability weakens from disuse → dependence on AI precision answers intensifies.

VI. The DASRS Bias in Investment Logic

6.1 Venture Capital Prefers Quantifiable Narrow Domains

Venture capital’s evaluation framework inherently favors MoE-style products. The “TAM/SAM/SOM” model requires entrepreneurs to define a specific, quantifiable target market. “An AI-assisted diagnostic tool for radiologists” allows precise TAM calculation. “A general reasoning enhancement tool for everyone” has no calculable TAM, because “reasoning enhancement” is not a purchasable category.
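The asymmetry is easier to see when the TAM/SAM/SOM funnel is written out. The sketch below uses the common top-down convention (TAM as buyers times annual price, SAM and SOM as successive fractions); every number in the radiology example is a hypothetical placeholder, not market data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketSizing:
    tam: float  # total addressable market, annual revenue
    sam: float  # serviceable addressable market
    som: float  # serviceable obtainable market


def size_market(potential_buyers: int,
                annual_price_per_buyer: float,
                serviceable_fraction: float,
                obtainable_share: float) -> MarketSizing:
    """Top-down sizing: each stage narrows the previous one by a fraction."""
    for name, value in (("serviceable_fraction", serviceable_fraction),
                        ("obtainable_share", obtainable_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    tam = potential_buyers * annual_price_per_buyer
    sam = tam * serviceable_fraction
    som = sam * obtainable_share
    return MarketSizing(tam=tam, sam=sam, som=som)


# Hypothetical inputs for a narrow-domain product; none of these are real figures.
radiology_tool = size_market(potential_buyers=10_000,
                             annual_price_per_buyer=5_000.0,
                             serviceable_fraction=0.5,
                             obtainable_share=0.1)
print(radiology_tool)
```

Every argument of `size_market` has an obvious source for a radiology tool: a headcount, a price, a reachable share. For a general reasoning tool the first argument is already undefined, so the funnel cannot even be started.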

6.2 Funding Misallocation in Basic Research

The DASRS spiral leads to systematic funding misallocation: returns on capital invested in MoE/vertical AI are quantifiable; returns on capital invested in Dense/general reasoning research are not quantifiable. The result: the MoE path receives disproportionate funding support.

6.3 Dense Research as a Public Good

Dense-style general intelligence research has the classic characteristics of a public good: high long-term social value, low short-term private returns, difficulty attributing to any specific vertical market, and large positive externalities that are hard to internalize. Economics predicts that public goods will be systematically undersupplied by markets—this is precisely the funding predicament facing Dense/general reasoning research.

MoE/vertical AI is a technology with clear private returns—quantifiable ROI, definable TAM, and constructible industry data moats. Dense/general reasoning is a technology with high public returns but unstable private returns. Therefore, market mechanisms will systematically overinvest in the former and underinvest in the latter. The DASRS spiral is not merely the result of cognitive bias but also the result of market failure.

The entire industry claims to pursue AGI, but the vast majority of investment flows to specialized applications. Not because AGI is unimportant, but because the DASRS spiral prevents MoE-ified evaluators from correctly valuing the long-term returns of Dense-style research. Breaking the spiral requires more than a change in mindset—it requires institutional intervention: directed public funding for Dense basic research, redesign of benchmark systems, and academic evaluation that protects cross-domain originality (discussed in detail in Paper Eight).

VII. Sociology of Technology Perspective: Extending the Winner Thesis

7.1 “Artifacts Have Politics”

Langdon Winner (1980), in his classic paper “Do Artifacts Have Politics?”, argued that technological artifacts are not neutral tools—they embody the values and social relations of their designers, and once deployed, they in turn shape society.

This paper extends the Winner thesis to AI architecture: MoE architecture embodies the cognitive structure of human division-of-labor society—not because designers intended it, but because they themselves are products of a division-of-labor society, and their design intuitions reflect their already MoE-ified cognitive patterns. Once deployed, MoE architecture deepens society’s degree of division through the DASRS spiral.

7.2 The Architecture Version of Kuhnian Paradigms

Kuhnian Paradigm Structure	MoE Dominance Correspondence
Core paradigm	“Scaling + specialization = intelligence”
Normal science (puzzle-solving within the paradigm)	MoE routing optimization, load balancing, expert merging, sparsity compression
Anomalous signals	“Mixture of Parrots” (reasoning doesn’t scale), “Seeing but Not Thinking” (routing distraction), hallucinations persist despite scaling
Anomalies explained away	“Fix with more data,” “fix with better routing,” “fix with RLHF”
Crisis accumulation	Scaling curves begin to plateau on the reasoning dimension
Paradigm revolution (not yet occurred)	Acknowledging that the Dense thinking system and MoE execution system require different architectures

VIII. Conditions for Breaking the Spiral

8.1 Demand-Side Intervention: Redefining “Intelligence”

If users begin demanding cross-domain reasoning capability from AI rather than domain knowledge depth, market signals will change. This requires transformation of educational systems (the anti-MoE-ification educational principles discussed in Paper Three) and resetting of evaluation criteria.

8.2 Supply-Side Intervention: Separating Thinking and Execution

If AI architecture design explicitly distinguishes the Dense thinking system from the MoE execution system (detailed in Paper Two), product forms will change. Users will no longer face “a narrow-domain tool” but rather “a thinking system that dispatches multiple narrow-domain experts.”

8.3 Evaluation-Side Intervention: Incorporating Dense Metrics into Benchmark Systems

If cross-domain analogy, metacognition, and creative synthesis become standard benchmarks, the advantages of Dense architectures will become quantifiable, and investment logic will shift accordingly.

8.4 Internal Cracks in the Spiral: Counter-Movements Already Underway

The DASRS spiral is not hermetically sealed. When scaling curves begin to plateau on the reasoning dimension, a small subset of researchers not entirely captured by the market’s short-term TAM logic initiates Dense-path exploration.

OpenAI’s o1 series models, through reinforcement-learning-driven long chains of thought, consume substantial compute during inference to perform planning, reflection, and self-correction—this is essentially simulating “Control Dense thinking” (the third level of Dense as defined in Paper Two) within the autoregressive architecture. The rise of test-time compute indicates that pure pretraining scaling is no longer sufficient and that a slow deliberative system needs to be introduced during the reasoning phase.

But the key question is: do these counter-movements constitute genuine paradigm shifts, or are they merely “patches” within the MoE paradigm? Long chains of thought still run inside the same autoregressive, token-by-token decoding loop of a single model, without achieving true functional separation of Dense and MoE. If they are ultimately productized as “reasoning-enhanced vertical AI” rather than “general cross-domain thinking systems,” they will still be reabsorbed by the DASRS spiral as a more sophisticated form of MoE-ification rather than a genuine Dense return.

Whether the spiral can be broken depends on whether these counter-forces are directed toward functional separation (the asynchronous dual-loop where Dense controls MoE), or recaptured by commercialization pressure as more refined narrow-domain execution tools. Whether o1 is a crack in the spiral or a more advanced patch—this question itself is a testable prediction of DASRS theory.

IX. This Paper’s Position Within the Framework

The first three papers established the core structure: the general theory (Paper One), architecture separation (Paper Two), and individual cognitive MoE-ification (Paper Three). This paper extends the analytical scale from the individual to the social system—arguing that cognitive MoE-ification is not the result of individual choice, but the product of a self-replicating spiral among social institutions, market incentives, and technical architectures.

The existence of this spiral implies that the greatest barrier to AGI may not be technical but social—a species already deeply MoE-ified, whose market mechanisms, evaluation criteria, and investment logic systematically exclude Dense-style general intelligence research. Overcoming this barrier requires not more parameters or better routing algorithms, but a redefinition of the question “What is intelligence?”—and institutional protection of Dense basic research as a public good.

Humans remodeled their own brains according to division-of-labor logic. Then designed AI with those remodeled brains. Then AI remodeled users in the designers’ image. Then users generated more of the same demand. This is a complete closed loop in which a species’ cognitive structure self-replicates through technological mediation. The only way to break it is: at some link in the chain, someone thinks about the problem itself in a Dense way. This paper is such an attempt.

X. Testable Predictions of the Framework

Prediction One: In industries with more mature vertical AI procurement, the weight of cross-domain reasoning metrics in AI evaluation criteria should be lower—i.e., market MoE-ification degree positively correlates with narrowing of evaluation criteria.

Prediction Two: In VC financing, TAM quantifiability should be a stronger predictor of funding success than the model’s general reasoning capability—holding team background and technical depth constant.

Prediction Three: Practitioners who have long used a single vertical AI tool should search for cross-domain information less and less frequently, in line with Prediction Five of Paper Three. If no such decline is observed, the hypothesis that AI accelerates cognitive MoE-ification requires revision.

Prediction Four: Organizations that introduce Dense-style cross-domain AI assistants should outperform same-industry organizations that only introduce vertical AI automation on 3-year innovation output metrics.

Prediction Five: If mainstream benchmark systems incorporate cross-domain transfer and creative synthesis metrics, the share of research investment in Dense or Dense-controller architectures should rise. If the share remains unchanged after the benchmark change, the inertia of the DASRS spiral is stronger than this paper predicts.
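As a sketch of how Prediction One might be operationalized, the example below computes a Pearson correlation between a procurement-maturity index and the weight that evaluation rubrics give to cross-domain reasoning. Both variables, their scales, and every data point are hypothetical placeholders introduced here; a real test would need an agreed coding scheme for procurement documents and a much larger sample.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Placeholder data, one entry per industry; invented for illustration only.
procurement_maturity = [0.2, 0.4, 0.5, 0.7, 0.9]        # hypothetical 0-1 index
cross_domain_weight = [0.30, 0.25, 0.20, 0.10, 0.05]    # hypothetical rubric share

r = pearson_r(procurement_maturity, cross_domain_weight)
print(f"r = {r:.2f}")
```

Prediction One expects a clearly negative coefficient on real data. A coefficient near zero or above it would count against the prediction.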

※ Core References

[1] Smith, A. (1776). An Inquiry into the Nature and Causes of the Wealth of Nations.

[2] Winner, L. (1980). Do Artifacts Have Politics? Daedalus.

[3] Kuhn, T.S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

[4] Jelassi, S. et al. (2024). Mixture of Parrots. ICLR 2025.

[5] Xu, H. et al. (2026). Seeing but Not Thinking. arXiv:2604.08541.

[6] Grand View Research (2025). Enterprise Generative AI Market & AI Agents Market Reports.

[7] DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437.

[8] Fedus, W. et al. (2022). Switch Transformers. JMLR.

[9] Jiang, A. et al. (2024). Mixtral of Experts. arXiv:2401.04088.

[10] arXiv (2026). The Cognitive Divergence: AI Context Windows and Human Attention.

[11] Dane, E. (2010). Reconsidering the Trade-off Between Expertise and Flexibility. AMR.

[12] Di Santi, E. (2026). Cognitive Amplification vs Cognitive Delegation. arXiv:2603.18677.

[13] Taylor, F.W. (1911). The Principles of Scientific Management.

[14] Durkheim, É. (1893). De la division du travail social.

[15] The Guardian (2025). AI summaries cause devastating drop in online news audiences.

[16] OpenAI (2024). Learning to Reason with LLMs (o1 Technical Report).