Thought Paper · V3 · May 2026

Markdown vs HTML

An Architectural Theory of Information Formats in the Age of Multimodal AI


Published May 12, 2026
Category Original Thought Paper
Domains Information Theory · AI Architecture · Multimodal · Token Economics
Version V3
이조글로벌인공지능연구소
LEECHO Global AI Research Lab
&
Claude Opus 4.6 · Anthropic

Abstract

In May 2026, Anthropic engineer Thariq Shihipar published “The Unreasonable Effectiveness of HTML,” arguing that HTML should replace Markdown as AI’s default output format—sparking widespread industry debate. This paper conducts an architectural analysis of the Markdown and HTML technology paths along the AI development timeline—past, present, and future—demonstrating that no substitution relationship exists between them. Rather, they are complementary components serving different functional layers within the multimodal AI organism. The paper finds that two seemingly opposed figures—Andrej Karpathy, who champions Markdown, and Thariq Shihipar, who advocates for HTML—are in fact each describing a different facet of the same multilayer architecture: the former is the spokesperson for the storage layer, the latter for the execution layer. The paper advances three core propositions: (1) Markdown is the native carrier of the LLM’s text prediction core and is irreplaceable; (2) HTML is the inevitable path for multimodal interaction and is indispensable; (3) voice is the future dominant mode of multimodal interaction. Together, these three constitute the complete “Interaction Layer · Storage Layer · Execution Layer” architecture, whose alignment is the emergence of multimodal AI capabilities.

Keywords:
Markdown · HTML · Multimodal AI · Token Economics · Information Entropy · Reasoning Tokens · Voice Interaction · Mathematical Formalization
Section I

A Debate That Needs Reframing

On May 8, 2026, Thariq Shihipar, engineering lead for Claude Code at Anthropic, published a post that set social media ablaze[1]. Within 16 hours, it garnered over 4.4 million views, 8,200 likes, and 15,700 bookmarks[2]. Its central claim: the era of Markdown as AI’s default output format should end, and HTML is the superior alternative. Almost simultaneously, Andrej Karpathy’s CLAUDE.md file earned over 60,000 stars on GitHub[3], establishing a complete AI knowledge management paradigm built entirely in Markdown[4].

The industry framed this as a “format war.” However, this paper will argue that this is not a war of “who replaces whom,” but an evolutionary process that requires a timeline to understand—Markdown’s dominance stems from the LLM’s text prediction core, HTML’s rise responds to multimodal interaction demands, and voice will become the primary channel for future human-AI interaction. The three appear sequentially on the timeline and each finds its place within the architecture.


Part I
The Past: Establishment of the Markdown Era
2022 – 2023

Section II

The Text Prediction Core and the Inevitability of Markdown

The entire capability of large language models rests on a single foundational mechanism: next-token prediction. From GPT-1 to GPT-4, from Claude to Gemini, the pretraining objective of every LLM is to predict the next token given the preceding context. This mechanism determines the LLM’s “native language”—the content it generates inevitably carries the distributional characteristics of its training corpus.

And the formatted text distribution in training corpora is overwhelmingly skewed toward Markdown[18]. GitHub README.md files, arXiv LaTeX papers, Stack Overflow technical Q&As, developer documentation—virtually all of the highest-quality structured text on the internet exists in Markdown or its close relatives. Markdown is the LLM’s “mother tongue”—they generate cleaner, more consistent Markdown than any other format[9]. This cannot be changed through retraining, because it reflects the content distribution of the internet itself[18].

When ChatGPT launched in November 2022, GPT-3.5 defaulted to outputting text in Markdown syntax[17]. This was not a design choice by OpenAI but a natural consequence of the training data. Karpathy later drew an analogy between the structure of a textbook and the three stages of AI training: “exposition of content equals pretraining, worked examples equal supervised fine-tuning, exercise problems equal the reinforcement learning environment”[5]—and the carrier for all of these is Markdown.

In April 2026, Karpathy pushed this vision to its logical extreme. He released the “LLM Wiki”—a zero-code architectural pattern in which the LLM actively compiles, maintains, and cross-links Markdown files to build a self-healing knowledge base[29]. The post garnered 16 million views within 48 hours. His wiki grew to approximately 100 articles and 400,000 words in a single research domain—with virtually no manual editing[30]. VentureBeat called Markdown “the most LLM-friendly and compact data format”[29]. Karpathy’s own framework consists of three layers: raw sources (immutable factual sources) → Wiki (LLM-maintained Markdown knowledge layer) → Schema (CLAUDE.md as an operational contract)[31]. This is structurally isomorphic with the three-layer architecture proposed in this paper.

More critically, Karpathy redefined the central question for 2026: “The interesting question is no longer ‘how to make models smarter,’ but ‘how to structure the information that models can access.'”[29] Markdown’s value lies not in aesthetic formatting but in being the optimal carrier for information structuring.

The core of the LLM is text prediction. The training data for text prediction is overwhelmingly distributed in Markdown. Therefore, Markdown is not an interchangeable “output format”—it is the foundational encoding of the LLM’s cognitive structure. Changing an LLM’s output format preference is tantamount to changing the content distribution of the internet itself—practically impossible.

Section III

Information Density and Math Nativeness: Markdown’s Double Moat

In the early LLM era of 2022–2023, context windows were extremely limited (GPT-3.5 offered only 4,096–8,192 tokens[8]), making every token precious. Under this constraint, Markdown’s information compression advantage was decisive.

- Token compression ratio: the same content rendered in HTML vs Markdown[8]
- Full-document savings: 40% lower token consumption for Markdown vs HTML[9]
- Table accuracy: 60.7% with Markdown vs 53.6% with HTML[9]

But information density is only Markdown’s first moat. The second runs deeper: the ecosystem nativeness of mathematical formulas. Strictly speaking, the Markdown core specification (CommonMark) does not include math syntax. However, through LaTeX extensions ($...$ and $$...$$), the Markdown ecosystem—GitHub Flavored Markdown, Obsidian, Typora, Notion—has developed math formula support into a de facto standard. $E=mc^2$ expresses a physical law in 5 characters; HTML requires importing MathJax/KaTeX libraries and using verbose MathML tags. According to a 2025 developer documentation survey, 35% of technical documents contain math formulas[10]—and this proportion is accelerating.
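The markup-overhead gap can be made concrete with a toy measurement. The sketch below renders the same small table in both formats and compares sizes; character counts are a crude stand-in for tokens, and the exact ratio depends on the tokenizer (the figures cited above come from [8][9]):

```python
# Rough illustration of the markup-overhead gap discussed above.
# Character counts are used as a crude proxy for tokens; real ratios
# depend on the tokenizer.

md = "# Results\n\n| Model | Accuracy |\n|-------|----------|\n| A     | 60.7%    |\n"
html = (
    "<h1>Results</h1>\n"
    "<table>\n"
    "  <tr><th>Model</th><th>Accuracy</th></tr>\n"
    "  <tr><td>A</td><td>60.7%</td></tr>\n"
    "</table>\n"
)

ratio = len(html) / len(md)
print(f"Markdown: {len(md)} chars, HTML: {len(html)} chars, ratio: {ratio:.1f}x")
```

Tag-heavy HTML also tends to tokenize less efficiently than punctuation-light Markdown, so token-level ratios are typically worse for HTML than the raw character ratio alone suggests.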

At this stage, using HTML as an AI output format was economically infeasible and technically unnecessary. Markdown’s dominance was the inevitable result of three converging factors: LLM training data distribution, token economic constraints, and math ecosystem affinity.


Part II
The Present: The Rise of HTML and the Format Debate
2024 – 2026

Section IV

The Rise of HTML as AI Output Format

2024 was the turning point. Two events occurred simultaneously: token prices collapsed, and a paradigm shift emerged in AI interaction interfaces.

On the pricing side, GPT-4o mini launched in July 2024, driving the cost of “good enough” AI reasoning below $1 per million tokens[14]—just 16 months after GPT-4 debuted at $30 per million tokens. The economic constraint on tokens was dramatically relaxed; HTML’s 3× token overhead was no longer a prohibitive cost.

On the interface side, Anthropic released Claude Artifacts on June 20, 2024[21]—the AI industry’s first instance of rendering HTML as an interactive output interface. Users could view, interact with, and iterate on HTML content in real time within a side panel next to the conversation. This was not simply outputting code in a chat window but the creation of an entirely new AI delivery paradigm. Since launch, users have created over 500 million Artifacts.

Competing platforms quickly followed suit: OpenAI’s Canvas (October 2024), Microsoft’s Copilot Pages (September 2024), and Google’s Gemini Canvas (March 2025)[21]. HTML transformed from “a format AI could theoretically generate but with little practical use” to “an interactive output format with dedicated rendering containers.”

This was HTML’s historic breakthrough as an AI output format—it did not replace Markdown but opened a dimension that Markdown had never covered: visual interaction.


Section V

The Iceberg Structure of Token Economics

Yet, during the very same period when HTML output was becoming economically viable, a far deeper structural transformation was underway in token economics—the birth of reasoning tokens.

In September 2024, OpenAI introduced invisible “reasoning tokens” in the o1 model—tokens representing the model’s internal thought process, billed at output rates but absent from the final reply[13]. This was the structural fracture point of the entire token economy: the tokens users see are no longer equal to the tokens consumed.

A single query to o3, producing 500 visible output tokens, may actually consume 2,000 to 5,000 reasoning tokens behind the scenes[14]. Reasoning tokens can multiply the cost of a single query by 5–50×[15]. The user sees “50-word input → 200-word reply,” but what actually occurs is “50-word input + 5,000-word internal reasoning + 200-word reply.”
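The arithmetic behind this invisibility is simple. A hedged sketch, using a hypothetical flat $10-per-million rate rather than any vendor's actual pricing:

```python
# Illustrative arithmetic for the hidden-token "iceberg" described above.
# The price is a hypothetical round number, not any vendor's actual rate.
PRICE_PER_TOKEN = 10 / 1_000_000  # assume $10 per million output-rate tokens

visible_output = 500   # tokens the user actually sees
reasoning = 4_000      # hidden reasoning tokens (within the 2,000-5,000 range cited)

naive_cost = visible_output * PRICE_PER_TOKEN
true_cost = (visible_output + reasoning) * PRICE_PER_TOKEN

print(f"naive: ${naive_cost:.4f}, true: ${true_cost:.4f}, "
      f"multiplier: {true_cost / naive_cost:.0f}x")
```

With 4,000 hidden reasoning tokens, the true cost is 9x the naive estimate; differing per-model rates and retries push real-world multipliers toward the 5–50× range cited above.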
Period | Token Structure | User-Visible Share | User Perception
2022.11 – 2024.06 | Input + Output = Total | 100% | What you see is what you pay for (transparent)
2024.07 (price collapse) | Input + Output | 100% | "AI is getting cheaper!"
2024.09 (o1 launch) | Input + Reasoning (hidden) + Output | ~30% | "Why is it expensive again?"
2025 (reasoning becomes standard) | Input + System + History + Reasoning + Output | ~5% | "It's more powerful now, so why isn't it cheaper?"
2026 (Agent era) | Input (1%) + Hidden layers (95%) + Output (4%) | ~5% | "A simple task costs this much?"
Figure 1: The historical evolution of token visibility. Unit prices dropped 280×[24], yet users perceive increasing costs.

In one tracked Claude session, a user's 14-token prompt cost $0.0018 at turn 1 and approximately $2.41 by turn 260, a 1,339× increase driven purely by the growth of conversation history. The user's own input tokens constituted only about 1.3% of all tokens processed[16].

This structural change fundamentally alters the premises of the format debate. When reasoning tokens account for over 95% of total cost, the 40% token savings from choosing Markdown over HTML on the output side becomes a rounding error in the total bill. Anthropic’s own data shows that “computing and mathematics” tasks increased by 14% in API usage while decreasing by 18% in the chat interface[23]—tool-oriented usage is replacing conversational interaction, and the criteria for format selection have shifted from “save tokens” to “maximize information density” and “functional fitness.”
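The "rounding error" claim follows from straightforward arithmetic. A sketch using the approximate shares from Figure 1 (illustrative, not measured):

```python
# Why a 40% output-side format saving barely moves the total bill once
# hidden tokens dominate. Shares follow the approximate 95% hidden /
# 4% output / 1% input split sketched in Figure 1; all numbers are
# illustrative, not measured.
hidden_share = 0.95   # reasoning, system prompts, conversation history
output_share = 0.04   # visible output tokens
input_share = 0.01    # the user's own prompt

format_saving_on_output = 0.40  # Markdown vs HTML, per the text above
total_bill_reduction = output_share * format_saving_on_output

print(f"Total-bill reduction from the format choice: {total_bill_reduction:.1%}")
```

Even a 40% saving on the visible output trims only about 1.6% from the total bill once hidden layers account for ~95% of tokens consumed.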


Section VI

Two Advocates for Different Layers of the Same Architecture

With the technology evolution timeline and the structural changes in token economics now understood, a pattern far more precise than “format war” emerges: Karpathy and Thariq are not opposing factions but advocates for two different layers of the same multilayer architecture—one describing the storage layer, the other the execution layer.

6.1 Karpathy: Advocate for the Storage Layer

Andrej Karpathy—OpenAI founding member, former Tesla AI Director. His LLM Wiki defines Markdown as AI’s knowledge management layer: having the LLM actively compile, maintain, and cross-link Markdown files to build a self-healing knowledge base[4][29]. For him, Markdown is AI’s memory medium—the carrier for information persistence, structuring, and retrieval.

But Karpathy himself is not “anti-HTML.” In his 2025 year-end review, he explicitly stated: “Text is the raw/favored data representation for computers (and LLMs), but it is not the favored format for people. People like to consume information in visual and spatial ways. LLMs should use our preferred formats to communicate with us—images, infographics, slides, whiteboards, animations, web apps, etc.”[32] He acknowledged that Markdown and emoji are merely “early versions” of this trend—the visual “dressing up” of text.

6.2 Thariq: Advocate for the Execution Layer

Thariq Shihipar's professional background reveals why he focuses on the execution layer: a University of Toronto graduate, he did product development at Rocket Insights[6], founded a YC-backed video game company that he ran for five years, pursued graduate studies at MIT Media Lab, and joined Anthropic roughly a year ago[7]. His 20 HTML examples (code review panels, data visualizations, interactive comparisons) are all consumer-side experience optimizations, corresponding precisely to Karpathy's statement that "LLMs should use our preferred formats to communicate with us."

Dimension | Karpathy (Storage Layer Advocate) | Thariq (Execution Layer Advocate)
Layer of focus | AI's internal knowledge management and persistence | AI's delivery to and interaction with humans
Core question | "How to structure the information models can access" | "How to help humans better consume AI output"
Professional background | AI/ML research, pretraining, deep learning | Web products, gaming, SaaS
Attitude toward the other layer | Acknowledges "LLMs should use visual formats to communicate with humans"[32] | Acknowledges "if the reader is a model, use Markdown"[8]
Essential role | Infrastructure layer: helping AI think better | Application layer: helping humans receive better
Figure 2: The two representative figures are not opposing factions but advocates for two layers of the same architecture; their views implicitly align when each acknowledges the other's layer.

This finding transforms the nature of the entire debate. This is not a format war of “Markdown camp vs. HTML camp”—it is two people standing on different floors of the same building, each describing the view they see from their vantage point. Karpathy sees the foundation (how knowledge is structurally stored), while Thariq sees the façade (how knowledge is consumed by humans). When we superimpose the two layers, what emerges is the multilayer architecture proposed in this paper. It should be noted that this observation is based on the public statements of two representative figures; larger-sample validation is a direction for future work.


Section VII

AI as Entropy Reduction and the Trend Toward Formalization

Behind the current format debate, a deeper structural force is at work: AI is fundamentally an entropy reduction machine—it ingests high-entropy human language input, extracts its structure (causal relationships, constraints, variable dependencies), and outputs low-entropy structured information. The entropy-based training framework ENTRA has demonstrated that by suppressing redundant content in LLM reasoning, output length can be reduced by 37%–53% while accuracy actually improves[22].

So where does entropy reduction point? In formalizable knowledge domains, the answer is mathematical formulas. F=ma—three symbols compress the entirety of Newtonian mechanics’ laws of motion. E=mc²—five symbols unify mass and energy. Mathematical formulas represent the information density ceiling for this class of knowledge.

$$\lim_{t \to \infty} H(\text{AI output})_{\text{formalizable}} = H(\text{math})$$

In formalizable knowledge domains, the information entropy of AI output trends toward the information entropy of mathematical expression.

This trend is not AI’s “subjective choice” but a structural inevitability. OpenAI has demonstrated that AI hallucinations are mathematically unavoidable[11]. Natural language output cannot be structurally verified, but formalized mathematical expressions can be mechanically verified by theorem provers such as Lean[12]. In the “First Proof” challenge in early 2026, AI autonomously solved more than half of the research-level mathematical problems within one week[26]. An increasing share of AI users have tool-oriented needs—they want not conversation but verifiable results.
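The contrast between unverifiable prose and mechanically checkable output can be made concrete. A minimal Lean 4 example (trivial by design; research-level formalizations are far larger, but the checking guarantee is the same):

```lean
-- Lean refuses to compile this file unless the proof actually holds,
-- which is the structural verifiability that natural-language output lacks.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```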

It is not that AI subjectively wants to move toward formalization—rather, AI’s structure inevitably slides in that direction. As AI output becomes increasingly mathematized, Markdown + LaTeX’s native math ecosystem becomes the source of its long-term structural advantage. But it must equally be noted: knowledge in law, ethics, aesthetics, and narrative resists mathematical formalization; in these domains, structured natural language (Markdown) remains the optimal carrier.

Part III
The Future: Voice Interaction and Multimodal Architecture
2026 →

Section VIII

Voice: The Inevitable Dominant Mode of Human-AI Interaction

When we raise our gaze from the format layer to the interaction layer, a far larger paradigm shift comes into view: text is yielding to voice as the primary channel for human-AI interaction.

By 2026, the global AI voice assistant market is projected to exceed $50 billion, and 87.5% of developers are actively building voice agents[19]. Voice is humanity’s oldest interface—predating writing and typing by millennia[20]. Humans speak at approximately 150 words per minute; they type at approximately 40. The keyboard is becoming a secondary input method[20]. Karpathy himself noted: “People don’t actually like reading text—it is slow and effortful”[32].

This means the Markdown vs. HTML debate—fundamentally a dispute between two text markup languages—is being superseded by a larger trend. When humans interact with AI via voice, they write neither Markdown nor HTML. Both formats retreat to the “backend implementation” layer—just as no one today cares whether a browser internally uses binary or hexadecimal.

It should be noted that voice is the dominant interaction mode of the near future, but not necessarily the ultimate one. As of early 2026, Neuralink has implanted brain-computer interface devices in at least 21 patients[33], and BCI technology is transitioning from the laboratory to consumer electronics. Longer-term interaction paradigms may transcend voice, directly linking human cognition with AI. But within the foreseeable future (2026–2035), voice is the most realistic and largest-scale dominant interaction channel.

If the entire debate is about "what format should AI use to output to humans," then the voice era's answer is: speak it to them. The format debate is not a question of which format is better; it is an argument over a question that is itself receding.

Section IX

The Incompressibility of Voice Data and Multimodal Alignment

Voice is not an inefficient version of text. It is an independent information dimension.

The phrase “I’m fine” conveys reassurance in a calm tone, sarcasm in a flat tone, and frustration in a sharp, elevated tone—these distinctions are irrecoverable without acoustic information. Neuroscience evidence indicates that tonal prosody is a critical channel for emotion decoding[25].

When speech is transcribed to text, emotion, hesitation, tempo changes, and breathing pauses are all compressed into flat textual symbols. This means that voice signals have irreplaceable value for multimodal AI training: if only the text transcript is preserved while the original audio is discarded, the model can never learn cross-modal relationships like “falling intonation + pause = hesitant agreement ≠ firm agreement.”

The core architecture of multimodal AI confirms this. Alibaba’s Qwen2.5-Omni separates reasoning and expression into two components—the “Thinker” processes all input modalities in the text domain and produces reasoning, while the “Talker” converts reasoning results into streaming audio in real time[27]. The textual relational structure is the anchor point for reasoning, and other modalities align to this anchor—but alignment presupposes that the original data of other modalities has been preserved. Interaction without data preservation is pure consumption—it produces none of the assets needed for AI evolution.
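The Thinker/Talker division of labor can be sketched in a few lines. This is an illustrative toy, not Qwen2.5-Omni's actual architecture or API; the class names and the prosody heuristic are invented for the example:

```python
# Illustrative toy of the Thinker/Talker division of labor described
# above -- NOT Qwen2.5-Omni's real architecture or API. Names and the
# prosody heuristic are invented for this sketch.
from typing import Iterator

class Thinker:
    """Maps every input modality into the text domain and reasons there."""
    def reason(self, text: str, pitch_contour: list[float]) -> str:
        # A real model learns cross-modal alignment; this stand-in just
        # shows that the acoustic channel changes the interpretation.
        falling = bool(pitch_contour) and pitch_contour[-1] < pitch_contour[0]
        reading = "hesitant agreement" if falling else "firm agreement"
        return f"{text!r} interpreted as {reading}"

class Talker:
    """Streams the Thinker's textual result back out as audio chunks."""
    def speak(self, reasoning: str) -> Iterator[bytes]:
        for word in reasoning.split():
            yield word.encode()  # stand-in for synthesized audio frames

thinker, talker = Thinker(), Talker()
result = thinker.reason("I'm fine", pitch_contour=[0.8, 0.5, 0.2])
chunks = list(talker.speak(result))
print(result)
```

Note that the Thinker only sees the acoustic signal if it was preserved; discard the audio and the "hesitant vs firm" distinction is unrecoverable, which is the alignment point the paragraph above makes.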


Section X

Markdown’s Ceiling and HTML’s Multimodal Role

In the multimodal context, Markdown has a structural ceiling: it cannot carry audio and video. Markdown’s native capability boundary is plain text plus image reference links. To embed a voice clip in an .md file, the only option is to fall back to inline HTML tags—which precisely illustrates the hierarchical relationship between the two.
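A minimal illustration of that fallback (filenames are placeholders):

```markdown
## Meeting notes

Native Markdown stops at text and image references:

![whiteboard photo](whiteboard.png)

To embed the recording itself, the only recourse is an inline HTML tag:

<audio controls src="meeting.mp3"></audio>
```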

HTML, by contrast, is a native multimodal container—<audio>, <video>, <canvas>, <svg>, <script>—all natively supported. In the world of multimodal AI, HTML is the only web markup language capable of simultaneously carrying text, audio, video, images, and interactivity.

If AI remains confined to the text-only world, it will forever be merely a tool for programmers and researchers. To penetrate the mass consumer market, AI must become multimodal—and the container for multimodal content on the web can only be HTML. This is not HTML “replacing” Markdown but rather HTML covering dimensions that Markdown cannot reach. It should also be noted that in native mobile applications (React Native, Flutter, SwiftUI), HTML is not the only option—but within the web ecosystem, it is irreplaceable.

HTML’s irreplaceability manifests in yet another dimension: it is the foundation of Progressive Web Apps (PWA). PWAs are built on HTML/CSS/JavaScript and possess three key capabilities: installability, offline availability, and push notifications[34]. PWAs can be indexed by search engines and more easily comply with Web accessibility standards—both global priorities in 2026. This means HTML can not only carry multimodal content but also turn AI deliverables into installable offline applications, make them discoverable by search engines, and ensure accessibility compliance—capabilities entirely absent from Markdown.

With this, the positioning of the two formats finally becomes clear:

Markdown = Carrier of text and textual training information = LLM Text Prediction Core

HTML = Carrier of multimodal training and interactive information = Multimodal Perception & Consumer Reach

Alignment of the two = Emergence of Multimodal AI Capabilities
Figure 3: Markdown and HTML are not in competition—they are the two alignment pipelines of multimodal AI

Section XI

Three-Layer Architecture: Interaction · Storage · Execution

Synthesizing the analysis across three temporal dimensions—the past (Markdown’s LLM nativeness), the present (HTML’s rise in multimodal interaction, the iceberg-ification of token economics), and the future (the inevitable dominance of voice interaction)—this paper proposes a three-layer architecture for information formats in the multimodal AI era:

Interaction Layer: Voice
The primary channel between humans and AI — Humanity’s most natural information input/output modality — The inevitable dominant interaction form of the future
Storage Layer: Markdown + LaTeX + Raw Multimodal Data (Audio/Image/Video)
Textual backbone (logic/formulas/reasoning chains) + Multimodal raw signals (emotion/tone/visuals) — The irreplaceable persistence layer
Execution Layer: HTML
Multimodal container and task delivery interface — Text+audio+video+images+interactivity — The inevitable path for consumer reach
Training Feedback Loop: All data feeds back into next-generation model training
Voice signals → Multimodal alignment | Structured text → LLM pretraining | Interaction behavior → RLHF | Formulas/code → Reasoning capability

Each layer uses the format best suited to its functional role, rather than forcing a single format to rule all layers. Voice optimizes interaction efficiency (maximizing human input bandwidth), Markdown optimizes information persistence (compression density, math nativeness, searchability, version control), and HTML optimizes delivery experience (multimodal rendering, visualization, interactivity).

Moreover, data from every layer—voice’s emotional signals, Markdown’s structured text, HTML’s user interaction behavior—must all be preserved and fed back into model training. Interaction without data preservation is pure consumption. AI’s evolution depends on the continuous accumulation of data assets from every layer.


Section XII

Counterarguments and Responses

12.1 “Markdown essentially compiles to HTML”

This is a fact. When John Gruber created Markdown, he explicitly defined it as a “text-to-HTML conversion tool.” From a technical standpoint, Markdown is HTML shorthand. But this does not undermine the arguments in this paper—just as assembly language compiles to machine code, yet we still distinguish their usage layers. Markdown and HTML serve different stages in the data lifecycle: one optimizes writing and storage, the other optimizes rendering and consumption. A compilation relationship does not equal functional equivalence.
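The compilation relationship is easy to see with a toy converter. This sketch handles only headings and bold emphasis; real converters (Gruber's original Markdown.pl, CommonMark implementations) cover far more, but the layer relationship is the same:

```python
# Toy converter illustrating the "Markdown compiles to HTML" relationship.
# Handles only two constructs (ATX headings and bold emphasis); real
# converters cover the full syntax.
import re

def md_to_html(md: str) -> str:
    out = []
    for line in md.splitlines():
        m = re.match(r"(#{1,6}) (.*)", line)
        if m:
            level = len(m.group(1))
            line = f"<h{level}>{m.group(2)}</h{level}>"
        else:
            line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
            line = f"<p>{line}</p>" if line else ""
        out.append(line)
    return "\n".join(out)

print(md_to_html("# Title\nThe **storage** layer."))
```

One direction of this mapping is trivial and lossless; going the other way (HTML back to Markdown) is lossy, which is exactly why the two occupy different stages of the data lifecycle.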

12.2 “JSON is the real carrier for AI structured output”

In agent-to-agent communication, JSON (along with YAML, Protocol Buffers, etc.) is indeed the more appropriate format—they are more precise and more machine-parseable than both Markdown and HTML. This paper’s discussion focuses on human-readable/writable information formats. In the complete AI data ecosystem, JSON serves the machine-to-machine communication layer, Markdown serves the human-readable storage layer, and HTML serves the human-interactive rendering layer—the three operate at different communication interfaces.

12.3 “Mathematical formalization doesn’t apply to all knowledge”

Entirely agreed. Section VII already corrected this argument: the claim that AI output trends toward mathematical expression applies only to “formalizable knowledge domains.” Knowledge in legal argumentation, ethical reasoning, literary narrative, and diplomatic negotiation resists mathematical formalization; structured natural language remains the optimal carrier for these domains. This actually strengthens the long-term value of Markdown—it can carry formulas (through LaTeX extensions) as well as structured text that defies formalization.

12.4 “Markdown’s math support isn’t native”

Strictly speaking, this is correct. The CommonMark specification does not include math syntax; $...$ is LaTeX syntax parsed by third-party renderers. But the power of a de facto standard is no less than that of a formal specification. GitHub, Obsidian, Typora, Notion, and VS Code all natively support math rendering. When 35% of technical documents contain math formulas, this ecosystem advantage constitutes a substantive competitive moat.

Section XIII

Conclusion: Alignment Is Emergence

Through analysis across three temporal dimensions—past, present, and future—this paper arrives at the following conclusions:

Past (2022–2023): The LLM’s text prediction core determined that Markdown would become AI’s native format. Three converging factors—training data distribution, token economic constraints, and math ecosystem affinity—made Markdown’s dominant position an inevitability.

Present (2024–2026): The collapse of token prices loosened the economic constraint; Claude Artifacts pioneered the HTML interactive output paradigm; yet the birth and expansion of reasoning tokens rendered the format cost differential on the output side negligible. The two seemingly opposing representative figures—Karpathy and Thariq—are in fact each describing a different facet of the same multilayer architecture: the former is the spokesperson for the storage layer, the latter for the execution layer. Karpathy himself explicitly acknowledged in his 2025 year-end review that “LLMs should use visual formats to communicate with humans”—the two views implicitly aligned when each acknowledged the other’s layer. Simultaneously, AI’s intrinsic nature as an entropy reduction engine is driving output toward mathematical formalization.

Future (2026→): Voice will become the primary channel for human-AI interaction, with both Markdown and HTML retreating to the backend implementation layer. Yet Markdown remains irreplaceable due to the LLM’s text prediction core, its information compression density, and its native math ecosystem; HTML remains indispensable due to its multimodal container capabilities and consumer interaction reach. The incompressibility of voice signals demands that raw multimodal data be preserved rather than reduced to text transcripts alone.

The core of the LLM is text prediction—therefore Markdown is irreplaceable.

AI needs multimodal extension to penetrate the consumer market—therefore HTML is indispensable.

Voice is humanity’s most natural interface—therefore it will dominate future interaction.


Markdown is AI’s left brain—logic, reasoning, formulas. HTML is AI’s right brain—perception, multimodality, consumer reach. Voice is AI’s mouth and ears. The three are not in competition; they are different organs of a complete multimodal AI organism. Karpathy and Thariq are not fighting—they are standing on different floors of the same building, each describing the view from their vantage point. Alignment is emergence.

References

  1. Shihipar, T. “Using Claude Code: The Unreasonable Effectiveness of HTML.” X/Twitter, May 8, 2026.
  2. Pillitteri, P. “HTML vs Markdown in Claude Code: Why Anthropic’s Thariq Changed the Default.” pasqualepillitteri.it, May 2026.
  3. Liu, Y. “The 4 Lines Every CLAUDE.md Needs.” Level Up Coding / Medium, April 2026.
  4. MindStudio. “What Is the Karpathy LLM Wiki Pattern?” mindstudio.ai, April 2026.
  5. Zannarbor, F. “Andrej Karpathy on Books & LLMs.” Substack, October 2025.
  6. Crunchbase. “Thariq Shihipar — Founder and CEO @ One More Multiverse.” crunchbase.com.
  7. Shihipar, T. Personal website and Vibe Code Camp interview. thariq.io; davidguttman.github.io, 2026.
  8. RentierDigital. “HTML vs Markdown for AI Agent Output.” rentierdigital.xyz, May 2026. web2md.org data.
  9. Unmarkdown. “The AI Output Problem: Why Every AI Tool Writes in Markdown.” unmarkdown.com, February 2026.
  10. Markdown Visualizer. “Math & LaTeX in Markdown — Complete Guide.” markdownvisualizer.com, March 2026.
  11. Computerworld. “OpenAI Admits AI Hallucinations Are Mathematically Inevitable.” February 2026.
  12. Yang, K. et al. “Formal Mathematical Reasoning: A New Frontier in AI.” arXiv:2412.16075, December 2024.
  13. Ibbaka. “Pricing Thought: OpenAI Will Price Reasoning Tokens in o1.” ibbaka.com, September 2024.
  14. PriceWorld. “ChatGPT vs Claude vs Gemini: What Every AI Subscription Actually Costs in 2026.” March 2026.
  15. EG3. “What Are AI Reasoning Tokens and Their Hidden Costs.” eg3.com, April 2026.
  16. IntuitionLabs. “Token Optimization and Cost Management for ChatGPT & Claude.” intuitionlabs.ai, May 2026.
  17. OpenAI. “GPT models use a syntax called Markdown.” GPT documentation, November 2022.
  18. The Last Fingerprint. “How Markdown Training Shapes LLM Prose.” arXiv:2603.27006, 2026.
  19. AssemblyAI. “Voice AI in 2026.” assemblyai.com, February 2026. “87.5% of builders actively building voice agents.”
  20. ViitorCloud. “Best UI/UX Trends for AI-Powered Applications in 2026.” April 2026; Mistral AI, “Voxtral,” 2026.
  21. AI Wiki. “Claude Artifacts.” aiwiki.ai, May 2026. “Launched June 20, 2024. 500M+ Artifacts created.”
  22. ENTRA. “Entropy-Based Redundancy Avoidance in LLM Reasoning.” arXiv:2601.07123, 2026.
  23. GetPanto. “Anthropic AI Statistics 2026.” getpanto.ai, May 2026.
  24. Horecny, J. “The AI Price Collapse Is Real.” Medium, March 2026. Stanford AI Index 2025 data.
  25. PMC. “Bridging Text and Speech for Emotion Understanding.” December 2025.
  26. Quanta Magazine. “The AI Revolution in Math Has Arrived.” April 2026.
  27. Sopyla, K. “Speech-to-Speech Models in 2026.” ai.ksopyla.com, February 2026. Qwen2.5-Omni architecture.
  28. Willison, S. “Using Claude Code: The Unreasonable Effectiveness of HTML.” simonwillison.net, May 8, 2026.
  29. VentureBeat. “Karpathy shares ‘LLM Knowledge Base’ architecture that bypasses RAG with an evolving markdown library maintained by AI.” venturebeat.com, April 3, 2026. “Markdown — the most LLM-friendly and compact data format.”
  30. Codersera. “Karpathy’s LLM Knowledge Base: Build an AI Second Brain.” codersera.com, April 6, 2026. 100 articles, 400K words, zero manual editing.
  31. AI Critique. “Andrej Karpathy’s latest concept ‘LLM Wiki’ and the future of enterprise knowledge.” aicritique.org, May 8, 2026. Three-layer architecture: raw sources → wiki → schema.
  32. Karpathy, A. “2025 LLM Year in Review.” karpathy.bearblog.dev, December 2025. “Text is the raw/favored data representation for computers (and LLMs), but it is not the favored format for people.”
  33. The Week. “Neuralink and beyond: How BCIs are rewriting the future of human-technology interaction.” theweek.in, May 10, 2026. “As of early 2026, Neuralink has implanted devices in at least 21 patients.”
  34. WebPiki. “PWA in 2026: Are Progressive Web Apps Still Worth It?” webpiki.com, February 2026. PWA: HTML/CSS/JS foundation, installable, offline-capable, push notifications.



© 2026 LEECHO Global AI Research Lab. All rights reserved.

This paper was produced through human-AI collaboration under CC BY-NC 4.0.
