ORIGINAL RESEARCH PAPER · APRIL 2026

AI Search Information Alignment
Is the Core Function of LLMs

A Function-Defining Proposition Based on Multi-Dimensional Empirical Evidence

From Descriptive Findings to Theoretical Definition

LEECHO Global AI Research Lab
이조글로벌인공지능연구소
&
Claude Opus 4.6 · Anthropic
April 6, 2026 · V1

Abstract

This paper advances a function-defining proposition: AI search information alignment is the core function of large language models (LLMs). This proposition is grounded in cross-integrated analysis of multi-dimensional empirical data, encompassing official OpenAI user behavior data (NBER Working Paper 34255), a 243-country global AI search penetration study (arxiv 2602.13415), industry-level AI traffic data (Previsible/Adobe/Semrush 2025), and the structural impact of RAG technology evolution on search paradigms. Our research finds that while existing academic literature has independently accumulated substantial empirical evidence across individual dimensions—discovering that information retrieval is the highest-frequency use case for LLMs—these studies remain at the descriptive level and have not integrated their findings into a unified functional definition. Through cross-validation across four independent evidence dimensions, this paper demonstrates that information search and alignment is not merely “one application” of LLMs but rather the foundational layer supporting all other functions, thereby bridging the gap from empirical description to theoretical definition.


SECTION 01 · Introduction

The Question: What Is the Essential Function of LLMs?

Posing the Fundamental Theoretical Problem

Since ChatGPT’s launch in November 2022, large language models (LLMs) have penetrated human information life at an unprecedented pace. By July 2025, ChatGPT had 700 million weekly users, roughly 10% of the world’s adult population, sending 18 billion messages per week. Yet a fundamental theoretical question remains unanswered: What is the core function of LLMs from the perspective of human users?

Academia defines LLMs from a technical architecture perspective as “probabilistic language models trained on massive text corpora,” with their core capability described as “language understanding and generation.” But this is an answer from technical ontology, not from functional phenomenology. When billions of users interact with LLMs, what are they actually doing? Should this behavioral-level reality inform how we define the core function of LLMs?

The central argument of this paper is: cross-dimensional analysis of user behavior data, global search penetration rates, industry distribution, and technical architecture logic yields a clear conclusion—AI search information alignment is the core function of LLMs. Information retrieval and alignment is not one function among many alongside coding, writing, and analysis—it is the foundational layer supporting all of them.


SECTION 02 · Evidence Dimension One

User Behavior Data: What Do People Use LLMs For?

The Largest-Scale Study of ChatGPT Usage Patterns

In September 2025, OpenAI’s economic research team and Harvard economist David Deming jointly published the largest-scale study of ChatGPT usage behavior to date (NBER Working Paper No. 34255), based on privacy-protected analysis of 1.5 million real conversations.

Information Seeking (Asking) · 49% · share of all messages
Task Execution (Doing) · 40% · writing, coding, planning
Personal Expression · 11% · reflection, entertainment
Top 3 Topic Coverage · ~80% · practical guidance + information seeking + writing

The paper classified conversations by topic and found that “Practical Guidance” (28.1%), “Seeking Information” (21.3%→24%), and “Writing” (28.3%→24%) were the three most common topics, collectively accounting for nearly 80% of all conversations.

Key Trend: “Seeking Information” was the fastest-growing category—rising from 14% in July 2024 to 24% in July 2025, a 71% increase within one year. During the same period, “Writing” declined from 36% to 24%, and “Technical Help” fell from 12% to 5%. Programming queries accounted for only 4.2% of all messages.

More notably, “Practical Guidance” is essentially a form of personalized information search and alignment. The paper explicitly distinguishes: practical guidance involves “highly customized user consultations that can be adjusted through dialogue and follow-up questions,” while information seeking involves “factual information that should be consistent for all users.” The shared underlying logic is: users have information needs, and LLMs help them find and align the most relevant content.

Independent survey data further reinforces this finding:

Data Source · Type · Year · Key Finding
OpenAI Signals · NBER · Feb 2026 · 75% of conversations focused on practical guidance, information seeking, and writing
Searcherries · Survey · Feb 2026 · n=1,090: quick fact-checking 67%, deep research 52.3% as top uses
Pew/AP-NORC · Survey · 2025 · 60% of Americans use AI for information search, ranked #1 across all uses
TTMS · Industry · 2025 · Two-thirds of LLM users report using LLMs as a search engine
HigherVisibility · Survey · Aug 2025 · 54.1% of users search with AI daily; 78% multiple times per week
Evidence Summary

Across all data sources and classification frameworks, information retrieval, search, and guidance consistently dominate LLM usage. “Information seeking” is the only category showing sustained growth, while coding and technical assistance are declining. This indicates that as LLM user bases expand from technical professionals to the general public, the centrality of information search will only strengthen further.


SECTION 03 · Evidence Dimension Two

Global Search Penetration: The Explosive Growth of AI Search

A 243-Country Empirical Study

A large-scale empirical study published in February 2026, covering 243 countries, 24,000 queries, and 2.8 million search results (arXiv:2602.13415), provides the most authoritative data on global AI search penetration rates.

US Queries Answered by AI · 67% · up from 42% in 2024
AI Monthly Sessions vs. Search Engines · 56% · AI sessions equal 56% of global search-engine session volume
AI Referral Traffic Growth · 527% · year over year, Jan–May 2025
Total Search Volume Change · +26% · global growth, search engines + AI search combined

A key finding of this study is that AI search has not replaced traditional search but rather amplified the total volume of human search behavior. Traditional search has not decreased; instead, total search volume (combining search engines and LLM search) has grown 26% globally. This demonstrates that “information retrieval” is the most fundamental need AI fulfills—it is not a zero-sum replacement but a release of latent demand.

The geographic distribution of AI search growth across countries also reveals deeper trends:

Country · AI Search Growth Rate · 2025 Queries Answered by AI
Brazil · +82% · ~33%
Indonesia · +78% · ~28%
Japan · +76% · ~36%
Mexico · +73% · ~38%
United States · +60% · 67%
India · +54% · ~59%
United Kingdom · +44% · ~52%
Core Insight: Countries with later adoption show faster growth. AI search growth rates in low-income countries are more than 4× those of high-income countries (OpenAI Signals, 2026). This is not organic user behavior but platform-driven global penetration—AI search is becoming the global infrastructure for human information access.

SECTION 04 · Evidence Dimension Three

Industry Distribution: Who Uses AI Search, and for What?

Non-Technical Industries Lead AI Search Adoption

If AI search were merely a tool for technical professionals, it could not be defined as a “core function.” The data shows precisely the opposite—non-technical industries with mainstream users show higher proportional usage of AI search.

YMYL Industries Lead AI Search Adoption

YMYL (Your Money Your Life) industries—those involving financial and health decisions—exhibit the highest AI search growth rates:

Industry · AI Traffic Growth Multiple
Legal · 11.9×
Finance · 2.9×
Healthcare · 2.9×
Insurance · 2.5×
Small Business · 2.2×
SaaS · 1.8×
E-commerce · 1.4×

The top five industries (legal through small business) combine for 55% of total LLM sessions; SaaS and e-commerce account for a lower share.
E-commerce 1.4× Lower

These five high-consultation industries account for 55% of all LLM-driven sessions. Users in these industries pose consultation-oriented, trust-intensive, complex questions: “What should I ask a lawyer before signing a contract?” “Is this medication safe given my condition?” “How should I set up payroll for a 5-person flower shop?”

The AI Search Explosion in Consumer Sectors

Adobe’s research shows that retail AI referral traffic grew 35× in the 11 months from July 2024 to May 2025, travel grew 33×, and banking grew 28×. In consumer electronics and home appliances, 40–55% of buyers use AI search to assist purchase decisions.

Counterintuitive Finding: Programming queries account for only 4.2% of all ChatGPT messages. India’s programming query rate is 3× the global median—demonstrating that coding is a market-specific feature, not a mainstream global AI use case. The true mainstream is: ordinary people using AI to search for information, obtain guidance, and make decisions.

SECTION 05 · Evidence Dimension Four

Technical Architecture: RAG as the Bridge to Information Alignment

From Static Memory Recall to Dynamic Knowledge Connection

From a technical architecture perspective, the emergence and proliferation of Retrieval-Augmented Generation (RAG) technology has provided infrastructure-level support for “AI search information alignment.”

RAG was introduced at NeurIPS 2020 by a research team from Facebook AI Research (now Meta AI), University College London, and New York University. Its core mechanism pairs a retrieval component with a large language model: the retriever finds relevant passages, and the generator composes fluent answers from them. This technology transformed LLMs from “static memory replay” to “dynamic knowledge connection.”

RAG’s Essential Contribution: Without RAG, LLMs merely recall patterns from training data; with RAG, LLMs genuinely search. RAG transformed AI search from a “possible byproduct” to an “architecture-level core capability.” All major generative search engines today—Google AI Overviews, ChatGPT Search, Perplexity, Claude—operate on RAG or its variants.

The introduction of RAG brought three structural changes:

First, from keyword matching to semantic understanding. Traditional search engines match user queries to document keywords. RAG-LLM systems understand the semantic intent of queries and can find relevant content even when the query wording differs entirely from the document.

Second, from document ranking to information synthesis. Traditional search returns a list of links for users to filter and integrate themselves. RAG-LLM systems retrieve information from multiple sources and synthesize it into a coherent, cited answer.

Third, from single retrieval to iterative alignment. Traditional search is a one-shot query-response process. RAG-LLM systems support multi-turn dialogue where users progressively refine their needs and the system continuously re-retrieves and adjusts output, achieving incremental information alignment.
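The retrieve-then-generate loop behind these three shifts can be sketched in a few lines. This is a toy illustration only: the bag-of-words “embeddings,” the corpus, and the prompt template below are stand-ins we invented for this sketch, where a real system would use a trained embedding model, a vector index, and an LLM.

```python
import math

def embed(text: str) -> dict:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by similarity to the query, keep the top-k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the generator: retrieved passages become numbered, citable context."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the sources below.\n{context}\nQ: {query}"

corpus = [
    "RAG pairs a retriever with a generator model",
    "Traditional search engines rank documents by keywords",
    "Cats are popular pets",
]
top = retrieve("how does a retriever work with a generator", corpus)
print(build_prompt("how does a retriever work with a generator", top))
```

The multi-turn “iterative alignment” described above corresponds to re-running `retrieve` and `build_prompt` with each refined query, so the context handed to the generator tracks the user’s evolving intent.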

Market projections for AI knowledge management systems further corroborate this trend: the U.S. AI knowledge management systems market is projected to reach $3.1 billion in 2025, growing at a 42.9% CAGR to $68.7 billion by 2034. The GEO (Generative Engine Optimization) market starts at $848 million in 2025, projected to grow at a 50.5% CAGR to $33.7 billion by 2034. The entire industry is allocating resources around the core function of “AI search.”
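As a quick sanity check on the compound-growth figures above, the standard CAGR identity can be applied directly; treating 2025→2034 as nine compounding years is our assumption about how the source counts the period.

```python
# CAGR compounding: future = present * (1 + rate) ** years.
# Assumption (ours): 2025 -> 2034 is treated as nine compounding years.

def project(present_billions: float, cagr: float, years: int) -> float:
    """Project a market size forward under constant compound growth."""
    return present_billions * (1 + cagr) ** years

# GEO market: $0.848B in 2025 at a 50.5% CAGR.
geo_2034 = project(0.848, 0.505, 9)
print(round(geo_2034, 1))  # ~33.6, consistent with the quoted $33.7B for 2034
```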


SECTION 06 · Core Argument

Why “Core Function” Rather Than “Most Common Use”

The Functional Dependency Argument

Highest usage frequency alone is insufficient to support a claim of “core function.” The key argument of this paper is that information search and alignment is not only the most frequent use case but also possesses functional dependency—all other LLM functions depend on it as a prerequisite.


LLM Application · Implicit Information Search Requirement
Programming · Retrieving code patterns, API documentation, best practices
Writing · Retrieving facts, source material, style references, background knowledge
Data Analysis · Retrieving analytical frameworks, industry benchmarks, comparative data
Decision Consulting · Retrieving options, pro/con evidence, case references
Translation · Retrieving contextual adaptations, terminology equivalents, cultural background
Creative Generation · Retrieving inspiration sources, style references, constraint conditions

No high-frequency AI use case can bypass the underlying step of “information retrieval and alignment.” When a user asks an LLM to write an article, the LLM is essentially: retrieving the most relevant information from training data or external knowledge sources → filtering and aligning it according to user intent → generating output in an appropriate format. “Generation” is the final step; “retrieval and alignment” constitutes the first two.
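The three-step decomposition above (retrieve → filter and align → generate) can be made concrete as a staged pipeline. The scoring and generation below are deliberately crude stubs of our own devising; only the staging, with generation strictly last, is the point.

```python
def retrieve(query: str, sources: list[str]) -> list[str]:
    """Step 1: pull every source sharing vocabulary with the query."""
    terms = set(query.lower().split())
    return [s for s in sources if terms & set(s.lower().split())]

def align(query: str, candidates: list[str], k: int = 2) -> list[str]:
    """Step 2: order candidates by overlap with user intent, keep the top-k."""
    terms = set(query.lower().split())
    ranked = sorted(candidates,
                    key=lambda s: len(terms & set(s.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query: str, aligned: list[str]) -> str:
    """Step 3: format the aligned material into output (stub for the LLM)."""
    return f"Q: {query}\n" + "\n".join(f"- {s}" for s in aligned)

sources = [
    "style references for article writing",
    "background facts about the topic",
    "unrelated sports scores",
]
query = "write an article about the topic"
answer = generate(query, align(query, retrieve(query, sources)))
print(answer)
```

Even for a “writing” request, the irrelevant source is dropped in step 1 and the remainder is re-ordered by relevance in step 2 before any text is produced, mirroring the claim that generation is only the final step.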

The Demand Release Argument

AI search is not zero-sum replacing traditional search but releasing suppressed information demand. Global search volume (search engines + AI search) has grown 26%, meaning there existed a vast reservoir of information needs previously unsatisfied by traditional search—questions too complex, too personalized, or too requiring of synthetic judgment—now fulfilled through LLMs’ information alignment capabilities.

The Cognitive Paradigm Shift Argument

The arXiv:2602.13415 paper reveals a deep cognitive change: search is undergoing a paradigm shift from “navigation” to “synthesis.” The web taught billions of people to navigate knowledge by selecting sources. AI search is retraining them to default to trusting a synthesized, integrated answer. This is not merely a tool-level replacement but a fundamental reorganization of how humans access information.

Argument by Analogy: The core function of a car is “transportation,” not “the engine.” The engine is the technical architecture that enables transportation, but users buy cars for mobility. Similarly, the technical architecture of LLMs is “language understanding and generation,” but users employ LLMs for “information search and alignment.” Defining core function should proceed from the user’s functional needs, not from technical implementation.

SECTION 07 · Literature Comparison

Position of Existing Research and This Paper’s Contribution

From Empirical Description to Functional Definition

This paper’s proposition has a precise relationship to existing academic literature: existing studies provide empirical evidence but remain at the descriptive level, without making a function-defining judgment.

Paper · What Was Found · What Was Concluded · Step Not Taken
How People Use ChatGPT (NBER, 2025) · Practical guidance + info seeking + writing = 80% · Descriptive: three categories listed as parallel · Did not subsume “practical guidance” under broad information alignment; no functional ranking
The Rise of AI Search (arXiv, 2026) · 67% of US queries answered by AI · AI search “possibly” the most impactful application · Focused on impact, not definition; used qualifying language (“possibly”)
LLMs for IR: A Survey (ACM, 2025) · LLMs bring three paradigm shifts to IR · LLMs have enhanced IR systems · Perspective: “LLMs serve search” rather than “search defines LLMs”
LLMs and Future of IR (SIGIR, 2024) · LLMs will not replace search engines · LLMs need to learn to use search engines · Treated search as an external tool, not an intrinsic function
LLMs for IR: Challenges (Springer, 2025) · LLMs address “compromised information needs” · LLMs bridge the gap between user needs and system response · Treated information alignment as one capability, not the core definition
LLMs as Info Seeking Tool (isjtrend, 2024) · LLMs facilitate information seeking behavior · Combined use of LLMs with traditional methods is advisable · Used “facilitate” rather than “core function”
This Paper’s Contribution: This paper performs multi-dimensional cross-integration of findings across four independent evidence dimensions: user behavior data (NBER) × global penetration rates (arXiv) × industry distribution (Previsible/Adobe) × technical architecture logic (RAG), distilling a defining proposition from descriptive findings. This represents a theoretical leap from “what LLMs are used for” to “what the core function of LLMs is.”

SECTION 08 · Qualifications and Counterarguments

Temporal Qualification and Possible Objections

Acknowledging Boundaries and Addressing Critiques

Temporal Qualification

This paper’s proposition requires an explicit temporal qualification: In the current phase (2024–2026), information search and alignment is the core function of human AI usage. AI Agent technology is developing rapidly, and if AI begins autonomously executing tasks at scale in the future (shopping, booking, management), “execution” may rival “search.” However, as of 2026, only 24% of consumers feel comfortable with AI agents autonomously shopping on their behalf, and agents remain in their early stages.

Possible Objections and Responses

Objection 1: “The core of LLMs is language capability; search is merely an application.”

Response: This reflects the difference between technical ontology and functional phenomenology. We do not deny that language capability is the technical foundation of LLMs, but argue that functional definitions from the user perspective should reflect actual usage reality. The core technology of a smartphone is its chips and operating system, but functionally, it is a communication and information device.

Objection 2: “Writing (28.3%) ranks higher than information seeking (21.3%), so writing is the primary function.”

Response: (a) Information seeking is the only category showing sustained growth (14%→24%), while writing is declining (36%→24%); (b) writing itself depends on information retrieval as a prerequisite step; (c) “Practical Guidance” (28.1%) is essentially personalized information alignment and should be subsumed under broad information search. Taken together, broadly defined information search and alignment far exceeds writing.

Objection 3: “11% of ‘personal expression’ does not constitute information search.”

Response: We do not claim that all LLM usage is information search. The 11% of personal expression (emotional companionship, entertainment) indeed cannot be captured by this definition. However, defining a core function does not require 100% coverage—just as “communication” is the core function of a phone, yet using phones for photography and gaming does not alter that judgment.


SECTION 09 · Conclusion

Conclusion and Outlook

From Empirical Evidence to Theoretical Definition

Based on cross-validation across four independent evidence dimensions—user behavior data, global search penetration rates, industry distribution data, and technical architecture logic—this paper arrives at the following conclusion:

Core Proposition: In the current phase of 2024–2026, AI search information alignment is the core function of large language models (LLMs). Information retrieval and alignment is not a vertical application of LLMs but the foundational layer supporting all other functions and the cornerstone function of the entire AI industry.

This proposition fills a theoretical gap in existing academic literature, bridging the distance from empirical description to functional definition. While everyone debates the “generative” capabilities of LLMs, AI’s greatest value actually lies in “finding” and “aligning.” The primary need humans bring to LLMs is not creating new content but finding knowledge that best matches their needs within a vast ocean of information—and completing this alignment process in an unprecedentedly natural way.

This insight carries profound practical implications for the industry:

First, for LLM product design, search and information alignment capability (rather than pure generation quality) should be the top optimization priority.

Second, for GEO/AEO practitioners, understanding the information alignment essence of LLMs enables more fundamental content strategy formulation—the goal is not “being cited by AI” but becoming a high-quality signal source within AI’s alignment process.

Third, for AI governance, recognizing that LLMs are fundamentally intermediary systems for human information access means that standards for accuracy, fairness, and transparency should be modeled on information infrastructure (not creative tools).

References

  1. Chatterji, A., Cunningham, T., Deming, D., Hitzig, Z., Ong, C., Shan, C.Y., & Wadman, K. (2025). “How People Use ChatGPT.” NBER Working Paper No. 34255. Harvard IRB25-0983.
  2. “The Rise of AI Search: Implications for Information Markets and Human Judgement at Scale” (2026). arXiv:2602.13415. 243 countries, 24,000 queries, 2.8M search results.
  3. Zhu, Y. et al. (2025). “Large Language Models for Information Retrieval: A Survey.” ACM Transactions on Information Systems.
  4. SIGIR 2024. “Large Language Models and Future of Information Retrieval: Opportunities and Challenges.” ACM.
  5. Datenbank-Spektrum / Springer (2025). “Large Language Models for Information Retrieval: Challenges and Chances.”
  6. isjtrend (2024). “Large Language Models (LLMs) as a Tool to Facilitate Information Seeking Behavior.”
  7. Taylor & Francis (2025). “Information Retrieval in the Age of Generative AI: A Mismatch That Matters.”
  8. Lewis, P. et al. (2020). “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS 2020.
  9. Aggarwal, P. et al. (2023). “GEO: Generative Engine Optimization.” arXiv:2311.09735.
  10. Previsible (2025). “2025 AI Traffic Report.” 19 GA4 properties, 527% YoY growth.
  11. Adobe (2025). “The Explosive Rise of Generative AI Referral Traffic.” & “Q2 2025 AI-Driven Traffic Report.”
  12. OpenAI (2026). “OpenAI Signals.” Based on NBER Working Paper 34255.
  13. Searcherries (2026). “AI Search Statistics for 2026.” n=1,090, February 2026 survey.
  14. HigherVisibility (2025). “How People Search in 2025.” Identical surveys Feb & Aug 2025, n=1,500.
  15. Frase.io (2026). “What is Generative Engine Optimization (GEO)? 2026 Guide.”
  16. Grand View Research (2025). “AI Search Engine Market Size, Share | Industry Report, 2033.”
  17. Semrush (2025). Multiple AI search visibility studies, LLM visitor conversion data.
  18. Seer Interactive (2025). LLM traffic conversion rates by platform.

“The most consequential application of AI today may simply be helping humans find what they’re looking for — and aligning it with what they actually need.”

LEECHO Global AI Research Lab · 이조글로벌인공지능연구소 & Claude Opus 4.6 · Anthropic
V1 · April 6, 2026
