This paper advances a function-defining proposition: AI search information alignment is the core function of large language models (LLMs). The proposition is grounded in an integrated analysis of multi-dimensional empirical data: OpenAI’s official user behavior study (NBER Working Paper 34255), a 243-country study of global AI search penetration (arXiv:2602.13415), industry-level AI traffic data (Previsible/Adobe/Semrush, 2025), and the structural impact of RAG technology on search paradigms. We find that existing academic literature has independently accumulated substantial empirical evidence along each of these dimensions, repeatedly identifying information retrieval as the highest-frequency LLM use case, yet these studies remain descriptive and have not integrated their findings into a unified functional definition. By cross-validating four independent evidence dimensions, this paper argues that information search and alignment is not merely “one application” of LLMs but the foundational layer supporting all other functions, thereby bridging the gap from empirical description to theoretical definition.
The Question: What Is the Essential Function of LLMs?
Posing the Fundamental Theoretical Problem
Since ChatGPT’s launch in November 2022, large language models (LLMs) have penetrated human information life at an unprecedented pace. By July 2025, ChatGPT had 700 million weekly users, approximately 10% of the world’s adult population, sending 18 billion messages per week. Yet a fundamental theoretical question remains unanswered: What is the core function of LLMs from the perspective of human users?
Academia defines LLMs from a technical architecture perspective as “probabilistic language models trained on massive text corpora,” with their core capability described as “language understanding and generation.” But this is an answer from technical ontology, not from functional phenomenology. When billions of users interact with LLMs, what are they actually doing? Should this behavioral-level reality inform how we define the core function of LLMs?
The central argument of this paper is: cross-dimensional analysis of user behavior data, global search penetration rates, industry distribution, and technical architecture logic yields a clear conclusion—AI search information alignment is the core function of LLMs. Information retrieval and alignment is not one function among many alongside coding, writing, and analysis—it is the foundational layer supporting all of them.
User Behavior Data: What Do People Use LLMs For?
The Largest-Scale Study of ChatGPT Usage Patterns
In September 2025, OpenAI’s economic research team and Harvard economist David Deming jointly published the largest-scale study of ChatGPT usage behavior to date (NBER Working Paper No. 34255), based on privacy-protected analysis of 1.5 million real conversations.
The paper classified conversations by topic and found that “Practical Guidance” (28.1%), “Seeking Information” (21.3% and rising), and “Writing” (28.3% and declining) were the three most common topics, together accounting for nearly 80% of all conversations.
More notably, “Practical Guidance” is essentially a form of personalized information search and alignment. The paper explicitly distinguishes: practical guidance involves “highly customized user consultations that can be adjusted through dialogue and follow-up questions,” while information seeking involves “factual information that should be consistent for all users.” The shared underlying logic is: users have information needs, and LLMs help them find and align the most relevant content.
Independent survey data further reinforces this finding:
| Type | Source | Year | Key Finding |
|---|---|---|---|
| NBER | OpenAI Signals | 2026.02 | 75% of conversations focused on practical guidance, information seeking, and writing |
| Survey | Searcherries | 2026.02 | n=1,090: quick fact-checking 67%, deep research 52.3% as top uses |
| Survey | Pew/AP-NORC | 2025 | 60% of Americans use AI for information search—ranked #1 across all uses |
| Industry | TTMS | 2025 | Two-thirds of LLM users report using LLMs as a search engine |
| Survey | HigherVisibility | 2025.08 | 54.1% of users search with AI daily; 78% multiple times per week |
Across all data sources and classification frameworks, information retrieval, search, and guidance consistently dominate LLM usage. “Information seeking” is the only category showing sustained growth, while coding and technical assistance are declining. This indicates that as LLM user bases expand from technical professionals to the general public, the centrality of information search will only strengthen further.
Global Search Penetration: The Explosive Growth of AI Search
A 243-Country Empirical Study
A large-scale empirical study published in February 2026, covering 243 countries, 24,000 queries, and 2.8 million search results (arXiv:2602.13415), provides the most authoritative data available on global AI search penetration rates.
A key finding of this study is that AI search has not replaced traditional search but rather amplified the total volume of human search behavior. Traditional search has not decreased; instead, total search volume (combining search engines and LLM search) has grown 26% globally. This demonstrates that “information retrieval” is the most fundamental need AI fulfills—it is not a zero-sum replacement but a release of latent demand.
The geographic distribution of AI search growth across countries also reveals deeper trends:
| Country | AI Search Growth Rate | 2025 Queries Answered by AI |
|---|---|---|
| Brazil | +82% | ~33% |
| Indonesia | +78% | ~28% |
| Japan | +76% | ~36% |
| Mexico | +73% | ~38% |
| United States | +60% | 67% |
| India | +54% | ~59% |
| United Kingdom | +44% | ~52% |
Industry Distribution: Who Uses AI Search, and for What?
Non-Technical Industries Lead AI Search Adoption
If AI search were merely a tool for technical professionals, it could not be defined as a “core function.” The data shows precisely the opposite—non-technical industries with mainstream users show higher proportional usage of AI search.
YMYL Industries Lead AI Search Adoption
YMYL (Your Money Your Life) industries—those involving financial and health decisions—exhibit the highest AI search growth rates:
| Industry | AI Traffic Growth Multiple | Share of Total LLM Sessions |
|---|---|---|
| Legal | 11.9× | 55% (top five combined) |
| Finance | 2.9× | 55% (top five combined) |
| Healthcare | 2.9× | 55% (top five combined) |
| Insurance | 2.5× | 55% (top five combined) |
| Small Business | 2.2× | 55% (top five combined) |
| SaaS | 1.8× | Lower |
| E-commerce | 1.4× | Lower |
These five high-consultation industries account for 55% of all LLM-driven sessions. Users in these industries pose consultation-oriented, trust-intensive, complex questions: “What should I ask a lawyer before signing a contract?” “Is this medication safe given my condition?” “How should I set up payroll for a 5-person flower shop?”
The AI Search Explosion in Consumer Sectors
Adobe’s research shows that retail AI referral traffic grew 35× in the 11 months from July 2024 to May 2025, travel grew 33×, and banking grew 28×. In consumer electronics and home appliances, 40–55% of buyers use AI search to assist purchase decisions.
Technical Architecture: RAG as the Bridge to Information Alignment
From Static Memory Recall to Dynamic Knowledge Connection
From a technical architecture perspective, the emergence and proliferation of Retrieval-Augmented Generation (RAG) technology has provided infrastructure-level support for “AI search information alignment.”
RAG was introduced at NeurIPS 2020 by a research team from Facebook AI Research (now Meta AI), University College London, and New York University. Its core mechanism pairs a retrieval component with a large language model: the retriever finds relevant passages, and the generator composes fluent answers. This technology transformed LLMs from “static memory replay” to “dynamic knowledge connection.”
The introduction of RAG brought three structural changes:
First, from keyword matching to semantic understanding. Traditional search engines match user queries to document keywords. RAG-LLM systems understand the semantic intent of queries and can find relevant content even when the query wording differs entirely from the document.
Second, from document ranking to information synthesis. Traditional search returns a list of links for users to filter and integrate themselves. RAG-LLM systems retrieve information from multiple sources and synthesize it into a coherent, cited answer.
Third, from single retrieval to iterative alignment. Traditional search is a one-shot query-response process. RAG-LLM systems support multi-turn dialogue where users progressively refine their needs and the system continuously re-retrieves and adjusts output, achieving incremental information alignment.
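The three shifts above can be sketched in one minimal toy pipeline. Everything here is illustrative: a bag-of-words cosine similarity stands in for a neural retriever, a string template stands in for the generator, and the corpus is invented; none of it reflects any particular production RAG stack.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase bag-of-words counts. A production
    system would use a dense neural encoder instead."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Shift 1: rank passages by similarity to the query as a whole,
    rather than by exact keyword hits."""
    q = embed(query)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def answer(query, corpus):
    """Shift 2: synthesize one cited answer from several retrieved
    passages, instead of returning a ranked list of links."""
    passages = retrieve(query, corpus)
    citations = " ".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer to '{query}', synthesized from: {citations}"

corpus = [
    "RAG pairs a retrieval component with a generator.",
    "The retriever finds relevant passages for a query.",
    "Bananas are rich in potassium.",
]

# Shift 3: multi-turn refinement simply re-runs retrieval with the
# refined query, incrementally realigning the answer.
for query in ("how does a system find relevant passages",
              "how does a system find relevant passages for a query"):
    print(answer(query, corpus))
```

Re-running retrieval on each refined query is the whole iterative-alignment loop in miniature: the user's wording converges toward the intent, and the retrieved evidence converges with it.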
Market projections for AI knowledge management systems further corroborate this trend: the U.S. AI knowledge management systems market is projected to reach $3.1 billion in 2025, growing at a 42.9% CAGR to $68.7 billion by 2034. The GEO (Generative Engine Optimization) market starts at $848 million in 2025, projected to grow at a 50.5% CAGR to $33.7 billion by 2034. The entire industry is allocating resources around the core function of “AI search.”
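These projections can be cross-checked with the standard CAGR formula; the nine-year span below is my reading of the stated 2025 to 2034 window, and the small gaps versus the quoted rates are consistent with rounding in the endpoint figures.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1 / years) - 1

# AI knowledge management systems: $3.1B (2025) -> $68.7B (2034)
print(f"implied KM CAGR:  {cagr(3.1, 68.7, 9):.1%}")    # near the quoted 42.9%
# GEO market: $0.848B (2025) -> $33.7B (2034)
print(f"implied GEO CAGR: {cagr(0.848, 33.7, 9):.1%}")  # near the quoted 50.5%
```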
Why “Core Function” Rather Than “Most Common Use”
The Functional Dependency Argument
Highest usage frequency alone is insufficient to support a claim of “core function.” The key argument of this paper is that information search and alignment is not only the most frequent use case but also possesses functional dependency—all other LLM functions depend on it as a prerequisite.
| LLM Application | Implicit Information Search Requirement |
|---|---|
| Programming | Retrieving code patterns, API documentation, best practices |
| Writing | Retrieving facts, source material, style references, background knowledge |
| Data Analysis | Retrieving analytical frameworks, industry benchmarks, comparative data |
| Decision Consulting | Retrieving options, pro/con evidence, case references |
| Translation | Retrieving contextual adaptations, terminology equivalents, cultural background |
| Creative Generation | Retrieving inspiration sources, style references, constraint conditions |
No high-frequency AI use case can bypass the underlying step of “information retrieval and alignment.” When a user asks an LLM to write an article, the LLM is essentially: retrieving the most relevant information from training data or external knowledge sources → filtering and aligning it according to user intent → generating output in an appropriate format. “Generation” is the final step; “retrieval and alignment” constitutes the first two.
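The dependency structure in the table can be sketched as a shared pipeline: every task type calls the same retrieve-and-align step before its task-specific generation. The task names, corpus, and helper functions below are hypothetical illustrations of the argument, not an actual LLM internals API.

```python
def retrieve_and_align(intent, knowledge):
    """Shared first step: select the knowledge items relevant to the
    user's intent (a substring match stands in for semantic retrieval)."""
    return [item for item in knowledge if intent in item]

def run_task(task, intent, knowledge):
    """Every task type passes through the same retrieval/alignment layer;
    only the final generation step differs per task."""
    aligned = retrieve_and_align(intent, knowledge)
    return f"[{task}] generated from {len(aligned)} aligned source(s)"

knowledge = [
    "python API documentation",
    "python style guide",
    "payroll tax summary",
]

# Programming, writing, and analysis all depend on the same first step.
for task in ("programming", "writing", "analysis"):
    print(run_task(task, "python", knowledge))
```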
The Demand Release Argument
AI search is not replacing traditional search in zero-sum fashion; it is releasing suppressed information demand. Global search volume (search engines + AI search) has grown 26%, meaning there existed a vast reservoir of information needs that traditional search left unsatisfied: questions too complex, too personalized, or too dependent on synthesis and judgment, now fulfilled through LLMs’ information alignment capabilities.
The Cognitive Paradigm Shift Argument
The arXiv:2602.13415 paper reveals a deep cognitive change: search is undergoing a paradigm shift from “navigation” to “synthesis.” The web taught billions of people to navigate knowledge by selecting sources. AI search is retraining them to default to trusting a synthesized, integrated answer. This is not merely a tool-level replacement but a fundamental reorganization of how humans access information.
Position of Existing Research and This Paper’s Contribution
From Empirical Description to Functional Definition
This paper’s proposition has a precise relationship to existing academic literature: existing studies provide empirical evidence but remain at the descriptive level, without making a function-defining judgment.
| Paper | What Was Found | What Was Concluded | Step Not Taken |
|---|---|---|---|
| How People Use ChatGPT (NBER, 2025) | Practical guidance + info seeking + writing = 80% | Descriptive: three categories listed as parallel | Did not subsume “practical guidance” under broad information alignment; no functional ranking |
| The Rise of AI Search (arXiv, 2026) | 67% of US queries answered by AI | AI search “possibly” the most impactful application | Focused on impact, not definition; used qualifying language (“possibly”) |
| LLMs for IR: A Survey (ACM, 2025) | LLMs bring three paradigm shifts to IR | LLMs have enhanced IR systems | Perspective: “LLMs serve search” rather than “search defines LLMs” |
| LLMs and Future of IR (SIGIR, 2024) | LLMs will not replace search engines | LLMs need to learn to use search engines | Treated search as an external tool, not an intrinsic function |
| LLMs for IR: Challenges (Springer, 2025) | LLMs address “compromised information needs” | LLMs bridge the gap between user needs and system response | Treated information alignment as one capability, not the core definition |
| LLMs as Info Seeking Tool (isjtrend, 2024) | LLMs facilitate information seeking behavior | Combined use of LLMs with traditional methods is advisable | Used “facilitate” rather than “core function” |
Temporal Qualification and Possible Objections
Acknowledging Boundaries and Addressing Critiques
Temporal Qualification
This paper’s proposition requires an explicit temporal qualification: In the current phase (2024–2026), information search and alignment is the core function of human AI usage. AI Agent technology is developing rapidly, and if AI begins autonomously executing tasks at scale in the future (shopping, booking, management), “execution” may rival “search.” However, as of 2026, only 24% of consumers feel comfortable with AI agents autonomously shopping on their behalf, and agents remain in their early stages.
Possible Objections and Responses
Objection 1: “The core of LLMs is language capability; search is merely an application.”
Response: This reflects the difference between technical ontology and functional phenomenology. We do not deny that language capability is the technical foundation of LLMs, but argue that functional definitions from the user perspective should reflect actual usage reality. The core technology of a smartphone is its chips and operating system, but functionally, it is a communication and information device.
Objection 2: “Writing (28.3%) ranks higher than information seeking (21.3%), so writing is the primary function.”
Response: (a) Information seeking is the only category showing sustained growth (14%→24%), while writing is declining (36%→24%); (b) writing itself depends on information retrieval as a prerequisite step; (c) “Practical Guidance” (28.1%) is essentially personalized information alignment and should be subsumed under broad information search. Combined, broad information search alignment far exceeds writing.
Objection 3: “11% of ‘personal expression’ does not constitute information search.”
Response: We do not claim that all LLM usage is information search. The 11% of personal expression (emotional companionship, entertainment) indeed cannot be captured by this definition. However, defining a core function does not require 100% coverage—just as “communication” is the core function of a phone, yet using phones for photography and gaming does not alter that judgment.
Conclusion and Outlook
From Empirical Evidence to Theoretical Definition
Based on cross-validation across four independent evidence dimensions (user behavior data, global search penetration rates, industry distribution data, and technical architecture logic), this paper arrives at its central conclusion: AI search information alignment is the core function of LLMs.
This proposition fills a theoretical gap in existing academic literature, bridging the distance from empirical description to functional definition. While everyone debates the “generative” capabilities of LLMs, AI’s greatest value actually lies in “finding” and “aligning.” The primary need humans bring to LLMs is not creating new content but finding knowledge that best matches their needs within a vast ocean of information—and completing this alignment process in an unprecedentedly natural way.
This insight carries profound practical implications for the industry:
First, for LLM product design, search and information alignment capability (rather than pure generation quality) should be the top optimization priority.
Second, for GEO/AEO practitioners, understanding the information alignment essence of LLMs enables more fundamental content strategy formulation—the goal is not “being cited by AI” but becoming a high-quality signal source within AI’s alignment process.
Third, for AI governance, recognizing that LLMs are fundamentally intermediary systems for human information access means that standards for accuracy, fairness, and transparency should be modeled on information infrastructure (not creative tools).
References
- Chatterji, A., Cunningham, T., Deming, D., Hitzig, Z., Ong, C., Shan, C.Y., & Wadman, K. (2025). “How People Use ChatGPT.” NBER Working Paper No. 34255. Harvard IRB25-0983.
- arXiv:2602.13415 (2026). “The Rise of AI Search: Implications for Information Markets and Human Judgement at Scale.” 243 countries, 24,000 queries, 2.8M search results.
- Zhu, Y. et al. (2025). “Large Language Models for Information Retrieval: A Survey.” ACM Transactions on Information Systems.
- SIGIR 2024. “Large Language Models and Future of Information Retrieval: Opportunities and Challenges.” ACM.
- Datenbank-Spektrum / Springer (2025). “Large Language Models for Information Retrieval: Challenges and Chances.”
- isjtrend (2024). “Large Language Models (LLMs) as a Tool to Facilitate Information Seeking Behavior.”
- Taylor & Francis (2025). “Information Retrieval in the Age of Generative AI: A Mismatch That Matters.”
- Lewis, P. et al. (2020). “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS 2020.
- Aggarwal, P. et al. (2023). “GEO: Generative Engine Optimization.” arXiv:2311.09735.
- Previsible (2025). “2025 AI Traffic Report.” 19 GA4 properties, 527% YoY growth.
- Adobe (2025). “The Explosive Rise of Generative AI Referral Traffic.” & “Q2 2025 AI-Driven Traffic Report.”
- OpenAI (2026). “OpenAI Signals.” Based on NBER Working Paper 34255.
- Searcherries (2026). “AI Search Statistics for 2026.” n=1,090, February 2026 survey.
- HigherVisibility (2025). “How People Search in 2025.” Identical surveys Feb & Aug 2025, n=1,500.
- Frase.io (2026). “What is Generative Engine Optimization (GEO)? 2026 Guide.”
- Grand View Research (2025). “AI Search Engine Market Size, Share | Industry Report, 2033.”
- Semrush (2025). Multiple AI search visibility studies, LLM visitor conversion data.
- Seer Interactive (2025). LLM traffic conversion rates by platform.