Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

arXiv, March 26, 2026

Authors: Mingmeng Geng, Yuhang Dong, Thierry Poibeau

Abstract: Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the similarities among different LLMs, experiments show that current classifiers struggle to accurately determine which specific model generated a given text in multi-class classification tasks. Meanwhile, variations across LLMs also result in evolving patterns of word usage in academic papers. By adopting a direct and highly interpretable linear approach and accounting for differences between models and prompts, we quantitatively assess these effects and show that real-world LLM usage is heterogeneous and dynamic.
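As an illustration only, the kind of analysis the abstract describes can be sketched in a few lines of Python. The abstract does not specify the authors' estimator, so everything here is an assumption: the helper names `word_rates` and `estimate_llm_share` are hypothetical, the tokenizer is a plain lowercase regex, and the linear model simply treats observed word rates as a mixture, observed ≈ (1 - eta) * human + eta * llm, with eta fitted by least squares.

```python
import re
from collections import Counter


def word_rates(texts, vocab):
    """Return, for each word in `vocab`, its share of all tokens in `texts`."""
    counts = Counter()
    total = 0
    for text in texts:
        # Simplistic tokenizer (an assumption): lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", text.lower())
        total += len(tokens)
        counts.update(t for t in tokens if t in vocab)
    return {w: (counts[w] / total if total else 0.0) for w in vocab}


def estimate_llm_share(observed, human, llm):
    """Least-squares eta for observed ~ (1 - eta) * human + eta * llm.

    All three arguments map word -> rate. The result is clipped to [0, 1].
    This is a hypothetical stand-in, not the estimator used in the paper.
    """
    num = 0.0
    den = 0.0
    for w in observed:
        d = llm[w] - human[w]              # direction of the LLM-induced shift
        num += d * (observed[w] - human[w])
        den += d * d
    if den == 0.0:
        return 0.0                         # the two profiles are identical
    return min(1.0, max(0.0, num / den))


# Made-up rates for illustration; these are not figures from the paper.
human_rates = {"the": 0.060, "via": 0.001}
llm_rates = {"the": 0.040, "via": 0.003}
observed_rates = {"the": 0.055, "via": 0.0015}
share = estimate_llm_share(observed_rates, human_rates, llm_rates)
```

With these made-up rates the fitted share comes out at about 0.25, because the observed rates were constructed as a 75/25 blend of the two profiles. A single global eta is the simplest possible choice; the abstract's point that usage is heterogeneous and dynamic suggests that a real analysis would fit separate values per time period, model, and prompt.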

Comments: Visualization of word usage patterns in arXiv abstracts: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Digital Libraries (cs.DL); Machine Learning (cs.LG)

Cite as: arXiv:2603.25638 [cs.CL]

(or arXiv:2603.25638v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25638

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mingmeng Geng
[v1] Thu, 26 Mar 2026 16:49:00 UTC (678 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25638v1
