
The Rise of Language Models in Mining Software Repositories: A Survey

arXiv cs.SE · [Submitted on 1 Apr 2026] · April 2, 2026


Abstract: The Mining Software Repositories (MSR) field focuses on analysing the rich data contained in software repositories to derive actionable insights into software processes and products. Mining repositories at scale requires techniques capable of handling large volumes of heterogeneous data, a challenge for which language models (LMs) are increasingly well-suited. Since the advent of Transformer-based architectures, LMs have been rapidly adopted across a wide range of MSR tasks. This article presents a comprehensive survey of the use of LMs in MSR, based on an analysis of 85 papers. We examine how LMs are applied, the types of artefacts analysed, which models are used, how their adoption has evolved over time, and the extent to which studies support reproducibility and reuse. Building on this analysis, we propose a taxonomy of LM applications in MSR, identify key trends shaping the field, and highlight open challenges alongside actionable directions for future research.

Subjects: Software Engineering (cs.SE)

Cite as: arXiv:2604.00787 [cs.SE]

(or arXiv:2604.00787v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2604.00787

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Miguel Romero-Arjona
[v1] Wed, 1 Apr 2026 11:53:12 UTC (2,033 KB)
