From Code Changes to Quality Gains: An Empirical Study in Python ML Systems with PyQu
Abstract: In an era shaped by Generative Artificial Intelligence for code generation and the rising adoption of Python-based Machine Learning systems (MLS), software quality has emerged as a major concern. As these systems grow in complexity and importance, a key obstacle lies in understanding exactly how specific code changes affect overall quality, a shortfall aggravated by the lack of quality-assessment tools and of a clear mapping between ML-system code changes and their quality effects. Although prior work has explored code changes in MLS, it mostly stops at identifying what the changes are, leaving a gap in our knowledge of the relationship between code changes and MLS quality. To address this gap, we conducted a large-scale empirical study of 3,340 open-source Python ML projects, encompassing more than 3.7 million commits and 2.7 trillion lines of code. We introduce PyQu, a novel tool that leverages low-level software metrics to identify quality-enhancing commits, achieving an average accuracy, precision, and recall of 0.84 and an average F1 score of 0.85. Using PyQu and a thematic analysis, we identified 61 code changes, each demonstrating a direct impact on software quality, and classified them into 13 categories based on contextual characteristics. 41% of the changes are newly discovered by our study and had not been identified by state-of-the-art Python change-detection tools. Our work offers a vital foundation for researchers, practitioners, educators, and tool developers, advancing the quest for automated quality assessment and best practices in Python-based ML software.
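The abstract describes classifying commits as quality-enhancing from low-level software metrics. A minimal, hypothetical sketch of that metric-delta idea follows; the function names, metric names, and weights are illustrative assumptions for this sketch, not PyQu's actual features or model, which are described in the paper.

```python
# Hypothetical sketch: flag a commit as quality-enhancing by comparing
# aggregate low-level metrics (e.g., cyclomatic complexity, nesting depth)
# over the files it touches, before vs. after the change.

def metric_deltas(before, after):
    """Per-metric change introduced by a commit.

    `before` and `after` map metric names to aggregate values, e.g.
    {"cyclomatic_complexity": 42, "max_nesting": 6}.
    """
    return {m: after.get(m, 0) - before.get(m, 0)
            for m in set(before) | set(after)}

def is_quality_enhancing(before, after, weights=None):
    """Return True when the weighted sum of metric deltas decreases.

    Negative deltas (lower complexity, less duplication) count as
    improvement; the weights here are arbitrary placeholders.
    """
    weights = weights or {"cyclomatic_complexity": 1.0,
                          "max_nesting": 1.0,
                          "duplicated_lines": 1.0}
    deltas = metric_deltas(before, after)
    score = sum(weights.get(m, 0.0) * d for m, d in deltas.items())
    return score < 0

# A refactoring commit that reduces complexity and nesting:
before = {"cyclomatic_complexity": 42, "max_nesting": 6, "duplicated_lines": 20}
after = {"cyclomatic_complexity": 35, "max_nesting": 4, "duplicated_lines": 20}
print(is_quality_enhancing(before, after))  # → True
```

In practice the metrics would be computed from the repository itself (e.g., with a static-analysis library) and a trained classifier would replace the fixed threshold, but the delta-then-score structure is the core of the idea.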
Comments: Accepted for publication in the proceedings of the IEEE/ACM 48th International Conference on Software Engineering (ICSE 2026)
Subjects:
Software Engineering (cs.SE)
Cite as: arXiv:2511.02827 [cs.SE]
(or arXiv:2511.02827v3 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2511.02827
arXiv-issued DOI via DataCite
Submission history
From: Mohamed Almukhtar [view email] [v1] Tue, 4 Nov 2025 18:55:19 UTC (515 KB) [v2] Wed, 7 Jan 2026 16:16:47 UTC (515 KB) [v3] Wed, 1 Apr 2026 01:00:55 UTC (515 KB)