PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair

arXiv cs.SE, by Boyang Yang, Zijian Cai, Shunfu Jin, Haoye Tain, April 6, 2026

Abstract: Large language models (LLMs) are effective for automated program repair, but plausible patches that pass the full test suite often rewrite more code than necessary, increasing review and maintenance costs. This over-editing is common because most bugs are localized, while standard supervised fine-tuning provides no explicit signal about which tokens should be preserved and which should be changed. We propose PAFT, a preservation-aware fine-tuning method for minimal-edit program repair. PAFT derives token-level preservation signals by aligning buggy and fixed code, combines them with full-sequence masking, and applies an edit-difficulty curriculum. Across Defects4J and HumanEval-Java, PAFT improves pass@1 by up to 65.6% over standard supervised fine-tuning (StdFT) while reducing average edit distance (AED) by up to 32.6%. On Defects4J with DeepSeek-Coder-6.7B, PAFT also outperforms AdaPatcher, a strong preference-based repair baseline, improving pass@1 from 5.9% to 10.1% while reducing median AED from 61.0 to 42.0. Overall, PAFT preserves stable context and concentrates edits on faulty regions, yielding smaller, more localized, plausible patches without inference-time search, reranking, or post-processing.
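To make the idea of a token-level preservation signal concrete, here is a minimal sketch of one way such labels could be derived by aligning a buggy snippet with its fix. It is not the authors' implementation: the regex tokenizer, the use of Python's `difflib` for alignment, and the helper names `tokenize`, `preservation_labels`, and `token_edit_distance` are all assumptions made for illustration, and the abstract does not specify how PAFT computes its alignment, its masking, or its edit distance.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Crude lexer (assumption): identifiers, integers, or single symbols.
    # A real pipeline would presumably use the model's own tokenizer.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def preservation_labels(buggy: str, fixed: str) -> list[tuple[str, int]]:
    """Label each token of the fixed code.

    1 = token aligns with an unchanged token of the buggy code (preserve),
    0 = token was inserted or replaced by the fix (edit).
    """
    b, f = tokenize(buggy), tokenize(fixed)
    labels = [0] * len(f)
    matcher = difflib.SequenceMatcher(a=b, b=f, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for j in range(j1, j2):
                labels[j] = 1
    return list(zip(f, labels))


def token_edit_distance(buggy: str, fixed: str) -> int:
    """Token-level Levenshtein distance (illustrative stand-in for an edit metric)."""
    b, f = tokenize(buggy), tokenize(fixed)
    prev = list(range(len(f) + 1))
    for i, bt in enumerate(b, 1):
        cur = [i]
        for j, ft in enumerate(f, 1):
            cur.append(min(prev[j] + 1,                # delete a buggy token
                           cur[j - 1] + 1,             # insert a fixed token
                           prev[j - 1] + (bt != ft)))  # keep or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    buggy = "return a - b;"
    fixed = "return a + b;"
    print(preservation_labels(buggy, fixed))
    print(token_edit_distance(buggy, fixed))
```

In this toy pair only the operator token is labeled as an edit, and every other token is labeled as preserved. Labels of this kind could, for example, be used to weight a per-token training loss, and a distance like the one above could be used to order training pairs from small to large edits, but how PAFT actually does either is described in the paper rather than in this abstract.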

Subjects: Software Engineering (cs.SE)

Cite as: arXiv:2604.03113 [cs.SE] (or arXiv:2604.03113v1 [cs.SE] for this version)

DOI: https://doi.org/10.48550/arXiv.2604.03113 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Boyang Yang

[v1] Fri, 3 Apr 2026 15:35:47 UTC (808 KB)

Original source: arXiv cs.SE, https://arxiv.org/abs/2604.03113



