
Dual-Space Smoothness for Robust and Balanced LLM Unlearning

arXiv [Submitted on 27 Sep 2025 (v1), last revised 28 Mar 2026 (this version, v2)]. Posted March 31, 2026.

arXiv:2509.23362v2 (Announce Type: replace-cross)

Authors: Han Yan, Zheyuan Liu, Meng Jiang


Abstract: As large language models evolve, machine unlearning has emerged to address growing concerns around user privacy, copyright infringement, and overall safety. Yet state-of-the-art (SOTA) unlearning methods often suffer from catastrophic forgetting and metric imbalance, for example by over-optimizing one objective (e.g., unlearning effectiveness, utility preservation, or privacy protection) at the expense of the others. In addition, small perturbations in the representation or parameter space can be exploited by relearning and jailbreak attacks. To address these challenges, we propose PRISM, a unified framework that enforces dual-space smoothness in the representation and parameter spaces to improve robustness and balance the unlearning metrics. PRISM consists of two smoothness optimization stages: (i) a representation-space stage that employs a robustly trained probe to defend against jailbreak attacks, and (ii) a parameter-space stage that decouples retain-forget gradient conflicts, reduces imbalance, and smooths the parameter space to mitigate relearning attacks. Extensive experiments on WMDP and MUSE, across conversational-dialogue and continuous-text settings, show that PRISM outperforms SOTA baselines under multiple attacks while achieving a better balance among key metrics.
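The abstract does not spell out PRISM's update rules. As a rough illustration of the two generic ingredients it names for the parameter-space stage (removing retain-forget gradient conflicts and smoothing the parameter space), here is a minimal pure-Python sketch under stated assumptions: the conflict step is assumed to be a one-sided, PCGrad-style projection and the smoothing step is assumed to be sharpness-aware minimization (SAM). Neither is confirmed to be the authors' method, and the names `project_conflict`, `sam_step`, and `unlearning_step` are hypothetical.

```python
"""Toy sketch (NOT the PRISM algorithm): two generic ideas the abstract names
for the parameter-space stage, shown on plain Python lists.

Assumptions: conflict removal is a one-sided, PCGrad-style projection and
smoothing is a sharpness-aware minimization (SAM) step. All names here are
hypothetical and chosen for illustration only.
"""
import math
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_conflict(g_forget: Vector, g_retain: Vector) -> Vector:
    """Drop the part of the forget gradient that opposes the retain gradient.

    If the two gradients have a negative dot product, a descent step on
    g_forget would raise the retain loss to first order; removing the
    component along g_retain makes the result orthogonal to it.
    """
    d = dot(g_forget, g_retain)
    denom = dot(g_retain, g_retain)
    if d >= 0.0 or denom == 0.0:
        return list(g_forget)  # no conflict (or no retain signal): keep as-is
    scale = d / denom
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


def sam_step(theta: Vector, grad_fn: GradFn, lr: float, rho: float) -> Vector:
    """One SAM update: move to the (first-order) worst-case point within an
    L2 ball of radius rho, take the gradient there, and apply it at the
    original parameters. This biases training toward flatter regions."""
    g = grad_fn(theta)
    g_norm = math.sqrt(dot(g, g))
    if g_norm == 0.0:
        return list(theta)
    perturbed = [t + rho * gi / g_norm for t, gi in zip(theta, g)]
    g_adv = grad_fn(perturbed)
    return [t - lr * ga for t, ga in zip(theta, g_adv)]


def unlearning_step(theta: Vector, grad_retain: GradFn, grad_forget: GradFn,
                    lr: float, rho: float) -> Vector:
    """Retain gradient plus the conflict-free part of the forget gradient
    (the gradient of whatever forgetting objective is being minimized),
    applied with a SAM step."""
    def combined(t: Vector) -> Vector:
        g_r = grad_retain(t)
        g_f = project_conflict(grad_forget(t), g_r)
        return [r + f for r, f in zip(g_r, g_f)]

    return sam_step(theta, combined, lr, rho)


if __name__ == "__main__":
    # Toy quadratic objectives with minima at a (retain) and b (forget).
    a, b = [1.0, 0.0], [-1.0, 1.0]

    def grad_retain(t: Vector) -> Vector:
        return [ti - ai for ti, ai in zip(t, a)]

    def grad_forget(t: Vector) -> Vector:
        return [ti - bi for ti, bi in zip(t, b)]

    theta = [0.0, 0.0]
    for _ in range(20):
        theta = unlearning_step(theta, grad_retain, grad_forget, lr=0.1, rho=0.05)
    print(theta)
```

Projecting only the forget gradient gives the retain objective priority whenever the two disagree; a symmetric variant would project both gradients onto each other's normal plane, which trades utility preservation against unlearning strength differently.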

Comments: Accepted by ICLR 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2509.23362 [cs.CL]

(or arXiv:2509.23362v2 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2509.23362 (arXiv-issued DOI via DataCite)

Submission history

From: Han Yan

[v1] Sat, 27 Sep 2025 15:20:37 UTC (2,866 KB)

[v2] Sat, 28 Mar 2026 14:14:18 UTC (1,050 KB)

Original source

arXiv

https://arxiv.org/abs/2509.23362


