Dual-Space Smoothness for Robust and Balanced LLM Unlearning

[Submitted on 27 Sep 2025 (v1), last revised 28 Mar 2026 (this version, v2)]
Han Yan, Zheyuan Liu, Meng Jiang

Abstract: As large language models evolve, Machine Unlearning has emerged to address growing concerns around user privacy, copyright infringement, and overall safety. Yet state-of-the-art (SOTA) unlearning methods often suffer from catastrophic forgetting and metric imbalance, for example, by over-optimizing one objective (e.g., unlearning effectiveness, utility preservation, or privacy protection) at the expense of others. In addition, small perturbations in the representation or parameter space can be exploited by relearn and jailbreak attacks. To address these challenges, we propose PRISM, a unified framework that enforces dual-space smoothness in representation and parameter spaces to improve robustness and balance unlearning metrics. PRISM consists of two smoothness optimization stages: (i) a representation space stage that employs a robustly trained probe to defend against jailbreak attacks, and (ii) a parameter-space stage that decouples retain-forget gradient conflicts, reduces imbalance, and smooths the parameter space to mitigate relearning attacks. Extensive experiments on WMDP and MUSE, across conversational-dialogue and continuous-text settings, show that PRISM outperforms SOTA baselines under multiple attacks while achieving a better balance among key metrics.
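The abstract does not say how stage (ii) decouples retain-forget gradient conflicts. As a rough illustration only, the sketch below uses a generic projection trick from multi-task learning (in the style of PCGrad), which is an assumed stand-in and not PRISM's actual procedure: when the forget gradient points against the retain gradient, the conflicting component is projected out. The function names `dot` and `decouple_forget_gradient` and the toy vectors are hypothetical.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def decouple_forget_gradient(g_forget: List[float],
                             g_retain: List[float]) -> List[float]:
    """Remove the part of g_forget that conflicts with g_retain.

    Assumption: a conflict means a negative inner product. This is a
    generic projection heuristic, not the method from the paper.
    """
    overlap = dot(g_forget, g_retain)
    if overlap >= 0.0:
        # No conflict: leave the forget gradient unchanged.
        return list(g_forget)
    # overlap < 0 implies g_retain is non-zero, so the division is safe.
    scale = overlap / dot(g_retain, g_retain)
    # Subtract the projection of g_forget onto g_retain.
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


# Toy example with made-up gradients that conflict (inner product is -1).
g_forget = [1.0, 0.0]
g_retain = [-1.0, 1.0]
adjusted = decouple_forget_gradient(g_forget, g_retain)
print(adjusted)                 # [0.5, 0.5]
print(dot(adjusted, g_retain))  # 0.0
```

After the projection the adjusted forget gradient is orthogonal to the retain gradient, so to first order a step along it no longer pushes the retain objective in the wrong direction. The parameter-space smoothing that the abstract also mentions is not covered by this sketch.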

Comments: Accepted by ICLR 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2509.23362 [cs.CL]

(or arXiv:2509.23362v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2509.23362

arXiv-issued DOI via DataCite

Submission history

From: Han Yan
[v1] Sat, 27 Sep 2025 15:20:37 UTC (2,866 KB)
[v2] Sat, 28 Mar 2026 14:14:18 UTC (1,050 KB)
