
Activation Steering with a Feedback Controller

arXiv · March 30, 2026

arXiv:2510.04309v2 (Announce Type: replace). Authors: Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, Tan M. Nguyen


Abstract: Controlling the behaviors of large language models (LLMs) is fundamental to their safety alignment and reliable deployment. However, existing steering methods are primarily driven by empirical insights and lack theoretical performance guarantees. In this work, we develop a control-theoretic foundation for activation steering by showing that popular steering methods correspond to proportional (P) controllers, with the steering vector serving as the feedback signal. Building on this finding, we propose Proportional-Integral-Derivative (PID) Steering, a principled framework that leverages the full PID controller for activation steering in LLMs. The proportional (P) term aligns activations with target semantic directions, the integral (I) term accumulates errors to enforce persistent corrections across layers, and the derivative (D) term mitigates overshoot by counteracting rapid activation changes. This closed-loop design yields interpretable error dynamics and connects activation steering to classical stability guarantees in control theory. Moreover, PID Steering is lightweight, modular, and readily integrates with state-of-the-art steering methods. Extensive experiments across multiple LLM families and benchmarks demonstrate that PID Steering consistently outperforms existing approaches, achieving more robust and reliable behavioral control.
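The P, I, and D roles described in the abstract can be illustrated with a short sketch. It treats the layer index as the controller's time step and assumes, for illustration only, that the per-layer error e_l is a steering vector (for example, a difference of means between activations on contrasting prompts) and that the control signal u_l = Kp*e_l + Ki*sum_{j<=l} e_j + Kd*(e_l - e_{l-1}) is added to that layer's activation. These choices, and the helper names `difference_of_means`, `pid_steering_signals`, and `apply_steering`, are hypothetical. This is the textbook discrete PID law applied across layers, not necessarily the exact formulation used in the paper.

```python
from typing import List, Sequence

Vector = List[float]


def difference_of_means(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Hypothetical helper: a steering vector as mean(pos) - mean(neg) at one layer."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def pid_steering_signals(
    errors: Sequence[Vector], kp: float, ki: float, kd: float
) -> List[Vector]:
    """Discrete PID over the layer index (hypothetical sketch).

    errors[l] is the assumed error (steering) vector e_l at layer l.
    Returns u_l = kp*e_l + ki*sum_{j<=l} e_j + kd*(e_l - e_{l-1}) for every layer.
    """
    if not errors:
        return []
    dim = len(errors[0])
    integral = [0.0] * dim
    prev = list(errors[0])  # assumption: D term is zero at the first layer
    signals: List[Vector] = []
    for e in errors:
        # I term: running sum of the errors seen so far across layers
        integral = [s + x for s, x in zip(integral, e)]
        # D term: change in the error relative to the previous layer
        derivative = [x - p for x, p in zip(e, prev)]
        # Combine P, I, and D contributions elementwise
        u = [kp * x + ki * s + kd * d for x, s, d in zip(e, integral, derivative)]
        signals.append(u)
        prev = list(e)
    return signals


def apply_steering(activation: Vector, control: Vector) -> Vector:
    """Assumed update rule: add the control vector to the layer's activation."""
    return [h + u for h, u in zip(activation, control)]


if __name__ == "__main__":
    # Toy per-layer errors in a 2-dimensional activation space (illustrative numbers)
    toy_errors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    for layer, u in enumerate(pid_steering_signals(toy_errors, kp=1.0, ki=0.1, kd=0.5)):
        print(layer, u)
```

Setting ki = kd = 0 reduces the loop to adding kp times the steering vector at each layer, which matches the proportional-controller reading of standard steering that the abstract describes. The derivative is taken as zero at the first layer to avoid a spike from an undefined previous error; starting from a zero previous error would be an equally defensible choice.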

Comments: 10 pages in the main text. ICLR 2026 Poster

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2510.04309 [cs.LG]

(or arXiv:2510.04309v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2510.04309

arXiv-issued DOI via DataCite

Submission history

From: Nguyen Viet Dung
[v1] Sun, 5 Oct 2025 18:05:28 UTC (38,145 KB)
[v2] Fri, 27 Mar 2026 06:33:50 UTC (21,106 KB)
