Activation Steering with a Feedback Controller
Authors: Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, Tan M. Nguyen
Abstract: Controlling the behaviors of large language models (LLMs) is fundamental to their safety alignment and reliable deployment. However, existing steering methods are primarily driven by empirical insights and lack theoretical performance guarantees. In this work, we develop a control-theoretic foundation for activation steering by showing that popular steering methods correspond to proportional (P) controllers, with the steering vector serving as the feedback signal. Building on this finding, we propose Proportional-Integral-Derivative (PID) Steering, a principled framework that leverages the full PID controller for activation steering in LLMs. The proportional (P) term aligns activations with target semantic directions, the integral (I) term accumulates errors to enforce persistent corrections across layers, and the derivative (D) term mitigates overshoot by counteracting rapid activation changes. This closed-loop design yields interpretable error dynamics and connects activation steering to classical stability guarantees in control theory. Moreover, PID Steering is lightweight, modular, and readily integrates with state-of-the-art steering methods. Extensive experiments across multiple LLM families and benchmarks demonstrate that PID Steering consistently outperforms existing approaches, achieving more robust and reliable behavioral control.
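The abstract describes the three terms only in words. The sketch below is one minimal, illustrative reading of them, not the authors' implementation. It assumes that the per-layer error e_l is a difference-of-means steering vector (a common choice in activation steering), that the layer index plays the role of the controller's time step, and that the control signal is simply added to the layer's hidden state. The names `PIDSteerer`, `diff_of_means`, and `steer`, the gains, and the toy numbers are all hypothetical. With Ki = Kd = 0 the controller reduces to adding a scaled steering vector, which is the P-controller view the abstract attributes to existing methods.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def diff_of_means(pos_acts, neg_acts) -> Vector:
    """Difference-of-means steering vector for one layer.

    Assumption for illustration: this vector is used as the feedback error e_l.
    """
    return [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]


class PIDSteerer:
    """Hypothetical per-layer PID controller; the layer index acts as time."""

    def __init__(self, kp: float, ki: float = 0.0, kd: float = 0.0) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral: Optional[Vector] = None
        self.prev_error: Optional[Vector] = None

    def control(self, error: Sequence[float]) -> Vector:
        """Return u_l = Kp*e_l + Ki*sum_{k<=l} e_k + Kd*(e_l - e_{l-1})."""
        error = list(error)
        if self.integral is None:
            self.integral = [0.0] * len(error)
        # I term: running sum of the errors seen at this and earlier layers.
        self.integral = [s + e for s, e in zip(self.integral, error)]
        # D term: change in error since the previous layer (zero at the first layer).
        if self.prev_error is None:
            derivative = [0.0] * len(error)
        else:
            derivative = [e - p for e, p in zip(error, self.prev_error)]
        self.prev_error = error
        return [
            self.kp * e + self.ki * s + self.kd * d
            for e, s, d in zip(error, self.integral, derivative)
        ]


def steer(hidden: Sequence[float], control: Sequence[float]) -> Vector:
    """Add the control signal to a layer's hidden state: h_l <- h_l + u_l."""
    return [h + u for h, u in zip(hidden, control)]


if __name__ == "__main__":
    # P-only special case: equivalent to plain activation addition h + alpha * v.
    print(PIDSteerer(kp=4.0).control([1.0, -2.0]))  # [4.0, -8.0]

    # Full PID over two layers with made-up 2-d errors and hidden states.
    pid = PIDSteerer(kp=1.0, ki=0.5, kd=0.5)
    errors = [[2.0, 0.0], [1.0, 0.0]]
    hidden = [[0.0, 1.0], [0.0, 1.0]]
    for h, e in zip(hidden, errors):
        print(steer(h, pid.control(e)))  # [3.0, 1.0], then [2.0, 1.0]
```

In the toy two-layer run, the error halves at the second layer, yet the I term still contributes 1.5 because it remembers the first layer's error, while the D term subtracts 0.5 because the error is falling. A P-only controller would have added just 1.0 there. How the paper actually defines the error, sets the gains, and chooses which layers to steer may differ from these assumptions.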
Comments: 10 pages in the main text. ICLR 2026 Poster
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2510.04309 [cs.LG]
(or arXiv:2510.04309v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2510.04309 (arXiv-issued DOI via DataCite)
Submission history
From: Nguyen Viet Dung
[v1] Sun, 5 Oct 2025 18:05:28 UTC (38,145 KB)
[v2] Fri, 27 Mar 2026 06:33:50 UTC (21,106 KB)