Attention Misses Visual Risk: Risk-Adaptive Steering for Multimodal Safety Alignment
arXiv:2510.13698v3 Announce Type: replace
Authors: Jonghyun Park, Minhyuk Seo, Chaewon Yeo, Jonghyun Choi
Abstract: Even modern AI models often remain vulnerable to multimodal queries in which harmful intent is embedded in images. A widely used approach to safety alignment is training on extensive multimodal safety datasets, but the costs of data curation and training are often prohibitive. To mitigate these costs, inference-time alignment methods have recently been explored, but they often lack generalizability across diverse multimodal jailbreaks and still incur notable overhead from extra forward passes for response refinement or from heavy pre-deployment calibration procedures. Here, we identify insufficient visual attention to safety-critical image regions as one of the key causes of multimodal safety failures. Building on this insight, we propose Multimodal Risk-Adaptive Steering (MoRAS), which enhances safety-critical visual attention via concise visual contexts for accurate multimodal risk assessment. This risk signal enables risk-adaptive steering toward direct refusals, reducing inference overhead while remaining generalizable across diverse multimodal jailbreaks. Notably, MoRAS requires only a small calibration set to estimate multimodal risk, substantially reducing pre-deployment overhead. We validate MoRAS empirically across multiple benchmarks and MLLM backbones, and observe that it consistently mitigates jailbreaks, preserves utility, and reduces computational overhead compared to state-of-the-art inference-time defenses.
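The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of risk-adaptive steering, not the authors' method. It assumes a difference-of-means risk direction fitted on a small calibration set, a sigmoid risk score, and a refusal vector added to a hidden state in proportion to that score. All names (`fit_risk_direction`, `risk_score`, `steer`, `refusal_vec`, `alpha`) and the synthetic features are hypothetical, and the attention-enhancement step via visual contexts is not modeled here.

```python
import numpy as np


def fit_risk_direction(safe_feats, harmful_feats):
    """Fit a unit risk direction and a midpoint threshold.

    Both inputs are (n_examples, hidden_dim) arrays of hypothetical
    hidden features from a small calibration set.
    """
    mu_safe = safe_feats.mean(axis=0)
    mu_harm = harmful_feats.mean(axis=0)
    diff = mu_harm - mu_safe
    direction = diff / np.linalg.norm(diff)
    # Threshold: projection of the midpoint between the two class means.
    threshold = 0.5 * float((mu_safe + mu_harm) @ direction)
    return direction, threshold


def risk_score(hidden, direction, threshold, temperature=1.0):
    """Map the projection onto the risk direction to a score in (0, 1)."""
    z = (float(hidden @ direction) - threshold) / temperature
    return 1.0 / (1.0 + np.exp(-z))


def steer(hidden, refusal_vec, risk, alpha=4.0):
    """Shift the hidden state along the refusal vector, scaled by risk.

    With risk near 0 the state is left almost unchanged; with risk near 1
    the full shift alpha * refusal_vec is applied.
    """
    return hidden + alpha * risk * refusal_vec


# Synthetic calibration features (assumption: harmful inputs are shifted
# along the first hidden dimension).
rng = np.random.default_rng(0)
safe = rng.normal(0.0, 0.1, size=(8, 4))
harmful = rng.normal(0.0, 0.1, size=(8, 4)) + np.array([2.0, 0.0, 0.0, 0.0])

direction, threshold = fit_risk_direction(safe, harmful)
refusal_vec = np.array([0.0, 1.0, 0.0, 0.0])  # hypothetical refusal direction

query = harmful.mean(axis=0)  # stand-in for a risky query's hidden state
risk = risk_score(query, direction, threshold)
steered = steer(query, refusal_vec, risk)
```

Scaling the shift by the risk score, rather than applying it to every query, is what would let a scheme like this leave benign queries close to untouched in a single forward pass.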
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2510.13698 [cs.CV]
(or arXiv:2510.13698v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2510.13698
arXiv-issued DOI via DataCite
Submission history
From: Jonghyun Park
[v1] Wed, 15 Oct 2025 15:57:17 UTC (10,671 KB)
[v2] Mon, 3 Nov 2025 02:09:36 UTC (10,671 KB)
[v3] Fri, 27 Mar 2026 01:19:31 UTC (5,575 KB)