
The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams

arXiv, March 31, 2026

arXiv:2603.27412v1 Announce Type: cross. Author: Isaac Llorente-Saguer


Abstract: We present LatentBiopsy, a training-free method for detecting harmful prompts by analysing the geometry of residual-stream activations in large language models. Given 200 safe normative prompts, LatentBiopsy computes the leading principal component of their activations at a target layer and characterises new prompts by their radial deviation angle $\theta$ from this reference direction. The anomaly score is the negative log-likelihood of $\theta$ under a Gaussian fit to the normative distribution, flagging deviations symmetrically regardless of orientation. No harmful examples are required for training. We evaluate two complete model triplets from the Qwen3.5-0.8B and Qwen2.5-0.5B families: base, instruction-tuned, and abliterated (refusal direction surgically removed via orthogonalisation). Across all six variants, LatentBiopsy achieves AUROC $\geq 0.937$ for harmful-vs-normative detection and AUROC $= 1.000$ for discriminating harmful from benign-aggressive prompts (XSTest), with sub-millisecond per-query overhead. Three empirical findings emerge. First, geometry survives refusal ablation: both abliterated variants achieve AUROC at most 0.015 below their instruction-tuned counterparts, establishing a geometric dissociation between harmful-intent representation and the downstream generative refusal mechanism. Second, harmful prompts exhibit a near-degenerate angular distribution ($\sigma_\theta \approx 0.03$ rad), an order of magnitude tighter than the normative distribution ($\sigma_\theta \approx 0.27$ rad), preserved across all alignment stages including abliteration. Third, the two families exhibit opposite ring orientations at the same depth: harmful prompts occupy the outer ring in Qwen3.5-0.8B but the inner ring in Qwen2.5-0.5B, directly motivating the direction-agnostic scoring rule.
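The scoring rule described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation. It assumes one activation vector per prompt (how activations are pooled over tokens is not stated in the abstract), that the principal component is computed on mean-centred normative activations, and that $\theta$ is measured between the centred activation and that component. The function names `fit_reference`, `angle_to`, and `anomaly_score` are hypothetical. Under these assumptions the score is $\frac{(\theta - \mu)^2}{2\sigma^2} + \log(\sigma\sqrt{2\pi})$, where $\mu$ and $\sigma$ are the mean and standard deviation of the normative angles.

```python
import numpy as np


def angle_to(x: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Angle in radians, in [0, pi], between each row of x and a direction."""
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(direction)
    cos = (x @ direction) / np.maximum(norms, 1e-12)  # guard against zero norm
    return np.arccos(np.clip(cos, -1.0, 1.0))


def fit_reference(normative: np.ndarray):
    """Fit the reference direction and a Gaussian over normative angles.

    normative: shape (n_prompts, d_model), activations of safe prompts.
    Assumption: PCA is done on mean-centred data (not confirmed by the source).
    """
    mean = normative.mean(axis=0)
    centred = normative - mean
    # The first right-singular vector of the centred data is the leading PC.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    direction = vt[0]
    theta = angle_to(centred, direction)
    return mean, direction, theta.mean(), theta.std(ddof=1)


def anomaly_score(activation, mean, direction, mu, sigma):
    """Gaussian negative log-likelihood of the deviation angle.

    The score depends on (theta - mu) squared, so angles above and below
    the normative mean are penalised symmetrically.
    """
    theta = angle_to(activation - mean, direction)
    return 0.5 * ((theta - mu) / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))
```

In this sketch a prompt would be flagged when `anomaly_score` exceeds a threshold chosen on held-out normative prompts. Because the score is a function of the squared deviation, it does not matter whether harmful prompts fall on the inner or the outer ring, which matches the direction-agnostic motivation given in the abstract.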

Comments: 20 pages, 10 figures, 3 tables. Training-free harmful-prompt detector via angular deviation in LLM residual streams. Evaluated on six Qwen variants (base / instruct / abliterated). Achieves AUROC of at least 0.937 (harmful-vs-normative) and 1.000 (harmful-vs-benign-aggressive) with no harmful training data.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

ACM classes: I.2.7

Cite as: arXiv:2603.27412 [cs.LG]

(or arXiv:2603.27412v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27412

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Isaac Llorente-Saguer. [v1] Sat, 28 Mar 2026 21:19:58 UTC (8,826 KB)

Original source: arXiv, https://arxiv.org/abs/2603.27412
