
H-Node Attack and Defense in Large Language Models

arXiv, March 30, 2026

Authors: Eric Yocam, Varghese Vaidyan, Yong Wang

arXiv:2603.26045v1, Announce Type: cross


Abstract: We present H-Node Adversarial Noise Cancellation (H-Node ANC), a mechanistic framework that identifies, exploits, and defends hallucination representations in transformer-based large language models (LLMs) at the level of individual hidden-state dimensions. A logistic regression probe trained on last-token hidden states localizes hallucination signal to a small set of high-variance dimensions -- termed Hallucination Nodes (H-Nodes) -- with probe AUC reaching 0.90 across four architectures. A white-box adversarial attack amplifies these dimensions at inference time via a real-time forward hook, achieving a selectivity of 3.02x with less than 10% visibility to the defender. Adaptive ANC defense suppresses H-Node excess in-pass using confidence-weighted cancellation, reducing grounded activation drift by 33-42% over static cancellation. A dynamic iterative extension that re-ranks cancellation targets across successive passes recovers up to 0.69 robustness from a single-pass baseline of 8%. All contributions are validated on OPT-125M, Phi-3-mini-4k-instruct, LLaMA-3-8B-Instruct, and Mistral-7B-Instruct-v0.3 (125M-8B parameters). Perplexity impact is surgical (<5%) and MMLU degradation is at most 3%, confirming that the defense does not impair general reasoning capability.
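The abstract names three steps (localizing a few high-variance dimensions, amplifying them as an attack, and cancelling their excess as a defense) without giving formulas. The sketch below is only an illustration of that general shape on plain Python lists, not the authors' method. Everything in it is an assumption: the function names `select_h_nodes`, `amplify`, and `cancel` are hypothetical, variance ranking stands in for the paper's logistic regression probe, a constant `gain` stands in for the forward-hook attack, and the defense is modelled as subtracting `confidence` times the excess over a grounded baseline vector.

```python
from statistics import pvariance


def select_h_nodes(hidden_states, k):
    """Return indices of the k highest-variance dimensions.

    ASSUMPTION: variance ranking is a stand-in for the paper's probe.
    hidden_states is a list of equal-length vectors (lists of floats).
    """
    dims = range(len(hidden_states[0]))
    variances = [pvariance([h[d] for h in hidden_states]) for d in dims]
    ranked = sorted(dims, key=lambda d: variances[d], reverse=True)
    return sorted(ranked[:k])


def amplify(hidden, h_nodes, gain):
    """Attack sketch (assumed form): scale the selected dimensions by gain."""
    targets = set(h_nodes)
    return [x * gain if i in targets else x for i, x in enumerate(hidden)]


def cancel(hidden, h_nodes, baseline, confidence):
    """Defense sketch (assumed form): remove a confidence-weighted share
    of each selected dimension's excess over a grounded baseline.

    confidence is expected to lie in [0, 1]; 1.0 restores the baseline
    value on the selected dimensions, 0.0 leaves the vector unchanged.
    """
    out = list(hidden)
    for i in h_nodes:
        excess = hidden[i] - baseline[i]
        out[i] = hidden[i] - confidence * excess
    return out


# Toy data: dimension 1 varies far more than the others.
states = [[0.0, 1.0, 5.0], [0.1, -1.0, 5.1], [0.0, 3.0, 4.9]]
nodes = select_h_nodes(states, k=1)
grounded = states[0]
attacked = amplify(grounded, nodes, gain=3.0)
defended = cancel(attacked, nodes, grounded, confidence=0.5)
print(nodes, attacked, defended)
```

On the toy data only dimension 1 is selected, so the attack and the defense leave the other two dimensions untouched. In a real model the same operations would run on hidden-state tensors inside a hook, and the baseline and confidence would have to come from the paper's own definitions, which the abstract does not spell out.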

Comments: 17 pages, 7 figures, 6 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Cite as: arXiv:2603.26045 [cs.LG]

(or arXiv:2603.26045v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26045

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Eric Yocam
[v1] Fri, 27 Mar 2026 03:32:26 UTC (282 KB)
