Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models
Authors: Eyal Hadad, Mordechai Guri
Abstract: On-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic High-Resolution preprocessing (e.g., AnyRes) introduces an inherent algorithmic side-channel. Unlike static models, which process a fixed-size input, dynamic preprocessing decomposes each image into a variable number of patches based on its aspect ratio, so the workload depends on the input. We demonstrate a dual-layer attack framework against local VLMs. In Tier 1, an unprivileged attacker exploits significant execution-time variations, observed through standard OS metrics, to reliably fingerprint the input's geometry. In Tier 2, by profiling Last-Level Cache (LLC) contention, the attacker resolves semantic ambiguity within identical geometries, distinguishing visually dense content (e.g., medical X-rays) from sparse content (e.g., text documents). By evaluating state-of-the-art models such as LLaVA-NeXT and Qwen2-VL, we show that combining these signals enables reliable inference of privacy-sensitive contexts. Finally, we analyze the security engineering trade-offs of mitigating this vulnerability, show that constant-work padding incurs substantial performance overhead, and propose practical design recommendations for secure Edge AI deployments.
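The geometry dependence described in the abstract can be illustrated with a small sketch. It is not the authors' code or any model's actual preprocessing: the tile size `TILE`, the `CANDIDATES` canvas list, the selection rule in `select_canvas`, and the extra global-view tile are all assumptions chosen for illustration. It shows how an AnyRes-style scheme maps aspect ratio to a tile count, and what a constant-work padding target would look like under those assumptions.

```python
from typing import List, Tuple

TILE = 336  # assumed tile edge in pixels (illustrative, not from the paper)

# Hypothetical candidate canvases as (width, height), each a multiple of TILE.
CANDIDATES: List[Tuple[int, int]] = [
    (336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008),
]


def select_canvas(width: int, height: int) -> Tuple[int, int]:
    """Pick the canvas that keeps the most image pixels, then wastes the least.

    This is an assumed selection rule for an AnyRes-style preprocessor.
    """
    best, best_key = None, None
    for cw, ch in CANDIDATES:
        # Fit the image inside the canvas while preserving aspect ratio
        # (integer arithmetic keeps the result deterministic).
        if cw * height <= ch * width:      # width is the limiting side
            scaled_w, scaled_h = cw, height * cw // width
        else:                              # height is the limiting side
            scaled_w, scaled_h = width * ch // height, ch
        effective = min(scaled_w * scaled_h, width * height)
        wasted = cw * ch - effective
        key = (-effective, wasted)         # more pixels first, then less waste
        if best_key is None or key < best_key:
            best, best_key = (cw, ch), key
    return best


def tile_count(width: int, height: int) -> int:
    """Number of tiles the encoder would process for this image geometry."""
    cw, ch = select_canvas(width, height)
    # +1 models a downscaled global view of the whole image (assumption).
    return (cw // TILE) * (ch // TILE) + 1


def padded_tile_count() -> int:
    """Constant-work target: always process the worst-case tile count."""
    return max((cw // TILE) * (ch // TILE) for cw, ch in CANDIDATES) + 1


if __name__ == "__main__":
    for w, h in [(1000, 1000), (2000, 1000), (3000, 1000)]:
        print(f"{w}x{h}: {tile_count(w, h)} tiles, "
              f"padded to {padded_tile_count()}")
```

Under these assumptions, images with different aspect ratios produce different tile counts, so encoder work (and therefore execution time) varies with geometry. Padding every input to the worst-case count removes that variation at the cost of extra work for smaller inputs, which is the trade-off the abstract refers to.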
Comments: 13 pages, 8 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2603.25403 [cs.CR]
(or arXiv:2603.25403v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603.25403
arXiv-issued DOI via DataCite
Submission history
From: Eyal Hadad
[v1] Thu, 26 Mar 2026 12:53:49 UTC (5,693 KB)
[v2] Fri, 27 Mar 2026 15:01:28 UTC (5,694 KB)