EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method

arXiv cs.CR | by Yanting Wang, Jinyuan Jia | April 1, 2026

Abstract: The random subspace method has wide security applications, such as providing certified defenses against adversarial and backdoor attacks and building robustly aligned LLMs against jailbreaking attacks. However, explanations of the random subspace method remain underexplored. Existing state-of-the-art feature attribution methods, such as Shapley value and LIME, are computationally impractical and lack security guarantees when applied to the random subspace method. In this work, we propose EnsembleSHAP, an intrinsically faithful and secure feature attribution method for the random subspace method that reuses its computational byproducts. Specifically, our feature attribution method 1) is computationally efficient, 2) maintains essential properties of effective feature attribution (such as local accuracy), and 3) offers guaranteed protection against explanation-preserving attacks on feature attribution methods. To the best of our knowledge, this is the first work to establish provable robustness against explanation-preserving attacks. We also perform comprehensive evaluations of our explanation's effectiveness under different empirical attacks, including backdoor attacks, adversarial attacks, and jailbreak attacks. The code is at this https URL. WARNING: This document may include content that could be considered harmful.
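To make the abstract's setting concrete, here is a minimal Python sketch of a random subspace ensemble and of an attribution that reuses the ensemble's own sub-votes. It is an illustration under stated assumptions, not the paper's definition of EnsembleSHAP: the function names, the toy base model, and the equal vote-splitting rule are all hypothetical. The sketch only shows how "reusing computational byproducts" and "local accuracy" (attributions summing to the label's vote fraction) can fit together.

```python
import random
from collections import Counter


def random_subspace_predict(x, base_model, k, n_samples, seed=0):
    """Majority vote over base-model predictions on random k-feature subsets.

    Returns the voted label, the per-label vote fractions, and the
    (subset, label) records, which the attribution below reuses.
    """
    rng = random.Random(seed)
    d = len(x)
    records = []
    for _ in range(n_samples):
        subset = tuple(sorted(rng.sample(range(d), k)))
        # The base model only sees the features kept in this subset.
        label = base_model({i: x[i] for i in subset})
        records.append((subset, label))
    votes = Counter(label for _, label in records)
    fractions = {lab: count / n_samples for lab, count in votes.items()}
    predicted = max(fractions, key=fractions.get)
    return predicted, fractions, records


def vote_share_attribution(records, label, d, k):
    """Assumed rule: split each sub-vote for `label` equally among its k features."""
    scores = [0.0] * d
    n = len(records)
    for subset, lab in records:
        if lab == label:
            for i in subset:
                scores[i] += 1.0 / (k * n)
    return scores


def toy_base_model(kept):
    # Hypothetical classifier: predicts 1 only when feature 0 is visible and positive.
    return 1 if kept.get(0, 0) > 0 else 0


x = [1, 0, 0, 0, 0, 0]
predicted, fractions, records = random_subspace_predict(
    x, toy_base_model, k=4, n_samples=200
)
scores = vote_share_attribution(records, label=1, d=len(x), k=4)
```

In this toy setup no extra model queries are needed for the attribution, since it is computed from the records already produced during prediction. The scores for label 1 sum to that label's vote fraction, and feature 0 receives the largest score because every sub-vote for label 1 must include it.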

Comments: Published at ICLR 2026

Subjects: Cryptography and Security (cs.CR)

Cite as: arXiv:2603.30034 [cs.CR] (or arXiv:2603.30034v1 [cs.CR] for this version)

https://doi.org/10.48550/arXiv.2603.30034

arXiv-issued DOI via DataCite

Submission history

From: Yanting Wang [v1] Tue, 31 Mar 2026 17:30:52 UTC (5,012 KB)
