CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks
arXiv:2603.29062v1 Announce Type: new
Abstract: LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve over 90% success against current defenses, and single-layer guardrails are bypassed at similar rates. We present CivicShield, a cross-domain defense-in-depth framework for government-facing AI chatbots. Drawing on network security, formal verification, biological immune systems, aviation safety, and zero-trust cryptography, CivicShield introduces seven defense layers: (1) zero-trust foundation with capability-based access control, (2) perimeter input validation, (3) semantic firewall with intent classification, (4) conversation state machine with safety invariants, (5) behavioral anomaly detection, (6) multi-model consensus verification, and (7) graduated human-in-the-loop escalation. We present a formal threat model covering 8 multi-turn attack families, map the framework to NIST SP 800-53 controls across 14 families, and evaluate it using ablation analysis. Theoretical analysis shows that layered defenses reduce attack probability by 1-2 orders of magnitude versus single-layer approaches. Simulation against 1,436 scenarios, including HarmBench (416), JailbreakBench (200), and XSTest (450), achieves 72.9% combined detection [69.5-76.0% CI] with a 2.9% effective false positive rate after graduated response, while maintaining 100% detection of multi-turn crescendo and slow-drift attacks. The honestly reported drop on real benchmarks versus author-generated scenarios (71.2% vs 76.7% on HarmBench, 47.0% vs 70.0% on JailbreakBench) underscores the importance of independent evaluation. CivicShield addresses an open gap at the intersection of AI safety, government compliance, and practical deployment.
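The abstract's claim that layered defenses reduce attack probability by orders of magnitude can be illustrated with a minimal sketch. It is not the paper's analysis: it assumes each layer fails independently, and the function name and per-layer bypass rates below are hypothetical values chosen only for illustration.

```python
import math


def combined_bypass_probability(layer_bypass_rates):
    """Probability that one attack slips past every layer.

    Assumption (not from the paper): layers fail independently, so the
    overall bypass probability is the product of the per-layer rates.
    """
    for rate in layer_bypass_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"bypass rate must be in [0, 1], got {rate}")
    return math.prod(layer_bypass_rates)


# Hypothetical per-layer bypass rates, for illustration only.
single_layer = combined_bypass_probability([0.9])
three_layers = combined_bypass_probability([0.9, 0.5, 0.2])

print(f"single layer bypass: {single_layer:.3f}")
print(f"three layers bypass: {three_layers:.3f}")
print(f"reduction factor:    {single_layer / three_layers:.1f}x")
```

Independence is the optimistic case. If layers tend to miss the same attacks, their failures are correlated and the real reduction is smaller than the product suggests.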
Comments: 25 pages, 17 tables, 2 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29062 [cs.CR]
(or arXiv:2603.29062v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603.29062
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: KrishnaSaiReddy Patil
[v1] Mon, 30 Mar 2026 22:58:04 UTC (49 KB)