Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing
arXiv:2603.28972v1 Announce Type: new
Abstract: The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator.
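The guard architecture the abstract describes -- a local sensitivity check that routes high-risk prompts away from third-party clouds, plus a bounded working memory -- can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the keyword list stands in for the on-premise SLM classifier, the model names are hypothetical, and the bounded deque approximates the context-compacting mechanism (the paper's exact LIFO policy is not specified here).

```python
from collections import deque

# Hypothetical stand-in for the SLM sensitivity classifier.
SENSITIVE_MARKERS = ("password", "ssn", "salary", "nda")


class PrivacyGuard:
    """Sketch of a local guard: classify, route, and bound working memory."""

    def __init__(self, max_context: int = 4):
        # Bounded working memory: the deque caps how many turns are
        # retained, limiting the emergent leakage surface.
        self.context: deque[str] = deque(maxlen=max_context)

    def is_sensitive(self, prompt: str) -> bool:
        # Placeholder for the on-premise SLM's sensitivity judgement.
        lowered = prompt.lower()
        return any(marker in lowered for marker in SENSITIVE_MARKERS)

    def route(self, prompt: str) -> str:
        self.context.append(prompt)
        # High-risk queries stay on Zero-Trust / NDA-covered models;
        # everything else may go to the cheaper cloud tier.
        return "zero_trust_local" if self.is_sensitive(prompt) else "cloud"


guard = PrivacyGuard(max_context=2)
print(guard.route("summarise this public blog post"))   # cloud
print(guard.route("my password is hunter2"))            # zero_trust_local
guard.route("a third prompt")
print(len(guard.context))                               # 2 (oldest turn evicted)
```

In the paper the routing decision is additionally preceded by abstractive summarisation and APO decomposition, so the cloud-bound payload is a compressed, redacted sub-task rather than the raw prompt.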
Subjects:
Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.28972 [cs.CR]
(or arXiv:2603.28972v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603.28972
arXiv-issued DOI via DataCite
Submission history
From: Alessio Langiu [v1] Mon, 30 Mar 2026 20:16:42 UTC (1,689 KB)