Semantic Identity Compression: Zero-Error Laws, Rate-Distortion, and Neurosymbolic Necessity
arXiv:2601.14252v5 Announce Type: replace-cross
Abstract: Symbolic systems operate over precise identities: variables denote specific objects, pointers target precise memory locations, and database keys refer to singular records. Neural embeddings generalize by compressing away semantic detail, but this compression creates collision ambiguity: multiple distinct entities can share the same representation value. We characterize how much additional information must be supplied to recover precise identity from such representations. The answer is controlled by a single combinatorial object: the collision-fiber geometry of the representation map $\pi$. Let $A_{\pi}=\max_u |\pi^{-1}(u)|$ be the largest collision fiber. We prove a tight fixed-length converse $L \ge \log_2 A_{\pi}$, an exact finite-block scaling law, a pointwise adaptive budget $\lceil \log_2 |\pi^{-1}(u)|\rceil$, and an exact fiberwise rate-distortion law for arbitrary finite sources via recoverable-mass decomposition across representation fibers. The uniform single-block formula $D^\star(L)=\max(0,1-2^L/a)$ appears as a closed-form special case when all mass lies on one collision block, where $a = A_{\pi}$ is the collision block size. The same fiber geometry determines query complexity and canonical structure for distinguishing families. Because this residual ambiguity is structural rather than representation-specific, symbolic identity mechanisms (handles, keys, pointers, nominal tags) are the necessary system-level complement to any non-injective semantic representation. All main results are machine-checked in Lean 4.
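The quantities defined in the abstract can be illustrated concretely. Below is a minimal sketch, assuming a small hypothetical finite map $\pi$ given as a Python dict (the entities and values are illustrative, not from the paper): it computes the collision fibers $\pi^{-1}(u)$, the largest fiber $A_{\pi}$, the fixed-length converse $\lceil \log_2 A_{\pi} \rceil$, the pointwise adaptive budget $\lceil \log_2 |\pi^{-1}(u)| \rceil$, and the single-block distortion $D^\star(L)$.

```python
from math import ceil, log2
from collections import defaultdict

# Hypothetical finite representation map pi: entity -> embedding value.
pi = {"x1": "u", "x2": "u", "x3": "u", "x4": "v", "x5": "v", "x6": "w"}

# Collision fibers: preimages pi^{-1}(u), grouped by representation value.
fibers = defaultdict(list)
for entity, value in pi.items():
    fibers[value].append(entity)

# Largest collision fiber A_pi and the fixed-length converse L >= log2(A_pi).
A_pi = max(len(f) for f in fibers.values())
L_min = ceil(log2(A_pi))  # here A_pi = 3, so L_min = 2 bits

# Pointwise adaptive budget: ceil(log2 |pi^{-1}(u)|) bits suffice at value u.
budget = {u: ceil(log2(len(f))) for u, f in fibers.items()}

# Uniform single-block distortion D*(L) = max(0, 1 - 2^L / a), a = A_pi:
# with L bits one can disambiguate 2^L of the a entities in the block.
def D_star(L: int, a: int = A_pi) -> float:
    return max(0.0, 1.0 - (2 ** L) / a)
```

With this toy map, `D_star(0)` is 2/3 (no side information leaves two of three entities in the largest fiber unrecoverable), and `D_star(2)` is 0, matching the converse `L_min = 2`.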
Comments: 13 pages, 2 tables. Lean 4 artifact and supplementary material available at this https URL
Subjects:
Information Theory (cs.IT); Programming Languages (cs.PL)
MSC classes: 94A15, 94A24, 05B35
ACM classes: E.4; G.2.1
Cite as: arXiv:2601.14252 [cs.IT]
(or arXiv:2601.14252v5 [cs.IT] for this version)
https://doi.org/10.48550/arXiv.2601.14252
arXiv-issued DOI via DataCite
Submission history
From: Tristan Simas [view email] [v1] Tue, 20 Jan 2026 18:58:51 UTC (177 KB) [v2] Thu, 22 Jan 2026 01:11:26 UTC (177 KB) [v3] Fri, 20 Feb 2026 21:52:16 UTC (196 KB) [v4] Mon, 16 Mar 2026 23:06:17 UTC (373 KB) [v5] Tue, 31 Mar 2026 15:29:35 UTC (383 KB)