Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessMeta paused its work with AI training startup Mercor after a data breachBusiness Insider[R], 31 MILLIONS High frequency data, Light GBM worked perfectlyReddit r/MachineLearningConsidering NeurIPS submission [D]Reddit r/MachineLearningAutomate Your Handyman Pricing: The True Hourly Cost AI ForgetsDev.to AIScience Is Not a Reading ProblemMedium AIHow Antigravity AI Changed My React Workflow (In Ways I Didn’t Expect)Medium AIToken Usage Is the New RAM UsageDev.to AIStop Writing Rules for AI AgentsDev.to AIUsing AI as your therapist?Medium AIDigital Marketing Trends and the Role of AI in Modern Business StrategiesMedium AI7 evals that catch “helpful” AI before it harms user trustMedium AIThe AI Pen: Collaborating With Artificial Intelligence Without Losing Your Unique VoiceMedium AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessMeta paused its work with AI training startup Mercor after a data breachBusiness Insider[R], 31 MILLIONS High frequency data, Light GBM worked perfectlyReddit r/MachineLearningConsidering NeurIPS submission [D]Reddit r/MachineLearningAutomate Your Handyman Pricing: The True Hourly Cost AI ForgetsDev.to AIScience Is Not a Reading ProblemMedium AIHow Antigravity AI Changed My React Workflow (In Ways I Didn’t Expect)Medium AIToken Usage Is the New RAM UsageDev.to AIStop Writing Rules for AI AgentsDev.to AIUsing AI as your therapist?Medium AIDigital Marketing Trends and the Role of AI in Modern Business StrategiesMedium AI7 evals that catch “helpful” AI before it harms user trustMedium AIThe AI Pen: Collaborating With Artificial Intelligence Without Losing Your Unique VoiceMedium AI
AI NEWS HUBbyEIGENVECTOREigenvector

Semantic Identity Compression: Zero-Error Laws, Rate-Distortion, and Neurosymbolic Necessity

arXiv cs.PLby Tristan SimasApril 1, 20262 min read0 views
Source Quiz

arXiv:2601.14252v5 Announce Type: replace-cross Abstract: Symbolic systems operate over precise identities: variables denote specific objects, pointers target precise memory locations, and database keys refer to singular records. Neural embeddings generalize by compressing away semantic detail, but this compression creates collision ambiguity: multiple distinct entities can share the same representation value. We characterize how much additional information must be supplied to recover precise identity from such representations. The answer is controlled by a single combinatorial object: the collision-fiber geometry of the representation map $\pi$. Let $A_{\pi}=\max_u |\pi^{-1}(u)|$ be the largest collision fiber. We prove a tight fixed-length converse $L \ge \log_2 A_{\pi}$, an exact finite

View PDF HTML (experimental)

Abstract:Symbolic systems operate over precise identities: variables denote specific objects, pointers target precise memory locations, and database keys refer to singular records. Neural embeddings generalize by compressing away semantic detail, but this compression creates collision ambiguity: multiple distinct entities can share the same representation value. We characterize how much additional information must be supplied to recover precise identity from such representations. The answer is controlled by a single combinatorial object: the collision-fiber geometry of the representation map $\pi$. Let $A_{\pi}=\max_u |\pi^{-1}(u)|$ be the largest collision fiber. We prove a tight fixed-length converse $L \ge \log_2 A_{\pi}$, an exact finite-block scaling law, a pointwise adaptive budget $\lceil \log_2 |\pi^{-1}(u)|\rceil$, and an exact fiberwise rate-distortion law for arbitrary finite sources via recoverable-mass decomposition across representation fibers. The uniform single-block formula $D^\star(L)=\max(0,1-2^L/a)$ appears as a closed-form special case when all mass lies on one collision block, where $a = A_{\pi}$ is the collision block size. The same fiber geometry determines query complexity and canonical structure for distinguishing families. Because this residual ambiguity is structural rather than representation-specific, symbolic identity mechanisms (handles, keys, pointers, nominal tags) are the necessary system-level complement to any non-injective semantic representation. All main results are machine-checked in Lean 4.

Comments: 13 pages, 2 tables. Lean 4 artifact and supplementary material available at this https URL

Subjects:

Information Theory (cs.IT); Programming Languages (cs.PL)

MSC classes: 94A15, 94A24, 05B35

ACM classes: E.4; G.2.1

Cite as: arXiv:2601.14252 [cs.IT]

(or arXiv:2601.14252v5 [cs.IT] for this version)

https://doi.org/10.48550/arXiv.2601.14252

arXiv-issued DOI via DataCite

Submission history

From: Tristan Simas [view email] [v1] Tue, 20 Jan 2026 18:58:51 UTC (177 KB) [v2] Thu, 22 Jan 2026 01:11:26 UTC (177 KB) [v3] Fri, 20 Feb 2026 21:52:16 UTC (196 KB) [v4] Mon, 16 Mar 2026 23:06:17 UTC (373 KB) [v5] Tue, 31 Mar 2026 15:29:35 UTC (383 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

announcearxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Semantic Id…announcearxivarXiv cs.PL

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 210 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Releases