Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices
In a study, a new "dynamic wavelet matrix" was used as a vector database whose memory footprint grows with log(σ) rather than with n. Building on that, I considered a KNN model with a very large memory, capable of holding, say, 5 billion vectors.

First, the words in the context window are converted into an embedding with deberta-v3-small. This is a fast encoder whose disentangled attention also takes token positions into account, and it supplies the contextual signal for the model.

The embedding is then converted into a bit sequence via binary quantization: every dimension greater than 0 becomes 1, everything else becomes 0. The advantage is that bit sequences compress well and can be inserted into the dynamic wavelet matrix, whose memory again grows only with log(σ). A response token is …
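The post includes no code, so here is a minimal sketch of that pipeline in Python with transformers and NumPy. The mean pooling over deberta-v3-small's last hidden state is my assumption (the post does not specify the pooling), and the brute-force Hamming-distance search merely stands in for the dynamic wavelet matrix index, which is not reproduced here.

```python
# Minimal sketch: embed -> binary quantization -> packed codes -> Hamming KNN.
# Assumptions (not from the post): mean pooling, brute-force Hamming search
# instead of the dynamic wavelet matrix index.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "microsoft/deberta-v3-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(texts: list[str]) -> np.ndarray:
    """Encode texts with deberta-v3-small and mean-pool the token embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # ignore padding tokens
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Binary quantization: dimension > 0 -> 1, otherwise 0, packed 8 dims per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)                         # 768 dims -> 96 bytes per vector

# Lookup table: number of set bits for every possible byte value.
_POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

def knn_hamming(query_code: np.ndarray, stored_codes: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force k nearest neighbours under Hamming distance over packed codes."""
    dists = _POPCOUNT[np.bitwise_xor(stored_codes, query_code)].sum(axis=-1)
    return np.argsort(dists)[:k]

# Usage: index a few memories, then retrieve the closest one for a query.
memories = ["the user prefers short answers", "the project deadline is Friday"]
codes = binary_quantize(embed(memories))
idx = knn_hamming(binary_quantize(embed(["when is the deadline?"]))[0], codes, k=1)
print(memories[int(idx[0])])
```

With 768-dimensional embeddings each binary code is only 96 bytes, so 5 billion codes would occupy roughly 480 GB even before any wavelet-matrix compression.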
Read the full post on discuss.huggingface.co: https://discuss.huggingface.co/t/scaling-agentic-memory-to-5-billion-vectors-via-binary-quantization-and-dynamic-wavelet-matrices/174951
