Long-Term AI Memory by the Creator of Apache Cassandra
cortexdb.ai
CortexDB is the long-term memory layer for AI systems. The problem it addresses is fundamental: today's AI agents are stateless, and every conversation starts from zero. The dominant approach to giving AI memory, having an LLM rewrite and merge your data on every single write, is lossy, fragile, and ruinously expensive. The LLM decides what to keep and what to throw away, replaces the original with a summary, and that decision is irreversible. Information it deemed unimportant today may be exactly what a future query needs tomorrow.

CortexDB takes a fundamentally different approach: every piece of information is appended to an immutable event log and never overwritten. A lightweight LLM extracts entities and relationships asynchronously, but the original data is always preserved. If the extraction misses something, the raw event is still there for any future query or reprocessing. From this event stream, CortexDB automatically builds a temporal knowledge graph (entities, relationships, causal chains, and provenance) and uses hybrid retrieval combining vector search, full-text matching, graph traversal, and adaptive ranking to assemble the exact context an AI agent needs at query time.

The results are not incremental. In controlled benchmarks using identical language models, identical embeddings, and identical test data across five production-scale scenarios, CortexDB opened a large gap over rewrite-based memory systems. That gap is structural, not incidental, because you cannot retrieve information you've already destroyed. The cost difference is equally dramatic: CortexDB's write path uses a lightweight extraction model, while rewriting systems burn expensive LLM inference to merge and regenerate entire memory stores on every write operation.
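To make the append-only idea concrete, here is a minimal Python sketch. It is not CortexDB's API: the `Event`, `EventLog`, and `build_index` names are hypothetical, and two toy regex functions stand in for the LLM extractor. It only illustrates why keeping the raw events matters: when extraction improves, the index can be rebuilt from the log, and nothing the first extractor missed has been lost.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """One immutable record in the log (hypothetical shape)."""
    seq: int
    text: str


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, text: str) -> Event:
        event = Event(seq=len(self._events), text=text)
        self._events.append(event)
        return event

    def replay(self) -> tuple[Event, ...]:
        # Return a read-only snapshot so callers cannot mutate the log.
        return tuple(self._events)


Extractor = Callable[[str], set[str]]


def build_index(log: EventLog, extract: Extractor) -> dict[str, list[int]]:
    """Derive an entity -> event sequence numbers index by replaying the log."""
    index: dict[str, list[int]] = {}
    for event in log.replay():
        for entity in extract(event.text):
            index.setdefault(entity, []).append(event.seq)
    return index


def capitalized_words(text: str) -> set[str]:
    # Toy stand-in for the extractor: treats capitalized words as entities.
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))


def all_words(text: str) -> set[str]:
    # A later, more thorough extractor: also catches lowercase terms.
    return {word.lower() for word in re.findall(r"\b\w+\b", text)}


log = EventLog()
log.append("Alice moved the billing service to Postgres")
log.append("the billing service timed out during the migration")

# The first extractor misses "billing" entirely.
first_index = build_index(log, capitalized_words)

# Because the raw events are still in the log, the index can be rebuilt.
second_index = build_index(log, all_words)
```

In a system that stored only the first extractor's output, the second event would have left no trace at all, since it contains no capitalized words. Here the index is a derived, disposable view over the log.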
CortexDB scales the same way Cassandra scales: through consistent hashing, partition-aware data placement, and leaderless replication, where every index, every graph shard, and every vector store is scoped to a partition from day one. Adding capacity means adding a node; the cluster rebalances automatically with zero downtime. A single-node deployment is simply a distributed system with one node, and the same code path runs whether you have one machine or a hundred. This is not a single-node prototype that will be distributed later. Distribution is the architecture itself, because retrofitting distribution onto a monolithic design costs more than building it right from the start.

CortexDB is not a better version of what exists. It is a new layer of infrastructure, the memory layer, built from first principles to scale without limit, unlike any other solution on the market.
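For readers unfamiliar with consistent hashing, the following Python sketch shows the general technique. It is an illustration under assumptions, not CortexDB's or Cassandra's implementation: the `HashRing` class, the node names, the SHA-256 token function, and the choice of 64 virtual nodes are all hypothetical. It demonstrates two properties mentioned above: a one-node ring uses the same lookup path as a larger one, and adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._tokens: list[int] = []      # sorted token positions on the ring
        self._owner: dict[int, str] = {}  # token -> node that owns it

    @staticmethod
    def _hash(key: str) -> int:
        # 64-bit token taken from SHA-256; any stable hash would do here.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node claims several tokens so load spreads more evenly.
        for i in range(self._vnodes):
            token = self._hash(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def node_for(self, key: str) -> str:
        # A key belongs to the first token clockwise from its hash.
        idx = bisect.bisect_right(self._tokens, self._hash(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


keys = [f"partition-{i}" for i in range(1000)]

# A single-node ring is the same structure with one owner.
single = HashRing()
single.add_node("node-a")
single_owners = {single.node_for(k) for k in keys}

# Grow a three-node ring to four nodes and see which keys change owner.
ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; no key is shuffled between the three existing nodes. That is the property that lets a cluster rebalance incrementally when capacity is added.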
Dev.to AI
https://dev.to/prashant_malik_c0d77148e8/long-term-ai-memory-by-creator-of-apache-cassandra-5ap0
