Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction
Abstract: Large language models lack persistent, structured memory for long-term interaction and context-sensitive retrieval. Expanding context windows does not solve this: recent evidence shows that context length alone degrades reasoning by up to 85%, even with perfect retrieval. We propose a bio-inspired memory framework grounded in complementary learning systems theory, cognitive behavioral therapy's belief hierarchy, dual-process cognition, and fuzzy-trace theory, organized around three principles: (1) Memory has valence, not just content: pre-computed emotional-associative summaries (valence vectors) organized in an emergent belief hierarchy inspired by Beck's cognitive model enable instant orientation before deliberation; (2) Retrieval defaults to System 1 with System 2 escalation: automatic spreading activation and passive priming as the default, with deliberate retrieval only when needed, and graded epistemic states that address hallucination structurally; and (3) Encoding is active, present, and feedback-dependent: a thalamic gateway tags and routes information between stores, while the executive forms gists through curiosity-driven investigation, not passive exposure. Seven functional properties specify what any implementation must satisfy. Over time, the system converges toward System 1 processing, the computational analog of clinical expertise, producing interactions that become cheaper, not more expensive, with experience.
Comments: 14 pages, 1 figure. Accepted at the MemAgents Workshop, ICLR 2026
Subjects:
Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29023 [cs.CL]
(or arXiv:2603.29023v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29023
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Diego Cesar Lerma Torres [view email] [v1] Mon, 30 Mar 2026 21:35:28 UTC (51 KB)
