Reasoning as Energy Minimization over Structured Latent Trajectories
David K. Johansson
Abstract: Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory smoothness terms. Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives, while inference performs gradient descent or Langevin dynamics over $z$ and decodes from $z_T$. We identify a critical failure mode: on CNF logic satisfaction, latent planning reduces accuracy from $\approx 95\%$ to $\approx 56\%$. This degradation arises from a distribution mismatch, where the decoder is trained on encoder outputs $h_x$ but evaluated on planner outputs $z_T$ that drift into unseen latent regions. We analyze this behavior through per-step decoding, latent drift tracking, and gradient decomposition. To address it, we propose dual-path decoder training and latent anchoring. We further introduce a six-part ablation protocol covering component contributions, trajectory length, planner dynamics, initialization, decoder training distribution, and anchor weight. Experiments on three synthetic tasks show that energy decreases monotonically and induces structured latent trajectories on graph and logic tasks, while remaining flat on arithmetic ($r = 0.073$), indicating a negative result. Code is available at this https URL.
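The inference procedure described in the abstract (gradient descent or Langevin dynamics over $z_{1:T}$, then decoding from $z_T$) can be illustrated with a small toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three energy terms are given simple quadratic forms, the compatibility target `2 * h_x` is a hypothetical stand-in for a learned term that pulls the trajectory away from the encoder output, the anchor is modeled as a quadratic penalty on the distance between $z_T$ and $h_x$, and gradients are computed by finite differences where a real implementation would use automatic differentiation on a learned network.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def energy(h_x, z, w_comp=1.0, w_trans=1.0, w_smooth=0.1, w_anchor=0.0):
    """Toy energy E(h_x, z) over a trajectory z (a list of T latent vectors).

    ASSUMPTION: all terms are hand-picked quadratics, not the paper's learned energy.
    """
    T = len(z)
    target = [2.0 * v for v in h_x]  # hypothetical compatibility target
    # Per-step compatibility: every latent step should match the target.
    comp = sum(sq_dist(z_t, target) for z_t in z)
    # Transition consistency: consecutive steps should be close (identity transition assumed).
    trans = sum(sq_dist(z[t + 1], z[t]) for t in range(T - 1))
    # Trajectory smoothness: penalize second differences along the trajectory.
    smooth = sum(
        (a - 2.0 * b + c) ** 2
        for t in range(1, T - 1)
        for a, b, c in zip(z[t + 1], z[t], z[t - 1])
    )
    # Latent anchoring (assumed form): keep the decoded step z_T near h_x.
    anchor = sq_dist(z[-1], h_x)
    return 0.5 * (w_comp * comp + w_trans * trans + w_smooth * smooth + w_anchor * anchor)

def numerical_grad(h_x, z, eps=1e-5, **weights):
    """Central finite-difference gradient of the energy with respect to z."""
    grad = [[0.0] * len(z_t) for z_t in z]
    for t in range(len(z)):
        for i in range(len(z[t])):
            old = z[t][i]
            z[t][i] = old + eps
            e_plus = energy(h_x, z, **weights)
            z[t][i] = old - eps
            e_minus = energy(h_x, z, **weights)
            z[t][i] = old  # restore the original coordinate
            grad[t][i] = (e_plus - e_minus) / (2.0 * eps)
    return grad

def plan(h_x, T=4, steps=50, lr=0.05, temperature=0.0, seed=0, **weights):
    """Refine z by gradient descent (temperature=0) or Langevin dynamics (temperature>0).

    Returns the final trajectory and the energy recorded after every update.
    """
    rng = random.Random(seed)
    z = [list(h_x) for _ in range(T)]  # ASSUMPTION: initialize every step at h_x
    energies = [energy(h_x, z, **weights)]
    noise_std = math.sqrt(2.0 * lr * temperature)
    for _ in range(steps):
        grad = numerical_grad(h_x, z, **weights)
        for t in range(T):
            for i in range(len(h_x)):
                z[t][i] += -lr * grad[t][i] + noise_std * rng.gauss(0.0, 1.0)
        energies.append(energy(h_x, z, **weights))
    return z, energies

def drift(h_x, z):
    """Distance between the decoded latent z_T and the encoder output h_x."""
    return math.sqrt(sq_dist(z[-1], h_x))

h_x = [1.0, -0.5, 0.25]
z_free, e_free = plan(h_x, w_anchor=0.0)   # no anchoring
z_anch, e_anch = plan(h_x, w_anchor=5.0)   # strong anchoring
print("energy, first vs last:", e_free[0], e_free[-1])
print("drift without / with anchor:", drift(h_x, z_free), drift(h_x, z_anch))
```

In this toy setting the energy falls at every gradient step, yet $z_T$ ends up far from $h_x$ unless the anchor weight is raised. That is the same kind of mismatch the abstract attributes to a decoder trained only on $h_x$, though the sketch says nothing about the magnitudes reported in the paper.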
Comments: 7 pages
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.28248 [cs.AI]
(or arXiv:2603.28248v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.28248
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: David K. Johansson
[v1] Mon, 30 Mar 2026 10:20:56 UTC (580 KB)