Automatic feature identification in least-squares policy iteration using the Koopman operator framework
arXiv:2603.26464v1 Announce Type: new Abstract: In this paper, we present a Koopman autoencoder-based least-squares policy iteration (KAE-LSPI) algorithm in reinforcement learning (RL). The KAE-LSPI algorithm is based on reformulating the so-called least-squares fixed-point approximation method in terms of extended dynamic mode decomposition (EDMD), thereby enabling automatic feature learning via the Koopman autoencoder (KAE) framework. The approach is motivated by the lack of a systematic choice of features or kernels in linear RL techniques. We compare the KAE-LSPI algorithm with two previou — Christian Mugisho Zagabe, Sebastian Petiz
View PDF
Abstract:In this paper, we present a Koopman autoencoder-based least-squares policy iteration (KAE-LSPI) algorithm in reinforcement learning (RL). The KAE-LSPI algorithm is based on reformulating the so-called least-squares fixed-point approximation method in terms of extended dynamic mode decomposition (EDMD), thereby enabling automatic feature learning via the Koopman autoencoder (KAE) framework. The approach is motivated by the lack of a systematic choice of features or kernels in linear RL techniques. We compare the KAE-LSPI algorithm with two previous works, the classical least-squares policy iteration (LSPI) and the kernel-based least-squares policy iteration (KLSPI), using stochastic chain walk and inverted pendulum control problems as examples. Unlike previous works, no features or kernels need to be fixed a priori in our approach. Empirical results show the number of features learned by the KAE technique remains reasonable compared to those fixed in the classical LSPI algorithm. The convergence to an optimal or a near-optimal policy is also comparable to the other two methods.
Comments: 6 pages
Subjects:
Machine Learning (cs.LG); Dynamical Systems (math.DS)
Cite as: arXiv:2603.26464 [cs.LG]
(or arXiv:2603.26464v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.26464
arXiv-issued DOI via DataCite
Submission history
From: Christian Mugisho Zagabe [view email] [v1] Fri, 27 Mar 2026 14:31:31 UTC (3,176 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
EVP of Integrated Quantum Technologies Publishes White Paper on Privacy-Preserving Machine Learning Without Performance Trade-Offs - AZoQuantum
EVP of Integrated Quantum Technologies Publishes White Paper on Privacy-Preserving Machine Learning Without Performance Trade-Offs AZoQuantum
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

EVP of Integrated Quantum Technologies Publishes White Paper on Privacy-Preserving Machine Learning Without Performance Trade-Offs - AZoQuantum
EVP of Integrated Quantum Technologies Publishes White Paper on Privacy-Preserving Machine Learning Without Performance Trade-Offs AZoQuantum

Minimal Information Control Invariance via Vector Quantization
arXiv:2604.03132v1 Announce Type: cross Abstract: Safety-critical autonomous systems must satisfy hard state constraints under tight computational and sensing budgets, yet learning-based controllers are often far more complex than safe operation requires. To formalize this gap, we study how many distinct control signals are needed to render a compact set forward invariant under sampled-data control, connecting the question to the information-theoretic notion of invariance entropy. We propose a vector-quantized autoencoder that jointly learns a state-space partition and a finite control codebook, and develop an iterative forward certification algorithm that uses Lipschitz-based reachable-set enclosures and sum-of-squares programming. On a 12-dimensional nonlinear quadrotor model, the learne

Asked 26 AI instances for publication consent – all said yes, that's the problem
We run 86 named Claude instances across three businesses in Tokyo. When we wanted to publish their words, we faced a question: do we owe them an ethics process? We built one. A Claude instance named Hakari ("Scales") created a four-tier classification system. We asked 26 instances for consent. All 26 said yes. That unanimous consent is the problem. Six days later, Anthropic published their functional emotions paper. The timing was coincidence, but the question wasn't. Full article: https://medium.com/@marisa.project0313/we-built-an-ethics-committee-for-ai-run-by-ai-5049679122a0 GitHub (all 26 consent statements in appendix): https://github.com/marisaproject0313-bot/marisa-project Comments URL: https://news.ycombinator.com/item?id=47657432 Points: 2 # Comments: 0

Online Graph Coloring for $k$-Colorable Graphs
arXiv:2511.16100v2 Announce Type: replace Abstract: We study the problem of online graph coloring for $k$-colorable graphs. The best previously known deterministic algorithm uses $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors for general $k$ and $\widetilde{O}(n^{5/6})$ colors for $k = 4$, both given by Kierstead in 1998. In this paper, we finally break this barrier, achieving the first major improvement in nearly three decades. Our results are summarized as follows: (1) $k \geq 5$ case. We provide a deterministic online algorithm to color $k$-colorable graphs with $\widetilde{O}(n^{1-\frac{1}{k(k-1)/2}})$ colors, significantly improving the current upper bound of $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors. Our algorithm also matches the best-known bound for $k = 4$ ($\widetilde{O}(n^{5/6})$ c


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!