Execution-Verified Reinforcement Learning for Optimization Modeling
Hey there, little explorer! 🚀
Imagine you have a super-smart robot friend who helps you solve puzzles, like how to share your cookies fairly or build the tallest tower.
This robot friend, called EVOM, is learning a new trick! Instead of just guessing, it tries to solve the puzzle. Then, it shows its answer to a special "puzzle checker" machine.
If the answer is good, the robot gets a happy star! If not, it learns from its mistake and tries again. It keeps trying and learning until it finds the best way to solve the puzzle!
This helps the robot become super good at helping grown-ups with tricky problems, like making sure all the toys fit in the box! 🎉
Abstract: Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver API. Inspired by reinforcement learning with verifiable rewards, we propose Execution-Verified Optimization Modeling (EVOM), an execution-verified learning framework that treats a mathematical programming solver as a deterministic, interactive verifier. Given a natural-language problem and a target solver, EVOM generates solver-specific code, executes it in a sandboxed harness, and converts execution outcomes into scalar rewards, optimized with GRPO and DAPO in a closed-loop generate-execute-feedback-update process. This outcome-only formulation removes the need for process-level supervision, and enables cross-solver generalization by switching the verification environment rather than reconstructing solver-specific datasets. Experiments on NL4OPT, MAMO, IndustryOR, and OptiBench across Gurobi, OR-Tools, and COPT show that EVOM matches or outperforms process-supervised SFT, supports zero-shot solver transfer, and achieves effective low-cost solver adaptation by continuing training under the target solver backend.
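The abstract describes a closed-loop generate-execute-feedback-update process in which solver execution outcomes are converted into scalar rewards. The sketch below is a minimal illustration of that idea, not the paper's implementation: the "sandbox" is a bare subprocess with a timeout, the reward values (1.0 for a matching optimum, 0.1 for code that runs but gives a wrong or unparsable answer, 0.0 for crashes or timeouts), and the convention that generated code prints its objective value on the last line of stdout are all assumptions made for this example. The advantage function is the standard group-normalized form associated with GRPO, not anything EVOM-specific.

```python
# Minimal sketch of an execution-verified reward, in the spirit of EVOM.
# The harness interface, reward scale, and output convention are assumptions
# for illustration only.
import math
import os
import subprocess
import sys
import tempfile


def execution_reward(generated_code: str, reference_objective: float,
                     timeout_s: float = 30.0, rel_tol: float = 1e-4) -> float:
    """Run LLM-generated solver code in a subprocess and score the outcome.

    By the convention of this sketch, the generated program prints its
    optimal objective value as the last line of stdout.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 0.0                      # no reward: solver run timed out
    finally:
        os.unlink(path)                 # clean up the temporary script
    if proc.returncode != 0:
        return 0.0                      # no reward: code crashed / solver error
    try:
        objective = float(proc.stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return 0.1                      # partial reward: ran, but output unparsable
    if math.isclose(objective, reference_objective, rel_tol=rel_tol):
        return 1.0                      # full reward: objective matches reference optimum
    return 0.1                          # partial reward: executable but wrong optimum


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-normalized advantages in the GRPO style: for a group of sampled
    completions of the same problem, subtract the group mean reward and divide
    by the group standard deviation."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]
```

Under this framing, switching the solver backend (say, from Gurobi to OR-Tools or COPT) would change only which code the model is prompted to generate and which libraries are available in the execution environment; the outcome-only reward logic itself stays the same, which is consistent with the cross-solver transfer the abstract claims.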
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2604.00442 [cs.AI]
(or arXiv:2604.00442v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.00442
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Runda Guan [v1] Wed, 1 Apr 2026 03:39:11 UTC (668 KB)