AI News Hub by Eigenvector

Execution-Verified Reinforcement Learning for Optimization Modeling

ArXiv CS.AI · Submitted on 1 Apr 2026 · April 2, 2026 · 2 min read

🧒 Explain Like I'm 5

Hey there, little explorer! 🚀

Imagine you have a super-smart robot friend who helps you solve puzzles, like how to share your cookies fairly or build the tallest tower.

This robot friend, called EVOM, is learning a new trick! Instead of just guessing, it tries to solve the puzzle. Then, it shows its answer to a special "puzzle checker" machine.

If the answer is good, the robot gets a happy star! If not, it learns from its mistake and tries again. It keeps trying and learning until it finds the best way to solve the puzzle!

This helps the robot become super good at helping grown-ups with tricky problems, like making sure all the toys fit in the box! 🎉


Abstract: Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver API. Inspired by reinforcement learning with verifiable rewards, we propose Execution-Verified Optimization Modeling (EVOM), an execution-verified learning framework that treats a mathematical programming solver as a deterministic, interactive verifier. Given a natural-language problem and a target solver, EVOM generates solver-specific code, executes it in a sandboxed harness, and converts execution outcomes into scalar rewards, optimized with GRPO and DAPO in a closed-loop generate-execute-feedback-update process. This outcome-only formulation removes the need for process-level supervision, and enables cross-solver generalization by switching the verification environment rather than reconstructing solver-specific datasets. Experiments on NL4OPT, MAMO, IndustryOR, and OptiBench across Gurobi, OR-Tools, and COPT show that EVOM matches or outperforms process-supervised SFT, supports zero-shot solver transfer, and achieves effective low-cost solver adaptation by continuing training under the target solver backend.
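To make the generate-execute-feedback idea concrete, here is a minimal Python sketch of how an execution outcome might be turned into a scalar reward and then into group-relative advantages of the kind GRPO uses. It is not the authors' implementation. The function names, the reward levels (0.0, 0.1, 1.0), the `OBJECTIVE=` output convention, and the tolerance are all hypothetical choices made for illustration, and a plain subprocess with a timeout stands in for a real sandbox.

```python
import math
import os
import subprocess
import sys
import tempfile
from statistics import mean, pstdev


def execute_candidate(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run candidate code in a separate Python process.

    Assumption: a subprocess with a timeout is only a stand-in for a
    sandboxed harness; it provides no real isolation.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout
    except subprocess.TimeoutExpired:
        return False, ""
    finally:
        os.unlink(path)


def execution_reward(code: str, reference_objective: float,
                     rel_tol: float = 1e-6) -> float:
    """Map an execution outcome to a scalar reward (hypothetical levels).

    0.0: crash or timeout; 0.1: ran, but objective missing or wrong;
    1.0: reported objective matches the reference value.
    """
    ok, out = execute_candidate(code)
    if not ok:
        return 0.0
    value = None
    # Hypothetical convention: the program prints "OBJECTIVE=<float>".
    for line in out.splitlines():
        if line.startswith("OBJECTIVE="):
            try:
                value = float(line.split("=", 1)[1])
            except ValueError:
                value = None
    if value is not None and math.isclose(
        value, reference_objective, rel_tol=rel_tol, abs_tol=1e-9
    ):
        return 1.0
    return 0.1


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three stand-in "generated programs" for one problem whose reference
    # objective is 42.0; real candidates would call a solver API instead.
    candidates = [
        'print("OBJECTIVE=42.0")',
        'print("OBJECTIVE=7")',
        "raise SystemExit(1)",
    ]
    rewards = [execution_reward(c, 42.0) for c in candidates]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch, switching the target solver would only change what the candidate programs import and what the harness has installed; the reward function itself never inspects intermediate modeling steps, which is the sense in which the formulation is outcome-only.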

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2604.00442 [cs.AI] (or arXiv:2604.00442v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2604.00442

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Runda Guan
[v1] Wed, 1 Apr 2026 03:39:11 UTC (668 KB)
