
WybeCoder: Verified Imperative Code Generation

arXiv cs.SE
by Fabian Gloeckle, Mantas Baksys, Darius Feher, Kunhao Zheng, Amaury Hayat, Sean B. Holden, Gabriel Synnaeve, Peter O'Hearn
April 1, 2026


Abstract: Recent progress in large language models (LLMs) has advanced automatic code generation and formal theorem proving, yet software verification has not seen the same improvement. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a recent framework that combines automatic verification condition generation and SMT solvers with interactive proofs in Lean. To enable systematic evaluation, we translate two benchmarks for functional verification in Lean, Verina and Clever, to equivalent imperative code specifications. On complex algorithms such as Heapsort, we observe consistent performance improvements by scaling our approach, synthesizing dozens of valid invariants and dispatching dozens of subgoals, resulting in hundreds of lines of verified code and overcoming plateaus reported in previous works. Our best system solves 74% of Verina tasks and 62% of Clever tasks at moderate compute budgets, significantly surpassing previous evaluations and paving a path to automated construction of large-scale datasets of verified imperative code.
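To make the notion of an invariant for imperative code concrete, here is a minimal Python sketch of Heapsort whose loop invariants are checked at runtime with assertions. This is an illustration only and not WybeCoder's method: the abstract describes invariants being discharged statically through verification condition generation, SMT solvers, and interactive Lean proofs, whereas runtime assertions only test the inputs that are actually run. The function names (`is_max_heap`, `sift_down`, `heapsort_checked`) are hypothetical and do not come from the paper.

```python
def is_max_heap(a, size):
    """Return True if a[0:size] satisfies the max-heap property."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, size))


def sift_down(a, root, size):
    """Restore the heap property for the subtree rooted at `root` in a[0:size]."""
    while True:
        largest = root
        for child in (2 * root + 1, 2 * root + 2):
            if child < size and a[child] > a[largest]:
                largest = child
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def heapsort_checked(items):
    """Heapsort on a copy of `items`, asserting its loop invariants at runtime."""
    a = list(items)
    n = len(a)

    # Phase 1: build a max-heap bottom-up.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    assert is_max_heap(a, n)

    # Phase 2: repeatedly move the maximum to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
        # Invariant 1: the prefix a[0:end] is still a max-heap.
        assert is_max_heap(a, end)
        # Invariant 2: the suffix a[end:] is sorted in ascending order.
        assert all(a[i] <= a[i + 1] for i in range(end, n - 1))
        # Invariant 3: every prefix element is <= the first suffix element.
        assert all(x <= a[end] for x in a[:end])

    return a


print(heapsort_checked([5, 2, 9, 1, 5, 6]))
```

In a verified setting, each of the three asserted conditions would instead be stated as a loop invariant and proved for all inputs, which is the kind of obligation the abstract refers to as invariants and subgoals.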

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29088 [cs.SE] (or arXiv:2603.29088v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.29088

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mantas Baksys
[v1] Tue, 31 Mar 2026 00:06:44 UTC (757 KB)
