Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning

arXiv, March 30, 2026. Submitted on 18 Mar 2026 (v1), last revised 26 Mar 2026 (this version, v2).

arXiv:2603.17233v2 (Announce Type: replace)

Authors: Zhiyu Ni, Zheng Liang, Liangcheng Song, Chenrui Cao, Xian Zhang, Alberto Sangiovanni-Vincentelli, Pierluigi Nuzzo


Abstract: Auto-formalization (AF) translates natural-language reasoning problems into solver-executable programs, enabling symbolic solvers to perform sound logical deduction. In practice, however, AF pipelines are currently brittle: programs may fail to execute, or execute but encode incorrect semantics. While prior work largely mitigates syntactic failures via repairs based on solver feedback, reducing semantic failures remains a major bottleneck. We propose Draft-and-Prune (D&P), an inference-time framework that improves AF-based logical reasoning via diversity and verification. D&P first drafts multiple natural-language plans and conditions program generation on them. It then prunes executable but contradictory or ambiguous formalizations, and aggregates predictions from the surviving paths via majority voting. Across four representative benchmarks (AR-LSAT, ProofWriter, PrOntoQA, LogicalDeduction), D&P substantially strengthens AF-based reasoning without extra supervision. On AR-LSAT, in the AF-only setting, D&P achieves 78.43% accuracy with GPT-4 and 78.00% accuracy with GPT-4o, significantly outperforming the strongest AF baselines, MAD-LOGIC and CLOVER. D&P also attains near-ceiling performance on the other benchmarks, including 100% on PrOntoQA and LogicalDeduction.
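The draft, prune, and vote steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `PathResult` record, the `survives` rule (keep a path only if its program executed and the solver accepted exactly one answer option), and the function names are all assumptions made for this example. The LLM drafting step and the solver calls are replaced by precomputed results.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PathResult:
    """Hypothetical record of one plan -> program -> solver run."""
    executed: bool                       # did the generated program run at all?
    accepted: frozenset = frozenset()    # answer options the solver accepted


def survives(path: PathResult) -> bool:
    # Assumed pruning rule: drop paths that failed to execute, that accept
    # no option (contradictory), or that accept several options (ambiguous).
    return path.executed and len(path.accepted) == 1


def draft_and_prune(paths: Sequence[PathResult]) -> Optional[str]:
    # Majority vote over the single answer of each surviving path.
    votes = Counter(next(iter(p.accepted)) for p in paths if survives(p))
    if not votes:
        return None  # every path was pruned
    # Ties resolve to the answer that was seen first.
    return votes.most_common(1)[0][0]


# Simulated outcomes for six drafted plans on one multiple-choice question.
example_paths = [
    PathResult(True, frozenset({"B"})),
    PathResult(True, frozenset({"B"})),
    PathResult(True, frozenset({"A", "B"})),  # ambiguous: pruned
    PathResult(True, frozenset()),            # contradictory: pruned
    PathResult(False),                        # failed to execute: pruned
    PathResult(True, frozenset({"C"})),
]

print(draft_and_prune(example_paths))  # prints B
```

In this sketch three of the six paths are pruned before voting, so the ambiguous path that also accepted "A" never contributes a vote. Returning `None` when no path survives leaves the fallback policy to the caller.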

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.17233 [cs.AI]

(or arXiv:2603.17233v2 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.17233

arXiv-issued DOI via DataCite

Submission history

From: Zhiyu Ni
[v1] Wed, 18 Mar 2026 00:35:14 UTC (361 KB)
[v2] Thu, 26 Mar 2026 23:54:42 UTC (357 KB)

Original source: arXiv, https://arxiv.org/abs/2603.17233



