ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
Authors: Prithwish Jana, Kaan Kale, Ahmet Ege Tanriverdi, Cruise Song, Sriram Vishwanath, Vijay Ganesh
Abstract: Translating human-written mathematical theorems and proofs from natural language (NL) into formal languages (FLs) like Lean 4 has long been a significant challenge for AI. Most state-of-the-art methods either focus on theorem-only NL-to-FL auto-formalization or on FL proof synthesis from FL theorems. In practice, auto-formalization of both theorem and proof still requires human intervention, as seen in AlphaProof's silver-medal performance at the 2024 IMO, where problem statements were manually translated before automated proof synthesis. We present ProofBridge, a unified framework for automatically translating entire NL theorems and proofs into Lean 4. At its core is a joint embedding model that aligns NL and FL (NL-FL) theorem+proof pairs in a shared semantic space, enabling cross-modal retrieval of semantically relevant FL examples to guide translation. ProofBridge integrates retrieval-augmented fine-tuning with iterative proof repair, leveraging Lean's type checker and semantic equivalence feedback to ensure both syntactic correctness and semantic fidelity. Experiments show substantial improvements in proof auto-formalization over strong baselines (including GPT-5, Gemini-2.5, Kimina-Prover, DeepSeek-Prover), with our retrieval-augmented approach yielding significant gains in semantic correctness (SC, via proving bi-directional equivalence) and type correctness (TC, via type-checking theorem+proof) across pass@k metrics on miniF2F-Test-PF, a dataset we curated. In particular, ProofBridge improves cross-modal retrieval quality by up to 3.28x Recall@1 over all-MiniLM-L6-v2, and achieves +31.14% SC and +1.64% TC (pass@32) compared to the baseline Kimina-Prover-RL-1.7B.
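The abstract reports results in terms of Recall@1 for cross-modal retrieval and pass@k for proof generation. As a rough illustration of what these metrics measure, the sketch below ranks FL embeddings by cosine similarity to an NL query and computes Recall@1 and a pass@k estimate. Everything here is an assumption made for illustration: the toy two-dimensional vectors, the function names, the use of cosine similarity, and the combinatorial pass@k estimator (the widely used unbiased form) are not taken from the paper, which may define or compute these quantities differently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve_top1(query, candidates):
    """Index of the candidate embedding most similar to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))

def recall_at_1(nl_embs, fl_embs):
    """Fraction of NL queries whose top-ranked FL item is the paired one.

    Assumes nl_embs[i] and fl_embs[i] embed the same theorem+proof.
    """
    hits = sum(retrieve_top1(q, fl_embs) == i for i, q in enumerate(nl_embs))
    return hits / len(nl_embs)

def pass_at_k(n, c, k):
    """Estimate pass@k from n sampled attempts, c of which succeeded."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a success
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical toy embeddings standing in for a shared NL-FL space.
nl_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fl_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]

print(recall_at_1(nl_embs, fl_embs))  # two of the three queries rank their pair first
print(pass_at_k(4, 1, 2))             # 1 success in 4 samples, k = 2
```

In this toy setup a poorly aligned pair (the third one) lowers Recall@1, which is the kind of failure a joint embedding model is trained to reduce. In the paper's setting, a "success" for pass@k would be a generated Lean 4 theorem+proof that passes the TC or SC check.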
Comments: Published as a conference paper at the 14th International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil, April 23-27, 2026
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
ACM classes: I.2.3; I.2.7; F.4; F.3.1; I.2.6
Cite as: arXiv:2510.15681 [cs.LO]
(or arXiv:2510.15681v3 [cs.LO] for this version)
https://doi.org/10.48550/arXiv.2510.15681
arXiv-issued DOI via DataCite
Submission history
From: Prithwish Jana
[v1] Fri, 17 Oct 2025 14:20:50 UTC (1,993 KB)
[v2] Sun, 7 Dec 2025 23:34:50 UTC (2,078 KB)
[v3] Sun, 29 Mar 2026 12:53:42 UTC (2,112 KB)