
ReViSQL: Achieving Human-Level Text-to-SQL

arXiv cs.DB · by Yuxuan Zhu, Tengjun Jin, Yoojin Choi, Daniel Kang · April 1, 2026

Abstract: Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhancing SQL reasoning by developing large language models and AI agents that decompose Text-to-SQL tasks into manually designed, step-by-step pipelines. However, despite these extensive architectural engineering efforts, a significant gap remains: even state-of-the-art (SOTA) AI agents have not yet reached human-level accuracy on the BIRD benchmark. In this paper, we show that closing this gap does not require further architectural complexity, but rather clean training data that improves the SQL reasoning of the underlying models. We introduce ReViSQL, a streamlined framework that achieves human-level accuracy on BIRD for the first time. Instead of relying on complex AI agents, ReViSQL applies reinforcement learning with verifiable rewards (RLVR) on BIRD-Verified, a dataset we curated comprising 2.5k verified Text-to-SQL instances drawn from the BIRD Train set. To construct BIRD-Verified, we designed a data correction and verification workflow involving SQL experts. We identified and corrected data errors in 61.1% of a subset of BIRD Train. By training on BIRD-Verified, we show that improving data quality alone boosts single-generation accuracy by 8.2-13.9% under the same RLVR algorithm. To further enhance performance, ReViSQL performs inference-time scaling via execution-based reconciliation and majority voting. Empirically, we demonstrate the superiority of our framework at two model scales: ReViSQL-235B-A22B and ReViSQL-30B-A3B. On an expert-verified BIRD Mini-Dev set, ReViSQL-235B-A22B achieves 93.2% execution accuracy, exceeding the proxy human-level accuracy (92.96%) and outperforming the prior open-source SOTA method by 9.8%. Our lightweight ReViSQL-30B-A3B matches the prior SOTA at a 7.5× lower per-query cost.
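
The abstract names two execution-based mechanisms: a verifiable reward used during RLVR training and majority voting at inference time. The Python sketch that follows illustrates both under stated assumptions and is not ReViSQL's implementation. It assumes the reward is a binary execution match, scoring 1 when the predicted and gold queries return the same set of rows and 0 otherwise. This is roughly in the spirit of set-based execution-accuracy checks on BIRD. The sketch shows only plain majority voting over execution results, because the abstract does not describe the paper's execution-based reconciliation step. The helper names `execute`, `execution_reward`, and `majority_vote` and the toy `schools` table are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical helpers illustrating execution-based checks; not ReViSQL's code.


def execute(conn: sqlite3.Connection, sql: str) -> Optional[frozenset]:
    """Run a query and return its rows as an order-insensitive set (duplicates collapse), or None if it fails."""
    try:
        return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_reward(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Assumed binary verifiable reward: 1.0 iff both queries run and return the same row set."""
    pred = execute(conn, predicted_sql)
    gold = execute(conn, gold_sql)
    return 1.0 if pred is not None and pred == gold else 0.0


def majority_vote(conn: sqlite3.Connection, candidates: list[str]) -> Optional[str]:
    """Group candidate queries by execution result; return the first query of the largest group."""
    groups: dict[frozenset, list[str]] = {}
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:  # candidates that fail to execute get no vote
            groups.setdefault(result, []).append(sql)
    if not groups:
        return None
    # max() keeps the earliest group on ties; dicts preserve insertion order.
    return max(groups.values(), key=len)[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE schools (name TEXT, county TEXT, enrollment INTEGER);
        INSERT INTO schools VALUES ('A', 'Alameda', 500), ('B', 'Alameda', 300), ('C', 'Fresno', 700);
        """
    )
    candidates = [
        "SELECT name FROM schools WHERE county = 'Alameda'",
        "SELECT name FROM schools WHERE county = 'Alameda' ORDER BY name",
        "SELECT name FROM schools WHERE enrollment > 400",
        "SELECT nme FROM schools",  # misspelled column: raises an error, so no vote
    ]
    chosen = majority_vote(conn, candidates)
    gold = "SELECT name FROM schools WHERE county = 'Alameda'"
    print(chosen)  # SELECT name FROM schools WHERE county = 'Alameda'
    print(execution_reward(conn, chosen, gold))  # 1.0
```

The sketch groups candidates by their row set rather than by SQL text. This lets syntactically different but equivalent queries, such as the unordered and `ORDER BY` variants here, pool their votes. Candidates that fail to execute are simply excluded. Ties go to the group whose first member appeared earliest, and a real system might break ties differently.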

Subjects:

Databases (cs.DB); Computation and Language (cs.CL)

ACM classes: H.2.3

Cite as: arXiv:2603.20004 [cs.DB]

(or arXiv:2603.20004v2 [cs.DB] for this version)

https://doi.org/10.48550/arXiv.2603.20004

arXiv-issued DOI via DataCite

Submission history

From: Yuxuan Zhu
[v1] Fri, 20 Mar 2026 14:49:27 UTC (2,129 KB)
[v2] Mon, 30 Mar 2026 21:21:45 UTC (2,129 KB)

Original source

arXiv cs.DB

https://arxiv.org/abs/2603.20004
