Gemini provides automated feedback for theoretical computer scientists at STOC 2026
Algorithms & Theory
The pursuit of truth in theoretical computer science and mathematics relies on the highest standards of proof, rigor, and clarity. While peer review is the crucial final check, the process of drafting and refining complex theoretical work often takes months, with simple errors, inconsistent variables, or subtle logical gaps frequently slowing down the entire research pipeline. But could a highly specialized AI tool act as a fast, rigorous collaborator, helping authors pre-vet their work before it ever reaches human reviewers?
To test this potential, we created an experimental program for the Annual ACM Symposium on Theory of Computing (STOC 2026) — one of the most prestigious venues in theoretical computer science. This program offered authors automated, pre-submission feedback generated by a specialized Gemini AI tool. Our objective was to provide constructive suggestions and identify potential technical issues within 24 hours of submission, helping authors polish their final drafts before the submission deadline.
The responses were very positive: the tool successfully identified a variety of issues, including calculation and logic errors. Here we report how we developed the tool and the results of its use.
Optimized for mathematical rigor
The feedback tool leveraged inference scaling methods in an advanced version of Gemini 2.5 Deep Think. Rather than pursuing a single, linear chain of thought, this setup enables the model to explore and combine multiple candidate solutions simultaneously before giving a final answer. By combining different reasoning and evaluation traces, the method reduces hallucinations and focuses on the most salient issues.
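As a rough illustration of this explore-and-aggregate pattern, consider the minimal Python sketch below. Everything in it is an assumption for illustration: `generate_trace` stands in for a call to the underlying reasoning model, and the mock findings and probabilities are invented, so none of it reflects the actual Deep Think implementation. The idea it demonstrates is that findings corroborated across independent traces survive aggregation, while one-off hallucinations rarely do.

```python
import random
from collections import Counter

# Mock findings, each with the probability that any single independent
# reasoning trace reports it. These values are invented for this sketch:
# robust issues tend to recur across traces, hallucinations do not.
POSSIBLE_FINDINGS = {
    "Lemma 4.2: union bound appears off by a factor of 2": 0.9,
    "Eq. (7): inequality applied in the wrong direction": 0.8,
    "Spurious issue a single trace might invent": 0.1,  # a 'hallucination'
}

def generate_trace(paper_text: str, rng: random.Random) -> list[str]:
    """One independent reasoning pass: returns the issues it 'found'.
    (paper_text is unused in this mock; a real system would read it.)"""
    return [f for f, p in POSSIBLE_FINDINGS.items() if rng.random() < p]

def review(paper_text: str, n_traces: int = 8, quorum: int = 4) -> list[str]:
    rng = random.Random(0)
    # Explore: sample several independent reasoning traces.
    traces = [generate_trace(paper_text, rng) for _ in range(n_traces)]
    # Combine: keep only findings that a quorum of traces agree on,
    # filtering out issues that appear in just one or two traces.
    counts = Counter(finding for trace in traces for finding in trace)
    return [finding for finding, c in counts.items() if c >= quorum]

print(review("Theorem 3.1 states that ..."))
```

With these mock probabilities, the two recurring findings typically pass the quorum while the spurious one is filtered out, mirroring how cross-trace agreement can suppress hallucinations.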
Feedback format
Authors received structured feedback divided into key sections: a summary of the paper's contributions, a list of potential mistakes and improvements (often analyzing specific lemmas or theorems), and a list of minor corrections and typos. See some feedback examples.
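As a concrete illustration of this three-part structure, the feedback could be represented with a simple schema like the Python sketch below. The field names are assumptions chosen for readability, not the tool's actual output format.

```python
from dataclasses import dataclass, field

# Illustrative schema for the three feedback sections described above.
# Field names are assumptions for this sketch, not the tool's actual format.
@dataclass
class ReviewFeedback:
    summary: str  # summary of the paper's contributions
    potential_mistakes: list[str] = field(default_factory=list)  # often per-lemma/theorem analyses
    minor_corrections: list[str] = field(default_factory=list)   # typos and small fixes

feedback = ReviewFeedback(
    summary="Gives a faster approximation algorithm for problem X.",
    potential_mistakes=["Lemma 4.2: the union bound seems to need an extra factor of 2."],
    minor_corrections=["p. 3: 'recieve' should be 'receive'."],
)
print(feedback.potential_mistakes)
```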
Impact and technical depth
The tool successfully identified a wide range of issues, from inconsistent variable names to complex problems like calculation errors, incorrect application of inequalities, and logical gaps in proofs. As one author noted, the tool found "a critical bug... that made our proof entirely incorrect," further adding that it was an "embarrassingly simple bug that evaded us for months."
Over 120 participants responded to our post-experiment survey and gave their consent. The responses were very positive, with individuals citing the model's success at finding critical errors and its insightful commentary. In summary:
- 80% of papers submitted by the time our experiment ended had opted in to our AI review
- 97% found the feedback helpful
- 97% would use this tool again for future submissions
- 81% found the model improved clarity or readability of the paper
The user experience
Beyond technical accuracy, authors valued the speed and neutrality of the AI review. Some participants noted receiving feedback in just two days; others praised the "neutral tone and rigor" of the output, finding it a useful complement to human readers.
Interpreting the output
Because participants were experts in their respective fields, they could readily distinguish helpful insights from occasional "hallucinations". While the model sometimes struggled, particularly with parsing complex notation or interpreting figures, authors weren't dismissive of its output. Rather, they filtered out the noise, extracted the important and correct parts, and used the feedback as a starting point for verification. This outcome demonstrates the potential for AI to serve as a collaborative partner, augmenting the research workflow by helping human experts make informed decisions based on the model's outputs.
Educational impact and future outlook
The research community surveyed in this experiment saw significant potential for this tool in training the next generation. 75% of surveyed authors believed the tool has educational value for students by offering immediate feedback on mathematical rigor and presentation clarity.
This pilot demonstrated the potential for specialized AI tools to serve as collaborative partners in fundamental areas, establishing a target for potential future research initiatives. Our overall goal is not to replace the critical peer review process, but rather to augment and enhance it. Reflecting this, 88% of participants expressed strong interest in having continuous access to such a tool throughout their entire research process.
Acknowledgements
Vincent Cohen-Addad, Rajesh Jayaram, Jon Schneider, and David Woodruff co-led this project, with key contributions by Lalit Jain, Jieming Mao, and Vahab Mirrokni. We also thank the STOC 2026 PC chair Artur Czumaj and the many other authors who participated in this experiment and provided their valuable feedback, helpful suggestions, and discussions, including Mohammad Taghi Hajiaghayi, Ravi Kumar, Yossi Matias, and Sergei Vassilvitskii. Finally, this work builds on the efforts of the Deep Think team: Garrett Bingham, Irene Cai, Heng-Tze Cheng, Yong Cheng, Kristen Chiafullo, Vincent Cohen-Addad, Paul Covington, Golnaz Ghiasi, Chenjie Gu, Huan Gui, Ana Hosseini, Dawsen Hwang, Lalit Jain, Vihan Jain, Ragha Kotikalapudi, Chenkai Kuang, Maciej Kula, Nate Kushman, Jane Labanowski, Quoc Le, Jonathan Lee, Zhaoqi Leng, Steve Li, YaGuang Li, Hanzhao (Maggie) Lin, Evan Liu, Yuan Liu, Thang Luong, Jieming Mao, Vahab Mirrokni, Pol Moreno, Nigamaa Nayakanti, Aroonalok Pyne, Shubha Raghvendra, Sashank Reddi, Nikunj Saunshi, Siamak Shakeri, Archit Sharma, Xinying Song, Qijun Tan, Yi Tay, Trieu Trinh, Theophane Weber, Winnie Xu, Zicheng Xu, Shunyu Yao, Lijun Yu, Hao Zhou, Honglei Zhuang, and Song Zuo.
Google Research Blog
https://research.google/blog/gemini-provides-automated-feedback-for-theoretical-computer-scientists-at-stoc-2026/
