
†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems

arXiv, March 31, 2026

arXiv:2601.06853v2. Announce Type: replace-cross. Authors: Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar


Abstract: Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, including in low-resource languages, yet its behavior under irrelevant context remains underexplored. To systematically study this challenge, we introduce DISTRACTMATH-BN, a Bangla benchmark that augments MGSM and MSVAMP with semantically coherent but computationally irrelevant information. Evaluating seven models ranging from 3B to 12B parameters, we observe substantial performance degradation under distractors: standard models drop by up to 41 points, while reasoning-specialized models decline by 14 to 20 points despite consuming five times more tokens. We propose †DAGGER, which reformulates mathematical problem solving as executable computational graph generation with explicit modeling of distractor nodes. Fine-tuning Gemma-3 models using supervised fine-tuning followed by Group Relative Policy Optimization achieves comparable weighted accuracy on augmented benchmarks while using 89 percent fewer tokens than reasoning models. Importantly, this robustness emerges without explicit training on distractor-augmented examples. Our results suggest that enforcing structured intermediate representations improves robustness and inference efficiency in mathematical reasoning compared to free-form approaches, particularly in noisy, low-resource settings.
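To make the idea of an executable computational graph with explicit distractor nodes more concrete, here is a minimal sketch. It is not the paper's format or implementation: the `Node` schema, the `distractor` flag, the `evaluate` helper, and the toy word problem are all hypothetical and chosen only for illustration. The sketch assumes binary arithmetic operations and an acyclic graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical node schema; field names are illustrative, not the paper's.
@dataclass
class Node:
    name: str
    op: str                          # "const", "add", "sub", "mul", or "div"
    value: Optional[float] = None    # set only for "const" nodes
    inputs: List[str] = field(default_factory=list)
    distractor: bool = False         # True if the quantity is irrelevant to the answer


OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def evaluate(nodes: List[Node], target: str) -> float:
    """Evaluate `target` by walking only the nodes it depends on.

    Distractor nodes are recorded in the graph but must never be reached;
    reaching one raises an error instead of silently using the value.
    """
    graph = {n.name: n for n in nodes}
    cache: Dict[str, float] = {}

    def visit(name: str) -> float:
        if name in cache:
            return cache[name]
        node = graph[name]
        if node.distractor:
            raise ValueError(f"distractor node {name!r} feeds the answer")
        if node.op == "const":
            if node.value is None:
                raise ValueError(f"const node {name!r} has no value")
            result = float(node.value)
        else:
            left, right = (visit(i) for i in node.inputs)
            result = OPS[node.op](left, right)
        cache[name] = result
        return result

    return visit(target)


# Toy problem (invented for this sketch): "Rina has 5 mangoes and buys 3 more.
# Her brother is 12 years old. How many mangoes does Rina have?"
problem = [
    Node("mangoes_start", "const", 5.0),
    Node("mangoes_bought", "const", 3.0),
    Node("brother_age", "const", 12.0, distractor=True),  # irrelevant quantity
    Node("total", "add", inputs=["mangoes_start", "mangoes_bought"]),
]

print(evaluate(problem, "total"))  # 8.0
```

In this sketch the answer comes from executing the graph rather than from free-form text, and the irrelevant quantity is labeled explicitly instead of being left for the solver to ignore implicitly. Whether the paper's graphs are represented or executed this way is an assumption.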

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:2601.06853 [cs.CL]

(or arXiv:2601.06853v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2601.06853

arXiv-issued DOI via DataCite

Submission history

From: Zabir Al Nazi

[v1] Sun, 11 Jan 2026 10:51:03 UTC (1,074 KB)

[v2] Sat, 28 Mar 2026 07:32:22 UTC (4,318 KB)
