
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

arXiv [Submitted on 26 Mar 2026]

Dingjie Song, Tianlong Xu, Yi-Fan Zhang


Abstract: Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inherent in authentic handwritten scratchwork. Current multimodal large language models (MLLMs) excel at visual reasoning but typically adopt an "examinee perspective", prioritizing generating correct answers rather than diagnosing student errors. To bridge these gaps, we introduce ScratchMath, a novel benchmark specifically designed for explaining and classifying errors in authentic handwritten mathematics scratchwork. Our dataset comprises 1,720 mathematics samples from Chinese primary and middle school students, supporting two key tasks: Error Cause Explanation (ECE) and Error Cause Classification (ECC), with seven defined error types. The dataset is meticulously annotated through rigorous human-machine collaborative approaches involving multiple stages of expert labeling, review, and verification. We systematically evaluate 16 leading MLLMs on ScratchMath, revealing significant performance gaps relative to human experts, especially in visual recognition and logical reasoning. Proprietary models notably outperform open-source models, with large reasoning models showing strong potential for error explanation. All evaluation data and frameworks are publicly available to facilitate further research.
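To make the Error Cause Classification (ECC) task concrete, here is a minimal sketch of how predictions over seven error types might be scored. It is not the authors' evaluation framework. The abstract does not name the seven error types or state the metric, so the labels `type_1` to `type_7`, the helper `ecc_accuracy`, and the choice of plain accuracy are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical placeholder labels: the abstract says there are seven error
# types but does not name them, so these identifiers are illustrative only.
ERROR_TYPES = [f"type_{i}" for i in range(1, 8)]


def ecc_accuracy(gold, predicted):
    """Return (overall accuracy, per-type accuracy) for ECC predictions.

    Accuracy is an assumed metric; the benchmark may score ECC differently.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no samples to score")

    # Count how often each gold label occurs and how often it is matched.
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)

    overall = sum(correct.values()) / len(gold)
    # Only report types that actually appear in the gold labels.
    per_type = {t: correct[t] / totals[t] for t in ERROR_TYPES if totals[t]}
    return overall, per_type


if __name__ == "__main__":
    gold = ["type_1", "type_2", "type_2", "type_7"]
    pred = ["type_1", "type_2", "type_3", "type_7"]
    overall, per_type = ecc_accuracy(gold, pred)
    print(overall)   # 0.75
    print(per_type)  # {'type_1': 1.0, 'type_2': 0.5, 'type_7': 1.0}
```

Reporting per-type accuracy alongside the overall figure shows which error categories a model confuses, which matters when the stated goal is diagnosing student errors rather than producing correct answers.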

Comments: Accepted by the 27th International Conference on Artificial Intelligence in Education (AIED'26)

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

ACM classes: I.2.7; K.3.1

Cite as: arXiv:2603.24961 [cs.AI]

(or arXiv:2603.24961v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.24961

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Dingjie Song
[v1] Thu, 26 Mar 2026 02:57:20 UTC (3,295 KB)
