
SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning

arXiv, March 31, 2026. Submitted on 19 Nov 2025 (v1); last revised 30 Mar 2026 (this version, v2).

arXiv:2511.15090v2. Announce type: replace-cross.

Authors: Wenhan Yu, Zhaoxi Zhang, Wang Chen, Guanqiang Qi, Weikang Li, Lei Sha, Deguo Xia, Jizhou Huang


Abstract: Scientific documents contain complex multimodal structures, which makes evidence localization and scientific reasoning in Document Visual Question Answering particularly challenging. However, most existing benchmarks evaluate models only at the page level without explicitly annotating the evidence regions that support the answer, which limits both interpretability and the reliability of evaluation. To address this limitation, we introduce SciEGQA, a scientific document question answering and reasoning dataset with semantic evidence grounding, where supporting evidence is represented as semantically coherent document regions annotated with bounding boxes. SciEGQA consists of two components: a human-annotated fine-grained benchmark containing 1,623 high-quality question-answer pairs, and a large-scale automatically constructed training set with over 30K QA pairs generated through an automated data construction pipeline. Extensive experiments on a wide range of Vision-Language Models (VLMs) show that existing models still struggle with evidence localization and evidence-based question answering in scientific documents. Training on the proposed dataset significantly improves the scientific reasoning capabilities of VLMs. The project page is available at this https URL.
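The abstract says that evidence is annotated as document regions with bounding boxes, but it does not give the record schema or the localization metric. As a rough illustration only, the sketch below assumes axis-aligned boxes in page coordinates and scores a predicted region against an annotated one with intersection-over-union (IoU), a common overlap measure. The names `Box`, `EvidenceQA`, and `iou`, and the choice of IoU itself, are assumptions for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (x0, y0) to (x1, y1); hypothetical coordinate layout."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp to zero so that inverted (empty) boxes contribute no area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


@dataclass
class EvidenceQA:
    """Hypothetical record: a QA pair plus the regions that support the answer."""
    question: str
    answer: str
    page: int
    evidence: List[Box] = field(default_factory=list)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter = Box(
        max(a.x0, b.x0), max(a.y0, b.y0),
        min(a.x1, b.x1), min(a.y1, b.y1),
    ).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Made-up example values, for illustration only.
record = EvidenceQA(
    question="Which method has the lowest error in the table?",
    answer="Method B",
    page=3,
    evidence=[Box(0, 0, 10, 10)],
)
predicted = Box(5, 5, 15, 15)
overlap = iou(record.evidence[0], predicted)
```

Under this assumption, a prediction could be counted as correctly localized when its overlap with an annotated region exceeds a chosen threshold; the threshold, like the metric, would need to be taken from the paper itself.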

Comments: 8 pages, 4 figures, 3 tables

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2511.15090 [cs.DB]

(or arXiv:2511.15090v2 [cs.DB] for this version)

https://doi.org/10.48550/arXiv.2511.15090

arXiv-issued DOI via DataCite

Submission history

From: Wenhan Yu
[v1] Wed, 19 Nov 2025 04:03:54 UTC (2,212 KB)
[v2] Mon, 30 Mar 2026 06:53:39 UTC (22,175 KB)

Original source

arXiv

https://arxiv.org/abs/2511.15090
