Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessGeopolitics, AI, and Cybersecurity: Insights From RSAC 2026Dark ReadingThis International Fact-Checking Day, use these 5 tips to spot AI-generated contentFast Company TechFilevine Emphasizes Ethical AI and Autonomous Systems in Legal Tech Strategy - TipRanksGNews AI ethicsPriced Out by AI: The Memory Chip Crisis Hitting Every ConsumerHacker News AI TopShow HN: AgentDog – Open-source dashboard for monitoring local AI agentsHacker News AI TopAI Enforcement Accelerates as Federal Policy Stalls and States Step In - Morgan LewisGNews AI USAGemma 4 and Qwen3.5 on shared benchmarksReddit r/LocalLLaMA[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on BlackwellReddit r/MachineLearningThe energy and environmental impact of AI and how it undermines democracy - greenpeace.orgGNews AI energyShow HN: A TUI for checking and comparing cloud and AI pricingHacker News AI TopAttorney General Pam Bondi pushed outAxios TechShow HN: Screenbox – Self-hosted virtual desktops for AI agentsHacker News AI TopBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessGeopolitics, AI, and Cybersecurity: Insights From RSAC 2026Dark ReadingThis International Fact-Checking Day, use these 5 tips to spot AI-generated contentFast Company TechFilevine Emphasizes Ethical AI and Autonomous Systems in Legal Tech Strategy - TipRanksGNews AI ethicsPriced Out by AI: The Memory Chip Crisis Hitting Every ConsumerHacker News AI TopShow HN: AgentDog – Open-source dashboard for monitoring local AI agentsHacker News AI TopAI Enforcement Accelerates as Federal Policy Stalls and States Step In - Morgan LewisGNews AI USAGemma 4 and Qwen3.5 on shared benchmarksReddit r/LocalLLaMA[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on BlackwellReddit r/MachineLearningThe energy and environmental impact of AI and how it undermines democracy - greenpeace.orgGNews AI energyShow HN: A TUI for checking and comparing cloud and AI pricingHacker News AI TopAttorney General Pam Bondi pushed outAxios TechShow HN: Screenbox – Self-hosted virtual desktops for AI agentsHacker News AI Top
AI NEWS HUBbyEIGENVECTOREigenvector

DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers

arXivMarch 26, 202610 min read0 views
Source Quiz

Directed Acyclic Graphs (DAGs) are widely used to represent structured knowledge in scientific and technical domains. However, datasets for real-world DAGs remain scarce because constructing them typically requires expert interpretation of domain documents. We study Doc2SemDAG construction: recovering a preferred semantic DAG from a document together with the cited evidence and context that explain it. This problem is challenging because a document may admit multiple plausible abstractions, the intended structure is often implicit, and the supporting evidence is scattered across prose, equatio — Shu Wan, Saketh Vishnubhatla, Iskander Kushbay

View PDF HTML (experimental)

Abstract:Directed Acyclic Graphs (DAGs) are widely used to represent structured knowledge in scientific and technical domains. However, datasets for real-world DAGs remain scarce because constructing them typically requires expert interpretation of domain documents. We study Doc2SemDAG construction: recovering a preferred semantic DAG from a document together with the cited evidence and context that explain it. This problem is challenging because a document may admit multiple plausible abstractions, the intended structure is often implicit, and the supporting evidence is scattered across prose, equations, captions, and figures. To address these challenges, we leverage scientific papers containing explicit DAG figures as a natural source of supervision. In this setting, the DAG figure provides the DAG structure, while the accompanying text provides context and explanation. We introduce DAGverse, a framework for constructing document-grounded semantic DAGs from online scientific papers. Its core component, DAGverse-Pipeline, is a semi-automatic system designed to produce high-precision semantic DAG examples through figure classification, graph reconstruction, semantic grounding, and validation. As a case study, we test the framework for causal DAGs and release DAGverse-1, a dataset of 108 expert-validated semantic DAGs with graph-level, node-level, and edge-level evidence. Experiments show that DAGverse-Pipeline outperforms existing Vision-Language Models on DAG classification and annotation. DAGverse provides a foundation for document-grounded DAG benchmarks and opens new directions for studying structured reasoning grounded in real-world evidence.

Subjects:

Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2603.25293 [cs.AI]

(or arXiv:2603.25293v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.25293

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shu Wan [view email] [v1] Thu, 26 Mar 2026 10:33:12 UTC (1,801 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
DAGverse: B…researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 91 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!