
MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures

arXiv, March 31, 2026

Authors: Tim Strohmeyer, Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Ahmed Nassar, Peter Staar


Abstract: Automatically extracting chemical structures from documents is essential for the large-scale analysis of the literature in chemistry. Automatic pipelines have been developed to recognize molecules represented either in figures or in text independently. However, methods for recognizing chemical structures from multimodal descriptions (Markush structures) lag behind in precision and cannot be used for automatic large-scale processing. In this work, we present MarkushGrapher-2, an end-to-end approach for the multimodal recognition of chemical structures in documents. First, our method employs a dedicated OCR model to extract text from chemical images. Second, the text, image, and layout information are jointly encoded through a Vision-Text-Layout encoder and an Optical Chemical Structure Recognition vision encoder. Finally, the resulting encodings are effectively fused through a two-stage training strategy and used to auto-regressively generate a representation of the Markush structure. To address the lack of training data, we introduce an automatic pipeline for constructing a large-scale dataset of real-world Markush structures. In addition, we present IP5-M, a large manually-annotated benchmark of real-world Markush structures, designed to advance research on this challenging task. Extensive experiments show that our approach substantially outperforms state-of-the-art models in multimodal Markush structure recognition, while maintaining strong performance in molecule structure recognition. Code, models, and datasets are released publicly.
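For readers unfamiliar with the term, a Markush structure combines a drawn backbone with variable groups that are defined in accompanying text, so one description stands for a whole family of molecules. The toy sketch below illustrates that idea only. It is not the paper's output representation or released code: the function `enumerate_markush`, the `[R1]`/`[R2]` placeholder syntax, and the example backbone and substituents are all assumptions made up for illustration.

```python
from itertools import product


def enumerate_markush(template: str, substituents: dict[str, list[str]]) -> list[str]:
    """Expand a toy Markush template into concrete SMILES strings.

    `template` is a SMILES-like string with hypothetical placeholders such
    as "[R1]"; `substituents` maps each placeholder label to the fragments
    it may stand for. Both conventions are assumptions for this sketch.
    """
    labels = sorted(substituents)
    results = []
    # Try every combination of one fragment per variable group.
    for choice in product(*(substituents[label] for label in labels)):
        smiles = template
        for label, fragment in zip(labels, choice):
            smiles = smiles.replace(f"[{label}]", fragment)
        results.append(smiles)
    return results


# Backbone as it might be drawn in a figure (hypothetical example).
template = "c1cc([R1])ccc1[R2]"
# Variable groups as they might be defined in the surrounding text.
substituents = {"R1": ["F", "Cl"], "R2": ["C", "O"]}

molecules = enumerate_markush(template, substituents)
print(molecules)
```

Recognizing such a structure therefore needs both modalities: the image supplies the backbone and the text supplies the allowed substituents.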

Comments: 15 pages, to be published in CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28550 [cs.CV]

(or arXiv:2603.28550v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28550

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tim Strohmeyer

[v1] Mon, 30 Mar 2026 15:11:17 UTC (7,368 KB)

Original source

arXiv

https://arxiv.org/abs/2603.28550

