
GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

arXiv. Submitted on 24 Mar 2026 (v1), last revised 27 Mar 2026 (this version, v2)

arXiv:2603.22687v2 Announce Type: replace

Authors: Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng


Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, which limits their geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through TikZ-based code generation. Within this framework, we build two models supported by two complementary datasets. The GeoTikzBridge-Base model is trained on the GeoTikz-Base dataset, the largest image-to-TikZ dataset to date with 2.5M pairs ($16\times$ larger than existing open-source datasets). The dataset is built via iterative data expansion and a localized geometric transformation strategy. Subsequently, GeoTikzBridge-Instruct is fine-tuned on the GeoTikz-Instruct dataset, the first instruction-augmented TikZ dataset supporting visual reasoning. Extensive experimental results demonstrate that our models achieve state-of-the-art performance among open-source MLLMs. Furthermore, GeoTikzBridge models can serve as plug-and-play reasoning modules for any MLLM (or LLM), enhancing reasoning performance in geometric problem-solving. Datasets and code are publicly available at: this https URL.
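The abstract does not spell out how the localized geometric transformation works. As a purely illustrative sketch, and not the authors' method, one way to derive a new figure from an existing TikZ source is to move a single coordinate and leave the rest untouched. The function name `shift_coordinate`, the regular expression, and the sample triangle below are all hypothetical and assume coordinates written as plain numeric pairs.

```python
import re

# Matches plain numeric TikZ coordinates such as (0,0) or (1.5, -2).
# Assumption: named nodes, polar coordinates, and units are not handled.
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def shift_coordinate(tikz_src: str, index: int, dx: float, dy: float) -> str:
    """Return tikz_src with the index-th numeric coordinate moved by (dx, dy)."""
    matches = list(COORD.finditer(tikz_src))
    if not 0 <= index < len(matches):
        raise IndexError("coordinate index out of range")
    m = matches[index]
    x = float(m.group(1)) + dx
    y = float(m.group(2)) + dy
    # Rewrite only the matched span; every other character is preserved.
    return tikz_src[:m.start()] + f"({x:g},{y:g})" + tikz_src[m.end():]


# Hypothetical sample: a triangle whose apex is moved to make a variant.
triangle = r"\draw (0,0) -- (4,0) -- (2,3) -- cycle;"
variant = shift_coordinate(triangle, 2, 0.5, -1)
print(variant)
```

Rendering both the original and the variant would yield two image-to-TikZ pairs that differ only in one local detail, which is the kind of fine-grained distinction the abstract says MLLMs struggle to perceive.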

Comments: accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.22687 [cs.CV] (or arXiv:2603.22687v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.22687 (arXiv-issued DOI via DataCite)

Submission history

From: Jiayin Sun
[v1] Tue, 24 Mar 2026 01:27:51 UTC (6,313 KB)
[v2] Fri, 27 Mar 2026 03:55:53 UTC (6,313 KB)
