
Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

arXiv [Submitted on 26 Mar 2026]

Authors: Abdullah Hamdi, Changchun Yang, Xin Gao


Abstract: Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at this https URL.
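The abstract lists four stages (temporal proposals, bounding-box tracking, AI-driven visual confirmation, human-in-the-loop review) but does not say how they hand data to one another. The code below is a minimal Python sketch of one way such a staged workflow could be wired, assuming each stage is a pluggable callable. `Candidate`, `run_pipeline`, the box format, and the `toy_*` stubs are hypothetical illustrations, not the authors' code. Sending only AI-confirmed candidates to human review is also an assumption; the abstract does not state it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (x1, y1, x2, y2) in pixel coordinates; the box format is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class Candidate:
    """Hypothetical lesion candidate passed between pipeline stages."""
    start_frame: int
    end_frame: int  # inclusive
    label: str
    boxes: List[Box] = field(default_factory=list)
    ai_confirmed: Optional[bool] = None
    human_verified: Optional[bool] = None


def run_pipeline(
    num_frames: int,
    propose: Callable[[int], List[Candidate]],
    track: Callable[[Candidate], List[Box]],
    confirm: Callable[[Candidate], bool],
    review: Callable[[Candidate], bool],
) -> List[Candidate]:
    # Stage 1: temporal proposals over the full-procedure video.
    candidates = propose(num_frames)
    for cand in candidates:
        # Stage 2: bounding-box tracking within the proposed segment.
        cand.boxes = track(cand)
        # Stage 3: AI-driven visual confirmation of the tracked lesion.
        cand.ai_confirmed = confirm(cand)
    accepted = []
    for cand in candidates:
        # Stage 4 (assumed routing): only AI-confirmed candidates reach a human.
        if not cand.ai_confirmed:
            continue
        cand.human_verified = review(cand)
        if cand.human_verified:
            accepted.append(cand)
    return accepted


# Toy stand-ins for the real models and annotators (all hypothetical).
def toy_propose(num_frames: int) -> List[Candidate]:
    return [Candidate(10, 12, "polyp"), Candidate(40, 41, "bleeding")]


def toy_track(cand: Candidate) -> List[Box]:
    # One fixed box per frame in the inclusive segment.
    return [(0.0, 0.0, 5.0, 5.0) for _ in range(cand.start_frame, cand.end_frame + 1)]


def toy_confirm(cand: Candidate) -> bool:
    return cand.label == "polyp"


def toy_review(cand: Candidate) -> bool:
    return True


accepted = run_pipeline(100, toy_propose, toy_track, toy_confirm, toy_review)
print([(c.label, len(c.boxes)) for c in accepted])  # [('polyp', 3)]
```

Each stage sits behind a callable, so a tracker or confirmation model could be swapped without changing the orchestration. In this toy run the "bleeding" candidate fails AI confirmation and never reaches review.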

Comments: preprint

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Cite as: arXiv:2603.25645 [eess.IV]

(or arXiv:2603.25645v1 [eess.IV] for this version)

https://doi.org/10.48550/arXiv.2603.25645

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Abdullah Hamdi
[v1] Thu, 26 Mar 2026 16:58:43 UTC (36,739 KB)
