Research Papers research paper arxiv computer-vision image-recognition

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

arXivby [Submitted on 20 Mar 2026]March 31, 20262 min read0 views

arXiv:2603.26724v1 Announce Type: new Abstract: The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots that are tasked with surveying previously unseen unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled — Dimitrios Chatziparaschis, Elia Scudiero, Brent Sams, Konstantinos Karydis

View PDF HTML (experimental)

Abstract:The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots that are tasked with surveying previously unseen unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data. The proposed methodology incorporates cross-modal annotation transfer and an early-stage sensor fusion pipeline, which, in conjunction with a multi-stage detection architecture, effectively trains and enhances the system's multi-modal detection capabilities. The effectiveness of the framework was demonstrated through vine trunk detection in novel vineyard settings that featured diverse lighting conditions and varying crop densities to validate performance. When integrated with a customized multi-modal LiDAR and Odometry Mapping (LOAM) algorithm and a tree association module, the system demonstrated high-performance trunk localization, successfully identifying over 70% of trees in a single traversal with a mean distance error of less than 0.37m. The results reveal that by leveraging multi-modal, incremental-stage annotation and training, the proposed framework achieves robust detection performance regardless of limited starting annotations, showcasing its potential for real-world and near-ground agricultural applications.

Comments: 7 pages, 6 figures, conference

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Cite as: arXiv:2603.26724 [cs.CV]

(or arXiv:2603.26724v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26724

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Dimitrios Chatziparaschis [view email] [v1] Fri, 20 Mar 2026 00:01:28 UTC (22,298 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26724

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Building knowledge graph…

Discussion

No comments yet — be the first to share your thoughts!