Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessConnecting MCP servers to Amazon Bedrock AgentCore Gateway using Authorization Code flowAWS Machine Learning BlogParsing the AI and gaming future with Nvidia’s Jensen Huang | GTC Q&A - GamesBeatGNews AI NVIDIAStartup Battlefield 200 applications open: A chance for VC access, TechCrunch coverage, and $100KTechCrunch Venture🔥 Jeffallan/claude-skillsGitHub Trending🔥 teng-lin/notebooklm-pyGitHub Trending🔥 HKUDS/DeepTutorGitHub TrendingNebius Stock Rises on $12B Meta AI Deal & Nvidia Investment | 2026 - News and Statistics - indexbox.ioGNews AI NVIDIAHow to use the new ChatGPT app integrations, including DoorDash, Spotify, Uber, and othersTechCrunch AINvidia’s AI Boom Faces Taiwan Supply Risk - NVIDIA (NASDAQ:NVDA), Taiwan Semiconductor (NYSE:TSM) - BenzingaGNews AI NVIDIAFrom Prompt Engineering to Harness Engineering: The Next Evolution of LLM SystemsTowards AIAdvancing Responsible AI Adoption and Use in the Public Sector: Three Policy Priorities for State LegislationCenter for Democracy & TechnologyNvidia Might Have a Memory Problem, Analyst Says. What It Means for the Stock. - Barron'sGNews AI NVIDIABlack Hat USAAI BusinessBlack Hat AsiaAI BusinessConnecting MCP servers to Amazon Bedrock AgentCore Gateway using Authorization Code flowAWS Machine Learning BlogParsing the AI and gaming future with Nvidia’s Jensen Huang | GTC Q&A - GamesBeatGNews AI NVIDIAStartup Battlefield 200 applications open: A chance for VC access, TechCrunch coverage, and $100KTechCrunch Venture🔥 Jeffallan/claude-skillsGitHub Trending🔥 teng-lin/notebooklm-pyGitHub Trending🔥 HKUDS/DeepTutorGitHub TrendingNebius Stock Rises on $12B Meta AI Deal & Nvidia Investment | 2026 - News and Statistics - indexbox.ioGNews AI NVIDIAHow to use the new ChatGPT app integrations, including DoorDash, Spotify, Uber, and othersTechCrunch AINvidia’s AI Boom Faces Taiwan Supply Risk - NVIDIA (NASDAQ:NVDA), Taiwan Semiconductor (NYSE:TSM) - BenzingaGNews AI NVIDIAFrom Prompt Engineering to Harness Engineering: The Next Evolution of LLM SystemsTowards AIAdvancing Responsible AI Adoption and Use in the Public Sector: Three Policy Priorities for State LegislationCenter for Democracy & TechnologyNvidia Might Have a Memory Problem, Analyst Says. What It Means for the Stock. - Barron'sGNews AI NVIDIA
AI NEWS HUBbyEIGENVECTOREigenvector

FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation

arXivby [Submitted on 27 Mar 2026]March 30, 20262 min read1 views
Source Quiz

arXiv:2603.25993v1 Announce Type: new Abstract: While recent feed-forward 3D reconstruction models provide a strong geometric foundation for scene understanding, extending them to 3D instance segmentation typically relies on a disjointed "lift-and-cluster" paradigm. Grouping dense pixel-wise embeddings via non-differentiable clustering scales poorly with the number of views and disconnects representation learning from the final segmentation objective. In this paper, we present a Feed-forward Anchored Scene Transformer for 3D Instance Segmentation (FAST3DIS), an end-to-end approach that effecti — Changyang Li, Xueqing Huang, Shin-Fang Chng, Huangying Zhan, Qingan Yan, Yi Xu

View PDF HTML (experimental)

Abstract:While recent feed-forward 3D reconstruction models provide a strong geometric foundation for scene understanding, extending them to 3D instance segmentation typically relies on a disjointed "lift-and-cluster" paradigm. Grouping dense pixel-wise embeddings via non-differentiable clustering scales poorly with the number of views and disconnects representation learning from the final segmentation objective. In this paper, we present a Feed-forward Anchored Scene Transformer for 3D Instance Segmentation (FAST3DIS), an end-to-end approach that effectively bypasses post-hoc clustering. We introduce a 3D-anchored, query-based Transformer architecture built upon a foundational depth backbone, adapted efficiently to learn instance-specific semantics while retaining its zero-shot geometric priors. We formulate a learned 3D anchor generator coupled with an anchor-sampling cross-attention mechanism for view-consistent 3D instance segmentation. By projecting 3D object queries directly into multi-view feature maps, our method samples context efficiently. Furthermore, we introduce a dual-level regularization strategy, that couples multi-view contrastive learning with a dynamically scheduled spatial overlap penalty to explicitly prevent query collisions and ensure precise instance boundaries. Experiments on complex indoor 3D datasets demonstrate that our approach achieves competitive segmentation accuracy with significantly improved memory scalability and inference speed over state-of-the-art clustering-based methods.

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.25993 [cs.CV]

(or arXiv:2603.25993v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25993

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Changyang Li [view email] [v1] Fri, 27 Mar 2026 00:45:31 UTC (2,194 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
FAST3DIS: F…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 264 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers