Falcon Perception

arXiv · March 31, 2026

arXiv:2603.27365v1 Announce Type: new

Authors: Aviraj Bevli, Sofian Chaybouti, Yasser Dahou, Hakim Hacid, Ngoc Dung Huynh, Phuc H. Le Khac, Sanath Narayan, Wamiq Reyaz Para, Ankit Singh


Abstract: Perception-centric systems are typically implemented with a modular encoder-decoder pipeline: a vision backbone for feature extraction and a separate decoder (or late-fusion module) for task prediction. This raises a central question: is this architectural separation essential, or can a single early-fusion stack do both perception and task modeling at scale? We introduce Falcon Perception, a unified dense Transformer that processes image patches and text tokens in a shared parameter space from the first layer, using a hybrid attention pattern (bidirectional among image tokens, causal for prediction tokens) to combine global visual context with autoregressive, variable-length instance generation. To keep dense outputs practical, Falcon Perception retains a lightweight token interface and decodes continuous spatial outputs with specialized heads, enabling parallel high-resolution mask prediction. Our design promotes simplicity: we keep a single scalable backbone and shift complexity toward data and training signals, adding only small heads where outputs are continuous and dense. On SA-Co, Falcon Perception improves mask quality to 68.0 Macro-F$_1$, compared to 62.3 for SAM3. We also introduce PBench, a benchmark targeting compositional prompts (OCR, spatial constraints, relations) and dense long-context regimes, where the model shows larger gains. Finally, we extend the same early-fusion recipe to Falcon OCR: a compact 300M-parameter model which attains 80.3% on olmOCR and 88.64 on OmniDocBench.
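The hybrid attention pattern mentioned in the abstract (bidirectional among image tokens, causal for prediction tokens) can be illustrated with a small mask-building sketch. This is not the authors' implementation. It assumes, purely for illustration, that image tokens come first in the sequence, that prediction tokens follow them, and that image tokens do not attend to prediction tokens; the function name `hybrid_attention_mask` and its parameters are hypothetical.

```python
from __future__ import annotations


def hybrid_attention_mask(num_image: int, num_pred: int) -> list[list[bool]]:
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    Assumed layout (not confirmed by the source): positions [0, num_image) are
    image tokens, positions [num_image, num_image + num_pred) are prediction tokens.
    """
    total = num_image + num_pred
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_image:
                # Image keys are visible to every query: image queries see all
                # image tokens (bidirectional), and prediction queries use the
                # full image as context.
                mask[q][k] = True
            elif q >= num_image:
                # Prediction queries see prediction keys only up to and
                # including their own position (causal).
                mask[q][k] = k <= q
            # Otherwise an image query would look at a prediction key; this
            # sketch assumes that is blocked, so the entry stays False.
    return mask


if __name__ == "__main__":
    # Print a 3-image-token, 2-prediction-token mask ("x" = attention allowed).
    for row in hybrid_attention_mask(3, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

With 3 image tokens and 2 prediction tokens, the top-left 3×3 block is fully allowed, while the bottom-right 2×2 block is lower-triangular. In practice such a mask would be passed to an attention layer as a boolean or additive mask; how Falcon Perception orders its tokens and handles text prompt tokens is not specified in the abstract.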

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27365 [cs.CV]

(or arXiv:2603.27365v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27365

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yasser Abdelaziz Dahou Djilali
[v1] Sat, 28 Mar 2026 18:23:20 UTC (16,146 KB)
