Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunch1 Artificial Intelligence (AI) Software Stock to Buy Hand Over Fist Before It Soars 62%, According to Dan Ives - The Motley FoolGoogle News: AIGroup Pushing Age Verification Requirements for AI Turns Out to Be Sneakily Backed by OpenAIGizmodoGroup Pushing Age Verification Requirements for AI Turns Out to Be Sneakily Backed by OpenAI - GizmodoGoogle News: OpenAIInside the race to recreate Claude Code and mine its guts for revelationsBusiness InsiderAnthropic Executive Sees Cowork Agent as Bigger Than Claude Code - Bloomberg.comGoogle News: ClaudeAnthropic Executive Sees Cowork Agent as Bigger Than Claude CodeBloomberg TechnologyABAP OOP Design Patterns — Part 2: Factory, Observer, and Decorator Patterns in Real SAP SystemsDEV CommunityWhy Your AI Agent Health Check Is Lying to YouDEV CommunityDeep Dive: Array Internals & Memory LayoutDEV CommunityIllinois Tech computer science researcher honored by IEEE Chicago Section - EurekAlert!Google News: Machine LearningICE Tells Lawmakers It’s Using Spyware in Fight Against FentanylBloomberg TechnologyBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunch1 Artificial Intelligence (AI) Software Stock to Buy Hand Over Fist Before It Soars 62%, According to Dan Ives - The Motley FoolGoogle News: AIGroup Pushing Age Verification Requirements for AI Turns Out to Be Sneakily Backed by OpenAIGizmodoGroup Pushing Age Verification Requirements for AI Turns Out to Be Sneakily Backed by OpenAI - GizmodoGoogle News: OpenAIInside the race to recreate Claude Code and mine its guts for revelationsBusiness InsiderAnthropic Executive Sees Cowork Agent as Bigger Than Claude Code - Bloomberg.comGoogle News: ClaudeAnthropic Executive Sees Cowork Agent as Bigger Than Claude CodeBloomberg TechnologyABAP OOP Design Patterns — Part 2: Factory, Observer, and Decorator Patterns in Real SAP SystemsDEV CommunityWhy Your AI Agent Health Check Is Lying to YouDEV CommunityDeep Dive: Array Internals & Memory LayoutDEV CommunityIllinois Tech computer science researcher honored by IEEE Chicago Section - EurekAlert!Google News: Machine LearningICE Tells Lawmakers It’s Using Spyware in Fight Against FentanylBloomberg Technology

Brain-Inspired Multimodal Spiking Neural Network for Image-Text Retrieval

arXivMarch 31, 20262 min read0 views
Source Quiz

arXiv:2603.26787v1 Announce Type: new Abstract: Spiking neural networks (SNNs) have recently shown strong potential in unimodal visual and textual tasks, yet building a directly trained, low-energy, and high-performance SNN for multimodal applications such as image-text retrieval (ITR) remains highly challenging. Existing artificial neural network (ANN)-based methods often pursue richer unimodal semantics using deeper and more complex architectures, while overlooking cross-modal interaction, retrieval latency, and energy efficiency. To address these limitations, we present a brain-inspired Cro — Xintao Zong, Xian Zhong, Wenxuan Liu, Jianhao Ding, Zhaofei Yu, Tiejun Huang

View PDF HTML (experimental)

Abstract:Spiking neural networks (SNNs) have recently shown strong potential in unimodal visual and textual tasks, yet building a directly trained, low-energy, and high-performance SNN for multimodal applications such as image-text retrieval (ITR) remains highly challenging. Existing artificial neural network (ANN)-based methods often pursue richer unimodal semantics using deeper and more complex architectures, while overlooking cross-modal interaction, retrieval latency, and energy efficiency. To address these limitations, we present a brain-inspired Cross-Modal Spike Fusion network (CMSF) and apply it to ITR for the first time. The proposed spike fusion mechanism integrates unimodal features at the spike level, generating enhanced multimodal representations that act as soft supervisory signals to refine unimodal spike embeddings, effectively mitigating semantic loss within CMSF. Despite requiring only two time steps, CMSF achieves top-tier retrieval accuracy, surpassing state-of-the-art ANN counterparts while maintaining exceptionally low energy consumption and high retrieval speed. This work marks a significant step toward multimodal SNNs, offering a brain-inspired framework that unifies temporal dynamics with cross-modal alignment and provides new insights for future spiking-based multimodal research. The code is available at this https URL.

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26787 [cs.CV]

(or arXiv:2603.26787v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26787

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xintao Zong [view email] [v1] Wed, 25 Mar 2026 08:41:07 UTC (4,449 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Brain-Inspi…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 172 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers