Learning to See through Illumination Extremes with Event Streaming in Multimodal Large Language Models
arXiv:2603.27558v1 Announce Type: new
Authors: Baoheng Zhang, Jiahui Liu, Gui Zhao, Weizhou Zhang, Yixuan Ma, Jun Jiang, Yingxian Chen, Wilton W. T. Fok, Xiaojuan Qi, Hayden Kwok-Hay So
Abstract: Multimodal Large Language Models (MLLMs) perform strong vision-language reasoning under standard conditions but fail under extreme illumination, where RGB inputs irrevocably lose structure and semantics. We propose Event-MLLM, an event-enhanced model that performs all-light visual reasoning by dynamically fusing event streams with RGB frames. Two key components drive our approach: an Illumination Indicator (a learnable signal derived from a DINOv2 branch that represents exposure degradation and adaptively modulates event-RGB fusion) and an Illumination Correction Loss that aligns fused features with non-degraded (normal-light) semantics in the latent space, compensating for information lost in extreme lighting. We curate the first multi-illumination event-instruction corpus for MLLMs, with 2,241 event-RGB samples (around 6 QA pairs each) across diverse scenes and 17 brightness rates (0.05× to 20×), plus an instruction-following benchmark for reasoning, counting, and fine-grained recognition under extreme lighting. Experiments show that Event-MLLM markedly outperforms general-purpose, illumination-adaptive, and event-only baselines, setting a new state of the art in robust multimodal perception and reasoning under challenging illumination.
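The abstract names the two components but not their exact form. The following is a minimal, hypothetical NumPy sketch of how an illumination-gated fusion and a latent alignment loss could fit together. It assumes that (1) the Illumination Indicator is a scalar in (0, 1) produced by a linear head plus sigmoid over mean-pooled DINOv2-style tokens, (2) fusion is a convex combination of token-aligned RGB and event features gated by that scalar, and (3) the Illumination Correction Loss is one minus the mean per-token cosine similarity to features of the same scene under normal light. All function names, shapes, and the gating and loss forms are illustrative assumptions rather than the authors' implementation, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def illumination_indicator(dino_tokens, w, b):
    """Hypothetical indicator: map mean-pooled DINOv2-style tokens (T, D)
    to a scalar in (0, 1); higher means stronger exposure degradation."""
    pooled = dino_tokens.mean(axis=0)          # (D,)
    return float(sigmoid(pooled @ w + b))      # scalar in (0, 1)


def fuse(rgb_tokens, event_tokens, indicator):
    """Assumed convex gate: rely more on events as degradation grows.
    indicator = 0 returns the RGB path, indicator = 1 the event path."""
    return (1.0 - indicator) * rgb_tokens + indicator * event_tokens


def illumination_correction_loss(fused, normal_light, eps=1e-8):
    """Assumed form: 1 - mean per-token cosine similarity between fused
    features and normal-light reference features (treated as fixed targets)."""
    dot = np.sum(fused * normal_light, axis=-1)
    norms = np.linalg.norm(fused, axis=-1) * np.linalg.norm(normal_light, axis=-1)
    return float(np.mean(1.0 - dot / (norms + eps)))


# Usage with random stand-ins for encoder outputs (T tokens, D channels).
rng = np.random.default_rng(0)
T, D = 4, 8
rgb = rng.normal(size=(T, D))      # features of the degraded RGB frame
evt = rng.normal(size=(T, D))      # event-stream features, same token grid
dino = rng.normal(size=(T, D))     # DINOv2-branch features (stand-in)
normal = rng.normal(size=(T, D))   # features of a normal-light reference frame
w = rng.normal(size=D)             # hypothetical indicator head weights

g = illumination_indicator(dino, w, 0.0)
fused = fuse(rgb, evt, g)
loss = illumination_correction_loss(fused, normal)
```

A convex gate keeps the fused features on the same scale as its inputs, so the indicator smoothly shifts weight between the RGB and event paths. During training, the normal-light reference features would presumably be held fixed (no gradient) so that only the fusion path is pulled toward non-degraded semantics.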
Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.27558 [cs.CV]
(or arXiv:2603.27558v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.27558
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Jiahui Liu [v1] Sun, 29 Mar 2026 07:46:32 UTC (1,879 KB)