A Multimodal Deep Learning Framework for Edema Classification Using HCT and Clinical Data
arXiv:2603.26726v1 Announce Type: cross
Authors:Aram Ansary Ogholbake, Hannah Choi, Spencer Brandenburg, Alyssa Antuna, Zahraa Al-Sharshahi, Makayla Cox, Haseeb Ahmed, Jacqueline Frank, Nathan Millson, Luke Bauerle, Jessica Lee, David Dornbos III, Qiang Cheng
Abstract: We propose AttentionMixer, a unified deep learning framework for multimodal detection of brain edema that combines structural head CT (HCT) with routine clinical metadata. While HCT provides rich spatial information, clinical variables such as age, laboratory values, and scan timing capture complementary context that may be ignored or only naively concatenated. AttentionMixer is designed to fuse these heterogeneous sources in a principled and efficient manner. HCT volumes are first encoded using a self-supervised Vision Transformer Autoencoder (ViT-AE++), which does not require large labeled datasets. Clinical metadata are mapped into the same feature space and used as keys and values in a cross-attention module, where the HCT-derived feature vector serves as the query. This cross-attention fusion lets the network dynamically modulate imaging features based on patient-specific context and provides an interpretable mechanism for multimodal integration. A lightweight MLP-Mixer then refines the fused representation before final classification, enabling global dependency modeling with substantially reduced parameter overhead. Missing or incomplete metadata are handled via a learnable embedding, improving robustness to real-world clinical data quality. We evaluate AttentionMixer on a curated brain HCT cohort with expert edema annotations using five-fold cross-validation. Compared with strong HCT-only, metadata-only, and prior multimodal baselines, AttentionMixer achieves superior performance (accuracy 87.32%, precision 92.10%, F1-score 85.37%, AUC 94.14%). Ablation studies confirm the benefit of both cross-attention and MLP-Mixer refinement, and permutation-based metadata importance analysis highlights clinically meaningful variables driving predictions. These results demonstrate that structured, interpretable multimodal fusion can substantially improve edema detection in clinical practice.
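To make the fusion step more concrete, here is a minimal, dependency-free Python sketch of single-head cross-attention in which one HCT-derived feature vector is the query and the clinical variables supply the keys and values, with a stand-in embedding substituted for any missing variable. It is not the authors' implementation: the per-variable scaling used to embed metadata, the names `embed_metadata` and `cross_attention_fusion`, the single head, and the residual addition are all illustrative assumptions, and in a trained model the projection matrices and missing-value embeddings would be learned rather than fixed as they are here.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major: a list of rows


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute w @ x for a row-major matrix w."""
    return [dot(row, x) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_metadata(values: List[Optional[float]], value_proj: Matrix,
                   missing_emb: Matrix) -> Matrix:
    """Turn each scalar clinical variable into a d-dimensional token.

    Hypothetical scheme: value_proj[i] scales variable i into feature space;
    missing_emb[i] stands in for the learnable embedding used when variable i
    is absent (None).
    """
    tokens = []
    for i, v in enumerate(values):
        if v is None:
            tokens.append(list(missing_emb[i]))
        else:
            tokens.append([v * w for w in value_proj[i]])
    return tokens


def cross_attention_fusion(img_feat: Vector, tokens: Matrix, wq: Matrix,
                           wk: Matrix, wv: Matrix) -> Tuple[Vector, Vector]:
    """Single-head cross-attention: image feature = query, metadata tokens = keys/values."""
    q = matvec(wq, img_feat)
    keys = [matvec(wk, t) for t in tokens]
    vals = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q))  # scaled dot-product attention
    weights = softmax([dot(q, k) / scale for k in keys])
    attended = [sum(a * v[j] for a, v in zip(weights, vals))
                for j in range(len(vals[0]))]
    # Assumption: residual add, so metadata modulates rather than replaces the image features.
    fused = [x + y for x, y in zip(img_feat, attended)]
    return fused, weights


# Toy usage with identity projections: one observed variable, one missing.
eye = [[1.0, 0.0], [0.0, 1.0]]
tokens = embed_metadata([2.0, None], value_proj=eye,
                        missing_emb=[[0.0, 0.0], [0.0, 0.5]])
fused, weights = cross_attention_fusion([1.0, 0.0], tokens, eye, eye, eye)
print(fused, weights)
```

The returned `weights` show how strongly each clinical variable contributed to the fused vector for a given patient, which is one plausible reading of the "interpretable mechanism" the abstract refers to.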
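The permutation-based metadata importance analysis mentioned in the abstract follows a generic recipe: shuffle one clinical variable across patients, re-score the trained model, and record how much performance drops. The function below is a sketch of that recipe only, assuming accuracy as the metric and a hypothetical `predict` callable that maps rows of metadata to class labels; the paper's actual metric, number of repeats, and model interface are not given in the abstract.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[Row]], List[int]],
                           rows: List[Row], labels: List[int],
                           n_repeats: int = 10,
                           seed: int = 0) -> Tuple[float, List[float]]:
    """Mean drop in accuracy when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(labels, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = []
            for r, value in zip(rows, column):
                new_row = list(r)
                new_row[j] = value
                permuted.append(new_row)
            drops.append(baseline - accuracy(labels, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Toy usage: a hypothetical "model" that only looks at feature 0.
def toy_predict(batch: List[Row]) -> List[int]:
    return [1 if r[0] > 0.5 else 0 for r in batch]


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0], [0.3, 2.0], [0.7, 9.0]]
labels = [0, 1, 0, 1, 0, 1]
baseline, importances = permutation_importance(toy_predict, rows, labels)
print(baseline, importances)
```

A variable the model ignores scores exactly zero, while a variable the model relies on shows a positive average drop; in practice the scores would be computed on held-out folds rather than on training data.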
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.26726 [cs.CV]
(or arXiv:2603.26726v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.26726
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Aram Ansary Ogholbake
[v1] Fri, 20 Mar 2026 03:28:49 UTC (404 KB)