External Benchmarking of Lung Ultrasound Models for Pneumothorax-Related Signs: A Manifest-Based Multi-Source Study

arXivMarch 31, 20262 min read0 views

arXiv:2603.26832v1 Announce Type: cross Abstract: Background and Aims: Reproducible external benchmarks for pneumothorax-related lung ultrasound (LUS) AI are scarce, and binary lung-sliding classification may obscure clinically important signs. We therefore developed a manifest-based external benchmark and used it to test both cross-domain generalization and task validity. Methods: We curated 280 clips from 190 publicly accessible LUS source videos and released a reconstruction manifest containing URLs, timestamps, crop coordinates, labels, and probe shape. Labels were normal lung sliding, abs — Takehiro Ishikawa

View PDF

Abstract:Background and Aims: Reproducible external benchmarks for pneumothorax-related lung ultrasound (LUS) AI are scarce, and binary lung-sliding classification may obscure clinically important signs. We therefore developed a manifest-based external benchmark and used it to test both cross-domain generalization and task validity. Methods: We curated 280 clips from 190 publicly accessible LUS source videos and released a reconstruction manifest containing URLs, timestamps, crop coordinates, labels, and probe shape. Labels were normal lung sliding, absent lung sliding, lung point, and lung pulse. A previously published single-site binary classifier was evaluated on this benchmark; challenge-state analysis examined lung point and lung pulse using the predicted probability of absent sliding, P(absent). Results: The single-site comparator achieved ROC-AUC 0.9625 in-domain but 0.7050 on the heterogeneous external benchmark; restricting external evaluation to linear clips still yielded ROC-AUC 0.7212. In challenge-state analysis, mean P(absent) ranked absent (0.504) > lung point (0.313) > normal (0.186) > lung pulse (0.143). Lung pulse differed from absent clips (p=0.000470) but not from normal clips (p=0.813), indicating that the binary model treated pulse as normal-like despite absent sliding. Lung point differed from both absent (p=0.000468) and normal (p=0.000026), supporting its interpretation as an intermediate ambiguity state rather than a clean binary class. Conclusion: A manifest-based, multi-source benchmark can support reproducible external evaluation without redistributing source videos. Binary lung-sliding classification is an incomplete proxy for pneumothorax reasoning because it obscures blind-spot and ambiguity states such as lung pulse and lung point.

Subjects:

Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26832 [eess.IV]

(or arXiv:2603.26832v1 [eess.IV] for this version)

https://doi.org/10.48550/arXiv.2603.26832

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Takehiro Ishikawa [view email] [v1] Fri, 27 Mar 2026 05:11:47 UTC (266 KB)

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

External Benchmarking of Lung Ultrasound Models for Pneumothorax-Related Signs: A Manifest-Based Multi-Source Study

Submission history

Daily AI Digest

Knowledge Map

Connected Articles — Knowledge Graph

Discussion