Dictionary-based Pathology Mining with Hard-instance-assisted Classifier Debiasing for Genetic Biomarker Prediction from WSIs
arXiv:2603.26809v1 Announce Type: cross Abstract: Prediction of genetic biomarkers, e.g., microsatellite instability in colorectal cancer is crucial for clinical decision making. But, two primary challenges hamper accurate prediction: (1) It is difficult to construct a pathology-aware representation involving the complex interconnections among pathological components. (2) WSIs contain a large proportion of areas unrelated to genetic biomarkers, which make the model easily overfit simple but irrelative instances. We hereby propose a Dictionary-based hierarchical pathology mining with hard-insta — Ling Zhang, Boxiang Yun, Ting Jin, Qingli Li, Xinxing Li, Yan Wang
View PDF HTML (experimental)
Abstract:Prediction of genetic biomarkers, e.g., microsatellite instability in colorectal cancer is crucial for clinical decision making. But, two primary challenges hamper accurate prediction: (1) It is difficult to construct a pathology-aware representation involving the complex interconnections among pathological components. (2) WSIs contain a large proportion of areas unrelated to genetic biomarkers, which make the model easily overfit simple but irrelative instances. We hereby propose a Dictionary-based hierarchical pathology mining with hard-instance-assisted classifier Debiasing framework to address these challenges, dubbed as D2Bio. Our first module, dictionary-based hierarchical pathology mining, is able to mine diverse and very fine-grained pathological contextual interaction without the limit to the distances between patches. The second module, hard-instance-assisted classfier debiasing, learns a debiased classifier via focusing on hard but task-related features, without any additional annotations. Experimental results on five cohorts show the superiority of our method, with over 4% improvement in AUROC compared with the second best on the TCGA-CRC-MSI cohort. Our analysis further shows the clinical interpretability of D2Bio in genetic biomarker diagnosis and potential clinical utility in survival analysis. Code will be available at this https URL.
Comments: 13 pages, 13 figures
Subjects:
Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2603.26809 [q-bio.QM]
(or arXiv:2603.26809v1 [q-bio.QM] for this version)
https://doi.org/10.48550/arXiv.2603.26809
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ling Zhang [view email] [v1] Thu, 26 Mar 2026 14:23:28 UTC (5,468 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivTrailer: The Shape of Things to Come
Microsoft research lead Doug Burger introduces his new podcast series, "The Shape of Things to Come", an exploration into the fundamental truths about AI and how the technology will reshape the future. The post Trailer: The Shape of Things to Come appeared first on Microsoft Research .

Will machines ever be intelligent?
Are machines truly intelligent? AI researchers Subutai Ahmad and Nicolò Fusi join Doug Burger to compare transformer-based AI with the human brain, exploring continual learning, efficiency, and whether today’s models are on a path toward human intelligence. The post Will machines ever be intelligent? appeared first on Microsoft Research .
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.




Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!