
GS-BrainText: A Multi-Site Brain Imaging Report Dataset from Generation Scotland for Clinical Natural Language Processing Development and Validation

arXiv, March 30, 2026

Authors: Beatrice Alex, Claire Grover, Arlene Casey, Richard Tobin, Heather Whalley, William Whiteley

Abstract: We present GS-BrainText, a curated dataset of 8,511 brain radiology reports from the Generation Scotland cohort, of which 2,431 are annotated for 24 brain disease phenotypes. This multi-site dataset spans five Scottish NHS health boards and includes broad age representation (mean age 58, median age 53), making it uniquely valuable for developing and evaluating generalisable clinical natural language processing (NLP) algorithms and tools. Expert annotations were performed by a multidisciplinary clinical team using an annotation schema, with 10-100% double annotation per NHS health board and rigorous quality assurance. Benchmark evaluation using EdIE-R, an existing rule-based NLP system developed in conjunction with the annotation schema, revealed some performance variation across health boards (F1: 86.13-98.13), phenotypes (F1: 22.22-100) and age groups (F1: 87.01-98.13), highlighting critical challenges in generalisation of NLP tools. The GS-BrainText dataset addresses a significant gap in available UK clinical text resources and provides a valuable resource for the study of linguistic variation, diagnostic uncertainty expression and the impact of data characteristics on NLP system performance.
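The abstract reports F1 separately per health board, phenotype and age group. As a rough illustration of how such a stratified score can be computed, here is a minimal sketch. It assumes micro-averaged F1 over per-report label sets, which the abstract does not confirm; the record layout, the `f1_by_group` helper, the board names and the phenotype labels are all hypothetical and are not taken from GS-BrainText or EdIE-R.

```python
from collections import defaultdict


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 as a percentage; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


def f1_by_group(records, group_key):
    """Compute micro-averaged F1 per group (hypothetical helper).

    Each record is a dict with assumed keys:
      'gold' and 'pred': sets of phenotype labels for one report,
      plus the field named by `group_key` (for example 'board').
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn] per group
    for rec in records:
        c = counts[rec[group_key]]
        c[0] += len(rec["gold"] & rec["pred"])  # labels found by both
        c[1] += len(rec["pred"] - rec["gold"])  # predicted but not annotated
        c[2] += len(rec["gold"] - rec["pred"])  # annotated but missed
    return {group: f1_score(*c) for group, c in counts.items()}


# Invented toy records, not drawn from the dataset.
records = [
    {"board": "A", "gold": {"ischaemic_stroke"}, "pred": {"ischaemic_stroke"}},
    {"board": "A", "gold": {"tumour"}, "pred": {"tumour", "atrophy"}},
    {"board": "B", "gold": {"atrophy"}, "pred": set()},
]
scores = f1_by_group(records, "board")
```

Passing a different `group_key` (an age band or a phenotype field, if the records carried one) would give the other stratifications in the same way. With few positive examples in a group, a single missed label moves the score sharply, which is one plausible reason per-phenotype ranges can be much wider than per-board ranges.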

Comments: 11 pages, 1 figure

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.26235 [cs.CL]

(or arXiv:2603.26235v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26235

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Beatrice Alex
[v1] Fri, 27 Mar 2026 09:57:20 UTC (287 KB)
