Research Papers research paper arxiv nlp language-models

Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs

arXivMarch 30, 202610 min read0 views

arXiv:2603.26510v1 Announce Type: new Abstract: Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were t — Vinicius Anjos de Almeida, Sandro Saorin da Silva, Josimar Chire, Leonardo Vicenzi, N\'icolas Henrique Borges, Helena Kociolek, Sarah Miri\~a de Castro Rocha, Frederico Nassif Gomes, J\'ulia Cristina Ferreira, Oge Marques, Lucas Emanuel Silva e Oliveira

Authors:Vinicius Anjos de Almeida, Sandro Saorin da Silva, Josimar Chire, Leonardo Vicenzi, Nícolas Henrique Borges, Helena Kociolek, Sarah Miriã de Castro Rocha, Frederico Nassif Gomes, Júlia Cristina Ferreira, Oge Marques, Lucas Emanuel Silva e Oliveira

View PDF HTML (experimental)

Abstract:Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and overall performance. Multilingual BERT models, particularly mmBERT, perform strongly for Portuguese clinical NER and can run locally with limited computational resources. Balanced data-splitting strategies further enhance performance.

Comments: Under peer review. GitHub: this https URL

Subjects:

Computation and Language (cs.CL)

Cite as: arXiv:2603.26510 [cs.CL]

(or arXiv:2603.26510v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26510

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Vinicius Anjos De Almeida [view email] [v1] Fri, 27 Mar 2026 15:22:07 UTC (163 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26510

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ModelsRecent

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models WSJ

Google News: LLM

1m2 days ago

ModelsRecent

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models WSJ

Google News: LLM

1m2 days ago

Analyst NewsLive

New MIT jobs report: Why AI's work impact will roll in like a rising tide, not a crashing wave

AI may be 'minimally sufficient' at most of your text work tasks by 2029, according to new MIT research. Here's why that's good news.

ZDNet Big Data

1mabout 2 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 156 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersLive

30 Unpublished Poems From Iconic Greek Philosopher Discovered in Cairo

For over two millennia, the philosopher’s poetic work had largely been alluded to in the writings of others.

Gizmodo

1mabout 2 hours ago

Research Papers

I was a beta tester for the Nobel prize-winning AlphaFold AI – it’s going to revolutionise health research - The Conversation

I was a beta tester for the Nobel prize-winning AlphaFold AI – it’s going to revolutionise health research The Conversation

GNews AI protein

1mover 1 year ago

Research PapersRecent

IBM Advances Quantum Computing Research: Will it Boost Prospects? - Yahoo Finance Singapore

IBM Advances Quantum Computing Research: Will it Boost Prospects? Yahoo Finance Singapore

GNews AI quantum

1m1 day ago

Research PapersFresh

Quantum computers might crack today's encryption far sooner than we thought

According to a study by engineers at Caltech and the UC Department of Physics, quantum computers do not need to be nearly as powerful as previously believed to crack the most advanced cryptographic technologies. The research claims that Shor's algorithm could break RSA public-key encryption using quantum computers with just... Read Entire Article

TechSpot

1mabout 6 hours ago