MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

arXiv, March 30, 2026

Authors: Rodrigo Batista, Luís Filipe Cunha, Purificação Silvano, Nuno Guimarães, Alípio Jorge, Evelin Amorim, Ricardo Campos

Abstract: Municipal meeting minutes are official documents of local governance, exhibiting heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata such as meeting number, date, location, participants, and start/end times, elements that are rarely standardized or easy to extract automatically. Existing named entity recognition (NER) models are ill-suited to this task, as they are not adapted to such domain-specific categories. In this paper, we propose a two-stage pipeline for metadata extraction from municipal minutes. First, a question answering (QA) model identifies the opening and closing text segments containing metadata. Transformer-based models (BERTimbau and XLM-RoBERTa, with and without a CRF layer) are then applied for fine-grained entity extraction and enhanced through delexicalization. To evaluate our proposed pipeline, we benchmark it against both open-weight (Phi) and closed-weight (Gemini) LLMs, assessing predictive performance, inference cost, and carbon footprint. Our results demonstrate strong in-domain performance, outperforming larger general-purpose LLMs. However, cross-municipality evaluation reveals reduced generalization, reflecting the variability and linguistic complexity of municipal records. This work establishes the first benchmark for metadata extraction from municipal meeting minutes, providing a solid foundation for future research in this domain.
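The abstract describes the pipeline only at a high level. The Python sketch below shows one way the two stages could compose once each model has produced its output; it is an illustration under stated assumptions, not the authors' implementation. It assumes that the stage-1 QA model returns a character span for the opening segment (hard-coded here), that the stage-2 token classifier emits one BIO tag per whitespace-separated token (tags written by hand), and that fields carry the labels NUMBER, DATE, and LOCATION. The tag scheme, label names, example text, and the helpers crop_segment, bio_to_spans, and to_metadata are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, str]  # (label, surface text)


def crop_segment(text: str, char_span: Tuple[int, int]) -> str:
    """Stage 1 stand-in (hypothetical): keep only the character span that a
    QA model would return for the opening (or closing) segment."""
    start, end = char_span
    return text[start:end]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Stage 2 post-processing: merge per-token BIO tags into entity spans.

    Assumes tags are "O", "B-<LABEL>" or "I-<LABEL>" (an assumed scheme).
    An I- tag that does not continue the open span starts a new span.
    """
    spans: List[Span] = []
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            buf.append(tok)  # continue the open span
            continue
        if label is not None:  # any other tag closes the open span
            spans.append((label, " ".join(buf)))
        if tag == "O":
            label, buf = None, []
        else:  # B-X, or an I-X that starts a new span
            label, buf = tag[2:], [tok]
    if label is not None:  # flush a span that runs to the end
        spans.append((label, " ".join(buf)))
    return spans


def to_metadata(spans: List[Span]) -> Dict[str, List[str]]:
    """Group extracted values by field, preserving document order."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for label, value in spans:
        fields[label].append(value)
    return dict(fields)


# Invented example: the QA span and the tags are written by hand
# in place of real model outputs (hypothetical labels).
minutes = "Minutes no. 12 held on 5 March 2024 at the Town Hall . [body of the minutes]"
opening = crop_segment(minutes, (0, minutes.index(" [")))
tokens = opening.split()
tags = ["O", "O", "B-NUMBER", "O", "O", "B-DATE", "I-DATE", "I-DATE",
        "O", "O", "B-LOCATION", "I-LOCATION", "O"]
metadata = to_metadata(bio_to_spans(tokens, tags))
print(metadata)
# {'NUMBER': ['12'], 'DATE': ['5 March 2024'], 'LOCATION': ['Town Hall']}
```

The decoder is deliberately lenient: an I- tag that does not continue the open span starts a new one. A plain softmax token classifier can emit such sequences (for example, O followed by I-DATE), whereas a CRF layer models tag transitions jointly and makes them less likely. A real transformer tokenizer also splits words into subwords, so predictions would need to be mapped back to words before this step; the sketch omits that.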

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2602.00316 [cs.CL]

(or arXiv:2602.00316v3 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2602.00316 (arXiv-issued DOI via DataCite)

Journal reference: Advances in Information Retrieval. ECIR 2026. Lecture Notes in Computer Science, vol 16484. Springer, Cham

Related DOI: https://doi.org/10.1007/978-3-032-21300-6_33

Submission history

From: Nuno Guimaraes
[v1] Fri, 30 Jan 2026 21:09:13 UTC (23 KB)
[v2] Mon, 9 Feb 2026 10:04:48 UTC (23 KB)
[v3] Thu, 26 Mar 2026 19:56:25 UTC (23 KB)