ENEIDE: A High Quality Silver Standard Dataset for Named Entity Recognition and Linking in Historical Italian

arXiv cs.CL · by Cristian Santini, Sebastian Barzaghi, Paolo Sernani, Emanuele Frontoni, Laura Melosi, Mehwish Alam · April 1, 2026

Abstract: This paper introduces ENEIDE (Extracting Named Entities from Italian Digital Editions), a silver standard dataset for Named Entity Recognition and Linking (NERL) in historical Italian texts. The corpus comprises 2,111 documents with over 8,000 entity annotations semi-automatically extracted from two scholarly digital editions: Digital Zibaldone, the philosophical diary of the Italian poet Giacomo Leopardi (1798–1837), and Aldo Moro Digitale, the complete works of the Italian politician Aldo Moro (1916–1978). Annotations cover multiple entity types (person, location, organization, literary work) linked to Wikidata identifiers, including NIL entities that cannot be mapped to the knowledge graph. To the best of our knowledge, ENEIDE is the first multi-domain, publicly available NERL dataset for historical Italian with training, development, and test splits. We present a methodology for semi-automatic annotation extraction from manually curated scholarly digital editions, including quality control and annotation enhancement procedures. Baseline experiments with state-of-the-art models show that the dataset is challenging for NERL and reveal a gap between zero-shot approaches and fine-tuned models. Because its diachronic coverage spans two centuries, the dataset is particularly suitable for temporal entity disambiguation and cross-domain evaluation. ENEIDE is released under a CC BY-NC-SA 4.0 license.
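To make the NERL task concrete, here is a minimal sketch of how a mention annotated with a type and either a Wikidata identifier or NIL could be represented and scored. Everything in it is an assumption for illustration: the `Mention` record, its field names, the type tags `PER`/`LOC`/`ORG`/`WORK`, the placeholder QIDs, and the strict span-type-link F1 are not taken from the ENEIDE release or from the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one entity mention; ENEIDE's actual file format may differ.
@dataclass(frozen=True)
class Mention:
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    label: str          # assumed type tags: "PER", "LOC", "ORG", "WORK"
    qid: Optional[str]  # Wikidata identifier, or None for a NIL entity

    @property
    def link(self) -> str:
        # NIL mentions share one placeholder link so they can still be compared.
        return self.qid if self.qid is not None else "NIL"


def strict_nerl_f1(gold: list[Mention], pred: list[Mention]) -> float:
    """Micro F1 where a prediction counts only if span, type and link all match."""
    gold_set = {(m.start, m.end, m.label, m.link) for m in gold}
    pred_set = {(m.start, m.end, m.label, m.link) for m in pred}
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Placeholder QIDs, not real Wikidata items.
gold = [Mention(0, 8, "PER", "Q100"), Mention(20, 26, "LOC", None)]
# The second prediction links a NIL entity to a knowledge-graph item, so it is wrong.
pred = [Mention(0, 8, "PER", "Q100"), Mention(20, 26, "LOC", "Q200")]

print(strict_nerl_f1(gold, pred))  # → 0.5
```

Treating NIL as an explicit link value means a system is penalized both for missing a Wikidata item and for forcing a link onto an entity that has none.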

Subjects:

Computation and Language (cs.CL)

Cite as: arXiv:2603.29801 [cs.CL]

(or arXiv:2603.29801v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.29801

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Cristian Santini
[v1] Tue, 31 Mar 2026 14:32:34 UTC (245 KB)
