Approaches to Analysing Historical Newspapers Using LLMs
arXiv:2603.25051v2 Announce Type: replace Abstract: This study presents a computational analysis of the Slovene historical newspapers \textit{Slovenec} and \textit{Slovenski narod} from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared co — Filip Dobrani\'c, Tina Munda, Oliver Peji\'c, Vojko Gorjanc, Uro\v{s} \v{S}majdek, David Bordon, Jakob Lenardi\v{c}, Tja\v{s}a Konov\v{s}ek, Kristina Pahor de Maiti Tekav\v{c}i\v{c}, Ciril Bohak, Darja Fi\v{s}er
Authors:Filip Dobranić, Tina Munda, Oliver Pejić, Vojko Gorjanc, Uroš Šmajdek, David Bordon, Jakob Lenardič, Tjaša Konovšek, Kristina Pahor de Maiti Tekavčič, Ciril Bohak, Darja Fišer
View PDF HTML (experimental)
Abstract:This study presents a computational analysis of the Slovene historical newspapers \textit{Slovenec} and \textit{Slovenski narod} from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then create NER graphs to explore the relationships between collective identities and places. We apply a mixed methods approach to analyse the named entity graphs, combining quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socionomic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.
Subjects:
Computation and Language (cs.CL)
Cite as: arXiv:2603.25051 [cs.CL]
(or arXiv:2603.25051v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.25051
arXiv-issued DOI via DataCite
Submission history
From: Ciril Bohak [view email] [v1] Thu, 26 Mar 2026 05:38:30 UTC (3,483 KB) [v2] Fri, 27 Mar 2026 09:09:00 UTC (3,476 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv![First time NeurIPS. How different is it from low-ranked conferences? [D]](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-robot-hand-JvPW6jsLFTCtkgtb97Kys5.webp)
First time NeurIPS. How different is it from low-ranked conferences? [D]
I'm a PhD student and already published papers in A/B ranked paper (10+). My field of work never allowed me to work on something really exciting and a core A* conference. But finally after years I think I have work worthy of some discussion at the top venue. I'm referring to papers (my field and top papers) from previous editions and I notice that there's a big difference on how people write, how they put their message on table and also it is too theoretical sometimes. Are there any golden rules people follow who frequently get into these conferences? Should I be soft while making novelty claims? Also those who moved from submitting to niche-conferences to NeurIPS/ICML/CVPR, did you change your approach? My field is imaging in healthcare. submitted by /u/ade17_in [link] [comments]

AI can describe human experiences but lacks experience in an actual ‘body.’ UCLA researchers say understanding this ‘body gap’ may matter for safety - UCLA Health
AI can describe human experiences but lacks experience in an actual ‘body.’ UCLA researchers say understanding this ‘body gap’ may matter for safety UCLA Health

New IAEA Research Project Uses Machine Learning to Better Predict Polymer Changes under Radiation - International Atomic Energy Agency
New IAEA Research Project Uses Machine Learning to Better Predict Polymer Changes under Radiation International Atomic Energy Agency
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
![First time NeurIPS. How different is it from low-ranked conferences? [D]](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-robot-hand-JvPW6jsLFTCtkgtb97Kys5.webp)
First time NeurIPS. How different is it from low-ranked conferences? [D]
I'm a PhD student and already published papers in A/B ranked paper (10+). My field of work never allowed me to work on something really exciting and a core A* conference. But finally after years I think I have work worthy of some discussion at the top venue. I'm referring to papers (my field and top papers) from previous editions and I notice that there's a big difference on how people write, how they put their message on table and also it is too theoretical sometimes. Are there any golden rules people follow who frequently get into these conferences? Should I be soft while making novelty claims? Also those who moved from submitting to niche-conferences to NeurIPS/ICML/CVPR, did you change your approach? My field is imaging in healthcare. submitted by /u/ade17_in [link] [comments]



![[D] CVPR 2026 Travel Grant/Registration Waiver](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-circuit-gold-PMJWD5qsqGfXwX8w9a97Cb.webp)
Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!