
Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation

arXiv, March 31, 2026

arXiv:2511.17290v2 (announce type: replace)

Authors: Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage Särg, Kairit Sirts


Abstract: In this paper, we present a localized and culturally adapted Estonian translation of the test set from the widely used commonsense reasoning benchmark, WinoGrande. We detail the translation and adaptation process carried out by translation specialists and evaluate the performance of both proprietary and open-source models on the human-translated benchmark. Additionally, we explore the feasibility of achieving high-quality machine translation by incorporating insights from the manual translation process into the design of a detailed prompt. This prompt is specifically tailored to address both the linguistic characteristics of Estonian and the unique translation challenges posed by the WinoGrande dataset. Our findings show that model performance on the human-translated Estonian dataset is slightly lower than on the original English test set, while performance on machine-translated data is notably worse. Additionally, our experiments indicate that prompt engineering offers limited improvement in translation quality or model accuracy, and highlight the importance of involving language specialists in dataset translation and adaptation to ensure reliable and interpretable evaluations of language competency and reasoning in large language models.
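For readers unfamiliar with the benchmark format, the following is a minimal sketch of how accuracy on a WinoGrande-style item could be computed. Each item is a sentence with a single `_` blank and two candidate fillers, and the task is to pick the filler that makes sense. The abstract does not describe the authors' evaluation procedure, so everything here is an assumption for illustration: the names `WinograndeItem`, `fill`, `predict`, `accuracy`, and `toy_score` are hypothetical, the two example sentences are invented rather than taken from the dataset, and `toy_score` stands in for whatever a real model would provide (for example, a sentence log-likelihood).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class WinograndeItem:
    """One fill-in-the-blank item: a sentence with a single "_" and two options."""
    sentence: str
    option1: str
    option2: str
    answer: int  # 1 or 2, the index of the correct option


def fill(item: WinograndeItem, option: str) -> str:
    """Substitute an option into the first "_" placeholder of the sentence."""
    return item.sentence.replace("_", option, 1)


def predict(item: WinograndeItem, score: Callable[[str], float]) -> int:
    """Return 1 or 2, whichever filled sentence scores higher (ties go to 1)."""
    s1 = score(fill(item, item.option1))
    s2 = score(fill(item, item.option2))
    return 1 if s1 >= s2 else 2


def accuracy(items: Iterable[WinograndeItem], score: Callable[[str], float]) -> float:
    """Fraction of items where the predicted option matches the gold answer."""
    items = list(items)
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    correct = sum(predict(item, score) == item.answer for item in items)
    return correct / len(items)


# Invented illustrative items (NOT from the WinoGrande dataset).
ITEMS = [
    WinograndeItem(
        "The trophy did not fit in the suitcase because the _ was too big.",
        "trophy", "suitcase", 1,
    ),
    WinograndeItem(
        "The trophy did not fit in the suitcase because the _ was too small.",
        "trophy", "suitcase", 2,
    ),
]


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model score; a real setup would query an LLM."""
    return 1.0 if "trophy was too big" in text else 0.0


print(accuracy(ITEMS, toy_score))
```

The toy scorer only recognizes the first sentence, so it gets one of the two items right. Comparing such an accuracy figure across the English original, the human translation, and a machine translation is the kind of comparison the abstract reports, though the scoring method used in the paper may differ.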

Comments: LREC 2026

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2511.17290 [cs.CL] (or arXiv:2511.17290v2 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2511.17290 (arXiv-issued DOI via DataCite)

Submission history

From: Marii Ojastu
[v1] Fri, 21 Nov 2025 15:01:57 UTC (68 KB)
[v2] Mon, 30 Mar 2026 13:26:58 UTC (78 KB)
