
Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages

arXiv · March 26, 2026

Danlu Chen, Ka Sing He, Jiahe Tian


Abstract: The landscape of extremely low-resource (XLR) machine translation (MT) is characterized by perplexing variability in reported performance, which often makes results across different language pairs difficult to contextualize. For researchers focused on specific language groups, such as ancient languages, it is nearly impossible to determine whether breakthroughs reported in other contexts (e.g., native African or American languages) result from superior methodologies or are merely artifacts of benchmark collection. To address this problem, we introduce the FRED Difficulty Metrics: the Fertility Ratio (F), Retrieval Proxy (R), Pre-training Exposure (E), and Corpus Diversity (D). These dataset-intrinsic metrics serve to contextualize reported scores. They reveal that a significant portion of result variability is explained by train-test overlap and pre-training exposure rather than by model capability. Additionally, we find that some languages, particularly extinct languages and indigenous languages written in non-Latin scripts, suffer from poor tokenization coverage (high token fertility). This highlights a fundamental limitation of transferring models from high-resource languages when the languages share no vocabulary. By reporting these indices alongside performance scores, we enable more transparent evaluation of cross-lingual transfer and provide a more reliable foundation for the XLR MT community.
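The abstract names the FRED indices but does not give their formulas. Purely as an illustration, the Python sketch below computes simple stand-ins for three of them: average tokens per whitespace word (the token fertility that F builds on), the mean best-match character-trigram Jaccard overlap between each test sentence and the training set (one possible proxy for the train-test overlap R targets), and a word-level type-token ratio (one possible diversity measure for D). These formulas, all function names, and the fixed-width `toy_tokenize` stand-in for a real subword tokenizer are hypothetical assumptions, not the paper's definitions. Pre-training Exposure (E) depends on a model's training data and is not sketched here.

```python
"""Hypothetical stand-ins for three FRED-style indices (NOT the paper's definitions)."""
from typing import Callable, Iterable, List, Sequence, Set


def toy_tokenize(text: str, max_piece: int = 3) -> List[str]:
    """Hypothetical subword tokenizer: cuts each word into fixed-width chunks."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + max_piece] for i in range(0, len(word), max_piece))
    return pieces


def token_fertility(sentences: Iterable[str],
                    tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-delimited word."""
    n_words = n_tokens = 0
    for s in sentences:
        n_words += len(s.split())
        n_tokens += len(tokenize(s))
    return n_tokens / n_words if n_words else 0.0


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams, with a space padded onto both ends."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection over union of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieval_overlap(train: Sequence[str], test: Sequence[str], n: int = 3) -> float:
    """Mean, over test sentences, of the best Jaccard match against training.

    1.0: every test sentence has an identical n-gram set in training.
    0.0: no test sentence shares any n-gram with any training sentence.
    """
    train_grams = [char_ngrams(t, n) for t in train]
    scores: List[float] = []
    for s in test:
        grams = char_ngrams(s, n)
        scores.append(max((jaccard(grams, g) for g in train_grams), default=0.0))
    return sum(scores) / len(scores) if scores else 0.0


def type_token_ratio(sentences: Iterable[str]) -> float:
    """Distinct words divided by total words; higher means a more varied corpus."""
    words = [w for s in sentences for w in s.split()]
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    train = ["the cat sat", "a dog ran"]
    test = ["the cat sat", "birds fly high"]
    print(retrieval_overlap(train, test))  # 0.5: one exact copy, one unseen sentence
    print(type_token_ratio(["the cat sat", "the dog sat"]))
    print(token_fertility(["translation"], toy_tokenize))  # 4.0
```

In the toy run, one test sentence appears verbatim in training and the other shares no trigrams with it, so the overlap proxy lands at 0.5. Under these assumptions, a high value would hint that part of a reported score reflects recitation of training data rather than translation; any real tokenizer can be passed to `token_fertility` as long as it maps a string to a list of tokens.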

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:2603.25222 [cs.CL] (or arXiv:2603.25222v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25222

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Danlu Chen
[v1] Thu, 26 Mar 2026 09:20:17 UTC (6,934 KB)
