A Catalog of Basque Dialectal Resources: Online Collections and Standard-to-Dialectal Adaptations

arXiv, March 26, 2026

Authors: Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri

Abstract: Recent research on dialectal NLP has identified data scarcity as a primary limitation. To address it, this paper presents a catalog of contemporary Basque dialectal data and resources, a systematic and comprehensive compilation of the dialectal data currently available in Basque. The catalog distinguishes two types of data sources: online data originally written in a dialect, and standard-to-dialect adapted data. The former includes all dialectal data that can be found online, such as news and radio sites and informal tweets, as well as online resources such as dictionaries, atlases, grammar rules, and videos. The latter consists of data that has been adapted from the standard variety to dialectal varieties, either manually or automatically. For manual adaptation, the test split of the XNLI Natural Language Inference dataset was adapted into three Basque dialects (Western, Central, and Navarrese-Lapurdian), yielding a high-quality parallel gold-standard evaluation dataset. For automatic adaptation, the automatically adapted physical commonsense dataset (BasPhyCowest) underwent additional manual evaluation by native speakers to assess its quality and to determine whether automatic adaptation could serve as a viable substitute for full manual adaptation (i.e., silver data creation).
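The two-way split described in the abstract (online dialectal data versus standard-to-dialect adapted data, the latter either manual or automatic) can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names, and the placeholder online entry are assumptions made here, not the schema or contents of the actual catalog, and the dialect of BasPhyCowest is left unset because the abstract does not state it.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(Enum):
    """The two source types distinguished in the abstract."""
    ONLINE = "online data originally written in a dialect"
    ADAPTED = "standard-to-dialect adapted data"


class Adaptation(Enum):
    """How adapted data was produced (NONE for online data)."""
    NONE = "none"
    MANUAL = "manual"
    AUTOMATIC = "automatic"


@dataclass(frozen=True)
class CatalogEntry:
    # All field names are hypothetical; the paper's own schema may differ.
    name: str
    dialect: Optional[str]  # None when the abstract does not state it
    source_type: SourceType
    adaptation: Adaptation = Adaptation.NONE


def group_by_source_type(entries):
    """Map each source type to the names of its entries."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.source_type].append(entry.name)
    return dict(grouped)


def manually_adapted_dialects(entries):
    """Sorted dialects that have a manually adapted resource."""
    return sorted(
        {e.dialect for e in entries
         if e.adaptation is Adaptation.MANUAL and e.dialect is not None}
    )


entries = [
    # The three manual XNLI adaptations named in the abstract.
    CatalogEntry("XNLI test split", "Western",
                 SourceType.ADAPTED, Adaptation.MANUAL),
    CatalogEntry("XNLI test split", "Central",
                 SourceType.ADAPTED, Adaptation.MANUAL),
    CatalogEntry("XNLI test split", "Navarrese-Lapurdian",
                 SourceType.ADAPTED, Adaptation.MANUAL),
    # Automatically adapted dataset; dialect not stated in the abstract.
    CatalogEntry("BasPhyCowest", None,
                 SourceType.ADAPTED, Adaptation.AUTOMATIC),
    # Placeholder only: stands in for any online dialectal source.
    CatalogEntry("placeholder: dialectal news site", None,
                 SourceType.ONLINE),
]

grouped = group_by_source_type(entries)
dialects = manually_adapted_dialects(entries)
```

Keeping the adaptation method separate from the source type would make it straightforward to select only gold (manual) or only silver (automatic) resources when assembling an evaluation set.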

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25189 [cs.CL]

(or arXiv:2603.25189v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25189

arXiv-issued DOI via DataCite

Submission history

From: Jaione Bengoetxea
[v1] Thu, 26 Mar 2026 08:55:23 UTC (2,348 KB)
