
GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages

arXiv [Submitted on 14 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2)] March 31, 2026

Authors: Lawrence Adu Gyamfi, Paul Azunre, Stephen Edward Moore, Joel Budu, Akwasi Asare, Mich-Seth Owusu, Jonathan Ofori Asiamah


Abstract: Low-resource languages present unique challenges for natural language processing due to the limited availability of digitized and well-structured linguistic data. To address this gap, the GhanaNLP initiative has developed and curated 41,513 parallel sentence pairs for the Twi, Fante, Ewe, Ga, and Kusaal languages, which are widely spoken across Ghana yet remain underrepresented in digital spaces. Each dataset consists of carefully aligned sentence pairs between a local language and English. The data were collected, translated, and annotated by human professionals and enriched with standard structural metadata to ensure consistency and usability. These corpora are designed to support research, educational, and commercial applications, including machine translation, speech technologies, and language preservation. This paper documents the dataset creation methodology, structure, intended use cases, and evaluation, as well as the corpora's deployment in real-world applications such as the Khaya AI translation engine. Overall, this work contributes to broader efforts to democratize AI by enabling inclusive and accessible language technologies for African languages.
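The abstract describes each dataset as aligned sentence pairs between a local language and English. As a rough illustration only, the sketch below shows one way such pairs might be represented and loaded in Python. The on-disk format is not given in the abstract, so the two-column tab-separated layout, the `SentencePair` class, the `read_pairs` function, the `"twi"` language label, and the placeholder sentences are all assumptions made for this example, not the GhanaNLP release format.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SentencePair:
    """One aligned pair (hypothetical structure, not the official schema)."""
    lang: str      # label for the local language, e.g. "twi" (assumed label)
    source: str    # sentence in the local language
    english: str   # aligned English sentence


def read_pairs(tsv_text: str, lang: str) -> List[SentencePair]:
    """Parse tab-separated 'local<TAB>english' lines, skipping malformed rows.

    The two-column TSV layout is an assumption for illustration.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) != 2:
            continue  # blank line or wrong number of columns
        source, english = (cell.strip() for cell in row)
        if source and english:
            pairs.append(SentencePair(lang, source, english))
    return pairs


# Placeholder text only; these are not sentences from the corpora.
sample = (
    "local sentence A\tEnglish sentence A\n"
    "\n"
    "malformed line without tab\n"
    "local sentence B\tEnglish sentence B\n"
)
pairs = read_pairs(sample, lang="twi")
print(len(pairs))
```

Skipping rows that do not have exactly two non-empty columns keeps misaligned lines out of downstream training data, which matters for parallel corpora where each pair is used as a translation example.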

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.13793 [cs.CL]

(or arXiv:2603.13793v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.13793

arXiv-issued DOI via DataCite

Submission history

From: Akwasi Asare
[v1] Sat, 14 Mar 2026 06:49:05 UTC (2,097 KB)
[v2] Mon, 30 Mar 2026 15:19:36 UTC (2,097 KB)
