
L-ReLF: A Framework for Lexical Dataset Creation

arXiv cs.CL | by Anass Sedrati, Mounir Afifi, Reda Benkhadra | April 1, 2026

Abstract: This paper introduces the L-ReLF (Low-Resource Lexical Framework), a novel, reproducible methodology for creating high-quality, structured lexical datasets for underserved languages. The lack of standardized terminology, exemplified by Moroccan Darija, poses a critical barrier to knowledge equity in platforms like Wikipedia, often forcing editors to rely on inconsistent, ad-hoc methods to create new words in their language. Our research details the technical pipeline developed to overcome these challenges. We systematically address the difficulties of working with low-resource data, including source identification, utilizing Optical Character Recognition (OCR) despite its bias towards Modern Standard Arabic, and rigorous post-processing to correct errors and standardize the data model. The resulting structured dataset is fully compatible with Wikidata Lexemes, serving as a vital technical resource. The L-ReLF methodology is designed for generalizability, offering other language communities a clear path to build foundational lexical data for downstream NLP applications, such as Machine Translation and morphological analysis.
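
The post-processing and data-model steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the cleanup rules (Unicode NFKC folding, tatweel removal, whitespace collapsing, deduplication) are assumed examples of OCR cleanup, and the language and lexical-category item IDs are placeholders to be supplied by the caller. The record shape only loosely mirrors the top-level fields of a Wikidata lexeme (lemmas, language, lexicalCategory), and "ary" is the ISO 639-3 code for Moroccan Arabic.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, often picked up by OCR
_WHITESPACE = re.compile(r"\s+")


def clean_ocr_token(raw: str) -> str:
    """Normalize one OCR'd term (assumed cleanup rules, not the paper's)."""
    # NFKC folds Arabic presentation forms into their base letters.
    text = unicodedata.normalize("NFKC", raw)
    text = text.replace(TATWEEL, "")
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()


def to_lexeme_record(lemma: str, language_qid: str, category_qid: str,
                     lang_code: str = "ary") -> dict:
    """Build a dict loosely shaped like a Wikidata lexeme (hypothetical helper)."""
    return {
        "type": "lexeme",
        "language": language_qid,          # placeholder item ID from the caller
        "lexicalCategory": category_qid,   # placeholder item ID from the caller
        "lemmas": {lang_code: {"language": lang_code, "value": lemma}},
    }


def build_records(raw_terms, language_qid: str, category_qid: str) -> list:
    """Clean each term, skip empties and duplicates, and emit lexeme-like records."""
    seen = set()
    records = []
    for raw in raw_terms:
        lemma = clean_ocr_token(raw)
        if not lemma or lemma in seen:
            continue
        seen.add(lemma)
        records.append(to_lexeme_record(lemma, language_qid, category_qid))
    return records
```

Calling build_records on a list of raw OCR strings with two placeholder item IDs returns one record per distinct cleaned lemma. Real lexemes would also carry forms and senses, which this sketch leaves out.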

Comments: Accepted to the 2026 International Conference on Natural Language Processing (ICNLP). 6 pages, 1 figure

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.29346 [cs.CL] (or arXiv:2603.29346v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.29346

arXiv-issued DOI via DataCite (pending registration)

Journal reference: Proceedings of the 2026 International Conference on Natural Language Processing (ICNLP)

Submission history

From: Anass Sedrati
[v1] Tue, 31 Mar 2026 07:19:00 UTC (330 KB)
