L-ReLF: A Framework for Lexical Dataset Creation
Abstract: This paper introduces the L-ReLF (Low-Resource Lexical Framework), a novel, reproducible methodology for creating high-quality, structured lexical datasets for underserved languages. The lack of standardized terminology, exemplified by Moroccan Darija, poses a critical barrier to knowledge equity on platforms like Wikipedia, often forcing editors to rely on inconsistent, ad-hoc methods to coin new words in their language. Our research details the technical pipeline developed to overcome these challenges. We systematically address the difficulties of working with low-resource data, including source identification, the use of Optical Character Recognition (OCR) despite its bias towards Modern Standard Arabic, and rigorous post-processing to correct errors and standardize the data model. The resulting structured dataset is fully compatible with Wikidata Lexemes, serving as a vital technical resource. The L-ReLF methodology is designed for generalizability, offering other language communities a clear path to build foundational lexical data for downstream NLP applications, such as Machine Translation and morphological analysis.
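The abstract does not spell out the post-processing rules or the exact output format, so the following is only a minimal sketch of what an OCR cleanup and Lexeme-shaping step could look like, not the authors' implementation. It assumes three cleanup rules (Unicode NFC normalization, removal of the Arabic tatweel/kashida U+0640, and removal of invisible characters such as zero-width spaces and directional marks), followed by deduplication and mapping into the key structure of the Wikidata Lexeme JSON model (`lemmas`, `language`, `lexicalCategory`, `senses` with `glosses`). The function names, the input row format, the use of `ary` (the ISO 639-3 code for Moroccan Arabic) as the lemma language code, and the Q-ID placeholders are all hypothetical.

```python
import json
import re
import unicodedata

# Assumed cleanup rules (hypothetical): invisible characters that often survive
# OCR of Arabic-script pages: zero-width space, LRM/RLM marks, and the BOM.
_INVISIBLES = dict.fromkeys(map(ord, "\u200b\u200e\u200f\ufeff"))
_TATWEEL = "\u0640"  # Arabic kashida, a purely typographic stretch


def normalize_ocr_text(text: str) -> str:
    """Return a cleaned copy of one OCR'd string under the assumed rules."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace(_TATWEEL, "").translate(_INVISIBLES)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def to_lexeme(lemma, gloss_en, lang_code, language_qid, category_qid):
    """Shape one entry after the Wikidata Lexeme JSON model.

    language_qid and category_qid are Wikidata item IDs that must be
    looked up; this sketch takes them as opaque strings.
    """
    return {
        "lemmas": {lang_code: {"language": lang_code, "value": lemma}},
        "language": language_qid,
        "lexicalCategory": category_qid,
        "senses": [{"glosses": {"en": {"language": "en", "value": gloss_en}}}],
    }


def build_lexemes(rows, lang_code, language_qid, category_qid):
    """Normalize raw (lemma, gloss) rows, then drop empties and duplicates."""
    seen = set()
    lexemes = []
    for raw_lemma, raw_gloss in rows:
        lemma = normalize_ocr_text(raw_lemma)
        gloss = normalize_ocr_text(raw_gloss)
        if not lemma or not gloss or (lemma, gloss) in seen:
            continue
        seen.add((lemma, gloss))
        lexemes.append(to_lexeme(lemma, gloss, lang_code, language_qid, category_qid))
    return lexemes


if __name__ == "__main__":
    # Hypothetical OCR rows: a stretched, RTL-marked lemma, its duplicate, and noise.
    rows = [
        ("  \u0643\u062a\u0640\u0627\u0628\u200f ", "book"),
        ("\u0643\u062a\u0627\u0628", "book"),
        ("", "noise"),
    ]
    out = build_lexemes(rows, "ary", "Q_LANGUAGE_PLACEHOLDER", "Q_NOUN_PLACEHOLDER")
    print(json.dumps(out, ensure_ascii=False, indent=2))
```

In this sketch the three hypothetical rows collapse to a single lexeme for كتاب ("book"): the tatweel and the right-to-left mark are stripped, the second row is then an exact duplicate, and the empty row is discarded. Keeping normalization separate from the Lexeme mapping lets the cleanup rules change, for example to handle OCR errors specific to Darija, without touching the output schema.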
Comments: Accepted to the 2026 International Conference on Natural Language Processing (ICNLP). 6 pages, 1 figure
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.29346 [cs.CL]
(or arXiv:2603.29346v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29346
arXiv-issued DOI via DataCite (pending registration)
Journal reference: Proceedings of the 2026 International Conference on Natural Language Processing (ICNLP)
Submission history
From: Anass Sedrati
[v1] Tue, 31 Mar 2026 07:19:00 UTC (330 KB)