
LLM Probe: Evaluating LLMs for Low-Resource Languages

arXiv cs.CL · Hailay Kidu Teklehaymanot, Gebrearegawi Gebremariam, Wolfgang Nejdl · April 1, 2026 · 2 min read


Abstract: Despite rapid advances in large language models (LLMs), their linguistic abilities in low-resource and morphologically rich languages are still not well understood due to limited annotated resources and the absence of standardized evaluation frameworks. This paper presents LLM Probe, a lexicon-based assessment framework designed to systematically evaluate the linguistic skills of LLMs in low-resource language environments. The framework analyzes models across four areas of language understanding: lexical alignment, part-of-speech recognition, morphosyntactic probing, and translation accuracy. To illustrate the framework, we create a manually annotated benchmark dataset using a low-resource Semitic language as a case study. The dataset comprises bilingual lexicons with linguistic annotations, including part-of-speech tags, grammatical gender, and morphosyntactic features, which demonstrate high inter-annotator agreement to ensure reliable annotations. We test a variety of models, including causal language models and sequence-to-sequence architectures. The results reveal notable differences in performance across various linguistic tasks: sequence-to-sequence models generally excel in morphosyntactic analysis and translation quality, whereas causal models demonstrate strong performance in lexical alignment but exhibit weaker translation accuracy. Our results emphasize the need for linguistically grounded evaluation to better understand LLM limitations in low-resource settings. We release LLM Probe and the accompanying benchmark dataset as open-source tools to promote reproducible benchmarking and to support the development of more inclusive multilingual language technologies.
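To make the evaluation recipe concrete, here is a minimal sketch of the two scoring ingredients the abstract describes: inter-annotator agreement on the lexicon annotations (Cohen's kappa, computed from scratch) and per-task accuracy against adjudicated gold labels. All names and the toy data are illustrative assumptions, not taken from the actual LLM Probe release.

```python
# Hypothetical sketch: lexicon-based probing scores as described in the
# abstract. The data and function names are illustrative, not from LLM Probe.
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

def task_accuracy(predictions, gold):
    """Accuracy for one probing task, e.g. POS recognition or alignment."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy POS annotations for five lexicon entries by two annotators.
ann1 = ["NOUN", "VERB", "NOUN", "ADJ", "VERB"]
ann2 = ["NOUN", "VERB", "NOUN", "NOUN", "VERB"]
kappa = cohen_kappa(ann1, ann2)   # agreement on the annotation layer

# Model predictions scored against the adjudicated gold tags.
gold  = ["NOUN", "VERB", "NOUN", "ADJ", "VERB"]
preds = ["NOUN", "VERB", "ADJ",  "ADJ", "VERB"]
acc = task_accuracy(preds, gold)  # one entry in the per-task results table
```

In a framework like the one described, this pair of scores would be computed once per task (lexical alignment, POS recognition, morphosyntactic probing, translation) and per model, yielding the cross-task comparison the abstract reports.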

Comments: 11 pages, 6 tables

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.29517 [cs.CL]

(or arXiv:2603.29517v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.29517

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hailay Kidu Teklehaymanot [v1] Tue, 31 Mar 2026 10:03:38 UTC (104 KB)


