
Toward Culturally Grounded Natural Language Processing

arXiv · March 30, 2026

By Sina Bagheri Nezhad


Abstract: Recent progress in multilingual NLP is often taken as evidence of broader global inclusivity, but a growing literature shows that multilingual capability and cultural competence come apart. This paper synthesizes over 50 papers from 2020-2026 spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal local-knowledge modeling, benchmark design critiques, and community-grounded data practices. Across this literature, training data coverage remains a strong determinant of performance, yet it is not sufficient: tokenization, prompt language, translated benchmark design, culturally specific supervision, and multimodal context all materially affect outcomes. Recent work on Global-MMLU, CDEval, WorldValuesBench, CulturalBench, CULEMO, CulturalVQA, GIMMICK, DRISHTIKON, WorldCuisines, CARE, CLCA, and newer critiques of benchmark design and community-grounded evaluation shows that strong multilingual models can still flatten local norms, misread culturally grounded cues, and underperform in lower-resource or community-specific settings. We argue that the field should move from treating languages as isolated rows in a benchmark spreadsheet toward modeling communicative ecologies: the institutions, scripts, translation pipelines, domains, modalities, and communities through which language is used. On that basis, we propose a research agenda for culturally grounded NLP centered on richer contextual metadata, culturally stratified evaluation, participatory alignment, within-language variation, and multimodal community-aware design.

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.26013 [cs.CL]

(or arXiv:2603.26013v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26013

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Sina Bagheri Nezhad
[v1] Fri, 27 Mar 2026 02:08:32 UTC (59 KB)
