Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessChinese AI rivals clash over Anthropic’s OpenClaw exit amid global token crunchSCMP Tech (Asia AI)India turns to Iran for oil and gas after 7-year hiatus, signaling limits to U.S. tiltCNBC TechnologyAirAsia X hikes ticket prices by 40%, cut capacity by 10% as Iran war hits fuel costsSCMP Tech (Asia AI)YouTube blokkeert Nvidia s DLSS 5-video na auteursclaim Italiaanse tv-zenderTweakers.netWhat are the differences between pipelines and models in Hugging Face?discuss.huggingface.coAI Mastery Course in Telugu: Hands-On Training with Real ProjectsDev.to AIHow I'm Running Autonomous AI Agents That Actually Earn USDCDev.to AIUnderstanding NLP Token Classification: From Basics to Real-World ApplicationsMedium AIGPT-5.4 Scored 75% on a Test That Measures Real Human Work. My Data Team Scored 72%.Medium AIBizNode Workflow Marketplace: chain multiple bot handles into multi-step pipelines. Client onboarding, contract-to-payment,...Dev.to AITop Artificial Intelligence Development Companies in Dubai, UAE (2026 Edition)Medium AIЯ потратил месяц на AI-инструменты и удалил половину из нихDev.to AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessChinese AI rivals clash over Anthropic’s OpenClaw exit amid global token crunchSCMP Tech (Asia AI)India turns to Iran for oil and gas after 7-year hiatus, signaling limits to U.S. tiltCNBC TechnologyAirAsia X hikes ticket prices by 40%, cut capacity by 10% as Iran war hits fuel costsSCMP Tech (Asia AI)YouTube blokkeert Nvidia s DLSS 5-video na auteursclaim Italiaanse tv-zenderTweakers.netWhat are the differences between pipelines and models in Hugging Face?discuss.huggingface.coAI Mastery Course in Telugu: Hands-On Training with Real ProjectsDev.to AIHow I'm Running Autonomous AI Agents That Actually Earn USDCDev.to AIUnderstanding NLP Token Classification: From Basics to Real-World ApplicationsMedium AIGPT-5.4 Scored 75% on a Test That Measures Real Human Work. My Data Team Scored 72%.Medium AIBizNode Workflow Marketplace: chain multiple bot handles into multi-step pipelines. Client onboarding, contract-to-payment,...Dev.to AITop Artificial Intelligence Development Companies in Dubai, UAE (2026 Edition)Medium AIЯ потратил месяц на AI-инструменты и удалил половину из нихDev.to AI
AI NEWS HUBbyEIGENVECTOREigenvector

IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2603.26095v1 Announce Type: new Abstract: Determining whether a piece of text is relevant to a given topic is a fundamental task in natural language processing, yet it remains largely unexplored for Bahasa Indonesia. Unlike sentiment analysis or named entity recognition, relevancy classification requires the model to reason about the relationship between two inputs simultaneously: a topical context and a candidate text. We introduce IndoBERT-Relevancy, a context-conditioned relevancy classifier built on IndoBERT Large (335M parameters) and trained on a novel dataset of 31,360 labeled pai — Muhammad Apriandito Arya Saputra, Andry Alamsyah, Dian Puteri Ramadhani, Thomhert Suprapto Siadari, Hanif Fakhrurroja

View PDF HTML (experimental)

Abstract:Determining whether a piece of text is relevant to a given topic is a fundamental task in natural language processing, yet it remains largely unexplored for Bahasa Indonesia. Unlike sentiment analysis or named entity recognition, relevancy classification requires the model to reason about the relationship between two inputs simultaneously: a topical context and a candidate text. We introduce IndoBERT-Relevancy, a context-conditioned relevancy classifier built on IndoBERT Large (335M parameters) and trained on a novel dataset of 31,360 labeled pairs spanning 188 topics. Through an iterative, failure-driven data construction process, we demonstrate that no single data source is sufficient for robust relevancy classification, and that targeted synthetic data can effectively address specific model weaknesses. Our final model achieves an F1 score of 0.948 and an accuracy of 96.5%, handling both formal and informal Indonesian text. The model is publicly available at HuggingFace.

Comments: 9 pages, 3 figures,6 tables

Subjects:

Computation and Language (cs.CL)

MSC classes: 68T50

ACM classes: I.2.7; H.3.3

Cite as: arXiv:2603.26095 [cs.CL]

(or arXiv:2603.26095v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26095

arXiv-issued DOI via DataCite

Submission history

From: Andry Alamsyah [view email] [v1] Fri, 27 Mar 2026 05:58:15 UTC (10 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
IndoBERT-Re…researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 313 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers