
Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text

arXiv cs.CL · Submitted on 2 Apr 2026 · April 4, 2026


Abstract: Toxic content detection in online communication remains a significant challenge, with current solutions often inadvertently blocking valuable information, including medical terms and text related to minority groups. This paper presents a more nuanced approach to identifying toxicity in Bulgarian text while preserving access to essential information. The research explores two distinct methodologies for detecting toxic content. The developed methodologies have potential applications across diverse online platforms and content moderation systems. First, we propose an ontology that models the potentially toxic words in the Bulgarian language. Then, we compose a dataset that comprises 4,384 manually annotated sentences from Bulgarian online forums across four categories: toxic language, medical terminology, non-toxic language, and terms related to minority communities. We then train a BERT-based model for toxic language classification, which reaches a macro F1 score of 0.89. The trained model is directly applicable in a real environment and can be integrated as a component of toxic content detection systems.
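The abstract reports a macro F1 score over four categories. As a rough illustration of what that metric measures, the sketch below computes an unweighted mean of per-class F1 scores in plain Python. The short label strings, the toy gold and predicted labels, and the zero-score convention for absent classes are all assumptions made for this example; none of them come from the paper, and this is not the authors' evaluation code.

```python
from collections import Counter

# The four categories named in the abstract; the short label strings are our own.
LABELS = ["toxic", "medical", "non_toxic", "minority"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumed convention: a class with no gold or predicted instances scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy labels, not data from the paper.
gold = ["toxic", "toxic", "medical", "non_toxic", "minority", "minority"]
pred = ["toxic", "non_toxic", "medical", "non_toxic", "minority", "toxic"]
print(round(macro_f1(gold, pred), 3))
```

Because every class contributes equally to the average, a small category such as medical terminology weighs as much as a large one, which is one reason macro averaging is a common choice when over-blocking a minority class is a concern.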

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.01745 [cs.CL] (or arXiv:2604.01745v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.01745

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Melania Berbatova
[v1] Thu, 2 Apr 2026 08:06:26 UTC (600 KB)
