Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text
Abstract: Toxic content detection in online communication remains a significant challenge, with current solutions often inadvertently blocking valuable information, including medical terms and text related to minority groups. This paper presents a more nuanced approach to identifying toxicity in Bulgarian text while preserving access to essential information. The research explores two distinct methodologies for detecting toxic content, with potential applications across diverse online platforms and content moderation systems. First, we propose an ontology that models potentially toxic words in the Bulgarian language. Then, we compose a dataset of 4,384 manually annotated sentences from Bulgarian online forums across four categories: toxic language, medical terminology, non-toxic language, and terms related to minority communities. We then train a BERT-based model for toxic language classification, which reaches a 0.89 macro F1 score. The trained model is directly applicable in a real environment and can be integrated as a component of toxic content detection systems.
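As a concrete illustration of the second methodology, the sketch below shows how a BERT-style encoder could be fine-tuned for the paper's four-way sentence classification task and scored with macro F1 (the abstract's headline metric, 0.89). The abstract does not name the checkpoint, hyperparameters, label encoding, or data format, so the bert-base-multilingual-cased checkpoint, the label names, and the toy sentences below are illustrative assumptions, not the authors' setup.

# A minimal sketch, assuming a HuggingFace-style fine-tuning pipeline.
# Checkpoint, labels, and example data are placeholders; the real corpus
# is 4,384 manually annotated sentences from Bulgarian online forums.
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["toxic", "medical", "non_toxic", "minority_related"]  # assumed names

train_data = Dataset.from_dict({
    "text": ["Пример за изречение.", "Друг пример."],  # toy Bulgarian sentences
    "label": [2, 0],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Macro F1 averages per-class F1, so minority classes count equally.
    return {"f1_macro": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxic-bg", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_data,
    eval_dataset=train_data,  # placeholder; use a held-out split in practice
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())

Macro averaging is the natural choice here: the four categories are unlikely to be balanced in forum data, and the paper's stated goal of not over-blocking medical or minority-related text makes per-class performance matter as much as overall accuracy.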
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2604.01745 [cs.CL]
(or arXiv:2604.01745v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2604.01745
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Melania Berbatova
[v1] Thu, 2 Apr 2026 08:06:26 UTC (600 KB)