OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations
Authors: Jens Albrecht, Robert Lehmann, Aleksandra Poltermann, Eric Rudolph, Philipp Steigerwald, Mara Stieler
Abstract: This paper presents OnCoCo 1.0, a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrative system of categories designed to improve the automated analysis of psychosocial online counseling conversations. Existing category systems, predominantly based on Motivational Interviewing (MI), are limited by their narrow focus and their dependence on datasets derived mainly from face-to-face counseling, which restricts the detailed examination of textual counseling conversations. In response, we developed a comprehensive new coding scheme that differentiates between 38 types of counselor utterances and 28 types of client utterances, and created a labeled dataset of about 2,800 messages from counseling conversations. We fine-tuned several models on our dataset to demonstrate its applicability. The data and models are publicly available to researchers and practitioners. Our work thus contributes a new type of fine-grained conversational resource to the language resources community, extending existing datasets for social and mental-health dialogue analysis.
Comments: Accepted at SoCon-NLPSI@LREC 2026
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2512.09804 [cs.CL]
(or arXiv:2512.09804v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2512.09804
arXiv-issued DOI via DataCite
Submission history
From: Jens Albrecht
[v1] Wed, 10 Dec 2025 16:18:20 UTC (69 KB)
[v2] Sun, 29 Mar 2026 13:07:02 UTC (71 KB)