Live
Black Hat USADark ReadingBlack Hat AsiaAI Businessv1.83.0-nightlyLiteLLM ReleasesShow HN: Tama96 – A virtual pet for your desktop, terminal, or AI agentHacker News AI TopWhy Your AI Agent Shouldn't Define WordsHacker News AI TopCaltech Researchers Claim Compression of High-Fidelity AI ModelsHacker News AI Topb8601llama.cpp ReleasesCafé, e o prompt principal para gerar as ilustrações — Temperança DigitalMedium AIAustin-based Saronic, which builds military autonomous ships, raised a $1.75B Series D led by Kleiner Perkins at a $9.25B valuation, up from $4B in Feb. 2025 (Samantha Subin/CNBC)TechmemeWe Don’t Have a Memory Problem. We Have a Knowledge Problem.Medium AII Use AI to Prepare for Every Oral Exam. Here’s Exactly How.Medium AIQuantum Machine Learning Gains Vital Reliability Checks For Data Mapping - Quantum ZeitgeistGoogle News: Machine Learningb8600llama.cpp ReleasesFalse Flags Are Killing Writers— Here’s How to Avoid Them in 2026Medium AIBlack Hat USADark ReadingBlack Hat AsiaAI Businessv1.83.0-nightlyLiteLLM ReleasesShow HN: Tama96 – A virtual pet for your desktop, terminal, or AI agentHacker News AI TopWhy Your AI Agent Shouldn't Define WordsHacker News AI TopCaltech Researchers Claim Compression of High-Fidelity AI ModelsHacker News AI Topb8601llama.cpp ReleasesCafé, e o prompt principal para gerar as ilustrações — Temperança DigitalMedium AIAustin-based Saronic, which builds military autonomous ships, raised a $1.75B Series D led by Kleiner Perkins at a $9.25B valuation, up from $4B in Feb. 2025 (Samantha Subin/CNBC)TechmemeWe Don’t Have a Memory Problem. We Have a Knowledge Problem.Medium AII Use AI to Prepare for Every Oral Exam. Here’s Exactly How.Medium AIQuantum Machine Learning Gains Vital Reliability Checks For Data Mapping - Quantum ZeitgeistGoogle News: Machine Learningb8600llama.cpp ReleasesFalse Flags Are Killing Writers— Here’s How to Avoid Them in 2026Medium AI

Don't Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2502.13592v2 Announce Type: replace Abstract: Written Multi-Party Conversations (WMPCs) are widely studied across disciplines, with social media as a primary data source due to their accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited due to rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., one-to-one "reply-to" links). This work explores the feasibility of generating synthetic WMPCs with instruction- — Nicol\`o Penzo, Marco Guerini, Bruno Lepri, Goran Glava\v{s}, Sara Tonelli

View PDF HTML (experimental)

Abstract:Written Multi-Party Conversations (WMPCs) are widely studied across disciplines, with social media as a primary data source due to their accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited due to rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., one-to-one "reply-to" links). This work explores the feasibility of generating synthetic WMPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as dialogue structure and participants' stance. We investigate two complementary strategies of leveraging LLMs in this context: (i.) LLMs as WMPC generators, where we task the LLM to generate a whole WMPC at once and (ii.) LLMs as WMPC parties, where the LLM generates one turn of the conversation at a time (made of speaker, addressee and message), provided the conversation history. We next introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the level of obtained WMPCs via human and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some being able to generate high-quality WMPCs. We also find that turn-by-turn generation yields better conformance to constraints and higher linguistic variability than generating WMPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality WMPCs.

Comments: Accepted at AAAI2026

Subjects:

Computation and Language (cs.CL)

Cite as: arXiv:2502.13592 [cs.CL]

(or arXiv:2502.13592v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2502.13592

arXiv-issued DOI via DataCite

Related DOI:

https://doi.org/10.1609/aaai.v40i39.40548

DOI(s) linking to related resources

Submission history

From: Nicolò Penzo [view email] [v1] Wed, 19 Feb 2025 10:10:43 UTC (1,028 KB) [v2] Fri, 27 Mar 2026 12:43:18 UTC (1,211 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Don't Stop …researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 147 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers