Authorship Impersonation via LLM Prompting Does Not Evade Authorship Verification Methods
Abstract: Authorship verification (AV), the task of determining whether a questioned text was written by a specific individual, is a critical part of forensic linguistics. While manual authorial impersonation by perpetrators has long been a recognized threat in historical forensic cases, recent advances in large language models (LLMs) raise new challenges, as adversaries may exploit these tools to impersonate another's writing. This study investigates whether prompted LLMs can generate convincing authorial impersonations and whether such outputs can evade existing forensic AV systems. Using GPT-4o as the adversary model, we generated impersonation texts under four prompting conditions across three genres: emails, text messages, and social media posts. We then evaluated these outputs against both non-neural AV methods (n-gram tracing, Ranking-Based Impostors Method, LambdaG) and neural approaches (AdHominem, LUAR, STAR) within a likelihood-ratio framework. Results show that LLM-generated texts failed to sufficiently replicate authorial individuality to bypass established AV systems. We also observed that some methods achieved even higher accuracy when rejecting impersonation texts compared to genuine negative samples. Overall, these findings indicate that, despite the accessibility of LLMs, current AV systems remain robust against entry-level impersonation attempts across multiple genres. Furthermore, we demonstrate that this counter-intuitive resilience stems, at least in part, from the higher lexical diversity and entropy inherent in LLM-generated texts.
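Two technical ideas in the abstract can be sketched compactly. First, the likelihood-ratio framework: an AV method's raw similarity score is converted into the ratio of how probable that score is under the same-author hypothesis versus the different-author hypothesis, and scores with LR below 1 favor rejection. The Gaussian score models and all parameter values below are invented purely for illustration; real forensic systems calibrate these distributions on reference data rather than fixing them by hand.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of a normal distribution; used here as a toy calibration model.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mu=0.8, same_sigma=0.1, diff_mu=0.3, diff_sigma=0.15):
    # LR = p(score | same author) / p(score | different author).
    # All four distribution parameters are hypothetical placeholders.
    return gaussian_pdf(score, same_mu, same_sigma) / gaussian_pdf(score, diff_mu, diff_sigma)

score = 0.45  # hypothetical similarity between a questioned text and known writings
lr = likelihood_ratio(score)
print(f"LR = {lr:.4f} -> {'supports same author' if lr > 1 else 'supports different author'}")
```

Second, the explanatory finding: LLM-generated text tends to show higher lexical diversity and entropy than genuine writing, which makes impersonations easier to reject. Type-token ratio and unigram Shannon entropy, computed below on two toy strings, are standard proxies for these properties; the abstract does not specify the paper's exact measures, so this is a minimal sketch under that assumption.

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    # Lexical diversity: distinct word types divided by total tokens.
    return len(set(tokens)) / len(tokens)

def shannon_entropy(tokens):
    # Shannon entropy (bits) of the unigram distribution over tokens.
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

genuine = "see you at five see you there ok ok see you".split()      # repetitive, human-like
generated = "certainly the meeting convenes promptly at five this evening".split()  # varied, LLM-like

for name, toks in [("genuine", genuine), ("generated", generated)]:
    print(f"{name}: TTR={type_token_ratio(toks):.2f}, entropy={shannon_entropy(toks):.2f} bits")
```

On these toy inputs the "generated" string scores higher on both measures, mirroring the direction of the effect the paper reports.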
Comments: 11 pages, 3 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.29454 [cs.CL]
(or arXiv:2603.29454v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29454
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Baoyi Zeng [v1] Tue, 31 Mar 2026 08:59:09 UTC (83 KB)