Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, and LLaMA
arXiv:2512.12812v2 Announce Type: replace-cross Abstract: Prompt engineering has emerged as a critical factor influencing large language model (LLM) performance, yet the impact of pragmatic elements such as linguistic tone and politeness remains underexplored, particularly across different model families. In this work, we propose a systematic evaluation framework to examine how interaction tone affects model accuracy and apply it to three recently released and widely available LLMs: GPT-4o mini (OpenAI), Gemini 2.0 Flash (Google DeepMind), and Llama 4 Scout (Meta). Using the MMMLU benchmark, w — Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
View PDF HTML (experimental)
Abstract:Prompt engineering has emerged as a critical factor influencing large language model (LLM) performance, yet the impact of pragmatic elements such as linguistic tone and politeness remains underexplored, particularly across different model families. In this work, we propose a systematic evaluation framework to examine how interaction tone affects model accuracy and apply it to three recently released and widely available LLMs: GPT-4o mini (OpenAI), Gemini 2.0 Flash (Google DeepMind), and Llama 4 Scout (Meta). Using the MMMLU benchmark, we evaluate model performance under Very Polite, Neutral, and Very Rude prompt variants across six tasks spanning STEM and Humanities domains, and analyze pairwise accuracy differences with statistical significance testing. Our results show that tone sensitivity is both model-dependent and domain-specific. Neutral or Very Polite prompts generally yield higher accuracy than Very Rude prompts, but statistically significant effects appear only in a subset of Humanities tasks, where rude tone reduces accuracy for GPT and Llama, while Gemini remains comparatively tone-insensitive. When performance is aggregated across tasks within each domain, tone effects diminish and largely lose statistical significance. Compared with earlier research, these findings suggest that dataset scale and coverage materially influence the detection of tone effects. Overall, our study indicates that while interaction tone can matter in specific interpretive settings, modern LLMs are broadly robust to tonal variation in typical mixed-domain use, providing practical guidance for prompt design and model selection in real-world deployments.
Subjects:
Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2512.12812 [cs.CL]
(or arXiv:2512.12812v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2512.12812
arXiv-issued DOI via DataCite
Submission history
From: Hanyu Cai [view email] [v1] Sun, 14 Dec 2025 19:25:20 UTC (83 KB) [v2] Fri, 27 Mar 2026 20:35:20 UTC (83 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivAI models will secretly scheme to protect other AI models from being shut down, researchers find - Fortune
<a href="https://news.google.com/rss/articles/CBMixgFBVV95cUxPdDVrRUpkN1RRQU91SDJYYzVzejV4b1JoTWdwVEZVamltZHdKaGtfS3FNQlMyWVdmS2NqRi1pUHJWbG9KX1ZkUmFPeEllc0Q1SjlPdnVPMHRYTXE2S2EtbThEM1lncnVac01Wc2N2V0NGelIwUVFWUTFtdGRxMGpSby11QWNEcHlqcF96QWhuYWQ0YWFuWDBhWGFqSDNFRVNGc19uNzJnUHR4X0VxQzdZTDhUNjg2Y3pOWWw2QjUweFc0djFUSFE?oc=5" target="_blank">AI models will secretly scheme to protect other AI models from being shut down, researchers find</a> <font color="#6f6f6f">Fortune</font>
I Built a Social Post Engine to Escape the Canva-Export-Schedule Loop
<p>As a solo founder running WahResume.com, I was spending way too much time on social media - not on creativity, but on process.<br> Same templates. Same brand assets. Same hashtags. Every post meant opening Canva, exporting, uploading, scheduling… and repeating it the next day.</p> <p>So I built something to fix that.</p> <p>Social Post Engine is a small tool that helps me stay consistent on social media without having to touch Canva or an endless queue of schedulers.</p> <p>Here’s what it does:</p> <p>✅ Seed & review topics in one command — it researches, outlines, and preps your next posts.<br> ✅ Pre-generates branded images from templates (checklists, stat cards, charts, comparisons). It also writes captions in your brand’s voice using AI.<br> ✅ Publishes automatically to LinkedIn
Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
<a href="https://news.google.com/rss/articles/CBMiuANBVV95cUxOQU9Xc09YTnZwb0Myb3VQMDk5MjVGeE50aEUzbkhWdW1OcUltMGMtQXZwYkN0R2l4ZTloTU1scUNkdTA0cHgwdG9LS2lYazk4dWxMLXJuU0liZnN1S2c2RmszV054VUJCMXhrZjFuQmtUQjk0aGU1M1V2RVpfQ0d3amhYMF92dzFhWGkzelFKd2VhaGJDOV9uOXBfZFpkc3A5N3JnT2dNSzBTMGE3Q0pfdzJrbTI4ZmY4S2dYOG1uaTl4UTZoMFFadE54cHlxUk03ZFgwZm1qV2ZnazJTcFNnX2dMN19xMHZtTHB5QmpQeDFKRDljNi1BX01vc2hkQV9rWHpYNk9oSXVpR1pWS2VQVjVIOHhlVHFqalJRZGZTWXd0VjhfMXFhQ3RXLWdNaS03cDYxMDYxWmlCUEg5MjVzNWg4RGVWVks3b3BLSWpOUXpBU255NDBMRFhzd2lwNTBmOGRHVlFXaGRsR3VaaFZJOU9VTDlXZTQ5V3JVcGRSWG13amZLWjNaVm5RejRTT1NlNTFxOXozWEh3eFd2UlByNUJFWHQtSGJUdS1fdy1UM1ZvcUs4ZGlHeQ?oc=5" target="_blank">Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models</a> <font color="#6f6f6f">WSJ</font>
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Researchers to use robotics and AI to help sheep producers - University of Nevada, Reno
<a href="https://news.google.com/rss/articles/CBMic0FVX3lxTFB4UmxpREpFODBJN0lKakYwRVVtdlZPNmNiTExRelVFaDYzYW9kX2RCc0pEZjlmX01fT1dWYTlxZE1ET2ZKVVgzSVZIenY3bDlHa3FXS1dUdVBmTEdLa1hUR2x3OWxHbkE2RnROSjl6VHVHQ2c?oc=5" target="_blank">Researchers to use robotics and AI to help sheep producers</a> <font color="#6f6f6f">University of Nevada, Reno</font>
AIRA_2: Breaking Bottlenecks In AI Research Agents - Forbes
<a href="https://news.google.com/rss/articles/CBMiowFBVV95cUxNNmtndHhmQ2lpZGdPdTJwY25xejcyV1c1SWNLdWFOWnNwbjRUQTF0ZWdOZFNaclNBNWVsaUgtU0JUM2xrakhoOXVLMVJzVTNkajdrMmJGeS1lYUpMUG1NMkZNMDJFREZZdXU2ZVdEbkNZSDNBRjJBLVYyZE9XeEY4T0RJY3J5aDVWcEZVQ2lWUjhUYXBsUk16d09NdGdsQ3lxb3gw?oc=5" target="_blank">AIRA_2: Breaking Bottlenecks In AI Research Agents</a> <font color="#6f6f6f">Forbes</font>
Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet - simplywall.st
<a href="https://news.google.com/rss/articles/CBMivwFBVV95cUxQNWpZb2ZQVDBIOGVZTTBtLThzaGwxS3NkMnJBSS1wek5pQlJXRWdTOEh5aTdPTE9Cd3JHdjZDeWRtVzdMUUdESHJOQXZDdGNVdGZtTTBhanpfb3UxQnRobVlzNGdVUXJLZWptV2V6NXlNSWllX3FxOU5XYTF0RkM2TnJIaFJkcVBFOGc2alBSLTZEeU85QU1oTjBrMVZSTl84dm9GeFl5OGtUMjc3LVd1dS1fcHZ1RG9HcV82T2JFWdIBxAFBVV95cUxOSE5XVXh0QkM4Yi1WbXNhWkJ2Z2dLRlBGNjAwaTcyNFJWMWRPdXo5WjRQQkRGTG9IamxxbmdhMHpsaEJ6RDQwZl9ENGl5WDc5a2lrTXZ1bVpFbGdsdndHYjFINnZPSnNKX1dZamszUXByR1BlRXF6d1pKOHpBU3M5UFhUSldlUWtIMlRNQzdvTk9haEJKeDI1ZEg0WWQ1SXYzLUZCWElQc3pzR19ucGExdVpnc2hBQXlQNVpOZFVBVzRkLXFE?oc=5" target="_blank">Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet</a> <font color="#6f6f6f">simplywall.st</font>


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!