
Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, and LLaMA

arXiv, March 31, 2026

arXiv:2512.12812v2 (Announce Type: replace-cross)

Authors: Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan


Abstract: Prompt engineering has emerged as a critical factor influencing large language model (LLM) performance, yet the impact of pragmatic elements such as linguistic tone and politeness remains underexplored, particularly across different model families. In this work, we propose a systematic evaluation framework to examine how interaction tone affects model accuracy and apply it to three recently released and widely available LLMs: GPT-4o mini (OpenAI), Gemini 2.0 Flash (Google DeepMind), and Llama 4 Scout (Meta). Using the MMMLU benchmark, we evaluate model performance under Very Polite, Neutral, and Very Rude prompt variants across six tasks spanning STEM and Humanities domains, and analyze pairwise accuracy differences with statistical significance testing. Our results show that tone sensitivity is both model-dependent and domain-specific. Neutral or Very Polite prompts generally yield higher accuracy than Very Rude prompts, but statistically significant effects appear only in a subset of Humanities tasks, where rude tone reduces accuracy for GPT and Llama, while Gemini remains comparatively tone-insensitive. When performance is aggregated across tasks within each domain, tone effects diminish and largely lose statistical significance. Compared with earlier research, these findings suggest that dataset scale and coverage materially influence the detection of tone effects. Overall, our study indicates that while interaction tone can matter in specific interpretive settings, modern LLMs are broadly robust to tonal variation in typical mixed-domain use, providing practical guidance for prompt design and model selection in real-world deployments.
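The abstract describes comparing accuracy between tone variants pairwise with statistical significance testing, but it does not name the test used. The Python sketch below shows one way such a comparison could be set up, assuming every question is answered under all three tones so outcomes are paired per question, and assuming an exact McNemar test on those paired correct/incorrect outcomes. The tone prefixes, the function names (`build_prompt`, `mcnemar_exact_p`, `pairwise_report`) and the toy data are hypothetical illustrations, not the authors' prompts, code, or results.

```python
# Hypothetical sketch: pairwise tone comparison with an exact McNemar test.
# Assumptions (not from the paper): tone prefixes, test choice, function names, data.
from itertools import combinations
from math import comb

# Hypothetical prefixes; the paper's actual prompt wording is not reproduced here.
TONE_PREFIXES = {
    "Very Polite": "Would you be so kind as to answer the following question?",
    "Neutral": "Answer the following question.",
    "Very Rude": "Just answer this, if you can even manage that.",
}


def build_prompt(tone: str, question: str, choices: list[str]) -> str:
    """Wrap one multiple-choice question in the prefix for the given tone."""
    options = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(choices))
    return f"{TONE_PREFIXES[tone]}\n{question}\n{options}\nAnswer with a single letter."


def mcnemar_exact_p(a: list[bool], b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-question correctness."""
    only_a = sum(1 for x, y in zip(a, b) if x and not y)  # right under A only
    only_b = sum(1 for x, y in zip(a, b) if y and not x)  # right under B only
    n = only_a + only_b  # discordant pairs; concordant pairs carry no signal
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def pairwise_report(results: dict[str, list[bool]]) -> list[tuple[str, str, float, float]]:
    """Accuracy difference (first tone minus second) and p-value for each tone pair."""
    rows = []
    for t1, t2 in combinations(results, 2):
        acc1 = sum(results[t1]) / len(results[t1])
        acc2 = sum(results[t2]) / len(results[t2])
        rows.append((t1, t2, acc1 - acc2, mcnemar_exact_p(results[t1], results[t2])))
    return rows


# Toy per-question outcomes (True = correct) for ten questions; not real results.
toy = {
    "Very Polite": [True] * 8 + [False] * 2,
    "Neutral": [True] * 8 + [False] * 2,
    "Very Rude": [True] * 6 + [False] * 4,
}
for t1, t2, diff, p in pairwise_report(toy):
    print(f"{t1} vs {t2}: diff={diff:+.2f}, p={p:.3f}")
```

On this toy data, Very Polite and Neutral each beat Very Rude by 0.20 in accuracy, yet each comparison gives p = 0.5 because only two questions are discordant. This illustrates, on made-up numbers, why the number of evaluated items matters for whether a tone effect reaches significance. A McNemar-style paired test is used here because the same questions appear under every tone; an unpaired test would discard that pairing.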

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2512.12812 [cs.CL]

(or arXiv:2512.12812v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2512.12812

arXiv-issued DOI via DataCite

Submission history

From: Hanyu Cai

[v1] Sun, 14 Dec 2025 19:25:20 UTC (83 KB)

[v2] Fri, 27 Mar 2026 20:35:20 UTC (83 KB)

Original source

arXiv

https://arxiv.org/abs/2512.12812
