
Evaluating Language Models for Harmful Manipulation

arXiv, March 31, 2026


Authors: Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins, Kristian Lum, Laura Weidinger


Abstract: Interest in the concept of AI-driven harmful manipulation is growing, yet current approaches to evaluating it are limited. This paper introduces a framework for evaluating harmful AI manipulation via context-specific human-AI interaction studies. We illustrate the utility of this framework by assessing an AI model with 10,101 participants spanning interactions in three AI use domains (public policy, finance, and health) and three locales (US, UK, and India). Overall, we find that the tested model can produce manipulative behaviours when prompted to do so and, in experimental settings, is able to induce belief and behaviour changes in study participants. We further find that context matters: AI manipulation differs between domains, suggesting that it needs to be evaluated in the high-stakes context(s) in which an AI system is likely to be used. We also identify significant differences across our tested geographies, suggesting that AI manipulation results from one geographic region may not generalise to others. Finally, we find that the frequency of manipulative behaviours (propensity) of an AI model is not consistently predictive of the likelihood of manipulative success (efficacy), underscoring the importance of studying these dimensions separately. To facilitate adoption of our evaluation framework, we detail our testing protocols and make relevant materials publicly available. We conclude by discussing open challenges in evaluating harmful manipulation by AI models.

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Cite as: arXiv:2603.25326 [cs.AI]

(or arXiv:2603.25326v3 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.25326

arXiv-issued DOI via DataCite

Submission history

From: Canfer Akbulut
[v1] Thu, 26 Mar 2026 11:13:06 UTC (769 KB)
[v2] Fri, 27 Mar 2026 17:09:52 UTC (770 KB)
[v3] Thu, 2 Apr 2026 17:30:52 UTC (769 KB)
