Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessCoreWeave Stock Analysis: Buy or Sell This Nvidia-Backed AI Stock? - The Motley FoolGNews AI NVIDIAIntel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 SuperReddit r/LocalLLaMAMicrosoft is automatically updating Windows 11 24H2 to 25H2 using machine learning - TweakTownGoogle News: Machine Learning80 Years to an Overnight Success: The Real History of Artificial Intelligence - Futurist SpeakerGoogle News: AIWhat next for the struggling rural mothers in China who helped to build AI?SCMP Tech (Asia AI)Apple reportedly signed a 3rd-party driver, by Tiny Corp, for AMD or Nvidia eGPUs for Apple Silicon Macs; it s meant for AI research, not accelerating graphics (AppleInsider)TechmemeBest Resume Builders in 2026: I Applied to 50 Jobs to Test TheseDEV CommunityTruth Technology and the Architecture of Digital TrustDEV CommunityI Switched From GitKraken to This Indie Git Client and I’m Not Going BackDEV CommunityWhy I Run 22 Docker Services at HomeDEV CommunityHow to Embed ChatGPT in Your Website: 5 Methods Compared [2026 Guide]DEV CommunityThe Spaceballs sequel will be released in April next yearEngadgetBlack Hat USADark ReadingBlack Hat AsiaAI BusinessCoreWeave Stock Analysis: Buy or Sell This Nvidia-Backed AI Stock? - The Motley FoolGNews AI NVIDIAIntel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 SuperReddit r/LocalLLaMAMicrosoft is automatically updating Windows 11 24H2 to 25H2 using machine learning - TweakTownGoogle News: Machine Learning80 Years to an Overnight Success: The Real History of Artificial Intelligence - Futurist SpeakerGoogle News: AIWhat next for the struggling rural mothers in China who helped to build AI?SCMP Tech (Asia AI)Apple reportedly signed a 3rd-party driver, by Tiny Corp, for AMD or Nvidia eGPUs for Apple Silicon Macs; it s meant for AI research, not accelerating graphics (AppleInsider)TechmemeBest Resume Builders in 2026: I Applied to 50 Jobs to Test TheseDEV CommunityTruth Technology and the Architecture of Digital TrustDEV CommunityI Switched From GitKraken to This Indie Git Client and I’m Not Going BackDEV CommunityWhy I Run 22 Docker Services at HomeDEV CommunityHow to Embed ChatGPT in Your Website: 5 Methods Compared [2026 Guide]DEV CommunityThe Spaceballs sequel will be released in April next yearEngadget
AI NEWS HUBbyEIGENVECTOREigenvector

Designing User-Centric Metrics for Evaluation of Counterfactual Explanations

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2507.15162v2 Announce Type: replace Abstract: Counterfactual Explanations (CFEs) have grown in popularity as a means of offering actionable guidance by identifying the minimum changes in feature values required to flip an ML model's prediction to something more desirable. Unfortunately, most prior research on CFEs relies on artificial evaluation metrics, such as proximity, which may overlook end-user preferences and constraints, e.g., the user's perception of effort needed to make certain feature changes may differ from that of the model designer. To address this research gap, this paper — Firdaus Ahmed Choudhury, Ethan Leicht, Jude Ethan Bislig, Hangzhi Guo, Amulya Yadav

View PDF HTML (experimental)

Abstract:Counterfactual Explanations (CFEs) have grown in popularity as a means of offering actionable guidance by identifying the minimum changes in feature values required to flip an ML model's prediction to something more desirable. Unfortunately, most prior research on CFEs relies on artificial evaluation metrics, such as proximity, which may overlook end-user preferences and constraints, e.g., the user's perception of effort needed to make certain feature changes may differ from that of the model designer. To address this research gap, this paper makes three novel contributions. First, we conduct a pilot study with 20 crowd-workers on Amazon MTurk to experimentally validate the alignment of existing CF evaluation metrics with real-world user preferences. Results show that user-preferred CFEs matched those based on proximity in only 63.81% of cases, highlighting the limited applicability of these metrics in real-world settings. Second, inspired by the need to design a user-informed evaluation metric for CFEs, we conduct a more detailed two-day user study with 41 participants facing realistic credit application scenarios to find experimental support for or against three intuitive hypotheses that may explain how end users evaluate CFEs. Third, based on the findings of this second study, we propose the AWP model, a novel user-centric, two-stage model that describes one possible mechanism by which users evaluate and select CFEs. Our results show that AWP predicts user-preferred CFEs with 84.37% accuracy. Our study provides the first human-centered validation for personalized cost models in CFE generation and highlights the need for adaptive, user-centered evaluation metrics.

Subjects:

Machine Learning (cs.LG)

Cite as: arXiv:2507.15162 [cs.LG]

(or arXiv:2507.15162v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2507.15162

arXiv-issued DOI via DataCite

Submission history

From: Firdaus Ahmed Choudhury [view email] [v1] Sun, 20 Jul 2025 23:58:29 UTC (2,678 KB) [v2] Mon, 30 Mar 2026 01:31:53 UTC (474 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Designing U…researchpaperarxivmachine-lea…deep-learni…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 204 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers