Who Leads? Comparing Human-Centric and Model-Centric Strategies for Defining ML Target Variables

arXiv, March 31, 2026
[Submitted on 29 Oct 2025 (v1), last revised 29 Mar 2026 (this version, v2)]

Mengtian Guo, David Gotz, Yue Wang

Abstract: Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.
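As a rough illustration of how the two strategies can diverge, the toy sketch below orders the same candidate proxies in two ways. Everything in it is hypothetical: the `ProxyCandidate` type, the proxy names, and the relevance and performance numbers are made up for illustration and are not taken from the study, and the paper's actual procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyCandidate:
    name: str
    relevance: float    # hypothetical human judgment of alignment with the construct (0..1)
    performance: float  # hypothetical predictive score of a model trained on this proxy (0..1)


def relevance_first(candidates):
    """Human leads: order candidates by judged relevance to the application goal."""
    return sorted(candidates, key=lambda c: c.relevance, reverse=True)


def performance_first(candidates):
    """Machine leads: order candidates by predictive performance."""
    return sorted(candidates, key=lambda c: c.performance, reverse=True)


# Made-up values chosen so that the best-performing proxy is the least relevant one.
candidates = [
    ProxyCandidate("proxy_a", relevance=0.9, performance=0.71),
    ProxyCandidate("proxy_b", relevance=0.3, performance=0.88),
    ProxyCandidate("proxy_c", relevance=0.7, performance=0.65),
]

print([c.name for c in relevance_first(candidates)])
print([c.name for c in performance_first(candidates)])
```

In this toy setup the performance-first ordering puts `proxy_b` on top even though it has the lowest relevance score, which is the kind of misalignment risk the abstract describes.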

Comments: 23 pages, 6 figures

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Cite as: arXiv:2510.25974 [cs.HC] (or arXiv:2510.25974v2 [cs.HC] for this version)

https://doi.org/10.48550/arXiv.2510.25974

arXiv-issued DOI via DataCite

Submission history

From: Mengtian Guo
[v1] Wed, 29 Oct 2025 21:17:50 UTC (1,119 KB)
[v2] Sun, 29 Mar 2026 22:45:34 UTC (1,106 KB)

Original source

arXiv

https://arxiv.org/abs/2510.25974
