Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning
Authors: Bodla Krishna Vamshi, Haizhao Yang
Abstract: Recent years have witnessed the widespread adoption of reinforcement learning (RL), from solving real-time games to fine-tuning large language models using human preference data, which significantly improves alignment with user expectations. However, as model complexity grows exponentially, interpreting these systems becomes increasingly challenging. While numerous explainability methods have been developed for computer vision and natural language processing to elucidate both local and global reasoning patterns, their application to RL remains limited. Direct extensions of these methods often struggle to maintain the delicate balance between interpretability and performance in RL settings. Prototype-Wrapper Networks (PW-Nets) have recently shown promise in bridging this gap by enhancing explainability in RL domains without sacrificing the efficiency of the original black-box models. However, these methods typically require manually defined reference prototypes, which often demand expert domain knowledge. In this work, we propose a method that removes this dependency by automatically selecting optimal prototypes from the available data. Preliminary experiments on standard Gym environments demonstrate that our approach matches the performance of existing PW-Nets while remaining competitive with the original black-box models.
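The abstract does not say how prototypes are selected, so the following is only an illustrative sketch of what "selecting prototypes from the available data" could look like, not the authors' method. It assumes a discrete action space, a batch of latent encodings produced by some frozen encoder, and one prototype per action; the function name `select_prototypes` and the nearest-to-class-mean rule are hypothetical choices made for this example.

```python
import numpy as np


def select_prototypes(latents: np.ndarray, actions: np.ndarray) -> dict:
    """Pick one prototype per action from observed data.

    Hypothetical rule (an assumption, not taken from the paper): for each
    action label, choose the sample whose latent encoding lies closest to
    the mean encoding of all samples that took that action.

    latents: array of shape (n_samples, latent_dim)
    actions: integer array of shape (n_samples,)
    Returns a dict mapping action label -> index into `latents`.
    """
    prototypes = {}
    for action in np.unique(actions):
        # Indices of all samples labelled with this action.
        idx = np.flatnonzero(actions == action)
        centroid = latents[idx].mean(axis=0)
        # Euclidean distance of each candidate to the class centroid.
        dists = np.linalg.norm(latents[idx] - centroid, axis=1)
        prototypes[int(action)] = int(idx[np.argmin(dists)])
    return prototypes


# Toy data: two actions, three 2-D latent points each.
latents = np.array(
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
     [10.0, 10.0], [10.0, 12.0], [10.0, 11.0]]
)
actions = np.array([0, 0, 0, 1, 1, 1])
chosen = select_prototypes(latents, actions)
```

Because each prototype is an actual observed state rather than a synthetic point, it can be shown to a user directly, which is the property that makes prototype-based explanations readable.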
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.27971 [cs.LG] (or arXiv:2603.27971v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.27971 (arXiv-issued DOI via DataCite)
Submission history
From: Krishna Vamshi Bodla
[v1] Mon, 30 Mar 2026 02:48:13 UTC (1,160 KB)
[v2] Tue, 31 Mar 2026 14:11:24 UTC (1,160 KB)