Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning

arXiv, March 31, 2026

arXiv:2603.27971v1, Announce Type: new

Authors: Bodla Krishna Vamshi, Haizhao Yang

Abstract: Recent years have witnessed the widespread adoption of reinforcement learning (RL), from solving real-time games to fine-tuning large language models using human preference data, significantly improving alignment with user expectations. However, as model complexity grows exponentially, the interpretability of these systems becomes increasingly challenging. While numerous explainability methods have been developed for computer vision and natural language processing to elucidate both local and global reasoning patterns, their application to RL remains limited. Direct extensions of these methods often struggle to maintain the delicate balance between interpretability and performance within RL settings. Prototype-Wrapper Networks (PW-Nets) have recently shown promise in bridging this gap by enhancing explainability in RL domains without sacrificing the efficiency of the original black-box models. However, these methods typically require manually defined reference prototypes, which often necessitate expert domain knowledge. In this work, we propose a method that removes this dependency by automatically selecting optimal prototypes from the available data. Preliminary experiments on standard Gym environments demonstrate that our approach matches the performance of existing PW-Nets, while remaining competitive with the original black-box models.
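The abstract does not describe how prototypes are selected, so the following is only an illustrative sketch of what "selecting prototypes from the available data" could look like in general. It is not the authors' method. It assumes each recorded state has already been encoded into a latent vector and labelled with the action the black-box agent took, and it picks, for each action, the real sample whose latent vector lies closest to that action's mean. The function name `select_prototypes` and the toy `latents` and `actions` values are hypothetical.

```python
import math


def select_prototypes(latents, actions):
    """Return {action: index} of the sample nearest to each action's mean latent.

    Assumption: a simple per-action nearest-to-mean rule, used here only as a
    stand-in for whatever selection criterion the paper actually proposes.
    """
    # Group sample indices by the action taken in that state.
    groups = {}
    for i, a in enumerate(actions):
        groups.setdefault(a, []).append(i)

    prototypes = {}
    for a, idxs in groups.items():
        dim = len(latents[idxs[0]])
        # Mean latent vector of all samples that share this action.
        mean = [sum(latents[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Choose an actual data point (not the mean itself) as the prototype,
        # so the prototype corresponds to a state a human can inspect.
        prototypes[a] = min(idxs, key=lambda i: math.dist(latents[i], mean))
    return prototypes


# Toy data: six 2-D latent vectors, three per action (values are made up).
latents = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [5.0, 8.0]]
actions = [0, 0, 0, 1, 1, 1]
protos = select_prototypes(latents, actions)
print(protos)
```

Returning indices rather than vectors keeps each prototype tied to a concrete observed state, which is the property that makes prototype-based explanations inspectable.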

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.27971 [cs.LG]

(or arXiv:2603.27971v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27971

arXiv-issued DOI via DataCite

Submission history

From: Krishna Vamshi Bodla
[v1] Mon, 30 Mar 2026 02:48:13 UTC (1,160 KB)
[v2] Tue, 31 Mar 2026 14:11:24 UTC (1,160 KB)

Original source: arXiv, https://arxiv.org/abs/2603.27971


