REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour
Abstract: Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale remains a persistent challenge. While recent work has explored the use of large language models (LLMs) to automate feedback, most existing systems still conceptualise feedback as a static, one-way artifact, offering limited support for interpretation, clarification, or follow-up. In this work, we introduce REFINE, a locally deployable, multi-agent feedback system built on small, open-source LLMs that treats feedback as an interactive process. REFINE combines a pedagogically grounded feedback generation agent with an LLM-as-a-judge-guided regeneration loop using a human-aligned judge, and a self-reflective tool-calling interactive agent that supports student follow-up questions with context-aware, actionable responses. We evaluate REFINE through controlled experiments and an authentic classroom deployment in an undergraduate computer science course. Automatic evaluations show that judge-guided regeneration significantly improves feedback quality, and that the interactive agent produces efficient, high-quality responses comparable to a state-of-the-art closed-source model. Analysis of real student interactions further reveals distinct engagement patterns and indicates that system-generated feedback systematically steers subsequent student inquiry. Our findings demonstrate the feasibility and effectiveness of multi-agent, tool-augmented feedback systems for scalable, interactive feedback.
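The judge-guided regeneration loop mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the judge's scoring scale, the stopping rule, or how the judge's critique is passed back to the generator. The `generate` and `judge` callables, the `Verdict` type, the 0.8 threshold, and the three-round budget are all hypothetical, and the toy stubs stand in for the small open-source LLMs that REFINE actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


# Hypothetical judge output: a quality score in [0, 1] plus a textual critique.
@dataclass
class Verdict:
    score: float
    critique: str


@dataclass
class RegenerationResult:
    feedback: str
    score: float
    rounds: int
    history: List[Tuple[str, float]] = field(default_factory=list)


def judge_guided_regeneration(
    submission: str,
    generate: Callable[[str, Optional[str]], str],
    judge: Callable[[str, str], Verdict],
    threshold: float = 0.8,  # hypothetical acceptance score
    max_rounds: int = 3,     # hypothetical regeneration budget
) -> RegenerationResult:
    """Generate feedback, score it with a judge, and regenerate using the
    judge's critique until the score reaches `threshold` or the budget runs
    out. If no draft is accepted, return the best-scoring draft seen."""
    critique: Optional[str] = None
    history: List[Tuple[str, float]] = []
    for round_idx in range(1, max_rounds + 1):
        draft = generate(submission, critique)
        verdict = judge(submission, draft)
        history.append((draft, verdict.score))
        if verdict.score >= threshold:
            return RegenerationResult(draft, verdict.score, round_idx, history)
        critique = verdict.critique  # feed the critique into the next attempt
    best_draft, best_score = max(history, key=lambda item: item[1])
    return RegenerationResult(best_draft, best_score, max_rounds, history)


# Toy stand-ins for the feedback agent and the judge (hypothetical, no LLM).
def toy_generate(submission: str, critique: Optional[str]) -> str:
    base = f"Your solution to '{submission}' handles the main case."
    if critique:
        return base + " Next step: add a test for the empty-input case."
    return base


def toy_judge(submission: str, draft: str) -> Verdict:
    # Rewards feedback that contains a concrete, actionable next step.
    if "Next step" in draft:
        return Verdict(0.9, "")
    return Verdict(0.4, "Feedback is not actionable; suggest a concrete next step.")


result = judge_guided_regeneration("reverse a list", toy_generate, toy_judge)
print(result.rounds, result.score)
```

With these stubs the first draft is rejected, the critique is fed back, and the second draft is accepted, so the script prints `2 0.9`. Returning the best-scoring draft when the budget is exhausted is only one possible design choice; a system could equally return the latest draft or escalate to a human.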
Comments: Accepted to AIED 2026
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as: arXiv:2603.29142 [cs.AI]
(or arXiv:2603.29142v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.29142
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Fares Fawzi [v1] Tue, 31 Mar 2026 01:48:08 UTC (1,382 KB)