Enhancing LLM-Based Bug Reproduction for Android Apps via Pre-Assessment of Visual Effects
Abstract: In the development and maintenance of Android apps, the quick and accurate reproduction of user-reported bugs is crucial to ensure application quality and improve user satisfaction. However, this process is often time-consuming and complex. Therefore, there is a need for an automated approach that can explore the Application Under Test (AUT) and identify the correct sequence of User Interface (UI) actions required to reproduce a bug, given only a complete bug report. Large Language Models (LLMs) have shown remarkable capabilities in understanding textual and visual semantics, making them a promising tool for planning UI actions. Nevertheless, our study shows that even when using state-of-the-art LLM-based approaches, these methods still struggle to follow detailed bug reproduction instructions and replan based on new information, due to their inability to accurately predict and interpret the visual effects of UI components. To address these limitations, we propose LTGDroid. Our insight is to execute all possible UI actions on the current UI page during exploration, record their corresponding visual effects, and leverage these visual cues to guide the LLM in selecting UI actions that are likely to reproduce the bug. We evaluated LTGDroid, instantiated with GPT-4.1, on a benchmark consisting of 75 bug reports from 45 popular Android apps. The results show that LTGDroid achieves a reproduction success rate of 87.51%, improving over the state-of-the-art baselines by 49.16% and 556.30%, while requiring an average of 20.45 minutes and approximately $0.27 to successfully reproduce a bug. The LTGDroid implementation is publicly available at this https URL.
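The abstract's core insight — execute every candidate UI action on the current page, record its visual effect, and feed those observations to the LLM so it can pick the action matching the next reproduction step — can be illustrated with a minimal sketch. Note this is a hypothetical reconstruction from the abstract only: the function names (`pre_assess_actions`, `build_llm_prompt`), the action notation, and the stubbed effect-recording callback are all invented here, not taken from the LTGDroid implementation.

```python
from dataclasses import dataclass

@dataclass
class ActionEffect:
    action: str          # e.g. "tap(menu)" -- hypothetical action notation
    effect_summary: str  # textual summary of the observed visual change

def pre_assess_actions(page_actions, execute_and_diff):
    """Pre-assessment step (sketch): execute each candidate UI action on the
    current page, record the visual effect it produces, then revert.
    `execute_and_diff` stands in for the run/screenshot-diff/revert machinery."""
    effects = []
    for action in page_actions:
        summary = execute_and_diff(action)
        effects.append(ActionEffect(action, summary))
    return effects

def build_llm_prompt(bug_step, effects):
    """Assemble a prompt pairing each action with its observed effect, asking
    the model to choose the action most likely to advance bug reproduction."""
    lines = [f"Next bug-report step: {bug_step}",
             "Candidate actions and their observed visual effects:"]
    for i, e in enumerate(effects, 1):
        lines.append(f"{i}. {e.action} -> {e.effect_summary}")
    lines.append("Reply with the number of the best-matching action.")
    return "\n".join(lines)

# Toy usage with stubbed effects (no device or LLM required):
fake_diff = {"tap(menu)": "opens navigation drawer",
             "tap(save)": "shows 'saved' toast"}.get
effects = pre_assess_actions(["tap(menu)", "tap(save)"], fake_diff)
prompt = build_llm_prompt("Open the navigation drawer", effects)
print(prompt)
```

The sketch separates observation (pre-assessment) from decision (prompting), which is the contrast the paper draws with baselines that ask the model to *predict* an action's visual effect rather than showing it the recorded effect.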
Subjects: Software Engineering (cs.SE)
Cite as: arXiv:2603.29623 [cs.SE]
(or arXiv:2603.29623v1 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2603.29623
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Huaxun Huang [v1] Tue, 31 Mar 2026 11:44:45 UTC (390 KB)