
Enhancing LLM-Based Bug Reproduction for Android Apps via Pre-Assessment of Visual Effects

arXiv cs.SE · by Xiangyang Xiao, Huaxun Huang, Rongxin Wu · April 1, 2026



Abstract: In the development and maintenance of Android apps, the quick and accurate reproduction of user-reported bugs is crucial to ensure application quality and improve user satisfaction. However, this process is often time-consuming and complex. Therefore, there is a need for an automated approach that can explore the Application Under Test (AUT) and identify the correct sequence of User Interface (UI) actions required to reproduce a bug, given only a complete bug report. Large Language Models (LLMs) have shown remarkable capabilities in understanding textual and visual semantics, making them a promising tool for planning UI actions. Nevertheless, our study shows that even when using state-of-the-art LLM-based approaches, these methods still struggle to follow detailed bug reproduction instructions and replan based on new information, due to their inability to accurately predict and interpret the visual effects of UI components. To address these limitations, we propose LTGDroid. Our insight is to execute all possible UI actions on the current UI page during exploration, record their corresponding visual effects, and leverage these visual cues to guide the LLM in selecting UI actions that are likely to reproduce the bug. We evaluated LTGDroid, instantiated with GPT-4.1, on a benchmark consisting of 75 bug reports from 45 popular Android apps. The results show that LTGDroid achieves a reproduction success rate of 87.51%, improving over the state-of-the-art baselines by 49.16% and 556.30%, while requiring an average of 20.45 minutes and approximately $0.27 to successfully reproduce a bug. The LTGDroid implementation is publicly available at this https URL.
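The core insight described in the abstract, probing every available UI action first, recording its visual effect, and only then letting the LLM choose, can be sketched as a small loop. This is a hypothetical illustration, not the paper's actual implementation: the names (`UIAction`, `probe_visual_effect`, `ask_llm_to_rank`) are invented, and the LLM call is replaced by naive keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIAction:
    widget: str  # e.g. the resource id of a UI component
    kind: str    # "tap", "scroll", ...


def probe_visual_effect(action: UIAction) -> str:
    """Stand-in for executing the action on the AUT and diffing
    before/after screenshots to describe what visibly changed."""
    effects = {
        ("settings_btn", "tap"): "opens Settings screen",
        ("list", "scroll"): "reveals more list items",
        ("save_btn", "tap"): "shows saved toast",
    }
    return effects.get((action.widget, action.kind), "no visible change")


def ask_llm_to_rank(bug_step: str, observed: dict) -> UIAction:
    """Stand-in for the LLM call: pick the action whose recorded visual
    effect best matches the next step of the bug report. Here the 'LLM'
    is just keyword overlap between the step and the effect text."""
    def score(item):
        _, effect = item
        return len(set(bug_step.lower().split()) & set(effect.lower().split()))
    return max(observed.items(), key=score)[0]


# One exploration step: probe all candidate actions on the current page,
# record their visual effects, then choose the most promising action.
candidates = [
    UIAction("settings_btn", "tap"),
    UIAction("list", "scroll"),
    UIAction("save_btn", "tap"),
]
observed = {a: probe_visual_effect(a) for a in candidates}
chosen = ask_llm_to_rank("open the Settings screen", observed)
print(chosen.widget)  # settings_btn
```

The point of the pattern is that the selector never has to *predict* what a widget does; it ranks actions against effects that were already observed, which is what the abstract argues current LLM-only planners cannot do reliably.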

Subjects: Software Engineering (cs.SE)

Cite as: arXiv:2603.29623 [cs.SE]

(or arXiv:2603.29623v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.29623

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Huaxun Huang [v1] Tue, 31 Mar 2026 11:44:45 UTC (390 KB)
