Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessWhy Software Engineers Burn Out Differently And What To Do About ItDEV Community512,000 Lines of Claude Code Leaked Through a Single .npmignore MistakeDEV CommunityStop Wasting Tokens on npm Install NoiseDEV CommunityProgramming Logic: The First Step to Mastering Any LanguageDEV CommunityThe $10 Billion Trust Data Market That AI Companies Can't SeeDEV CommunityAI company insiders can bias models for election interferenceLessWrong AIMiniScript Weekly News — Apr 1, 2026DEV CommunityBuilding a Real-Time Dota 2 Draft Prediction System with Machine LearningDEV Community🚀 Build a Full-Stack Python Web App (No JS Framework Needed)DEV CommunityGoogle increases the storage of its $19.99/month AI Pro subscription plan to 5TB, up from 2TB, at no additional cost (Abner Li/9to5Google)TechmemeI open sourced a production MLOps pipeline. Here is what it took to get it to PyPI and Hugging Face in one day.DEV CommunityBuilding a Future in Artificial Intelligence: Complete Guide to AI-900 and AI-102 Certifications - North Penn NowGoogle News: Machine LearningBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessWhy Software Engineers Burn Out Differently And What To Do About ItDEV Community512,000 Lines of Claude Code Leaked Through a Single .npmignore MistakeDEV CommunityStop Wasting Tokens on npm Install NoiseDEV CommunityProgramming Logic: The First Step to Mastering Any LanguageDEV CommunityThe $10 Billion Trust Data Market That AI Companies Can't SeeDEV CommunityAI company insiders can bias models for election interferenceLessWrong AIMiniScript Weekly News — Apr 1, 2026DEV CommunityBuilding a Real-Time Dota 2 Draft Prediction System with Machine LearningDEV Community🚀 Build a Full-Stack Python Web App (No JS Framework Needed)DEV CommunityGoogle increases the storage of its $19.99/month AI Pro subscription plan to 5TB, up from 2TB, at no additional cost (Abner Li/9to5Google)TechmemeI open sourced a production MLOps pipeline. Here is what it took to get it to PyPI and Hugging Face in one day.DEV CommunityBuilding a Future in Artificial Intelligence: Complete Guide to AI-900 and AI-102 Certifications - North Penn NowGoogle News: Machine Learning

From Reviews to Requirements: Can LLMs Generate Human-Like User Stories?

arXivMarch 31, 20262 min read0 views
Source Quiz

arXiv:2603.28163v1 Announce Type: new Abstract: App store reviews provide a constant flow of real user feedback that can help improve software requirements. However, these reviews are often messy, informal, and difficult to analyze manually at scale. Although automated techniques exist, many do not perform well when replicated and often fail to produce clean, backlog-ready user stories for agile projects. In this study, we evaluate how well large language models (LLMs) such as GPT-3.5 Turbo, Gemini 2.0 Flash, and Mistral 7B Instruct can generate usable user stories directly from raw app review — Shadman Sakib, Oishy Fatema Akhand, Tasnia Tasneem, Shohel Ahmed

View PDF HTML (experimental)

Abstract:App store reviews provide a constant flow of real user feedback that can help improve software requirements. However, these reviews are often messy, informal, and difficult to analyze manually at scale. Although automated techniques exist, many do not perform well when replicated and often fail to produce clean, backlog-ready user stories for agile projects. In this study, we evaluate how well large language models (LLMs) such as GPT-3.5 Turbo, Gemini 2.0 Flash, and Mistral 7B Instruct can generate usable user stories directly from raw app reviews. Using the Mini-BAR dataset of 1,000+ health app reviews, we tested zero-shot, one-shot, and two-shot prompting methods. We evaluated the generated user stories using both human judgment (via the RUST framework) and a RoBERTa classifier fine-tuned on UStAI to assess their overall quality. Our results show that LLMs can match or even outperform humans in writing fluent, well-formatted user stories, especially when few-shot prompts are used. However, they still struggle to produce independent and unique user stories, which are essential for building a strong agile backlog. Overall, our findings show how LLMs can reliably turn unstructured app reviews into actionable software requirements, providing developers with clear guidance to turn user feedback into meaningful improvements.

Subjects:

Computation and Language (cs.CL)

Cite as: arXiv:2603.28163 [cs.CL]

(or arXiv:2603.28163v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.28163

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shadman Sakib [view email] [v1] Mon, 30 Mar 2026 08:31:18 UTC (841 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
From Review…researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 196 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers