APITestGenie: Generating Web API Tests from Requirements and API Specifications with LLMs
arXiv:2604.02039v1 Announce Type: new
Abstract: Modern software systems rely heavily on Web APIs, yet creating meaningful and executable test scripts remains a largely manual, time-consuming, and error-prone task. In this paper, we present APITestGenie, a novel tool that leverages Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and prompt engineering to automatically generate API integration tests directly from business requirements and OpenAPI specifications. We evaluated APITestGenie on 10 real-world APIs, including 8 APIs comprising circa 1,000 live endpoints from an industrial partner in the automotive domain. The tool was able to generate syntactically and semantically valid test scripts for 89% of the business requirements under test after at most three attempts. Notably, some generated tests revealed previously unknown defects in the APIs, including integration issues between endpoints. Statistical analysis identified API complexity and the level of detail in business requirements as the primary factors influencing success rates, with the level of detail in API documentation also affecting outcomes. Feedback from industry practitioners confirmed strong interest in adopting the tool, noting that it substantially reduces the manual effort of writing acceptance tests and improves the alignment between tests and business requirements.
Subjects:
Software Engineering (cs.SE)
Cite as: arXiv:2604.02039 [cs.SE]
(or arXiv:2604.02039v1 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2604.02039
arXiv-issued DOI via DataCite (pending registration)
Journal reference: 7th ACM/IEEE International Conference on Automation of Software Test (AST 2026)
Related DOI: https://doi.org/10.1145/3793654.3793743
Submission history
From: Bruno Lima [v1] Thu, 2 Apr 2026 13:43:56 UTC (470 KB)