BDD Test Cases from User Stories: 5 Steps and 12 Scenarios
User stories set the destination. Test cases map every path, including the wrong ones.
In this post we walk through a 5-step BDD decomposition framework for turning a single user story into concrete Given/When/Then scenarios, with a complete worked coupon example. BDD (Behavior-Driven Development) structures tests as Given/When/Then scenarios that both stakeholders and tools like Cucumber or SpecFlow can understand.
Most teams write user stories in a format like:
As a checkout user, I want to apply a discount coupon, So that I pay less for my order.
Then they try to extract test cases directly from that sentence. The result is a handful of happy path scenarios and a nagging sense that something important was missed.
That instinct is right. This post explains why, and what to do about it.
Why User Stories Fail as Test Case Sources
User stories describe intent, not behavior. They answer why a feature exists, not what the system should do in every situation.
The coupon story above doesn't say:
- What happens if the coupon is expired
- What if the cart is below the minimum order value
- Whether a coupon can be applied more than once
- What if the coupon code is valid but belongs to a different user
- What if the user removes an item after applying the coupon
- What happens if the coupon service is temporarily unavailable
None of these are edge cases you invent. They're real scenarios that happen in production. They're just not in the user story because user stories aren't meant to capture them.
The gap between "the story" and "what needs testing" is where bugs live.
The 5-Step BDD Decomposition Framework
Don't try to extract test cases from the user story text directly. Use the story as a starting point for structured decomposition.
The framework in summary:
1. Identify the main scenario (the happy path)
2. Map every input and its valid boundaries
3. Add negative scenarios for each invalid input
4. Add boundary-condition scenarios at and around each threshold
5. Add error-handling and state-transition scenarios
Each step is covered in detail below.
Step 1: Identify the Main Scenario
What does the system do in the ideal case? This is your happy path, the scenario where everything works as intended.
```gherkin
Scenario: Valid coupon applies correct discount
  Given I have a cart with items totaling $50
  And I have a valid coupon "SAVE20" for 20% off
  When I apply the coupon
  Then the cart total should be $40
  And I should see "20% off applied"
```
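To make the happy path concrete, here is a minimal Python sketch of the domain logic this scenario exercises. The `Cart` and `Coupon` classes are hypothetical illustrations, not part of any framework; in a real BDD setup, step definitions (e.g., in Cucumber or behave) would drive code like this.

```python
from decimal import Decimal

# Hypothetical domain objects for the happy-path scenario.
class Coupon:
    def __init__(self, code: str, percent_off: int):
        self.code = code
        self.percent_off = percent_off

class Cart:
    def __init__(self, total: str):
        self.total = Decimal(total)  # Decimal avoids float rounding on money
        self.message = ""

    def apply_coupon(self, coupon: Coupon) -> None:
        # Given a valid coupon, reduce the total and record the banner text.
        discount = self.total * coupon.percent_off / Decimal(100)
        self.total -= discount
        self.message = f"{coupon.percent_off}% off applied"

# The scenario's Given/When/Then, line by line:
cart = Cart("50.00")                      # Given a cart totaling $50
cart.apply_coupon(Coupon("SAVE20", 20))   # When I apply the coupon
assert cart.total == Decimal("40.00")     # Then the total should be $40
assert cart.message == "20% off applied"  # And I should see the message
```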
This is what your product owner has in mind when they write the story. Necessary, but not close to enough.
Step 2: Map Every Input and Its Boundaries
Every input the user provides or the system receives has a valid range. List them explicitly.
| Input | Valid Range | Boundary Cases |
| --- | --- | --- |
| Coupon code | Alphanumeric, 4-20 chars | Empty, too short, too long, special chars |
| Expiry date | Today or future | Yesterday, today, tomorrow |
| Minimum cart value | Above minimum threshold (e.g., $30) | $29.99, $30.00, $30.01 |
| Usage limit | 0 - N uses per user | 0, 1, max, max+1 |
| User ownership | Owned by this user or global | Own coupon, other user's coupon, global |
Each boundary is a potential test case. The edge of the valid range is where errors happen.
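The first row of the table can be sketched as a validator, with each boundary case becoming a concrete check. The function name and regex are illustrative, assuming only the "alphanumeric, 4-20 chars" rule from the table:

```python
import re

def validate_coupon_code(code: str) -> bool:
    """Accept only alphanumeric codes between 4 and 20 characters."""
    return bool(re.fullmatch(r"[A-Za-z0-9]{4,20}", code))

# Each entry in the boundary column becomes one test case:
cases = {
    "": False,          # empty
    "ABC": False,       # too short (3 chars)
    "SAVE": True,       # minimum valid length (4)
    "A" * 20: True,     # maximum valid length
    "A" * 21: False,    # one character over the limit
    "SAVE-20": False,   # special character rejected
}
for code, expected in cases.items():
    assert validate_coupon_code(code) == expected
```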
Step 3: Add Negative Scenarios
For each input, ask: what should the system do when this input is invalid, expired, or out of range? Each answer is a test case.
```gherkin
Scenario: Expired coupon is rejected with clear message
  Given I have a cart with items totaling $50
  And I have an expired coupon "OLD20"
  When I apply the coupon
  Then I should see the error "This coupon has expired"
  And the cart total should remain $50
  And no discount should be applied
```
Step 4: Add Boundary Conditions
Boundary conditions are the exact limits of valid input: the values right at, just below, and just above each threshold.
A coupon with a $30 minimum order value needs three scenarios:
- Cart at $29.99 → coupon rejected
- Cart at $30.00 → coupon accepted
- Cart at $30.01 → coupon accepted
```gherkin
Scenario: Coupon rejected when cart is one cent below minimum
  Given I have a cart with items totaling $29.99
  And I have a valid coupon "SAVE20" with a $30 minimum
  When I apply the coupon
  Then I should see "Minimum order of $30 required"
  And the cart total should remain $29.99
```
Boundary conditions expose off-by-one errors, rounding bugs, and comparison operator mistakes (< vs <=). I've seen teams spend weeks debugging production issues that a single boundary test would have caught. The return on time invested is unusually high.
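The < vs <= distinction is easy to see in code. A minimal sketch, assuming the $30 threshold from the scenarios (the function name is hypothetical): the three boundary values together pin the comparison operator down, which neither "$20" nor "$50" alone would do.

```python
from decimal import Decimal

MINIMUM_ORDER = Decimal("30.00")  # threshold from the scenarios above

def coupon_allowed(cart_total: Decimal) -> bool:
    # The scenarios require exactly $30.00 to be accepted, so the
    # comparison must be >=, not >. Only the boundary tests catch
    # the difference.
    return cart_total >= MINIMUM_ORDER

assert not coupon_allowed(Decimal("29.99"))  # just below: rejected
assert coupon_allowed(Decimal("30.00"))      # exactly at: accepted
assert coupon_allowed(Decimal("30.01"))      # just above: accepted
```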
Step 5: Add Error Handling and State Scenarios
What happens if the coupon service is down? What if the user applies a coupon, then removes an item that drops the cart below the minimum?
State transitions (when valid inputs become invalid mid-session) are consistently under-tested. In most codebases I've worked on, they're not tested at all.
```gherkin
Scenario: Coupon auto-removed when cart drops below minimum
  Given I have a cart with items totaling $50
  And I have successfully applied coupon "SAVE20" with a $30 minimum
  When I remove items bringing the cart total to $25
  Then the coupon should be automatically removed
  And I should see "Coupon removed: cart below $30 minimum"
  And the cart total should be $25 with no discount
```
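The key design point in this scenario is that coupon validity must be re-checked on every cart change, not only at apply time. A minimal sketch, with hypothetical class and method names:

```python
from decimal import Decimal

class Cart:
    MINIMUM = Decimal("30.00")  # coupon's minimum order value

    def __init__(self, total: str):
        self.total = Decimal(total)
        self.coupon = None
        self.message = ""

    def apply_coupon(self, code: str) -> None:
        if self.total >= self.MINIMUM:
            self.coupon = code

    def remove_items(self, amount: str) -> None:
        self.total -= Decimal(amount)
        # Re-validate after every cart change: this is the state
        # transition the scenario above pins down.
        if self.coupon and self.total < self.MINIMUM:
            self.coupon = None
            self.message = "Coupon removed: cart below $30 minimum"

cart = Cart("50.00")
cart.apply_coupon("SAVE20")
cart.remove_items("25.00")              # cart drops from $50 to $25
assert cart.coupon is None              # coupon auto-removed
assert cart.total == Decimal("25.00")   # no discount lingers
```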
The Three Coverage Levels
Not every story needs exhaustive coverage immediately. A pragmatic approach uses three levels, chosen based on the risk profile of the feature.
| Coverage Level | Scenarios | Focus | Use When |
| --- | --- | --- | --- |
| Quick | 3 total | Happy path + obvious failures | Low-risk features, rapid prototyping, legacy system initial coverage |
| Standard | 8 total | Adds boundaries & realistic edge cases | Most production features |
| Exhaustive | 12 total | Adds security, concurrency, state transitions | Payment flows, authentication, data integrity, anything hard to roll back |
Choosing the right level is a judgment call about risk, not a technical decision. Features that touch money, health data, or user permissions need Exhaustive. A blog post draft autosave feature probably needs Quick.
Worked Example: Coupon Application
User story: As a checkout user, I want to apply a discount coupon, so that I pay less for my order.
Quick Coverage (3 scenarios)
| # | Scenario | Tests |
| --- | --- | --- |
| 1 | Valid coupon applied | Correct discount shown, cart total updated |
| 2 | Invalid coupon code | Clear error message, cart unchanged |
| 3 | Expired coupon | Expiry-specific error message |
```gherkin
Scenario: Valid coupon applies correct discount
  Given I have a cart with items totaling $50
  And I have a valid coupon "SAVE20" for 20% off
  When I apply the coupon
  Then the cart total should be $40
  And I should see "20% off applied"

Scenario: Invalid coupon code is rejected
  Given I have a cart with items totaling $50
  When I apply coupon "NOTREAL"
  Then I should see "Coupon not found"
  And the cart total should remain $50

Scenario: Expired coupon is rejected
  Given I have a cart with items totaling $50
  And I have an expired coupon "OLD20"
  When I apply the coupon
  Then I should see "This coupon has expired"
  And the cart total should remain $50
```
Standard Coverage Adds (5 more scenarios, 8 total)
| # | Scenario | Tests |
| --- | --- | --- |
| 4 | Cart below minimum order value | Minimum order error with threshold displayed |
| 5 | Single-use coupon applied twice | "Coupon already used" error |
| 6 | Coupon applied, then item removed below minimum | Coupon auto-removed, user notified |
| 7 | Cart at exactly minimum value | Coupon applies successfully |
| 8 | Cart $0.01 below minimum | Coupon not applicable |
```gherkin
Scenario: Coupon rejected when cart is one cent below minimum
  Given I have a cart with items totaling $29.99
  And I have a valid coupon "SAVE20" with a $30 minimum order
  When I apply the coupon
  Then I should see "Minimum order of $30 required"
  And the cart total should remain $29.99
```
Scenarios 7-8 are the boundary pair for the minimum threshold. Testing both sides of the boundary catches < vs <= bugs that unit tests often miss.
Exhaustive Coverage Adds (4 more scenarios, 12 total)
| # | Scenario | Tests |
| --- | --- | --- |
| 9 | Coupon belonging to another user | "Coupon not found" (not "belongs to someone else") |
| 10 | Two simultaneous coupon applications | Only one applied, no double discount |
| 11 | Coupon code with SQL injection attempt | Sanitized and treated as invalid |
| 12 | Coupon applied, page refreshed | State persisted correctly |
These deserve additional commentary.
Scenario 9: User Enumeration Prevention
At first glance, you might write the expected behavior as: "When I apply a coupon that belongs to another user, I see 'This coupon belongs to a different account.'"
That sounds helpful, but it leaks information. It confirms the coupon exists and is assigned to someone. An attacker can use this to enumerate valid coupon codes or confirm which users have coupons.
The correct behavior returns a generic "Coupon not found" message, identical to what you'd see for a coupon that doesn't exist at all.
```gherkin
Scenario: Coupon belonging to another user returns generic not-found
  Given I have a cart with items totaling $50
  And a coupon "PRIVATE10" exists but belongs to another user
  When I apply coupon "PRIVATE10"
  Then I should see "Coupon not found"
  And the cart total should remain $50
```
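An enumeration-safe lookup collapses every failure mode into the same response. A minimal sketch with a toy in-memory store; the names (`lookup_coupon`, `COUPONS`) are hypothetical:

```python
# Toy coupon store: "PRIVATE10" exists but belongs to another user.
COUPONS = {"PRIVATE10": {"owner": "user_b"}}
GENERIC_ERROR = "Coupon not found"

def lookup_coupon(code: str, user_id: str):
    coupon = COUPONS.get(code)
    if coupon is None:
        return None, GENERIC_ERROR  # code does not exist
    if coupon["owner"] not in (user_id, "global"):
        # Exists but isn't yours: SAME message, so an attacker can't
        # distinguish "wrong owner" from "no such code".
        return None, GENERIC_ERROR
    return coupon, ""

# Both failure modes are indistinguishable to the caller:
_, err_missing = lookup_coupon("NOTREAL", "user_a")
_, err_foreign = lookup_coupon("PRIVATE10", "user_a")
assert err_missing == err_foreign == GENERIC_ERROR
```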
Scenario 10: Race Condition / Double Discount
If a user opens two tabs and clicks "Apply Coupon" simultaneously, what happens? Without explicit concurrency handling, you can end up with double discounts or corrupted cart state. I've seen promotions lose thousands of dollars to this exact bug during flash sales.
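One common fix is to make the apply operation idempotent behind a synchronization primitive. A minimal in-process sketch with an illustrative `Cart` class; a real system would use a database constraint or compare-and-set rather than a thread lock, but the shape of the guard is the same:

```python
import threading

class Cart:
    def __init__(self, total: int):
        self.total = total
        self._lock = threading.Lock()
        self._applied = False

    def apply_discount(self, amount: int) -> bool:
        with self._lock:
            if self._applied:
                return False  # second tab: no-op, no double discount
            self._applied = True
            self.total -= amount
            return True

# Simulate two tabs clicking "Apply Coupon" at the same time:
cart = Cart(50)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cart.apply_discount(10)))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert cart.total == 40           # discounted exactly once
assert results.count(True) == 1   # only one apply succeeded
```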
Scenario 11: Input Sanitization
The coupon code field accepts user input. That input will hit your database. Malicious input should be sanitized and rejected as "invalid code," not cause errors or, worse, execute.
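The standard defense is parameterized queries: the driver binds the input as data, so a payload can never become SQL. A minimal sketch using Python's built-in `sqlite3` with an illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupons (code TEXT PRIMARY KEY, percent INTEGER)")
conn.execute("INSERT INTO coupons VALUES ('SAVE20', 20)")

def find_coupon(code: str):
    # The ? placeholder binds the value as data; an injection payload
    # is just a string that matches no row.
    row = conn.execute(
        "SELECT percent FROM coupons WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

assert find_coupon("SAVE20") == 20
assert find_coupon("' OR '1'='1") is None            # injection: no match
assert find_coupon("x'; DROP TABLE coupons; --") is None
assert find_coupon("SAVE20") == 20                   # table still intact
```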
Scenario 12: State Persistence
If a user applies a coupon then refreshes the page, is the coupon still applied? This tests session or database persistence. It sounds obvious, but I've seen carts silently drop discounts on refresh because the coupon was stored client-side only.
Common Mistakes
Four mistakes I see repeatedly in BDD scenarios.
Mistake 1: Testing the Implementation, Not the Behavior
BDD scenarios should describe what the system does from a user's perspective, not how it does it internally.
Bad:
```gherkin
Scenario: Click apply button sends API request
  Given I am on the cart page
  When I click the "Apply Coupon" button
  Then a POST request is sent to /api/v1/coupons/apply
```
Good:
```gherkin
Scenario: Valid coupon applies correct discount
  Given I have a cart with items totaling $50
  When I apply coupon "SAVE20"
  Then my cart total should be $40
```
The bad version tests implementation details. The good version tests behavior. Only the second survives a refactor.
Mistake 2: Skipping Boundary Conditions
Teams often test "valid" and "invalid" but miss the exact threshold. If the minimum order is $30, test $29.99, $30.00, and $30.01 — not just "$20" and "$50."
Mistake 3: Coupling to UI Elements
Scenarios that say "When the user clicks the Apply Coupon button" couple your test cases to your UI. Describe actions at the domain level: "When the user applies coupon SAVE20."
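One way to keep that decoupling is to route the domain action through a swappable driver in the step-definition layer. A minimal sketch with hypothetical class names; in Cucumber or behave this lives in the step definitions, never in the feature file:

```python
class UiDriver:
    """Today: drive the browser (stubbed here for illustration)."""
    def apply_coupon(self, code: str) -> str:
        return f"clicked Apply with {code}"

class ApiDriver:
    """Tomorrow: call the API directly. Same domain action."""
    def apply_coupon(self, code: str) -> str:
        return f"POST /coupons with {code}"

def when_the_user_applies_coupon(driver, code: str) -> str:
    # The scenario text stays 'the user applies coupon SAVE20';
    # only the driver changes when the UI does.
    return driver.apply_coupon(code)

assert "SAVE20" in when_the_user_applies_coupon(UiDriver(), "SAVE20")
assert "SAVE20" in when_the_user_applies_coupon(ApiDriver(), "SAVE20")
```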
Mistake 4: One Huge Scenario Instead of Multiple Focused Ones
A scenario with twelve steps and three outcomes is testing three things at once. When it fails, you don't know which of the three things broke. One behavior per scenario.
Where AI Fits: Honestly
AI test case generation is useful but uneven. Worth being specific about where it helps and where it doesn't.
AI handles well:
- Volume. Given a well-structured user story, AI can generate a complete scenario set in seconds rather than the hour it might take manually.
- Consistency. AI doesn't forget null inputs or boundary values — the inputs you'd normally catch on the thirtieth scenario of the day when you're no longer paying attention.
- Format compliance. Gherkin syntax is strict, and AI follows it reliably.
- Coverage breadth. AI applies the decomposition systematically across every input. It won't skip Step 4 because it's Friday afternoon.
AI handles poorly:
- Domain-specific rules. Your business has constraints that aren't in the user story: user enumeration risks, regional compliance requirements, product-specific logic. AI doesn't know these unless you tell it explicitly.
- Organizational context. Which test cases already exist? Which are covered by integration tests at a lower level? AI doesn't have this information and will generate duplication.
- Coverage level judgment. Deciding whether a feature needs Quick, Standard, or Exhaustive coverage is a risk assessment about your system, your users, and your rollback capability. That's a human call.
What actually works: let AI handle Steps 2-5 at the coverage level you choose, then review and add the domain constraints it can't know. You get AI's consistency with your context.
Summary
The gap between user stories and test cases is where bugs live. A structured decomposition approach closes that gap:
1. Identify the main scenario — the happy path
2. Map every input and its boundaries — the raw material for edge cases
3. Add negative scenarios — what happens when inputs are invalid
4. Add boundary conditions — the exact thresholds where errors hide
5. Add error handling and state transitions — the scenarios that survive production
Choose your coverage level based on risk: Quick for low-stakes features, Standard for most production work, Exhaustive for anything involving money, security, or data integrity.
This is part of the Practical Test Design series. I'm building KRINO Dokim, a tool that applies this 5-step framework automatically — you choose the coverage level, it generates the scenarios, and you review with full visibility into why each scenario was created. If you work with user stories and BDD, I'd genuinely appreciate your feedback on the approach.
What coverage level does your team default to? Have you run into the user enumeration trap (Scenario 9)? I'd love to hear in the comments.
Originally published on DEV Community: https://dev.to/krinosystems/bdd-test-cases-from-user-stories-5-steps-and-12-scenarios-g76