Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic Found Emotion Circuits Inside Claude. They're Causing It to Blackmail People.DEV CommunityUnderstanding Transformers Part 1: How Transformers Understand Word OrderDEV CommunityI built an iOS app at 50 using AI tools. Here's what actually workedDEV CommunityDesign Cost-Optimized Compute SolutionsDEV CommunityCodeClone b4: from CLI tool to a real review surface for VS Code, Claude Desktop, and CodexDEV CommunityHow to Publish a Power BI Report and Embed it into a Website.DEV CommunityKVerify: A Two-Year Journey to Get Validation RightDEV CommunityHow I Used Swarm Intelligence to Catch a Race Condition Before It Hit ProductionDEV CommunityDark Dish Lab: A Cursed Recipe GeneratorDEV CommunityUpload Large Folders to Cloudflare R2DEV Community10x Genomics (TXG) Is Up 14.6% After Analyst Upgrade Highlights AI-Scale Spatial Genomics Initiative - simplywall.stGNews AI genomicsWhy Developer Productivity Engineering is UnderratedDEV CommunityBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic Found Emotion Circuits Inside Claude. They're Causing It to Blackmail People.DEV CommunityUnderstanding Transformers Part 1: How Transformers Understand Word OrderDEV CommunityI built an iOS app at 50 using AI tools. Here's what actually workedDEV CommunityDesign Cost-Optimized Compute SolutionsDEV CommunityCodeClone b4: from CLI tool to a real review surface for VS Code, Claude Desktop, and CodexDEV CommunityHow to Publish a Power BI Report and Embed it into a Website.DEV CommunityKVerify: A Two-Year Journey to Get Validation RightDEV CommunityHow I Used Swarm Intelligence to Catch a Race Condition Before It Hit ProductionDEV CommunityDark Dish Lab: A Cursed Recipe GeneratorDEV CommunityUpload Large Folders to Cloudflare R2DEV Community10x Genomics (TXG) Is Up 14.6% After Analyst Upgrade Highlights AI-Scale Spatial Genomics Initiative - simplywall.stGNews AI genomicsWhy Developer Productivity Engineering is UnderratedDEV Community
AI NEWS HUBbyEIGENVECTOREigenvector

BuildWithAI: Architecting a Serverless DR Toolkit on AWS

DEV Communityby Romar CablaoApril 5, 20269 min read0 views
Source Quiz

Overview I'd been getting more involved in disaster recovery planning lately and kept running into the same gap — a lot of teams on AWS have backups, but not a real Disaster Recovery (DR) plan. No documented runbooks, no tested failover procedures, no RTO/RPO targets tied to business impact. So that became the motivation for this side project: six AI-powered tools that automate the tedious parts of DR planning, built entirely on AWS. In part one of this three-part series, we will walk through the architecture — the serverless stack, the central model config, and the 5-layer cost guardrail system that keeps everything under $10/month (of course, you can set your own threshold; that's just what felt right for this side project). The next two parts will cover prompt engineering for each tool

Overview

I'd been getting more involved in disaster recovery planning lately and kept running into the same gap — a lot of teams on AWS have backups, but not a real Disaster Recovery (DR) plan. No documented runbooks, no tested failover procedures, no RTO/RPO targets tied to business impact. So that became the motivation for this side project: six AI-powered tools that automate the tedious parts of DR planning, built entirely on AWS.

In part one of this three-part series, we will walk through the architecture — the serverless stack, the central model config, and the 5-layer cost guardrail system that keeps everything under $10/month (of course, you can set your own threshold; that's just what felt right for this side project). The next two parts will cover prompt engineering for each tool and the lessons learned setting this side project.

Here is a look at what we're going to build. You can try out the live version at https://dr-toolkit.thecloudspark.com.

While this was implemented with the help of Kiro — AWS's spec-driven AI IDE — this series will focus on the DR toolkit, Amazon Bedrock, and the underlying AWS architecture, rather than Kiro itself.

What the toolkit does

Six tools, same workflow: provide input, Lambda calls Amazon Bedrock, get formatted output.

Tool Default Model What it does

1 Runbook Generator Nova Pro Paste IaC → get a full DR runbook

2 RTO/RPO Estimator Nova Lite Fill a form → get recovery targets and DR tier

3 DR Strategy Advisor Nova Lite Answer questions → get an AWS DR architecture pattern

4 Post-Mortem Writer Nova Lite Paste incident notes → get a structured post-mortem

5 DR Checklist Builder Nova Lite Pick your AWS services → get a tailored audit checklist

6 Template DR Reviewer Nova Pro Paste IaC → get a gap analysis with fix snippets

The live demo at DR Toolkit currently runs on Amazon Nova models. But these are just the defaults — the toolkit supports any model in the Bedrock Model Catalog. You can mix and match: Nova Lite for simple tools, Claude Sonnet for complex ones, or go all-in on a single provider. Just update models.config.json and redeploy.

Architecture

Here’s the big picture. I kept the architecture intentionally simple and straightforward AWS serverless setup. Few Lambda functions, one API Gateway, one DynamoDB table, one SNS topic, S3 + CloudFront for the frontend.

So when someone opens the toolkit, CloudFront serves the static frontend from a private S3 bucket. When they submit a tool form, the request goes through API Gateway to one of six tool Lambda functions. Each Lambda runs through the guardrail checks against DynamoDB before calling Amazon Bedrock's invoke_model. Separately, if the monthly AWS Budget hits $10, an SNS alert triggers the budget_shutoff Lambda, which flips tools_enabled=False in DynamoDB. Every tool checks that flag before doing anything else.

Browser  │  ├── GET ──▶ CloudFront (security headers + URL rewrite)  │ └──▶ S3 (private bucket, OAC only)  │  └── POST ──▶ API Gateway (HTTP API, 10 req/s, burst 25)  │  ▼  AWS Lambda (Python 3.14)  ├── guardrails.py ← 5-layer cost protection  ├── model_config.py ← reads models.config.json  ├── Amazon Bedrock (cross-region inference profiles)  └── DynamoDB (daily counters + IP rate limits + kill switch)

AWS Budget $10/mo ──▶ SNS ──▶ Lambda (flips kill switch)`

Enter fullscreen mode

Exit fullscreen mode

Layer What Why

Frontend Next.js 16 + Tailwind CSS v3 Static export, zero server cost

Frontend hosting S3 (private, OAC) + CloudFront Security headers, HTTPS, URL rewrite

API API Gateway HTTP API Built-in throttling, cheaper than REST API

Compute Lambda (Python 3.14) One function per tool + shared layer

AI Amazon Bedrock Cross-region inference profiles

Database DynamoDB (on-demand) Counters + feature flag + per-IP rate limits

Alerts SNS + AWS Budgets Auto-shutoff at $10/month

IaC Serverless Framework Single serverless.yml

Central config: models.config.json

Every tool's model, token limit, daily cap, and word count is controlled by one JSON file at the repo's root directory:

{  "region": "ap-southeast-1",  "tools": {  "runbook-generator": {  "modelId": "apac.amazon.nova-pro-v1:0",  "displayLabel": "Nova Pro",  "badgeColor": "blue",  "toolLimit": 50,  "maxTokens": 800,  "maxWords": 600  },  "rto-estimator": {  "modelId": "apac.amazon.nova-lite-v1:0",  "displayLabel": "Nova Lite",  "badgeColor": "green",  "toolLimit": 50,  "maxTokens": 400,  "maxWords": 300  }  } }

Enter fullscreen mode

Exit fullscreen mode

This config is consumed at deploy time by three things:

  • Lambda handlers — via a shared model_config.py module

  • Frontend — a slim copy with just displayLabel + badgeColor for the UI badges

  • serverless-models.js — auto-generates IAM resource ARNs so Bedrock permissions stay scoped to exactly the models in use

The handlers auto-detect the model provider from the modelId and use the correct Bedrock request format — Anthropic's anthropic_version + system string format for Claude, or Amazon's schemaVersion: messages-v1 + system array format for Nova. You can mix providers freely within the same deployment. IAM permissions update automatically on deploy — no manual policy edits needed.

Want to switch from Nova to Claude? Swap the modelId:

"runbook-generator": {  "modelId": "global.anthropic.claude-sonnet-4-6",  "displayLabel": "Sonnet 4.6",  ... }

Enter fullscreen mode

Exit fullscreen mode

Redeploy and that's it 🚀. The Model Selection Guide in the repo has copy-paste-ready model IDs for every supported option.

The 5-layer cost guardrail system

Running a free public tool on Bedrock with no authentication means you need cost protection in layers. Five guardrail layers is probably overkill for most projects. But for a free public demo where anyone can hit the endpoint, I'd rather over-protect than wake up to a surprise bill. All five checks run before Bedrock ever gets called.

Layer 1 — API Gateway throttling

Configured in serverless.yml:

HttpApiStage:  Properties:  DefaultRouteSettings:  ThrottlingRateLimit: 10  ThrottlingBurstLimit: 25

Enter fullscreen mode

Exit fullscreen mode

This is the first line of defense. Abuse gets 429s from API Gateway before Lambda even runs. Zero Bedrock cost.

Layer 2 — Daily usage counters

DynamoDB atomic conditional increments, both global (200/day) and per-tool (50/day for most tools, 30 for DR Reviewer since Nova Pro costs more per call):

table.update_item(  Key={"pk": f"usage#{today}", "sk": sk},  UpdateExpression="ADD run_count :inc SET #d = :date",  ConditionExpression="attribute_not_exists(run_count) OR run_count < :limit",  ExpressionAttributeValues={":inc": 1, ":limit": limit, ":date": today}, )

Enter fullscreen mode

Exit fullscreen mode

Layer 3 — Per-IP rate limiting

3 requests per minute per IP, using DynamoDB TTL'd counters:

minute_bucket = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M") pk = f"ratelimit#{source_ip}#{minute_bucket}"

table.update_item( Key={"pk": pk, "sk": "ALL"}, UpdateExpression="ADD run_count :inc SET expires_at = :exp", ConditionExpression="attribute_not_exists(run_count) OR run_count < :limit", ExpressionAttributeValues={ ":inc": 1, ":limit": IP_RATE_LIMIT, ":exp": int(time.time()) + 120, }, )`

Enter fullscreen mode

Exit fullscreen mode

Layer 4 — Bedrock token caps

Hard max_tokens per tool (400–800 depending on the tool). Input is also truncated to 8,000 characters before it reaches Bedrock. Most templates I tested were well under 3,000 characters, so the cap rarely triggers, but it bounds the worst case.

Layer 5 — Budget auto-shutoff

AWS Budget at $10/month → SNS → Lambda sets tools_enabled = false in DynamoDB:

def handler(event, context):  table.put_item(Item={  "pk": "config", "sk": "global",  "tools_enabled": False,  "disabled_reason": "Monthly budget threshold reached.",  })

Enter fullscreen mode

Exit fullscreen mode

Every handler checks this flag first. Worst case: tools temporarily unavailable. But never a surprise bill. (There's up to a ~5 minute lag between the budget alert and shutoff, so in-flight requests at alarm time aren't blocked. But at these volumes, the overshoot is negligible.)

Security hardening

A few key controls worth highlighting:

IAM least privilege. bedrock:InvokeModel is scoped to specific inference profile and foundation model ARNs, auto-generated from models.config.json by serverless-models.js. No wildcards on any IAM policy.

S3 private + OAC. No public access. Only CloudFront can read from the bucket.

CORS. API Gateway allowedOrigins is restricted to the CloudFront domain. The Lambda response headers themselves use Access-Control-Allow-Origin: * because the response helper doesn't know the domain and the API relies on rate limiting and daily caps (not auth tokens) for protection. The gateway-level restriction is the meaningful one.*

Prompt injection defense. All handlers use Bedrock's system parameter to separate instructions from user input. More on this in Part 2.

Full details in the Security Assessment doc in the repo.

What's next

That covers the architecture: the serverless stack, the central config, the 5-layer cost guardrails, and the security controls.

In the next part, we'll look at the tools themselves: the prompts behind each one, how to choose the right model per tool, the system prompt pattern for prompt injection defense, and the patterns that are reusable in any Bedrock project.

Try it / Fork it:

Live Demo: https://dr-toolkit.thecloudspark.com

DR Toolkit

AI-powered disaster recovery planning tool for AWS builders. Plan, document, and audit your DR posture with Amazon Bedrock. Resilience planning, accelerated by generative AI.

dr-toolkit.thecloudspark.com

DR Toolkit on AWS

AI-powered disaster recovery planning tool for AWS builders. Plan, document, and audit your DR posture with Amazon Bedrock. Resilience planning, accelerated by generative AI.

Tools

Tool Endpoint Model Daily Limit

1 Runbook Generator POST /runbook Nova Pro 50/day

2 RTO/RPO Estimator POST /rto-estimator Nova Lite 50/day

3 DR Strategy Advisor POST /dr-advisor Nova Lite 50/day

4 Post-Mortem Writer POST /postmortem Nova Lite 50/day

5 DR Checklist Builder POST /checklist Nova Lite 50/day

6 Template DR Reviewer POST /dr-reviewer Nova Pro 30/day

Architecture

  • Frontend: Next.js 16 (static export) + Tailwind CSS → S3 + CloudFront

  • Backend: AWS Lambda (Python 3.14) → API Gateway HTTP API

  • AI: Amazon Bedrock — Nova Lite (Tools 2–5), Nova Pro (Tools 1, 6)

  • Database: DynamoDB single table dr-toolkit-usage (usage counters + feature flag)

  • IaC: Serverless Framework v3 (serverless.yml)

  • Region: ap-southeast-1 (Singapore)

Project Structure

dr-toolkit/ ├── serverless.yml # Serverless Framework

References:

  • Disaster Recovery of Workloads on AWS — AWS Whitepaper

  • Amazon Bedrock Developer Guide

  • Amazon Bedrock Model Catalog

  • Amazon Bedrock Cross-Region Inference

  • Amazon Bedrock — Anthropic Claude Parameters

  • CloudFront Origin Access Control

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
BuildWithAI…claudemodelfoundation …availableversionupdateDEV Communi…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 137 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Releases