Building a Production-Ready Serverless App on Google Cloud (Part 1: Architecture)
The Problem
In my previous post (https://dev.to/gde/testing-antigravity-building-a-data-intensive-poc-at-300kmh-4c57), I shared how I used an AI agent framework during a train ride to build a Proof of Concept (POC) for a project called the Dog Finder App (https://github.com/patricio-navarro/dog_finder_app). The response was great, but the experiment raised a technical question: How do you build a POC quickly without creating a messy monolith that you'll have to rewrite later?
When building a data-intensive application, engineers usually face a harsh trade-off. You can either build it fast to prove the concept (and inherit massive technical debt), or you can build it "right" (and spend weeks provisioning infrastructure and writing boilerplate).
By leveraging serverless services on Google Cloud Platform (GCP), we can break that trade-off.
This is the first in a three-part series where I will show you how to architect, automate, and deploy a complete, decoupled data application. We will look at how combining serverless tools with strict Data Engineering practices allows you to spin up a solution that is both incredibly fast to build and ready for production traffic.
The Architecture: Decoupling by Default
In traditional POCs, it is common to see a tightly coupled monolith: a single backend service receiving HTTP requests, saving images to a local disk, writing state to a database, and running heavy analytical queries. If any one of those components becomes a bottleneck, it can degrade or take down the entire application.
For the Dog Finder app—a system designed to ingest real-time sightings of lost dogs and route them for geographical analysis—we needed a system that scales instantly under load but costs absolutely nothing when there is no traffic.
To achieve this, we default to a decoupled architecture. We split the ingestion, state, and analytics across specialized, managed serverless components:
- Google Cloud Run (Compute): Hosts our stateless Flask web application and API. It handles incoming user traffic, scales up automatically on demand, and drops to zero when idle.
- Google Cloud Storage (Blob Storage): Handles the heavy payloads. User-uploaded images of dogs go straight here, keeping our databases lean and performant.
- Firestore (Operational Database): Our OLTP layer. This NoSQL database stores the real-time state of the application, allowing the frontend to read and display current sightings with millisecond latency.
- Cloud Pub/Sub (Ingestion Buffer): The shock absorber of our system. When a sighting occurs, the backend publishes an event here and immediately responds to the user, completely decoupling the web app from the analytics pipeline.
- BigQuery (Data Warehouse): Our OLAP layer. The final destination where all structured sightings land for historical storage, regional partitioning, and complex analytical querying.
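To make the ingestion buffer concrete, here is a minimal sketch of how the backend might publish a sighting event to Pub/Sub. The topic name, field names, and the `publish_sighting` helper are illustrative (not taken from the repo); the pure `encode_event` step is separated out so the cloud call stays a one-liner.

```python
import json


def encode_event(sighting: dict) -> bytes:
    """Serialize a sighting to the UTF-8 JSON bytes Pub/Sub expects."""
    return json.dumps(sighting, sort_keys=True).encode("utf-8")


def publish_sighting(project_id: str, topic: str, sighting: dict) -> None:
    """Publish one sighting event (requires GCP credentials at runtime)."""
    from google.cloud import pubsub_v1  # deferred so the pure helper works offline

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic)
    # publish() returns a future; result() blocks until the broker acknowledges.
    publisher.publish(topic_path, data=encode_event(sighting)).result()
```

Because the backend only publishes and returns, a slow analytics pipeline can never make the user-facing request slow.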
The Compute Layer: Scaling to Zero with Cloud Run
At the core of the Dog Finder app is a Python Flask backend. In a traditional setup, you would provision a Virtual Machine (Compute Engine) to run this application. You’d pay for that VM 24/7, even at 3:00 AM when no one is reporting lost dogs.
Instead, we containerized the application using a standard Dockerfile and deployed it to Google Cloud Run.
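For reference, a "standard Dockerfile" for a Flask app on Cloud Run often looks like the sketch below. The gunicorn entrypoint, the `app:app` module path, and the pinned Python version are assumptions rather than the repo's actual file; Cloud Run injects the `PORT` environment variable at runtime.

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run sets $PORT; bind gunicorn to it (assumes app.py exposes `app`)
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
```

From there, a single `gcloud run deploy --source .` builds the container and deploys it, with the region and service name of your choosing.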
Cloud Run is a fully managed compute platform that automatically scales stateless containers. As an architect, enforcing statelessness is critical here. The Flask app does not store any session data or images on its local filesystem. Its only job is to act as a highly efficient traffic cop:
1. Receive the HTTP POST request (the sighting payload and image).
2. Validate the data payload.
3. Offload the heavy lifting to our specialized data services.
4. Return a success response to the user.
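The four steps above can be sketched as one thin handler. This is a hedged illustration, not the repo's actual code: the field names, the `save_image`/`write_state`/`publish_event` callables, and the validation rules are assumptions, and the cloud clients are injected as plain callables so the control flow is testable without credentials.

```python
from typing import Callable

REQUIRED_FIELDS = {"breed", "region", "sighting_date"}


def validate_sighting(payload: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    missing = REQUIRED_FIELDS - payload.keys()
    return [f"missing field: {f}" for f in sorted(missing)]


def handle_sighting(
    payload: dict,
    image_bytes: bytes,
    save_image: Callable[[bytes], str],     # -> Cloud Storage, returns image_url
    write_state: Callable[[dict], None],    # -> Firestore (operational copy)
    publish_event: Callable[[dict], None],  # -> Pub/Sub (analytics path)
) -> tuple[dict, int]:
    """Validate, offload to the data services, and answer immediately."""
    errors = validate_sighting(payload)
    if errors:
        return {"status": "error", "errors": errors}, 400
    record = {**payload, "image_url": save_image(image_bytes)}
    write_state(record)    # low-latency read path for the UI
    publish_event(record)  # async path toward BigQuery
    return {"status": "ok"}, 201
```

In the Flask route this becomes roughly a `@app.route("/sightings", methods=["POST"])` view that calls `handle_sighting(request.form.to_dict(), request.files["image"].read(), ...)` and jsonifies the result.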
If there is a massive spike in lost dog reports, Cloud Run spins up additional container instances within seconds. When traffic drops, it scales back down to zero, so we pay only for the compute time actually spent serving requests.
The Data Split: OLTP vs. OLAP in a Serverless World
This is where many rapid POCs turn into unmaintainable monoliths. A common mistake is throwing all your data—images, real-time app state, and analytical history—into a single relational database like PostgreSQL. As the application grows, database locks increase, queries slow down, and storage costs skyrocket.
To prevent this, we split the data path into three specialized lanes:
1. Cloud Storage (The Payload)
Databases are expensive places to store binary files. When a user uploads a photo of a dog, our Cloud Run app sends that file directly to a Google Cloud Storage bucket. The app then grabs the resulting image_url and uses that string for the rest of the data pipeline. This keeps our databases incredibly lean and fast.
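Here is a sketch of that upload path, assuming the google-cloud-storage client; the bucket name and object layout are made up for illustration. The URL builder is split out because the rest of the pipeline only ever sees that string.

```python
import uuid


def object_name(sighting_id: str, ext: str = "jpg") -> str:
    """One folder per sighting, random filename to avoid collisions."""
    return f"sightings/{sighting_id}/{uuid.uuid4().hex}.{ext}"


def public_url(bucket: str, name: str) -> str:
    """Canonical public URL form for a Cloud Storage object."""
    return f"https://storage.googleapis.com/{bucket}/{name}"


def upload_image(bucket_name: str, name: str, data: bytes) -> str:
    """Upload bytes and return the image_url stored downstream.

    Requires GCP credentials at runtime, hence the deferred import.
    """
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_string(data, content_type="image/jpeg")
    return public_url(bucket_name, name)
```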
2. Firestore (Operational / OLTP)
Users expect a snappy UI. When they open the app, they want to see the latest dog sightings immediately. We use Firestore (a NoSQL document database) as our operational layer. After saving the image, Cloud Run writes the sighting record to Firestore. This provides low-latency reads and writes, ensuring the web frontend feels instantaneous without running complex SQL joins.
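A hedged sketch of that operational write, assuming the google-cloud-firestore client; the collection name and record shape are illustrative. The record builder is kept pure so it can be exercised without a Firestore connection.

```python
from datetime import datetime, timezone


def build_record(payload: dict, image_url: str) -> dict:
    """Shape the document the frontend reads back."""
    return {
        **payload,
        "image_url": image_url,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def save_sighting(record: dict) -> None:
    """Write one sighting document (requires GCP credentials at runtime)."""
    from google.cloud import firestore  # deferred import

    db = firestore.Client()
    # Auto-generated document ID; the UI lists the collection ordered by created_at.
    db.collection("sightings").document().set(record)
```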
3. BigQuery (Analytical / OLAP)
While Firestore is great for the UI, it is not designed for heavy aggregations (e.g., "How many Golden Retrievers were lost in the northern region last month compared to last year?").
For this, we route the data to BigQuery. We explicitly partitioned the BigQuery table by sighting_date. This is a crucial Data Engineering standard: when analysts query the table for recent trends, BigQuery only scans the relevant partitions, drastically reducing query costs and execution time.
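In BigQuery DDL, date partitioning is a single clause. The project, dataset, and schema below are illustrative guesses at the sighting record, not the repo's actual table; the point is the PARTITION BY line, which is what limits how many bytes a date-filtered query has to scan.

```python
# Table DDL: partitioned by the DATE column, so queries that filter on
# sighting_date only touch the matching daily partitions.
DDL = """
CREATE TABLE IF NOT EXISTS `my-project.dog_finder.sightings` (
  breed         STRING,
  region        STRING,
  image_url     STRING,
  sighting_date DATE
)
PARTITION BY sighting_date
"""

# A typical trend query: scans only the last 30 daily partitions.
TREND_QUERY = """
SELECT region, COUNT(*) AS sightings
FROM `my-project.dog_finder.sightings`
WHERE sighting_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY region
ORDER BY sightings DESC
"""


def run(sql: str):
    """Execute a statement (requires GCP credentials at runtime)."""
    from google.cloud import bigquery  # deferred import

    return bigquery.Client().query(sql).result()
```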
The Payoff: Visualizing the Data (Looker Studio)
An architecture is only as good as the insights it delivers. The real payoff of this decoupled, partitioned setup became obvious when I wanted to add a visualization layer.
Because we cleanly separated our operational state (Firestore) from our analytical history (BigQuery), I was able to connect Looker Studio directly to the BigQuery table in minutes. I didn't have to worry about complex API integrations or degrading the performance of the live web app.
I created a real-time dashboard that plots the sightings by region. As new records flow through the serverless pipeline, the dashboard updates automatically, providing a live heat map of lost dog hotspots. This transforms the POC from a simple "data entry" app into a complete, end-to-end data product.
Conclusion & What’s Next
In this first part, we laid out the "boxes" of our architecture. By leveraging Cloud Run, Cloud Storage, Firestore, and BigQuery, we designed a system that scales instantly, costs nothing when idle, and keeps operational and analytical workloads cleanly separated.
But having the right boxes is only half the battle. How do we connect them reliably?
In Part 2, we will dive into the lines connecting the boxes. I will show you how to use Pub/Sub to fully decouple ingestion, how to set up a direct serverless subscription from Pub/Sub to BigQuery (no code required), and how to enforce strict Data Contracts so your beautiful data warehouse doesn’t turn into a data swamp.
Originally published on DEV Community: https://dev.to/gde/building-a-production-ready-serverless-app-on-google-cloud-part-1-architecture-49d
