Edge Caching Strategies to Cut Latency and Cost
-
Why edge caching changes the latency equation
-
Cache-Control and TTL patterns to make behavior predictable
-
Surrogate keys and targeted invalidation workflows
-
Measuring cache ROI and controlling cost
-
A practical checklist and runbook for edge cache policies
-
Sources
Edge caching is the fastest, cheapest lever you have to cut user-visible latency; misconfigured caching is the stealthiest source of both poor UX and runaway origin cost. I draw on running high-traffic edge platforms to give you exact patterns—Cache-Control composition, sensible TTLs, stale-while-revalidate, and surrogate-key invalidation—that move latency off the critical path and shrink bills.
You see this in audits: spikes in P95/P99 latency that coincide with cache misses, dashboards that show rising origin RPS, teams purging entire CDNs after content updates, and exploding numbers of cache keys because headers and query strings vary unpredictably. Those symptoms are operational signals: cache exists, but it isn’t shaping application behavior, and the result is poor UX plus avoidable origin cost.
Why edge caching changes the latency equation
Edge caches collapse geographic and network distance. Serving the same object from a nearby POP instead of the origin reduces round-trip time dramatically and removes origin compute from the request path for cache hits. The proportion of requests served from edge caches—cache hit ratio—directly controls origin load and therefore both latency tail behavior and egress bills.
Cache-key design comes first: every header, cookie, or query parameter you include in the cache key fragments the cache and reduces hit ratio. Shared-cache directives like s-maxage let you treat the CDN differently from the browser, which is how you get the best of both: long-lived edge responses with conservative browser revalidation.
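A minimal Python sketch of cache-key normalization makes the fragmentation point concrete. It assumes a hypothetical allowlist (`page`, `sort`) of query parameters that actually change the response. Everything else, such as `utm_source` or `fbclid`, is dropped, and the surviving parameters are sorted so their order cannot create new keys. CDNs expose this through their own cache-key settings, so treat `normalize_cache_key` as an illustration of the rule, not a vendor API.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical allowlist: only these query parameters change the response body.
ALLOWED_PARAMS = {"page", "sort"}

def normalize_cache_key(url: str, allowed: set[str] = ALLOWED_PARAMS) -> str:
    """Build a cache key from host + path + allowlisted, sorted query params."""
    parts = urlsplit(url)
    # Keep only allowlisted params; tracking params and session ids are dropped.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in allowed]
    kept.sort()  # parameter order must not create distinct keys
    query = urlencode(kept)
    host = parts.hostname or ""  # urlsplit().hostname is already lowercased
    key = f"{host}{parts.path or '/'}"
    return f"{key}?{query}" if query else key

urls = [
    "https://shop.example.com/shoes?sort=price&page=2&utm_source=mail",
    "https://shop.example.com/shoes?page=2&sort=price",
    "https://SHOP.example.com/shoes?page=2&sort=price&fbclid=abc",
]
# All three variants collapse to a single cache entry.
print({normalize_cache_key(u) for u in urls})  # {'shop.example.com/shoes?page=2&sort=price'}
```

The design choice is an allowlist rather than a denylist: a new tracking parameter then costs nothing, instead of silently multiplying the number of cache keys.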
Important: small, repeatable improvements in hit ratio compound. Moving from a 70% to an 85% edge hit ratio cuts the share of requests that reach the origin from 30% to 15%, halving origin traffic, and it reduces tail latency for the user cohorts that matter most.
Measure and segment hit ratio by URL prefixes, by client region, and by device type so you know where fragmentation happens. Treat the cache key the way you treat authentication logic: explicit, reviewed, and instrumented.
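A minimal Python sketch of that segmentation, assuming log records have already been parsed into dicts with hypothetical fields `path`, `region`, and `cache_status` (normalized to `HIT` or `MISS`). Field names and status values differ across CDN log formats, so the parsing step is left out.

```python
from collections import defaultdict

def hit_ratio_by(records, segment_fn):
    """Return {segment: hit_ratio} for an iterable of parsed log records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        seg = segment_fn(rec)
        totals[seg] += 1
        if rec["cache_status"] == "HIT":
            hits[seg] += 1
    return {seg: hits[seg] / totals[seg] for seg in totals}

def url_prefix(rec, depth=1):
    """Segment by the first `depth` path components, e.g. /products."""
    parts = [p for p in rec["path"].split("/") if p]
    return "/" + "/".join(parts[:depth])

logs = [
    {"path": "/products/62952", "region": "eu", "cache_status": "HIT"},
    {"path": "/products/62952", "region": "us", "cache_status": "HIT"},
    {"path": "/products/1001", "region": "us", "cache_status": "MISS"},
    {"path": "/search", "region": "eu", "cache_status": "MISS"},
]

print(hit_ratio_by(logs, url_prefix))             # segment by URL prefix
print(hit_ratio_by(logs, lambda r: r["region"]))  # segment by client region
```

Swapping the segment function (device type, status code, cache-key variant) is how you find where fragmentation actually happens.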
Cache-Control and TTL patterns to make behavior predictable
Get deliberate with Cache-Control. The directives you pick are your contract with every cache in the path:
-
max-age controls client-side freshness.
-
s-maxage overrides max-age for shared caches (CDNs), letting you decouple browser and edge lifetimes.
-
stale-while-revalidate and stale-if-error allow controlled staleness while hiding origin latency or failures. Both are defined in RFC 5861: stale-while-revalidate lets a cache serve a stale response immediately while revalidation happens in the background, and stale-if-error lets it serve a stale response when the origin returns an error.
-
immutable is useful for fingerprinted assets to tell caches that the response never changes until its URL does.
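These directives compose mechanically, so one option is to generate headers from named parameters instead of hand-typing strings in every service. The Python helper below is a hypothetical sketch (the name `cache_control` and its keyword arguments are illustrative, not a library API) that reproduces the header patterns used in this article.

```python
def cache_control(*, public=True, max_age=None, s_maxage=None,
                  stale_while_revalidate=None, stale_if_error=None,
                  immutable=False) -> str:
    """Compose a Cache-Control value from the directives discussed above."""
    directives = ["public"] if public else []
    for name, value in (
        ("max-age", max_age),                 # browser (and any cache) freshness
        ("s-maxage", s_maxage),               # shared caches only; overrides max-age
        ("stale-while-revalidate", stale_while_revalidate),
        ("stale-if-error", stale_if_error),
    ):
        if value is not None:
            if value < 0:
                raise ValueError(f"{name} must be non-negative")
            directives.append(f"{name}={int(value)}")
    if immutable:
        directives.append("immutable")
    return ", ".join(directives)

# Fingerprinted assets: one year, no revalidation while fresh.
print(cache_control(max_age=31536000, immutable=True))
# Edge-first HTML: browser copy stale at once, edge keeps it 60 s.
print(cache_control(max_age=0, s_maxage=60, stale_while_revalidate=30))
```

Centralizing composition also gives you one place to review and test policy changes, in the same spirit as treating the cache key like authentication logic.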
Practical header patterns (examples):
```
# Fingerprinted/static assets
Cache-Control: public, max-age=31536000, immutable

# HTML or SSR pages (edge-first; max-age=0 makes the browser copy stale immediately)
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30

# API responses that tolerate short staleness
Cache-Control: public, max-age=5, s-maxage=30, stale-while-revalidate=10, stale-if-error=86400
```
Use s-maxage for edge-first behavior and max-age for what clients should keep locally. Use stale-while-revalidate so requests do not block during the revalidation window: the cache returns the stale copy while a background validation runs. On CDNs that coalesce concurrent revalidations, this also collapses a burst of traffic into a single origin fetch.
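To make the timing concrete, here is a simplified Python model of how a shared cache might apply s-maxage, stale-while-revalidate, and stale-if-error to one cached object, following RFC 5861 semantics. It is not any CDN's implementation: it ignores validators (ETag/Last-Modified), request-side directives, and request collapsing, and the `Action` names are hypothetical.

```python
from enum import Enum

class Action(Enum):
    SERVE_FRESH = "serve from cache"
    SERVE_STALE_AND_REVALIDATE = "serve stale, revalidate in background"
    FETCH_FROM_ORIGIN = "block on origin fetch"
    SERVE_STALE_ON_ERROR = "serve stale because origin failed"
    ERROR = "return origin error"

def edge_decision(age: int, s_maxage: int, swr: int = 0, sie: int = 0,
                  origin_healthy: bool = True) -> Action:
    """Decide how a shared cache handles a request for an object of `age` seconds."""
    if age < s_maxage:
        return Action.SERVE_FRESH
    staleness = age - s_maxage
    if staleness < swr:
        # Inside the swr window: the user gets the stale copy immediately
        # and revalidation happens off the critical path.
        return Action.SERVE_STALE_AND_REVALIDATE
    if origin_healthy:
        # Past the swr window: this request waits on the origin.
        return Action.FETCH_FROM_ORIGIN
    if staleness < sie:
        # Origin failed, but stale-if-error still covers this object.
        return Action.SERVE_STALE_ON_ERROR
    return Action.ERROR

# Edge-first HTML pattern: s-maxage=60, stale-while-revalidate=30.
for age in (30, 75, 120):
    print(age, edge_decision(age, 60, swr=30).name)
```

With the API pattern (stale-if-error=86400), an origin outage at age 120 would return `SERVE_STALE_ON_ERROR` instead of `ERROR`, which is the behavior the urgent-invalidation runbook relies on.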
Contrarian operational insight: prefer a slightly longer shared-cache TTL with a short browser TTL and targeted invalidation, rather than short TTLs everywhere. Short TTLs shift cost and unpredictability back to your origin; targeted invalidation (surrogate keys / tags) preserves freshness without paying for constant origin traffic.
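A rough way to see why short TTLs push cost back to the origin: an object that stays hot in every POP gets refetched (or revalidated) by each POP roughly once per TTL. The Python sketch below assumes independent POP caches with no shield tier and a hypothetical count of 50 POPs; real numbers depend on traffic distribution and features like Origin Shield.

```python
SECONDS_PER_DAY = 86_400

def daily_origin_fetches(ttl_seconds: int, pops: int, hot_objects: int = 1) -> int:
    """Approximate origin fetches/day if every POP refreshes each hot object once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    # Ceiling division: a partial TTL window still costs one fetch.
    refreshes_per_pop = -(-SECONDS_PER_DAY // ttl_seconds)
    return refreshes_per_pop * pops * hot_objects

for ttl in (60, 3600):
    print(ttl, daily_origin_fetches(ttl, pops=50))
```

Going from a 60-second to a one-hour shared TTL cuts this estimate from 72,000 to 1,200 fetches per object per day; targeted invalidation then restores freshness only where content actually changed. If all POPs fetch through a single shield, the effective `pops` value for origin traffic approaches 1.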
Surrogate keys and targeted invalidation workflows
When you need freshness on updates, avoid “purge everything.” Tag related responses at the origin so you can invalidate narrowly. Two common implementations:
-
Fastly-style Surrogate-Key headers that index responses against keys at the edge; you purge by key via API.
-
Cloudflare-style Cache-Tag headers that let you purge by tag (or purge by prefix/host for other use cases).
Example: tag a product page and all listing pages that include it:
```
Cache-Control: max-age=86400
Surrogate-Key: product-62952 category-shoes
```
Purge-by-key examples (illustrative curl requests):
```bash
# Fastly - batch surrogate-key purge (JSON body)
curl -X POST "https://api.fastly.com/service//purge" \
  -H "Fastly-Key: ${FASTLY_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"surrogate_keys":["product-62952","category-shoes"]}'

# Cloudflare - purge by tag
curl -X POST "https://api.cloudflare.com/client/v4/zones//purge_cache" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"tags":["product-62952","category-shoes"]}'
```
Operational considerations and limits: surrogate/tag headers have size limits and practical key-count limits; large, unbounded sets of tags cause header bloat and parsing problems. Fastly documents header-length limits and Cloudflare documents tag-size/aggregate limits—design keys to be short, stable, and namespaced.
Design rules that have worked repeatedly in large systems:
-
Use composite, normalized keys (e.g., product:62952) rather than embedding free text.
-
Tag both canonical URLs and the derived representations (e.g., mobile/desktop variants) so you can invalidate a single logical object.
-
Emit tags from the origin at render time to keep tagging consistent and avoid prerendering mistakes.
-
Batch and throttle purge API calls from CMS/webhooks to avoid rate-limit cliffs and origin storms.
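Following those rules, a render-time helper can normalize keys and emit both vendor headers from one list. This is a minimal Python sketch under stated assumptions: Fastly's Surrogate-Key header is space-separated and Cloudflare's Cache-Tag header is comma-separated, while `MAX_HEADER_BYTES` is a hypothetical budget you should set from your vendor's documented limits. The helper names are illustrative.

```python
import re

MAX_HEADER_BYTES = 16_384  # hypothetical budget; check your CDN's documented limit

def normalize_key(kind: str, ident) -> str:
    """Compose a short, namespaced key such as 'product:62952'."""
    kind = re.sub(r"[^a-z0-9_-]", "", kind.lower())
    ident = re.sub(r"[^A-Za-z0-9_-]", "", str(ident))
    if not kind or not ident:
        raise ValueError("empty key component after normalization")
    return f"{kind}:{ident}"

def tag_headers(keys: list[str]) -> dict[str, str]:
    """Emit Surrogate-Key (space-separated) and Cache-Tag (comma-separated)."""
    unique = sorted(set(keys))  # stable order, no duplicates
    surrogate = " ".join(unique)
    if len(surrogate.encode()) > MAX_HEADER_BYTES:
        # Fail loudly at render time instead of letting the edge drop tags.
        raise ValueError(f"{len(unique)} keys exceed the header budget")
    return {"Surrogate-Key": surrogate, "Cache-Tag": ",".join(unique)}

# A product page tagged with its own key plus the category listing it appears in.
keys = [normalize_key("product", 62952), normalize_key("Category", "shoes"),
        normalize_key("product", 62952)]
print(tag_headers(keys))
```

Deduplicating and sorting keeps the header stable across renders, which makes tagging diffs reviewable and keeps header size predictable.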
Measuring cache ROI and controlling cost
Measurement is where caching goes from "hope" to "ROI." Track these baseline metrics with daily resolution: edge hit ratio, origin requests per second (RPS), origin egress (GB), average object size, and latency percentiles (P50/P95/P99).
Compute a simple monthly savings estimate:
-
Baseline origin egress (GB) = origin requests * average payload size (GB)
-
Estimated saved egress = Baseline * fractional reduction in origin requests (raising the hit ratio from h0 to h1 cuts origin requests by (h1 - h0) / (1 - h0))
-
Cost savings = Estimated saved egress * origin egress price per GB
Example calculation (illustrative):
-
10 million monthly origin requests, average payload 50 KB → ~476 GB baseline (with 1 GB = 1,048,576 KB)
-
Increase hit ratio so origin requests fall by 20% → ~95 GB saved
-
At $0.09/GB, the monthly saving is ≈ $8.55. Savings scale linearly with request volume and payload size, so 10× the requests at 10× the payload would save roughly 100× as much (≈ $855/month).
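The same arithmetic as a small Python sketch. It assumes binary units (KiB and GiB), which is how 10 million × 50 KB comes out at roughly 476 GB; without intermediate rounding the saving is about $8.58 rather than the $8.55 obtained from the rounded 95 GB. It also converts a hit-ratio change into the fractional drop in origin requests, because a hit-ratio delta and an origin-traffic reduction are not the same number.

```python
def origin_reduction(old_hit_ratio: float, new_hit_ratio: float) -> float:
    """Fraction by which origin requests fall when the edge hit ratio improves."""
    old_miss = 1.0 - old_hit_ratio
    if old_miss <= 0:
        raise ValueError("old hit ratio must be below 1.0")
    return (new_hit_ratio - old_hit_ratio) / old_miss

def monthly_egress_savings(origin_requests: int, avg_payload_kib: float,
                           reduction: float, price_per_gib: float):
    """Return (baseline GiB, saved GiB, saved dollars) per the formula above."""
    baseline_gib = origin_requests * avg_payload_kib / (1024 * 1024)  # KiB -> GiB
    saved_gib = baseline_gib * reduction
    return baseline_gib, saved_gib, saved_gib * price_per_gib

baseline, saved, dollars = monthly_egress_savings(10_000_000, 50, 0.20, 0.09)
print(f"{baseline:.1f} GiB baseline, {saved:.1f} GiB saved, ${dollars:.2f}/month")
print(f"70% -> 85% hit ratio cuts origin requests by {origin_reduction(0.70, 0.85):.0%}")
```

Feed `origin_reduction` into `monthly_egress_savings` to go straight from a target hit ratio to a dollar estimate.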
Also track business-impact metrics: conversion rate by geography and median time-to-first-byte for pages that are most visible to customers. Use these to prioritize which cache policies to tighten or which parts to tag.
Quick comparison table of TTL patterns and trade-offs:
| Pattern | Typical use | Edge TTL example | Browser TTL example | Benefit | Risk |
|---|---|---|---|---|---|
| Fingerprinted static | JS/CSS/images with content hash | max-age=31536000 | max-age=31536000, immutable | Maximizes cache efficiency | None if fingerprinting is correct |
| Edge-first HTML | Pages that tolerate short staleness | s-maxage=60, stale-while-revalidate=30 | max-age=0 | Low P95 latency; controlled freshness | Short staleness window if revalidation fails |
| API short-stale | Read-heavy APIs tolerant of slight staleness | s-maxage=30, stale-while-revalidate=10 | max-age=0 | Reduced origin RPS | Staleness must be acceptable |
| No-store/private | Authenticated or sensitive data | no-store | no-store | Keeps sensitive data out of caches | Always origin-bound → higher latency/cost |
CDN vendors themselves (Amazon CloudFront, for example) document the direct relationship between cache hit ratio and origin requests, and recommend policies like s-maxage plus revalidation and features like Origin Shield to reduce origin fetches. Use those vendor signals to prioritize changes.
A practical checklist and runbook for edge cache policies
Checklist — audit and baseline (first 72 hours)
-
Collect last 30 days of logs: edge hit ratio, origin RPS, top 1,000 origin-requested URLs, average payload sizes by URL.
-
Identify top contributors to origin traffic and rank by business impact (revenue, pageviews).
-
Classify content into buckets: fingerprinted static, semi-static (catalog pages), dynamic per-user, and APIs.
-
Map current Cache-Control settings and cache-key dimensions (query strings, headers, cookies).
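For the first two audit steps, a minimal Python sketch that ranks URLs by bytes pulled from origin and reports the average payload size per URL. It assumes log records already parsed into dicts with hypothetical fields `url`, `cache_status` (normalized to `HIT` or `MISS`), and `bytes`; real CDN logs use vendor-specific field names and may need sampling corrections.

```python
from collections import Counter

def top_origin_contributors(records, n=1000):
    """Return [(url, origin_requests, origin_bytes, avg_bytes)] sorted by origin bytes."""
    origin_requests = Counter()
    origin_bytes = Counter()
    for rec in records:
        if rec["cache_status"] != "HIT":  # assumed: every non-HIT went to origin
            origin_requests[rec["url"]] += 1
            origin_bytes[rec["url"]] += rec["bytes"]
    ranked = sorted(origin_bytes, key=lambda u: origin_bytes[u], reverse=True)
    return [(u, origin_requests[u], origin_bytes[u], origin_bytes[u] / origin_requests[u])
            for u in ranked[:n]]

logs = [
    {"url": "/products/62952", "cache_status": "MISS", "bytes": 120_000},
    {"url": "/products/62952", "cache_status": "MISS", "bytes": 120_000},
    {"url": "/search?q=shoes", "cache_status": "MISS", "bytes": 40_000},
    {"url": "/app.3f9a.js", "cache_status": "HIT", "bytes": 300_000},
]
for row in top_origin_contributors(logs):
    print(row)
```

Joining this list with revenue or pageview data gives the business-impact ranking the checklist asks for.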
Checklist — policy rollout
-
For fingerprinted assets: deploy Cache-Control: public, max-age=31536000, immutable.
-
For semi-static pages: set s-maxage with stale-while-revalidate and tag responses with Surrogate-Key/Cache-Tag.
-
Implement purge-by-key hooks in the CMS or content pipeline; batch and rate-limit the purge calls.
-
Add monitoring: dashboards for hit ratio, origin RPS, egress GB, and latency. Set alerts for sudden drops in hit ratio or quick RPS increases.
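The alert conditions can start as simple comparisons against a trailing baseline. In this Python sketch, the thresholds (a 5-point hit-ratio drop, 1.5x origin RPS growth) are hypothetical defaults to tune against your own traffic, and the history lists are assumed to be non-empty; in production this logic usually lives in your monitoring system's alert rules.

```python
from statistics import mean

def cache_alerts(hit_ratio_history, current_hit_ratio, rps_history, current_rps,
                 max_hit_drop=0.05, max_rps_growth=1.5):
    """Return alert messages comparing the current window to a trailing baseline."""
    alerts = []
    baseline_hit = mean(hit_ratio_history)
    if baseline_hit - current_hit_ratio > max_hit_drop:
        alerts.append(f"hit ratio fell from {baseline_hit:.1%} to {current_hit_ratio:.1%}")
    baseline_rps = mean(rps_history)
    if baseline_rps > 0 and current_rps / baseline_rps > max_rps_growth:
        alerts.append(f"origin RPS rose {current_rps / baseline_rps:.1f}x over baseline")
    return alerts

print(cache_alerts([0.86, 0.85, 0.87], 0.78, [200, 210, 190], 420))
```

Alerting on both signals together helps separate cache fragmentation (hit ratio drops) from a plain traffic surge (RPS rises with a steady hit ratio).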
Runbook — urgent invalidation (step-by-step)
-
Identify the minimal set of keys/tags affected by the change (product IDs, page slugs).
-
Issue a targeted purge-by-key or purge-by-tag call using the documented API (use batch where possible).
-
Verify the purge by requesting representative URLs and examining edge headers (e.g., X-Cache, CF-Cache-Status, Fastly-Debug): the first request should be a MISS, and later requests should be a HIT once the cache refills.
-
Monitor origin RPS and CPU. When origin traffic rises unexpectedly, pause non-critical purge batches and allow the cache to refill gradually.
-
If rollback is necessary, serve stale content while revalidations stabilize. This only works if critical endpoints already send stale-while-revalidate and stale-if-error on their cached responses, so enable them before an incident.
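The header check in the verification step can be scripted. This Python sketch takes response headers you have already captured (for example with `curl -sI`) and decides whether a URL went MISS then HIT after a purge. The mapping is a simplification under assumptions: it reads only `CF-Cache-Status` and `X-Cache`, treats a multi-valued `X-Cache` (one entry per cache layer) as a miss if any layer missed, and classifies everything else (for example `EXPIRED`, `BYPASS`, `REVALIDATED`) as `OTHER`.

```python
def cache_state(headers: dict[str, str]) -> str:
    """Map a response's cache-status header to 'HIT', 'MISS', or 'OTHER' (simplified)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("cf-cache-status") or lowered.get("x-cache")
    if not raw:
        return "OTHER"
    # First word of each comma-separated layer, e.g. "Miss from cloudfront" -> "MISS".
    tokens = [part.split()[0].upper() for part in raw.split(",") if part.strip()]
    if "MISS" in tokens:
        return "MISS"
    if tokens and all(t == "HIT" for t in tokens):
        return "HIT"
    return "OTHER"

def purge_verified(states: list[str]) -> bool:
    """True if the first post-purge request missed and a later request hit."""
    return bool(states) and states[0] == "MISS" and "HIT" in states[1:]

captured = [{"X-Cache": "MISS, MISS"}, {"X-Cache": "HIT, HIT"}]
print(purge_verified([cache_state(h) for h in captured]))
```

Because one client usually reaches one POP, repeat the check from several regions before declaring the purge complete.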
Automations and safety nets
-
Implement a purge queue that enforces per-minute quotas and exponential backoff on repeated failures.
-
Emit purge audits (who triggered, keys, timestamp) to a centralized log for post-mortem and cost allocation.
-
Use feature flags or percentage rollouts when changing cache-key composition or a global TTL policy.
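A minimal Python sketch of such a purge queue, which also covers the batching rule from the invalidation section. `send` is a hypothetical callable you would wire to your CDN's purge-by-key or purge-by-tag API; it is expected to raise on failure (for example on a rate-limit response). The defaults (256 keys per batch, 60 calls per minute, 5 retries) are assumptions, not vendor limits, and `clock` and `sleep` are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque

class PurgeQueue:
    """Hypothetical purge queue: dedupes keys, batches them, enforces a
    per-minute call quota, and retries failed calls with exponential backoff."""

    def __init__(self, send, batch_size=256, calls_per_minute=60,
                 max_retries=5, base_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.send = send  # callable taking a list of keys; raises on failure
        self.batch_size = batch_size
        self.calls_per_minute = calls_per_minute
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.clock = clock
        self.sleep = sleep
        self.pending = {}          # dict used as an insertion-ordered set of keys
        self.call_times = deque()  # timestamps of calls in the last 60 s

    def enqueue(self, keys):
        for key in keys:
            self.pending.setdefault(key, None)

    def _wait_for_quota(self):
        while True:
            now = self.clock()
            # Drop timestamps that have left the sliding 60 s window.
            while self.call_times and now - self.call_times[0] >= 60:
                self.call_times.popleft()
            if len(self.call_times) < self.calls_per_minute:
                self.call_times.append(now)
                return
            # Quota exhausted: wait until the oldest call ages out of the window.
            self.sleep(60 - (now - self.call_times[0]))

    def flush(self):
        """Send all pending keys; return the number of API calls attempted."""
        calls = 0
        while self.pending:
            batch = list(self.pending)[: self.batch_size]
            for attempt in range(self.max_retries + 1):
                self._wait_for_quota()
                calls += 1
                try:
                    self.send(batch)
                    break
                except Exception:
                    if attempt == self.max_retries:
                        raise  # persistent failure: surface it; keys stay pending
                    self.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
            for key in batch:
                del self.pending[key]
        return calls

sent = []
queue = PurgeQueue(send=sent.append, batch_size=2)
queue.enqueue(["product:62952", "category:shoes", "product:62952", "product:1001"])
print(queue.flush())  # 2
print(sent)  # [['product:62952', 'category:shoes'], ['product:1001']]
```

Emitting an audit record (who, keys, timestamp) from inside `send` is a natural place to add the purge-audit safety net.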
Start with a short list of high-impact pages: get measurable hit-ratio improvement for those pages, observe origin egress change, then scale your policies. The work is incremental; measurable improvements come quickly when you stop fragmenting the cache and start invalidating surgically.
Sources
Cache-Control - HTTP | MDN Web Docs - Reference for Cache-Control, s-maxage, immutable, no-store, and practical examples of header composition.
RFC 5861 — HTTP Cache-Control Extensions for Stale Content - Formal specification of stale-while-revalidate and stale-if-error, with behavior expectations for caches.
Keeping things fresh with stale-while-revalidate | web.dev - Practical guidance and trade-offs for stale-while-revalidate on web applications.
Surrogate-Key | Fastly Documentation - Explanation of the Surrogate-Key header, indexing, purging by key, and header-size limits.
Purge cache by cache-tags · Cloudflare Cache (CDN) docs - Details on Cache-Tag usage, purge-by-tag workflow, limits, and API examples.
Increase the proportion of requests that are served directly from the CloudFront caches (cache hit ratio) - Amazon CloudFront Documentation - Definitions of cache hit ratio, advice on increasing hit ratio, and origin-cost reduction mechanisms.