I Built Consistent Hashing From Scratch in Go — Here's What I Learned

DEV Community · by Mustafa Veysi Soyvural · April 2, 2026 · 6 min read


If you've ever added a server to a cache cluster and watched your database melt, you already know the problem consistent hashing solves. You just might not know it by name.

I built a full implementation from scratch in Go to understand it deeply. This post walks through what I learned — the problem, the fix, and the gotchas nobody tells you about.

The five-minute version

You have 5 cache servers. You route keys with hash(key) % 5. Life is good.

Then traffic spikes and you add a 6th server. Now it's hash(key) % 6. Sounds harmless, right?

Here's what actually happens:

Before: hash("user:1001") % 5 = 3 → Server C
After:  hash("user:1001") % 6 = 1 → Server A  ← moved!


That key was sitting happily on Server C. Now every client thinks it's on Server A, where it doesn't exist. Cache miss. The request hits your database.

And it's not just one key. ~83% of all keys remap. With 100K cached items, that's 83,000 simultaneous cache misses hammering your database. People call this a "thundering herd" or "cache stampede." I call it a bad Monday.
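You can reproduce that ~83% figure in a few lines. This is my own sketch of the measurement, not the author's code; the key format `user:i` and the FNV-1a hash are illustrative choices:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash32 returns a 32-bit FNV-1a hash of the key.
func hash32(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// remapFraction counts how many of n keys land on a different
// server when the cluster grows from 'from' to 'to' servers
// under plain modulo routing.
func remapFraction(n, from, to int) float64 {
	moved := 0
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("user:%d", i)
		if hash32(key)%uint32(from) != hash32(key)%uint32(to) {
			moved++
		}
	}
	return float64(moved) / float64(n)
}

func main() {
	// Going from 5 to 6 servers remaps roughly 5/6 ≈ 83% of keys:
	// a key stays put only if hash % 5 == hash % 6.
	fmt.Printf("remapped: %.1f%%\n", remapFraction(100_000, 5, 6)*100)
}
```

The theory matches: a key survives the resize only when `hash mod 30` falls in {0..4}, so 5/6 of keys move.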

What consistent hashing actually does

Instead of hash % N, imagine putting everything on a circle — a ring from 0 to 2^32.

Nodes get hashed onto the ring. Keys get hashed onto the ring. To find which node owns a key, you start at the key's position and walk clockwise until you bump into a node.

0 ──────────────────────────────── 2^32 (wraps back to 0)
│
├── Node-C
├── "user:1001"  ── walks clockwise ──► lands on Node-A
├── Node-A
└── Node-B


When you add a node, it only takes over the arc between itself and the previous node. Everything else stays put. The math works out to roughly K/N keys moving (K = total keys, N = new node count). So with 100K keys and 6 nodes, about 17K keys move instead of 83K.
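The clockwise walk is just a binary search over sorted positions. Here's a minimal sketch of the idea; the type and method names are illustrative, not the repo's API:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: a sorted slice of
// positions plus a position → node map.
type Ring struct {
	positions []uint32
	owner     map[uint32]string
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// Add hashes the node onto the ring and keeps positions sorted.
func (r *Ring) Add(node string) {
	p := hash32(node)
	r.owner[p] = node
	r.positions = append(r.positions, p)
	sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
}

// Get walks clockwise from the key's position to the first node,
// wrapping around past the top of the ring.
func (r *Ring) Get(key string) string {
	if len(r.positions) == 0 {
		return "" // a real implementation should return an error here
	}
	p := hash32(key)
	i := sort.Search(len(r.positions), func(i int) bool { return r.positions[i] >= p })
	if i == len(r.positions) { // past the last node: wrap to position 0
		i = 0
	}
	return r.owner[r.positions[i]]
}

func main() {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range []string{"node-a", "node-b", "node-c"} {
		r.Add(n)
	}
	fmt.Println(r.Get("user:1001"))
}
```

Adding a node inserts one new position; only keys in the arc just before it change owners, which is exactly the K/N behavior.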

I measured this in my implementation:

Strategy         │ Keys Remapped (5→6 nodes, 100K keys)
─────────────────┼─────────────────────────────────────
Modulo (hash%N)  │ 83,803 (83.8%)
Consistent Hash  │ 15,723 (15.7%)


That's not a marginal improvement. That's the difference between "the site went down" and "nobody noticed."

The virtual node trick

There's a catch with the basic approach. If you have 5 nodes, you have 5 points on the ring. And 5 points on a circle don't spread evenly — some nodes end up owning huge arcs while others get almost nothing.

I measured this too. With 5 nodes and 50K keys, one node got 32,014 keys while another got 1,855. That's a 17:1 ratio. Terrible.

The fix is virtual nodes. Instead of putting each physical node at one position, you put it at 150 positions. Node-A becomes Node-A#0, Node-A#1, ... Node-A#149, each hashed to a different spot on the ring.

Here's what that does to the distribution:

Virtual Nodes │ Std Deviation │ Worst Node (ratio to fair share)
──────────────┼───────────────┼─────────────────────────────────
            1 │        11,353 │ 3.20x
           50 │         4,601 │ 1.83x
          150 │         2,824 │ 1.47x
          500 │           976 │ 1.17x

150 is the sweet spot most production systems use. Good enough distribution, not too much memory overhead.

What happens when a node dies

This is where it gets interesting for real-world systems.

I built a cache simulator on top of the hash ring — basically a miniature distributed Memcached. Five "Redis nodes," 10,000 keys. Then I killed one.

redis-1  2,197 keys  ██████████████████████
redis-2  1,731 keys  █████████████████
redis-3  1,559 keys  ███████████████
redis-4  1,730 keys  █████████████████
redis-5  2,783 keys  ████████████████████████████

redis-3 goes down!

Cache hit rate: 84.4%
Cache misses: 1,559 (exactly the keys redis-3 had)


Exactly 1,559 misses — the precise number of keys that were on redis-3. Zero keys from other nodes were affected. That's the whole point: the blast radius of a failure is limited to the failed node.

In production, you'd replicate each key to the next 2-3 nodes on the ring, so even that node's keys survive. But that's a story for another post.

Things I got wrong the first time

Hash collisions between virtual nodes. With 150 vnodes per physical node and a 32-bit hash space, collisions are rare but not impossible. My first version silently overwrote one node's position with another's, which corrupted the ring in subtle ways. The fix: check before inserting and skip collisions.
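The guard is a one-line map check before insert. A hypothetical sketch of that fix (helper name is mine, not the repo's):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// addVnodes inserts vnode positions for a node, skipping (rare)
// 32-bit hash collisions instead of silently overwriting the
// existing owner, which would corrupt the ring.
func addVnodes(owner map[uint32]string, node string, replicas int) (skipped int) {
	for i := 0; i < replicas; i++ {
		p := hash32(fmt.Sprintf("%s#%d", node, i))
		if _, taken := owner[p]; taken {
			skipped++ // another vnode already owns this position
			continue
		}
		owner[p] = node
	}
	return
}

func main() {
	owner := map[uint32]string{}
	addVnodes(owner, "node-a", 150)
	addVnodes(owner, "node-b", 150)
	fmt.Println("positions:", len(owner))
}
```

Skipping a collision costs one vnode out of 150, which barely affects distribution; overwriting costs correctness.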

Locking granularity. The ring needs a sync.RWMutex because reads (key lookups) vastly outnumber writes (node additions/removals). My first version used a regular Mutex and paid for it under concurrent load.
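The read/write split looks like this in miniature. A minimal sketch of the locking pattern, not the author's full ring:

```go
package main

import (
	"fmt"
	"sync"
)

// Ring guards its state with an RWMutex: many concurrent lookups
// can hold the read lock in parallel, while rare Add/Remove calls
// take the exclusive write lock.
type Ring struct {
	mu    sync.RWMutex
	nodes map[string]bool
}

func (r *Ring) Add(node string) {
	r.mu.Lock() // exclusive: blocks readers and writers
	defer r.mu.Unlock()
	r.nodes[node] = true
}

func (r *Ring) Has(node string) bool {
	r.mu.RLock() // shared: readers don't block each other
	defer r.mu.RUnlock()
	return r.nodes[node]
}

func main() {
	r := &Ring{nodes: map[string]bool{}}
	r.Add("node-a")
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ { // 20 concurrent readers, no contention
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = r.Has("node-a")
		}()
	}
	wg.Wait()
	fmt.Println(r.Has("node-a"))
}
```

With a plain `sync.Mutex`, those 20 readers would serialize; with `RWMutex` they all proceed at once, which is the right trade-off when lookups dominate.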

Silent error swallowing. When the ring is empty, GetNode returns an error. My first version of the cache simulator just ignored that error and kept going, silently dropping writes. In a real system, that's how you get data loss that takes hours to diagnose.

Testing the hard stuff

Unit tests are straightforward. The interesting part is testing the mathematical guarantees under chaos.

I wrote five "catastrophic tests" that simulate production nightmares:

Mass failure: Remove 5 of 10 nodes simultaneously. Verify that only the dead nodes' keys remapped. Result: 0 keys from surviving nodes moved. Perfect isolation.

Rapid churn: 20 random add/remove cycles. Each individual operation should remap roughly K/N keys. Result: every cycle stayed within 1.5x of the theoretical bound.

Concurrent chaos: 20 goroutines doing lookups while 5 goroutines add and remove nodes for 2 seconds straight, with Go's race detector on. Result: 2.4M reads, zero data races, 0% error rate.

These aren't pass/fail tests — they're property-based tests that verify the algorithm's mathematical guarantees hold under pressure.
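The K/N property check from the "rapid churn" test can be sketched in a standalone program. This is my illustrative version under the same assumptions (150 vnodes, FNV-1a), not the repo's test code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// ring builds a vnode ring (150 replicas per node).
func ring(nodes []string) (pos []uint32, owner map[uint32]string) {
	owner = map[uint32]string{}
	for _, n := range nodes {
		for i := 0; i < 150; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			owner[p] = n
			pos = append(pos, p)
		}
	}
	sort.Slice(pos, func(i, j int) bool { return pos[i] < pos[j] })
	return
}

func lookup(pos []uint32, owner map[uint32]string, key string) string {
	p := hash32(key)
	i := sort.Search(len(pos), func(i int) bool { return pos[i] >= p })
	if i == len(pos) {
		i = 0
	}
	return owner[pos[i]]
}

// movedOnAdd measures how many of k keys change owner when one
// node is added to an n-node ring. The K/N property says this
// should be roughly k/(n+1).
func movedOnAdd(k, n int) int {
	nodes := make([]string, n)
	for i := range nodes {
		nodes[i] = fmt.Sprintf("node-%d", i)
	}
	pos1, own1 := ring(nodes)
	pos2, own2 := ring(append(nodes, "node-new"))
	moved := 0
	for i := 0; i < k; i++ {
		key := fmt.Sprintf("key:%d", i)
		if lookup(pos1, own1, key) != lookup(pos2, own2, key) {
			moved++
		}
	}
	return moved
}

func main() {
	k, n := 100_000, 5
	fmt.Printf("moved %d keys (theoretical K/(N+1): %d)\n", movedOnAdd(k, n), k/(n+1))
}
```

Because existing vnode positions don't change when a node joins, keys can only move *to* the new node, so the moved count equals the new node's share of the ring.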

The implementation

The full code is on GitHub: soyvural/consistent-hashing

It's about 800 lines of Go with zero external dependencies. The core is a sorted slice of uint32 positions with binary search for O(log n) lookups. Hash function is pluggable via an interface — ships with FNV-1a, MD5, and CRC32.
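A pluggable hash boils down to a tiny interface. This is a sketch of the idea; the interface and type names are illustrative, not necessarily what the repo uses:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"hash/fnv"
)

// Hasher is all the ring needs from a hash function: string → uint32.
// Any implementation can be swapped in without touching ring logic.
type Hasher interface {
	Sum32(key string) uint32
}

// FNV1a implements Hasher with the stdlib FNV-1a hash.
type FNV1a struct{}

func (FNV1a) Sum32(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// CRC32 implements Hasher with the stdlib IEEE CRC-32 checksum.
type CRC32 struct{}

func (CRC32) Sum32(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key))
}

// position is where the ring would place a key under a given hasher.
func position(h Hasher, key string) uint32 {
	return h.Sum32(key)
}

func main() {
	for _, h := range []Hasher{FNV1a{}, CRC32{}} {
		fmt.Printf("%T → %d\n", h, position(h, "user:1001"))
	}
}
```

Swapping the hasher changes every position on the ring, so it's a construction-time choice, not something to change on a live ring.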

# try it yourself
git clone https://github.com/soyvural/consistent-hashing.git
cd consistent-hashing
go run cmd/demo/main.go


The demo prints all the numbers I quoted in this post. You can tweak the vnode count, swap hash functions, and see exactly how the distribution changes.

Where consistent hashing shows up

Once you know what to look for, it's everywhere:

  • Memcached clients use it to route keys to servers

  • Amazon DynamoDB uses it for data partitioning (described in their 2007 Dynamo paper)

  • Apache Cassandra uses it for its token ring

  • Load balancers use it for sticky sessions that survive backend scaling

The common thread: any time you need to spread work across a changing set of workers without reshuffling everything.

Further reading

If you want to go deeper:

  • Karger et al. (1997) — the original paper that started it all

  • Amazon's Dynamo paper (2007) — consistent hashing at massive scale

  • Jump Consistent Hash (2014) — a faster alternative when your nodes don't need names

The full source code is at github.com/soyvural/consistent-hashing. Stars welcome if you found this useful.
