Your DNS is Lying to You
What Actually Happens Between a URL and the First Byte
Reading time: ~13 minutes
You typed api.example.com into your browser — or curl'd it, or your service tried to connect to it — and something happened. Some bytes arrived. You moved on.
DNS is not a lookup table. It is a distributed, eventually consistent database with a 40-year-old trust model, deployed across millions of machines that have no obligation to agree with each other. When it goes wrong — and it does go wrong — the failure modes are some of the most maddening in all of networking, because the answer you get looks valid. It's just wrong.
There are four distinct roles. Most people know one of them.
The Bug That Made This Click
Picture this: a microservice can't connect to a dependency. Health checks pass. curl works fine from your laptop. The service throws connection errors that make no sense.
The service is running in a Docker container. Inside the container, curl api.internal.corp returns a different IP than dig api.internal.corp run from the same container.
Different. IP.
Same host. Same moment. Different tool. Different answer.
We'll learn exactly why that's possible before the end of this post.
The Cast of Characters
Before the mechanics, let's name the players. There are four distinct roles in the DNS resolution chain, and conflating them is the source of most confusion.
The stub resolver lives on your machine. It's not really a DNS server — it can't do much on its own. It's the code in libc (or your OS networking stack) that takes a hostname and says "I need an IP for this" by forwarding the question to someone who can actually answer it. On Linux, that's getaddrinfo(). On macOS it goes through mDNSResponder. Every DNS query your applications make starts here.
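You can watch the stub resolver from application code. Here is a minimal Python sketch using the standard library's `socket.getaddrinfo()`, the same call `curl` and most applications end up making; the hostname and port are just placeholders.

```python
import socket

def resolve(hostname: str, port: int = 443) -> list[str]:
    """Ask the OS resolution stack (hosts file, cache, configured
    recursive resolver) for hostname; return the unique IP addresses."""
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({sockaddr[0] for *_, sockaddr in results})

if __name__ == "__main__":
    # "localhost" is answered from /etc/hosts: no packet leaves the machine.
    print(resolve("localhost"))
```

Everything below this call — which file, which cache, which server — is invisible to the application, and that opacity is exactly what makes the debugging stories later in this post possible.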
The recursive resolver (also called a "full-service resolver" or sometimes misleadingly the "DNS server") does the actual work. This is the server your stub resolver talks to. Its job is to walk the DNS tree from the root all the way down to a definitive answer. Your ISP runs one. Google runs one at 8.8.8.8. Cloudflare at 1.1.1.1. Your office probably has one too.
The authoritative nameserver actually owns the answer. If you bought example.com and set up your DNS records, you pointed your registrar at some authoritative nameservers. Those servers are the canonical source of truth for your zone. They don't do recursion — they just answer questions about records they own.
The root nameservers are where recursion starts when a recursive resolver has no cached answer. There are exactly 13 of them by IP — hundreds of physical machines behind those 13 addresses via anycast. Why 13? Because the original DNS protocol used 512-byte UDP packets, and 13 NS records was the maximum that fit 🤷‍♂️. They don't know where api.example.com is, but they know who handles .com.
What Actually Happens When You Type a URL
Let's trace it. You type https://api.example.com/v1/users and hit Enter.
Your browser extracts the hostname: api.example.com. It calls into the OS resolver. Before any network packet leaves your machine, the OS checks three things in order:
First, /etc/hosts. This is a flat text file that predates DNS by over a decade. It's checked before anything else, unconditionally. If api.example.com appears in /etc/hosts, the search is over — no network query happens at all. This is why adding entries to /etc/hosts works for local development, and it's also why malware has historically modified it to redirect banking sites. It's also why many devs are confused when their DNS changes don't seem to take effect: they have a stale hosts file entry they forgot about from six months ago.
Second, the local DNS cache. Your OS, and often a local daemon (systemd-resolved on modern Linux, mDNSResponder on macOS), keeps a cache of recent answers. If the cache has a fresh entry, done.
Third, and only if neither of those had an answer: a query goes out to the recursive resolver specified in /etc/resolv.conf.
```
# /etc/resolv.conf — the file that decides where your DNS queries go
nameserver 192.168.1.1   # your router, probably
nameserver 8.8.8.8       # fallback: Google
search corp.internal     # try appending this domain to short names
```
That nameserver line is the only thing most developers know about /etc/resolv.conf. The search directive is where it gets interesting — and where short hostnames like db can silently resolve to db.corp.internal, which is either convenient or baffling depending on the day.
That's why db works on your laptop but fails in CI: one has a search corp.internal entry and the other doesn't.
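The search-list behaviour is easy to sketch. Here is a rough model of how a glibc-style stub resolver decides which fully qualified names to try; the domain names are illustrative, and real resolvers also honour options like `ndots` (which defaults to 1 in glibc).

```python
def candidate_names(name, search=("corp.internal",), ndots=1):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no expansion.
        return [name.rstrip(".")]
    if name.count(".") < ndots:
        # "Short" names get the search domains appended first.
        return [f"{name}.{domain}" for domain in search] + [name]
    # Qualified-looking names are tried literally first.
    return [name] + [f"{name}.{domain}" for domain in search]

print(candidate_names("db"))               # ['db.corp.internal', 'db']
print(candidate_names("api.example.com"))  # tries the literal name first
```

Run the same code with an empty `search` tuple and `db` resolves (or fails) as a bare name — which is the laptop-versus-CI discrepancy in one function call.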
The Recursive Resolver Earns Its Name
Your query reaches the recursive resolver. Let's say it's a cold cache — never seen api.example.com before.
The resolver starts at the top.
It queries one of the 13 root nameserver IPs (hardcoded into all resolver software as the "root hints"). The root server doesn't know api.example.com. It responds with a referral: "I don't know, but .com is handled by these nameservers."
The resolver then queries a .com Top Level Domain (TLD) nameserver. The TLD server doesn't know api.example.com. It responds with a referral: "I don't know, but example.com is handled by these nameservers."
The resolver then queries an authoritative nameserver for example.com. This one knows. It returns an A record (IPv4) or AAAA record (IPv6) for api.example.com, along with a TTL — a "time to live" value in seconds.
The resolver caches the answer for TTL seconds and returns it to your stub resolver. Your stub resolver hands it to getaddrinfo(). Your browser gets an IP. The connection starts.
That whole chain — root → TLD → authoritative — happened in the background, probably in under 100ms. On a warm cache, it's a single hop and maybe 5ms.
```
Query trace for api.example.com:

→ root nameserver (hardcoded IPs)
← "ask .com TLD at 192.5.6.30"
→ .com TLD nameserver (192.5.6.30)
← "ask ns1.example.com at 93.184.216.10"
→ ns1.example.com (93.184.216.10)
← "api.example.com A 198.51.100.42 TTL 300"
→ your stub resolver
← 198.51.100.42
```
Three round trips. More if any of those delegations weren't cached. And critically: the recursive resolver that did all this work is running on someone else's machine, which you do not control, and which has its own cache that it shares with everyone else who uses it.
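The referral-following loop can be modelled in a few lines. This toy uses in-memory tables with invented server names and the example IP from the trace above; no real DNS packets are involved.

```python
# Delegations: each "server" refers the resolver one level down the tree.
REFERRALS = {
    "root": {".com": "tld-com"},
    "tld-com": {"example.com": "ns1.example.com"},
}
# Only the authoritative server actually holds the answer.
ANSWERS = {"ns1.example.com": {"api.example.com": "198.51.100.42"}}

def resolve_iteratively(name):
    server, path = "root", []
    while True:
        path.append(server)
        if name in ANSWERS.get(server, {}):
            return ANSWERS[server][name], path
        # Follow the longest matching delegation, like a real resolver.
        zones = REFERRALS.get(server, {})
        match = max((z for z in zones if name.endswith(z)), key=len, default=None)
        if match is None:
            raise LookupError(f"no delegation for {name!r} at {server!r}")
        server = zones[match]

ip, path = resolve_iteratively("api.example.com")
print(ip, path)  # 198.51.100.42 ['root', 'tld-com', 'ns1.example.com']
```

A real resolver adds caching at every hop, which is what turns the three round trips into one on a warm cache — and what makes the next section's TTL problems possible.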
TTL Is Not a Suggestion, It's Also Not a Guarantee
TTL (Time to Live) is the record's expiry hint. If your A record has TTL 300, it means "cache this for 300 seconds, then check again."
Here's what TTL cannot do: it cannot tell resolvers that already have a cached answer to throw it away. When you update a DNS record, the old answer is still valid in every cache that holds it, until their individual TTLs expire. If your TTL was 24 hours (86400 seconds), some resolvers will be serving the old answer for up to 24 hours after your change.
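That cache behaviour can be sketched in a few lines: a TTL-honouring cache serves whatever it holds until expiry, no matter what the authoritative server says in the meantime. A minimal model, with timestamps passed explicitly to keep it deterministic:

```python
import time

class DnsCache:
    """Minimal TTL-honouring cache, like the one inside every recursive resolver."""

    def __init__(self):
        self._store = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl, now=None):
        now = time.monotonic() if now is None else now
        self._store[name] = (answer, now + ttl)

    def get(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(name)
        if entry is None or now >= entry[1]:
            return None  # miss or expired: only now do we re-query upstream
        return entry[0]  # still fresh: upstream changes are invisible here

cache = DnsCache()
cache.put("api.example.com", "198.51.100.42", ttl=300, now=0)
print(cache.get("api.example.com", now=100))  # 198.51.100.42 (cached)
print(cache.get("api.example.com", now=400))  # None (TTL expired, re-fetch)
```

Note that nothing you do on the authoritative side can reach into `_store` on someone else's resolver; you can only wait for `expires_at`.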
This is why "just flush DNS" is not a real answer to a propagation problem. You can flush your local machine's cache. You cannot flush Google's cache. You cannot flush your ISP's cache. You cannot flush the cache of the recursive resolver your user's mobile carrier uses.
What you can do: lower your TTL well before a migration. If you know you're moving an IP next Tuesday, set your TTL to 60 seconds on Friday. Let the short TTL propagate. Do the migration. The blast radius of stale caches is 60 seconds instead of 24 hours.
What you cannot do: change the TTL and have it take effect immediately. The TTL change itself has to propagate, and it propagates at the old TTL.
That's right. The new TTL doesn't matter until the old TTL expires and resolvers re-fetch the record. Plan accordingly.
That's why your DNS change isn't working yet. The old answer is cached at some resolver with a TTL of 3600 and there are 47 minutes left. Verify by querying the authoritative server directly: dig @ns1.example.com api.example.com — that bypasses all caches and shows what the authoritative server has right now.
CNAME Chains and Why They're Weird
An A record maps a name to an IP. A CNAME record maps a name to another name — a canonical alias.
```
api.example.com CNAME loadbalancer.us-east-1.elb.amazonaws.com
```
When a resolver sees a CNAME, it has to resolve the target too. So api.example.com → look up loadbalancer.us-east-1.elb.amazonaws.com → that returns an A record. Two lookups, one query from your perspective.
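A resolver's CNAME handling is essentially a loop with a chain-length cap. A sketch over an in-memory record table (the target name here is made up):

```python
RECORDS = {
    "api.example.com": ("CNAME", "lb.us-east-1.example-elb.net"),
    "lb.us-east-1.example-elb.net": ("A", "198.51.100.42"),
}

def resolve_cname(name, max_chain=8):
    """Follow CNAMEs until an A record; cap the chain to catch loops."""
    chain = []
    for _ in range(max_chain):
        rtype, value = RECORDS[name]
        chain.append((name, rtype, value))
        if rtype == "A":
            return value, chain
        name = value  # CNAME: restart the lookup at the target name
    raise RuntimeError("CNAME chain too long (possible loop)")

ip, chain = resolve_cname("api.example.com")
print(ip)          # 198.51.100.42
print(len(chain))  # 2 (two lookups behind one query)
```

Each hop in `chain` is a record with its own TTL, which is why long CNAME chains multiply both latency and cache-staleness surface.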
CNAMEs are used everywhere — CDNs, load balancers, cloud services — because they let you point a name at another name that the provider controls. When AWS moves your load balancer, they update their A record; your CNAME keeps working.
The rule everyone forgets: you cannot have a CNAME at a zone apex. The zone apex is the bare domain itself — example.com with nothing in front of it. Why? Because CNAME has to be the only record for a name (it replaces the name entirely), but the zone apex needs SOA and NS records. You can't have a CNAME and also have NS records. The DNS spec doesn't allow it.
This is why CDN and DNS providers invented CNAME flattening (Cloudflare calls it CNAME at the root, Route53 calls it ALIAS records). When you point example.com at example.com.cdn.cloudflare.net, the provider does the CNAME lookup at query time and returns a flat A record to the client. From the outside, it looks like an A record. It's not. It's a CNAME that your DNS provider is silently expanding.
This matters when you're debugging. If dig example.com A returns an IP directly but you know you set up a CNAME at the root, the flattening is working. If it returns a CNAME, something's wrong with your provider config. These look identical to the application layer.
That's why you can't put a CNAME on example.com itself — CNAME semantics conflict with the SOA and NS records that every zone apex must have. Your DNS provider works around this with record flattening, which looks like an A record to the outside world.
Why dig and curl Give Different Answers
Back to my Docker debugging story. curl api.internal.corp returned a different IP than dig api.internal.corp. How?
Because they use different resolution paths.
dig is a DNS tool. It talks directly to a DNS resolver — by default, whatever is in /etc/resolv.conf, or you can specify one with @. It bypasses the OS resolver entirely, bypasses the local cache, and makes a raw DNS query.
curl uses getaddrinfo(). That function goes through the full OS name resolution stack, including /etc/nsswitch.conf — the OS's routing table for name resolution, a priority-ordered list of where to look. On a typical Linux machine it looks like:
```
# /etc/nsswitch.conf
hosts: files dns myhostname
```
That files entry means /etc/hosts runs first — and curl reads it, while dig does not.
Inside Docker, it gets more interesting. Docker injects its own nameserver into the container's /etc/resolv.conf, pointing at Docker's internal resolver at 127.0.0.11. That resolver handles Docker network DNS (container names, service names). It may return different answers for internal names than an external DNS server would. dig run without arguments still reads /etc/resolv.conf — so inside the container, dig api.internal.corp was querying Docker's resolver, not the corporate DNS. And Docker's resolver didn't know about the internal service.
The rule: dig shows you what a DNS query returns. curl shows you what the application stack resolves. They are not always the same query against the same server.
When they disagree, the question is: which one matches what your application uses? Usually it's the curl path, because your application also calls getaddrinfo().
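The split between the two paths can be sketched directly. Here the hosts-file and DNS answers are invented to stage a disagreement like the Docker one:

```python
def nss_resolve(name, sources):
    """Walk nsswitch-style sources in order; first hit wins, like getaddrinfo()."""
    for label, lookup in sources:
        ip = lookup(name)
        if ip is not None:
            return ip, label
    raise LookupError(name)

hosts_file = {"api.internal.corp": "10.0.0.5"}      # hypothetical /etc/hosts entry
dns_answers = {"api.internal.corp": "203.0.113.9"}  # what a raw DNS query returns

ip, source = nss_resolve("api.internal.corp", [
    ("files", hosts_file.get),  # consulted first, per "hosts: files dns"
    ("dns", dns_answers.get),
])
print(ip, source)  # 10.0.0.5 files (the curl path; dig would report 203.0.113.9)
```

Same name, same machine, two answers — and only one of the two tools ever looked at the hosts file.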
That's why dig and curl gave different answers inside that Docker container: curl resolved through the full getaddrinfo() stack (hosts file, nsswitch ordering, search domains), while dig fired a raw query at the resolver in /etc/resolv.conf — which, inside the container, was Docker's embedded resolver, not the corporate DNS that knew the internal name.
The Trust Problem Nobody Talks About
DNS was designed in 1983. The original protocol has no cryptographic authentication. A resolver asks a question; an authoritative server answers. There's nothing in the original design that proves the answer came from the real authoritative server.
This isn't a theoretical concern. DNS spoofing and cache poisoning are real attacks. An attacker who can intercept or forge DNS responses can redirect any hostname to any IP — transparently, with no visible error to the user.
The fix is DNSSEC — DNS Security Extensions. DNSSEC adds cryptographic signatures to DNS records. Authoritative servers sign their records with a private key; validators check those signatures against a public key published in the parent zone. The chain of trust runs from the root all the way down.
DNSSEC deployment is... fragmented. The root zone is signed. Many TLDs are signed. Many individual domains are not. And critically, many recursive resolvers don't validate DNSSEC even if the zone is signed — they just pass the signatures along. Validation has to happen somewhere for it to matter, and the chain has a lot of links.
This is why your browser sometimes shows a DNSSEC validation warning on a subdomain you're confident is yours — someone in the delegation chain has a misconfigured or expired signing key.
DoT (DNS over TLS) and DoH (DNS over HTTPS) are a different layer of protection. They encrypt the query in transit — so your ISP can't see what you're looking up, and a network attacker can't intercept the packet. But they don't solve the authoritative trust problem. You're still trusting the resolver at the other end.
The honest summary: DNS's trust model is "trust whoever answers." DNSSEC tries to fix that. Its deployment is patchy. DoT/DoH protect the wire, not the answer.
Practical Debugging Toolkit
When DNS is misbehaving, you want to query different layers explicitly rather than guessing:
```
# What does the authoritative server say right now?
dig @ns1.example.com api.example.com A

# What does your configured resolver return (including its cache)?
dig api.example.com A

# What does a specific public resolver return (it has its own cache)?
dig @8.8.8.8 api.example.com A

# Trace the full recursive delegation chain
dig +trace api.example.com A

# Show what getaddrinfo() would return (follows /etc/hosts, nsswitch.conf)
# getent ships with glibc on Debian/Ubuntu; not available on macOS
getent hosts api.example.com

# Check your resolver configuration
cat /etc/resolv.conf
resolvectl status   # systemd-resolved environments
```
The +trace flag on dig is particularly useful when something is broken in the delegation chain itself — wrong glue records, expired DS records, missing NS entries. It shows you every hop.
When dig @authoritative returns the right answer but your application gets something different, the problem is between your application and that authoritative server. Work backwards: your OS cache, your container's resolver, your corporate split-horizon DNS, your VPN's DNS override.
The Thing Worth Holding Onto
I have a love-hate relationship with DNS. It's 40 years old, it was designed when the internet was a few hundred hosts, the trust model was "everyone on the network is trustworthy," and the failure modes of a globally distributed cache were, well, not fully thought through.
And yet it works. Hundreds of billions of queries a day, run by competing organisations with no central coordinator, and it almost never falls over. When it does fail, the failures are the worst kind: subtle. An answer that looks valid but is stale. A resolution path that bypasses the record you just updated. A CNAME chain that doesn't behave the way you modelled it. You don't get an error. You get the wrong IP, served confidently. (Maybe LLMs learned that trick from DNS.)
Every DNS debugging session I've ever had ended the same way: I wasn't querying the layer I thought I was querying. The answer is always cached somewhere you forgot to check. And you do this so rarely you basically have to re-teach yourself the DNS stack each time.
Further Reading
- man 1 dig — More flags than you'll ever need, but +trace, +short, and @server will cover 90% of debugging sessions.
- DNS and BIND, 5th ed. — The definitive reference. Dense. Worth having on a shelf.
- How DNS Works (howdns.works) — Comic-style visual walkthrough of the resolution chain. Good for sharing with someone who's new to it.
- Cloudflare's DNS Learning Center — Technically accurate, well-illustrated, and they have a vested interest in you understanding DNS.
- RFC 1034 and RFC 1035 — The original 1987 DNS specs. Remarkably readable for RFCs. Much of what's described here is in those two documents.
I'm writing a book about what makes developers irreplaceable in the age of AI. Join the early access list →
Naz Quadri once spent four hours debugging a DNS issue that turned out to be a single line he put into /etc/hosts in 2021. He blogs at nazquadri.dev. Rabbit holes all the way down 🐇🕳️.