Most engineers interact with CDNs through a configuration panel, setting a TTL and calling it done. But CDN internals are a rich engineering domain — one I spent years inside at Oracle Cloud Infrastructure's CDN and Load Balancer team. This post goes under the hood: how cache storage actually works at an edge node, how Anycast routing directs users to PoPs, how consistent hashing distributes load, and how the control plane propagates changes to a global fleet.
Cache storage at an edge node is typically a tiered hierarchy designed to maximize throughput while minimizing cost:
┌─────────────────────────────────────────────────────────────────┐
│ EDGE NODE — CACHE STORAGE TIERS │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ L1: In-Process RAM Cache (hot objects) │ │
│ │ Size: ~4–32 GB per process │ │
│ │ Latency: sub-millisecond │ │
│ │ Eviction: LRU / SLRU (segmented LRU) │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ MISS │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ L2: Shared Memory / mmap SSD Cache │ │
│ │ Size: 100 GB – 2 TB (NVMe SSD) │ │
│ │ Latency: 0.1–1 ms │ │
│ │ Eviction: LIRS / ARC (adaptive) │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ MISS │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ L3: Peer / Sibling Node Cache (optional) │ │
│ │ ICP (Internet Cache Protocol) or custom gossip │ │
│ │ Check nearby edge nodes in same PoP before origin fetch │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ MISS → forward upstream │
└─────────────────────────┼───────────────────────────────────────┘
▼
Shield / Mid-Tier / Origin
Figure 1: Multi-tier cache storage inside a single edge node
Small objects (HTML, JSON, small images) are served entirely from L1 RAM. Large objects (video files, software packages) bypass RAM and are streamed from SSD to the network socket, with only the metadata hot in RAM.
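To make the size-based split concrete, here is a minimal sketch of a two-tier lookup with promotion of small objects only. The 1 MiB cutoff, the class names, and the use of a plain in-memory LRU as a stand-in for the SSD tier are all assumptions for illustration, not how any particular CDN implements it.

```python
from collections import OrderedDict

RAM_MAX_OBJECT_BYTES = 1024 * 1024  # assumed cutoff: larger objects skip RAM


class LRUCache:
    """Byte-bounded LRU cache; the least recently used entry is evicted first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, body):
        if len(body) > self.capacity_bytes:
            return  # object can never fit in this tier
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = body
        self.used_bytes += len(body)
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)  # drop the oldest entry
            self.used_bytes -= len(evicted)


class EdgeCache:
    """L1 (RAM) in front of L2 (SSD stand-in) in front of an upstream fetch."""

    def __init__(self, ram_bytes, ssd_bytes, fetch_upstream):
        self.ram = LRUCache(ram_bytes)
        self.ssd = LRUCache(ssd_bytes)
        self.fetch_upstream = fetch_upstream

    def get(self, url):
        """Return (body, tier), where tier names the layer that answered."""
        body = self.ram.get(url)
        if body is not None:
            return body, "L1"
        body = self.ssd.get(url)
        tier = "L2"
        if body is None:
            body = self.fetch_upstream(url)
            tier = "upstream"
            self.ssd.put(url, body)
        if len(body) <= RAM_MAX_OBJECT_BYTES:
            self.ram.put(url, body)  # promote small objects only
        return body, tier
```

With this policy a small HTML page is answered from L1 on its second request, while a multi-megabyte file keeps being served from the L2 tier and never displaces hot small objects in RAM.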
CDNs use two main mechanisms to route users to geographically close PoPs:
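The first is DNS-based geo-routing. The CDN's authoritative DNS servers geolocate the requester (in practice the recursive resolver's IP, or the client's subnet when the resolver sends EDNS Client Subnet) and return the A/AAAA records of the nearest PoP cluster. This is simple and widely supported, but it has a weakness: the answer reflects the resolver's location and is cached for the record's TTL, so users can be directed to stale or distant PoPs, especially behind corporate DNS resolvers that serve large populations.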
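As a rough sketch of the "nearest PoP" decision in DNS-based routing, the snippet below picks the PoP with the smallest great-circle distance to the requester. The PoP catalogue, coordinates, and IP addresses are hypothetical, and the requester's coordinates are assumed to come from a geo-IP lookup that is not shown.

```python
import math

# Hypothetical PoP catalogue: name -> (latitude, longitude, service IP).
POPS = {
    "tokyo": (35.68, 139.69, "198.51.100.10"),
    "london": (51.51, -0.13, "198.51.100.20"),
    "dallas": (32.78, -96.80, "198.51.100.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points on Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def answer_for(requester_lat, requester_lon):
    """Return (pop_name, ip) for the PoP geographically closest to the requester."""
    name = min(
        POPS,
        key=lambda n: haversine_km(requester_lat, requester_lon, POPS[n][0], POPS[n][1]),
    )
    return name, POPS[name][2]
```

A production resolver would likely weigh PoP health, load, and measured latency as well; pure distance is the simplest possible policy.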
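The second is BGP Anycast. The CDN announces the same IP address prefix (e.g., 198.51.100.0/24) from multiple PoP locations simultaneously. The internet's BGP routing then delivers each user to the "closest" PoP in routing terms, typically the one with the shortest AS path, which usually but not always matches geographic proximity. This works at the network layer, with no DNS tricks required, and provides automatic failover when a PoP goes down: BGP withdraws the route and traffic shifts to the next-best PoP once routing reconverges, typically within seconds to tens of seconds.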
BGP ANYCAST — SAME IP ANNOUNCED FROM MULTIPLE PoPs
CDN IP: 198.51.100.1 (announced from all PoPs)
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ PoP: Tokyo │ │ PoP: London │ │ PoP: Dallas │
│ announces │ │ announces │ │ announces │
│ 198.51.100.1 │ │ 198.51.100.1 │ │ 198.51.100.1 │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ BGP AS paths │ │
│ propagate to │ │
└──────────────┬─────────┘ │
│ │
┌─────────▼──────────────────────────────────▼──────┐
│ GLOBAL INTERNET ROUTING FABRIC │
│ │
│ User in Seoul → shortest AS path → Tokyo PoP │
│ User in Paris → shortest AS path → London PoP │
│ User in Houston → shortest AS path → Dallas PoP │
└─────────────────────────────────────────────────────┘
PoP failure: Tokyo BGP session drops → routes withdraw in ~30s
→ Seoul users now routed to next best PoP (Hong Kong)
Figure 2: BGP Anycast routing — same IP prefix announced from multiple PoPs; internet routing selects nearest
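The selection and failover behaviour in Figure 2 can be sketched as a toy model of one router's view. This is a deliberate simplification: real BGP compares other attributes (such as local preference) before AS path length, and the AS numbers, the `Route` type, and the name-based tie-break are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    pop: str
    as_path: tuple  # AS numbers traversed to reach the PoP (documentation ASNs)


def best_route(routes):
    """Pick the route with the shortest AS path; None if no routes remain."""
    if not routes:
        return None
    # Tie-break on PoP name only to keep the toy model deterministic.
    return min(routes, key=lambda r: (len(r.as_path), r.pop))


def withdraw(routes, pop):
    """Model a PoP failure: its announcement disappears from the table."""
    return [r for r in routes if r.pop != pop]


# Routes for the same anycast prefix as seen from a network in Seoul.
seoul_view = [
    Route("tokyo", (64500, 64496)),
    Route("hong_kong", (64501, 64502, 64496)),
    Route("london", (64503, 64504, 64505, 64496)),
]
```

Calling `best_route(seoul_view)` selects Tokyo; after `withdraw(seoul_view, "tokyo")` the same call selects Hong Kong, which mirrors the failure case in the figure.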
Inside a PoP, there are typically multiple edge nodes. The CDN must decide which node stores (and serves) a given URL. Naive modulo hashing (hash(url) mod N) fails because changing N remaps almost every key: growing a pool from N to N+1 nodes moves roughly N/(N+1) of the assignments (about 90% for a 10-node pool), invalidating most of the cache.
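The reshuffle is easy to measure. The sketch below hashes a set of made-up URLs and counts how many change owner when a modulo-hashed pool grows from 10 to 11 nodes. A key keeps its owner only when its hash gives the same remainder for both pool sizes, which for uniform hashes happens about 1 time in 11. The URL pattern and the choice of MD5 as the hash are assumptions.

```python
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose owning node changes under modulo hashing."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


keys = [f"/assets/{i}.js" for i in range(20000)]
fraction = moved_fraction(keys, 10, 11)
print(f"moved when growing 10 -> 11 nodes: {fraction:.1%}")
```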
CDNs use consistent hashing with virtual nodes to solve this:
CONSISTENT HASHING — URL to Edge Node Assignment
Hash ring (conceptual circle 0 → 2^32):
0
┌──────┐
╔══╪══╗ │ Node A (virtual nodes: 0x0F.., 0x3A.., 0x7C..)
╔══╬══╬══╬══╗│
║ ║ │ ║ ║│ Node B (virtual nodes: 0x1B.., 0x55.., 0x8E..)
───╫──╫──┼──╫──╫─── 2^31
║ ╚══╬══╝ ║│ Node C (virtual nodes: 0x2D.., 0x6F.., 0xA1..)
╚═════╬═════╝│
└──────┘
2^32
URL hash → walk ring clockwise → first virtual node = responsible node
Adding Node D:
→ Only objects whose hash falls between Node C and Node D
need to be re-fetched (≈ 1/N of total cache, not entire cache)
Without consistent hashing (modulo):
→ Adding 1 node to a 10-node pool reshuffles 90% of cache keys
→ Cache cold-start storm hits origin
Figure 3: Consistent hashing distributes URLs across edge nodes — adding/removing nodes reshuffles only 1/N of keys
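Here is a minimal sketch of the ring in Figure 3, assuming a 32-bit position space, MD5 as the hash, and 100 virtual nodes per physical node. The class and method names are illustrative, not taken from any specific CDN.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position on the 0 .. 2^32 ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns `vnodes` positions scattered around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, url):
        """Walk clockwise from the URL's position to the first virtual node."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (ring_hash(url), ""))
        return self._points[idx % len(self._points)][1]  # wrap past 2^32 to 0
```

Adding a fourth node to a three-node ring should move roughly a quarter of the keys, and every key that moves lands on the new node; the other nodes never trade keys among themselves, which is what keeps their caches warm.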
When a customer updates their CDN configuration (new cache rules, SSL cert, routing policy), that change must propagate to thousands of edge nodes worldwide consistently and safely.
CDN CONTROL PLANE — CONFIGURATION PROPAGATION
┌──────────────────────────────────────────────────────────────────┐
│ Customer Action: Update cache TTL rule via API / Console │
└──────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ CONTROL PLANE API LAYER │
│ → Validate config (schema, limits, conflict detection) │
│ → Write to Config Store (versioned, distributed DB) │
│ → Publish change event to message bus (Kafka / Pulsar) │
└──────────────────────────┬───────────────────────────────────────┘
│ Config change event (version N)
▼
┌──────────────────────────────────────────────────────────────────┐
│ FLEET ORCHESTRATION SERVICE │
│ → Determines rollout strategy: canary → regional → global │
│ → Pushes config version N to PoP agents │
└──────┬──────────────────────┬──────────────────────┬─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│PoP Agent │ │PoP Agent │ │PoP Agent │
│ Singapore │ │ Frankfurt │ │ Ashburn │
│ │ │ │ │ │
│ Pull config │ │ Pull config │ │ Pull config │
│ version N │ │ version N │ │ version N │
│ │ │ │ │ │
│ Hot-reload │ │ Hot-reload │ │ Hot-reload │
│ (no restart)│ │ (no restart)│ │ (no restart)│
│ │ │ │ │ │
│ ACK → fleet │ │ ACK → fleet │ │ ACK → fleet │
│ orchestrator│ │ orchestrator│ │ orchestrator│
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└──────────────────────┴──────────────────────┘
│
▼
All PoPs ACK'd → Config Live
Propagation time: <60 seconds globally
Figure 4: Control plane config propagation — versioned, canary-deployed, hot-reloaded at each PoP
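The canary → regional → global flow in Figure 4 can be sketched as a loop over stages that waits for an ACK from every agent before moving on. Everything named here (`PopAgent`, `rollout`, the `healthy` flag) is hypothetical, and the roll-back on a missing ACK is an assumption about a reasonable safety policy rather than something the flow above specifies.

```python
from dataclasses import dataclass


@dataclass
class PopAgent:
    name: str
    healthy: bool = True
    active_version: int = 0

    def apply(self, version):
        """Hot-reload the given config version; the return value is the ACK."""
        if not self.healthy:
            return False
        self.active_version = version
        return True


def rollout(version, stages):
    """Push `version` stage by stage; halt and roll back on the first missing ACK."""
    updated = []  # (agent, previous_version) pairs, kept for rollback
    for stage_name, agents in stages:
        for agent in agents:
            previous = agent.active_version
            if agent.apply(version):
                updated.append((agent, previous))
            else:
                for done, old_version in updated:
                    done.active_version = old_version
                return f"halted at {stage_name}"
    return "live"
```

The point of staging is that a bad config is caught while it affects only the canary PoP, before it reaches the regional or global stages.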
CDNs terminate TLS at the edge, which provides significant benefits:
Phil Karlton famously said there are only two hard problems in computer science: cache invalidation and naming things. CDN purging is cache invalidation at global scale.
| Purge Method | Granularity | Use Case | Risk |
|---|---|---|---|
| Single URL purge | One object | Fix a specific broken asset | Low — surgical |
| Wildcard purge | URL pattern (e.g., /images/*) | Image batch update | Medium — cache miss storm on origin |
| Tag-based purge | Logical group (surrogate key) | Invalidate all pages using a product | Low if tags are well-designed |
| Full cache purge | Everything for a domain | Major deployment, emergency | High — 100% cache miss, origin spike |
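Tag-based purging depends on an index from surrogate key to cached URLs. The sketch below shows one way such an index might look on a single node; the class name and its methods are hypothetical, and real CDNs differ in how tags are supplied and stored.

```python
from collections import defaultdict


class TaggedCache:
    """Object cache with a surrogate-key index for group invalidation."""

    def __init__(self):
        self._objects = {}                     # url -> body
        self._urls_by_tag = defaultdict(set)   # surrogate key -> urls
        self._tags_by_url = {}                 # url -> surrogate keys

    def put(self, url, body, tags=()):
        self.purge_url(url)  # drop stale tag links if the URL is re-cached
        self._objects[url] = body
        self._tags_by_url[url] = set(tags)
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url):
        return self._objects.get(url)

    def purge_url(self, url):
        """Single URL purge: remove one object and its tag links."""
        self._objects.pop(url, None)
        for tag in self._tags_by_url.pop(url, set()):
            self._urls_by_tag.get(tag, set()).discard(url)

    def purge_tag(self, tag):
        """Tag-based purge: remove every object carrying the tag; return the count."""
        urls = list(self._urls_by_tag.pop(tag, set()))
        for url in urls:
            self.purge_url(url)
        return len(urls)
```

If a product page and the home page both carry the tag `product-42`, a single `purge_tag("product-42")` invalidates both while leaving unrelated pages cached.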
Prefer versioned URLs with a content hash in the filename (e.g., main.a3f9c1.js) rather than purges for static assets. Reserve purges for dynamic content that cannot use versioned URLs.
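A versioned filename of this shape is usually produced by a build step that embeds a digest of the file's contents. The helper below is a minimal sketch of that idea; the function name, the choice of SHA-256, and the 6-character digest length are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_chars=6):
    """Insert a short content digest before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes whenever the bytes change, the old URL can keep a long TTL and simply stops being requested, so no purge is needed.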
Based on my experience at OCI and publicly available CDN benchmarks:
Understanding these internals changes how you design your caching headers, choose your purge strategy, and instrument your monitoring. The CDN is not a magic black box — it's a distributed system with all the trade-offs that entails.
Written by Sudhir Kumar Tiwari — Senior Engineer, formerly at Oracle Cloud Infrastructure (CDN & Load Balancer team)