System Design

Caching Strategies Every Backend Developer Should Know

Phil Karlton is supposed to have said that there are only two hard problems in computer science: cache invalidation and naming things. The joke has aged well because cache invalidation genuinely is hard, and most of the production incidents we have been paged for involve a cache that lied about reality for longer than expected.

Caching is also the single most effective performance technique at your disposal. A 10ms database query cached at the application layer becomes a sub-millisecond Redis call; served from a CDN edge, the same response never touches your origin at all. The ratios matter. This article covers the patterns you need fluency in, the common failure modes, and how to reason about when a cache is worth its complexity.

Why caching works

Caching exploits the fact that most data access patterns have strong locality — the 20% of your content that receives 80% of the traffic, the same user making the same query twice in a session, the product page loaded a million times a day. A small, fast store holding the hot subset dramatically reduces load on the slow, large store behind it.

The costs are: memory (the cache duplicates a hot subset of your data in RAM), complexity (every cache is a second copy of the truth, and copies diverge), and the possibility of serving stale data. The decision to cache is a tradeoff, not a free win.

The layers of caching

When people say "we cached it" they could mean any of several different layers. Understanding which layer solves which problem is foundational.

CPU cache (L1, L2, L3)

The fastest cache in your system, entirely managed by the hardware. Not something you directly control from application code, but you can influence it through data layout (cache-friendly structures, avoiding pointer chasing). Relevant when you are optimising hot loops at microsecond scale.

In-process memory

A hashmap in your application's memory. Fastest cache your code can control. Limited by the process's RAM, lost on restart, and not shared between processes or servers. Useful for data that is read constantly and is cheap to rebuild on restart.
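
Python's standard library gives you a bounded version of this for free; note that lru_cache caps the entry count but has no TTL, so it suits data that stays valid for the life of the process. A minimal sketch:

from functools import lru_cache

# Tiny lookup table standing in for a slow data source.
COUNTRIES = {"NZ": "New Zealand", "DE": "Germany"}

@lru_cache(maxsize=1024)  # keep up to 1024 most-recently-used results in process memory
def country_name(code: str) -> str:
    return COUNTRIES[code]  # in production this line would be a database query

country_name("NZ")  # first call computes; repeat calls are served from memory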

Distributed cache (Redis, Memcached)

The default modern caching layer. A separate service that multiple application instances can share. Sub-millisecond latency, survives application restarts, typically sized in gigabytes. Almost every non-trivial backend has one.

Database query cache

Many databases maintain internal caches (Postgres's buffer cache keeps recently read pages in memory; MySQL's query cache, removed in 8.0, cached whole result sets). You get these for free; knowing they exist helps you interpret query latency numbers.

HTTP / CDN cache

Cloudflare, Fastly, and CloudFront cache responses at the network edge, often tens to hundreds of milliseconds closer to the user than your origin. Ideal for static assets and cacheable API responses. The largest absolute latency win of any caching layer.

Browser cache

The last layer. For returning users, cached assets are served without any network request at all, until they expire or fail revalidation. Configured through HTTP headers (Cache-Control, ETag, Last-Modified). Undervalued because it affects repeat visits rather than first impressions, and repeat visits are usually most of your traffic.
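
As an illustration, two common header profiles (the values are conventional defaults, not rules): fingerprinted static assets can be cached for a year, while HTML should revalidate every time.

# Hypothetical response headers; the ETag value is illustrative.
STATIC_ASSET_HEADERS = {
    # The fingerprinted filename changes whenever content does: cache for a year.
    "Cache-Control": "public, max-age=31536000, immutable",
}
HTML_PAGE_HEADERS = {
    # Always revalidate; the ETag lets the server answer 304 Not Modified.
    "Cache-Control": "no-cache",
    "ETag": '"v1-abc123"',
}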

The five core patterns

Cache-aside (lazy loading)

The application checks the cache first. On miss, it loads from the database and writes the result back to the cache. Reads are fast on hit, slow on miss. Writes go only to the database — the cache entry is invalidated or left to expire.

def fetch(key):
    value = cache.get(key)              # 1. try the cache first
    if value is None:                   # 2. miss: fall back to the database
        value = db.query(key)
        cache.set(key, value, ttl=300)  # 3. populate for readers within the next 5 minutes
    return value

This is the default pattern for good reasons: simple, flexible, and resilient to cache failures (the database is still the source of truth). Downsides: every new key has a "cold" first access that hits the database, and there is no guarantee of consistency between cache and database.

sequenceDiagram
    participant A as App
    participant C as Cache
    participant D as Database
    Note over A,D: Cache miss (first access)
    A->>C: GET key
    C-->>A: nil
    A->>D: SELECT row
    D-->>A: row
    A->>C: SET key value (TTL=300)
    A->>A: return value
    Note over A,D: Cache hit (subsequent access)
    A->>C: GET key
    C-->>A: value
    A->>A: return value

Cache-aside: the first read pays the database cost and populates the cache. Every subsequent read within the TTL is served from cache.

Read-through

The cache sits in front of the database and the application reads from the cache only; on a miss, the cache itself loads from the database. Similar to cache-aside, but the loading logic lives in the cache layer rather than in every caller. Some caching libraries and managed services support this directly (a loading cache); otherwise it is a thin wrapper you write once.
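
A minimal in-process sketch of the shape; the dict store and five-minute TTL are assumptions, and in a real deployment this logic lives inside the cache service rather than your application:

import time

class ReadThroughCache:
    """Callers only talk to the cache; the loader runs on miss."""

    def __init__(self, loader, ttl=300):
        self._loader = loader
        self._ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                       # hit: no loader involved
        value = self._loader(key)                 # miss: the cache loads, not the caller
        self._store[key] = (value, time.time() + self._ttl)
        return value

# usage: the application never queries the database directly
cache = ReadThroughCache(loader=lambda k: f"row-for-{k}")
cache.get("user:42")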

Write-through

Writes go to the cache, which synchronously updates the database. Reads are always from the cache. Every write pays the cost of both operations, but the cache is never stale — a major win for consistency at the cost of write latency.
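
A sketch of the synchronous shape, with plain dicts standing in for both stores (a real write-through layer puts the cache in front of the database, but the acknowledge-only-after-both property is the same):

import time

db = {}     # stands in for the database (source of truth)
cache = {}  # stands in for the cache: key -> (value, expires_at)

def write_through(key, value, ttl=300):
    # The write is acknowledged only after BOTH stores hold the new value,
    # which is why reads from the cache are never stale.
    db[key] = value
    cache[key] = (value, time.time() + ttl)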

Write-back (write-behind)

Writes go to the cache. The cache asynchronously persists to the database. Lowest write latency, but if the cache node dies before the persist completes, you lose data. Useful for high-write-volume scenarios where occasional data loss is acceptable (analytics counters, session data).
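
A sketch of the asynchronous shape (the in-process queue and thread are stand-ins; production systems batch the flushes and often use a durable queue):

import queue
import threading

cache = {}                     # fast store: acknowledged immediately
db = {}                        # slow store: updated later
write_queue = queue.Queue()    # pending writes awaiting persistence

def write_back(key, value):
    cache[key] = value              # 1. acknowledge as soon as the cache holds it
    write_queue.put((key, value))   # 2. persistence happens off the hot path

def flusher():
    # Background worker draining the queue. If the process dies before the
    # queue drains, those writes are lost: the write-back tradeoff.
    while True:
        key, value = write_queue.get()
        db[key] = value

threading.Thread(target=flusher, daemon=True).start()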

Refresh-ahead

The cache proactively refreshes popular entries before they expire, ensuring high-traffic keys never miss. Requires prediction of which keys to refresh, and adds background load to the database. Used for read-heavy workloads with predictable hot keys (a news site's front page).

Cache invalidation: the actual hard part

Keeping the cache consistent with the underlying data is where most caching incidents originate. Three strategies, each with tradeoffs:

TTL (Time-to-live) expiration

Every cache entry has a maximum age. After that, it is considered stale and must be reloaded. The simplest strategy, and acceptable for data that does not change often or where staleness is tolerable.

Choosing the TTL is an art. Too short and you lose the cache's benefit. Too long and users see outdated data. A common pattern is tiered TTLs: one minute for frequently changing data, one hour for moderate, one day for rare. There is no right answer — you pick based on how painful stale data is for that particular entry.
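
In practice tiered TTLs are just named configuration; the tier names and values below are assumptions to tune per workload:

TTL_SECONDS = {
    "volatile": 60,      # frequently changing data (prices, live counters)
    "moderate": 3600,    # changes a few times a day (profiles, listings)
    "stable": 86400,     # rarely changes (catalogue data, settings)
}

# cache.set(key, value, ttl=TTL_SECONDS["moderate"])  # interface from the snippet above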

Event-driven invalidation

When the underlying data changes, the cache entry is deleted or updated. Strongly consistent on the happy path, but requires every write path to remember to invalidate, and has edge cases when the invalidation message is lost or arrives out of order.

The two common implementations: direct invalidation (the code that writes to the database also deletes the cache entry) and pub/sub (writes emit events that consumers use to invalidate). The first is simpler but tightly couples every write site to cache knowledge. The second scales better but adds a system to operate.
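
A sketch of both variants with redis-py (the SQL, the db handle, and the key naming are assumptions; the Redis calls are real):

import redis

r = redis.Redis()

def update_email(db, user_id, email):
    # Direct invalidation: the write path deletes the cache entry itself.
    db.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
    r.delete(f"user:{user_id}")  # next read repopulates via cache-aside

def update_email_pubsub(db, user_id, email):
    # Pub/sub invalidation: the write path only announces the change;
    # a separate consumer subscribed to the channel performs the delete.
    db.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
    r.publish("invalidations", f"user:{user_id}")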

Write-through invalidation

Combine write-through caching with event-driven invalidation: every write updates the database and the cache in the same synchronous path. This eliminates staleness on the happy path, at the cost of coupling writes to cache availability; note that the two updates are not truly atomic, so a crash between them can still leave a brief inconsistency.

Cache stampede and how to avoid it

Imagine a popular cache entry with a 60-second TTL. At the 60-second mark it expires. A thousand incoming requests hit the cache simultaneously, all see a miss, and all start querying the database to rebuild it. The database, suddenly hit with a thousand concurrent copies of the same expensive query, collapses.

This is the thundering herd problem, also called cache stampede. Three defences:

Request coalescing

When multiple requests for the same key miss the cache simultaneously, only one actually runs the expensive query; the others wait for its result. Implemented with a shared in-memory lock or a distributed lock (Redis has primitives for this).
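
A minimal single-process sketch (it assumes the cache interface from the cache-aside snippet; coalescing across multiple servers needs a distributed lock instead, for example Redis SET with NX and an expiry):

import threading

_locks = {}
_locks_guard = threading.Lock()

def _lock_for(key):
    # One lock per key, created lazily; the guard protects the dict itself.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_coalesced(key, cache, load, ttl=300):
    value = cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):           # only one thread per key runs the query
        value = cache.get(key)     # re-check: the winner may have filled it
        if value is None:
            value = load(key)
            cache.set(key, value, ttl=ttl)
        return value

The second cache.get inside the lock is the load-bearing line: threads that waited find the value already populated and skip the query entirely.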

Probabilistic early expiration

Instead of expiring exactly at TTL, each request has a small random chance of treating the entry as expired shortly before its real TTL. This spreads cache refresh across a window rather than concentrating it at a single moment. A technique from the XFetch paper.
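
A sketch of the XFetch check from that paper (Vattani, Chierichetti, and Lowenstein, "Optimal Probabilistic Cache Stampede Prevention"); beta is a tunable, with 1.0 as the usual starting point:

import math
import random
import time

def should_refresh(expires_at, recompute_cost, beta=1.0):
    # Refresh early with a probability that rises as expiry approaches.
    # recompute_cost is how long rebuilding the value takes, in seconds;
    # beta > 1 refreshes more eagerly. 1 - random() keeps the log argument
    # in (0, 1], so the head start below is always >= 0.
    head_start = -recompute_cost * beta * math.log(1.0 - random.random())
    return time.time() + head_start >= expires_at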

Stale-while-revalidate

Serve the stale entry immediately while asynchronously fetching a fresh one in the background. Users see stale data for the duration of the refresh, but never a cache miss. Standardised as a Cache-Control extension (stale-while-revalidate, RFC 5861) and widely supported by CDNs.
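
The header form, with illustrative numbers:

# Fresh for 60 seconds; for up to 5 minutes after that, serve the stale
# copy immediately while revalidating in the background.
HEADERS = {"Cache-Control": "max-age=60, stale-while-revalidate=300"}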

The hot key problem

Occasionally a single key receives so much traffic that even the cache cannot serve it fast enough. One Redis node pegs at 100% CPU trying to answer requests for the homepage content. Classic symptoms: "Redis is slow", "the cache is saturated", "we need a bigger instance".

The actual problem is concentration, not capacity. Strategies:

  • Client-side cache in the application for the hottest keys. Each app instance has an in-process copy of the top 100 keys, refreshed once per second, and Redis traffic drops by orders of magnitude. A minimal sketch follows this list.
  • Sharding by key so the hot key spreads across multiple nodes. Complicated if you need consistency across the shards.
  • Request deduplication so identical concurrent reads share a single cache hit.
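
A minimal sketch of the first strategy (the hot key name, one-second interval, and redis-py client are assumptions):

import threading
import time

import redis

r = redis.Redis()
HOT_KEYS = ["home:content"]  # hypothetical hottest keys
local = {}                   # per-process copies

def refresh_loop(interval=1.0):
    # Every second, pull each hot key from Redis once. Thousands of local
    # reads per second now cost Redis only len(HOT_KEYS) GETs per second.
    while True:
        for key in HOT_KEYS:
            local[key] = r.get(key)
        time.sleep(interval)

def get(key):
    if key in local:
        return local[key]  # hot key: served from process memory
    return r.get(key)      # everything else goes to Redis as usual

threading.Thread(target=refresh_loop, daemon=True).start()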

Monitoring: the numbers that matter

A cache you are not monitoring is a cache that will surprise you. The five metrics to track per cache:

Hit rate

Percentage of requests served from cache. Below 80% is usually a sign the TTL is too short, the cache is too small, or the access pattern has low locality. Above 99% may mean your cache is holding cold data and you could shrink it.
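
Redis tracks the counters for you; a quick redis-py sketch:

import redis

r = redis.Redis()
stats = r.info("stats")  # the INFO stats section includes keyspace counters
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
hit_rate = hits / (hits + misses) if hits + misses else 0.0
print(f"hit rate: {hit_rate:.1%}")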

Miss rate by reason

A miss because the key was never cached is different from a miss because the key was evicted. Breaking them down tells you whether you need a larger cache (evictions) or a different caching strategy (cold entries).

Latency percentiles

P50 tells you the typical case. P99 tells you the worst 1% of users' experience. A cache with 0.5ms P50 and 50ms P99 is failing for some users — probably due to GC pauses, network hiccups, or hot keys.

Eviction rate

How often the cache is dropping entries because it is full. A high eviction rate with a low hit rate means your cache is too small for the working set.

Staleness

How old the average served entry is. Hard to measure precisely, but worth approximating through sampled checks against the underlying store.

When not to cache

Caching is reached for so reflexively that its genuine downsides are often overlooked. Skip the cache when:

  • The read volume is low. A query that runs twice per minute does not need a cache. The complexity is not worth the tiny latency win.
  • Consistency is non-negotiable. Financial balances, inventory counts under flash-sale pressure, authentication tokens. The cost of serving stale data is higher than the cost of hitting the database every time.
  • The data is highly personalised. Cache hit rates collapse when every user sees different content. A cache with 2% hit rate is worse than no cache because you pay the cache lookup cost on every miss.
  • The computation is already fast enough. A primary-key lookup on an indexed Postgres column is sub-millisecond. Caching it in Redis saves you nothing and introduces a new consistency problem.

Frequently Asked Questions

Redis or Memcached?

Redis for almost everything in 2026. Memcached is simpler and has a slight edge in raw throughput for pure key-value, but Redis's richer data structures (lists, sets, sorted sets, streams), pub/sub, persistence, and clustering make it the default choice. Memcached still wins in specific niches (very high throughput on uniform workloads) but the gap has narrowed.

Should I cache at the application layer or the database layer?

Both, usually. Database buffer caches handle repeated queries within the database's own lifecycle. Application-layer caches handle the much longer round-trip between your service and the database. They serve different purposes and are not mutually exclusive.

How long should my cache TTL be?

Depends on how quickly the underlying data changes and how stale is tolerable. For most content that changes daily, 5-15 minutes is a reasonable default. For real-time dashboards, 30 seconds to 2 minutes. For static catalogue data, an hour or more. Start with a conservative value, measure hit rate, adjust.

What is the difference between caching and a CDN?

A CDN is a cache that lives closer to the user geographically — typically operated by Cloudflare, Fastly, or CloudFront. The same caching principles apply, but the primary goal is latency reduction through geographic proximity rather than computational speedup. For static assets, CDN caching is the highest-leverage win available.

Share your thoughts

Worked with this in production and have a story to share, or disagree with a tradeoff? Email us at support@mybytenest.com — we read everything.