Overview
A distributed cache is a network-accessible, in-memory key/value store shared across application instances. In financial services it solves three problems: read latency (sub-millisecond access to reference data), backend protection (absorbing read fan-out from a 5x traffic spike), and session state (because session stickiness on a load balancer is fragile).
Redis dominates this space. Two deployment topologies cover almost every requirement: Sentinel for HA on a single primary, Cluster when data exceeds the memory of one node.
A cache is a derived view of authoritative data. If losing the cache means losing data, it is a database, not a cache — and a different set of design rules applies (durability, replication semantics, RPO). Resist the urge to use Redis as a primary store for anything you cannot recompute.
Cache patterns
Five patterns cover almost every cache use case. The choice is determined by who writes the cache and when.
| Pattern | Read | Write | When to use |
|---|---|---|---|
| Cache-aside | App checks cache, falls through to DB on miss | App writes DB, invalidates cache | Default; most read-heavy use cases |
| Read-through | Cache fetches from DB on miss | Same as cache-aside | When the cache layer can wrap the DB call |
| Write-through | App reads cache | App writes cache, cache writes DB synchronously | Strong read-after-write consistency |
| Write-behind | App reads cache | App writes cache, cache writes DB async | Burst-write workloads, can tolerate loss |
| Refresh-ahead | App reads cache | Cache pre-emptively refreshes hot keys | Predictable hot data, can spare CPU |
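For contrast with the cache-aside example later in this section, here is a minimal application-level sketch of the write-through row above. The class, repository, and TTL are illustrative, and the method stands in for the cache layer: it persists to the system of record and refreshes the cache in the same synchronous call, so a read that hits the cache immediately afterwards sees the new value. The DB write goes first deliberately, so a failed write never leaves the cache ahead of the source of truth.
// Write-through sketch (names and TTL are illustrative; reuses the redis handle
// from the cache-aside example below). One method owns both writes.
public void putAccount(Account a) {
    accountRepo.save(a);                                          // 1. durable write to the system of record
    redis.sync().setex("account:" + a.getId(), 300, a.toJson());  // 2. then synchronously refresh the cache
}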
Topology
Cluster vs Sentinel
| Property | Cluster | Sentinel |
|---|---|---|
| Data size | Up to N× node memory (sharded) | One node’s memory |
| Multi-key operations | Same hash slot only (use {tag}; see the sketch below) | Unrestricted |
| Lua scripts | Keys must be in same slot | Unrestricted |
| Failover | Built-in via gossip | External Sentinel processes |
| Client complexity | Cluster-aware client (Lettuce, Jedis) | Sentinel-aware client |
| Best fit | > 100 GB working set | < 100 GB working set, simpler ops |
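As a quick illustration of the hash-slot restriction: only the text between { and } is hashed, so keys sharing a tag land in the same slot and multi-key commands keep working in Cluster mode. A minimal Lettuce sketch, with the endpoint and key names as placeholders:
// Hash tags keep related keys in one slot, so MSET/MGET work under Cluster.
RedisClusterClient client = RedisClusterClient.create("redis://cache.internal:6379");
try (StatefulRedisClusterConnection<String, String> conn = client.connect()) {
    RedisAdvancedClusterCommands<String, String> cmds = conn.sync();
    cmds.mset(Map.of(
            "{account:42}:profile", "{\"tier\":\"gold\"}",
            "{account:42}:limits",  "{\"daily\":50000}"));
    // Without the {account:42} tag, this MGET would fail with CROSSSLOT.
    List<KeyValue<String, String>> values =
            cmds.mget("{account:42}:profile", "{account:42}:limits");
} finally {
    client.shutdown();
}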
Deploying Redis
- Size the working set. Production Redis should run at < 60% of maxmemory; beyond that, eviction churn destroys p99 latency. Pick node memory based on data size + 60% headroom + replica overhead.
- Pick the eviction policy. allkeys-lru for a general cache; volatile-ttl when only some keys have a TTL and you want those prioritised for eviction; noeviction when rejecting writes at the memory limit is preferable to silently dropping keys.
- Configure persistence. For a pure cache: disable AOF, keep RDB snapshots for crash recovery. For Redis-as-database: AOF with appendfsync everysec, which keeps fsync off the command path at the cost of up to about a second of writes lost on a crash.
- Deploy with anti-affinity. A primary and its replica must never share a physical host or an AZ. Use Kubernetes pod anti-affinity or explicit node selectors.
- Wire monitoring. Hit rate, eviction rate, connection count, memory fragmentation, replication lag, slow log. Without these, you discover problems through latency complaints.
Configuration
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru
maxmemory-samples 10
# Persistence: cache mode
appendonly no
save "" # disable RDB during normal ops
save 3600 1 # light snapshot once an hour
# Replication
replica-read-only yes
repl-backlog-size 256mb
replica-lazy-flush yes
# Cluster
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no
# Slow log threshold (microseconds)
slowlog-log-slower-than 10000
slowlog-max-len 256
# Latency monitor
latency-monitor-threshold 100
# TLS
tls-port 6379
port 0
tls-cert-file /etc/redis/server.crt
tls-key-file /etc/redis/server.key
tls-ca-cert-file /etc/redis/ca.crt
tls-auth-clients yes
tls-replication yes   # also encrypt primary-to-replica traffic
tls-cluster yes       # and the cluster bus
Cache-aside in Java with Lettuce
public Account get(String id) {
String key = "account:" + id;
String cached = redis.sync().get(key);
if (cached != null) {
return Account.fromJson(cached);
}
// miss path: load from DB, populate cache with TTL + jitter
Account a = accountRepo.findById(id)
.orElseThrow(() -> new NotFoundException(id));
long ttl = 300 + ThreadLocalRandom.current().nextInt(61); // 300–360 s
redis.sync().setex(key, ttl, a.toJson());
return a;
}
public void invalidate(String id) {
redis.sync().del("account:" + id);
}
The jitter on the TTL (300–360 seconds rather than a fixed 300) is the simplest defence against synchronised expiry. Without it, a cohort of keys all written together expire together, and the resulting flood of misses hits the database simultaneously — the classic stampede.
Invalidation
Phil Karlton: “there are only two hard things in computer science: cache invalidation and naming things.” Three patterns work in practice:
- TTL only. Simplest. Tolerable staleness window equals the TTL. Use for reference data that changes daily or slower.
- TTL + explicit invalidation on write. Application code that updates the underlying record also calls DEL on the cache key. Use for typical cache-aside.
- Event-driven invalidation. A CDC stream of database changes drives a cache-invalidation consumer. Use when the writer doesn’t know to invalidate (e.g., DBA changes via SQL, batch updates).
On a write, prefer DEL over SET for the cache key. The next read will repopulate from the source of truth. Updating the cache directly creates a window where the cache and the DB disagree if the write to either fails halfway.
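A sketch of the event-driven variant, with the Kafka topic name, consumer wiring, and event key format all as assumptions (the change feed could come from Debezium or an application outbox): a small consumer turns each row-change event into a DEL, so the next read repopulates from the source of truth, in line with the DEL-over-SET advice above.
// Hypothetical CDC-driven invalidator: each change event for an account row becomes a DEL.
// Topic name, and the assumption that the record key carries the account id, are illustrative.
Properties props = new Properties();
props.put("bootstrap.servers", "kafka.internal:9092");
props.put("group.id", "cache-invalidator");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("db.accounts.changes"));
    while (true) {
        for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
            redis.sync().del("account:" + rec.key());   // drop the stale entry; next read reloads it
        }
    }
}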
Stampede protection
A cache stampede occurs when a popular key expires and N concurrent requests miss simultaneously, all trying to recompute. Three patterns prevent it:
- Single-flight (SETNX lock). First miss takes a lock; others poll briefly. Implementation in Redis is straightforward via SET key value NX EX 30.
- Probabilistic early refresh. Refresh the cache before expiry with probability proportional to how close to expiry the value is. Spreads the load (sketched after this list).
- Stale-while-revalidate. Serve a stale value while a background task refreshes. Two TTLs: hard expiry (delete) and soft expiry (refresh in background, return current).
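The probabilistic variant is usually implemented as probabilistic early expiration ("XFetch"). A minimal sketch of the decision, where beta and the recompute-cost estimate are tuning assumptions: each reader rolls a die weighted by how expensive the recompute is and how close the key is to expiry, so refreshes spread out instead of piling up at the hard expiry.
// Probabilistic early-refresh check (the XFetch rule); beta around 1.0 is a common starting point.
// remainingTtlMillis comes from PTTL on the key; recomputeCostMillis is a measured estimate.
boolean shouldRefreshEarly(long remainingTtlMillis, long recomputeCostMillis, double beta) {
    double u = ThreadLocalRandom.current().nextDouble();
    if (u == 0.0) u = Double.MIN_VALUE;               // avoid log(0)
    double earlyBy = beta * recomputeCostMillis * -Math.log(u);
    return earlyBy >= remainingTtlMillis;             // refresh now, ahead of the hard expiry
}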
Single-flight in Lua
-- KEYS[1] = data key, KEYS[2] = lock key
-- ARGV[1] = lock TTL in seconds
local v = redis.call('GET', KEYS[1])
if v then return {'hit', v} end
local got = redis.call('SET', KEYS[2], '1', 'NX', 'EX', ARGV[1])
if got then return {'lead'} end -- caller should compute and SET
return {'wait'} -- caller should retry GET shortly
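For clarity, the same single-flight logic expressed in plain commands rather than the script (two round trips instead of one); the key naming, lock TTL, value TTL, and poll interval are illustrative, and the redis handle is the one from the cache-aside example.
// Equivalent caller-side single-flight: first miss takes the lock and recomputes,
// everyone else polls briefly and retries the GET.
public String getSingleFlight(String key, Supplier<String> recompute) {
    for (int attempt = 0; attempt < 50; attempt++) {
        String v = redis.sync().get(key);
        if (v != null) return v;                                    // hit
        String ok = redis.sync().set(key + ":lock", "1",
                SetArgs.Builder.nx().ex(30));
        if ("OK".equals(ok)) {                                      // we are the leader
            String fresh = recompute.get();
            redis.sync().setex(key, 300, fresh);
            redis.sync().del(key + ":lock");
            return fresh;
        }
        try { Thread.sleep(50); } catch (InterruptedException e) {  // brief poll, then retry the GET
            Thread.currentThread().interrupt();
            break;
        }
    }
    return recompute.get();                                         // give up waiting; compute locally
}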
Observability
Five metrics to alert on in production:
- Hit rate. If hit rate falls below the historical baseline, something invalidated more than expected. Common cause: a deploy bumped a version prefix and orphaned the old keys.
- Eviction rate. A non-zero eviction rate when memory is at 60% means key sizes drifted. Investigate before memory pressure becomes latency.
- p99 latency. Redis is supposed to be sub-millisecond. p99 over 5 ms points to slow Lua scripts, hot keys, a blocked event loop from big values, or GC pauses on the client side inflating the measurement. Check the slow log.
- Replication lag. Active/passive read-from-replica patterns rely on replicas being current. Lag > 1s means readers are seeing stale data.
- Connection count. Apps that don’t pool connections leak. A linear climb on connections is a deploy bug.
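Hit rate is not exposed as a single number; it is derived from the keyspace_hits and keyspace_misses counters in the stats section of INFO. A small sketch of the computation (the field names are the real INFO fields, the parsing is deliberately naive):
// Derive the hit rate from INFO stats. The counters are cumulative since the last restart,
// so alert on the rate of change rather than the lifetime ratio.
public double hitRate() {
    String stats = redis.sync().info("stats");
    long hits = parseField(stats, "keyspace_hits");
    long misses = parseField(stats, "keyspace_misses");
    long total = hits + misses;
    return total == 0 ? 1.0 : (double) hits / total;
}

private long parseField(String info, String field) {
    for (String line : info.split("\r\n")) {
        if (line.startsWith(field + ":")) {
            return Long.parseLong(line.substring(field.length() + 1).trim());
        }
    }
    return 0;
}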
Common pitfalls
A 50 MB Redis value blocks the event loop on every read. Redis is single-threaded for command execution; one big key kills latency for every other client. Cap value size at ~1 MB; for larger objects, store the bytes elsewhere and cache a pointer.
One key receiving 80% of the traffic saturates one node’s CPU. Detect with --hotkeys sampling. Mitigate by sharding the key (customer:1234:1, customer:1234:2...) and reading round-robin, or by adding a small in-process cache in front of Redis.
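A sketch of the key-sharding mitigation, with the shard count and key scheme as assumptions: the writer fans one value out to N suffixed copies and each reader picks a copy at random (round-robin works equally well), so the traffic for one logical key spreads across several slots and nodes.
// Hot-key sharding: N copies of the same value under suffixed keys; reads pick one at random.
// SHARDS and the key scheme are illustrative; copies expire independently on the same TTL.
private static final int SHARDS = 4;

void putHot(String baseKey, String json, long ttlSeconds) {
    for (int i = 0; i < SHARDS; i++) {
        redis.sync().setex(baseKey + ":" + i, ttlSeconds, json);
    }
}

String getHot(String baseKey) {
    int shard = ThreadLocalRandom.current().nextInt(SHARDS);
    return redis.sync().get(baseKey + ":" + shard);
}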
The KEYS pattern command is O(N) and blocks. Use SCAN for any iteration. Lock down KEYS, FLUSHALL, and FLUSHDB via ACL in production.
Creating a Redis connection per request collapses under load. With Jedis, configure a bounded connection pool with min and max sizes; with Lettuce, a single shared connection is usually enough because it is thread-safe. Either way, build the client once and reuse it across requests.
When not to cache
Caching is the right tool for read-heavy workloads with reusable values and tolerable staleness. It is the wrong tool when:
- The data is unique per request. No reuse, no benefit. Just hit the database.
- Strong consistency is required. If a transaction must be visible within a second of being booked, you don’t want a cache between the user and the ledger.
- The workload is write-heavy. Write-through caches add latency without read benefit. Write-behind risks loss.
- The DB is fast enough. Adding a cache adds a network hop, a cache-coherence problem, and an operational dependency. If the DB serves the read in 2 ms, a 1 ms cache hit is rarely worth the complexity.