Overview

A distributed cache is a network-accessible, in-memory key/value store shared across application instances. In financial services it solves three problems: read latency (sub-millisecond access to reference data), backend protection (absorbing read fan-out from a 5x traffic spike), and session state (because session stickiness on a load balancer is fragile).

Redis dominates this space. Two deployment topologies cover almost every requirement: Sentinel for HA on a single primary, Cluster when data exceeds the memory of one node.

Cache, not database

A cache is a derived view of authoritative data. If losing the cache means losing data, it is a database, not a cache — and a different set of design rules applies (durability, replication semantics, RPO). Resist the urge to use Redis as a primary store for anything you cannot recompute.

Cache patterns

Five patterns cover almost every cache use case. The choice is determined by who writes the cache and when.

| Pattern | Read | Write | When to use |
| --- | --- | --- | --- |
| Cache-aside | App checks cache, falls through to DB on miss | App writes DB, invalidates cache | Default; most read-heavy use cases |
| Read-through | Cache fetches from DB on miss | Same as cache-aside | When the cache layer can wrap the DB call |
| Write-through | App reads cache | App writes cache, cache writes DB synchronously | Strong read-after-write consistency |
| Write-behind | App reads cache | App writes cache, cache writes DB async | Burst-write workloads, can tolerate loss |
| Refresh-ahead | App reads cache | Cache pre-emptively refreshes hot keys | Predictable hot data, can spare CPU |

Topology

Cluster vs Sentinel

| Property | Cluster | Sentinel |
| --- | --- | --- |
| Data size | Up to N× node memory (sharded) | One node’s memory |
| Multi-key operations | Same hash slot only (use {tag}) | Unrestricted |
| Lua scripts | Keys must be in same slot | Unrestricted |
| Failover | Built-in via gossip | External Sentinel processes |
| Client complexity | Cluster-aware client (Lettuce, Jedis) | Sentinel-aware client |
| Best fit | > 100 GB working set | < 100 GB working set, simpler ops |
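The same-slot restriction is mechanical: Cluster hashes every key to one of 16384 slots with CRC-16/XMODEM, and a non-empty {tag} in the key restricts hashing to the tag alone. A minimal sketch of that computation (the class and method names here are ours, not a library API):

```java
import java.nio.charset.StandardCharsets;

public class ClusterSlots {
    // CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster specifies.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key contains a non-empty {tag}, only the tag is hashed.
    // That is how multi-key operations are forced onto one slot.
    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open != -1) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

Keys like account:{1234}:balance and account:{1234}:limits hash the same tag, land in the same slot, and so remain usable together in MGET and Lua scripts under Cluster.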

Deploying Redis

  1. Size the working set

    Production Redis should run at under 60% of maxmemory; beyond that, eviction churn destroys p99 latency. Work backwards: set maxmemory to roughly the working-set size ÷ 0.6, then add replica overhead when picking node memory.

  2. Pick the eviction policy

    allkeys-lru for a general cache; volatile-ttl when only some keys carry a TTL and those should be evicted first; noeviction when silently dropping keys is unacceptable: at maxmemory, writes fail with an error instead.

  3. Configure persistence

    For pure cache: disable AOF, keep light RDB snapshots for crash recovery. For Redis-as-database: AOF with appendfsync everysec. The trade-off is concrete: AOF fsync costs write throughput, while RDB snapshots fork the process and briefly increase memory use.

  4. Deploy with anti-affinity

    Primary and its replica must never be on the same physical host or the same AZ. Use Kubernetes pod anti-affinity or explicit node selectors.

  5. Wire monitoring

    Hit rate, eviction rate, connection count, memory fragmentation, replication lag, slow log. Without these you discover problems by latency complaints.
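The sizing rule from step 1 is simple arithmetic, but worth encoding so it lives in one place. A sketch, assuming each replica holds a full copy of the data (class and method names are ours):

```java
public class Sizing {
    // maxmemory needed so the steady-state working set stays under 60% of it.
    static double maxMemoryGb(double workingSetGb) {
        return workingSetGb / 0.60;
    }

    // Total memory across the shard: primary plus full-copy replicas.
    static double shardMemoryGb(double workingSetGb, int replicas) {
        return maxMemoryGb(workingSetGb) * (1 + replicas);
    }
}
```

A 7.2 GB working set needs a 12 GB maxmemory, and 24 GB across a primary plus one replica.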

Configuration

redis.conf
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru
maxmemory-samples 10

# Persistence: cache mode
appendonly no
save ""                              # clear the default save points
save 3600 1                          # light snapshot once an hour

# Replication
replica-read-only yes
repl-backlog-size 256mb
replica-lazy-flush yes

# Cluster
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no

# Slow log threshold (microseconds)
slowlog-log-slower-than 10000
slowlog-max-len 256

# Latency monitor
latency-monitor-threshold 100

# TLS
tls-port 6379
port 0
tls-cert-file /etc/redis/server.crt
tls-key-file /etc/redis/server.key
tls-ca-cert-file /etc/redis/ca.crt
tls-auth-clients yes
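The persistence lines above are cache mode. For the Redis-as-database case from step 3, that section would instead look like this (a sketch; the rewrite thresholds shown are the stock defaults, tune to the workload):

```conf
# Persistence: Redis-as-database mode
appendonly yes
appendfsync everysec                 # fsync once per second: at most ~1 s of loss on crash
auto-aof-rewrite-percentage 100      # rewrite when the AOF doubles since the last rewrite
auto-aof-rewrite-min-size 64mb
```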

Cache-aside in Java with Lettuce

AccountCache.java
public Account get(String id) {
  String key = "account:" + id;
  String cached = redis.sync().get(key);
  if (cached != null) {
    return Account.fromJson(cached);
  }

  // miss path: load from DB, populate cache with TTL + jitter
  Account a = accountRepo.findById(id)
      .orElseThrow(() -> new NotFoundException(id));

  long ttl = 300 + ThreadLocalRandom.current().nextInt(60);  // 300–360 s
  redis.sync().setex(key, ttl, a.toJson());
  return a;
}

public void invalidate(String id) {
  redis.sync().del("account:" + id);
}

The jitter on the TTL (300–360 seconds rather than a fixed 300) is the simplest defence against synchronised expiry. Without it, a cohort of keys all written together expire together, and the resulting flood of misses hits the database simultaneously — the classic stampede.

Invalidation

Phil Karlton: “there are only two hard things in computer science: cache invalidation and naming things.” Three patterns work in practice:

  • TTL only. Simplest. Tolerable staleness window equals the TTL. Use for reference data that changes daily or slower.
  • TTL + explicit invalidation on write. Application code that updates the underlying record also calls DEL on the cache key. Use for typical cache-aside.
  • Event-driven invalidation. A CDC stream of database changes drives a cache-invalidation consumer. Use when the writer doesn’t know to invalidate (e.g., DBA changes via SQL, batch updates).
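The consumer side of event-driven invalidation is a small mapping from table changes to cache deletes. A sketch, where the event shape, the table-to-prefix mapping, and the injected delete function are all hypothetical (in production the Consumer would delegate to redis.sync().del):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class CdcInvalidator {
    // Hypothetical CDC event: which table changed, and the row's primary key.
    record ChangeEvent(String table, String primaryKey) {}

    // table -> cache key prefix; only mapped tables trigger invalidation.
    private final Map<String, String> prefixByTable;
    private final Consumer<String> del;   // e.g. key -> redis.sync().del(key)

    CdcInvalidator(Map<String, String> prefixByTable, Consumer<String> del) {
        this.prefixByTable = prefixByTable;
        this.del = del;
    }

    void onEvents(List<ChangeEvent> batch) {
        for (ChangeEvent e : batch) {
            String prefix = prefixByTable.get(e.table());
            if (prefix != null) {
                del.accept(prefix + e.primaryKey());   // DEL, not SET
            }
        }
    }
}
```

Changes to unmapped tables (audit logs, batch bookkeeping) pass through untouched; the writer never needs to know the cache exists.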

Don’t update, delete

On a write, prefer DEL over SET for the cache key. The next read will repopulate from the source of truth. Updating the cache directly creates a window where the cache and the DB disagree if the write to either fails halfway.

Stampede protection

A cache stampede occurs when a popular key expires and N concurrent requests miss simultaneously, all trying to recompute. Three patterns prevent it:

  • Single-flight (SETNX lock). First miss takes a lock; others poll briefly. Implementation in Redis is straightforward via SET key value NX EX 30.
  • Probabilistic early refresh. Refresh the cache before expiry with probability proportional to how close to expiry the value is. Spreads the load.
  • Stale-while-revalidate. Serve a stale value while a background task refreshes. Two TTLs: hard expiry (delete) and soft expiry (refresh in background, return current).
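The probabilistic variant is small enough to show in full. This sketch follows the formula usually called XFetch: refresh when now − β·Δ·ln(rand) reaches the expiry time, where Δ is the recompute cost and rand is uniform in (0, 1]. The randomness is injected as a parameter so the decision is testable; names are ours:

```java
public class EarlyRefresh {
    // XFetch: refresh early with a probability that rises as expiry approaches.
    // deltaMs = how long a recompute takes; beta = aggressiveness (1.0 is typical);
    // rand01  = uniform random in (0, 1], injected for testability.
    static boolean shouldRefresh(long nowMs, long expiryMs,
                                 long deltaMs, double beta, double rand01) {
        // ln(rand01) is <= 0, so the subtraction pushes "now" forward a random amount
        // on the order of the recompute cost.
        return nowMs - deltaMs * beta * Math.log(rand01) >= expiryMs;
    }
}
```

Called on every hit: far from expiry almost every caller sees false, and close to expiry one lucky caller sees true and refreshes before the TTL fires, so the expiry never lands on everyone at once.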

Single-flight in Lua

single-flight.lua
-- KEYS[1] = data key, KEYS[2] = lock key
-- ARGV[1] = lock TTL in seconds
local v = redis.call('GET', KEYS[1])
if v then return {'hit', v} end

local got = redis.call('SET', KEYS[2], '1', 'NX', 'EX', ARGV[1])
if got then return {'lead'} end      -- caller should compute and SET
return {'wait'}                          -- caller should retry GET shortly
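The caller's side of the script is a retry loop around hit / lead / wait. A sketch against a minimal key/value interface, so the logic is visible without a live server; the interface, TTLs, and poll interval are illustrative, and in production the methods would delegate to Lettuce:

```java
import java.util.function.Supplier;

public class SingleFlight {
    // Minimal subset of Redis commands the loop needs.
    interface Kv {
        String get(String key);
        boolean setNxEx(String key, String value, long ttlSec);  // SET ... NX EX
        void setEx(String key, long ttlSec, String value);
        void del(String key);
    }

    static String get(Kv kv, String key, Supplier<String> recompute) {
        String lockKey = key + ":lock";
        for (int i = 0; i < 100; i++) {
            String v = kv.get(key);
            if (v != null) return v;                  // hit
            if (kv.setNxEx(lockKey, "1", 30)) {       // lead: we won the lock
                try {
                    String fresh = recompute.get();
                    kv.setEx(key, 300, fresh);
                    return fresh;
                } finally {
                    kv.del(lockKey);                  // release even if recompute throws
                }
            }
            try {
                Thread.sleep(50);                     // wait: another caller is computing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return recompute.get();                       // gave up waiting; compute anyway
    }
}
```

The lock TTL (30 s here) bounds how long a crashed leader can block everyone else; the final fallback means a stuck lock degrades to a stampede, never to an outage.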

Observability

Five metrics to alert on in production:

  • Hit rate. If hit rate falls below the historical baseline, something invalidated more than expected. Common cause: a deploy bumped a version prefix and orphaned the old keys.
  • Eviction rate. Evictions while memory sits at the planned 60% mean key sizes or key counts have drifted from the sizing estimate. Investigate before memory pressure turns into latency.
  • p99 latency. Redis is supposed to be sub-millisecond. p99 over 5ms means fork pauses (BGSAVE or AOF rewrite), slow Lua, or hot keys. Check the slow log.
  • Replication lag. Active/passive read-from-replica patterns rely on replicas being current. Lag > 1s means readers are seeing stale data.
  • Connection count. Apps that don’t pool connections leak. A linear climb on connections is a deploy bug.

Common pitfalls

Big keys

A 50 MB Redis value blocks the event loop on every read. Redis is single-threaded for command execution; one big key kills latency for every other client. Cap value size at ~1 MB; for larger objects, store the bytes elsewhere and cache a pointer.
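The cap-and-pointer rule fits in one function. A sketch, where the blob store is abstracted to a hypothetical put function (in practice, object storage keyed by the same pointer):

```java
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;

public class PointerCache {
    static final int MAX_INLINE = 1 << 20;   // the ~1 MB cap from the text

    // Decide what actually goes into Redis: small values inline,
    // big ones stored externally behind a pointer.
    // blobPut is hypothetical: (pointer, bytes) -> write to external storage.
    static String cacheValue(String pointer, byte[] value,
                             BiConsumer<String, byte[]> blobPut) {
        if (value.length <= MAX_INLINE) {
            return new String(value, StandardCharsets.UTF_8);
        }
        blobPut.accept(pointer, value);
        return "ptr:" + pointer;   // cache only the reference
    }
}
```

Readers that see a ptr: value do one extra fetch; every other client keeps its sub-millisecond reads because the event loop never serializes 50 MB.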

Hot key

One key receiving 80% of the traffic saturates one node’s CPU. Detect with --hotkeys sampling. Mitigate by sharding the key (customer:1234:1, customer:1234:2...) and reading round-robin, or by adding a small in-process cache in front of Redis.
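Sharding a hot key is mechanical enough to sketch. This follows the customer:1234:N layout from the text, with a counter-based round-robin picker (one simple choice; a random pick works too). Writes must fan out to every copy so all readers see the update:

```java
import java.util.concurrent.atomic.AtomicLong;

public class HotKeyShards {
    private final int copies;
    private final AtomicLong counter = new AtomicLong();

    HotKeyShards(int copies) { this.copies = copies; }

    // Every copy must be written on update, or readers see a stale shard.
    String[] writeKeys(String base) {
        String[] keys = new String[copies];
        for (int i = 0; i < copies; i++) keys[i] = base + ":" + (i + 1);
        return keys;
    }

    // Reads rotate across the copies, spreading the load.
    String readKey(String base) {
        long n = counter.getAndIncrement();
        return base + ":" + (n % copies + 1);
    }
}
```

In Cluster mode the suffixed keys hash to different slots and therefore usually different nodes, which is the point: one node's CPU stops being the ceiling for one key's traffic.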

KEYS in production

The KEYS pattern command is O(N) and blocks. Use SCAN for any iteration. Lock down KEYS, FLUSHALL, and FLUSHDB via ACL in production.

No connection pool

Creating a Redis connection per request collapses under load. Configure Lettuce/Jedis pooling with min and max sizes; reuse the client across requests.

When not to cache

Caching is the right tool for read-heavy workloads with reusable values and tolerable staleness. It is the wrong tool when:

  • The data is unique per request. No reuse, no benefit. Just hit the database.
  • Strong consistency is required. If the bank cannot show a transaction one second after it’s booked, you don’t want a cache between the user and the ledger.
  • The workload is write-heavy. Write-through caches add latency without read benefit. Write-behind risks loss.
  • The DB is fast enough. Adding a cache adds a network hop, a cache-coherence problem, and an operational dependency. If the DB serves the read in 2 ms, a 1 ms cache hit is rarely worth the complexity.