Overview

A distributed cache is a network-accessible, in-memory key/value store shared across application instances. In financial services it solves three problems: read latency (sub-millisecond access to reference data), backend protection (absorbing read fan-out from a 5x traffic spike), and session state (because session stickiness on a load balancer is fragile).

Redis dominates this space. Two deployment topologies cover almost every requirement: Sentinel for HA on a single primary, Cluster when data exceeds the memory of one node.

Cache, not database

A cache is a derived view of authoritative data. If losing the cache means losing data, it is a database, not a cache — and a different set of design rules applies (durability, replication semantics, RPO). Resist the urge to use Redis as a primary store for anything you cannot recompute.

Cache patterns

Five patterns cover almost every cache use case. The choice is determined by who writes the cache and when.

| Pattern | Read | Write | When to use |
| --- | --- | --- | --- |
| Cache-aside | App checks cache, falls through to DB on miss | App writes DB, invalidates cache | Default; most read-heavy use cases |
| Read-through | Cache fetches from DB on miss | Same as cache-aside | When the cache layer can wrap the DB call |
| Write-through | App reads cache | App writes cache, cache writes DB synchronously | Strong read-after-write consistency |
| Write-behind | App reads cache | App writes cache, cache writes DB async | Burst-write workloads, can tolerate loss |
| Refresh-ahead | App reads cache | Cache pre-emptively refreshes hot keys | Predictable hot data, can spare CPU |

Topology

Cluster vs Sentinel

| Property | Cluster | Sentinel |
| --- | --- | --- |
| Data size | Up to N× node memory (sharded) | One node’s memory |
| Multi-key operations | Same hash slot only (use {tag}) | Unrestricted |
| Lua scripts | Keys must be in same slot | Unrestricted |
| Failover | Built-in via gossip | External Sentinel processes |
| Client complexity | Cluster-aware client (Lettuce, Jedis) | Sentinel-aware client |
| Best fit | > 100 GB working set | < 100 GB working set, simpler ops |
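The same-slot restriction is mechanical: Cluster hashes every key to one of 16384 slots with CRC-16/XMODEM, and a non-empty {tag} in the key restricts hashing to the tag alone. A minimal sketch of that computation (the class and method names here are ours, not a library API):

```java
import java.nio.charset.StandardCharsets;

public class ClusterSlots {
    // CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster specifies.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key contains a non-empty {tag}, only the tag is hashed.
    // That is how multi-key operations are forced onto one slot.
    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open != -1) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

Keys like account:{1234}:balance and account:{1234}:limits hash the same tag, land in the same slot, and so remain usable together in MGET and Lua scripts under Cluster.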

Deploying Redis

  1. Size the working set

    Production Redis should run at under 60% of maxmemory; beyond that, eviction churn destroys p99 latency. Work backwards: set maxmemory to roughly the working-set size ÷ 0.6, then add replica overhead when picking node memory.

  2. Pick the eviction policy

    allkeys-lru for a general cache; volatile-ttl when only some keys carry a TTL and those should be evicted first; noeviction when silently dropping keys is unacceptable: at maxmemory, writes fail with an error instead.

  3. Configure persistence

    For pure cache: disable AOF, keep light RDB snapshots for crash recovery. For Redis-as-database: AOF with appendfsync everysec. The trade-off is concrete: AOF fsync costs write throughput, while RDB snapshots fork the process and briefly increase memory use.

  4. Deploy with anti-affinity

    Primary and its replica must never be on the same physical host or the same AZ. Use Kubernetes pod anti-affinity or explicit node selectors.

  5. Wire monitoring

    Hit rate, eviction rate, connection count, memory fragmentation, replication lag, slow log. Without these you discover problems by latency complaints.
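The sizing rule from step 1 is simple arithmetic, but worth encoding so it lives in one place. A sketch, assuming each replica holds a full copy of the data (class and method names are ours):

```java
public class Sizing {
    // maxmemory needed so the steady-state working set stays under 60% of it.
    static double maxMemoryGb(double workingSetGb) {
        return workingSetGb / 0.60;
    }

    // Total memory across the shard: primary plus full-copy replicas.
    static double shardMemoryGb(double workingSetGb, int replicas) {
        return maxMemoryGb(workingSetGb) * (1 + replicas);
    }
}
```

A 7.2 GB working set needs a 12 GB maxmemory, and 24 GB across a primary plus one replica.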

Configuration

redis.conf
# Memory
maxmemory 12gb
maxmemory-policy allkeys-lru
maxmemory-samples 10

# Persistence: cache mode
appendonly no
save ""                              # clear the default save points
save 3600 1                          # light snapshot once an hour

# Replication
replica-read-only yes
repl-backlog-size 256mb
replica-lazy-flush yes

# Cluster
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no

# Slow log threshold (microseconds)
slowlog-log-slower-than 10000
slowlog-max-len 256

# Latency monitor
latency-monitor-threshold 100

# TLS
tls-port 6379
port 0
tls-cert-file /etc/redis/server.crt
tls-key-file /etc/redis/server.key
tls-ca-cert-file /etc/redis/ca.crt
tls-auth-clients yes
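The persistence lines above are cache mode. For the Redis-as-database case from step 3, that section would instead look like this (a sketch; the rewrite thresholds shown are the stock defaults, tune to the workload):

```conf
# Persistence: Redis-as-database mode
appendonly yes
appendfsync everysec                 # fsync once per second: at most ~1 s of loss on crash
auto-aof-rewrite-percentage 100      # rewrite when the AOF doubles since the last rewrite
auto-aof-rewrite-min-size 64mb
```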

Cache-aside in Java with Lettuce

AccountCache.java
public Account get(String id) {
  String key = "account:" + id;
  String cached = redis.sync().get(key);
  if (cached != null) {
    return Account.fromJson(cached);
  }

  // miss path: load from DB, populate cache with TTL + jitter
  Account a = accountRepo.findById(id)
      .orElseThrow(() -> new NotFoundException(id));

  long ttl = 300 + ThreadLocalRandom.current().nextInt(60);  // 300–360 s
  redis.sync().setex(key, ttl, a.toJson());
  return a;
}

public void invalidate(String id) {
  redis.sync().del("account:" + id);
}

The jitter on the TTL (300–360 seconds rather than a fixed 300) is the simplest defence against synchronised expiry. Without it, a cohort of keys all written together expire together, and the resulting flood of misses hits the database simultaneously — the classic stampede.

Invalidation

Phil Karlton: “there are only two hard things in computer science: cache invalidation and naming things.” Three patterns work in practice:

  • TTL only. Simplest. Tolerable staleness window equals the TTL. Use for reference data that changes daily or slower.
  • TTL + explicit invalidation on write. Application code that updates the underlying record also calls DEL on the cache key. Use for typical cache-aside.
  • Event-driven invalidation. A CDC stream of database changes drives a cache-invalidation consumer. Use when the writer doesn’t know to invalidate (e.g., DBA changes via SQL, batch updates).
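The consumer side of event-driven invalidation is a small mapping from table changes to cache deletes. A sketch, where the event shape, the table-to-prefix mapping, and the injected delete function are all hypothetical (in production the Consumer would delegate to redis.sync().del):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class CdcInvalidator {
    // Hypothetical CDC event: which table changed, and the row's primary key.
    record ChangeEvent(String table, String primaryKey) {}

    // table -> cache key prefix; only mapped tables trigger invalidation.
    private final Map<String, String> prefixByTable;
    private final Consumer<String> del;   // e.g. key -> redis.sync().del(key)

    CdcInvalidator(Map<String, String> prefixByTable, Consumer<String> del) {
        this.prefixByTable = prefixByTable;
        this.del = del;
    }

    void onEvents(List<ChangeEvent> batch) {
        for (ChangeEvent e : batch) {
            String prefix = prefixByTable.get(e.table());
            if (prefix != null) {
                del.accept(prefix + e.primaryKey());   // DEL, not SET
            }
        }
    }
}
```

Changes to unmapped tables (audit logs, batch bookkeeping) pass through untouched; the writer never needs to know the cache exists.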

Don’t update, delete

On a write, prefer DEL over SET for the cache key. The next read will repopulate from the source of truth. Updating the cache directly creates a window where the cache and the DB disagree if the write to either fails halfway.

Stampede protection

A cache stampede occurs when a popular key expires and N concurrent requests miss simultaneously, all trying to recompute. Three patterns prevent it:

  • Single-flight (SETNX lock). First miss takes a lock; others poll briefly. Implementation in Redis is straightforward via SET key value NX EX 30.
  • Probabilistic early refresh. Refresh the cache before expiry with probability proportional to how close to expiry the value is. Spreads the load.
  • Stale-while-revalidate. Serve a stale value while a background task refreshes. Two TTLs: hard expiry (delete) and soft expiry (refresh in background, return current).
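The probabilistic variant is small enough to show in full. This sketch follows the formula usually called XFetch: refresh when now − β·Δ·ln(rand) reaches the expiry time, where Δ is the recompute cost and rand is uniform in (0, 1]. The randomness is injected as a parameter so the decision is testable; names are ours:

```java
public class EarlyRefresh {
    // XFetch: refresh early with a probability that rises as expiry approaches.
    // deltaMs = how long a recompute takes; beta = aggressiveness (1.0 is typical);
    // rand01  = uniform random in (0, 1], injected for testability.
    static boolean shouldRefresh(long nowMs, long expiryMs,
                                 long deltaMs, double beta, double rand01) {
        // ln(rand01) is <= 0, so the subtraction pushes "now" forward a random amount
        // on the order of the recompute cost.
        return nowMs - deltaMs * beta * Math.log(rand01) >= expiryMs;
    }
}
```

Called on every hit: far from expiry almost every caller sees false, and close to expiry one lucky caller sees true and refreshes before the TTL fires, so the expiry never lands on everyone at once.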

Single-flight in Lua

single-flight.lua
-- KEYS[1] = data key, KEYS[2] = lock key
-- ARGV[1] = lock TTL in seconds
local v = redis.call('GET', KEYS[1])
if v then return {'hit', v} end

local got = redis.call('SET', KEYS[2], '1', 'NX', 'EX', ARGV[1])
if got then return {'lead'} end      -- caller should compute and SET
return {'wait'}                          -- caller should retry GET shortly
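The caller's side of the script is a retry loop around hit / lead / wait. A sketch against a minimal key/value interface, so the logic is visible without a live server; the interface, TTLs, and poll interval are illustrative, and in production the methods would delegate to Lettuce:

```java
import java.util.function.Supplier;

public class SingleFlight {
    // Minimal subset of Redis commands the loop needs.
    interface Kv {
        String get(String key);
        boolean setNxEx(String key, String value, long ttlSec);  // SET ... NX EX
        void setEx(String key, long ttlSec, String value);
        void del(String key);
    }

    static String get(Kv kv, String key, Supplier<String> recompute) {
        String lockKey = key + ":lock";
        for (int i = 0; i < 100; i++) {
            String v = kv.get(key);
            if (v != null) return v;                  // hit
            if (kv.setNxEx(lockKey, "1", 30)) {       // lead: we won the lock
                try {
                    String fresh = recompute.get();
                    kv.setEx(key, 300, fresh);
                    return fresh;
                } finally {
                    kv.del(lockKey);                  // release even if recompute throws
                }
            }
            try {
                Thread.sleep(50);                     // wait: another caller is computing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return recompute.get();                       // gave up waiting; compute anyway
    }
}
```

The lock TTL (30 s here) bounds how long a crashed leader can block everyone else; the final fallback means a stuck lock degrades to a stampede, never to an outage.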

Observability

Five metrics to alert on in production:

  • Hit rate. If hit rate falls below the historical baseline, something invalidated more than expected. Common cause: a deploy bumped a version prefix and orphaned the old keys.
  • Eviction rate. Evictions while memory sits at the planned 60% mean key sizes or key counts have drifted from the sizing estimate. Investigate before memory pressure turns into latency.
  • p99 latency. Redis is supposed to be sub-millisecond. p99 over 5ms means fork pauses (BGSAVE or AOF rewrite), slow Lua, or hot keys. Check the slow log.
  • Replication lag. Active/passive read-from-replica patterns rely on replicas being current. Lag > 1s means readers are seeing stale data.
  • Connection count. Apps that don’t pool connections leak. A linear climb on connections is a deploy bug.

Common pitfalls

Big keys

A 50 MB Redis value blocks the event loop on every read. Redis is single-threaded for command execution; one big key kills latency for every other client. Cap value size at ~1 MB; for larger objects, store the bytes elsewhere and cache a pointer.
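The cap-and-pointer rule fits in one function. A sketch, where the blob store is abstracted to a hypothetical put function (in practice, object storage keyed by the same pointer):

```java
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;

public class PointerCache {
    static final int MAX_INLINE = 1 << 20;   // the ~1 MB cap from the text

    // Decide what actually goes into Redis: small values inline,
    // big ones stored externally behind a pointer.
    // blobPut is hypothetical: (pointer, bytes) -> write to external storage.
    static String cacheValue(String pointer, byte[] value,
                             BiConsumer<String, byte[]> blobPut) {
        if (value.length <= MAX_INLINE) {
            return new String(value, StandardCharsets.UTF_8);
        }
        blobPut.accept(pointer, value);
        return "ptr:" + pointer;   // cache only the reference
    }
}
```

Readers that see a ptr: value do one extra fetch; every other client keeps its sub-millisecond reads because the event loop never serializes 50 MB.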

Hot key

One key receiving 80% of the traffic saturates one node’s CPU. Detect with --hotkeys sampling. Mitigate by sharding the key (customer:1234:1, customer:1234:2...) and reading round-robin, or by adding a small in-process cache in front of Redis.
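Sharding a hot key is mechanical enough to sketch. This follows the customer:1234:N layout from the text, with a counter-based round-robin picker (one simple choice; a random pick works too). Writes must fan out to every copy so all readers see the update:

```java
import java.util.concurrent.atomic.AtomicLong;

public class HotKeyShards {
    private final int copies;
    private final AtomicLong counter = new AtomicLong();

    HotKeyShards(int copies) { this.copies = copies; }

    // Every copy must be written on update, or readers see a stale shard.
    String[] writeKeys(String base) {
        String[] keys = new String[copies];
        for (int i = 0; i < copies; i++) keys[i] = base + ":" + (i + 1);
        return keys;
    }

    // Reads rotate across the copies, spreading the load.
    String readKey(String base) {
        long n = counter.getAndIncrement();
        return base + ":" + (n % copies + 1);
    }
}
```

In Cluster mode the suffixed keys hash to different slots and therefore usually different nodes, which is the point: one node's CPU stops being the ceiling for one key's traffic.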

KEYS in production

The KEYS pattern command is O(N) and blocks. Use SCAN for any iteration. Lock down KEYS, FLUSHALL, and FLUSHDB via ACL in production.

No connection pool

Creating a Redis connection per request collapses under load. Configure Lettuce/Jedis pooling with min and max sizes; reuse the client across requests.

When not to cache

Caching is the right tool for read-heavy workloads with reusable values and tolerable staleness. It is the wrong tool when:

  • The data is unique per request. No reuse, no benefit. Just hit the database.
  • Strong consistency is required. If the bank cannot show a transaction one second after it’s booked, you don’t want a cache between the user and the ledger.
  • The workload is write-heavy. Write-through caches add latency without read benefit. Write-behind risks loss.
  • The DB is fast enough. Adding a cache adds a network hop, a cache-coherence problem, and an operational dependency. If the DB serves the read in 2 ms, a 1 ms cache hit is rarely worth the complexity.