Overview

Every payment API in production gets retried. Networks drop packets, gateways time out, mobile clients lose connectivity mid-request, batch jobs replay yesterday’s queue. The retry is unavoidable. The question is whether the retry produces a second payment or a second look at the first one.

Idempotency keys are the contract that turns a retried POST into a safe operation. The client sends a unique key with each new request; the server records the first outcome under that key and returns the same outcome for any subsequent request with the same key. Done correctly, the customer is charged once even if the network told the merchant the request failed five times in a row. Done incorrectly, the customer is charged five times and the bank discovers it in a complaint queue two weeks later.

The default for every write endpoint

If a banking API can move money, change account state, submit a transfer, or trigger a downstream effect that costs money to undo, it requires idempotency keys. This is not an optimisation. It is the minimum viable contract for a write endpoint that crosses a network in a regulated environment.

Why retries are not optional

Distributed systems retry. They retry because the alternative is dropping requests, and dropping requests is a worse failure mode than possibly duplicating them — provided the receiver knows how to deduplicate. Three retry sources matter in practice.

  • Client-side retries. Mobile apps, merchant POS terminals, third-party PSPs. The client got no response (or a 5xx) and tries again. The first request may have already been processed; the client has no way to tell.
  • Gateway and proxy retries. Load balancers, API gateways, service meshes. A request times out at one hop and the proxy retries to a different backend instance. The original instance may have completed the work.
  • Replay and recovery. A queue replay after an outage, a CDC stream restart that resends a window of events, a manual operator replay of a stuck batch. The system intentionally re-presents requests it has already seen.

Without idempotency keys, every one of these is a potential duplicate. With idempotency keys, every one of them is recoverable.

The API contract

The HTTP convention is straightforward. The client supplies an Idempotency-Key header on every POST, PATCH, and DELETE for resources where duplicate execution is unsafe. The key is opaque to the server — a UUID v4 from the client is the right default.

POST /v1/payments HTTP/1.1
Host: api.bank.example
Authorization: Bearer eyJ...
Idempotency-Key: 7f9c3b2e-4a91-4d2c-88f1-2e0f3a1b9c67
Content-Type: application/json
Content-Length: 142

{
  "amount": "125.00",
  "currency": "SAR",
  "creditor_iban": "SA0380000000608010167519",
  "reference": "INV-44219"
}
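On the client side, the contract only works if the key is generated once per logical payment and reused on every attempt. A minimal sketch of that discipline, with the transport call abstracted behind a function so it stands in for a real HTTP client (the `Attempt` record and `submitWithRetries` name are illustrative, not part of any API above):

```java
import java.util.UUID;
import java.util.function.Function;

public class RetryingClient {

  // Outcome of one transport attempt: either a response body or a failure.
  public record Attempt(boolean succeeded, String body) {}

  /**
   * Generate the idempotency key ONCE per logical payment, then reuse it
   * for every attempt. `transport` stands in for the real HTTP call and
   * receives the key it should send in the Idempotency-Key header.
   */
  public static String submitWithRetries(Function<String, Attempt> transport,
                                         int maxAttempts) {
    String idempotencyKey = UUID.randomUUID().toString(); // generated once
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Attempt result = transport.apply(idempotencyKey);   // same key every time
      if (result.succeeded()) return result.body();
    }
    throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
  }
}
```

The mistake to avoid is generating the UUID inside the retry loop: each attempt then carries a fresh key, and the server sees every retry as a new payment.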

The server’s contract has four cases.

  1. New key, new request

    Process normally. Record the key, the request fingerprint, the response, and the resulting resource id. Return the response.

  2. Same key, identical request, processing complete

    Return the cached response with the original status code. Do not re-execute. The client cannot tell whether it received a cached replay or a fresh execution — that's the point.

  3. Same key, identical request, still processing

    Return 409 Conflict with a Retry-After header, or block on the in-flight request and return its result. Either is defensible; pick one and document it.

  4. Same key, different request body

    Return 422 Unprocessable Entity with a clear error code (idempotency_key_mismatch). The key was reused with different inputs; this is a client bug, not a retry.

Request fingerprinting

Case 4 above is where most implementations get it wrong. To detect a key reused with a different body, the server must compare the new request to the original. Storing the entire request payload is wasteful and creates a regulatory data-retention question. The right answer is a deterministic fingerprint — SHA-256 of the canonicalised request body, computed at API ingress and stored alongside the key.

fingerprint.java
public static String fingerprint(JsonNode body) throws Exception {
  // Canonicalise per RFC 8785 (JCS): lexicographically sorted keys, no
  // insignificant whitespace. JsonCanonicalizer is org.erdtman.jcs.
  String canonical = new JsonCanonicalizer(body.toString()).getEncodedString();
  byte[] digest = MessageDigest.getInstance("SHA-256")
      .digest(canonical.getBytes(StandardCharsets.UTF_8));
  return HexFormat.of().formatHex(digest);
}

The canonicalisation step matters. Two clients sending the same logical request with different field ordering or different whitespace must produce the same fingerprint. RFC 8785 (JSON Canonicalization Scheme) is the right reference; library implementations exist for every server language.
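The stability property is easy to demonstrate. The sketch below uses a deliberately simplified canonical form — a flat map with sorted keys — rather than full RFC 8785, which a real service should take from a library; the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CanonicalFingerprint {

  // Simplified canonical form: keys sorted, no whitespace. A real service
  // should use an RFC 8785 library instead of this flat-map stand-in.
  static String canonicalize(Map<String, String> body) {
    return new TreeMap<>(body).entrySet().stream()
        .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
        .collect(Collectors.joining(",", "{", "}"));
  }

  static String fingerprint(Map<String, String> body) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(canonicalize(body).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is always available
    }
  }
}
```

Two requests with the same fields in different order hash identically; change one value and the fingerprint changes, which is exactly the mismatch case 4 detects.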

What goes into the fingerprint

Hash the request body only. Do not include the URL path (the key is already scoped to the endpoint), do not include headers (transport-layer noise), do not include timestamps the client added. Including any of those guarantees that a legitimate client retry will be flagged as a body mismatch.

Storage choices

Three storage shapes, each with a different operational profile. The right choice depends on traffic volume, retention requirements, and how the bank already operates the relevant infrastructure.

Redis only. Fast, simple, fits the access pattern (hash lookup by key). The problem is durability and audit. Redis with persistence and replication is good infrastructure, but it is not a system of record. A regulator asking why a payment processed twice on March 14 will not accept “the cache lost the entry” as an answer. Acceptable for low-stakes endpoints; not acceptable for write paths that move money.

Postgres only. Durable and auditable. The idempotency table is a system of record; the unique constraint on the key is enforced atomically; the row carries the request fingerprint, response, and timestamps for the audit story. The cost is throughput — every POST becomes a database round-trip on the hot path. Acceptable for moderate-volume transactional APIs; can become a bottleneck for high-RPS endpoints.

Hybrid. Redis as the hot cache, Postgres as the durable record. Reads check Redis first, fall back to Postgres on miss, populate Redis for next time. Writes go to Postgres synchronously and Redis on success. This is the production default for any payment-grade API at the volume of a national bank: the latency profile of Redis with the audit and durability of Postgres, at the cost of operational complexity and the need to keep the two stores consistent under partial failures.

The end-to-end flow

idempotency-table.sql
CREATE TABLE idempotency_keys (
  key_value      varchar(128)  NOT NULL,
  endpoint       varchar(64)   NOT NULL,
  fingerprint    char(64)      NOT NULL,            -- sha256 hex
  status         varchar(16)   NOT NULL,            -- in_flight | done | failed
  response_code  int,
  response_body  jsonb,
  resource_id    uuid,                              -- e.g., payment id
  created_at     timestamptz   NOT NULL DEFAULT now(),
  completed_at   timestamptz,
  expires_at     timestamptz   NOT NULL,
  PRIMARY KEY (key_value, endpoint)                 -- key scoped per endpoint
);

CREATE INDEX idx_idempotency_expires ON idempotency_keys (expires_at);
CREATE INDEX idx_idempotency_resource ON idempotency_keys (resource_id);

PaymentController.java
@PostMapping("/v1/payments")
public ResponseEntity<PaymentResponse> submit(
    @RequestHeader("Idempotency-Key") String key,
    @RequestBody PaymentRequest req) {

  String fp = Fingerprint.of(req);

  // 1. Try to claim the key. INSERT ... ON CONFLICT DO NOTHING is atomic.
  boolean claimed = idempotencyRepo.tryClaim(key, "/v1/payments", fp);

  if (!claimed) {
    IdempotencyRecord existing = idempotencyRepo.find(key);

    // 2. Same key, different fingerprint? Client bug.
    if (!existing.fingerprint().equals(fp))
      throw new IdempotencyMismatchException();

    // 3. Still in flight? Tell the client to wait.
    if (existing.status() == Status.IN_FLIGHT)
      return ResponseEntity.status(409).header("Retry-After", "2").build();

    // 4. Already done. Replay the original response.
    return ResponseEntity.status(existing.responseCode()).body(existing.response());
  }

  // 5. We claimed the key. Process for real.
  try {
    PaymentResponse resp = paymentService.submit(req);
    idempotencyRepo.complete(key, 201, resp);
    return ResponseEntity.status(201).body(resp);
  } catch (Exception e) {
    idempotencyRepo.fail(key, e);
    throw e;
  }
}

The atomic claim is the entire trick. INSERT ... ON CONFLICT DO NOTHING in Postgres (or SET key value NX in Redis) lets exactly one request win the race for any given key. Every other path is a deterministic read of the winner’s state.
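The one-winner semantics can be illustrated in-process: `ConcurrentHashMap.putIfAbsent` plays the role of `INSERT ... ON CONFLICT DO NOTHING`, returning null to exactly one caller per key. This is an analogue for illustration, not a replacement for the database claim, which must work across instances:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ClaimRace {

  private final ConcurrentHashMap<String, String> claims = new ConcurrentHashMap<>();

  /**
   * Atomic claim analogue: putIfAbsent returns null to exactly one caller
   * per key — the same guarantee INSERT ... ON CONFLICT DO NOTHING gives
   * the database-backed implementation.
   */
  public boolean tryClaim(String key, String fingerprint) {
    return claims.putIfAbsent(key, fingerprint) == null;
  }
}
```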

Concurrency & locking

Two requests with the same key arriving within milliseconds is the case to design for, not against. Without explicit handling, the second request can read the first’s in-flight row, decide the work is in progress, and the client gets a 409. That is correct — but it is not always what you want. A merchant retrying a request 200ms after the first has timed out their HTTP call wants the result, not a 409.

Two patterns address this. The simpler is to return 409 Retry-After: 2 and let the client retry. The more advanced is to wait on the in-flight request — advisory lock the key in Postgres, block until the first request completes, then return its result. The second pattern is more user-friendly but ties up a server thread for the duration of the original request. For payment APIs, where the original request usually completes in under 500ms, this is a reasonable trade-off. For long-running APIs, the 409 pattern is safer.

Postgres advisory locks for “wait for the original”

SELECT pg_advisory_xact_lock(hashtext(key)) within a transaction lets the second arrival block until the first commits. (The pg_try_ variant returns immediately rather than blocking, which is the wrong choice here.) Use hashtext rather than the raw key to map the opaque string into Postgres's integer lock keyspace. Release on commit happens automatically.
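The same block-until-done shape can be sketched in a single process with a map of futures: the second arrival waits on the first caller's CompletableFuture instead of re-executing. Again an in-process analogue of the cross-instance lock, with illustrative names:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class InFlightDeduper {

  private final ConcurrentHashMap<String, CompletableFuture<String>> inFlight =
      new ConcurrentHashMap<>();

  /**
   * First caller for a key runs `work`; concurrent callers with the same
   * key block on the first caller's future and receive its result.
   */
  public String execute(String key, Supplier<String> work) {
    CompletableFuture<String> mine = new CompletableFuture<>();
    CompletableFuture<String> winner = inFlight.putIfAbsent(key, mine);
    if (winner != null) return winner.join();   // someone else owns the key
    try {
      String result = work.get();
      mine.complete(result);
      return result;
    } catch (RuntimeException e) {
      mine.completeExceptionally(e);
      inFlight.remove(key, mine);  // allow a clean retry after failure
      throw e;
    }
  }
}
```

Note the thread-per-waiter cost the text describes: every blocked `join()` is a held server thread for the duration of the original request.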

TTL & cleanup

Idempotency keys do not need to live forever. The retry window for a real client is minutes to hours. The audit window is longer — in regulated banking, often years — but the storage requirement for the two is different.

  • Hot retention (Redis): 24 hours. Long enough to catch every legitimate retry from any reasonable client. Short enough that Redis memory stays bounded.
  • Cold retention (Postgres): 7 years for payment APIs. Aligns with SAMA’s record retention expectations for payment evidence. The table is partitioned by month and old partitions are detached to cold storage.
  • Cleanup job. A scheduled job deletes Redis entries past TTL (Redis does this for you with EXPIRE) and detaches Postgres partitions older than the retention boundary. Never DELETE from a high-volume idempotency table during business hours; partition pruning is the only sustainable strategy.

Observability

Three metrics tell you whether the idempotency layer is working.

  • Cache hit rate on retries. What fraction of POST requests find a previously seen key? Healthy production traffic for payment APIs is in the 1–5% range. Zero means clients aren’t retrying with the same key (which means they’ll send duplicates on the next outage). Above 10% suggests aggressive client retry behaviour worth investigating.
  • Fingerprint mismatch rate. The rate at which clients reuse a key with a different body. This is a client bug indicator; it should be near zero. A spike points at a specific integration that needs a developer conversation.
  • 409 in-flight responses. The rate at which retries land while the original is still processing. Useful for capacity planning — if this rises, the underlying API latency has degraded.

Common pitfalls

Storing the response without the fingerprint

The most common production mistake: caching the response under the key but not validating the body matches. A client that reuses a key for a different payment gets the previous payment’s confirmation. The duplicate-payment incident becomes a wrong-payment-confirmation incident, which is worse.

Idempotency key scoped wrongly

Scope the key to the endpoint, not globally and not per-customer. A key reused across different endpoints should never collide; a key reused on the same endpoint by the same customer is the case you must catch. The unique constraint on (key_value, endpoint) is the simplest correct scope.

Long-running operations behind a synchronous endpoint

If the POST kicks off a downstream workflow that takes more than a few seconds, the client’s retry will land while the original is in flight. Either accept long 409 retry loops, or move to an async pattern with a separate status endpoint. Don’t pretend a long-running operation is synchronous.

Forgetting the failure case

A failed first request — downstream timeout, bank core unavailable, validation error after partial work — needs explicit handling. Should the same key be retryable after the failure? Usually yes, but only if the failure was clean. If money may have moved, the retry is unsafe. Mark the row failed with the reason and require a different key for the retry.

The client-generates-key contract

Some implementations let the server generate the key and return it. Don’t. The whole point of idempotency is that the client can retry safely after a network failure. If the server generated the key but the client never got the response, the client cannot replay with the same key. Client-generated keys are the only design that survives the failure mode the pattern was built for.

When idempotency keys are not enough

  • Multi-step workflows. A payment that requires authorisation, capture, and clearing is three operations, each with its own idempotency requirement. A single key on the orchestrator is not enough — each downstream step needs its own. Use a saga pattern with per-step keys derived from the orchestration key.
  • External callbacks and webhooks. When the bank calls a third party (e.g., an SMS provider, a fraud-screening service), the bank is now the client of someone else’s API. Generate idempotency keys for those calls too. The same logic applies in reverse.
  • Cross-system consistency. Idempotency makes a single API call safe to retry. It does not make distributed transactions safe. For multi-system writes, combine idempotency keys with the transactional outbox pattern (covered in the Data Synchronisation article) and explicit reconciliation.
  • Read endpoints. GET requests are idempotent by definition. Adding an Idempotency-Key to them is wasted work and confuses caches and proxies that already cache reads correctly.
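Per-step keys for a saga can be derived deterministically from the orchestration key, so a replay of the whole workflow regenerates the same downstream keys and each step deduplicates correctly. A sketch using SHA-256 over the key plus a step name (the step names and class are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StepKeys {

  /**
   * Derive a deterministic per-step idempotency key from the orchestration
   * key. Replaying the saga with the same orchestration key reproduces the
   * same key for each step, so downstream calls deduplicate correctly.
   */
  public static String forStep(String orchestrationKey, String step) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest((orchestrationKey + ":" + step).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is always available
    }
  }
}
```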