Idempotency Keys and the Exactly-Once Myth in Distributed Systems
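Teams often say "exactly once" when they really mean "at-least-once delivery plus deduplication." In real systems, retries, network timeouts, and client reconnects make duplicate requests normal behavior, not an edge case.
Idempotency keys are a practical way to make those duplicates harmless: the server recognizes a repeated request and returns the earlier outcome instead of running the side effect again.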
Why duplicates happen
- Client times out but server still completes the write
- Load balancer retries after a transient 502
- User taps the same payment button twice
- Message broker redelivers after consumer crash
Core design
Use a stable idempotency key per business operation (for example, one checkout submission). Store the key together with the operation's outcome, and return the same response on every duplicate attempt.
Request(idempotency_key=abc123)
-> lookup key
-> if exists: return stored result
-> if not exists: execute + persist result atomically
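The flow above can be sketched in Python with the standard-library sqlite3 module. This is a minimal illustration, not a reference implementation: the table names, the `charge` function, and the response shape are hypothetical, and it assumes the side effect lives in the same database as the key record, which is what allows both to commit in one transaction.

```python
import json
import sqlite3

# Autocommit mode; transactions are opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical schema: one row per key, holding the stored response.
conn.execute(
    "CREATE TABLE idempotency (idem_key TEXT PRIMARY KEY, response TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE charges (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)


def charge(idempotency_key: str, amount_cents: int) -> dict:
    """Apply the charge once per key; replay the stored response on duplicates."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before the lookup
    try:
        row = conn.execute(
            "SELECT response FROM idempotency WHERE idem_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row is not None:
            # Key exists: return the stored result without touching charges.
            conn.execute("COMMIT")
            return json.loads(row[0])

        # Key is new: the side effect and the key record share one transaction.
        cur = conn.execute(
            "INSERT INTO charges (amount_cents) VALUES (?)", (amount_cents,)
        )
        response = {"charge_id": cur.lastrowid, "amount_cents": amount_cents}
        conn.execute(
            "INSERT INTO idempotency (idem_key, response) VALUES (?, ?)",
            (idempotency_key, json.dumps(response)),
        )
        conn.execute("COMMIT")
        return response
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Calling `charge("abc123", 500)` twice returns the same response both times and leaves a single row in `charges`.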
Implementation rules
- Scope the key by tenant/user to avoid cross-account collisions.
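- Persist both success and known business failures (for example, a declined card), so a retry replays the same failure instead of re-running the operation.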
- Set an expiration window at least as long as the longest period over which clients or brokers may retry.
- Protect storage with unique constraints.
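One way these rules might translate into storage is sketched below, again with sqlite3. The table layout, the `record_outcome` and `purge_expired` helpers, the status values, and the 24-hour retention window are all assumptions chosen for illustration; the right window depends on how long your clients and brokers actually retry.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_records (
        tenant_id  TEXT NOT NULL,
        idem_key   TEXT NOT NULL,
        status     TEXT NOT NULL,      -- e.g. 'succeeded' or 'card_declined'
        response   TEXT NOT NULL,
        expires_at REAL NOT NULL,      -- Unix timestamp
        PRIMARY KEY (tenant_id, idem_key)  -- unique per tenant, not globally
    )
    """
)

# Assumed retention window; it should cover the longest retry span.
RETENTION_SECONDS = 24 * 60 * 60


def record_outcome(tenant_id, idem_key, status, response, now=None):
    """Store an outcome; return False if this tenant already used the key."""
    now = time.time() if now is None else now
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_records VALUES (?, ?, ?, ?, ?)",
                (tenant_id, idem_key, status, response, now + RETENTION_SECONDS),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate (tenant_id, idem_key).
        return False


def purge_expired(now=None):
    """Delete records whose retention window has passed; return rows removed."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_records WHERE expires_at <= ?", (now,)
        )
        return cur.rowcount
```

The composite primary key lets two tenants use the same key string without colliding, and because the database enforces uniqueness, two concurrent requests with the same key cannot both insert.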
Common mistakes
- Generating a new key for each retry attempt
- Saving the key after the side effect instead of atomically with it, which leaves a window where a retry runs the side effect again
- Returning 409 without replaying original result body
- Treating non-idempotent downstream calls as safe
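The first mistake is easiest to see from the client side. The sketch below is illustrative only: `FlakyServer` is a hypothetical in-memory stub that completes the write but loses its first reply, and `submit_with_retries` is an assumed helper name. The point is that the key is generated once, outside the retry loop.

```python
import uuid


class FlakyServer:
    """Hypothetical server stub: applies the write, but the first reply is lost."""

    def __init__(self):
        self.results = {}  # idempotency key -> stored response
        self.writes = 0    # number of times the side effect actually ran
        self.calls = 0     # number of requests received

    def handle(self, idempotency_key, payload):
        self.calls += 1
        if idempotency_key not in self.results:
            self.writes += 1
            self.results[idempotency_key] = {"status": "created", "order": payload}
        if self.calls == 1:
            # Simulates a client timeout after the server finished the write.
            raise TimeoutError("response lost after the write completed")
        return self.results[idempotency_key]


def submit_with_retries(send, payload, max_attempts=3):
    """Retry a request, reusing one idempotency key across all attempts."""
    idempotency_key = str(uuid.uuid4())  # generated once, outside the loop
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(idempotency_key, payload)
        except TimeoutError as exc:
            last_error = exc  # the same key is reused on the next attempt
    raise last_error


server = FlakyServer()
result = submit_with_retries(server.handle, {"sku": "A1"})
```

Here the server receives two requests but performs one write. Moving the `uuid.uuid4()` call inside the loop would give each attempt a fresh key, and the stub would then perform two writes.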
Conclusion
Exactly-once delivery is usually a protocol claim, not an end-to-end guarantee. Idempotency keys give you realistic protection against duplicates and make write APIs predictable under failure conditions.