Replicated requests
Replicate requests only when reduced tail latency is worth duplicate work, and cancel losers as soon as a winner is good enough.
Canonical guidance
- hedge only latency-sensitive, safe-to-duplicate work
- define what counts as a winning result
- cancel the rest immediately
Use when
- tail-latency reduction
- replicated reads
- multi-endpoint lookups where duplicates are acceptable
Avoid
- hedging writes or other non-idempotent operations, where duplicates cause double effects
- launching duplicates with no cancellation path
- using replication where one slow dependency should instead be fixed or rate-limited
Preferred pattern
- launch bounded replicas with a shared context and accept the first sufficient result
Anti-pattern
- fan out duplicates and then wait for all of them anyway
Explanation: This is tempting because it simplifies collection logic, but it keeps paying the duplicate cost after the answer is already known.
Why
- replicated requests trade extra load for lower tail latency; that trade must stay explicit
Sources
- Concurrency is not parallelism - Rob Pike
- Go Concurrency Patterns: Context - Sameer Ajmani