# Retries & Dead-Letter Queue
Job failures are inevitable. Spooled provides automatic retries with exponential backoff and a dead-letter queue for jobs that can't be processed.
## How Retries Work
When a job fails, Spooled automatically schedules a retry with exponential backoff. This prevents overwhelming downstream services and gives transient failures time to resolve.
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fef3c7', 'primaryTextColor': '#92400e', 'primaryBorderColor': '#f59e0b', 'lineColor': '#6b7280'}}}%%
stateDiagram-v2
    [*] --> pending
    pending --> claimed: Worker claims
    claimed --> completed: Success
    claimed --> failed: Error
    failed --> pending: Retry (attempts left)
    failed --> dlq: Max retries exceeded
    dlq --> pending: Manual replay
    completed --> [*]
```

### Retry Flow
1. Worker claims a job and attempts to process it
2. Processing fails (exception, timeout, or explicit failure)
3. Spooled checks if retry attempts remain
4. If retries remain: the job is scheduled for later with a backoff delay
5. If max retries are exceeded: the job moves to the dead-letter queue (DLQ)
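The decision in the last three steps can be sketched as a small function. This is illustrative only (the `nextAction` name and job shape are not part of the SDK; the real scheduling happens server-side):

```javascript
// Sketch of the retry-or-DLQ decision made when a job fails.
// job.attempts: attempts made so far, including the one that just failed.
// job.maxRetries: configured retry budget.
function nextAction(job, baseDelayMs = 1000) {
  if (job.attempts > job.maxRetries) {
    return { kind: 'dlq' }; // retry budget exhausted
  }
  // Exponential backoff: the delay doubles with each failed attempt
  return { kind: 'retry', delayMs: baseDelayMs * 2 ** (job.attempts - 1) };
}
```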
## Backoff Strategies
By default, Spooled uses exponential backoff with jitter. This spreads out retries and prevents thundering herd problems.
### Default Exponential Backoff
With default settings (base=1s, max=1h):
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 | 1s | 1s |
| 2 | 2s | 3s |
| 3 | 4s | 7s |
| 4 | 8s | 15s |
| 5 | 16s | 31s |
| 6 | 32s | ~1 min |
| 7 | 64s | ~2 min |
| 8 | 128s | ~4 min |
| 9 | 256s | ~8 min |
| 10 | 512s | ~17 min |
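The delays above follow `base * 2^(attempt - 1)`, capped at the maximum and then spread by jitter. A local reimplementation for illustration (not the SDK's actual code):

```javascript
// Capped exponential backoff with proportional jitter.
// attempt is 1-based; all values are in milliseconds.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 3600000, jitter = 0.1) {
  const capped = Math.min(baseMs * 2 ** (attempt - 1), maxMs);
  // Jitter shifts the delay randomly within ±(jitter * capped)
  const spread = capped * jitter * (2 * Math.random() - 1);
  return Math.round(capped + spread);
}
```

With jitter set to 0 this reproduces the table (attempt 5 yields 16000 ms); by attempt 13 the 1-hour cap takes over.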
### Available Strategies
- Exponential (default) — Delay doubles each attempt: 1s, 2s, 4s, 8s...
- Linear — Constant delay increase: 1s, 2s, 3s, 4s...
- Fixed — Same delay every time: 5s, 5s, 5s, 5s...
- Custom — Provide your own delay function
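Each strategy reduces to a function from attempt number to delay. A minimal sketch (these helpers are illustrative, not SDK exports; the Fibonacci variant just shows the shape a custom function could take):

```javascript
// Each strategy maps a 1-based attempt number to a delay in milliseconds.
const exponential = (baseMs) => (attempt) => baseMs * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s...
const linear      = (baseMs) => (attempt) => baseMs * attempt;            // 1s, 2s, 3s, 4s...
const fixed       = (delayMs) => () => delayMs;                           // 5s, 5s, 5s, 5s...

// A custom strategy is any such function, e.g. Fibonacci-style backoff:
const fibonacci = (attempt) => {
  let [a, b] = [1000, 1000];
  for (let i = 1; i < attempt; i++) [a, b] = [b, a + b];
  return a; // 1s, 1s, 2s, 3s, 5s...
};
```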
## Retry Configuration
> **Note:** Code examples use SDK syntax (under development). For production today, use the equivalent REST API endpoints.
Configure retry behavior per job:
```javascript
// Per-job retry configuration
await client.jobs.enqueue({
  queue: 'critical-webhooks',
  payload: webhookData,
  maxRetries: 10,        // More retries for critical jobs
  retryBackoff: {
    type: 'exponential',
    baseDelay: 1000,     // 1 second initial delay
    maxDelay: 3600000,   // Max 1 hour between retries
    jitter: 0.2,         // 20% random jitter
  },
});
```

### Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `maxRetries` | number | `3` | Maximum retry attempts before moving to DLQ |
| `retryBackoff.type` | string | `"exponential"` | Backoff strategy: `exponential`, `linear`, `fixed` |
| `retryBackoff.baseDelay` | number | `1000` | Initial delay in milliseconds |
| `retryBackoff.maxDelay` | number | `3600000` | Maximum delay (1 hour by default) |
| `retryBackoff.jitter` | number | `0.1` | Random jitter factor (0-1) |
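These options bound how long a job can spend retrying before it reaches the DLQ. A sketch of that worst-case bound (a hypothetical helper, jitter ignored):

```javascript
// Worst-case total backoff wait before a job lands in the DLQ.
function totalBackoffMs(maxRetries, baseMs = 1000, maxMs = 3600000) {
  let total = 0;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    total += Math.min(baseMs * 2 ** (attempt - 1), maxMs);
  }
  return total;
}
```

With the defaults (`maxRetries: 3`, 1s base), a job waits at most 7 seconds of backoff in total, matching the cumulative column in the table above.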
## Dead-Letter Queue (DLQ)
Jobs that exhaust all retry attempts land in the dead-letter queue. The DLQ preserves the full job payload and failure history for debugging and replay.
### Working with the DLQ
```javascript
// List jobs in the dead-letter queue
const dlqJobs = await client.jobs.listDLQ({
  queue: 'payment-processing',
  limit: 100,
});

// Inspect failed jobs
for (const job of dlqJobs) {
  console.log(`Job ${job.id} failed after ${job.attempts} attempts`);
  console.log(`Last error: ${job.lastError}`);
  console.log('Payload:', job.payload);
}

// Replay a single job
await client.jobs.replay(dlqJobs[0].id);

// Replay all jobs in a queue's DLQ
await client.jobs.replayAll({ queue: 'payment-processing' });
```

### DLQ Best Practices
- Set up alerts when jobs enter the DLQ
- Review DLQ jobs regularly (daily for critical queues)
- Fix the root cause before replaying jobs
- Consider archiving old DLQ jobs to cold storage
## Handling Failures in Workers
Workers should explicitly mark jobs as failed with a reason. This helps with debugging and determines retry behavior.
```javascript
worker.on('job', async (job) => {
  try {
    await processJob(job.payload);
    await job.complete();
  } catch (error) {
    // Determine if the error is retryable
    if (error.code === 'RATE_LIMITED') {
      // Retryable - let Spooled handle it
      await job.fail({
        reason: error.message,
        retryable: true,
      });
    } else if (error.code === 'INVALID_PAYLOAD') {
      // Not retryable - send directly to DLQ
      await job.fail({
        reason: error.message,
        retryable: false,
      });
    } else {
      // Unknown error - default retry behavior
      await job.fail({ reason: error.message });
    }
  }
});
```

## Next Steps
- Jobs & queues — Understand the job lifecycle
- Building workers — Production worker patterns
- Real-time updates — Monitor job status in real-time