Retries & Dead-Letter Queue

Job failures are inevitable. Spooled provides automatic retries with exponential backoff and a dead-letter queue for jobs that can't be processed.

How Retries Work

When a job fails, Spooled automatically schedules a retry with exponential backoff. This prevents overwhelming downstream services and gives transient failures time to resolve.

stateDiagram-v2
    [*] --> pending
    pending --> claimed: Worker claims
    claimed --> completed: Success
    claimed --> failed: Error
    failed --> pending: Retry (attempts left)
    failed --> dlq: Max retries exceeded
    dlq --> pending: Manual replay
    completed --> [*]

Retry Flow

  1. Worker claims a job and attempts to process it
  2. Processing fails (exception, timeout, or explicit failure)
  3. Spooled checks if retry attempts remain
  4. If retries remain: job is scheduled for later with backoff delay
  5. If max retries exceeded: job moves to dead-letter queue (DLQ)
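The branch in steps 3–5 can be sketched as a small routing function (the names here are illustrative, not the Spooled API — the service performs this check server-side):

```shell
# Decide what happens to a failed job (sketch of steps 3-5 above).
route_failed_job() {
  local attempts=$1 max_retries=$2
  if (( attempts < max_retries )); then
    echo "retry"   # retries remain: reschedule with backoff delay
  else
    echo "dlq"     # max retries exceeded: move to dead-letter queue
  fi
}

route_failed_job 1 3   # prints "retry"
route_failed_job 3 3   # prints "dlq"
```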

Backoff Strategies

By default, Spooled uses exponential backoff with jitter. This spreads out retries and prevents thundering herd problems.

Default Exponential Backoff

With default settings (base=1s, max=1h):

| Attempt | Delay | Cumulative Time |
|---------|-------|-----------------|
| 1       | 1s    | 1s              |
| 2       | 2s    | 3s              |
| 3       | 4s    | 7s              |
| 4       | 8s    | 15s             |
| 5       | 16s   | 31s             |
| 6       | 32s   | ~1 min          |
| 7       | 64s   | ~2 min          |
| 8       | 128s  | ~4 min          |
| 9       | 256s  | ~8 min          |
| 10      | 512s  | ~17 min         |
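The schedule above can be reproduced locally. This sketch assumes base=1s, cap=3600s, and "full jitter" (a uniformly random delay between 0 and the computed value) — the exact jitter formula Spooled applies may differ:

```shell
# Compute the exponential backoff delay for a given attempt (base=1s, cap=3600s).
backoff_delay() {
  local attempt=$1 base=1 cap=3600
  local delay=$(( base * (1 << (attempt - 1)) ))
  (( delay > cap )) && delay=$cap
  echo "$delay"
}

cumulative=0
for attempt in 1 2 3 4 5; do
  delay=$(backoff_delay "$attempt")
  jittered=$(( RANDOM % (delay + 1) ))   # full jitter: uniform in [0, delay]
  cumulative=$(( cumulative + delay ))
  echo "attempt ${attempt}: delay=${delay}s (jittered: ${jittered}s) cumulative=${cumulative}s"
done
```

The cap matters: without it, attempt 13 would already wait over an hour (4096s), so long-running retry sequences converge to the 1h maximum.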

Available Strategies

  • Exponential (default) — Delay doubles each attempt: 1s, 2s, 4s, 8s...
  • Linear — Constant delay increase: 1s, 2s, 3s, 4s...
  • Fixed — Same delay every time: 5s, 5s, 5s, 5s...
  • Custom — Provide your own delay function
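Side by side, the strategies map an attempt number to a delay in seconds. A minimal sketch (the custom strategy is whatever function you supply, so a quadratic one stands in as an example):

```shell
# Delay (in seconds) produced by each strategy for a given attempt number.
exponential_delay() { echo $(( 1 << ($1 - 1) )); }  # 1, 2, 4, 8, ...
linear_delay()      { echo "$1"; }                  # 1, 2, 3, 4, ...
fixed_delay()       { echo 5; }                     # 5, 5, 5, 5, ...
custom_delay()      { echo $(( $1 * $1 )); }        # e.g. quadratic: 1, 4, 9, 16, ...

for attempt in 1 2 3 4; do
  echo "attempt ${attempt}: exp=$(exponential_delay $attempt)s linear=$(linear_delay $attempt)s fixed=$(fixed_delay $attempt)s custom=$(custom_delay $attempt)s"
done
```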

Retry Configuration

Configure retry behavior when creating jobs:

Job with retry configuration
curl -X POST https://api.spooled.cloud/api/v1/jobs \
  -H "Authorization: Bearer sp_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "queue_name": "my-queue",
    "payload": {
      "event": "user.created",
      "user_id": "usr_123",
      "email": "alice@example.com"
    },
    "max_retries": 5,
    "timeout_seconds": 120,
    "idempotency_key": "user-created-usr_123"
  }'

Configuration Options

| Option          | Type   | Default | Description                                 |
|-----------------|--------|---------|---------------------------------------------|
| max_retries     | number | 3       | Maximum retry attempts before moving to DLQ |
| timeout_seconds | number | 300     | Job timeout (seconds)                       |
| priority        | number | 0       | Job priority (-100 to 100)                  |

Dead-Letter Queue (DLQ)

Jobs that exhaust all retry attempts land in the dead-letter queue. The DLQ preserves the full job payload and failure history for debugging and replay.

Dashboard Tip

📍 Dashboard → Jobs → Dead Letter Queue

What to look for:

  • Job payload and metadata
  • Last error message
  • Retry count and failure history
  • Original queue name

Actions:

  • Retry individual jobs
  • Bulk retry all DLQ jobs
  • Purge old failed jobs

List DLQ Jobs

# List jobs in dead-letter queue
curl -X GET "https://api.spooled.cloud/api/v1/jobs/dlq?queue_name=my-queue&limit=100" \
  -H "Authorization: Bearer sp_live_YOUR_API_KEY"

Retry DLQ Jobs

# Retry jobs from DLQ
curl -X POST https://api.spooled.cloud/api/v1/jobs/dlq/retry \
  -H "Authorization: Bearer sp_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "queue_name": "my-queue",
    "limit": 50
  }'

Purge DLQ

# Purge DLQ jobs (requires confirm: true)
curl -X POST https://api.spooled.cloud/api/v1/jobs/dlq/purge \
  -H "Authorization: Bearer sp_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "queue_name": "my-queue",
    "confirm": true
  }'

DLQ Best Practices

  • Set up alerts when jobs enter the DLQ
  • Review DLQ jobs regularly (daily for critical queues)
  • Fix the root cause before replaying jobs
  • Consider archiving old DLQ jobs to cold storage
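The first practice — alerting — boils down to comparing the DLQ size against a threshold. In this sketch, dlq_count stands in for the count returned by the DLQ list endpoint above (fetching and parsing the response is omitted, and the threshold is illustrative):

```shell
# Alert when the dead-letter queue is non-empty.
check_dlq() {
  local dlq_count=$1 threshold=0
  if (( dlq_count > threshold )); then
    echo "ALERT: ${dlq_count} jobs in DLQ"
  else
    echo "OK"
  fi
}

check_dlq 7   # prints "ALERT: 7 jobs in DLQ"
check_dlq 0   # prints "OK"
```

In practice you would run this on a schedule (cron, or your monitoring system's HTTP check) and route the alert to your on-call channel.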

Handling Failures in Workers

Workers should explicitly mark jobs as failed with a reason. This helps with debugging and determines retry behavior.

Failing a job
# Fail a job (will retry if retries remaining)
curl -X POST https://api.spooled.cloud/api/v1/jobs/job_xyz123/fail \
  -H "Authorization: Bearer sp_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "worker_id": "worker-1",
    "error": "Connection timeout"
  }'
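On the worker side, the useful pattern is capturing the actual error output so it becomes the "error" field of the request above. A sketch (process is a placeholder for your real job handler; only the JSON body is built here, no request is sent):

```shell
# Placeholder job handler that fails with a message on stderr.
process() { echo "Connection timeout" >&2; return 1; }

# Capture the handler's error output and build the /fail request body from it.
build_fail_body() {
  local err
  if ! err=$(process 2>&1); then
    printf '{"worker_id": "worker-1", "error": "%s"}' "$err"
  fi
}

build_fail_body
# prints: {"worker_id": "worker-1", "error": "Connection timeout"}
```

Sending the captured message rather than a generic "job failed" string is what makes the last_error field in the dashboard actionable.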

Debug Failed Jobs

📍 Dashboard → Jobs → Failed

What to look for:

  • Error message in last_error field
  • Retry count vs max_retries
  • Job payload for invalid data
  • Timestamps to correlate with logs

Actions:

  • Check worker logs for stack traces
  • Verify external service availability
  • Test payload manually

Next Steps