
Retries & Dead-Letter Queue

Job failures are inevitable. Spooled provides automatic retries with exponential backoff and a dead-letter queue for jobs that can't be processed.

How Retries Work

When a job fails, Spooled automatically schedules a retry with exponential backoff. This prevents overwhelming downstream services and gives transient failures time to resolve.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fef3c7', 'primaryTextColor': '#92400e', 'primaryBorderColor': '#f59e0b', 'lineColor': '#6b7280'}}}%%
stateDiagram-v2
    [*] --> pending
    pending --> claimed: Worker claims
    claimed --> completed: Success
    claimed --> failed: Error
    failed --> pending: Retry (attempts left)
    failed --> dlq: Max retries exceeded
    dlq --> pending: Manual replay
    completed --> [*]

Retry Flow

  1. Worker claims a job and attempts to process it
  2. Processing fails (exception, timeout, or explicit failure)
  3. Spooled checks if retry attempts remain
  4. If retries remain: job is scheduled for later with backoff delay
  5. If max retries exceeded: job moves to dead-letter queue (DLQ)
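The decision in steps 3 to 5 can be sketched as a small pure function. This is an illustrative model only: `handleFailure`, `retriesUsed`, `runAt`, and the delay callback are hypothetical names chosen for the example, not Spooled's internal implementation.

```javascript
// Illustrative model only: field and function names are hypothetical.
// Returns the job's next state after a failed processing attempt.
function handleFailure(job, computeDelayMs, now = Date.now()) {
  if (job.retriesUsed < job.maxRetries) {
    // Retries remain: go back to pending, scheduled after a backoff delay.
    const retryNumber = job.retriesUsed + 1;
    return {
      ...job,
      state: 'pending',
      retriesUsed: retryNumber,
      runAt: now + computeDelayMs(retryNumber),
    };
  }
  // Retries exhausted: move the job to the dead-letter queue.
  return { ...job, state: 'dlq' };
}

// Example: a job allowed 2 retries fails three times in a row.
const exampleDelay = (retryNumber) => 1000 * 2 ** (retryNumber - 1);
let exampleJob = { id: 'job-1', state: 'claimed', retriesUsed: 0, maxRetries: 2 };
const observedStates = [];
for (let i = 0; i < 3; i++) {
  exampleJob = handleFailure(exampleJob, exampleDelay, 0);
  observedStates.push(exampleJob.state);
}
console.log(observedStates); // [ 'pending', 'pending', 'dlq' ]
```

Keeping the decision free of side effects makes it easy to unit-test independently of any queue backend.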

Backoff Strategies

By default, Spooled uses exponential backoff with jitter. Jitter randomizes each delay slightly, so jobs that failed at the same moment do not all retry at the same instant (the thundering herd problem).

Default Exponential Backoff

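With default settings (base=1s, max=1h), the delays before jitter are as follows. With the default maxRetries of 3, only the first three rows apply; the later rows apply when maxRetries is raised.

| Attempt | Delay | Cumulative time |
|---------|-------|-----------------|
| 1 | 1s | 1s |
| 2 | 2s | 3s |
| 3 | 4s | 7s |
| 4 | 8s | 15s |
| 5 | 16s | 31s |
| 6 | 32s | ~1 min |
| 7 | 64s | ~2 min |
| 8 | 128s | ~4 min |
| 9 | 256s | ~8.5 min |
| 10 | 512s | ~17 min |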
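The schedule above follows from doubling the base delay on each attempt and capping it at the maximum. The sketch below reproduces those numbers without jitter; `exponentialSchedule` is a hypothetical helper written for this example, not part of the SDK.

```javascript
// Hypothetical helper: computes the un-jittered exponential schedule.
// delay(attempt) = min(maxDelayMs, baseDelayMs * 2^(attempt - 1))
function exponentialSchedule(retries, baseDelayMs = 1000, maxDelayMs = 3600000) {
  const rows = [];
  let cumulativeMs = 0;
  for (let attempt = 1; attempt <= retries; attempt++) {
    const delayMs = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
    cumulativeMs += delayMs;
    rows.push({ attempt, delayMs, cumulativeMs });
  }
  return rows;
}

const defaultSchedule = exponentialSchedule(10);
console.log(defaultSchedule[9]);
// { attempt: 10, delayMs: 512000, cumulativeMs: 1023000 }
```

With these defaults the one-hour cap first takes effect at attempt 13, where the uncapped delay would be 4096 seconds.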

Available Strategies

  • Exponential (default) — Delay doubles each attempt: 1s, 2s, 4s, 8s...
  • Linear — Constant delay increase: 1s, 2s, 3s, 4s...
  • Fixed — Same delay every time: 5s, 5s, 5s, 5s...
  • Custom — Provide your own delay function
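To make the difference between the strategies concrete, the following sketch computes the un-jittered delay for each one. The `backoffStrategies` map and `delayFor` function are hypothetical names for illustration, and the way a custom function is actually passed to Spooled is not specified here, so treat that part as an assumption.

```javascript
// Hypothetical delay functions, one per strategy (attempt numbers start at 1).
const backoffStrategies = new Map([
  ['exponential', (attempt, baseMs) => baseMs * 2 ** (attempt - 1)], // 1s, 2s, 4s, 8s
  ['linear', (attempt, baseMs) => baseMs * attempt],                 // 1s, 2s, 3s, 4s
  ['fixed', (_attempt, baseMs) => baseMs],                           // 5s, 5s, 5s, 5s
]);

// Accepts a strategy name or, as an assumed shape for "custom", a function.
function delayFor(strategy, attempt, baseMs) {
  const fn = typeof strategy === 'function' ? strategy : backoffStrategies.get(strategy);
  if (!fn) throw new Error(`Unknown backoff strategy: ${strategy}`);
  return fn(attempt, baseMs);
}

const firstFourAttempts = [1, 2, 3, 4];
console.log(firstFourAttempts.map((n) => delayFor('linear', n, 1000)));
// [ 1000, 2000, 3000, 4000 ]

// A custom strategy: quadratic growth, 500 ms * attempt^2.
const quadraticBackoff = (attempt) => 500 * attempt * attempt;
console.log(firstFourAttempts.map((n) => delayFor(quadraticBackoff, n, 1000)));
// [ 500, 2000, 4500, 8000 ]
```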

Retry Configuration

Note: The code examples use the SDK syntax, which is still under development. For production use today, call the equivalent REST API endpoints.

Configure retry behavior per job:

// Per-job retry configuration
await client.jobs.enqueue({
  queue: 'critical-webhooks',
  payload: webhookData,
  maxRetries: 10,              // More retries for critical jobs
  retryBackoff: {
    type: 'exponential',
    baseDelay: 1000,           // 1 second initial delay
    maxDelay: 3600000,         // Max 1 hour between retries
    jitter: 0.2,               // 20% random jitter
  },
});

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| maxRetries | number | 3 | Maximum retry attempts before the job moves to the DLQ |
| retryBackoff.type | string | "exponential" | Backoff strategy: exponential, linear, fixed |
| retryBackoff.baseDelay | number | 1000 | Initial delay in milliseconds |
| retryBackoff.maxDelay | number | 3600000 | Maximum delay in milliseconds (default is 1 hour) |
| retryBackoff.jitter | number | 0.1 | Random jitter factor (0-1) |
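As a rough illustration of how these options combine, the sketch below merges per-job settings with the documented defaults and applies jitter to a delay. `resolveRetryOptions` and `applyJitter` are hypothetical helpers, and the symmetric interpretation of jitter (a factor of 0.2 varies the delay by up to 20% in either direction) is an assumption; the exact formula Spooled uses is not stated here.

```javascript
// Defaults taken from the options listed above.
const DEFAULT_RETRY_OPTIONS = {
  maxRetries: 3,
  retryBackoff: { type: 'exponential', baseDelay: 1000, maxDelay: 3600000, jitter: 0.1 },
};

// Hypothetical helper: fills in defaults and rejects out-of-range values.
function resolveRetryOptions(options = {}) {
  const resolved = {
    maxRetries: options.maxRetries ?? DEFAULT_RETRY_OPTIONS.maxRetries,
    retryBackoff: { ...DEFAULT_RETRY_OPTIONS.retryBackoff, ...options.retryBackoff },
  };
  const { baseDelay, maxDelay, jitter } = resolved.retryBackoff;
  if (!Number.isInteger(resolved.maxRetries) || resolved.maxRetries < 0) {
    throw new RangeError('maxRetries must be a non-negative integer');
  }
  if (!(baseDelay > 0) || !(maxDelay >= baseDelay)) {
    throw new RangeError('delays must satisfy 0 < baseDelay <= maxDelay');
  }
  if (!(jitter >= 0 && jitter <= 1)) {
    throw new RangeError('jitter must be between 0 and 1');
  }
  return resolved;
}

// Assumed jitter model: scale the delay by a random factor in [1 - jitter, 1 + jitter).
function applyJitter(delayMs, jitter, random = Math.random) {
  const offset = (random() * 2 - 1) * jitter;
  return Math.round(delayMs * (1 + offset));
}

const webhookRetry = resolveRetryOptions({ maxRetries: 10, retryBackoff: { jitter: 0.2 } });
console.log(webhookRetry.retryBackoff.baseDelay); // 1000 (default kept)
console.log(applyJitter(1000, webhookRetry.retryBackoff.jitter, () => 0)); // 800 (lower bound)
```

Passing the random source as a parameter keeps the jitter calculation deterministic in tests.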

Dead-Letter Queue (DLQ)

Jobs that exhaust all retry attempts land in the dead-letter queue. The DLQ preserves the full job payload and failure history for debugging and replay.

Working with the DLQ

// List jobs in dead-letter queue
const dlqJobs = await client.jobs.listDLQ({
  queue: 'payment-processing',
  limit: 100,
});

// Inspect a failed job
for (const job of dlqJobs) {
  console.log(`Job ${job.id} failed after ${job.attempts} attempts`);
  console.log(`Last error: ${job.lastError}`);
  console.log(`Payload: `, job.payload);
}

// Replay a single job (guard against an empty DLQ)
if (dlqJobs.length > 0) {
  await client.jobs.replay(dlqJobs[0].id);
}

// Replay all jobs in a queue's DLQ
await client.jobs.replayAll({ queue: 'payment-processing' });

DLQ Best Practices

  • Set up alerts when jobs enter the DLQ
  • Review DLQ jobs regularly (daily for critical queues)
  • Fix the root cause before replaying jobs
  • Consider archiving old DLQ jobs to cold storage
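One way to support regular review and root-cause analysis is to group DLQ jobs by their last error before deciding what to replay. The sketch below assumes the job objects expose `id` and `lastError` as in the listing example; `summarizeDlq` is a hypothetical helper, and the sample data is invented for illustration.

```javascript
// Hypothetical helper: groups DLQ jobs by lastError, most frequent first.
function summarizeDlq(jobs) {
  const byError = new Map();
  for (const job of jobs) {
    const key = job.lastError ?? 'unknown';
    const entry = byError.get(key) ?? { count: 0, jobIds: [] };
    entry.count += 1;
    entry.jobIds.push(job.id);
    byError.set(key, entry);
  }
  return [...byError.entries()]
    .map(([error, entry]) => ({ error, ...entry }))
    .sort((a, b) => b.count - a.count);
}

// Invented sample data standing in for a DLQ listing.
const sampleDlqJobs = [
  { id: 'job-1', attempts: 4, lastError: 'Card declined' },
  { id: 'job-2', attempts: 4, lastError: 'Gateway timeout' },
  { id: 'job-3', attempts: 4, lastError: 'Gateway timeout' },
];
const dlqSummary = summarizeDlq(sampleDlqJobs);
console.log(dlqSummary[0]);
// { error: 'Gateway timeout', count: 2, jobIds: [ 'job-2', 'job-3' ] }
```

A summary like this can feed an alert threshold or help confirm that a fix addresses the dominant failure before a bulk replay.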

Handling Failures in Workers

Workers should explicitly mark jobs as failed with a reason. The reason aids debugging, and the optional retryable flag determines whether the job is retried or sent straight to the DLQ.

worker.on('job', async (job) => {
  try {
    await processJob(job.payload);
    await job.complete();
  } catch (error) {
    // Determine if error is retryable
    if (error.code === 'RATE_LIMITED') {
      // Retryable - let Spooled handle it
      await job.fail({ 
        reason: error.message,
        retryable: true 
      });
    } else if (error.code === 'INVALID_PAYLOAD') {
      // Not retryable - send directly to DLQ
      await job.fail({ 
        reason: error.message,
        retryable: false 
      });
    } else {
      // Unknown error - default retry behavior
      await job.fail({ reason: error.message });
    }
  }
});
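The branching in the handler above can be factored into a small classifier, which keeps the list of retryable and non-retryable error codes in one place. `failureOptions` and the two code sets are hypothetical names for this sketch; the `reason` and `retryable` fields mirror the ones passed to `job.fail` in the example.

```javascript
// Error codes taken from the handler example; extend these for your own services.
const RETRYABLE_CODES = new Set(['RATE_LIMITED']);
const NON_RETRYABLE_CODES = new Set(['INVALID_PAYLOAD']);

// Hypothetical helper: maps a thrown value to the options passed to job.fail().
function failureOptions(error) {
  // Non-Error values can be thrown too, so fall back to String().
  const reason = error instanceof Error ? error.message : String(error);
  const code = error?.code;
  if (RETRYABLE_CODES.has(code)) return { reason, retryable: true };
  if (NON_RETRYABLE_CODES.has(code)) return { reason, retryable: false };
  // Unknown error: omit the flag so the default retry behavior applies.
  return { reason };
}

const rateLimitError = Object.assign(new Error('Too many requests'), { code: 'RATE_LIMITED' });
console.log(failureOptions(rateLimitError));
// { reason: 'Too many requests', retryable: true }
```

Inside the catch block, the three branches would then reduce to a single call such as `await job.fail(failureOptions(error))`.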

Next Steps