
Architecture

Spooled is built for reliability, performance, and multi-tenant security. This guide explains the system architecture and design decisions.

System Overview

Spooled is a distributed job queue system built with Rust for maximum performance and safety. The architecture consists of several key layers:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#ecfdf5', 'primaryTextColor': '#065f46', 'primaryBorderColor': '#10b981', 'lineColor': '#6b7280'}}}%%
flowchart TB
    subgraph clients["Client Layer"]
        SDK["SDKs<br/>Node.js / Python / Go"]
        HTTP["REST API<br/>OpenAPI 3.0"]
        GRPC["gRPC API"]
    end

    subgraph core["Core Services"]
        API["API Gateway<br/>Axum + Tower"]
        AUTH["Auth Service"]
        QUEUE["Queue Engine"]
        SCHED["Scheduler"]
        STREAM["Event Stream"]
    end

    subgraph storage["Data Layer"]
        PG[("PostgreSQL<br/>Jobs + RLS")]
        RD[("Redis<br/>Pub/Sub + Cache")]
    end

    subgraph workers["Worker Layer"]
        W1["Worker Pool 1"]
        W2["Worker Pool 2"]
        W3["Worker Pool N"]
    end

    subgraph observability["Observability"]
        PROM["Prometheus"]
        GRAF["Grafana"]
        LOGS["Structured Logs"]
    end

    SDK --> API
    HTTP --> API
    GRPC --> API
    API --> AUTH
    AUTH --> QUEUE
    QUEUE --> PG
    QUEUE --> RD
    SCHED --> QUEUE
    STREAM --> RD
    
    W1 --> API
    W2 --> API
    W3 --> API
    
    API --> PROM
    PROM --> GRAF

Core Components

API Gateway (Axum + Tower)

The API gateway handles all incoming requests. Built with Axum, it provides:

  • REST API — OpenAPI 3.0 compliant, JSON over HTTPS
  • gRPC API — HTTP/2 + Protobuf with streaming for high-throughput workers
  • WebSocket/SSE — Real-time job status streaming
  • Rate limiting — Per-organization request throttling
  • Request validation — Schema validation for all inputs
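
To make the flow concrete, the sketch below shows a bare-bones Axum route that accepts a JSON enqueue request. It is an illustration under assumptions, not Spooled's actual handler: the /v1/jobs path, payload fields, and port are invented, and a real request would also pass through the auth, rate-limiting, and validation layers listed above.

```rust
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

// Hypothetical enqueue payload; field names are illustrative, not Spooled's API.
#[derive(Deserialize)]
struct EnqueueRequest {
    queue: String,
    payload: serde_json::Value,
}

#[derive(Serialize)]
struct EnqueueResponse {
    job_id: String,
}

// Axum deserializes the JSON body before the handler runs; a real handler
// would hand the job off to the queue engine instead of echoing a placeholder.
async fn enqueue(Json(req): Json<EnqueueRequest>) -> Json<EnqueueResponse> {
    let _ = (req.queue, req.payload);
    Json(EnqueueResponse { job_id: "job_placeholder".into() })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/v1/jobs", post(enqueue));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap(); // axum 0.7-style serve
}
```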

Queue Engine

The core job processing engine manages job lifecycle, retries, and scheduling:

  • Optimistic locking for high-throughput claim operations (sketched below this list)
  • Configurable exponential backoff with jitter
  • Priority queue support (higher priority = processed first)
  • Scheduled job support with second-level precision
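
The claim path is easiest to see in miniature. The sketch below uses sqlx and an invented jobs table with status and locked_by columns (not Spooled's real schema) to show the optimistic pattern: a conditional UPDATE succeeds for exactly one of any set of racing workers, so no row locks are held while a job is processed.

```rust
use sqlx::PgPool;

// Try to claim a single pending job. The WHERE clause is the optimistic check:
// if another worker already flipped the status, zero rows are affected and the
// caller simply moves on to the next candidate.
async fn try_claim(pool: &PgPool, job_id: i64, worker: &str) -> sqlx::Result<bool> {
    let result = sqlx::query(
        "UPDATE jobs
         SET status = 'running', locked_by = $2
         WHERE id = $1 AND status = 'pending'",
    )
    .bind(job_id)
    .bind(worker)
    .execute(pool)
    .await?;

    Ok(result.rows_affected() == 1)
}
```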

PostgreSQL (Data Plane)

PostgreSQL stores all job data with Row-Level Security (RLS) for multi-tenant isolation:

  • Durability — ACID transactions ensure no job loss
  • RLS — Organizations can only see their own data
  • Indexing — Optimized for queue operations and time-based queries (an example index follows this list)
  • Partitioning — Time-based partitioning for large deployments
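
For the queue-operations indexing point, a claim-friendly partial index might look like the DDL below; the table, column, and index names are illustrative only, not Spooled's actual schema.

```rust
// Illustrative DDL, embedded as a Rust constant the way a migration might hold it.
// A partial index over pending jobs keeps claim queries fast even when the table
// also holds millions of completed rows.
const CREATE_CLAIM_INDEX: &str = "
    CREATE INDEX IF NOT EXISTS jobs_pending_claim_idx
    ON jobs (priority DESC, scheduled_at)
    WHERE status = 'pending';
";
```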

Redis (Real-time Layer)

Redis handles real-time features and caching:

  • Pub/Sub — Real-time job status notifications (a publish sketch follows this list)
  • Rate limiting — Token bucket counters per organization
  • Caching — API key validation and org metadata
  • Cluster support — Horizontal scaling for high availability
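
On the Pub/Sub side, a minimal publish sketch with the redis crate could look like this; the channel naming convention is an assumption for illustration rather than Spooled's actual wire format.

```rust
use redis::Commands;

// Publish a job status change so SSE/WebSocket subscribers can fan it out.
fn notify_status(client: &redis::Client, job_id: &str, status: &str) -> redis::RedisResult<()> {
    let mut con = client.get_connection()?;
    let channel = format!("jobs:{job_id}:status"); // illustrative channel name
    // PUBLISH returns the number of subscribers that received the message.
    let _receivers: i64 = con.publish(channel, status)?;
    Ok(())
}
```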

Multi-Tenant Security

Every API request is scoped to a single organization using PostgreSQL Row-Level Security. This provides defense-in-depth isolation at the database level.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#eff6ff', 'primaryTextColor': '#1e40af', 'primaryBorderColor': '#3b82f6', 'lineColor': '#6b7280'}}}%%
flowchart LR
    subgraph request["Incoming Request"]
        TOKEN["API Key"]
    end

    subgraph auth["Authentication"]
        VALIDATE["Validate Key"]
        EXTRACT["Extract org_id"]
    end

    subgraph pg["PostgreSQL RLS"]
        SET["SET app.current_org"]
        POLICY["RLS Policy Check"]
        DATA["Org's Data Only"]
    end

    TOKEN --> VALIDATE
    VALIDATE --> EXTRACT
    EXTRACT --> SET
    SET --> POLICY
    POLICY --> DATA

How RLS Works

  1. API key is validated and organization ID extracted
  2. Connection sets app.current_org session variable
  3. All queries automatically filter by org_id = current_setting('app.current_org')
  4. Even raw SQL access cannot cross tenant boundaries
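
A minimal sketch of that per-request pattern, assuming sqlx and the app.current_org variable named above (the table and function names are illustrative):

```rust
use sqlx::PgPool;

async fn list_jobs_for_org(pool: &PgPool, org_id: &str) -> sqlx::Result<Vec<String>> {
    let mut tx = pool.begin().await?;

    // Scope this transaction to one tenant. set_config(..., true) is local to
    // the transaction, so the pooled connection never leaks an org_id.
    sqlx::query("SELECT set_config('app.current_org', $1, true)")
        .bind(org_id)
        .execute(&mut *tx)
        .await?;

    // No explicit org filter: the RLS policy compares each row's org_id
    // against current_setting('app.current_org').
    let job_ids: Vec<String> = sqlx::query_scalar("SELECT id::text FROM jobs")
        .fetch_all(&mut *tx)
        .await?;

    tx.commit().await?;
    Ok(job_ids)
}
```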

Retry Mechanism

Failed jobs automatically retry with configurable exponential backoff. The retry system ensures reliable delivery while preventing thundering herd problems.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fef3c7', 'primaryTextColor': '#92400e', 'primaryBorderColor': '#f59e0b', 'lineColor': '#6b7280'}}}%%
sequenceDiagram
    participant W as Worker
    participant S as Spooled
    participant DLQ as Dead Letter Queue

    W->>S: Claim job
    S-->>W: Job data
    W->>W: Process (fails)
    W->>S: Fail job
    S->>S: Check retry count
    alt retries remaining
        S->>S: Schedule retry (backoff)
        Note over S: Wait 2^n seconds
        S-->>W: Job available again
    else max retries exceeded
        S->>DLQ: Move to DLQ
        Note over DLQ: Manual review
    end

Backoff Formula

Default backoff uses exponential delay with jitter:

delay = min(base_delay * 2^attempt + random_jitter, max_delay)

Where:

  • base_delay = 1 second
  • max_delay = 1 hour
  • random_jitter = 0-500ms
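
Expressed in code with the defaults above, the formula might look like this sketch (the rand crate supplies the jitter):

```rust
use rand::Rng;
use std::time::Duration;

// Direct translation of the formula above, using the documented defaults:
// base_delay = 1s, max_delay = 1h, jitter = 0-500ms.
fn retry_delay(attempt: u32) -> Duration {
    let base = Duration::from_secs(1);
    let max = Duration::from_secs(3600);
    let jitter = Duration::from_millis(rand::thread_rng().gen_range(0..=500u64));

    // 2^attempt, saturating so large attempt counts cannot overflow.
    let exponential = base.saturating_mul(2u32.saturating_pow(attempt));
    exponential.saturating_add(jitter).min(max)
}
```

For attempt 3 this gives roughly 8 seconds plus jitter; the 1-hour cap takes over from attempt 12 onward.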

Performance Characteristics

  • 10,000+ jobs per second per node
  • <50ms P99 enqueue latency
  • 99.99% uptime SLA (Pro plan)
  • Written in Rust: memory-safe, zero-cost abstractions

Deployment Options

Option          | Best For                       | Maintenance
Managed Cloud   | Most teams                     | Zero maintenance
Docker Compose  | Development, small deployments | Basic ops required
Kubernetes/Helm | Large scale, air-gapped        | Full ops team

Next Steps