Architecture
Spooled is built for reliability, performance, and multi-tenant security. This guide explains the system architecture and design decisions.
System Overview
Spooled is a distributed job queue system built with Rust for maximum performance and safety. The architecture consists of several key layers:
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#ecfdf5', 'primaryTextColor': '#065f46', 'primaryBorderColor': '#10b981', 'lineColor': '#6b7280'}}}%%
flowchart TB
subgraph clients["Client Layer"]
SDK["SDKs<br/>Node.js / Python / Go"]
HTTP["REST API<br/>OpenAPI 3.0"]
GRPC["gRPC API"]
end
subgraph core["Core Services"]
API["API Gateway<br/>Axum + Tower"]
AUTH["Auth Service"]
QUEUE["Queue Engine"]
SCHED["Scheduler"]
STREAM["Event Stream"]
end
subgraph storage["Data Layer"]
PG[("PostgreSQL<br/>Jobs + RLS")]
RD[("Redis<br/>Pub/Sub + Cache")]
end
subgraph workers["Worker Layer"]
W1["Worker Pool 1"]
W2["Worker Pool 2"]
W3["Worker Pool N"]
end
subgraph observability["Observability"]
PROM["Prometheus"]
GRAF["Grafana"]
LOGS["Structured Logs"]
end
SDK --> API
HTTP --> API
GRPC --> API
API --> AUTH
AUTH --> QUEUE
QUEUE --> PG
QUEUE --> RD
SCHED --> QUEUE
STREAM --> RD
W1 --> API
W2 --> API
W3 --> API
API --> PROM
PROM --> GRAF
Core Components
API Gateway (Axum + Tower)
The API gateway handles all incoming requests. Built with Axum, it provides the following (a minimal route sketch appears after the list):
- REST API — OpenAPI 3.0 compliant, JSON over HTTPS
- gRPC API — HTTP/2 + Protobuf with streaming for high-throughput workers
- WebSocket/SSE — Real-time job status streaming
- Rate limiting — Per-organization request throttling
- Request validation — Schema validation for all inputs
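As a rough illustration of the gateway style, here is a minimal Axum route. The request and response types, the `/v1/jobs` path, and the handler body are hypothetical, not Spooled's actual API surface.

```rust
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

// Hypothetical request/response shapes, used only for illustration.
#[derive(Deserialize)]
struct EnqueueRequest {
    queue: String,
    payload: serde_json::Value,
}

#[derive(Serialize)]
struct EnqueueResponse {
    job_id: u64,
}

// The Json extractor rejects malformed or mistyped bodies before the handler runs,
// which is where basic request validation happens.
async fn enqueue(Json(req): Json<EnqueueRequest>) -> Json<EnqueueResponse> {
    let _ = (req.queue, req.payload);
    Json(EnqueueResponse { job_id: 42 })
}

fn api_router() -> Router {
    Router::new().route("/v1/jobs", post(enqueue))
}
```

In the real gateway, Tower middleware layers would wrap routes like this with rate limiting, authentication, and tracing.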
Queue Engine
The core job-processing engine manages job lifecycle, retries, and scheduling; a simplified claim example follows the list:
- Optimistic locking for high-throughput claim operations
- Configurable exponential backoff with jitter
- Priority queue support (higher priority = processed first)
- Scheduled job support with second-level precision
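For example, an optimistic claim can be expressed as a single conditional UPDATE. The schema (a `jobs` table with `status` and `version` columns) is an assumption for illustration, not the engine's actual layout.

```rust
use sqlx::PgPool;

/// Attempt to claim a job with optimistic locking: the UPDATE succeeds only if
/// no other worker bumped the version since this one read it.
async fn try_claim(pool: &PgPool, job_id: i64, seen_version: i32) -> Result<bool, sqlx::Error> {
    let result = sqlx::query(
        "UPDATE jobs
         SET status = 'running', version = version + 1
         WHERE id = $1 AND version = $2 AND status = 'pending'",
    )
    .bind(job_id)
    .bind(seen_version)
    .execute(pool)
    .await?;

    // rows_affected() == 1 means this worker won the race; 0 means another worker claimed it first.
    Ok(result.rows_affected() == 1)
}
```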
PostgreSQL (Data Plane)
PostgreSQL stores all job data with Row-Level Security (RLS) for multi-tenant isolation; an illustrative policy definition is sketched after the list:
- Durability — ACID transactions ensure no job loss
- RLS — Organizations can only see their own data
- Indexing — Optimized for queue operations and time-based queries
- Partitioning — Time-based partitioning for large deployments
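To make the RLS point concrete, a policy along these lines could be created during migration. The table name, policy name, and the assumption that `org_id` is stored as text are illustrative, not the actual Spooled schema.

```rust
use sqlx::PgPool;

/// Enable RLS on the jobs table and restrict rows to the current organization.
/// Table, column, and setting names are assumptions based on the description above.
async fn enable_jobs_rls(pool: &PgPool) -> Result<(), sqlx::Error> {
    sqlx::query("ALTER TABLE jobs ENABLE ROW LEVEL SECURITY")
        .execute(pool)
        .await?;

    sqlx::query(
        "CREATE POLICY jobs_org_isolation ON jobs
         USING (org_id = current_setting('app.current_org'))",
    )
    .execute(pool)
    .await?;

    Ok(())
}
```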
Redis (Real-time Layer)
Redis handles real-time features and caching, with a small Pub/Sub example after the list:
- Pub/Sub — Real-time job status notifications
- Rate limiting — Token bucket counters per organization
- Caching — API key validation and org metadata
- Cluster support — Horizontal scaling for high availability
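As a sketch of the Pub/Sub path, a status change might be fanned out roughly like this using the `redis` crate; the channel name and payload shape are illustrative only.

```rust
use redis::Commands;

/// Publish a job status change so API-layer subscribers (WebSocket/SSE) can relay it live.
fn publish_status(
    con: &mut redis::Connection,
    job_id: u64,
    status: &str,
) -> redis::RedisResult<()> {
    // Channel name and payload shape are illustrative, not Spooled's actual wire format.
    let payload = format!(r#"{{"job_id":{job_id},"status":"{status}"}}"#);
    let _subscribers: i64 = con.publish("jobs:updates", payload)?;
    Ok(())
}
```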
Multi-Tenant Security
Every API request is scoped to a single organization using PostgreSQL Row-Level Security. This provides defense-in-depth isolation at the database level.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#eff6ff', 'primaryTextColor': '#1e40af', 'primaryBorderColor': '#3b82f6', 'lineColor': '#6b7280'}}}%%
flowchart LR
subgraph request["Incoming Request"]
TOKEN["API Key"]
end
subgraph auth["Authentication"]
VALIDATE["Validate Key"]
EXTRACT["Extract org_id"]
end
subgraph pg["PostgreSQL RLS"]
SET["SET app.current_org"]
POLICY["RLS Policy Check"]
DATA["Org's Data Only"]
end
TOKEN --> VALIDATE
VALIDATE --> EXTRACT
EXTRACT --> SET
SET --> POLICY
POLICY --> DATA
How RLS Works
- The API key is validated and the organization ID is extracted
- The connection sets the `app.current_org` session variable
- All queries automatically filter by `org_id = current_setting('app.current_org')`
- Even raw SQL access cannot cross tenant boundaries
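From the backend's perspective, the flow above might look roughly like the following `sqlx` sketch. The table and queries are assumptions that follow the description, not Spooled's actual code.

```rust
use sqlx::PgPool;

/// Scope a pooled connection to one organization, then query jobs.
/// With an RLS policy in place, no explicit org_id filter is needed in the query.
async fn pending_jobs_for_org(pool: &PgPool, org_id: &str) -> Result<usize, sqlx::Error> {
    let mut conn = pool.acquire().await?;

    // Postgres does not allow bind parameters in SET, so set_config() is used instead.
    sqlx::query("SELECT set_config('app.current_org', $1, false)")
        .bind(org_id)
        .execute(&mut *conn)
        .await?;

    // The RLS policy (org_id = current_setting('app.current_org')) filters rows automatically.
    let rows = sqlx::query("SELECT id FROM jobs WHERE status = 'pending'")
        .fetch_all(&mut *conn)
        .await?;

    Ok(rows.len())
}
```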
Retry Mechanism
Failed jobs automatically retry with configurable exponential backoff. The retry system ensures reliable delivery while preventing thundering herd problems.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fef3c7', 'primaryTextColor': '#92400e', 'primaryBorderColor': '#f59e0b', 'lineColor': '#6b7280'}}}%%
sequenceDiagram
participant W as Worker
participant S as Spooled
participant DLQ as Dead Letter Queue
W->>S: Claim job
S-->>W: Job data
W->>W: Process (fails)
W->>S: Fail job
S->>S: Check retry count
alt retries remaining
S->>S: Schedule retry (backoff)
Note over S: Wait 2^n seconds
S-->>W: Job available again
else max retries exceeded
S->>DLQ: Move to DLQ
Note over DLQ: Manual review
end
Backoff Formula
Default backoff uses exponential delay with jitter:
`delay = min(base_delay * 2^attempt + random_jitter, max_delay)`
Where:
- `base_delay` = 1 second
- `max_delay` = 1 hour
- `random_jitter` = 0-500 ms
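A small Rust sketch of that formula, assuming the `rand` crate for jitter and the default values listed above:

```rust
use rand::Rng;
use std::time::Duration;

const BASE_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(3600);

/// Exponential backoff with jitter: min(base_delay * 2^attempt + jitter, max_delay).
fn retry_delay(attempt: u32) -> Duration {
    let exponential = BASE_DELAY
        .checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(MAX_DELAY);
    // 0-500 ms of random jitter avoids thundering-herd retries.
    let jitter = Duration::from_millis(rand::thread_rng().gen_range(0..=500));
    (exponential + jitter).min(MAX_DELAY)
}
```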
Performance Characteristics
Deployment Options
| Option | Best For | Maintenance |
|---|---|---|
| Managed Cloud | Most teams | Zero maintenance |
| Docker Compose | Development, small deployments | Basic ops required |
| Kubernetes/Helm | Large scale, air-gapped | Full ops team |
Next Steps
- Deployment guide — Self-hosting instructions
- API reference — Complete endpoint documentation
- Open source — Contributing and licensing