
Backup and restore

Orkestra's stateful surfaces are:

| What | Where | Backup needed? |
|---|---|---|
| Primary data (users, tenants, modules, billing, marketing, etc.) | MongoDB | ✅ Yes, daily minimum |
| AES-256-GCM-encrypted secrets (`module_configs`) | MongoDB | ✅ Yes, but recovery also requires `OAUTH_TOKEN_ENCRYPTION_KEY` from `docker/.env` |
| Per-tenant DEKs (compliance) | MongoDB | ✅ Yes; recovery requires `ORKESTRA_KMS_MASTER_KEY` from `docker/.env` |
| Audit events (`audit_events`) | MongoDB | ✅ Yes; never sample, never delete |
| Sessions / refresh tokens / rate-limit counters | Redis | ❌ No; losing them logs everyone out but loses no data |
| HTTP request logs | Loki / Datadog / stdout | Optional; your retention policy |
| OpenTelemetry traces / metrics | Tempo / Prometheus / vendor | Optional; your retention policy |
| Object blobs (avatars today, attachments tomorrow) | RustFS / S3 | ✅ Yes, if you have user uploads in flight |
| Object content (PDF outputs, marketing imports) | Filesystem (`MARKETING_IMPORT_SPOOL_DIR`) | ✅ Yes, if you need it long-term |
| JWT keys (`docker/keys/`) | Filesystem | ✅ Yes; losing them invalidates every issued token |

Bottom line: back up MongoDB nightly (plus RustFS if users upload files), and back up the contents of docker/.env + docker/keys/ once, then again after any change. Skip Redis. Decide on the spool directory based on how much your importers matter.
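Because the encrypted collections are only recoverable with the matching keys, it can be worth checking that a saved copy of docker/.env and docker/keys/ actually contains them before you rely on it. The following is a minimal sketch, not part of Orkestra: `check_env_keys` and `check_key_files` are hypothetical helper names, and the sketch assumes plain `VAR=value` lines (no `export` prefix) in the .env file.

```bash
# Hypothetical helper: confirm a saved docker/.env holds the two keys that
# encrypted MongoDB data depends on. Prints each missing name and returns
# non-zero if any is absent or has an empty value.
check_env_keys() {
  local env_file=$1 missing=0 var
  for var in OAUTH_TOKEN_ENCRYPTION_KEY ORKESTRA_KMS_MASTER_KEY; do
    # Match "VAR=<non-empty value>" at the start of a line.
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "missing: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: confirm the JWT key pair is present and non-empty
# in a saved copy of docker/keys/.
check_key_files() {
  local keys_dir=$1 missing=0 f
  for f in jwt-private.pem jwt-public.pem; do
    if [[ ! -s "$keys_dir/$f" ]]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (paths are examples):
#   check_env_keys /mnt/backup/docker/.env && check_key_files /mnt/backup/docker/keys
```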

Orkestra ships two paths for this:

  • ./backup.sh and ./restore.sh at the repo root — bundled, opinionated, cover every stateful surface in one tarball. Right for dev, staging, and small single-VM production. See Bundled scripts below.
  • Manual mongodump / mongorestore + a separate secrets tarball — finer-grained, the right choice once you start shipping dumps to off-site object storage with their own retention and ACLs. See Manual recipes.

Bundled scripts

./backup.sh and ./restore.sh live at the repo root alongside ./orkestra.sh. Both run as a TUI (no args) or via CLI flags, and target the canonical infra container names the base ships: orkestra-mongodb, orkestra-redis, orkestra-rustfs. (A fork that adds an addon with its own datastore extends the component list — the archived graph addon, for example, snapshotted an orkestra-memgraph volume.)

What they cover

| Component | Backup mechanism | Restore mechanism |
|---|---|---|
| mongodb | `mongodump --db $MONGO_DATABASE --archive --gzip`, scoped to the orkestra DB (skips the admin/local/config system DBs and the throwaway orkestra_openapi_dump sandbox that `make openapi-dump` uses) | `mongorestore --drop --archive --gzip`, with automatic `--nsFrom`/`--nsTo` remapping if the archive's source DB name differs from the target's `MONGO_DATABASE` |
| redis | Synchronous `SAVE`, then `docker cp` of dump.rdb plus the AOF directory from whatever path `CONFIG GET dir` reports (the redis-stack image used in docker-compose.infra.yml writes to `/`, not `/data`) | Stop the container, copy the files back, start the container |
| rustfs | Enumerates buckets via aws-cli over orkestra-network and runs `s3 sync` to copy each one out | `s3 mb` (best-effort) plus `s3 sync` from the bundle back into each bucket |
| secrets | Copies `docker/.env` and `docker/keys/*` into the bundle | Writes the files back into `docker/`, prompting before overwriting existing ones |

A manifest.json at the tarball root records the schema version, timestamp, environment, host, and which components were actually captured. restore.sh reads it to decide which components are available.

:::warning Secrets live in the tarball
The secrets component bundles docker/.env (DB passwords, OAuth client secrets, KMS master key, encryption keys) and docker/keys/* (JWT private key, Apple p8). Anyone with read access to the tarball has everything they need to decrypt every secret in module_configs. Store the resulting file in a secrets-grade location: an encrypted volume, a vault, or an off-site bucket with strict ACLs. For production deployments, prefer the manual recipes below so secret material is backed up separately from data, on its own retention.
:::

Quick start

```bash
# Interactive TUI: pick components from a menu
./backup.sh

# Non-interactive: back up everything available
./backup.sh all --yes

# Subset: only some components
./backup.sh --components mongodb,redis,secrets --yes
```

Output lands in ./backups/orkestra-backup-<env>-<UTC-timestamp>.tar.gz by default; override with --output.

```bash
# Interactive TUI: pick a tarball from ./backups/
./restore.sh

# Restore a specific bundle
./restore.sh ./backups/orkestra-backup-staging-20260527-071724.tar.gz

# Subset, no prompts (CI / scripted)
./restore.sh ./backups/<file>.tar.gz --components mongodb --yes
```

Dry-run

restore.sh has a --dry-run flag that reports exactly what would happen without mutating anything. It's the safest way to validate an archive end-to-end before committing to the restore:

```bash
./restore.sh ./backups/<file>.tar.gz --dry-run
```

What dry-run actually does, per component:

  • mongodb — runs mongorestore --dryRun against the live container so the archive is fully parsed, every namespace is listed, and any decompression / format errors surface; reports namespace remapping if source DB differs from target
  • redis — queries the live container's CONFIG GET dir/dbfilename/appenddirname and prints the exact docker cp destinations it would use
  • rustfs — runs aws s3 sync --dryrun against the live endpoint so per-object diffs (additions, overwrites, deletions) appear before any byte is moved
  • secrets — lists every file in the bundle and flags which targets already exist (would prompt to overwrite) vs which would be new

Dry-run also exercises the preconditions: if a target container isn't running, the dry-run for that component fails the same way the real restore would. That makes it a genuine preflight, not just a print of intent.
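Since a failing dry-run fails the same way the real restore would, a script can use it as a gate before the destructive run. The sketch below is hypothetical: `latest_backup`, `preflight_then_restore`, and the `RESTORE_SH` override are illustrative names, not part of the bundled scripts. It assumes the default `./backups/orkestra-backup-<env>-<YYYYMMDD-HHMMSS>.tar.gz` naming, whose UTC timestamp sorts lexicographically.

```bash
# latest_backup ENV [DIR]: print the newest bundle for ENV. The [0-9]
# after the env name keeps "staging" from also matching "staging-eu".
latest_backup() {
  local env=$1 dir=${2:-./backups} f latest=""
  for f in "$dir"/orkestra-backup-"$env"-[0-9]*.tar.gz; do
    [[ -e "$f" ]] || continue            # the glob matched nothing
    if [[ "$f" > "$latest" ]]; then latest=$f; fi
  done
  if [[ -z "$latest" ]]; then
    echo "no backups for '$env' in $dir" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}

# preflight_then_restore ARCHIVE [extra restore.sh flags...]: run the
# dry-run first and only perform the real restore if it exits 0.
preflight_then_restore() {
  local archive=$1; shift
  local restore=${RESTORE_SH:-./restore.sh}
  if ! "$restore" "$archive" "$@" --dry-run; then
    echo "dry-run failed for $archive; not restoring" >&2
    return 1
  fi
  "$restore" "$archive" "$@" --yes
}

# Usage:
#   preflight_then_restore "$(latest_backup staging)" --components mongodb
```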

CLI reference

backup.sh

| Flag | Description |
|---|---|
| `all` (positional) | Back up every component currently available on the host. |
| `--components <csv>` / `-c` | Back up a subset, e.g. `--components mongodb,secrets`. |
| `--output <path>` / `-o` | Write to a specific path instead of `./backups/orkestra-backup-<env>-<timestamp>.tar.gz`. |
| `--yes` / `-y` | Skip the proceed prompt. Required for non-interactive shells. |
| `--help` / `-h` | Print usage. |

restore.sh

| Flag | Description |
|---|---|
| `<file>` (positional) | Path to a tarball produced by backup.sh. Omit it to pick from ./backups/ via the TUI. |
| `--components <csv>` / `-c` | Restore only some components from the archive. Must be a subset of the components listed in the manifest. |
| `--dry-run` / `-n` | Show what would happen, change nothing. Bypasses the destructive-action confirmation and runs without `--yes` in non-interactive shells. |
| `--yes` / `-y` | Skip the "type `restore` to confirm" gate. Destructive; required for non-interactive shells. |
| `--help` / `-h` | Print usage. |
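Because `--components` must be a subset of the manifest, a CI job can check that up front instead of discovering it mid-run. A minimal sketch with a hypothetical `components_subset` helper; the commented usage assumes jq is installed and that manifest.json is stored at the tarball root under exactly that member name.

```bash
# components_subset REQUESTED AVAILABLE: succeed if every name in the
# comma-separated REQUESTED list appears in the comma-separated AVAILABLE
# list; otherwise print the first offender and return 1.
components_subset() {
  local requested=$1 available=",$2," c
  local IFS=,                      # split the unquoted $requested on commas
  for c in $requested; do
    if [[ "$available" != *",$c,"* ]]; then
      echo "not in manifest: $c"
      return 1
    fi
  done
}

# Usage (assumes jq):
#   available=$(tar -xzOf "$bundle" manifest.json | jq -r '.components | join(",")')
#   components_subset mongodb,secrets "$available" && ./restore.sh "$bundle" -c mongodb,secrets --yes
```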

Tarball layout

```text
orkestra-backup-<env>-<timestamp>.tar.gz
├── manifest.json              schema, env, timestamp, components included
├── mongodb/
│   ├── mongo.archive.gz       mongodump --archive --gzip output
│   └── database.txt           source DB name (for ns-remapping on restore)
├── redis/
│   ├── dump.rdb               RDB snapshot
│   ├── appendonlydir/         Redis 7+ AOF folder (manifest + base + incr)
│   │   └── appendonly.aof.*
│   └── redis-layout.txt       live `dir` / `dbfilename` / `appenddirname` at backup time
├── rustfs/
│   └── <bucket>/              one directory per S3 bucket the script could see
│       └── ...                full object tree synced via aws s3 sync
└── secrets/
    ├── .env                   docker/.env verbatim
    └── keys/
        ├── jwt-private.pem
        ├── jwt-public.pem
        └── apple.p8           (if Apple Sign In configured)
```
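To spot-check a bundle without extracting it, you can list its top-level component directories and compare them with the components array in manifest.json; a mismatch suggests a truncated or hand-edited tarball. A sketch with a hypothetical `bundle_components` helper; it tolerates member names with or without a leading `./`, since the exact tar invocation backup.sh uses is not documented here.

```bash
# bundle_components TARBALL: print the top-level component directories in
# a bundle (everything except manifest.json), one per line, sorted.
bundle_components() {
  tar -tzf "$1" \
    | sed -e 's|^\./||' \
    | cut -d/ -f1 \
    | grep -v -e '^$' -e '^manifest\.json$' \
    | sort -u
}

# Usage:
#   bundle_components ./backups/orkestra-backup-staging-20260527-071724.tar.gz
```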

manifest.json schema:

```json
{
  "schema": "orkestra-backup/v1",
  "createdAt": "2026-05-27T07:17:29Z",
  "environment": "staging",
  "host": "orkestra-staging-1",
  "components": ["mongodb", "redis", "rustfs", "secrets"]
}
```

When to fall back to the manual recipes

The bundled scripts are the right default for dev, staging, and single-VM production. Reach for the manual recipes when:

  • You want secret material on a separate retention from data. The bundled secrets component ships keys and the .env in the same tarball as nightly data — fine if the tarball goes to an HSM-grade target, awkward if it goes to a general-purpose S3 bucket.
  • You want to stream dumps directly to off-site object storage (aws s3 cp - s3://...) instead of round-tripping through a local file.
  • You want fine-grained cron retention (e.g. hourly snapshots × 24h + daily × 30 + monthly × 12 — the bundled scripts produce one timestamped file per run; everything else is your job).
  • You're running Orkestra on a host where Docker isn't available (the bundled scripts assume docker).
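For the tiered-retention case, the pruning logic is the part you have to write yourself. Below is a minimal grandfather-father-son sketch (`gfs_keep` is a hypothetical name). It assumes bash 4+ and the `orkestra-YYYY-MM-DDTHH-MM-SSZ.archive.gz` naming from the manual cron recipe below, scheduled hourly instead of daily. It only prints what to keep; deleting the rest is left to you.

```bash
# gfs_keep [HOURLY] [DAILY] [MONTHLY]: read backup paths on stdin and print
# the ones to KEEP:
#   - the HOURLY most recent files (assumes one backup per hour),
#   - the newest file of each of the DAILY most recent days with a backup,
#   - the newest file of each of the MONTHLY most recent months.
# Basenames must look like orkestra-YYYY-MM-DDTHH-MM-SSZ.archive.gz so that
# byte order equals time order.
gfs_keep() {
  local hourly=${1:-24} daily=${2:-30} monthly=${3:-12}
  local -A seen_day=() seen_month=()
  local n=0 days=0 months=0 path base day month keep
  LC_ALL=C sort -r | {
    while IFS= read -r path; do
      base=${path##*/}             # drop the directory part
      base=${base#orkestra-}       # base now starts with YYYY-MM-DD
      day=${base:0:10}
      month=${base:0:7}
      keep=0
      if (( n < hourly )); then keep=1; fi
      if [[ -z ${seen_day[$day]+x} ]] && (( days < daily )); then
        seen_day[$day]=1; days=$((days + 1)); keep=1
      fi
      if [[ -z ${seen_month[$month]+x} ]] && (( months < monthly )); then
        seen_month[$month]=1; months=$((months + 1)); keep=1
      fi
      n=$((n + 1))
      if (( keep )); then printf '%s\n' "$path"; fi
    done
  }
}

# Usage: list what the policy would delete, review it, then remove it.
#   ls /var/backups/orkestra/orkestra-*.archive.gz | gfs_keep > /tmp/keep.txt
#   ls /var/backups/orkestra/orkestra-*.archive.gz | grep -vxFf /tmp/keep.txt
```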

Manual recipes

:::note Two scripts named backup.sh
The bundled tool at the repo root is ./backup.sh. The production cron recipe below also happens to land at /opt/orkestra/scripts/backup.sh on the deployment VM. They are unrelated: the bundled one is a TUI/CLI for dev and small-prod use, while the production one is a single-purpose mongodump wrapper for nightly cron. Rename the cron script if both will ever live on the same host.
:::

MongoDB backup

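mongodump is the operator-friendly path: it produces BSON dumps that mongorestore reads. It does not lock the database, so it's safe on a live system, though large dumps spike I/O. The catch is that it is not a point-in-time snapshot: collections are read one after another, so writes that land mid-dump can leave them slightly out of sync with each other. On a replica set, a full-instance mongodump --oplog (restored with mongorestore --oplogReplay) closes that gap.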

Daily backup script

Save as /opt/orkestra/scripts/backup.sh on the production VM:

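```bash
#!/bin/bash
# /opt/orkestra/scripts/backup.sh
set -euo pipefail
umask 077  # dumps contain user data and encrypted secrets: owner-only files

TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
BACKUP_DIR="/var/backups/orkestra"
RETENTION_DAYS=30
TARGET="${BACKUP_DIR}/orkestra-${TIMESTAMP}.archive.gz"

mkdir -p "$BACKUP_DIR"

# Source the env to get MONGO_ROOT_USERNAME / MONGO_ROOT_PASSWORD
# (requires docker/.env to be valid shell syntax).
set -a
. /opt/orkestra/docker/.env
set +a

# Dump every database as one gzipped mongodump archive. Write to a .partial
# file first so a failed run never leaves a truncated archive that looks
# like a good backup.
docker exec orkestra-mongodb mongodump \
  --username "${MONGO_ROOT_USERNAME:-admin}" \
  --password "${MONGO_ROOT_PASSWORD}" \
  --authenticationDatabase admin \
  --archive --gzip > "${TARGET}.partial"
mv "${TARGET}.partial" "$TARGET"

# Prune archives older than the retention window, plus stale partials.
find "$BACKUP_DIR" -name "orkestra-*.archive.gz" -mtime "+${RETENTION_DAYS}" -delete
find "$BACKUP_DIR" -name "orkestra-*.archive.gz.partial" -mtime +1 -delete

# Optional: ship to S3 / GCS / Azure Blob
# aws s3 cp "$TARGET" s3://your-bucket/orkestra/
```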

Make executable and schedule:

```bash
sudo chmod +x /opt/orkestra/scripts/backup.sh
```

```
# /etc/cron.d/orkestra-backup
0 3 * * * deploy /opt/orkestra/scripts/backup.sh >> /var/log/orkestra-backup.log 2>&1
```

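The job runs daily at 03:00 in the host's cron timezone (UTC on many cloud images; the archive filenames are always UTC) and keeps 30 days of archives. The deploy user needs to be able to run docker, read docker/.env, and write to /var/backups/orkestra and /var/log/orkestra-backup.log. Adjust the schedule and RETENTION_DAYS to your timezone and retention policy.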
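A cron job that silently stops running looks exactly like a healthy one until you need a restore. Here is a hypothetical freshness check (`backup_is_fresh` is an illustrative name) that a monitoring job could run; the 26-hour default is an assumption that gives a daily 03:00 job some slack.

```bash
# backup_is_fresh DIR [MAX_AGE_HOURS]: succeed if DIR holds at least one
# non-empty orkestra-*.archive.gz modified within the last MAX_AGE_HOURS.
backup_is_fresh() {
  local dir=$1 max_hours=${2:-26}
  find "$dir" -maxdepth 1 -name 'orkestra-*.archive.gz' -size +0 \
    -mmin "-$((max_hours * 60))" -print -quit | grep -q .
}

# Usage: alert however you normally alert when this fails.
#   backup_is_fresh /var/backups/orkestra 26 || echo "orkestra backup is stale" >&2
```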

Verify the backup is sane

A backup file you can't restore is not a backup. Test monthly (or at least quarterly) — see "Restore-test cadence" below.

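For a quick sanity check that the archive isn't truncated or corrupt:

```bash
# Reads the whole file and verifies the gzip CRC; a non-zero exit means the archive is damaged.
gzip -t /var/backups/orkestra/orkestra-2026-05-23T03-00-00Z.archive.gz && echo "gzip stream OK"
```

This only checks the compression layer, not the BSON inside. For a deeper check, feed the archive to mongorestore with --dryRun, which is what ./restore.sh --dry-run does for bundled archives.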

MongoDB restore

```bash
# Stop the application services so nothing writes during restore.
sudo systemctl stop orkestra.service

# Restore from a specific archive. --gzip tells mongorestore to decompress
# the stream itself, so pass the raw .gz file on stdin (don't zcat it first).
# cut -f2- keeps passwords that contain "=".
docker exec -i orkestra-mongodb mongorestore \
  --username admin \
  --password "$(grep '^MONGO_ROOT_PASSWORD=' /opt/orkestra/docker/.env | cut -d= -f2-)" \
  --authenticationDatabase admin \
  --archive --gzip --drop \
  < /var/backups/orkestra/orkestra-2026-05-23T03-00-00Z.archive.gz

# Bring the application back.
sudo systemctl start orkestra.service
```

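--drop drops each target collection before restoring it, so every collection in the dump ends up exactly as dumped. Without --drop, mongorestore only inserts: restored documents merge with whatever is already in the DB, and documents whose _id already exists are skipped rather than overwritten.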

If the restore is for an entirely different deployment (e.g. clone production into staging), you'll also need:

  • The original docker/.env OAUTH_TOKEN_ENCRYPTION_KEY — without it every encrypted secret in module_configs is unrecoverable.
  • The original docker/.env ORKESTRA_KMS_MASTER_KEY — without it every per-tenant DEK in the compliance addon is unrecoverable.
  • The original docker/keys/jwt-*.pem (if you want existing sessions to keep working) OR a fresh key pair (in which case all issued tokens are invalidated and users have to log in again).
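Before restoring into a different deployment, you can confirm that both sides hold the same keys without printing them. A minimal sketch with hypothetical `env_get` and `key_fingerprint` helpers; it assumes unquoted `VAR=value` lines and GNU coreutils' `sha256sum` (use `shasum -a 256` on macOS). Taking everything after the first `=` keeps base64 values that end in `=` intact.

```bash
# env_get FILE VAR: print VAR's value from a docker/.env-style file.
# Everything after the first "=" is the value; the last assignment wins.
env_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# key_fingerprint FILE VAR: first 16 hex chars of SHA-256(value). Safe to
# compare across hosts or paste into a ticket without revealing the key.
key_fingerprint() {
  printf '%s' "$(env_get "$1" "$2")" | sha256sum | cut -c1-16
}

# Usage: run on both hosts and compare the output.
#   key_fingerprint /opt/orkestra/docker/.env OAUTH_TOKEN_ENCRYPTION_KEY
#   key_fingerprint /opt/orkestra/docker/.env ORKESTRA_KMS_MASTER_KEY
```

Recording these fingerprints next to each nightly dump also makes key drift visible at restore time.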

Filesystem state

| Path | Used by | Action |
|---|---|---|
| `docker/keys/jwt-private.pem`, `jwt-public.pem` | auth core module | Back up once. Treat as a long-lived secret. |
| `docker/keys/apple.p8` (if Apple Sign In configured) | auth core module | Back up once. |
| `docker/.env` | every module | Back up once after `make init`, then again after any edit. |
| `docker/marketing-spool-*` (bind mount) | marketing addon | Optional. Imports that haven't been processed yet live here. Losing this loses in-flight imports but nothing already committed to MongoDB. |
| `docker/mongo-init/` | mongodb infra service | In-repo, gitignored copy is fine. |

A reasonable one-liner for the "everything outside Mongo" tarball:

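```bash
# Create the archive as root with umask 077 so it is 0600 from the start (no chmod race),
# and store paths relative to /opt/orkestra (restore with: sudo tar xzf <file> -C /opt/orkestra).
sudo sh -c "umask 077 && tar czf /var/backups/orkestra/files-$(date -u +%Y-%m-%dT%H-%M-%SZ).tar.gz \
  -C /opt/orkestra docker/.env docker/keys"
```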

Treat these as secrets: anyone with read access to a backup tarball has the encryption key that decrypts every secret in module_configs, the KMS master key that unwraps every per-tenant DEK, and the JWT signing key.
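If the files tarball has to sit somewhere less trusted than an encrypted volume, one option is to encrypt it with a passphrase kept elsewhere (for example in your password manager). A sketch using `openssl enc`, assuming OpenSSL 1.1.1 or newer for `-pbkdf2`; `encrypt_file` and `decrypt_file` are hypothetical names, and the 200000 iteration count is an arbitrary choice.

```bash
# encrypt_file IN OUT PASSFILE: AES-256-CBC with a PBKDF2-derived key. The
# passphrase is read from the first line of PASSFILE, so it never shows up
# in the process list. The subshell's umask makes OUT owner-only.
encrypt_file() {
  ( umask 077
    openssl enc -aes-256-cbc -pbkdf2 -iter 200000 -salt \
      -in "$1" -out "$2" -pass "file:$3" )
}

# decrypt_file IN OUT PASSFILE: inverse of encrypt_file (same cipher,
# same KDF parameters).
decrypt_file() {
  ( umask 077
    openssl enc -d -aes-256-cbc -pbkdf2 -iter 200000 \
      -in "$1" -out "$2" -pass "file:$3" )
}

# Usage:
#   encrypt_file files-<timestamp>.tar.gz files-<timestamp>.tar.gz.enc /root/backup.pass
```

Note that `openssl enc` does not authenticate the ciphertext; GnuPG's `gpg --symmetric` is an alternative that also detects tampering.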

Restore-test cadence

Run the restore drill at least quarterly, regardless of which path you use. The bundled ./restore.sh --dry-run is a fast pre-flight (parses the archive end-to-end, validates target containers, lists what would change) but it doesn't substitute for an end-to-end restore against a real staging stack — only that catches encryption-key drift and schema migrations.

Easiest path: spin up a staging stack, restore the latest production backup into it, verify a known canary (a specific user, a specific module config) is intact. Three things almost always work on the way out (you've automated them) and break on the way back (you haven't tested them):

  • Encryption key rotation: if you've rotated OAUTH_TOKEN_ENCRYPTION_KEY since the dump was taken, the restore is useless without the old key. Keep a key archive separate from the DB backup.
  • Schema migrations: Orkestra's collection schemas evolve. Restoring a 6-month-old dump into a current binary may produce documents whose shape the current code rejects. The test is the make backend-test suite plus a docker compose -f docker-compose.prod.yml --env-file .env up -d smoke run.
  • Index recreation: --drop removes indexes along with the collections. mongorestore rebuilds the indexes recorded in the dump (unless you pass --noIndexRestore), and the module registry's ensureCollection recreates any declared index that is still missing on next boot. Index builds take time on large collections, so expect a slower restore and possibly slow queries for a few minutes afterwards.

Document the runbook on whatever wiki / Notion / GDoc your ops team reads. The drill is no good if no one knows where the dump archives live when the production VM is on fire.
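The canary check is easy to script, so the drill ends with a pass/fail instead of someone eyeballing a UI. A sketch under several assumptions: the container image ships mongosh, the Orkestra DB is named by MONGO_DATABASE (defaulting here to `orkestra`), and the collection name and filter in the usage line are placeholders for whatever canary you pick. `mongo_eval` and `check_canary` are hypothetical names.

```bash
# mongo_eval JS: evaluate a JS expression against the Orkestra DB inside
# the container and print the result. Override it for other setups.
mongo_eval() {
  docker exec orkestra-mongodb mongosh --quiet \
    --username "${MONGO_ROOT_USERNAME:-admin}" \
    --password "${MONGO_ROOT_PASSWORD:?set MONGO_ROOT_PASSWORD}" \
    --authenticationDatabase admin \
    "${MONGO_DATABASE:-orkestra}" --eval "$1"
}

# check_canary COLLECTION FILTER_JS: succeed if at least one document in
# COLLECTION matches FILTER_JS; anything that isn't a count >= 1 fails.
check_canary() {
  local count
  count=$(mongo_eval "db.getCollection('$1').countDocuments($2)") || return 1
  if [[ "$count" =~ ^[0-9]+$ ]] && (( count >= 1 )); then
    echo "ok: $1 $2 ($count)"
  else
    echo "FAIL: $1 $2 (got '$count')" >&2
    return 1
  fi
}

# Usage (placeholder canary):
#   check_canary users '{email: "canary@example.com"}'
```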

What --drop does NOT touch

  • The admin database (Mongo's own — system users, roles).
  • Collections present in the live DB but not in the dump (e.g. new collections introduced by a module enabled after the dump).
  • Anything outside MongoDB — Redis, filesystem, env vars.

Redis

Redis stores ephemeral state — refresh tokens, rate-limit counters, ConfigService cache (30s TTL). On restore:

  • Refresh tokens are gone → every user has to log in again (operator UX hit but no data loss).
  • Rate-limit counters are gone → first-request burst is permitted from every IP for one window.
  • ConfigService cache is gone → first request to each module config endpoint hits MongoDB (cold-start latency for ~30 s).

If your operational priority is "zero session loss across restores", run Redis with AOF persistence and back up the AOF directory (appendonlydir/ on Redis 7+) along with dump.rdb. Most operators don't bother: the UX cost is small.
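If you do opt in, the AOF lives wherever Redis says it does (the redis-stack image writes to `/`, not `/data`), so it is safer to ask Redis than to hard-code a path. A sketch that mirrors what the bundled script is described as doing; `redis_cfg` and `backup_redis_aof` are hypothetical names, and it assumes Redis 7+ (for `appenddirname`) and no AUTH on redis-cli (add `-a` if yours needs it).

```bash
# redis_cfg KEY: print a Redis config value. CONFIG GET replies with the
# key on one line and the value on the next, so keep line 2.
redis_cfg() {
  docker exec orkestra-redis redis-cli CONFIG GET "$1" | sed -n 2p
}

# backup_redis_aof DEST: copy the live AOF directory out of the container
# into DEST, wherever CONFIG GET dir says it lives.
backup_redis_aof() {
  local dest=$1 dir aofdir
  dir=$(redis_cfg dir)
  aofdir=$(redis_cfg appenddirname)
  if [[ -z "$dir" || -z "$aofdir" ]]; then
    echo "could not read AOF location" >&2
    return 1
  fi
  if [[ "$(redis_cfg appendonly)" != yes ]]; then
    echo "AOF is disabled" >&2
    return 1
  fi
  mkdir -p "$dest"
  # ${dir%/} turns "/" into "" so the source path is "/appendonlydir".
  docker cp "orkestra-redis:${dir%/}/${aofdir}" "$dest/"
}
```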

See also