Backup and restore

Orkestra's stateful surfaces are:

What	Where	Backup needed?
Primary data (users, tenants, modules, billing, marketing, etc.)	MongoDB	✅ yes, daily minimum
AES-256-GCM-encrypted secrets (`module_configs`)	MongoDB	✅ yes — but recovery also requires `OAUTH_TOKEN_ENCRYPTION_KEY` from `docker/.env`
Per-tenant DEKs (compliance)	MongoDB	✅ yes — recovery requires `ORKESTRA_KMS_MASTER_KEY` from `docker/.env`
Audit events (`audit_events`)	MongoDB	✅ yes — never sample, never delete
Sessions / refresh tokens / rate-limit counters	Redis	❌ no; losing it logs everyone out but loses no data
HTTP request logs	Loki / Datadog / stdout	optional — your retention policy
OpenTelemetry traces / metrics	Tempo / Prometheus / vendor	optional — your retention policy
Object blobs (avatars today, attachments tomorrow)	RustFS / S3	✅ yes if you have user uploads in flight
Object content (PDF outputs, marketing imports)	filesystem (`MARKETING_IMPORT_SPOOL_DIR`)	✅ yes if you need it long-term
JWT keys (`docker/keys/`)	filesystem	✅ yes — losing invalidates every issued token

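Bottom line: back up MongoDB nightly, and back up the contents of docker/.env + docker/keys/ once, then again whenever docker/.env changes. Skip Redis. Back up the object store if you hold user uploads, and decide on the spool directory based on how much your importers matter.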

Orkestra ships two paths for this:

./backup.sh and ./restore.sh at the repo root — bundled, opinionated, cover every stateful surface in one tarball. Right for dev, staging, and small single-VM production. See Bundled scripts below.
Manual mongodump / mongorestore + a separate secrets tarball — finer-grained, the right choice once you start shipping dumps to off-site object storage with their own retention and ACLs. See Manual recipes.

Bundled scripts

./backup.sh and ./restore.sh live at the repo root alongside ./orkestra.sh. Both run as a TUI (no args) or via CLI flags, and target the canonical infra container names the base ships: orkestra-mongodb, orkestra-redis, orkestra-rustfs. (A fork that adds an addon with its own datastore extends the component list — the archived graph addon, for example, snapshotted an orkestra-memgraph volume.)

What they cover

Component	Backup mechanism	Restore mechanism
`mongodb`	`mongodump --db $MONGO_DATABASE --archive --gzip` (scoped to the `orkestra` DB — skips `admin`/`local`/`config` system DBs and the throwaway `orkestra_openapi_dump` sandbox that `make openapi-dump` uses)	`mongorestore --drop --archive --gzip`, with automatic `--nsFrom`/`--nsTo` remapping if the archive's source DB name differs from the target's `MONGO_DATABASE`
`redis`	Synchronous `SAVE`, then `docker cp` of `dump.rdb` + the AOF directory from whatever path `CONFIG GET dir` reports (the redis-stack image used in `docker-compose.infra.yml` writes to `/`, not `/data`)	Stop container, copy files back, start container
`rustfs`	Enumerates buckets via `aws-cli` over `orkestra-network` and `s3 sync`s each one out	`s3 mb` (best-effort) + `s3 sync` from the bundle back into each bucket
`secrets`	Copies `docker/.env` and `docker/keys/*` into the bundle	Writes the files back into `docker/`, prompting before overwriting existing ones

A manifest.json at the tarball root records the schema version, timestamp, environment, host, and which components were actually captured. restore.sh reads it to decide which components are available.

:::warning Secrets live in the tarball
The secrets component bundles docker/.env (DB passwords, OAuth client secrets, KMS master key, encryption keys) and docker/keys/* (JWT private key, Apple p8). Anyone with read access to the tarball has everything they need to decrypt every secret in module_configs. Store the resulting file in a secrets-grade location: an encrypted volume, a vault, or an off-site bucket with strict ACLs. For production deployments, prefer the manual recipes below so secret material is backed up separately from data on its own retention.
:::

Quick start

# Interactive TUI — pick components from a menu
./backup.sh

# Non-interactive — back up everything available
./backup.sh all --yes

# Subset — only some components
./backup.sh --components mongodb,redis,secrets --yes

Output lands in ./backups/orkestra-backup-<env>-<UTC-timestamp>.tar.gz by default; override with --output.

# Interactive TUI — pick a tarball from ./backups/
./restore.sh

# Restore a specific bundle
./restore.sh ./backups/orkestra-backup-staging-20260527-071724.tar.gz

# Subset, no prompts (CI / scripted)
./restore.sh ./backups/<file>.tar.gz --components mongodb --yes

Dry-run

restore.sh has a --dry-run flag that reports exactly what would happen without mutating anything. It's the safest way to validate an archive end-to-end before committing to the restore:

./restore.sh ./backups/<file>.tar.gz --dry-run

What dry-run actually does, per component:

mongodb — runs mongorestore --dryRun against the live container so the archive is fully parsed, every namespace is listed, and any decompression / format errors surface; reports namespace remapping if source DB differs from target
redis — queries the live container's CONFIG GET dir/dbfilename/appenddirname and prints the exact docker cp destinations it would use
rustfs — runs aws s3 sync --dryrun against the live endpoint so per-object diffs (additions, overwrites, deletions) appear before any byte is moved
secrets — lists every file in the bundle and flags which targets already exist (would prompt to overwrite) vs which would be new

Dry-run also exercises the preconditions: if a target container isn't running, the dry-run for that component fails the same way the real restore would. That makes it a genuine preflight, not just a print of intent.
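One way to use the dry-run as a gate in a scripted restore is to run it first and continue only if it succeeds. The wrapper below is a minimal sketch, not part of the shipped scripts. It assumes restore.sh exits non-zero when a dry-run fails; `RESTORE_SH` and `restore_with_preflight` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the dry-run first, restore only if it passes.
# Assumption: restore.sh returns a non-zero exit status when a dry-run fails.
RESTORE_SH="${RESTORE_SH:-./restore.sh}"

restore_with_preflight() {
  local bundle="$1"
  shift
  # Extra arguments (e.g. --components mongodb) are passed to both runs.
  if ! "$RESTORE_SH" "$bundle" --dry-run "$@"; then
    echo "preflight failed for $bundle; nothing was changed" >&2
    return 1
  fi
  "$RESTORE_SH" "$bundle" --yes "$@"
}

# Usage:
#   restore_with_preflight ./backups/<file>.tar.gz --components mongodb
```

Because the second call passes --yes, the dry-run is the only safeguard left in this flow; keep the interactive confirmation for restores run by hand.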

CLI reference

`backup.sh`

Flag	Description
`all` (positional)	Back up every component currently available on the host.
`--components <csv>` / `-c`	Subset, e.g. `--components mongodb,secrets`.
`--output <path>` / `-o`	Write to a specific path instead of `./backups/orkestra-backup-<env>-<timestamp>.tar.gz`.
`--yes` / `-y`	Skip the proceed-prompt. Required for non-interactive shells.
`--help` / `-h`	Print usage.

`restore.sh`

Flag	Description
`<file>` (positional)	Path to a tarball produced by `backup.sh`. Omit to pick from `./backups/` via the TUI.
`--components <csv>` / `-c`	Restore only some components from the archive. Must be a subset of the manifest.
`--dry-run` / `-n`	Show what would happen, change nothing. Bypasses the destructive-action confirmation and runs without `--yes` in non-interactive shells.
`--yes` / `-y`	Skip the `type 'restore' to confirm` gate. Destructive — required for non-interactive shells.
`--help` / `-h`	Print usage.

Tarball layout

orkestra-backup-<env>-<timestamp>.tar.gz
├── manifest.json                         schema, env, timestamp, components included
├── mongodb/
│   ├── mongo.archive.gz                  mongodump --archive --gzip output
│   └── database.txt                      source DB name (for ns-remapping on restore)
├── redis/
│   ├── dump.rdb                          RDB snapshot
│   ├── appendonlydir/                    Redis 7+ AOF folder (manifest + base + incr)
│   │   └── appendonly.aof.*
│   └── redis-layout.txt                  live `dir` / `dbfilename` / `appenddirname` at backup time
├── rustfs/
│   └── <bucket>/                         one directory per S3 bucket the script could see
│       └── ...                           full object tree synced via aws s3 sync
└── secrets/
    ├── .env                              docker/.env verbatim
    └── keys/
        ├── jwt-private.pem
        ├── jwt-public.pem
        └── apple.p8                      (if Apple Sign In configured)

manifest.json schema:

{
  "schema": "orkestra-backup/v1",
  "createdAt": "2026-05-27T07:17:29Z",
  "environment": "staging",
  "host": "orkestra-staging-1",
  "components": ["mongodb", "redis", "rustfs", "secrets"]
}
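To check what a bundle contains without unpacking it, the manifest can be read straight out of the tarball. The helper below is a sketch under a few assumptions: `list_bundle_components` is a hypothetical name, the manifest is stored as `manifest.json` or `./manifest.json`, and the `components` array holds plain strings as in the example above. It uses `sed` rather than a JSON parser, so it will not cope with arbitrary JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of backup.sh / restore.sh.
# Prints one component name per line from the manifest inside a bundle.
list_bundle_components() {
  local tarball="$1" manifest
  # The member may be stored with or without a leading "./".
  manifest=$(tar -xzOf "$tarball" manifest.json 2>/dev/null) \
    || manifest=$(tar -xzOf "$tarball" ./manifest.json 2>/dev/null) \
    || { echo "no manifest.json in $tarball" >&2; return 1; }
  # Flatten to one line, isolate the "components" array, split its entries.
  printf '%s\n' "$manifest" | tr -d '\n' \
    | sed -n 's/.*"components"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*/\1/p' \
    | tr ',' '\n' \
    | sed 's/[[:space:]"]//g' \
    | sed '/^$/d'
}

# Usage:
#   list_bundle_components ./backups/<file>.tar.gz
```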

When to fall back to the manual recipes

The bundled scripts are the right default for dev, staging, and single-VM production. Reach for the manual recipes when:

You want secret material on a separate retention schedule from data. The bundled secrets component ships keys and the .env in the same tarball as nightly data, which is fine if the tarball goes to a secrets-grade target and awkward if it goes to a general-purpose S3 bucket.
You want to stream dumps directly to off-site object storage (aws s3 cp - s3://...) instead of round-tripping through a local file.
You want fine-grained cron retention, e.g. hourly snapshots × 24 + daily × 30 + monthly × 12. The bundled scripts produce one timestamped file per run; everything else is your job.
You're running Orkestra on a host where Docker isn't available (the bundled scripts assume docker).
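The streaming case can be sketched as a single pipeline from mongodump into `aws s3 cp -`, which reads the object body from stdin. This is an illustration, not a shipped script: `stream_dump_to_s3` is a hypothetical name, the destination URL is a placeholder, and the Mongo credentials are assumed to be exported already (for example by sourcing docker/.env).

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a gzipped mongodump archive straight to S3.
# Assumptions: MONGO_ROOT_USERNAME / MONGO_ROOT_PASSWORD are exported and the
# aws CLI is configured with credentials that can write to the destination.
stream_dump_to_s3() {
  local dest="$1"   # e.g. s3://your-bucket/orkestra/orkestra-<timestamp>.archive.gz
  (
    # pipefail: fail the pipeline if mongodump fails, not only if the upload does.
    set -o pipefail
    docker exec orkestra-mongodb mongodump \
      --username "${MONGO_ROOT_USERNAME:-admin}" \
      --password "${MONGO_ROOT_PASSWORD}" \
      --authenticationDatabase admin \
      --archive --gzip \
      | aws s3 cp - "$dest"
  )
}
```

A failed dump may still leave a truncated object at the destination, so a non-zero exit is best treated as "delete the object and retry" rather than "nothing was written".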

Manual recipes

:::note Two scripts named backup.sh
The bundled tool at the repo root is ./backup.sh. The production cron recipe below also happens to land at /opt/orkestra/scripts/backup.sh on the deployment VM. They are unrelated: the bundled one is a TUI/CLI for dev + small-prod use; the production one is a single-purpose mongodump wrapper for nightly cron. Rename the cron script if both will ever live on the same host.
:::

MongoDB backup

mongodump is the operator-friendly path — produces BSON dumps that mongorestore reads. It does not lock the database; it's safe on a live system, though large dumps spike I/O.

Daily backup script

Save as /opt/orkestra/scripts/backup.sh on the production VM:

#!/bin/bash
# /opt/orkestra/scripts/backup.sh
set -euo pipefail

TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
BACKUP_DIR="/var/backups/orkestra"
RETENTION_DAYS=30

mkdir -p "$BACKUP_DIR"

# Source the env to get MONGO_ROOT_PASSWORD
set -a
. /opt/orkestra/docker/.env
set +a

# Dump into a single gzipped archive file (not a tarball)
docker exec orkestra-mongodb mongodump \
  --username "${MONGO_ROOT_USERNAME:-admin}" \
  --password "${MONGO_ROOT_PASSWORD}" \
  --authenticationDatabase admin \
  --archive --gzip > "${BACKUP_DIR}/orkestra-${TIMESTAMP}.archive.gz"

# Prune older than retention
find "$BACKUP_DIR" -name "orkestra-*.archive.gz" -mtime "+${RETENTION_DAYS}" -delete

# Optional: ship to S3 / GCS / Azure Blob
# aws s3 cp "${BACKUP_DIR}/orkestra-${TIMESTAMP}.archive.gz" s3://your-bucket/orkestra/

Make executable and schedule:

sudo chmod +x /opt/orkestra/scripts/backup.sh

# /etc/cron.d/orkestra-backup
0 3 * * * deploy /opt/orkestra/scripts/backup.sh >> /var/log/orkestra-backup.log 2>&1

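Runs daily at 03:00 in the host's time zone, which is UTC only if the VM is configured that way; dumps are kept for 30 days. Adjust the schedule and retention to your own policy.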
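If a flat 30-day window is too coarse, a tiered policy can be built on the dump filenames instead of mtimes. The function below is a sketch of one such policy, not something the project ships: `list_prunable` is a hypothetical name, and "keep every dump taken on the 1st of a month" is an arbitrary stand-in for a monthly tier. It only prints candidates, so deletion stays an explicit step.

```shell
#!/usr/bin/env bash
# Hypothetical retention helper for files named orkestra-YYYY-MM-DDT...archive.gz.
# Prints archives dated before the cutoff, except those taken on the 1st of a
# month (kept as monthly snapshots). It only lists; it never deletes.
list_prunable() {
  local dir="$1" cutoff="$2"   # cutoff as YYYY-MM-DD
  local f name day
  for f in "$dir"/orkestra-*.archive.gz; do
    [ -e "$f" ] || continue     # glob matched nothing
    name=${f##*/}               # strip the directory
    day=${name#orkestra-}       # drop the prefix
    day=${day%%T*}              # keep YYYY-MM-DD
    case "$day" in
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-01) continue ;;   # monthly snapshot: keep
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;     # ordinary daily dump
      *) continue ;;                                     # unexpected name: leave alone
    esac
    # ISO dates compare correctly as integers once the dashes are removed.
    if [ "${day//-/}" -lt "${cutoff//-/}" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage (GNU date and GNU xargs assumed):
#   list_prunable /var/backups/orkestra "$(date -u -d '30 days ago' +%Y-%m-%d)" | xargs -r rm --
```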

Verify the backup is sane

A backup file you can't restore is not a backup. Test monthly (or at least quarterly) — see "Restore-test cadence" below.

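For a quick sanity check that the gzip stream isn't truncated or corrupt:

gzip -t /var/backups/orkestra/orkestra-2026-05-23T03-00-00Z.archive.gz && echo "gzip stream OK"
# A non-zero exit means the file is damaged. This checks the compression layer only, not that mongorestore can read the contents.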
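For a deeper check than a quick look at the file, the archive can be handed to mongorestore with --dryRun, which reads the dump without writing to the database. The sketch below combines that with a gzip integrity test. `verify_dump` is a hypothetical name, and it assumes the orkestra-mongodb container is running and the Mongo credentials are exported.

```shell
#!/usr/bin/env bash
# Hypothetical check: gzip integrity first, then a mongorestore --dryRun pass.
# Assumptions: orkestra-mongodb is running; MONGO_ROOT_USERNAME and
# MONGO_ROOT_PASSWORD are exported (e.g. sourced from docker/.env).
verify_dump() {
  local archive="$1"
  gzip -t "$archive" || { echo "gzip stream is corrupt: $archive" >&2; return 1; }
  # The file is passed through still compressed; --gzip makes mongorestore decompress it.
  docker exec -i orkestra-mongodb mongorestore \
    --username "${MONGO_ROOT_USERNAME:-admin}" \
    --password "${MONGO_ROOT_PASSWORD}" \
    --authenticationDatabase admin \
    --archive --gzip --dryRun < "$archive"
}

# Usage:
#   verify_dump /var/backups/orkestra/orkestra-2026-05-23T03-00-00Z.archive.gz
```

Neither step replaces a real restore into a staging stack; they only catch damaged or unreadable files early.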

MongoDB restore

# Stop the application services so nothing writes during restore.
sudo systemctl stop orkestra.service

# Restore from a specific archive. The file is passed through still compressed:
# --gzip tells mongorestore to decompress it, so piping it through zcat first would fail.
docker exec -i orkestra-mongodb mongorestore \
    --username admin \
    --password "$(grep -m1 '^MONGO_ROOT_PASSWORD=' /opt/orkestra/docker/.env | cut -d= -f2-)" \
    --authenticationDatabase admin \
    --archive --gzip --drop \
    < /var/backups/orkestra/orkestra-2026-05-23T03-00-00Z.archive.gz

# Bring the application back.
sudo systemctl start orkestra.service

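--drop drops each collection present in the dump before restoring it, so every restored collection matches the dump exactly. Without --drop, mongorestore only inserts: documents already in the target stay as they are, and dump documents whose _id already exists are not overwritten.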

If the restore is for an entirely different deployment (e.g. clone production into staging), you'll also need:

The original docker/.env OAUTH_TOKEN_ENCRYPTION_KEY — without it every encrypted secret in module_configs is unrecoverable.
The original docker/.env ORKESTRA_KMS_MASTER_KEY — without it every per-tenant DEK in the compliance addon is unrecoverable.
The original docker/keys/jwt-*.pem (if you want existing sessions to keep working) OR a fresh key pair (in which case all issued tokens are invalidated and users have to log in again).
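Before restoring into another deployment, it can help to confirm that the target's .env carries the same key material as the source's. The sketch below compares the two keys named above without printing their values. `env_value` and `keys_match` are hypothetical helpers, and the comparison is textual, so quoting differences between the two files would show up as a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restore check: compare key material in two .env files
# without echoing the secrets themselves.
env_value() {
  # Print the value of KEY from a KEY=value file (first match; value may contain '=').
  local key="$1" file="$2"
  grep -m1 "^${key}=" "$file" | cut -d= -f2-
}

keys_match() {
  local src="$1" dst="$2" key status=0
  for key in OAUTH_TOKEN_ENCRYPTION_KEY ORKESTRA_KMS_MASTER_KEY; do
    if [ -n "$(env_value "$key" "$src")" ] \
       && [ "$(env_value "$key" "$src")" = "$(env_value "$key" "$dst")" ]; then
      echo "ok       $key"
    else
      echo "MISMATCH $key"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   keys_match /path/to/production.env /opt/orkestra/docker/.env
```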

Filesystem state

Path	Used by	Action
`docker/keys/jwt-private.pem`, `jwt-public.pem`	`auth` core module	Back up once. Treat as a long-lived secret.
`docker/keys/apple.p8` (if Apple Sign In configured)	`auth` core module	Back up once.
`docker/.env`	every module	Back up once after `make init`, then again after any edit.
`docker/marketing-spool-*` (bind mount)	`marketing` addon	Optional. Imports that haven't been processed yet live here. Losing this loses in-flight imports but not anything already committed to MongoDB.
`docker/mongo-init/`	`mongodb` infra service	In-repo, gitignored copy is fine.

A reasonable one-liner for the "everything outside Mongo" tarball:

sudo tar czf /var/backups/orkestra/files-$(date -u +%Y-%m-%dT%H-%M-%SZ).tar.gz \
  /opt/orkestra/docker/.env \
  /opt/orkestra/docker/keys
sudo chmod 600 /var/backups/orkestra/files-*.tar.gz

Treat these as secrets — anyone with read access to a backup tarball has the encryption key that decrypts every secret in module_configs.
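The one-liner above creates the tarball with default permissions and tightens them afterwards, which leaves a short window where the file may be readable by others. A variant that sets the umask first avoids that window. This is a sketch; `secure_tar` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical variant: create the secrets tarball with owner-only permissions
# from the start, instead of tightening them with chmod afterwards.
secure_tar() {
  local out="$1"
  shift
  (
    umask 077               # newly created files get mode 600; scoped to this subshell
    tar -czf "$out" "$@"
  )
}

# Usage (run as root, like the one-liner above):
#   secure_tar "/var/backups/orkestra/files-$(date -u +%Y-%m-%dT%H-%M-%SZ).tar.gz" \
#     /opt/orkestra/docker/.env /opt/orkestra/docker/keys
```

The umask only applies when the file is created, so an existing file at the same path keeps its old mode. With timestamped names that should rarely matter.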

Restore-test cadence

Run the restore drill at least quarterly, regardless of which path you use. The bundled ./restore.sh --dry-run is a fast pre-flight (parses the archive end-to-end, validates target containers, lists what would change) but it doesn't substitute for an end-to-end restore against a real staging stack — only that catches encryption-key drift and schema migrations.

Easiest path: spin up a staging stack, restore the latest production backup into it, verify a known canary (a specific user, a specific module config) is intact. Three things almost always work on the way out (you've automated them) and break on the way back (you haven't tested them):

Encryption key rotation: if you've rotated OAUTH_TOKEN_ENCRYPTION_KEY since the dump was taken, the restore is useless without the old key. Keep a key archive separate from the DB backup.
Schema migrations: Orkestra's collection schemas evolve. Restoring a 6-month-old dump into a current binary may produce documents whose shape the current code rejects. The test is to run the make backend-test suite and then smoke-test the restored stack with docker compose -f docker-compose.prod.yml --env-file .env up -d.
Index recreation: --drop removes indexes too. The module registry's ensureCollection recreates the declared indexes on next boot, but background index builds take time on large collections, so expect slow queries for a few minutes post-restore.

Document the runbook on whatever wiki / Notion / GDoc your ops team reads. The drill is no good if no one knows where the dump archives live when the production VM is on fire.
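The canary check from the drill can be scripted so it runs the same way every quarter. The sketch below counts matching documents through mongosh inside the container. Everything specific in it is an assumption: `canary_present` is a hypothetical name, the database is taken to be `orkestra`, mongosh is assumed to exist in the image, and the collection and field names are whatever your deployment actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical canary check after a staging restore.
# Assumptions: database "orkestra", mongosh present in the container, and
# MONGO_ROOT_USERNAME / MONGO_ROOT_PASSWORD exported.
# Only pass trusted literals: the arguments are interpolated into JavaScript.
canary_present() {
  local collection="$1" field="$2" value="$3" count
  count=$(docker exec orkestra-mongodb mongosh --quiet \
    --username "${MONGO_ROOT_USERNAME:-admin}" \
    --password "${MONGO_ROOT_PASSWORD}" \
    --authenticationDatabase admin \
    --eval "db.getSiblingDB('orkestra').getCollection('${collection}').countDocuments({ '${field}': '${value}' })") \
    || return 1
  # Succeed only if the output is an integer >= 1.
  [ "${count:-0}" -ge 1 ] 2>/dev/null
}

# Usage (collection and field names are placeholders):
#   canary_present users email canary@example.com || echo "canary missing after restore"
```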

What `--drop` does NOT touch

The admin database (Mongo's own — system users, roles).
Collections present in the live DB but not in the dump (e.g. new collections introduced by a module enabled after the dump).
Anything outside MongoDB — Redis, filesystem, env vars.

Redis

Redis stores ephemeral state: refresh tokens, rate-limit counters, and the ConfigService cache (30 s TTL). If Redis comes back empty after a restore:

Refresh tokens are gone → every user has to log in again (operator UX hit but no data loss).
Rate-limit counters are gone → first-request burst is permitted from every IP for one window.
ConfigService cache is gone → first request to each module config endpoint hits MongoDB (cold-start latency for ~30 s).

If your operational priority is "zero session loss across restores", run Redis with AOF persistence and back up the AOF directory (appendonlydir/ on Redis 7+) alongside dump.rdb. Most operators don't bother, because the UX cost is small.
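For operators who do want the AOF backed up by hand, the sketch below asks the live container where it keeps its data and copies the AOF directory out. It is illustrative only: `redis_cfg` and `backup_redis_aof` are hypothetical names, and it assumes redis-cli works inside orkestra-redis without auth flags. Copying while Redis is writing may capture a partially written file, so a quiet period or a stopped container is safer.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: locate the live data directory and copy the AOF folder out.
# Assumption: redis-cli inside orkestra-redis needs no password or TLS flags.
redis_cfg() {
  # CONFIG GET prints the key on one line and the value on the next; keep the value.
  docker exec orkestra-redis redis-cli CONFIG GET "$1" | tail -n 1
}

backup_redis_aof() {
  local dest="$1" dir aofdir
  dir=$(redis_cfg dir) || return 1
  aofdir=$(redis_cfg appenddirname) || return 1
  mkdir -p "$dest"
  # Strip a trailing slash so a data dir of "/" does not produce "//appendonlydir".
  docker cp "orkestra-redis:${dir%/}/${aofdir}" "$dest/"
}

# Usage:
#   docker exec orkestra-redis redis-cli CONFIG SET appendonly yes   # runtime only
#   backup_redis_aof /var/backups/orkestra/redis-aof
```

`CONFIG SET appendonly yes` changes the running instance only; making it permanent means setting it in the Redis configuration the container starts with.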
