Troubleshooting

A catalog of symptoms and their fixes, indexed by what you'll actually see in the log / browser / curl output. If you hit something not on this list, open a GitHub issue and we'll add it.

Backend won't start

Symptom: container exits immediately, log says `JWT_PRIVATE_KEY_PATH: file not found`

The JWT key pair wasn't generated. From a fresh checkout, this happens if you ran docker compose up without first running make init.

make init                 # generates docker/keys/jwt-{private,public}.pem
sudo systemctl restart orkestra.service     # or docker compose up -d

Symptom: `panic: open /app/keys/jwt-private.pem: permission denied`

The host file's owner doesn't match the container user (UID 1000). Common after make init --force run as root.

sudo chown -R 1000:1000 docker/keys
sudo chmod 600 docker/keys/jwt-private.pem
sudo chmod 644 docker/keys/jwt-public.pem
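
To confirm the fix took effect, the owner and mode of each key can be checked from the host. This is a minimal sketch: `key_perms` and `check_key` are hypothetical helper names, and `stat -c` is the GNU coreutils syntax found on most Linux hosts.

```shell
# Print "<uid>:<gid> <octal mode>" for a file (GNU stat syntax).
key_perms() {
    stat -c '%u:%g %a' "$1"
}

# Succeed only when the file has the expected owner and mode.
# Arguments: <file> <expected uid:gid> <expected mode>
check_key() {
    actual=$(key_perms "$1") || return 2
    if [ "$actual" = "$2 $3" ]; then
        echo "ok: $1 ($actual)"
    else
        echo "mismatch: $1 is '$actual', expected '$2 $3'" >&2
        return 1
    fi
}

# Usage, from the repository root:
#   check_key docker/keys/jwt-private.pem 1000:1000 600
#   check_key docker/keys/jwt-public.pem  1000:1000 644
```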

Symptom: `failed to connect to MongoDB: server selection timeout`

Three sub-cases:

Infra not started. Run docker compose -f docker-compose.infra.yml --env-file .env up -d first.
Password mismatch. The infra compose seeded Mongo with a different password than what's in docker/.env now. Wipe Mongo data with docker compose -f docker-compose.infra.yml down -v (destructive!) and re-init.
First-boot SCRAM race. Mongo accepts TCP before SCRAM user provisioning completes. The backend retries with backoff (up to 20 attempts, 500ms→5s — see shared/database/NewMongoConnection) — give it 30 s. If it still fails after 30 s, look at infra logs (docker logs orkestra-mongodb).
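
If a host-side script needs to wait out the same race, the retry pattern from the third case can be sketched in shell. This is an illustration, not the backend's code: `retry_with_backoff` is a hypothetical helper, and doubling the delay on each attempt is an assumption (the source only states 20 attempts between 500 ms and 5 s; the real schedule is in shared/database/NewMongoConnection).

```shell
# Retry a command with capped exponential backoff.
# Arguments: <max attempts> <initial delay ms> <max delay ms> <command...>
# Fractional sleep values need GNU sleep (or a compatible implementation).
retry_with_backoff() {
    rb_max=$1
    rb_delay=$2
    rb_cap=$3
    shift 3
    rb_attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$rb_attempt" -ge "$rb_max" ]; then
            echo "gave up after $rb_attempt attempts" >&2
            return 1
        fi
        # Convert integer milliseconds to the "S.mmm" form sleep expects.
        sleep "$(printf '%d.%03d' $((rb_delay / 1000)) $((rb_delay % 1000)))"
        # Assumed growth rule: double the delay, capped at the maximum.
        rb_delay=$((rb_delay * 2))
        if [ "$rb_delay" -gt "$rb_cap" ]; then
            rb_delay=$rb_cap
        fi
        rb_attempt=$((rb_attempt + 1))
    done
}

# Usage (the probe is a placeholder for any command that exits 0 once
# Mongo accepts authenticated connections):
#   retry_with_backoff 20 500 5000 your_mongo_probe
```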

Symptom: `failed to connect to Redis: AUTH failed`

Same root cause as the Mongo case (password mismatch between .env and what Redis was seeded with). Fix:

docker compose -f docker-compose.infra.yml down
docker volume rm orkestra-infra_orkestra-redis-data
docker compose -f docker-compose.infra.yml --env-file .env up -d

Symptom: `bind: address already in use`

Another process holds port 3000 (default backend) or 8080 (default frontend). Either stop it:

sudo ss -lntp | grep :3000
sudo kill <pid>

Or override in docker/.env:

BACKEND_PORT=3100
FRONTEND_PORT=8181

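Login fails

Symptom: login returns 200 but the next request is 401

The browser is silently dropping your cookie. The cause is almost always COOKIE_SECURE=true over plain HTTP: cookies marked Secure are only accepted and sent over HTTPS, so the session cookie never comes back on the next request.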

For dev:

# docker/.env
COOKIE_SECURE=false

For production, the right fix is HTTPS, not COOKIE_SECURE=false. See Reverse-proxy with Caddy.
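
To see which flags the backend actually sets, the response headers of the login call can be piped through a small filter. This is a sketch: `cookie_flags` is a hypothetical helper, and the login URL is not given in this guide, so `$LOGIN_URL` below is a placeholder.

```shell
# Read HTTP response headers on stdin and print one line per Set-Cookie
# header: the cookie name, then "secure" or "not-secure".
cookie_flags() {
    tr -d '\r' | grep -i '^set-cookie:' | while IFS= read -r line; do
        name=${line#*:}      # drop the header name
        name=${name# }       # drop one leading space, if present
        name=${name%%=*}     # keep only the cookie name
        case $(printf '%s' "$line" | tr '[:upper:]' '[:lower:]') in
            *"; secure"*|*";secure"*) echo "$name secure" ;;
            *)                        echo "$name not-secure" ;;
        esac
    done
}

# Usage (LOGIN_URL is a placeholder for your login endpoint):
#   curl -si "$LOGIN_URL" | cookie_flags
```

A cookie reported as `secure` on an http:// URL is the one the browser will drop.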

Symptom: `401 audience_mismatch`

You're hitting the wrong audience host with a token minted for the other audience. E.g. operator token on api.example.com. Mint a token for the right audience:

./scripts/devtoken.sh administrator --audience client    # for api.* surface
./scripts/devtoken.sh administrator                       # for console.* surface (default)

See ADR-0003 host split for the full audience model.
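
When it is unclear which audience a token was minted for, its payload can be decoded locally. This is a minimal sketch: `jwt_payload` is a hypothetical helper, it does not verify the signature, and it assumes the audience is carried in the standard JWT `aud` claim, which this guide does not confirm.

```shell
# Print the decoded payload (second dot-separated segment) of a JWT.
# Inspection only: the signature is not checked.
jwt_payload() {
    # Map the base64url alphabet back to standard base64.
    seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits padding; restore it so base64 -d accepts the input.
    case $(( ${#seg} % 4 )) in
        2) seg="$seg==" ;;
        3) seg="$seg=" ;;
    esac
    printf '%s' "$seg" | base64 -d
}

# Usage, assuming the token is in $TOKEN:
#   jwt_payload "$TOKEN"
```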

Symptom: OAuth callback returns `redirect_uri_mismatch`

The redirect URI you registered with the OAuth provider doesn't match what the backend computed. Two sub-cases:

Wrong host registered. The backend sends http://console.localhost:3000/v1/auth/oauth/google/callback, but you registered http://localhost:3000/.... Re-register with the operator host.
HTTP vs HTTPS mismatch. Production registers https://..., backend redirected with http://.... Set BACKEND_URL=https://console.example.com in docker/.env.

See OAuth Providers for the per-provider walkthroughs.

Symptom: login works, then logout immediately re-logs you in

Two backends share a Mongo database but sign JWTs with different key pairs. The cookie from backend A is rejected by backend B (which then asks the browser to re-auth), but A's session is still good.

Standardize on one JWT key pair across every replica. In K8s, mount the same Secret into every Pod (see Kubernetes overview).
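
One way to confirm that replicas share a key pair is to compare digests of the public key each one loads. This is a sketch: `key_digest` and `same_keys` are hypothetical helpers, and how you copy the key out of each replica depends on your deployment.

```shell
# Print the SHA-256 digest of a file (digest only, no filename).
key_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Succeed when every listed file has the same digest as the first one.
same_keys() {
    first=$(key_digest "$1")
    [ -n "$first" ] || return 2
    for f in "$@"; do
        if [ "$(key_digest "$f")" != "$first" ]; then
            echo "differs: $f" >&2
            return 1
        fi
    done
}

# Usage, with one copied public key per replica:
#   same_keys replica-a.pem replica-b.pem
```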

Frontend issues

Symptom: blank page, browser console says `Failed to load module script: Expected a JavaScript module script`

Vite dev server is serving an HTML page where JS was expected — usually a 404 served via the SPA fallback. Two sub-cases:

Vite hasn't started yet. First boot of npm install && npm run dev takes 30–60 s. Watch with docker compose logs -f orkestra-frontend-admin.
VITE_API_URL points at a host the browser can't reach. The browser fetches Vite-built bundles from the API URL; if it's set to a Docker-internal hostname, the browser fails. Set VITE_API_URL=http://localhost:3000 (or your public hostname) in docker/.env.

Symptom: 404 dev-token page

The core dev-token endpoint (POST /dev/token, in internal/shared/devtoken) is gated to non-production environments, so ENV=production disables it.

For local / staging-like: keep ENV=development or staging, NOT production — the endpoint is present in any non-production build.
For production: the dev token issuer is correctly disabled. Create the first admin via the setup wizard (/v1/setup/*), then use OAuth or password login.

Symptom: CORS error in browser console

The browser blocked the request because the API origin isn't in the backend's allowlist. Two checks:

The hostname the browser hits is in OPERATOR_CORS_ORIGINS (operator surface) or CLIENT_CORS_ORIGINS (client surface), comma-separated.
Both vars fall back to the legacy CORS_ORIGINS when empty — but the per-audience vars take priority.

# docker/.env
OPERATOR_CORS_ORIGINS=https://console.example.com
CLIENT_CORS_ORIGINS=https://client.example.com

Restart the backend after editing.
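
The fallback rule and the allowlist check described above can be sketched as two small functions. This illustrates the documented behavior only: `effective_origins` and `origin_allowed` are hypothetical names, and the exact-string comparison (no whitespace trimming, scheme and port must match) is an assumption about how the backend matches origins.

```shell
# Pick the allowlist: the per-audience value wins, and the legacy
# CORS_ORIGINS value is used only when the per-audience one is empty.
# Arguments: <per-audience value> <legacy value>
effective_origins() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Succeed when <origin> appears in a comma-separated <allowlist>.
# Arguments: <origin> <allowlist>
origin_allowed() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage, with the variables from docker/.env loaded into the shell:
#   list=$(effective_origins "$OPERATOR_CORS_ORIGINS" "$CORS_ORIGINS")
#   origin_allowed "https://console.example.com" "$list" && echo allowed
```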

Module / admin UI issues

Symptom: an addon I expected isn't in `/admin/modules`

The base ships no optional addons — the optionalModules catalog is empty, so a stock Orkestra build only lists the 7 core modules. An addon only appears once a fork has added it in-tree (internal/addons/<name>/ + a cmd/server/catalog_<name>.go that registers its factory). Once registered, it's listed at /admin/modules and toggled there at runtime — there are no per-SKU builds or build tags to compile it in.

If the addon was registered but still isn't listed, confirm the fork's catalog_<name>.go calls the registry in its init() and that the binary you launched was built from that fork (see the Backend module map).

Symptom: enabling an addon at `/admin/modules` returns 409

Dependency conflict — you're trying to enable an addon whose dependency isn't enabled. The error body lists which one.

Enable the dependency first, then enable the dependent addon again. The registry auto-resolves transitive deps when you start from the top, but enforces them strictly on individual toggle.

Symptom: addon-managed container (`orkestra-hindsight`, `orkestra-memgraph`) isn't visible in `docker compose ps`

By design. Addon-owned infrastructure containers are managed by the backend via the host Docker socket, not by Compose. Discover them with:

docker ps --filter label=orkestra.managed=true

See docker/CLAUDE.md "Backend-managed Containers" for the full lifecycle.

Symptom: backend can't start the addon container — `permission denied on /var/run/docker.sock`

The backend container isn't in the host's docker group. Check DOCKER_GID in docker/.env:

getent group docker | cut -d: -f3    # on the host

Match that value in docker/.env, restart the backend. Common values: 999 (Debian), 1001 (Ubuntu / WSL2).

Alternatively, set CONTAINER_CONTROL_ENABLED=false to disable backend-managed containers entirely (you'll have to start Hindsight / Memgraph manually).
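
To compare the host's docker group ID with the configured DOCKER_GID in one step, something like the following works on a Linux host. This is a sketch: `env_value` and `check_docker_gid` are hypothetical helpers, and the parser is deliberately naive (it ignores quoting and `export` prefixes).

```shell
# Print the value of KEY from a dotenv-style file. If the key is assigned
# more than once, the last assignment is printed.
# Arguments: <key> <file>
env_value() {
    grep "^$1=" "$2" | tail -n 1 | cut -d= -f2-
}

# Compare the host's docker group id with DOCKER_GID in the env file.
# Arguments: <env file>
check_docker_gid() {
    host_gid=$(getent group docker | cut -d: -f3)
    conf_gid=$(env_value DOCKER_GID "$1")
    if [ "$host_gid" = "$conf_gid" ]; then
        echo "ok: DOCKER_GID=$conf_gid"
    else
        echo "mismatch: host docker gid is '$host_gid', $1 has '$conf_gid'" >&2
        return 1
    fi
}

# Usage, from the repository root:
#   check_docker_gid docker/.env
```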

Deploy / CI issues

Symptom: pre-commit `end-of-file-fixer` keeps adding a newline to badge SVGs, blocks the push

This is recurring drift: the CI coverage-badge bot writes SVGs without a final newline, and the local hook adds one. The workaround is a small hygiene commit:

git add .github/badges/*.svg
git commit -m "chore(badges): trailing newline on coverage svg"
git push origin dev

The real fix (patch the badge bot to emit a newline) is on the backlog.
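
To see in advance which badge files the hook will rewrite, a file's last byte can be tested directly. This is a minimal sketch; `missing_final_newline` is a hypothetical helper name.

```shell
# Succeed when a non-empty file does not end with a newline.
# Command substitution strips a trailing newline, so a final "\n" yields
# an empty string while any other final character does not.
missing_final_newline() {
    [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}

# Usage, from the repository root:
#   for f in .github/badges/*.svg; do
#       if missing_final_newline "$f"; then echo "$f"; fi
#   done
```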

Symptom: `make ci` red but `make ci-backend` green

A different surface is broken. Check which:

make ci         # shows what changed and what ran

ci-frontend-admin or ci-frontend-client failures are usually npm run typecheck running against stale node_modules. Run cd frontend-admin && npm ci (or the same in the client frontend's directory) to refresh them.

Symptom: tenantscope analyzer fires on a new core module's Mongo write

You added a new core module (internal/core/<name>/) with a Mongo repo that doesn't derive its filter from tenantrepo.Scope. Two options:

The collection IS tenant-scoped. Use tenantrepo.MustScope(ctx) to derive the filter.
The collection is global (system-config, sentinel, etc.). Add //tenantscope:allow <reason> on the offending line — see project_ci_release_blockers for prior examples (the logging module's log_levels collection is a worked case).

Symptom: openapi-check fails on dev tip

You added / renamed / removed a Huma route and didn't regenerate backend/openapi/enterprise.json. Fix:

cd backend
make openapi-dump
git add openapi/enterprise.json
git commit -m "chore(backend): regenerate openapi spec"

WSL2-specific

Symptom: AIR doesn't reload when I save a Go file

The Windows filesystem mounted into WSL2 doesn't fire inotify events reliably. Two workarounds:

Keep the source on the WSL2 native filesystem (/home/<user>/..., not /mnt/c/...). The dev compose's bind-mount picks up changes correctly when the source is on ext4.

Manual rebuild when AIR misses an event:

docker exec orkestra-backend-dev go build -o /app/tmp/main ./cmd/server
docker restart orkestra-backend-dev
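
A quick way to tell whether a checkout sits on a Windows drive is to test its path. This is a heuristic sketch: `on_windows_mount` is a hypothetical helper, and it assumes the default WSL2 automount root of /mnt/<drive letter>, which can be changed in the WSL configuration.

```shell
# Succeed when the path is /mnt/<single letter> or below it, which is
# where WSL2 mounts Windows drives by default.
on_windows_mount() {
    case $1 in
        /mnt/[a-zA-Z]|/mnt/[a-zA-Z]/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, from the repository root:
#   if on_windows_mount "$(pwd -P)"; then
#       echo "checkout is on a Windows drive; move it under /home"
#   fi
```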

Observability

Symptom: `OTEL exporter error: connect: connection refused`

OTEL_EXPORTER_OTLP_ENDPOINT points at a collector that isn't running. Either bring up the observability stack:

./orkestra.sh observability up

Or set OTEL_TRACES_ENABLED=false and OTEL_LOGS_ENABLED=false to silence the exporter (the SDK falls back to no-op).

Symptom: Loki labels exceed cardinality

Per ADR-0002, the route label uses Chi route templates (/v1/users/{id}), never the raw path. If you see unbounded route cardinality in Prometheus, a custom handler is registering routes outside the Chi router — fix the registration to use chi.Mux.Method() / huma.Register() so the template is captured.

Still stuck?

Search existing issues: github.com/orkestra-cc/orkestra/issues
Open a new one with: the exact error message, the output of docker compose -f <your-compose>.yml --env-file .env ps, and the last 50 lines of docker compose logs backend.
