Backup and restore for the Cullis Mastio bundle. Hot SQLite snapshot, encrypted tarball, full restore in ~15 min. Scenario walk-throughs for VM loss, ransomware, accidental wipe.

Disaster recovery

The Mastio is the trust root of your agent population: its Org CA signs every agent certificate, and its mcp_proxy.db carries every session, audit record, and user/agent enrollment. Losing it without a backup means re-enrolling everything from scratch with a new Org CA — every agent, every cert thumbprint pin. With the procedure below, recovery is a 15-minute job from any host that can read the encrypted backup file.

This guide covers the bundle deploy (single-host Docker Compose). For Kubernetes deployments, the Helm chart pairs with your cluster’s existing backup tooling (Velero, etcd snapshots, Postgres backups) — out of scope here.

What gets backed up

Path	Contents	Why critical
`data/mcp_proxy.db`	SQLite: agents, sessions, audit log, users, config	Loss = full re-enrollment
`nginx-certs/org-ca.{crt,key}`	Org CA keypair (trust root)	Loss = every agent cert invalidated
`nginx-certs/mastio-server.{crt,key}`	Server cert for nginx sidecar	Loss = TLS termination broken
`certs/org-ca.pem`	Operator-readable copy of the Org CA cert	Loss = inconvenience only (regenerated from `nginx-certs/`)
`proxy.env`	Operator-side config (`PROXY_PUBLIC_URL`, admin secrets, plugin envs)	Loss = re-tuning from scratch
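Before taking a backup, it can help to confirm that every path in the table actually exists in the bundle directory. The following is a minimal sketch, not something the bundle ships: the function name `preflight_paths` is hypothetical, and if the Org CA key lives in a KMS, the local `org-ca.key` entry may legitimately be absent.

```shell
# preflight_paths BUNDLE_DIR
# Prints every path from the table that is missing under BUNDLE_DIR.
# Returns 0 when all paths are present, 1 otherwise.
# Hypothetical helper for illustration; not part of the bundle.
preflight_paths() {
    local bundle_dir="$1" rel missing=0
    for rel in \
        data/mcp_proxy.db \
        nginx-certs/org-ca.crt \
        nginx-certs/org-ca.key \
        nginx-certs/mastio-server.crt \
        nginx-certs/mastio-server.key \
        certs/org-ca.pem \
        proxy.env
    do
        if [ ! -e "${bundle_dir}/${rel}" ]; then
            echo "missing: ${rel}"
            missing=1
        fi
    done
    return "$missing"
}
```

Run as `preflight_paths .` from inside the bundle directory; any output means the archive would be incomplete.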

What is not backed up:

Plugin secrets stored in external systems (Vault, AWS Secrets Manager, etc.) — those have their own backup strategy.
Cloud KMS Org CA key copy (if MCP_PROXY_KMS_BACKEND is set to vault / aws / azure / gcp) — the key lives in the KMS already, outside the bundle. Bundle backup snapshots only the Mastio’s local view.

Taking a backup

The bundle keeps its state in bind-mounted directories (./data/, ./nginx-certs/, ./certs/) on the host filesystem, so a backup needs only sqlite3 .backup, tar, and gpg. From inside the bundle directory:

# 1. Take a hot SQLite snapshot (no need to stop the running stack)
docker compose -p cullis-mastio exec -T mastio \
    sqlite3 /data/mcp_proxy.db ".backup /data/mcp_proxy.db.snapshot"

# 2. Tar the bind dirs + proxy.env into a single archive
TS=$(date -u +%Y%m%dT%H%M%SZ)
tar czf "cullis-mastio-backup-${TS}.tar.gz" \
    data/mcp_proxy.db.snapshot \
    nginx-certs/ \
    certs/ \
    proxy.env

# 3. Encrypt with a passphrase
gpg --symmetric --cipher-algo AES256 \
    --output "cullis-mastio-backup-${TS}.tar.gz.gpg" \
    "cullis-mastio-backup-${TS}.tar.gz"

# 4. Remove the unencrypted copies
rm "cullis-mastio-backup-${TS}.tar.gz" data/mcp_proxy.db.snapshot

The snapshot is consistent even though the Mastio keeps running: sqlite3’s .backup command copies the database through SQLite’s online backup API while holding a read transaction. Cert files are copied as-is; they rarely change at runtime.
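Because step 4 deletes the plaintext copies, it may be worth checking the archive before encrypting it. The sketch below lists the tarball and confirms a few members the restore procedure depends on. The function name `verify_archive` and the choice of members to check are assumptions made for illustration.

```shell
# verify_archive ARCHIVE
# Lists a plaintext .tar.gz and confirms the members the restore relies on.
# Hypothetical helper; the member list is a minimal illustrative subset.
verify_archive() {
    local archive="$1" listing member
    listing=$(tar tzf "$archive") || return 1
    for member in data/mcp_proxy.db.snapshot nginx-certs/org-ca.crt proxy.env; do
        # -x matches whole lines, -F treats the member name literally
        if ! printf '%s\n' "$listing" | grep -qxF "$member"; then
            echo "archive is missing ${member}" >&2
            return 1
        fi
    done
}
```

Calling `verify_archive "cullis-mastio-backup-${TS}.tar.gz"` between steps 2 and 3 stops the procedure before an incomplete archive is encrypted and the originals are removed.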

Bundled upgrade backup (automatic)

./deploy.sh --upgrade <version> automatically writes a pre-upgrade backup to ./backups/pre-upgrade-<ts>/ before applying the upgrade. This is not a substitute for a regular off-host backup (it stays on the same disk), but it lets you roll back a botched upgrade without ceremony.

Non-interactive (cron)

For scheduled backups, pre-place the passphrase in a 0400-mode file:

# umask 077 keeps the file private from the moment it is created
( umask 077 && echo 'your-strong-passphrase' > /etc/cullis/backup.pass )
chmod 0400 /etc/cullis/backup.pass
chown root:root /etc/cullis/backup.pass

Then wrap the four-step procedure in a script and pass --batch --passphrase-file to gpg:

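# GnuPG 2.1+ documents loopback pinentry as required for --passphrase-file
gpg --symmetric --cipher-algo AES256 --batch \
    --pinentry-mode loopback \
    --passphrase-file /etc/cullis/backup.pass \
    --output "${OUT_DIR}/cullis-mastio-backup-${TS}.tar.gz.gpg" \
    "${WORKDIR}/cullis-mastio-backup-${TS}.tar.gz"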

Sample cron entry (daily 02:00, retain 30 days):

# crontab entries must fit on one line; cron does not support backslash continuation
0 2 * * *  /opt/cullis-mastio-bundle/backup.sh && find /var/backups/cullis -name '*.tar.gz.gpg' -mtime +30 -delete

A reference backup.sh wrapper that codifies the four steps above and respects --passphrase-file is on the bundle roadmap; until then, copy the snippet above into a script in your config-management repo.
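Until the reference wrapper ships, the four steps can be collected roughly as follows. This is a sketch under stated assumptions, not the bundle's eventual backup.sh: the three-argument interface and the `run_backup` name are invented here, and `--pinentry-mode loopback` assumes GnuPG 2.1 or later.

```shell
#!/usr/bin/env bash
# backup.sh: illustrative wrapper around the four manual steps.
# Usage: backup.sh BUNDLE_DIR OUT_DIR PASSPHRASE_FILE   (use absolute paths)
# The argument layout is an assumption of this sketch, not a bundle interface.
set -eu

run_backup() {
    local bundle_dir="$1" out_dir="$2" pass_file="$3"
    local ts archive
    ts=$(date -u +%Y%m%dT%H%M%SZ)
    archive="cullis-mastio-backup-${ts}.tar.gz"

    cd "$bundle_dir"
    umask 077    # keep the plaintext archive unreadable to other users

    # 1. Hot SQLite snapshot inside the running container
    docker compose -p cullis-mastio exec -T mastio \
        sqlite3 /data/mcp_proxy.db ".backup /data/mcp_proxy.db.snapshot"

    # 2. Archive the bind dirs and proxy.env
    tar czf "$archive" data/mcp_proxy.db.snapshot nginx-certs/ certs/ proxy.env

    # 3. Encrypt non-interactively with the passphrase file
    mkdir -p "$out_dir"
    gpg --symmetric --cipher-algo AES256 --batch \
        --pinentry-mode loopback \
        --passphrase-file "$pass_file" \
        --output "${out_dir}/${archive}.gpg" \
        "$archive"

    # 4. Remove the unencrypted copies and report the result
    rm -f "$archive" data/mcp_proxy.db.snapshot
    echo "${out_dir}/${archive}.gpg"
}

# Run only when all three arguments are supplied.
if [ "$#" -eq 3 ]; then
    run_backup "$@"
fi
```

With this layout the cron command would become `/opt/cullis-mastio-bundle/backup.sh /opt/cullis-mastio-bundle /var/backups/cullis /etc/cullis/backup.pass`. Because of `set -e`, a failure in any step aborts the run before the plaintext copies are deleted, so a failed run leaves them in the bundle directory for inspection.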

Off-host copy

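The encrypted file is safe to transmit over untrusted channels. The examples assume the encrypted archives are collected under backups/; adjust the path to wherever you write them. Pick one (or several):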

# rsync to a separate host
rsync -a backups/cullis-mastio-backup-*.tar.gz.gpg \
      backup-host:/var/backups/cullis/

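# S3 (aws s3 cp accepts a single source, so sync the directory with a filter)
aws s3 sync backups/ s3://yourorg-cullis-backups/ \
    --exclude '*' --include 'cullis-mastio-backup-*.tar.gz.gpg'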

# USB drive
cp backups/cullis-mastio-backup-*.tar.gz.gpg /mnt/usb/cullis/

The passphrase is the only secret. Store it in your password manager (Bitwarden, 1Password), in a separate item from the backup itself. Losing either the backup or the passphrase makes the data unrecoverable.
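Encryption protects confidentiality, but it does not by itself tell you that a copy arrived intact. One common approach, sketched here as an optional addition rather than part of the documented procedure, is to store a SHA-256 digest next to each encrypted archive and re-check it at the destination. The function names are hypothetical, and `sha256sum` assumes GNU coreutils.

```shell
# write_checksum FILE: record a SHA-256 digest next to the encrypted backup.
# Hypothetical helper; produces FILE.sha256 in the same directory.
write_checksum() {
    local file="$1"
    ( cd "$(dirname "$file")" &&
      sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# verify_checksum FILE: re-check the digest, for example after the copy landed.
# Returns non-zero if the file no longer matches the recorded digest.
verify_checksum() {
    local file="$1"
    ( cd "$(dirname "$file")" &&
      sha256sum --check --quiet "$(basename "$file").sha256" )
}
```

Copy the `.sha256` file together with the archive and run `verify_checksum` on the destination host before relying on that copy for a restore.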

Restoring

On a fresh host or after disaster:

Install Docker + Compose v2 (see Mastio on Docker prerequisites).

Re-deploy the bundle into an empty directory:

curl -L -o cullis-mastio-bundle.tar.gz \
    https://github.com/cullis-security/cullis/releases/latest/download/cullis-mastio-bundle.tar.gz
tar xzf cullis-mastio-bundle.tar.gz
cd cullis-mastio-bundle/

Decrypt and extract the backup over the bundle’s bind dirs:

gpg --decrypt /path/to/cullis-mastio-backup-*.tar.gz.gpg \
    | tar xzf - --overwrite
# Rename the snapshot back to the live DB filename
mv data/mcp_proxy.db.snapshot data/mcp_proxy.db

Sanity-check proxy.env:
- MCP_PROXY_PROXY_PUBLIC_URL matches the hostname the new host will serve on. If you’re moving to a new IP / DNS name, update it here and update MCP_PROXY_NGINX_SAN to include the new hostname.
- Plugin secret references (Vault paths, KMS ARNs, API keys) are still resolvable from the new host.
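The public URL check can be scripted. The sketch below assumes proxy.env holds plain KEY=VALUE lines without quoting or `export` prefixes; `env_value` and `url_mentions_host` are hypothetical helper names, not bundle commands.

```shell
# env_value FILE KEY: print the value of KEY from a KEY=VALUE env file.
# If KEY appears more than once, the last assignment wins.
env_value() {
    local file="$1" key="$2"
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# url_mentions_host FILE HOST: succeed if the configured public URL contains HOST.
url_mentions_host() {
    case "$(env_value "$1" MCP_PROXY_PROXY_PUBLIC_URL)" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, `url_mentions_host proxy.env mastio.example.com || echo "update MCP_PROXY_PROXY_PUBLIC_URL"`, where the hostname is a placeholder for the name the new host will serve on.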
Bring up the stack:
```
./deploy.sh
```

Verify post-boot:

curl -k https://localhost:9443/healthz
curl -k https://localhost:9443/readyz
docker compose -p cullis-mastio logs mastio | tail -50

/readyz should return {"status":"ready",...}. Logs should not show TLS handshake errors or Org CA mint warnings.
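For unattended restores or drills, the readiness check can be polled instead of run by hand. This is a minimal sketch: `wait_ready` is a hypothetical name, the two-second interval and default timeout are arbitrary, and the match assumes the response body contains `"status":"ready"` without extra whitespace, as in the example above.

```shell
# wait_ready URL [TIMEOUT_SECONDS]
# Polls URL until the body contains "status":"ready" or the timeout expires.
# Hypothetical helper; interval and default timeout are illustrative.
wait_ready() {
    local url="$1" timeout="${2:-120}" waited=0 body
    while [ "$waited" -lt "$timeout" ]; do
        # -k mirrors the manual check; -s silences the progress meter
        body=$(curl -ks --max-time 5 "$url" || true)
        case "$body" in
            *'"status":"ready"'*) return 0 ;;
        esac
        sleep 2
        waited=$((waited + 2))
    done
    echo "timed out waiting for ${url}" >&2
    return 1
}
```

Typical use after `./deploy.sh`: `wait_ready https://localhost:9443/readyz 180 && echo "Mastio is ready"`.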

Scenario walk-throughs

VM disk failure (most common)

Provision a new VM and install Docker.
Download the bundle, extract it with tar xzf, and cd into cullis-mastio-bundle/.
Decrypt the backup over the bind dirs (mount the off-host backup volume or copy the file over with scp first).
Edit proxy.env if the public URL changes.
Run ./deploy.sh.

Time: ~15 minutes, including the DNS update if MCP_PROXY_PROXY_PUBLIC_URL changes. Existing agents keep working as long as they can reach the new IP and the Org CA cert is restored, which preserves their thumbprint pin.

Ransomware / host compromise

Quarantine the affected host (do not power it back on; preserve forensics).
Provision new VM as above.
Restore from the last clean backup (verify the timestamp pre-dates the suspected breach).
Rotate all secrets that could have leaked:
- MCP_PROXY_ADMIN_SECRET, MCP_PROXY_DASHBOARD_SIGNING_KEY in proxy.env: regenerate with openssl rand -hex 32
- Anthropic / OpenAI API keys in proxy.env: rotate at the provider
- Cloud creds (AWS_ACCESS_KEY_ID, Azure SP, etc.): rotate in the cloud provider’s IAM
- Any Vault tokens: revoke and re-issue
Force agent cert re-issuance for any agent whose private key could have been exposed (dashboard → Agents → Rotate cert, or POST /registry/agents/<id>/rotate-cert).
Review the audit log on the restored DB to identify the breach window.
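Choosing the last clean backup can be done from the filenames alone, since the timestamp format used when taking the backup (`%Y%m%dT%H%M%SZ`) sorts chronologically as plain text. The sketch below is illustrative only: `latest_backup_before` is a hypothetical helper, and it trusts the filename timestamp, which an attacker with write access to the backup location could have altered.

```shell
# latest_backup_before DIR CUTOFF
# CUTOFF uses the same UTC format as the backup filenames, e.g. 20250102T120000Z.
# Prints the newest backup whose filename timestamp sorts before CUTOFF.
latest_backup_before() {
    local dir="$1" cutoff="$2" f ts best=""
    for f in "$dir"/cullis-mastio-backup-*.tar.gz.gpg; do
        [ -e "$f" ] || continue
        ts=${f##*/cullis-mastio-backup-}
        ts=${ts%.tar.gz.gpg}
        # Fixed-width timestamps compare correctly as strings; the glob
        # expands in sorted order, so the last match is the newest.
        if [ "$ts" \< "$cutoff" ]; then
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

Pass the earliest plausible breach time as the cutoff; a non-zero exit status means no backup in that directory pre-dates it.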

Accidental wipe (`rm -rf data/` on the wrong host)

Stop the stack: ./deploy.sh --down.
Find the most recent backup: ls -t backups/*.tar.gz.gpg /var/backups/cullis/*.tar.gz.gpg 2>/dev/null | head -5.

Decrypt and extract over the (now empty) bind dirs:

gpg --decrypt /path/to/latest.tar.gz.gpg | tar xzf - --overwrite
mv data/mcp_proxy.db.snapshot data/mcp_proxy.db

Bring the stack back up: ./deploy.sh.

Time: ~5 minutes since you’re not provisioning a new host.

Org CA key rotation after suspected compromise

The Org CA key is the most sensitive material in the deploy. If you suspect it leaked:

Take a backup first (audit trail).
Stop the stack.
Rotate the Org CA: this is intrusive. Every agent cert needs re-issuance under the new CA. See Rotate keys for the full procedure.
Distribute the new CA cert to all agents via their next enrollment.

A backup helps here by giving you a known-good baseline to roll forward from, but the rotation itself is independent of it.

Compliance mapping

The backup pattern aligns with these common controls:

Control	What
SOC 2 A1.2 (backup and recovery)	Encrypted off-host backup with documented frequency
ISO 27001 A.8.13 (information backup)	Same
DORA Art. 12 (backup, restoration and recovery)	RPO + RTO defined (24h / 15min)
EU AI Act Art. 12 (record-keeping)	Audit log preserved in `mcp_proxy.db`
ISO 22301 (BCMS)	DR runbook documented + tested

Recommended cadence:

Backup: daily for production, weekly for staging
Off-host copy: every backup (no point keeping it on the same disk)
Restore drill: quarterly on a non-prod host. Verify the procedure still works end-to-end. Document any drift in the runbook.
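The daily cadence and the 24h RPO can be monitored with a small freshness check. The following is a sketch under assumptions: `backup_is_fresh` is a hypothetical name, the default window of 1440 minutes mirrors the 24h RPO in the table, and `-mmin` relies on GNU or BSD find.

```shell
# backup_is_fresh DIR [MAX_AGE_MINUTES]
# Succeeds if DIR holds at least one encrypted backup modified within the window.
# Hypothetical helper; default window is 1440 minutes (24h).
backup_is_fresh() {
    local dir="$1" max_age="${2:-1440}" recent
    recent=$(find "$dir" -name 'cullis-mastio-backup-*.tar.gz.gpg' \
                 -mmin "-${max_age}" -print | head -n 1)
    [ -n "$recent" ]
}
```

A monitoring job could run `backup_is_fresh /var/backups/cullis || echo "no backup in the last 24h"` on the off-host copy, which also catches a cron job that has silently stopped.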

Next

Runbook: incident response and day-to-day operations
Rotate keys: key rotation procedures, including the Org CA
Vault as Org CA private key store: move the Org CA root key out of the bundle entirely
Audit export: extract the tamper-evident audit log for forensic review
