oracle-bridge VPS — Architecture & Operations
Complementary views: System Topology places the VPS in the three-machine picture; Secret Hygiene covers the Pattern C env-var flow specific to this service; founderyos-api is the SaaS-side counterpart that consumes oracle-bridge's cross-domain token-exchange endpoint; Infrastructure & Deployment is the operator-facing runbook.
Why oracle-bridge runs on a VPS
The rest of the non-canister backend runs on AX42-U's k3s cluster. oracle-bridge deliberately does not. It is the off-chain signer for every canister write initiated from the frontend, holds the PostgreSQL session store that replaced the decommissioned auth-service canister, and brokers every third-party webhook (Didit KYC, Resend delivery events, Stripe in some flows). Putting it on a single-tenant Hetzner Cloud VPS (65.21.149.226, Helsinki) keeps it out of the cluster's OOM blast radius and lets Docker Compose manage the staging + production twins without coupling their lifecycle to cluster maintenance. The auth-service decommission (2026-04-11) made oracle-bridge the sole session authority — any cluster incident that took down oracle-bridge would log every DAO user out. Keeping it physically separate is load-bearing availability, not preference.
The tradeoff is that oracle-bridge does not inherit the Pattern A (secretKeyRef) secret hygiene story. It uses Pattern C from Secret Hygiene: .env.staging and .env.production files rendered from GitHub Secrets by the deploy workflow, scp'd to /etc/oracle-bridge/ on the VPS, and loaded by docker compose via env_file: directives. No k8s, no ConfigMap, no valueFrom.
Deployment topology
Two containers run side-by-side on the same VPS. Staging serves every staging-*.helloworlddao.com suite; production serves www.helloworlddao.com / helloworlddao.com. They share the host, Docker daemon, and PEM/CA cert directory, but have separate Neon Postgres databases, distinct signing PEMs, and are reachable over different ports. The API gateway on AX42-U proxies /oracle/* into the VPS over the Hetzner vSwitch private network — staging-apis hits 10.0.0.2:8787, apis hits 10.0.0.2:8788.
graph TB
subgraph cf["Cloudflare DNS (grey-cloud, DNS-only)"]
dns["staging-oracle.helloworlddao.com<br/>oracle.helloworlddao.com"]
end
subgraph ax["AX42-U — k3s"]
traefik["Traefik<br/>apis.helloworlddao.com<br/>staging-apis.helloworlddao.com"]
end
subgraph vsw["Hetzner vSwitch 80388 — VLAN 4010 — 10.0.0.0/16"]
bus["private backbone<br/>(no ICMP; TCP/UDP forwarded)"]
end
subgraph vps["oracle-bridge VPS (65.21.149.226 / 10.0.0.2)"]
obs["oracle-bridge-staging :8787<br/>(docker)"]
obp["oracle-bridge-production :8788<br/>(docker)"]
certs["/etc/oracle-bridge/<br/>github-ci-identity.pem<br/>proton-bridge-ca.crt<br/>.env.staging / .env.production"]
end
subgraph ext["External managed services"]
neon["Neon Postgres<br/>(two databases: staging + prod)"]
ic["IC mainnet<br/>canister fleet"]
didit["Didit KYC"]
resend["Resend (email dispatch)"]
end
dns --> obs
dns --> obp
traefik -.->|/oracle/* via vSwitch| bus
bus --> obs
bus --> obp
obs --> certs
obp --> certs
obs --> neon
obp --> neon
obs -->|signed HTTP outcalls| ic
obp -->|signed HTTP outcalls| ic
obs --> didit
obp --> didit
obs --> resend
obp --> resendCompose files live under /home/deploy/oracle-bridge/ on the VPS (docker-compose.staging.yml, docker-compose.production.yml). The compose spec points env_file: at /etc/oracle-bridge/.env.staging (or .env.production) and mounts /etc/oracle-bridge/ read-only for the PEM + CA cert.
Environment variables
Every live env var is declared in deploy/env.{staging,production}.template in the oracle-bridge repo. The deploy workflow renders the template with GitHub Secrets and Variables via envsubst before scp'ing it to the VPS. The audit truth-table is maintained in bmad-artifacts/runbooks/env-drift-audit-YYYY-MM-DD.md; the secret values are never echoed in workflow logs (see Secret Hygiene — Invariants).
| Var | Source | Purpose |
|---|---|---|
DATABASE_URL | GH Secret (env-scoped) | Neon Postgres — sessions, governance cache, digest preferences, KYC webhook mailbox |
PRIV_KEY_B64 | GH Secret (env-scoped — distinct staging vs prod) | Base64 Ed25519 PEM → canister signing principal. Staging principal: ervli-tob4m-zidjr-ilnoz-f6l7o-7bdhu-jate4-crnsj-53wf2-hlxgg-tqe. Production PEM at ~/.config/oracle-bridge-prod/ on Coby's box (BL-206, deploys at prod cutover). |
IC_HOST | GH Variable | https://icp0.io (mainnet) |
TOKEN_ORACLE_BRIDGE | GH Secret | Verifies /oracle/* requests from Traefik ForwardAuth (PLATFORM-006.4 token-authz) |
TOKEN_NOTIFICATION_SERVICE | GH Secret | Outbound auth header for notification-service /api/v1/send |
PAYMENT_GATEWAY_TOKEN | GH Secret | Outbound auth header for payment-gateway /api/v1/... |
FOS_API_TOKEN | GH Secret | Outbound auth header for founderyos-api /api/v1/... |
RESEND_API_KEY | GH Secret (shared with notification-service) | Direct email dispatch fallback |
NOTIFICATION_SERVICE_URL, PAYMENT_GATEWAY_URL, FOS_API_URL | GH Variable | Downstream service base URLs |
LOG_LEVEL, NODE_ENV, PORT | GH Variable | Runtime config (PORT: 8787 staging, 8788 prod) |
DIDIT_* | GH Secret | KYC adapter credentials |
OAUTH_* | GH Secret | Google/GitHub/Discord/Apple/Microsoft OAuth client secrets |
Not rotated as part of secret-hygiene migrations: PRIV_KEY_B64 requires coordinated set_oracle_bridge / set_oracle_bridge_principal updates across the canister fleet. See Secret Hygiene — "Never reuse staging secrets in production" and MEMORY note project_oracle_bridge_prod_pem (BL-206).
Authentication boundaries
oracle-bridge has three auth surfaces. Inbound user traffic — session cookies validated by src/middleware/auth.ts against the sessions table in Postgres. Seven methods supported (EmailPassword, Internet Identity, Google, Apple, Microsoft, GitHub, Discord). Argon2id password verification runs in Node (BL-048 moved it off-canister). Inbound service traffic — /oracle/* via the API gateway carries a Traefik-attached service token verified by token-authz before reaching oracle-bridge; direct VPS hostname hits bypass that and rely on session cookies. Outbound canister writes — Ed25519 signatures produced from the PEM; the derived principal is what each canister configures via set_oracle_bridge / set_oracle_bridge_principal. ops-infra/scripts/verify-canister-principals.sh (also mirrored at oracle-bridge/scripts/) asserts every canister's configured principal matches the running PEM — run after every deploy that changes the PEM.
Integration points
Upstream callers: every DAO frontend suite (dao-suite, dao-admin-suite, governance-suite, otter-camp-suite, think-tank-suite, marketing-suite), the Traefik API gateway (/oracle/* routes), and founderyos-api (cross-domain auth exchange per PLATFORM-003.1).
Downstream dependencies: IC mainnet canisters (membership, governance, user-service, dom-token, treasury, airdrop, marketplace, identity-gateway); Neon Postgres (two databases, one per environment); notification-service for transactional email; payment-gateway for Stripe/ICP checkout; Didit for KYC webhooks; Resend for direct email fallback.
Deploy flow
Push to main triggers docker-build.yml: build image, push ghcr.io/hello-world-co-op/oracle-bridge:staging, render .env.staging from GH Secrets/Variables, scp to the VPS, SSH in and docker compose -f docker-compose.staging.yml up -d, then health-check. A GitHub Release event triggers deploy-production.yml with the same shape against the :latest tag and .env.production.
sequenceDiagram
autonumber
participant D as Developer
participant GH as GitHub Actions
participant GHCR as ghcr.io
participant V as VPS (10.0.0.2)
participant H as Health check
D->>GH: push to main (or Release)
GH->>GHCR: docker build + push :staging
GH->>GH: envsubst deploy/env.staging.template<br/>→ /tmp/.env.staging.rendered
GH->>V: scp .env.staging.rendered<br/>→ /etc/oracle-bridge/.env.staging
GH->>V: ssh → docker compose pull + up -d
V->>H: curl --fail https://staging-oracle.helloworlddao.com/health
H-->>GH: 200 okBoth workflow files SHA-pin every third-party action (appleboy/scp-action, appleboy/ssh-action, actions/checkout) per BL-230. Rendered .env files are never committed; deploy/env.*.template with ${VAR} placeholders is the only git-tracked env surface.
Operations
- Database migrations: Liquibase, per BL-269 (pgmigrations retired). Changelog in
oracle-bridge/liquibase/changelog/; properties files per env atoracle-bridge/liquibase/liquibase.{staging,production}.properties. Runbook: BL-269 migrations runbook. - Post-deploy canister principal check:
oracle-bridge/scripts/verify-canister-principals.sh— fails loudly if any canister's configured oracle-bridge principal has drifted from the PEM the container is running. - PEM rotation: DO NOT rotate
PRIV_KEY_B64as part of a secret-hygiene migration. Rotation is a fleet-coordinated story (BL-206 at prod cutover). - Cycles monitoring: off-chain — oracle-bridge does not consume cycles itself, but its outbound canister calls do. Monitored via
ops-infra/scripts/canister-cycles-report.sh. - Log tailing:
ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226 "docker logs -f oracle-bridge-staging". - Drift audit:
ops-infra/scripts/audit-env-drift.sh oracle-bridge— nightly cron per BL-252 catches new live-only vars before they cause a deploy-strip regression.
Known gotchas
- URL convention is easy to typo.
staging-oracle.helloworlddao.com→ staging (:8787).oracle.helloworlddao.com→ production (:8788). Thestaging-prefix is on the staging host, not production — the inverted pattern (which some older runbooks used) is incorrect (MEMORY note 2026-04-15). - VPS + AX42-U private network (
10.0.0.0/16) drops ICMP at the gateway but forwards TCP/UDP. Do not troubleshoot vSwitch connectivity withping; usecurl --fail -m 5 http://10.0.0.2:8787/healthfrom AX42-U instead. - Traefik's
/oracle/*route returns 502 until oracle-bridge binds its listener to the vSwitch IP. Binding to0.0.0.0is fine; binding to127.0.0.1hides the service from the gateway even though direct VPS-hostname traffic still works (MEMORY Platform API Gateway section). - Two separate Neon databases. Staging and production never share a connection string. Running
liquibase updateagainst the wrong.propertiesfile is how you corrupt production — always verifypsql $DATABASE_URL -c "SELECT current_database()"before a manual migration. docker compose(space, not hyphen). The VPS has the modern CLI plugin installed; older runbook snippets usingdocker-composeshould be updated on sight.- No manual
.envedits on the VPS. The GHA workflow is the exclusive source of truth for env file contents. Post-PLATFORM-009.4, any hand-edit will be stomped on the next deploy; capture the change as a GH Secret/Variable rotation instead. - Two PEMs, two principals. Staging PEM lives at
/etc/oracle-bridge/github-ci-identity.pemon the VPS; the derived principal iservli-tob4m-...-tqe. Production PEM is staged at~/.config/oracle-bridge-prod/on Coby's workstation and deploys at prod cutover (BL-206). Do not cross-deploy. A principal mismatch againstset_oracle_bridgefails every downstream canister write silently.
References
| Reference | Purpose |
|---|---|
| System Topology | Three-machine overview with the VPS in context of AX42-U + IC mainnet |
| Secret Hygiene | Pattern C (.env rendering) flow, rotation procedure, drift-detection cron |
developer/infrastructure | Operator runbook — VPS Docker management, cluster access, DNS sync |
oracle-bridge/runbooks/pem-deployment.md | PEM provisioning + rotation |
oracle-bridge/scripts/verify-canister-principals.sh | Post-deploy parity check — PEM principal vs fleet-configured principal |
ops-infra/runbooks/secret-hygiene-playbook.md | PLATFORM-009.6 full playbook, Pattern C case study |
ops-infra/scripts/audit-env-drift.sh | Drift detection (BL-252 nightly cron) |
bmad-artifacts/implementation-artifacts/platform-006-1-hetzner-vswitch.md | Private network provisioning (10.0.0.0/16) |
bmad-artifacts/implementation-artifacts/platform-009-4-oracle-bridge-vps.md | Secret-hygiene migration story |
| BL-048 | Argon2id off-canister migration |
| BL-206 | Production PEM key separation (prod cutover) |
| BL-230 | SHA-pin GHA appleboy actions pattern |
| BL-269 | Liquibase replaces pgmigrations |