
oracle-bridge VPS — Architecture & Operations

Complementary views: System Topology places the VPS in the three-machine picture; Secret Hygiene covers the Pattern C env-var flow specific to this service; founderyos-api is the SaaS-side counterpart that consumes oracle-bridge's cross-domain token-exchange endpoint; Infrastructure & Deployment is the operator-facing runbook.

Why oracle-bridge runs on a VPS

The rest of the non-canister backend runs on AX42-U's k3s cluster. oracle-bridge deliberately does not. It is the off-chain signer for every canister write initiated from the frontend, holds the PostgreSQL session store that replaced the decommissioned auth-service canister, and brokers every third-party webhook (Didit KYC, Resend delivery events, Stripe in some flows). Putting it on a single-tenant Hetzner Cloud VPS (65.21.149.226, Helsinki) keeps it out of the cluster's OOM blast radius and lets Docker Compose manage the staging + production twins without coupling their lifecycle to cluster maintenance. The auth-service decommission (2026-04-11) made oracle-bridge the sole session authority — any cluster incident that took down oracle-bridge would log every DAO user out. Keeping it physically separate is load-bearing availability, not preference.

The tradeoff is that oracle-bridge does not inherit the Pattern A (secretKeyRef) secret hygiene story. It uses Pattern C from Secret Hygiene: .env.staging and .env.production files rendered from GitHub Secrets by the deploy workflow, scp'd to /etc/oracle-bridge/ on the VPS, and loaded by docker compose via env_file: directives. No k8s, no ConfigMap, no valueFrom.

Deployment topology

Two containers run side-by-side on the same VPS. Staging serves every staging-*.helloworlddao.com suite; production serves www.helloworlddao.com / helloworlddao.com. They share the host, Docker daemon, and PEM/CA cert directory, but have separate Neon Postgres databases, distinct signing PEMs, and are reachable over different ports. The API gateway on AX42-U proxies /oracle/* into the VPS over the Hetzner vSwitch private network — staging-apis hits 10.0.0.2:8787, apis hits 10.0.0.2:8788.
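The hostname-to-container mapping described above fits in a small lookup table. The TypeScript sketch below is illustrative only. `Route`, `ROUTES`, and `routeForHost` are hypothetical names, not the Traefik or Cloudflare configuration. The hostnames, vSwitch IP, and ports are the ones listed on this page.

```typescript
// Hypothetical sketch: which oracle-bridge container serves a given hostname.
// Hostnames, the vSwitch IP, and ports come from this page; the names are illustrative.
type DeployEnv = "staging" | "production";

interface Route {
  env: DeployEnv;
  publicHost: string; // DNS-only hostname pointing straight at the VPS
  gatewayHost: string; // Traefik host on AX42-U that proxies /oracle/*
  upstream: string; // container address over the Hetzner vSwitch
}

const ROUTES: readonly Route[] = [
  {
    env: "staging",
    publicHost: "staging-oracle.helloworlddao.com",
    gatewayHost: "staging-apis.helloworlddao.com",
    upstream: "http://10.0.0.2:8787",
  },
  {
    env: "production",
    publicHost: "oracle.helloworlddao.com",
    gatewayHost: "apis.helloworlddao.com",
    upstream: "http://10.0.0.2:8788",
  },
];

// Resolve a public oracle hostname or a gateway hostname to its route.
// Unknown hosts return undefined instead of falling back to a guessed environment.
export function routeForHost(host: string): Route | undefined {
  const h = host.trim().toLowerCase();
  return ROUTES.find((r) => r.publicHost === h || r.gatewayHost === h);
}
```

The lookup matches the exact hostname rather than a prefix or suffix. This prevents `staging-oracle.helloworlddao.com` and `oracle.helloworlddao.com` from being confused, which is the typo the Known gotchas section warns about.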

```mermaid
graph TB
  subgraph cf["Cloudflare DNS (grey-cloud, DNS-only)"]
    dns["staging-oracle.helloworlddao.com<br/>oracle.helloworlddao.com"]
  end
  subgraph ax["AX42-U (k3s)"]
    traefik["Traefik<br/>apis.helloworlddao.com<br/>staging-apis.helloworlddao.com"]
  end
  subgraph vsw["Hetzner vSwitch 80388 / VLAN 4010 / 10.0.0.0/16"]
    bus["private backbone<br/>(no ICMP; TCP/UDP forwarded)"]
  end
  subgraph vps["oracle-bridge VPS (65.21.149.226 / 10.0.0.2)"]
    obs["oracle-bridge-staging :8787<br/>(docker)"]
    obp["oracle-bridge-production :8788<br/>(docker)"]
    certs["/etc/oracle-bridge/<br/>github-ci-identity.pem<br/>proton-bridge-ca.crt<br/>.env.staging / .env.production"]
  end
  subgraph ext["External managed services"]
    neon["Neon Postgres<br/>(two databases: staging + prod)"]
    ic["IC mainnet<br/>canister fleet"]
    didit["Didit KYC"]
    resend["Resend (email dispatch)"]
  end
  dns --> obs
  dns --> obp
  traefik -.->|/oracle/* via vSwitch| bus
  bus --> obs
  bus --> obp
  obs --> certs
  obp --> certs
  obs --> neon
  obp --> neon
  obs -->|signed canister calls| ic
  obp -->|signed canister calls| ic
  obs --> didit
  obp --> didit
  obs --> resend
  obp --> resend
```

Compose files live under /home/deploy/oracle-bridge/ on the VPS (docker-compose.staging.yml, docker-compose.production.yml). The compose spec points env_file: at /etc/oracle-bridge/.env.staging (or .env.production) and mounts /etc/oracle-bridge/ read-only for the PEM + CA cert.
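The sketch below shows how each environment stays paired with its own files. It builds the compose invocation that a deploy step might run over SSH. `composePlan` and `ComposePlan` are hypothetical. The directories, file names, and the `pull` then `up -d` sequence come from this page.

```typescript
// Hypothetical sketch of the per-environment compose invocation on the VPS.
type DeployEnv = "staging" | "production";

const COMPOSE_DIR = "/home/deploy/oracle-bridge";
const SECRETS_DIR = "/etc/oracle-bridge"; // mounted read-only into both containers

export interface ComposePlan {
  composeFile: string;
  envFile: string; // the path the compose file's env_file: directive points at
  commands: string[][]; // argv arrays, run in order
}

export function composePlan(env: DeployEnv): ComposePlan {
  const composeFile = `${COMPOSE_DIR}/docker-compose.${env}.yml`;
  return {
    composeFile,
    envFile: `${SECRETS_DIR}/.env.${env}`,
    // "docker compose" (CLI plugin, with a space), not the legacy docker-compose binary.
    commands: [
      ["docker", "compose", "-f", composeFile, "pull"],
      ["docker", "compose", "-f", composeFile, "up", "-d"],
    ],
  };
}
```

Returning argv arrays instead of one shell string avoids quoting problems if the path ever changes.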

Environment variables

Every live env var is declared in deploy/env.{staging,production}.template in the oracle-bridge repo. The deploy workflow renders the template with GitHub Secrets and Variables via envsubst before scp'ing it to the VPS. The audit truth-table is maintained in bmad-artifacts/runbooks/env-drift-audit-YYYY-MM-DD.md; the secret values are never echoed in workflow logs (see Secret Hygiene — Invariants).
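The render step substitutes `${VAR}` placeholders. GNU `envsubst` replaces an unset variable with an empty string, which would silently ship a blank secret. The sketch below is a stricter stand-in, not the workflow's actual code. It fails when any value is missing or empty, and its error names variables only, never values. `renderTemplate` is a hypothetical name.

```typescript
// Hypothetical strict renderer for deploy/env.*.template files.
// Only the ${NAME} form is handled, matching the placeholder style described on this page.
const PLACEHOLDER = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

export function renderTemplate(
  template: string,
  values: Record<string, string | undefined>,
): string {
  const missing = new Set<string>();
  const rendered = template.replace(PLACEHOLDER, (_match: string, name: string) => {
    const value = values[name];
    // An unset GitHub Secret arrives as an empty string, so treat "" as missing too.
    if (value === undefined || value === "") {
      missing.add(name);
      return "";
    }
    return value;
  });
  if (missing.size > 0) {
    // Names only: secret values must never reach workflow logs.
    throw new Error(`unset template variables: ${Array.from(missing).sort().join(", ")}`);
  }
  return rendered;
}
```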

| Var | Source | Purpose |
|---|---|---|
| DATABASE_URL | GH Secret (env-scoped) | Neon Postgres: sessions, governance cache, digest preferences, KYC webhook mailbox |
| PRIV_KEY_B64 | GH Secret (env-scoped; distinct staging vs prod) | Base64 Ed25519 PEM → canister signing principal. Staging principal: ervli-tob4m-zidjr-ilnoz-f6l7o-7bdhu-jate4-crnsj-53wf2-hlxgg-tqe. Production PEM at ~/.config/oracle-bridge-prod/ on Coby's box (BL-206, deploys at prod cutover). |
| IC_HOST | GH Variable | https://icp0.io (mainnet) |
| TOKEN_ORACLE_BRIDGE | GH Secret | Verifies /oracle/* requests from Traefik ForwardAuth (PLATFORM-006.4 token-authz) |
| TOKEN_NOTIFICATION_SERVICE | GH Secret | Outbound auth header for notification-service /api/v1/send |
| PAYMENT_GATEWAY_TOKEN | GH Secret | Outbound auth header for payment-gateway /api/v1/... |
| FOS_API_TOKEN | GH Secret | Outbound auth header for founderyos-api /api/v1/... |
| RESEND_API_KEY | GH Secret (shared with notification-service) | Direct email dispatch fallback |
| NOTIFICATION_SERVICE_URL, PAYMENT_GATEWAY_URL, FOS_API_URL | GH Variable | Downstream service base URLs |
| LOG_LEVEL, NODE_ENV, PORT | GH Variable | Runtime config (PORT: 8787 staging, 8788 prod) |
| DIDIT_* | GH Secret | KYC adapter credentials |
| OAUTH_* | GH Secret | Google/GitHub/Discord/Apple/Microsoft OAuth client secrets |
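A startup check against the table above could fail fast when a rendered file is incomplete. This is a minimal sketch, not oracle-bridge's actual boot code:

- `REQUIRED` mirrors the table's single-name rows.
- The port expectation comes from the `PORT` row.
- `validateEnv` and `DeployEnv` are hypothetical names.
- The `DIDIT_*` and `OAUTH_*` families are only checked for at least one non-empty member, because their individual names are not listed here.

```typescript
// Hypothetical startup validation for the env vars in the table above.
type DeployEnv = "staging" | "production";

// Single-name rows from the table.
const REQUIRED = [
  "DATABASE_URL", "PRIV_KEY_B64", "IC_HOST", "TOKEN_ORACLE_BRIDGE",
  "TOKEN_NOTIFICATION_SERVICE", "PAYMENT_GATEWAY_TOKEN", "FOS_API_TOKEN",
  "RESEND_API_KEY", "NOTIFICATION_SERVICE_URL", "PAYMENT_GATEWAY_URL",
  "FOS_API_URL", "LOG_LEVEL", "NODE_ENV", "PORT",
] as const;

// Prefix families whose individual members are not enumerated in the table.
const PREFIXES = ["DIDIT_", "OAUTH_"] as const;

const EXPECTED_PORT: Record<DeployEnv, string> = { staging: "8787", production: "8788" };

// Returns problems described by variable name only (never values); empty means OK.
export function validateEnv(env: Record<string, string | undefined>, deployEnv: DeployEnv): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED) {
    if (!env[name]) problems.push(`missing ${name}`);
  }
  for (const prefix of PREFIXES) {
    if (!Object.keys(env).some((k) => k.startsWith(prefix) && Boolean(env[k]))) {
      problems.push(`no ${prefix}* variables set`);
    }
  }
  const port = env["PORT"];
  if (port && port !== EXPECTED_PORT[deployEnv]) {
    problems.push(`PORT is ${port}, expected ${EXPECTED_PORT[deployEnv]} for ${deployEnv}`);
  }
  return problems;
}
```

At boot, a service could pass `process.env` plus its environment name and exit non-zero if the returned list is non-empty.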

Not rotated as part of secret-hygiene migrations: PRIV_KEY_B64 requires coordinated set_oracle_bridge / set_oracle_bridge_principal updates across the canister fleet. See Secret Hygiene — "Never reuse staging secrets in production" and MEMORY note project_oracle_bridge_prod_pem (BL-206).
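Rotating the PEM changes the principal because, on the Internet Computer, a self-authenticating principal is derived from the public key. The derivation takes SHA-224 over the DER-encoded Ed25519 public key (RFC 8410 SubjectPublicKeyInfo) and appends the type byte 0x02. The textual form then prepends a CRC-32, encodes in lowercase RFC 4648 base32 without padding, and groups the result in fives with dashes.

The sketch below illustrates that derivation independently. It is not the code behind verify-canister-principals.sh. It assumes PRIV_KEY_B64 decodes to a PKCS#8 Ed25519 PEM that Node's crypto module can parse, and its function names are hypothetical.

```typescript
// Sketch: derive the IC self-authenticating principal from PRIV_KEY_B64.
// Assumes the Base64 payload is a PKCS#8 Ed25519 PEM that Node's crypto can parse.
import { createHash, createPrivateKey, createPublicKey } from "node:crypto";

// Standard CRC-32 (IEEE, reflected polynomial 0xEDB88320).
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// RFC 4648 base32, lowercase, no padding.
const BASE32 = "abcdefghijklmnopqrstuvwxyz234567";
function base32(bytes: Uint8Array): string {
  let out = "";
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < bytes.length; i++) {
    buffer = ((buffer << 8) | bytes[i]) & 0xffff; // only the low 12 bits are ever needed
    bits += 8;
    while (bits >= 5) {
      out += BASE32[(buffer >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) out += BASE32[(buffer << (5 - bits)) & 31];
  return out;
}

// Textual principal: base32(CRC-32 big-endian ++ bytes), dash-separated groups of 5.
export function principalToText(bytes: Uint8Array): string {
  const data = new Uint8Array(4 + bytes.length);
  new DataView(data.buffer).setUint32(0, crc32(bytes)); // DataView defaults to big-endian
  data.set(bytes, 4);
  return (base32(data).match(/.{1,5}/g) ?? []).join("-");
}

// Self-authenticating principal: SHA-224(DER public key) followed by the 0x02 type byte.
export function selfAuthenticatingPrincipal(derPublicKey: Uint8Array): string {
  const bytes = new Uint8Array(29);
  bytes.set(createHash("sha224").update(derPublicKey).digest(), 0);
  bytes[28] = 0x02;
  return principalToText(bytes);
}

export function principalFromPrivKeyB64(privKeyB64: string): string {
  const pem = Buffer.from(privKeyB64, "base64").toString("utf8");
  const publicKey = createPublicKey(createPrivateKey(pem));
  // SubjectPublicKeyInfo DER, the encoding the IC hashes for Ed25519 keys.
  const der = publicKey.export({ type: "spki", format: "der" });
  return selfAuthenticatingPrincipal(der);
}
```

Two well-known values help sanity-check the encoder. Empty bytes encode as `aaaaa-aa` (the management canister). The single byte 0x04 encodes as `2vxsx-fae` (the anonymous principal). Any new PEM yields a different principal, so every canister's set_oracle_bridge / set_oracle_bridge_principal value must change with it.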

Authentication boundaries

oracle-bridge has three auth surfaces:

  • Inbound user traffic: session cookies are validated by src/middleware/auth.ts against the sessions table in Postgres. Seven sign-in methods are supported (EmailPassword, Internet Identity, Google, Apple, Microsoft, GitHub, Discord). Argon2id password verification runs in Node (BL-048 moved it off-canister).
  • Inbound service traffic: /oracle/* requests arriving via the API gateway carry a Traefik-attached service token, which token-authz verifies before the request reaches oracle-bridge. Requests that hit the VPS hostname directly bypass that check and rely on session cookies.
  • Outbound canister writes: Ed25519 signatures are produced from the PEM, and the derived principal is what each canister configures via set_oracle_bridge / set_oracle_bridge_principal. ops-infra/scripts/verify-canister-principals.sh (also mirrored at oracle-bridge/scripts/) asserts that every canister's configured principal matches the running PEM. Run it after every deploy that changes the PEM.
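Whatever component checks the inbound service token should compare it in constant time. The minimal sketch below uses Node's `crypto.timingSafeEqual`. The `Authorization: Bearer` header format and the `isServiceTokenValid` name are assumptions, because this page does not specify how Traefik attaches the token.

```typescript
// Hypothetical service-token check; the header format and function name are assumptions.
import { createHash, timingSafeEqual } from "node:crypto";

export function isServiceTokenValid(
  authorization: string | undefined,
  expected: string | undefined,
): boolean {
  if (!authorization || !expected) return false; // an unset TOKEN_ORACLE_BRIDGE must never match
  const match = /^Bearer\s+(\S+)$/.exec(authorization.trim());
  if (!match) return false;
  // Hash both sides so the buffers have equal length; timingSafeEqual throws on unequal lengths.
  const presented = createHash("sha256").update(match[1]).digest();
  const wanted = createHash("sha256").update(expected).digest();
  return timingSafeEqual(presented, wanted);
}
```

A handler could call it as `isServiceTokenValid(req.headers.authorization, process.env.TOKEN_ORACLE_BRIDGE)`. Hashing before comparing keeps the comparison constant-length, so a mismatch cannot be detected early from the token's length.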

Integration points

Upstream callers: every DAO frontend suite (dao-suite, dao-admin-suite, governance-suite, otter-camp-suite, think-tank-suite, marketing-suite), the Traefik API gateway (/oracle/* routes), and founderyos-api (cross-domain auth exchange per PLATFORM-003.1).

Downstream dependencies: IC mainnet canisters (membership, governance, user-service, dom-token, treasury, airdrop, marketplace, identity-gateway); Neon Postgres (two databases, one per environment); notification-service for transactional email; payment-gateway for Stripe/ICP checkout; Didit for KYC webhooks; Resend for direct email fallback.

Deploy flow

Push to main triggers docker-build.yml: build image, push ghcr.io/hello-world-co-op/oracle-bridge:staging, render .env.staging from GH Secrets/Variables, scp to the VPS, SSH in and docker compose -f docker-compose.staging.yml up -d, then health-check. A GitHub Release event triggers deploy-production.yml with the same shape against the :latest tag and .env.production.

```mermaid
sequenceDiagram
  autonumber
  participant D as Developer
  participant GH as GitHub Actions
  participant GHCR as ghcr.io
  participant V as VPS (10.0.0.2)
  participant H as Health check
  D->>GH: push to main (or Release)
  GH->>GHCR: docker build + push :staging
  GH->>GH: envsubst deploy/env.staging.template<br/>→ /tmp/.env.staging.rendered
  GH->>V: scp .env.staging.rendered<br/>→ /etc/oracle-bridge/.env.staging
  GH->>V: ssh → docker compose pull + up -d
  V->>H: curl --fail https://staging-oracle.helloworlddao.com/health
  H-->>GH: 200 ok
```
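The final health-check step can flake while the new container is still starting. The sketch below wraps it in a bounded retry loop, using Node 18+ global `fetch` and `AbortSignal.timeout`. `waitForHealthy`, its option names, and the default attempt, delay, and timeout values are hypothetical, not what docker-build.yml actually does.

```typescript
// Hypothetical post-deploy health gate: poll /health until it returns 2xx or attempts run out.
type Fetcher = (url: string, init: { signal: AbortSignal }) => Promise<{ ok: boolean; status: number }>;

export interface HealthOptions {
  fetcher?: Fetcher; // injectable for tests; defaults to the global fetch
  attempts?: number;
  delayMs?: number;
  timeoutMs?: number; // per-request timeout, similar to curl -m
  sleep?: (ms: number) => Promise<void>;
}

export async function waitForHealthy(url: string, opts: HealthOptions = {}): Promise<number> {
  const {
    fetcher = fetch,
    attempts = 10,
    delayMs = 3000,
    timeoutMs = 5000,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = opts;
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetcher(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (res.ok) return attempt; // any non-2xx response counts as unhealthy
      lastError = `HTTP ${res.status}`;
    } catch (err) {
      // Timeouts and connection errors (e.g. ECONNREFUSED while the container boots).
      lastError = err instanceof Error ? err.message : String(err);
    }
    if (attempt < attempts) await sleep(delayMs);
  }
  throw new Error(`${url} unhealthy after ${attempts} attempts: ${lastError}`);
}
```

Injecting `fetcher` and `sleep` keeps the loop testable without a live VPS.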

Both workflow files SHA-pin every third-party action (appleboy/scp-action, appleboy/ssh-action, actions/checkout) per BL-230. Rendered .env files are never committed; deploy/env.*.template with ${VAR} placeholders is the only git-tracked env surface.
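The BL-230 rule can be checked mechanically: every `uses:` reference to a third-party action should end in a full 40-character commit SHA, not a tag or branch. This is a sketch only. `unpinnedActions` is hypothetical. It scans workflow text line by line instead of parsing YAML, and it skips local (`./`) and `docker://` references.

```typescript
// Hypothetical BL-230 lint: report `uses:` lines that are not pinned to a full commit SHA.
const USES_LINE = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/;
const FULL_SHA = /@[0-9a-f]{40}$/;

export interface Finding {
  line: number; // 1-based line number in the workflow file
  ref: string;
}

export function unpinnedActions(workflowYaml: string): Finding[] {
  const findings: Finding[] = [];
  workflowYaml.split(/\r?\n/).forEach((text, index) => {
    const match = USES_LINE.exec(text);
    if (!match) return;
    const ref = match[1];
    if (ref.startsWith("./") || ref.startsWith("docker://")) return; // not marketplace actions
    if (!FULL_SHA.test(ref)) findings.push({ line: index + 1, ref });
  });
  return findings;
}
```

A trailing `# v0.1.7` comment after the SHA is ignored, so the usual "pin plus version comment" convention passes.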

Operations

  • Database migrations: Liquibase, per BL-269 (pgmigrations retired). Changelog in oracle-bridge/liquibase/changelog/; properties files per env at oracle-bridge/liquibase/liquibase.{staging,production}.properties. Runbook: BL-269 migrations runbook.
  • Post-deploy canister principal check: oracle-bridge/scripts/verify-canister-principals.sh — fails loudly if any canister's configured oracle-bridge principal has drifted from the PEM the container is running.
  • PEM rotation: DO NOT rotate PRIV_KEY_B64 as part of a secret-hygiene migration. Rotation is a fleet-coordinated story (BL-206 at prod cutover).
  • Cycles monitoring: oracle-bridge runs off-chain and consumes no cycles itself, but the canisters it writes to burn cycles processing its signed calls. Monitored via ops-infra/scripts/canister-cycles-report.sh.
  • Log tailing: ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226 "docker logs -f oracle-bridge-staging".
  • Drift audit: ops-infra/scripts/audit-env-drift.sh oracle-bridge — nightly cron per BL-252 catches new live-only vars before they cause a deploy-strip regression.
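The drift audit reduces to a key-set comparison between the git-tracked template and the live file on the VPS. A key that exists only in the live file is exactly the variable the next deploy would strip. The sketch below follows that reading. `envKeys`, `diffEnv`, and `EnvDrift` are hypothetical and are not the implementation of audit-env-drift.sh. Only keys are compared, so secret values never appear in the result.

```typescript
// Hypothetical key-set diff between deploy/env.<env>.template and the live .env file.
const KEY_LINE = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/;

export function envKeys(text: string): Set<string> {
  const keys = new Set<string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank lines and comments
    const match = KEY_LINE.exec(line);
    if (match) keys.add(match[1]);
  }
  return keys;
}

export interface EnvDrift {
  liveOnly: string[]; // hand-added on the VPS; the next deploy would strip these
  templateOnly: string[]; // declared in the template but absent from the live file
}

export function diffEnv(templateText: string, liveText: string): EnvDrift {
  const tmpl = envKeys(templateText);
  const live = envKeys(liveText);
  return {
    liveOnly: Array.from(live).filter((k) => !tmpl.has(k)).sort(),
    templateOnly: Array.from(tmpl).filter((k) => !live.has(k)).sort(),
  };
}
```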

Known gotchas

  • URL convention is easy to typo. staging-oracle.helloworlddao.com → staging (:8787). oracle.helloworlddao.com → production (:8788). The staging- prefix is on the staging host, not production — the inverted pattern (which some older runbooks used) is incorrect (MEMORY note 2026-04-15).
  • VPS + AX42-U private network (10.0.0.0/16) drops ICMP at the gateway but forwards TCP/UDP. Do not troubleshoot vSwitch connectivity with ping; use curl --fail -m 5 http://10.0.0.2:8787/health from AX42-U instead.
  • Traefik's /oracle/* route returns 502 unless oracle-bridge's listener is reachable on the vSwitch IP (10.0.0.2). Binding to 0.0.0.0 is fine; binding to 127.0.0.1 hides the service from the gateway even though direct VPS-hostname traffic still works (MEMORY Platform API Gateway section).
  • Two separate Neon databases. Staging and production never share a connection string. Running liquibase update against the wrong .properties file is how you corrupt production — always verify psql $DATABASE_URL -c "SELECT current_database()" before a manual migration.
  • docker compose (space, not hyphen). The VPS has the modern CLI plugin installed; older runbook snippets using docker-compose should be updated on sight.
  • No manual .env edits on the VPS. The GHA workflow is the exclusive source of truth for env file contents. Post-PLATFORM-009.4, any hand-edit will be stomped on the next deploy; capture the change as a GH Secret/Variable rotation instead.
  • Two PEMs, two principals. Staging PEM lives at /etc/oracle-bridge/github-ci-identity.pem on the VPS; the derived principal is ervli-tob4m-...-tqe. Production PEM is staged at ~/.config/oracle-bridge-prod/ on Coby's workstation and deploys at prod cutover (BL-206). Do not cross-deploy. A principal mismatch against set_oracle_bridge fails every downstream canister write silently.
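For the two-database gotcha, a pre-flight guard can refuse a manual migration when the Liquibase properties file and the shell's DATABASE_URL point at different databases. This is a hedged sketch with the following assumptions:

- The properties file carries a plain Liquibase `url` key holding a `jdbc:postgresql://` URL.
- DATABASE_URL is a `postgres://` or `postgresql://` connection string.
- `assertSameDatabase` is a hypothetical name.

The guard complements the `SELECT current_database()` check rather than replacing it.

```typescript
// Hypothetical guard: the Liquibase properties url and DATABASE_URL must name the same host + database.
interface DbTarget {
  host: string;
  database: string;
}

function parsePostgresUrl(raw: string): DbTarget {
  // Strip a JDBC prefix and normalize postgres:// so both inputs parse the same way.
  const normalized = raw.trim().replace(/^jdbc:/, "").replace(/^postgres:\/\//, "postgresql://");
  const url = new URL(normalized);
  return {
    host: url.hostname.toLowerCase(),
    database: decodeURIComponent(url.pathname.replace(/^\//, "")),
  };
}

// Java .properties files accept either "key=value" or "key: value".
function propertiesUrl(propertiesText: string): string {
  for (const raw of propertiesText.split(/\r?\n/)) {
    const match = /^\s*url\s*[=:]\s*(\S+)/.exec(raw);
    if (match) return match[1];
  }
  throw new Error("no url key in Liquibase properties");
}

export function assertSameDatabase(propertiesText: string, databaseUrl: string): DbTarget {
  const fromProps = parsePostgresUrl(propertiesUrl(propertiesText));
  const fromEnv = parsePostgresUrl(databaseUrl);
  if (fromProps.host !== fromEnv.host || fromProps.database !== fromEnv.database) {
    // Host and database names only; credentials embedded in either URL are never printed.
    throw new Error(
      `refusing to migrate: properties target ${fromProps.host}/${fromProps.database}, ` +
        `DATABASE_URL targets ${fromEnv.host}/${fromEnv.database}`,
    );
  }
  return fromEnv;
}
```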

References

| Reference | Purpose |
|---|---|
| System Topology | Three-machine overview with the VPS in context of AX42-U + IC mainnet |
| Secret Hygiene | Pattern C (.env rendering) flow, rotation procedure, drift-detection cron |
| developer/infrastructure | Operator runbook: VPS Docker management, cluster access, DNS sync |
| oracle-bridge/runbooks/pem-deployment.md | PEM provisioning + rotation |
| oracle-bridge/scripts/verify-canister-principals.sh | Post-deploy parity check: PEM principal vs fleet-configured principal |
| ops-infra/runbooks/secret-hygiene-playbook.md | PLATFORM-009.6 full playbook, Pattern C case study |
| ops-infra/scripts/audit-env-drift.sh | Drift detection (BL-252 nightly cron) |
| bmad-artifacts/implementation-artifacts/platform-006-1-hetzner-vswitch.md | Private network provisioning (10.0.0.0/16) |
| bmad-artifacts/implementation-artifacts/platform-009-4-oracle-bridge-vps.md | Secret-hygiene migration story |
| BL-048 | Argon2id off-canister migration |
| BL-206 | Production PEM key separation (prod cutover) |
| BL-230 | SHA-pin GHA appleboy actions pattern |
| BL-269 | Liquibase replaces pgmigrations |

Hello World Co-Op DAO