Infrastructure & Deployment
This document covers the production and staging infrastructure for Hello World DAO: the Hetzner VPS, Cloudflare DNS, Docker-based oracle-bridge deployment, and the FounderyOS Kubernetes cluster on AX42-U.
Hetzner VPS (oracle-bridge Staging)
The oracle-bridge off-chain service is deployed to a Hetzner server in Helsinki.
| Property | Value |
|---|---|
| Provider | Hetzner Cloud |
| Location | Helsinki, Finland |
| IP | 65.21.149.226 |
| SSH | ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226 |
Docker Containers
Oracle-bridge runs as Docker containers managed by Docker Compose:
| Container | Port | Purpose |
|---|---|---|
| oracle-bridge-staging | 8787 | Staging environment |
| oracle-bridge-production | 8788 | Production environment |
# SSH to VPS
ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226
# View running containers
docker ps
# Follow staging logs, starting from the last 100 lines
docker logs oracle-bridge-staging --tail 100 -f
# Restart staging container
docker compose -f ~/oracle-bridge/docker-compose.staging.yml restart
Docker Compose Directory
Compose files and .env files live in ~/oracle-bridge/ on the VPS. Never edit these manually in production — changes are managed by CI/CD.
Container Registry
Images are published to GitHub Container Registry:
ghcr.io/hello-world-co-op/oracle-bridge:staging (latest staging build)
ghcr.io/hello-world-co-op/oracle-bridge:latest (latest production build)
ghcr.io/hello-world-co-op/oracle-bridge:v0.x.x (versioned release tags)
PEM and CA Certificates
Secrets stored on the VPS:
| Path | Contents | Permissions |
|---|---|---|
| /etc/oracle-bridge/github-ci-identity.pem | IC identity for canister calls | 640 root:docker |
| /etc/oracle-bridge/proton-bridge-ca.crt | Internal CA cert | 640 root:docker |
These are mounted read-only into containers via the compose file. To rotate: update the file on the VPS and restart the container.
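After a rotation it can be worth confirming that the replaced file still has the mode and ownership listed in the table before restarting the container. The following is a minimal sketch, assuming GNU coreutils `stat` is available on the VPS; `check_secret` is a hypothetical helper name, not part of the existing tooling.

```shell
# Sketch: confirm a rotated secret still has the documented mode and owner.
# Assumes GNU coreutils `stat` (the -c flag); check_secret is a hypothetical helper.
check_secret() {
  path=$1; want_mode=$2; want_owner=$3
  got_mode=$(stat -c '%a' "$path") || return 2
  got_owner=$(stat -c '%U:%G' "$path") || return 2
  if [ "$got_mode" != "$want_mode" ] || [ "$got_owner" != "$want_owner" ]; then
    echo "MISMATCH $path: $got_mode $got_owner (want $want_mode $want_owner)" >&2
    return 1
  fi
  echo "OK $path"
}

# Example (run on the VPS):
#   check_secret /etc/oracle-bridge/github-ci-identity.pem 640 root:docker
#   check_secret /etc/oracle-bridge/proton-bridge-ca.crt 640 root:docker
```

A non-zero exit means the file should be fixed with `chmod`/`chown` before the container is restarted.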
CI/CD — Docker Build & Deploy
Oracle-bridge uses GitHub Actions for automated deployment. Manual dfx deploy or SSH deploys are not used.
Workflow Files
| Workflow | File | Trigger |
|---|---|---|
| Staging deploy | docker-build.yml | Push to main |
| Production deploy | deploy-production.yml | GitHub Release event |
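Whichever workflow runs, the expected result is a container started from the tag that workflow pushed (`:staging` or `:latest`, per the Container Registry list above). The sketch below is one way to check that on the VPS. It assumes the compose files reference those tags literally, which this document does not confirm, and `expected_image` / `verify_image` are hypothetical helper names.

```shell
# Map an environment to the image tag its workflow pushes.
expected_image() {
  case $1 in
    staging)    echo "ghcr.io/hello-world-co-op/oracle-bridge:staging" ;;
    production) echo "ghcr.io/hello-world-co-op/oracle-bridge:latest" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Compare against the image reference the container was created from.
# Assumption: the compose file names the tag exactly as listed above.
verify_image() {
  want=$(expected_image "$1") || return 1
  got=$(docker inspect --format '{{.Config.Image}}' "oracle-bridge-$1") || return 1
  if [ "$got" = "$want" ]; then
    echo "OK $got"
  else
    echo "MISMATCH: running $got, expected $want" >&2
    return 1
  fi
}

# Example (run on the VPS):
#   verify_image staging
```

This only compares tag names. It does not prove the container is running the newest digest behind that tag; the workflow run logs remain the source of truth for that.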
Staging Deploy Flow
git push origin main
│
▼
GHA: docker-build.yml
├── Build Docker image
├── Push to ghcr.io:staging
└── SSH to VPS → docker compose pull + up -d
To check deploy status:
# View latest staging deploy run
gh run list --workflow=docker-build.yml --repo Hello-World-Co-Op/oracle-bridge --limit 5
# View logs for a specific run
gh run view <run-id> --log
Production Release Flow
gh release create v0.x.x --title "..." --notes "..."
│
▼
GHA: deploy-production.yml
├── Build Docker image
├── Push to ghcr.io:latest + ghcr.io:v0.x.x
└── SSH to VPS → docker compose pull + up -d (production container)
Cloudflare DNS
DNS for helloworlddao.com migrated from GoDaddy to Cloudflare in February 2026.
| Property | Value |
|---|---|
| Nameservers | elmo.ns.cloudflare.com, tessa.ns.cloudflare.com |
| Zone ID | c54ceb83773e6dc926a644a2d4e8d4af (org account, migrated 2026-04-17 PLATFORM-001.2) |
| Credentials | ~/.config/cloudflare/org-credentials.env (canonical) |
| Env overrides | CLOUDFLARE_ZONE_ID, CLOUDFLARE_CREDENTIALS — accepted by both sync + DDNS scripts |
DNS Sync Script
DNS records are managed declaratively via ops-infra/scripts/cloudflare-dns-sync.sh. This script defines all 61+ records and syncs them to Cloudflare.
# Dry run — shows what would change
./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run
# Apply changes
./ops-infra/scripts/cloudflare-dns-sync.sh
Important: IC boundary node records MUST be proxied: false (DNS-only, grey cloud). Cloudflare's auto-import sets proxied: true for all records; the sync script corrects this. Proxied IC records break canister routing.
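For a quick spot check outside the sync script, proxied CNAME records can be listed straight from the Cloudflare API. This is a rough sketch, assuming `jq` is installed and that `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ZONE_ID` are set in the environment; `proxied_cnames` is a hypothetical helper, and deciding which of the listed names are IC boundary records is left to the operator.

```shell
# Reads a Cloudflare "list DNS records" JSON response on stdin and prints
# the names of CNAME records that are currently proxied (orange cloud).
proxied_cnames() {
  jq -r '.result[] | select(.type == "CNAME" and .proxied == true) | .name'
}

# Example:
#   curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
#     "https://api.cloudflare.com/client/v4/zones/$CLOUDFLARE_ZONE_ID/dns_records?type=CNAME&per_page=100" \
#     | proxied_cnames
```

Any IC-facing name in the output is a record the sync script would be expected to flip back to DNS-only.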
Zone Migration — Pre-Cutover Checklist (AI-P1-01)
Background: Cloudflare's "add a zone to a new account" flow does NOT import all records, and silently sets proxied=true on many record types that need to be DNS-only. Treating auto-import as a migration will break IC boundary TLS, break subdomains that weren't imported, and leave latent bugs invisible. Follow this checklist every time a zone moves between Cloudflare accounts.
Before changing NS at the registrar:
- Add the zone to the destination Cloudflare account.
- Get the new zone ID:
    source ~/.config/cloudflare/<dest>-credentials.env
    curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      "https://api.cloudflare.com/client/v4/zones?name=<domain>" \
      | jq '.result[] | {id, name, account: .account.name}'
- Run the sync script in dry-run mode against the new zone:
    CLOUDFLARE_ZONE_ID=<new-zone-id> \
    CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
      ./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run
- Inspect the diff:
  - CREATE count tells you how many records auto-import missed.
  - UPDATE count with `proxied=true→false` tells you how many records auto-import flipped to proxied (IC records MUST be DNS-only).
  - Unmanaged records tells you what extras (AAAA apex, duplicate DMARC, legacy CAA) need manual review.
- Run the live sync:
    CLOUDFLARE_ZONE_ID=<new-zone-id> \
    CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
      ./ops-infra/scripts/cloudflare-dns-sync.sh
- Verify with `dig @<new-ns>.ns.cloudflare.com` for the critical record types:
    dig +short @<new-ns> CNAME staging-portal.<domain>
    dig +short @<new-ns> TXT _canister-id.www.<domain>
    dig +short @<new-ns> CNAME _acme-challenge.www.<domain>
- Only after the new zone dig matches expectations, update NS at the registrar.
After NS cutover:
- Re-run the full `dig` + HTTPS `curl` checklist at T+15m and T+1h. Schedule these as concrete tasks in the migration story; an aspirational "we'll check later" leaves outages invisible.
- If IC boundary node serves a wrong cert (`CN=<other-canister>.icpex.org` etc.), verify the domain is registered with IC's custom-domain registry (POST https://icp0.io/custom-domains/v1). DNS being correct is necessary but not sufficient.
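The `dig` checks in this checklist are easier to repeat at T+15m and T+1h if each one is a pass/fail command. A minimal sketch follows, assuming `dig` is installed; `same_target` and `check_cname` are hypothetical helpers, and the expected targets have to be supplied by the operator from the sync script's record definitions.

```shell
# same_target A B: succeed when two DNS names are equal, ignoring the
# trailing dot that `dig +short` prints on fully qualified names.
same_target() {
  a=${1%.}; b=${2%.}
  [ "$a" = "$b" ]
}

# check_cname NAMESERVER NAME EXPECTED: query one nameserver directly and
# compare its CNAME answer with the expected target.
check_cname() {
  got=$(dig +short @"$1" CNAME "$2") || return 1
  if same_target "$3" "$got"; then
    echo "OK $2"
  else
    echo "FAIL $2: got '$got', expected '$3'" >&2
    return 1
  fi
}

# Example:
#   check_cname <new-ns>.ns.cloudflare.com staging-portal.<domain> <expected-target>
```

An empty answer (record missing on the new zone) also counts as a failure, which is the case auto-import tends to produce.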
Why this checklist exists: PLATFORM-001.2 (2026-04-17) migrated helloworlddao.com to the org account. Cloudflare auto-import brought in 28 of 61 records and set IC CNAMEs to proxied=true. Staging suites were unreachable for hours before the sync script restored them. Source: epic-platform-001-retro-2026-04-17.md.
DDNS Script (legacy)
A DDNS cron job historically kept DNS current with a dynamic WAN IP for the Sector7 lab network (now decommissioned). The script remains in ops-infra for any contributor whose home/office network needs the same pattern:
# Script location
ops-infra/scripts/cloudflare-ddns-helloworlddao.sh
The active staging + production stack now lives entirely on Hetzner (oracle-bridge VPS + AX42-U dedicated server, both with static IPs), so DDNS is no longer required for the platform itself.
Key DNS Records
| Subdomain | Type | Target | Notes |
|---|---|---|---|
| www | CNAME | IC boundary node | Marketing suite canister |
| portal | CNAME | IC boundary node | DAO suite canister |
| admin | CNAME | IC boundary node | DAO admin suite canister |
| oracle | A | 65.21.149.226 | Oracle bridge (staging on 8787) |
| staging-oracle | A | 65.21.149.226 | Explicit staging endpoint |
| think-tank | CNAME | IC boundary node | Think Tank suite |
| ottercamp | CNAME | IC boundary node | Otter Camp suite |
| governance | CNAME | IC boundary node | Governance suite |
Unmanaged Records
These records exist in Cloudflare but are NOT managed by the sync script:
| Record | Reason |
|---|---|
| _domainconnect | GoDaddy artifact; safe to leave |
| _gh-hello-world-coop-dao-e | GitHub org verification; do not delete |
IC Custom Domain API
To transfer a custom domain between IC canisters (e.g., promote staging canister to production):
# Trigger domain transfer
curl -X PATCH https://icp0.io/custom-domains/v1/helloworlddao.com \
-H "Content-Type: application/json" \
-d '{"canister_id": "<new-canister-id>"}'
You must also remove the domain from the old canister's .well-known/ic-domains file before or after the API call. See IC Custom Domain Runbook for the full procedure.
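To confirm the `.well-known/ic-domains` cleanup actually happened, the file can be fetched from the old canister and checked for the domain. The sketch below assumes the canister serves that file through the standard `https://<canister-id>.icp0.io` gateway URL and that the file lists one domain per line; `domain_listed` is a hypothetical helper.

```shell
# domain_listed DOMAIN: reads an ic-domains file on stdin and succeeds only
# if DOMAIN appears as a whole line (so "www.example.com" does not match
# a query for "example.com"). Carriage returns are stripped first.
domain_listed() {
  tr -d '\r' | grep -Fxq -- "$1"
}

# Example:
#   if curl -fsS "https://<old-canister-id>.icp0.io/.well-known/ic-domains" \
#        | domain_listed helloworlddao.com; then
#     echo "domain is still listed on the old canister"
#   fi
```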
Platform API Gateway (AX42-U k3s — PLATFORM-006)
See also: System Topology — the cross-machine architecture overview (VPS + AX42-U + IC mainnet, vSwitch bridging, per-path gateway routing, cross-domain auth bridge, payment + notification data flow). This section is the developer-operational view; the system-topology doc is the architecture-overview view.
Traefik-based TLS ingress on AX42-U that unifies all backend services under one pair of hostnames, with path-based routing, TLS termination, CORS, rate limiting, and service-token auth for internal routes.
graph LR
subgraph public["Public internet"]
browser["Browser / API client"]
end
subgraph cloudflare["Cloudflare DNS"]
dns["apis.helloworlddao.com<br/>staging-apis.helloworlddao.com<br/>A → 157.180.13.84"]
end
subgraph ax42u["AX42-U (k3s)"]
traefik["Traefik Ingress<br/>:80 → 308 redirect<br/>:443 TLS (Let's Encrypt)"]
mw["Middleware chain<br/>CORS · rate-limit · security-headers<br/>strip-* · service-token-auth"]
subgraph ns_platform["ns: platform"]
health["health (nginx)"]
authz["token-authz (ForwardAuth)"]
shim["ExternalName / Endpoints"]
end
subgraph ns_fos["ns: founderyos"]
fosapi["founderyos-api:8000"]
end
end
subgraph vps["VPS (Hetzner Cloud)"]
vsw["vSwitch 10.0.0.2<br/>(oracle-bridge via private net — pending rebind)"]
ob["oracle-bridge :8787 staging<br/>oracle-bridge :8788 prod"]
end
browser -->|HTTPS| dns
dns --> traefik
traefik --> mw
mw -->|/health| health
mw -->|/fos/*| shim -->|strip /fos| fosapi
mw -->|/oracle/*| shim -->|strip /oracle, vSwitch| vsw
vsw -.->|pending oracle-bridge listener| ob
mw -->|/notify/* with token| authz
authz -.->|401 no token| mw
Hosts and routing
| Host | Environment |
|---|---|
| https://apis.helloworlddao.com | Production |
| https://staging-apis.helloworlddao.com | Staging |

| Path | Backend | Notes |
|---|---|---|
| /health | in-cluster nginx | 200 ok; gateway liveness |
| /fos/* | founderyos-api.founderyos:8000 via ExternalName shim | prefix stripped |
| /oracle/* | oracle-bridge VPS 10.0.0.2:8787 (staging) / :8788 (prod) via vSwitch | prefix stripped; pending oracle-bridge rebind |
| /auth/* | placeholder (503) | filled in by PLATFORM-003 |
| /notify/* | placeholder (503) + service-token ForwardAuth | filled in by PLATFORM-002 |
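The routing table can be read as a prefix match followed by an optional prefix strip. The sketch below models that for the staging host only, as an illustration of what each backend should see; it is not how Traefik is configured. `route` is a hypothetical function, and whether the placeholder routes strip their prefix is not stated in this document, so the sketch leaves those paths unchanged.

```shell
# route PATH: prints "<backend> <path the backend receives>" for staging.
# Simplification: a bare "/fos" or "/oracle" without a trailing slash is
# treated as unrouted here; the real PathPrefix rules may differ.
route() {
  case $1 in
    /health)   echo "in-cluster-nginx /health" ;;
    /fos/*)    echo "founderyos-api.founderyos:8000 /${1#/fos/}" ;;
    /oracle/*) echo "10.0.0.2:8787 /${1#/oracle/}" ;;
    /auth/*)   echo "placeholder-503 $1" ;;
    /notify/*) echo "placeholder-503+token-auth $1" ;;
    *)         echo "no-route $1"; return 1 ;;
  esac
}

# Example:
#   route /fos/projects/1
#   route /oracle/status
```

When a backend returns unexpected 404s behind the gateway, comparing the path it logged against this model is a quick way to spot a missing or duplicated strip middleware.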
Bootstrap
Manifests: ops-infra/k8s/platform-gateway/ — namespace, health, middlewares, service-token-auth, per-path Ingresses, Traefik HelmChartConfig.
First-time apply and one-time Traefik arg patch are documented in the platform-gateway README.
Service tokens
Stored in k8s Secret service-tokens (namespace platform). One token per calling service (TOKEN_NOTIFICATION_SERVICE, TOKEN_FOUNDERYOS_API, TOKEN_ORACLE_BRIDGE). Validated by the token-authz Deployment — a small Node.js ForwardAuth verifier. Rotate with the script in the README. Never committed to git.
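A quick way to see ForwardAuth in action is to probe `/notify/` with and without a token and interpret the status code. This is a sketch under stated assumptions: the header format (`Authorization: Bearer <token>`) is a guess and should be checked against the platform-gateway README, and `explain_status` / `probe_notify` are hypothetical helpers. The 401 and 503 meanings come from the routing table and diagram above; 429 is Traefik's standard rate-limit response.

```shell
# explain_status CODE: most likely meaning of a response from /notify/*.
explain_status() {
  case $1 in
    401) echo "rejected by token-authz" ;;
    429) echo "rate limited by gateway" ;;
    503) echo "passed auth, placeholder backend" ;;
    2??) echo "passed auth, backend answered" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# probe_notify BASE_URL [TOKEN]
# Assumption: the token travels as a Bearer token in the Authorization header.
probe_notify() {
  if [ -n "${2:-}" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $2" "$1/notify/")
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1/notify/")
  fi
  explain_status "$code"
}

# Example:
#   probe_notify https://staging-apis.helloworlddao.com
#   probe_notify https://staging-apis.helloworlddao.com "$TOKEN_NOTIFICATION_SERVICE"
```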
Access logs + dashboard
- Traefik emits JSON access logs to stdout. `Authorization` and `Cookie` headers are stripped. Tail via `kubectl -n kube-system logs -f deploy/traefik`.
- Traefik dashboard is enabled but not Ingress-exposed. Access: `kubectl -n kube-system port-forward deploy/traefik 9000:8080` → `http://127.0.0.1:9000/dashboard/`.
Adding a new service
See ops-infra/runbooks/api-gateway-add-service.md — template-based walkthrough covering ExternalName shim, strip middleware, Ingress, token provisioning, verification, and troubleshooting.
AX42-U Kubernetes Cluster (FounderyOS + platform services)
Cluster note (2026-04-27): The Sector7 cluster (Aurora, Theo, Library, Knower nodes on `192.168.2.0/24`) was fully decommissioned. AX42-U is the only k3s cluster. The platform API gateway, FounderyOS API, and Ollama all run here.
The off-chain platform (FounderyOS API, notification-service, payment-gateway, Ollama, GlitchTip) runs on a single-node k3s cluster on the Hetzner AX42-U dedicated server.
Server
| Property | Value |
|---|---|
| Hostname | ax42u-hel1 |
| Public IP | 157.180.13.84 |
| Private IP | 10.0.1.3/24 (vSwitch VLAN, see "Private Network" below) |
| Provider | Hetzner Robot |
| Location | Helsinki, Finland |
| OS | Ubuntu (kernel 6.8.0-90) |
| SSH | ssh -i ~/.ssh/hetzner_vps root@157.180.13.84 |
| k3s | Single-node — cni0 MTU 1450, pod CIDR 10.42.0.0/24, svc CIDR 10.43.0.0/16 |
Private Network (vSwitch — PLATFORM-006.1)
Hetzner Cloud Network hwdao-private (10.0.0.0/16) bridged to Robot vSwitch 80388 (VLAN 4010). Connects oracle-bridge VPS and AX42-U for cross-machine backend traffic.
| Machine | Public | Private | Iface |
|---|---|---|---|
| oracle-bridge VPS | 65.21.149.226 | 10.0.0.2/32 | enp7s0 (cloud-init) |
| AX42-U (this server) | 157.180.13.84 | 10.0.1.3/24 | enp7s0.4010 (VLAN) |
Gateway 10.0.1.1 forwards between subnets but drops ICMP. ufw on each box allows 10.0.0.0/16 inbound on the private iface only.
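Because the gateway drops ICMP, `ping` across the vSwitch fails even when the link is healthy, so connectivity is better tested with a TCP connect. The sketch below is one way to do that, assuming `bash` and GNU `timeout` are available; `tcp_open` is a hypothetical helper.

```shell
# tcp_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp redirection, so it shells out to bash explicitly;
# `timeout` comes from GNU coreutils.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (from AX42-U, probing the oracle-bridge VPS over the private net):
#   if tcp_open 10.0.0.2 8787; then echo "reachable"; else echo "no listener or blocked"; fi
```

A failure does not distinguish between a firewall drop and a service that is not listening on the private interface; per the gateway section, the oracle-bridge listener rebind is still pending, so a failure on `10.0.0.2:8787` may be expected for now.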
kubectl Access
# kubectl binary
~/.local/bin/kubectl
# Kubeconfig
~/.kube/config
# Context and cluster (post-cutover from sector7)
kubectl config get-contexts
# Check cluster status
kubectl cluster-info
# List pods in hello-world namespace
kubectl get pods -n hello-world
# List pods in platform namespace (api gateway + service tokens)
kubectl get pods -n platform
# View pod logs
kubectl logs -n hello-world <pod-name> --tail 100 -f
Namespaces
| Namespace | Purpose |
|---|---|
| hello-world | Canonical app namespace: FounderyOS API, workers, Ollama, future microservices (per project memory: consolidate here) |
| platform | API gateway middleware, service-token-auth ForwardAuth, notification-service, payment-gateway |
| founderyos | FounderyOS API (legacy ns, being consolidated into hello-world) |
| kube-system | Traefik ingress, k3s control-plane |
| the-flourish | Affiliate project; view-only for Coby |
Coby's RBAC: edit on hello-world, view on the-flourish. No longer cluster-admin (post-S206).
Platform Services
| Service | Namespace | Internal Address | External | Notes |
|---|---|---|---|---|
| Traefik (ingress) | kube-system | — | apis.helloworlddao.com / staging-apis.helloworlddao.com | TLS via Let's Encrypt |
| FounderyOS API | founderyos | founderyos-api:8000 | founderyos.dev (via Traefik /fos/*) | Node.js + Fastify |
| notification-service | platform | notification-service:3100 | gated by service-token | Resend-backed email (PLATFORM-002) |
| payment-gateway | platform | payment-gateway:3200 | gated by service-token | Stripe + Stripe Connect + ICP/DOM (PLATFORM-007) |
| Ollama | platform | ollama:11434 | — | Inference for AI features |
| GlitchTip | hello-world | glitchtip.founderyos.dev | direct DNS | Error tracking — single project ID 4 |
Network Access
Public-facing services route through Traefik on AX42-U. Tailscale is NOT in use (decommissioned 2026-04-17). Direct SSH to AX42-U for cluster admin; service traffic is public via Traefik on :443.
Database Migrations (FounderyOS)
FounderyOS uses Prisma for database schema management:
# Apply pending migrations (from founderyos-api repo)
npx prisma migrate deploy
# Generate Prisma client after schema changes
npx prisma generate
Migrations run automatically on deploy via the founderyos-api pod's startup script. For manual migration during incidents:
# Exec into API pod
kubectl exec -it -n founderyos <founderyos-api-pod> -- sh
# Run migration inside pod
npx prisma migrate deploy
Cycle Monitoring
The IC canister fleet (12 backend canisters + 6 frontend asset canisters, ~30 TC total) is monitored via:
- Local script: `ops-infra/scripts/check-cycles.sh`
- GHA cron: `ops-infra/.github/workflows/monitor-metrics.yml` (runs every 6 hours)
- Minimum balance: 100B cycles per canister
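The minimum-balance rule boils down to an integer comparison against 100,000,000,000 cycles. The sketch below shows that check in isolation; it is not the contents of `check-cycles.sh`, and `below_minimum` is a hypothetical helper. The usage comment assumes `dfx canister status` prints a line of the form `Balance: 3_092_278_099_590 Cycles`, which can vary between dfx versions.

```shell
MIN_CYCLES=100000000000   # 100B cycles, the documented minimum

# below_minimum BALANCE: succeed when BALANCE is under MIN_CYCLES.
# Accepts digit separators ("_" or ",") as printed by some tools.
below_minimum() {
  balance=$(printf '%s' "$1" | tr -d '_,')
  [ "$balance" -lt "$MIN_CYCLES" ]
}

# Example (assumed dfx output format; requires controller access):
#   bal=$(dfx canister status <canister-id> --network ic 2>&1 | awk '/^Balance:/ {print $2}')
#   if below_minimum "$bal"; then echo "<canister-id> needs a top-up"; fi
```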
Canisters with high burn rates (user-service, membership) require monthly top-ups. Top-up command:
# Convert ICP to cycles and top up a canister
dfx ledger top-up <canister-id> --amount <icp> --network ic
Resource Stability Rules
These rules apply to all cluster workloads (learned from EPIC-033 OOM crash resolution):
- k3s reserves 2 GiB memory on all nodes (`systemReserved`)
- kubelet eviction triggers at < 500 Mi available
- All pods must have `resources.limits` and `resources.requests` set
- LimitRange and ResourceQuota are enforced per namespace
- Vitest test workers: `maxThreads: 4` to prevent OOM during CI
Violating these rules causes node OOM and pod eviction cascades. Review ops-infra/k8s/ for current LimitRange and ResourceQuota manifests.
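One way to spot pods that slipped through without memory requests or limits is to list them with `kubectl` custom columns and filter for `<none>`. This is a rough sketch; `missing_resources` is a hypothetical helper. It only catches pods where no container sets the value, since `kubectl` omits missing entries when a pod has several containers, and it checks memory only because that is what the OOM rules above target.

```shell
# missing_resources: reads "NAME REQ_MEM LIM_MEM" rows on stdin (header on
# the first line) and prints pods where either column is <none>.
missing_resources() {
  awk 'NR > 1 && ($2 == "<none>" || $3 == "<none>") { print $1 }'
}

# Example:
#   kubectl get pods -n hello-world \
#     -o custom-columns='NAME:.metadata.name,REQ_MEM:.spec.containers[*].resources.requests.memory,LIM_MEM:.spec.containers[*].resources.limits.memory' \
#     | missing_resources
```

An empty result is necessary but not sufficient; the LimitRange and ResourceQuota manifests remain the enforcement mechanism.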