Iteration 2 — Deploy robustness:
- Health check: Fibonacci-style backoff (1, 2, 3, 5, 8, 13 s) instead of fixed
5 s intervals. Why: containers need variable warmup time; fixed intervals
either wait too long or give up too early. Total budget is 32 s vs. 30 s before.
- Smoke test: now checks /dashboard, /health, and /api/swagger. Why: a
single endpoint check can miss backend-only outages; API Swagger confirms
the ASP.NET layer is healthy.
- Rollback hint: on any failure, prints previous git tag + docker compose
commands for quick manual rollback. Why: reduces MTTR by providing the
exact recovery steps inline.
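A minimal sketch of the backoff schedule described above, assuming the health check is a callable that returns True once the service answers. The names `probe`, `wait_until_healthy`, and `BACKOFF_SECONDS` are hypothetical and not taken from the workflow itself; only the delay values come from the notes above.

```python
import time
import urllib.request

# Delays from the notes above: 1 + 2 + 3 + 5 + 8 + 13 = 32 seconds in total.
BACKOFF_SECONDS = [1, 2, 3, 5, 8, 13]


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check, delays=BACKOFF_SECONDS, sleep=time.sleep) -> bool:
    """Sleep for each delay in turn, then call check(); stop at the first success."""
    for delay in delays:
        sleep(delay)
        if check():
            return True
    return False
```

A caller could pass `lambda: probe(url)` as `check`, with whatever health URL the deployment exposes. Injecting `sleep` keeps the schedule testable without real waiting.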
pnpm defaults to --frozen-lockfile in CI. The committed lockfile
is outdated (vitest was added to package.json). Using --no-frozen-lockfile
is a stopgap; the lockfile should be regenerated via 'pnpm install'
and recommitted so that --frozen-lockfile can be enforced again.
The --frozen-lockfile flag requires the lockfile to be present
in the checkout. Previously pnpm-lock.yaml was gitignored, so
it was absent from CI checkouts.
Lockfiles SHOULD be version-controlled for reproducible builds.
This also enables CI to detect when the lockfile is out of sync with
package.json.
Iteration 1 — CI reliability and speed:
- Concurrency: cancel in-progress CI runs when a new push arrives
on the same branch. Why: avoids wasted runs when pushing multiple
fixes in quick succession; only the latest code is tested.
- pnpm: switch from --no-frozen-lockfile to --frozen-lockfile.
Why: fails fast if pnpm-lock.yaml is outdated, which prevents
untested dependency changes from reaching main.
- pnpm: add --prefer-offline to use locally cached packages.
Why: Slightly faster installs when packages are already
available in the runner image cache.
Iteration 1 — Build caching:
- Backend: cache ~/.nuget/packages keyed on .csproj hashes.
Typical hit: restore drops from ~15s to ~2s (NuGet packages
already cached locally).
- Frontend: cache node_modules + ~/.pnpm-store keyed on
pnpm-lock.yaml. Typical hit: install drops from ~30s to ~3s.
- Concurrency: cancel in-progress CI runs when a new push arrives
on the same branch (avoids queue buildup).
Why: On cache hits, CI time drops ~60-70%. Faster feedback for
developers means shorter fix-deploy cycles.
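The idea of keying a cache on file hashes can be sketched as follows. This is an illustration of the principle only, not the CI system's own hashing; `cache_key` and the key format are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, files: list[Path]) -> str:
    """Build a key that changes whenever the content of any listed file changes."""
    digest = hashlib.sha256()
    for path in sorted(files):
        # Include the file name so renames also invalidate the key.
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

As long as pnpm-lock.yaml (or the set of .csproj files) is unchanged, the key stays the same and the cached packages are reused; any edit to those files yields a new key and a fresh restore.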
The \$ escape before ${{ inputs.service }} prevented Gitea from
evaluating the expression, so a literal backslash was passed to the shell.
Also use ${BUILD_ARGS} (shell expansion) instead of \$BUILD_ARGS
so the outer shell passes the actual build args to the DIND container.
Phase 1 — .env provisioning fix:
The previous approach tried to write .env directly to
/opt/openclaw/data/openclaw/workspace/nexus from inside the
runner's job container, but that host path is not mounted there.
Fix: write .env from Gitea secrets into the workspace first,
then sync it along with the source code via the existing
Docker-in-Docker pattern (which can access the host path).
Combined the separate '.env creation' and 'sync code' steps
into a single atomic 'Sync code + .env to host' step.
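A rough sketch of the "write .env into the workspace first" step, assuming the Gitea secrets are exposed to the job as environment variables. The secret names are the ones listed in the deploy notes; mapping them to .env keys by dropping the `ENV_` prefix is an assumption for illustration, as are the helper names `render_env` and `write_env`.

```python
import os
from pathlib import Path

# Secret names from the deploy notes. Dropping the ENV_ prefix to form
# the .env key is an assumption, not confirmed by the workflow.
SECRET_NAMES = [
    "ENV_POSTGRES_PASSWORD",
    "ENV_JWT_KEY",
    "ENV_OWNER_PASSWORD",
    "ENV_OPENCLAW_TOKEN",
]


def render_env(secrets, prefix: str = "ENV_") -> str:
    """Render KEY=value lines; fail loudly if a secret is missing or empty."""
    lines = []
    for name in SECRET_NAMES:
        value = secrets.get(name)
        if not value:
            raise KeyError(f"missing secret: {name}")
        lines.append(f"{name.removeprefix(prefix)}={value}")
    return "\n".join(lines) + "\n"


def write_env(workspace: Path, secrets=os.environ) -> Path:
    """Write .env into the checkout so the later sync step carries it along."""
    target = workspace / ".env"
    target.write_text(render_env(secrets))
    target.chmod(0o600)  # keep the file readable by the owner only
    return target
```

Values are written unquoted here, so secrets containing spaces or `#` would need quoting in a real setup.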
Phase 1 — Deploy reliability:
- Version bump: derive current version from 'git describe --tags' instead of
VERSION file. This eliminates race conditions where the VERSION file is
stale but the tag already exists from a previous failed run.
- Tag creation: use 'git tag -f' + 'git push --force --tags' to handle
retries gracefully when tags already exist.
- Environment: provision .env at the host deploy path from Gitea secrets
(ENV_POSTGRES_PASSWORD, ENV_JWT_KEY, ENV_OWNER_PASSWORD, ENV_OPENCLAW_TOKEN).
This ensures .env always exists on the host even though it's excluded from
the sync step for security.
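Deriving the current version from 'git describe --tags' and bumping it could look roughly like this. The sketch assumes tags of the form `v<major>.<minor>.<patch>` (the `v` prefix is optional here); the actual tag format is not stated above, and `parse_describe` and `bump` are hypothetical names.

```python
import re

# Matches "v0.1.3" as well as "v0.1.3-4-gabc1234", the form git describe
# prints when HEAD is some commits past the tag.
DESCRIBE_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-\d+-g[0-9a-f]+)?$")


def parse_describe(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from git describe output."""
    match = DESCRIBE_RE.match(output.strip())
    if match is None:
        raise ValueError(f"unexpected git describe output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: tuple[int, int, int], part: str) -> tuple[int, int, int]:
    """Return the next version for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = version
    if part == "major":
        return (major + 1, 0, 0)
    if part == "minor":
        return (major, minor + 1, 0)
    if part == "patch":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown bump type: {part!r}")
```

Because the version comes from the latest tag rather than a file, a rerun after a failed deploy starts from whatever tag already exists.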
Runner label was already fixed in the previous commit (runs-on: ubuntu-latest).
The runner registered with labels [linux, dotnet, node, ubuntu-latest, ...]
but not with 'deploy'. Changed the workflow to use the consistently
available ubuntu-latest label. Also added the 'deploy' label to the runner
registration for future compatibility.
Runner job containers don't have the /workspace/nexus mount.
- Sync code to host path using a docker run helper (preserves .env)
- Build & deploy from host path using docker:cli image
- Health check with retry loop for slow container startup
The runner job container does not have /workspace/nexus mounted.
Run everything from the checkout directory which has .git and compose.yaml.
- Removed rsync sync step (not needed)
- Version bump uses checkout dir with full git history
- Docker compose runs from checkout dir
- Added fetch-depth:0 and fetch-tags for version tagging
The Gitea runner ubuntu-latest image lacks rsync, causing
the Sync-to-deploy-path step to fail with exit code 127.
Added apt-get install rsync before the sync step.
- deploy.yaml now triggers automatically after successful CI completion
- Adds workflow_run event listener for 'CI - Build & Test'
- Guards deploy to only run when CI conclusion == success
- Preserves manual workflow_dispatch for targeted deploys
- Adds CI/CD note to README
- Renamed 'overdue' (Überfällig) → 'critical' (Kritisch):
What was wrong: the meter counted tasks.filter(t => t.state === 'Blocked')
but displayed 'Überfällig' (overdue). Blocked tasks are not 'overdue'
but 'critical'. The calculation was also redundant with the 'blocked' meter
(incidents from metrics).
- Renamed 'todayAppointments' (Heute) → 'active' (Aktiv):
What was wrong: the meter counted tasks with state === 'In progress', but the
label 'Heute' (today) implied a time reference. 'Aktiv' (active) correctly
describes the processing status.
- CSS classes renamed accordingly (meter-overdue → meter-critical,
meter-today → meter-active).
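The renamed meters amount to counting tasks by state. The sketch below restates that logic in Python for illustration only; the original is frontend JavaScript, and the task shape (a dict with a "state" key) and the function name `compute_meters` are assumptions.

```python
from collections import Counter


def compute_meters(tasks: list[dict]) -> dict[str, int]:
    """Count tasks per meter; names follow the renamed labels."""
    states = Counter(task["state"] for task in tasks)
    return {
        "critical": states["Blocked"],    # previously labelled 'overdue'
        "active": states["In progress"],  # previously labelled 'todayAppointments'
    }
```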
- Add VERSION file (0.1.0)
- Deploy workflow: auto-bump version (patch/minor/major)
- Git tags set automatically on deploy
- Version displayed in workflow run name