fix: postgres WAL corruption recovery + memory bump + researcher/executor
CI - Build & Test / Backend (.NET) (push) Successful in 30s
CI - Build & Test / Frontend (Vue/TS) (push) Successful in 19s
CI - Build & Test / Security Check (push) Successful in 4s

- Postgres memory: 256M→384M limits, 64M→96M reservations
- Added pg_resetwal -f pre-deploy step to recover from corrupt WAL
  ('PANIC: could not locate a valid checkpoint record' caused by
  force-killed postgres during --force-recreate)
- Added data-checksums initdb arg for future corruption detection
- api→postgres and web→api depends_on: service_healthy→service_started
- Deploy wait loop: fail fast on unhealthy, wait on starting (180s)
- Added researcher/executor to ValidAssignees and frontend dropdowns
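The deploy wait loop in the list above (fail fast on `unhealthy`, keep waiting on `starting`, give up after 180s) could look roughly like the shell below. This is a minimal sketch, not the workflow's actual loop. The function names `wait_for_health` and `docker_health`, the `WAIT_INTERVAL` variable with its 5-second default, and the container name in the usage line are all hypothetical. The probe is passed in as a command, so the loop itself does not depend on Docker.

```shell
#!/bin/sh
# Hypothetical sketch of the deploy wait loop: poll a health probe until it
# reports "healthy", fail fast on "unhealthy", keep waiting on "starting".
# Usage: wait_for_health TIMEOUT_SECONDS PROBE_COMMAND [ARGS...]
wait_for_health() {
  timeout="$1"
  shift
  elapsed=0
  interval="${WAIT_INTERVAL:-5}"   # assumed poll interval, not from the commit
  status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$("$@" 2>/dev/null)
    case "$status" in
      healthy)   echo "healthy after ${elapsed}s"; return 0 ;;
      unhealthy) echo "unhealthy, aborting" >&2; return 1 ;;
      *)         ;;   # "starting", empty or unknown: keep waiting
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s (last status: ${status:-none})" >&2
  return 1
}

# Example probe: Docker's health status for a container with a HEALTHCHECK.
docker_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}
```

For example, `wait_for_health 180 docker_health nexus-api` would poll a (hypothetical) `nexus-api` container. An empty or unexpected status is treated like `starting`, so a failing probe only costs time until the timeout instead of aborting the deploy early.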
2026-06-20 18:56:11 +02:00
parent b95bec7915
commit 06eac66baa
5 changed files with 30 additions and 3 deletions
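A note on the pre-deploy WAL step in the diff below: `pg_resetwal` refuses to run as `root`, and `docker run --entrypoint sh postgres:17-alpine` runs as root unless `--user` is given, so as written the command fails and the step falls through to its "benign error" message. The PostgreSQL documentation also says `pg_resetwal` must not be run while the server is running, and that after `-f` on a crashed cluster the data may be inconsistent, because it discards WAL that normal crash recovery would otherwise replay. A more conservative variant, sketched here under stated assumptions, resets only when the server's own log already shows the checkpoint PANIC and runs the tool as the `postgres` user. The helper names `needs_wal_reset` and `reset_wal_if_needed` and their volume and container arguments are hypothetical; this is not the workflow's code.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if the postgres log text on stdin contains
# the checkpoint PANIC that `pg_resetwal -f` is meant to recover from.
# Postgres pads the level with two spaces ("PANIC:  ..."), so allow one or more.
needs_wal_reset() {
  grep -Eq 'PANIC: +could not locate a valid checkpoint record'
}

# Hypothetical wrapper: $1 = data volume name, $2 = postgres container name.
# The server must not be running; pg_resetwal also refuses to start if it
# finds postmaster.pid in the data directory.
reset_wal_if_needed() {
  vol="$1"
  container="$2"
  # Postgres logs to stderr, so merge it into the pipe.
  if docker logs --tail 50 "$container" 2>&1 | needs_wal_reset; then
    echo "Checkpoint PANIC found in $container logs; running pg_resetwal -f"
    # --user postgres: pg_resetwal exits with an error when run as root.
    docker run --rm --user postgres \
      -v "$vol:/var/lib/postgresql/data" \
      postgres:17-alpine pg_resetwal -f /var/lib/postgresql/data
  else
    echo "No checkpoint PANIC in recent $container logs; skipping pg_resetwal"
  fi
}
```

Used in place of the unconditional reset, the call would be something like `reset_wal_if_needed "$PG_VOL" nexus-postgres` (container name assumed). Even then, the documentation recommends dumping the data, re-running initdb and restoring after a forced reset, so this remains a last resort rather than routine recovery.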
+14
@@ -211,6 +211,20 @@ jobs:
set -e
trap 'rm -f /tmp/nexus-deploy-env' EXIT
cat > /tmp/nexus-deploy-env
# ── WAL recovery: reset corrupt WAL that can block postgres startup ──
# Force-killed postgres containers can leave stale WAL entries that cause
# 'PANIC: could not locate a valid checkpoint record' on next start.
# pg_resetwal -f discards the WAL: transactions committed since the last
# checkpoint can be lost, and the data may be left inconsistent.
PG_VOL=\$(docker volume ls -q --filter name=nexus-postgres 2>/dev/null | head -1)
if [ -n \"\$PG_VOL\" ]; then
echo '🩺 Checking postgres WAL integrity...'
docker run --rm -v \"\$PG_VOL:/var/lib/postgresql/data\" \
--entrypoint sh postgres:17-alpine -c '
pg_resetwal -f /var/lib/postgresql/data 2>/dev/null && echo \"✅ WAL reset OK\" || echo \"WAL reset not needed / benign error\"
' 2>/dev/null || echo 'WAL check skipped'
fi
if [ -n '${SERVICE_ARG}' ]; then
echo '🚀 Deploying service: ${SERVICE_ARG}'
docker compose --env-file /tmp/nexus-deploy-env build ${BUILD_ARGS} ${SERVICE_ARG}