Phase 8 of v0.3. Tightens the update lifecycle on both ends.
Pre-flight (apply.go, before any download):
- Free-space check on the passive partition: image size + 10% headroom must
be available. Uses statfs(2) via the new pkg/partition.FreeBytes /
HasFreeSpaceFor helpers (tests cover happy path, tiny request, huge
request, missing path). Catches corrupted-FS and shrunk-partition cases
before we destroy the existing slot data.
- Node-block-label check: refuses if the local K8s node carries the
updates.kubesolo.io/block=true label. New pkg/health.CheckNodeBlocked
shells out to kubectl per the project's zero-deps stance. Silently bypassed
when no kubeconfig is reachable (air-gap case). Skipped by --force.
Healthcheck (extended via new pkg/health/extended.go + preflight.go):
- CheckKubeSystemReady waits until every kube-system pod has held the Running
phase for >= N seconds (default 30). Catches "started ok, will crash-loop"
bugs that a single-shot phase check misses.
- CheckProbeURL fetches an operator-supplied URL; 200 = pass. Wired through
update.conf as healthcheck_url= and cloud-init updates.healthcheck_url.
- CheckDiskWritable writes/fsyncs/reads a 1-KiB probe under /var/lib/kubesolo.
Always runs in healthcheck so a wedged data partition fails fast.
- pkg/health.Status grows KubeSystemReady, ProbeURL, DiskWritable booleans.
Optional checks default to true in RunAll() so they don't block when
unconfigured. health_test.go updated to the new 6-field shape.
Auto-rollback (healthcheck.go):
- state.UpdateState gains HealthCheckFailures (consecutive post-Activated
failures). Reset on a clean pass.
- --auto-rollback-after N (also auto_rollback_after= in update.conf) triggers
env.ForceRollback() when the failure count reaches the threshold. State
transitions to RolledBack with a descriptive LastError. The command still
exits with the healthcheck error; the operator/init is expected to reboot.
- Only fires while Phase == Activated. Doesn't second-guess a long-stable
system that happens to fail one healthcheck.
config / opts / cloud-init plumbing:
- update.conf gains healthcheck_url= and auto_rollback_after= keys.
- New CLI flags: --healthcheck-url, --auto-rollback-after, --kube-system-settle.
- cloud-init full-config.yaml documents the new updates: subfields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>