feat: add A/B partition updates with GRUB and Go update agent (Phase 3)

Implement atomic OS updates via A/B partition scheme with automatic
rollback. GRUB bootloader manages slot selection with a 3-attempt
boot counter that auto-rolls back on repeated health check failures.
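
The counter behaviour described above can be sketched as a small state machine. This is an illustration only, not the GRUB script from this commit: the `BootEnv` type, the `ActiveSlot` field, and the exact reset rules are assumptions; only the 3-attempt budget and the `boot_counter`/`boot_success` variables come from the commit message.

```go
package main

import "fmt"

// maxAttempts is the 3-attempt budget described in the commit message.
const maxAttempts = 3

// BootEnv is a hypothetical in-memory view of the GRUB variables.
type BootEnv struct {
	ActiveSlot  string // "A" or "B" (assumed naming)
	BootCounter int    // attempts left before rollback
	BootSuccess bool   // set once the health check passes
}

// otherSlot returns the slot that is not currently active.
func otherSlot(slot string) string {
	if slot == "A" {
		return "B"
	}
	return "A"
}

// NextBoot models the decision taken at one boot (assumed semantics).
func NextBoot(env BootEnv) BootEnv {
	switch {
	case env.BootSuccess:
		// The slot has already proven healthy: nothing to change.
		return env
	case env.BootCounter <= 0:
		// All attempts used without a passing health check: roll back.
		env.ActiveSlot = otherSlot(env.ActiveSlot)
		env.BootCounter = maxAttempts
		return env
	default:
		// Unverified slot: spend one attempt.
		env.BootCounter--
		return env
	}
}

// MarkHealthy models what a passing health check would record.
func MarkHealthy(env BootEnv) BootEnv {
	env.BootSuccess = true
	env.BootCounter = maxAttempts
	return env
}

func main() {
	// Freshly activated slot B that never passes its health check.
	env := BootEnv{ActiveSlot: "B", BootCounter: maxAttempts}
	for boot := 1; boot <= 4; boot++ {
		env = NextBoot(env)
		fmt.Printf("boot %d: slot=%s counter=%d\n", boot, env.ActiveSlot, env.BootCounter)
	}
}
```

In this model the counter is spent on three unverified boots and the slot switch happens on the boot that finds it at 0; a passing health check (`MarkHealthy`) stops the countdown.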

GRUB boot config:
- A/B slot selection with boot_counter/boot_success env vars
- Automatic rollback when counter reaches 0 (3 failed boots)
- Debug, emergency shell, and manual slot-switch menu entries
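
The `boot_counter` and `boot_success` variables live in GRUB's environment block, a fixed 1024-byte file that starts with a `# GRUB Environment Block` header, holds `key=value` lines, and is padded with `#`. A minimal sketch of reading and writing that format without `grub-editenv` might look like the following; `ParseEnvBlock` and `FormatEnvBlock` are hypothetical names and not necessarily what pkg/grubenv exposes.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

const (
	envBlockSize = 1024 // size of a GRUB environment block file
	envHeader    = "# GRUB Environment Block\n"
)

// ParseEnvBlock extracts key=value pairs, skipping the header,
// comment lines, and the trailing '#' padding.
func ParseEnvBlock(data string) (map[string]string, error) {
	if !strings.HasPrefix(data, envHeader) {
		return nil, fmt.Errorf("missing GRUB environment block header")
	}
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if key, value, ok := strings.Cut(line, "="); ok {
			vars[key] = value
		}
	}
	return vars, sc.Err()
}

// FormatEnvBlock renders vars as a block padded to exactly 1024 bytes.
func FormatEnvBlock(vars map[string]string) (string, error) {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output regardless of map order

	var b strings.Builder
	b.WriteString(envHeader)
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, vars[k])
	}
	if b.Len() > envBlockSize {
		return "", fmt.Errorf("environment block exceeds %d bytes", envBlockSize)
	}
	b.WriteString(strings.Repeat("#", envBlockSize-b.Len()))
	return b.String(), nil
}

func main() {
	block, err := FormatEnvBlock(map[string]string{
		"boot_counter": "3",
		"boot_success": "0",
	})
	if err != nil {
		panic(err)
	}
	vars, err := ParseEnvBlock(block)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(block), vars["boot_counter"], vars["boot_success"])
}
```

Keeping the file at exactly 1024 bytes matters because GRUB rewrites the block in place at boot time and cannot grow the file.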

Disk image (refactored):
- 4-partition GPT layout: EFI + System A + System B + Data
- GRUB EFI/BIOS installation with graceful fallbacks
- Both system partitions populated during image creation

Update agent (Go, zero external deps):
- pkg/grubenv: read/write GRUB env vars (grub-editenv + manual fallback)
- pkg/partition: find/mount/write system partitions by label
- pkg/image: HTTP download with SHA256 verification
- pkg/health: post-boot checks (containerd, API server, node Ready)
- 6 CLI commands: check, apply, activate, rollback, healthcheck, status
- 37 unit tests across all 4 packages

Deployment:
- K8s CronJob for automatic update checks (every 6 hours)
- ConfigMap for update server URL
- Health check Job for post-boot verification

Build pipeline:
- build-update-agent.sh compiles static Linux binary (~5.9 MB)
- inject-kubesolo.sh includes update agent in initramfs
- Makefile: build-update-agent, test-update-agent, test-update, test-rollback targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 11:12:46 -06:00
parent d900fa920e
commit 8d25e1890e
25 changed files with 2807 additions and 74 deletions


@@ -1,6 +1,6 @@
-.PHONY: all fetch build-cloudinit rootfs initramfs iso disk-image \
+.PHONY: all fetch build-cloudinit build-update-agent rootfs initramfs iso disk-image \
 test-boot test-k8s test-persistence test-deploy test-storage test-all \
-test-cloudinit \
+test-cloudinit test-update-agent \
 dev-vm dev-vm-shell quick docker-build shellcheck \
 kernel-audit clean distclean help
@@ -32,7 +32,11 @@ build-cloudinit:
 @echo "==> Building cloud-init binary..."
 $(BUILD_DIR)/scripts/build-cloudinit.sh
-rootfs: fetch build-cloudinit
+build-update-agent:
+@echo "==> Building update agent..."
+$(BUILD_DIR)/scripts/build-update-agent.sh
+rootfs: fetch build-cloudinit build-update-agent
 @echo "==> Preparing rootfs..."
 $(BUILD_DIR)/scripts/extract-core.sh
 $(BUILD_DIR)/scripts/inject-kubesolo.sh
@@ -88,6 +92,20 @@ test-cloudinit:
 @echo "==> Testing cloud-init parser..."
 cd cloud-init && go test ./... -v -count=1
+# Update agent Go tests
+test-update-agent:
+@echo "==> Testing update agent..."
+cd update && go test ./... -v -count=1
+# A/B update integration tests
+test-update: disk-image
+@echo "==> Testing A/B update cycle..."
+test/qemu/test-update.sh $(OUTPUT_DIR)/$(OS_NAME)-$(VERSION).img
+test-rollback: disk-image
+@echo "==> Testing rollback..."
+test/qemu/test-rollback.sh $(OUTPUT_DIR)/$(OS_NAME)-$(VERSION).img
 # Full integration test suite (requires more time)
 test-integration: test-k8s test-deploy test-storage
@@ -157,24 +175,28 @@ help:
 @echo "KubeSolo OS Build System (v$(VERSION))"
 @echo ""
 @echo "Build targets:"
-@echo " make fetch Download Tiny Core ISO, KubeSolo, dependencies"
-@echo " make build-cloudinit Build cloud-init Go binary"
-@echo " make rootfs Extract + prepare rootfs with KubeSolo"
-@echo " make initramfs Repack rootfs into kubesolo-os.gz"
-@echo " make iso Create bootable ISO (default target)"
-@echo " make disk-image Create raw disk image with boot + data partitions"
-@echo " make quick Fast rebuild (re-inject + repack + ISO only)"
-@echo " make docker-build Reproducible build inside Docker"
+@echo " make fetch Download Tiny Core ISO, KubeSolo, dependencies"
+@echo " make build-cloudinit Build cloud-init Go binary"
+@echo " make build-update-agent Build update agent Go binary"
+@echo " make rootfs Extract + prepare rootfs with KubeSolo"
+@echo " make initramfs Repack rootfs into kubesolo-os.gz"
+@echo " make iso Create bootable ISO (default target)"
+@echo " make disk-image Create raw disk image with A/B partitions + GRUB"
+@echo " make quick Fast rebuild (re-inject + repack + ISO only)"
+@echo " make docker-build Reproducible build inside Docker"
 @echo ""
 @echo "Test targets:"
-@echo " make test-boot Boot ISO in QEMU, verify boot success"
-@echo " make test-k8s Boot + verify K8s node reaches Ready"
-@echo " make test-persist Reboot disk image, verify state persists"
-@echo " make test-deploy Deploy nginx pod, verify Running"
-@echo " make test-storage Test PVC with local-path provisioner"
-@echo " make test-cloudinit Run cloud-init Go unit tests"
-@echo " make test-all Run core tests (boot + k8s + persistence)"
-@echo " make test-integ Run full integration suite"
+@echo " make test-boot Boot ISO in QEMU, verify boot success"
+@echo " make test-k8s Boot + verify K8s node reaches Ready"
+@echo " make test-persist Reboot disk image, verify state persists"
+@echo " make test-deploy Deploy nginx pod, verify Running"
+@echo " make test-storage Test PVC with local-path provisioner"
+@echo " make test-cloudinit Run cloud-init Go unit tests"
+@echo " make test-update-agent Run update agent Go unit tests"
+@echo " make test-update A/B update cycle integration test"
+@echo " make test-rollback Forced rollback integration test"
+@echo " make test-all Run core tests (boot + k8s + persistence)"
+@echo " make test-integ Run full integration suite"
 @echo ""
 @echo "Dev targets:"
 @echo " make dev-vm Launch interactive QEMU VM"