# KubeSolo OS — Atomic Update Flow

This document describes the A/B partition update mechanism used by KubeSolo OS for safe, atomic OS updates with automatic rollback.

## Partition Layout

KubeSolo OS uses a 4-partition GPT layout:

```
Disk (minimum 4 GB):
  Part 1: EFI/Boot  (256 MB, FAT32, label: KSOLOEFI)    — GRUB + boot config
  Part 2: System A  (512 MB, ext4, label: KSOLOA)       — vmlinuz + kubesolo-os.gz
  Part 3: System B  (512 MB, ext4, label: KSOLOB)       — vmlinuz + kubesolo-os.gz
  Part 4: Data      (remaining, ext4, label: KSOLODATA) — persistent K8s state
```

Only one system partition is active at a time. The other is the "passive" slot used for staging updates.

## GRUB Environment Variables

The A/B boot logic is controlled by three GRUB environment variables stored in `/boot/grub/grubenv`:

| Variable | Values | Description |
|---|---|---|
| `active_slot` | `A` or `B` | Which system partition to boot |
| `boot_counter` | `3` → `0` | Attempts remaining before rollback |
| `boot_success` | `0` or `1` | Whether the current boot has been verified healthy |

## Boot Flow

```
     ┌──────────────┐
     │ GRUB starts  │
     └──────┬───────┘
            │
     ┌──────▼───────┐
     │ Load grubenv │
     └──────┬───────┘
            │
  ┌─────────▼──────────┐
  │ boot_success == 1? │
  └────┬──────────┬────┘
    yes│          │no
       │    ┌─────▼───────────┐
       │    │ boot_counter=0? │
       │    └──┬──────────┬───┘
       │    no │          │ yes
       │       │    ┌─────▼────────────┐
       │       │    │ SWAP active_slot │
       │       │    │ Reset counter=3  │
       │       │    └─────┬────────────┘
       │       │          │
  ┌────▼───────▼──────────▼────┐
  │ Set boot_success=0         │
  │ Decrement boot_counter     │
  │ Boot active_slot partition │
  └────────────┬───────────────┘
               │
     ┌─────────▼─────────┐
     │ System boots...   │
     └─────────┬─────────┘
               │
     ┌─────────▼─────────────┐
     │ Health check runs     │
     │ (containerd, API,     │
     │  node Ready)          │
     └─────┬──────────┬──────┘
       pass│          │fail
     ┌─────▼─────┐    │
     │ Mark boot │    │ boot_success stays 0
     │ success=1 │    │ counter decremented
     │ counter=3 │    │ on next reboot
     └───────────┘    └──────────────────────
```

### Rollback Behavior

The boot counter starts at 3 and decrements on each boot where `boot_success` remains 0:

1. **Boot 1**: counter 3 → 2 (health check fails → reboot)
2. **Boot 2**: counter 2 → 1 (health check fails → reboot)
3. **Boot 3**: counter 1 → 0 (health check fails → reboot)
4. **Boot 4**: counter = 0, GRUB swaps `active_slot` and resets counter to 3

This provides **3 chances** for the new version to pass health checks before automatic rollback to the previous version.

## Update Agent Commands

The `kubesolo-update` binary provides 6 subcommands:

### `check` — Check for Updates

Queries the update server and compares against the current running version.

```bash
kubesolo-update check --server https://updates.example.com
```

Output:

```
Current version: 1.0.0 (slot A)
Latest version:  1.1.0
Status: update available
```

### `apply` — Download and Write Update

Downloads the new OS image (vmlinuz + initramfs) from the update server, verifies SHA256 checksums, and writes to the passive partition.

```bash
kubesolo-update apply --server https://updates.example.com
```

This does NOT activate the new partition or trigger a reboot.

### `activate` — Set Next Boot Target

Switches the GRUB boot target to the passive partition (the one with the new image) and sets `boot_counter=3`.

```bash
kubesolo-update activate
```

After activation, reboot to boot into the new version:

```bash
reboot
```

### `rollback` — Force Rollback

Manually switches to the other partition, regardless of health check status.

```bash
kubesolo-update rollback
reboot
```

### `healthcheck` — Post-Boot Health Verification

Runs after every boot to verify the system is healthy. If all checks pass, marks `boot_success=1` in GRUB to prevent rollback.

Checks performed:

1. **containerd**: Socket exists and `ctr version` responds
2. **API server**: TCP connection to 127.0.0.1:6443 and `/healthz` endpoint
3. **Node Ready**: `kubectl get nodes` shows Ready status

```bash
kubesolo-update healthcheck --timeout 120
```

### `status` — Show A/B Slot Status

Displays the current partition state:

```bash
kubesolo-update status
```

Output:

```
KubeSolo OS — A/B Partition Status
───────────────────────────────────
  Active slot:  A
  Passive slot: B
  Boot counter: 3
  Boot success: 1

  ✓ System is healthy (boot confirmed)
```

## Update Server Protocol

The update server is a simple HTTP(S) file server that serves:

```
/latest.json      — Update metadata
/vmlinuz-         — Linux kernel
/kubesolo-os-.gz  — Initramfs
```

### `latest.json` Format

```json
{
  "version": "1.1.0",
  "vmlinuz_url": "https://updates.example.com/vmlinuz-1.1.0",
  "vmlinuz_sha256": "abc123...",
  "initramfs_url": "https://updates.example.com/kubesolo-os-1.1.0.gz",
  "initramfs_sha256": "def456...",
  "release_notes": "Bug fixes and performance improvements",
  "release_date": "2025-01-15"
}
```

Any static file server (nginx, S3, GitHub Releases) can serve as an update server.

## Automated Updates via CronJob

KubeSolo OS includes a Kubernetes CronJob for automatic update checking:

```bash
# Deploy the update CronJob
kubectl apply -f /usr/lib/kubesolo-os/update-cronjob.yaml

# Configure the update server URL
kubectl -n kube-system create configmap kubesolo-update-config \
  --from-literal=server-url=https://updates.example.com

# Manually trigger an update check
kubectl create job --from=cronjob/kubesolo-update kubesolo-update-manual -n kube-system
```

The CronJob runs every 6 hours and performs `apply` (download + write). It does NOT reboot — the administrator controls when to reboot.

## Complete Update Cycle

A full update cycle looks like:

```bash
# 1. Check if update is available
kubesolo-update check --server https://updates.example.com

# 2. Download and write to passive partition
kubesolo-update apply --server https://updates.example.com

# 3. Activate the new partition
kubesolo-update activate

# 4. Reboot into the new version
reboot

# 5. (Automatic) Health check runs, marks boot successful
#    kubesolo-update healthcheck is run by init system

# 6. Verify status
kubesolo-update status
```

If the health check fails 3 times, GRUB automatically rolls back to the previous version on the next reboot.

## Command-Line Options

All subcommands accept these options:

| Option | Default | Description |
|---|---|---|
| `--server URL` | (none) | Update server URL |
| `--grubenv PATH` | `/boot/grub/grubenv` | Path to GRUB environment file |
| `--timeout SECS` | `120` | Health check timeout in seconds |

## File Locations

| File | Description |
|---|---|
| `/usr/lib/kubesolo-os/kubesolo-update` | Update agent binary |
| `/boot/grub/grubenv` | GRUB environment (on EFI partition) |
| `/boot/grub/grub.cfg` | GRUB boot config with A/B logic |
| `/vmlinuz` | Linux kernel |
| `/kubesolo-os.gz` | Initramfs |
| `/version` | Version string |
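
## Appendix: Boot Logic Sketch

The counter and slot transitions described under Boot Flow and Rollback Behavior can be traced with a small in-memory model. This is a minimal sketch, not the shipped `grub.cfg` logic or the update agent's code: `GrubEnv`, `grub_boot`, `health_check`, and `activate` are hypothetical names chosen for illustration, and it assumes that `activate` leaves `boot_success` untouched, which the text above does not specify.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # value boot_counter is reset to, per the document


@dataclass
class GrubEnv:
    """Hypothetical in-memory stand-in for the three grubenv variables."""
    active_slot: str = "A"
    boot_counter: int = MAX_ATTEMPTS
    boot_success: int = 1


def other_slot(slot: str) -> str:
    return "B" if slot == "A" else "A"


def activate(env: GrubEnv) -> None:
    """Model of `activate`: point at the passive slot and set boot_counter=3.

    Assumption: boot_success is not modified here.
    """
    env.active_slot = other_slot(env.active_slot)
    env.boot_counter = MAX_ATTEMPTS


def grub_boot(env: GrubEnv) -> str:
    """Apply the GRUB-side decisions for one boot and return the slot booted."""
    if env.boot_success != 1 and env.boot_counter == 0:
        # No confirmed boot and no attempts left: swap slots, reset counter.
        env.active_slot = other_slot(env.active_slot)
        env.boot_counter = MAX_ATTEMPTS
    env.boot_success = 0
    env.boot_counter -= 1
    return env.active_slot


def health_check(env: GrubEnv, healthy: bool) -> None:
    """Post-boot step: a passing check confirms the boot and resets the counter."""
    if healthy:
        env.boot_success = 1
        env.boot_counter = MAX_ATTEMPTS


if __name__ == "__main__":
    env = GrubEnv()   # healthy system running slot A
    activate(env)     # stage a (broken) update in slot B
    booted = []
    for _ in range(4):
        booted.append(grub_boot(env))
        # In this scenario only the old slot A passes its health check.
        health_check(env, healthy=(env.active_slot == "A"))
    print(booted)     # prints ['B', 'B', 'B', 'A']
```

In this scenario slot B is booted three times, and the fourth boot swaps back to slot A, where the health check passes and the counter returns to 3.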