# Changelog

All notable changes to KubeSolo OS are documented in this file.

Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-05-14

The main themes: generic ARM64 (not just Raspberry Pi), an honest update lifecycle with state file + metrics, OCI multi-arch distribution via ghcr.io, and policy gates (channels, maintenance windows, version stepping-stones, pre-flight checks, auto-rollback).

### Added

- Generic ARM64 build track distinct from Raspberry Pi:
  - `make kernel-arm64` builds a mainline kernel.org LTS kernel (6.12.10 by default) from `arm64 defconfig` + shared `kernel-container.fragment` + arm64 virt-host enables (VIRTIO_*, EFI_STUB, NVMe).
  - `make disk-image-arm64` produces a UEFI-bootable raw GPT image with A/B system partitions and GRUB-EFI ARM64. Targets QEMU virt, Graviton, Ampere, or any UEFI ARM64 host.
  - `hack/dev-vm-arm64.sh --disk` boots the built image through QEMU UEFI for end-to-end testing.
  - `test/qemu/test-boot-arm64-disk.sh` automated boot smoke test.
- Bumped KubeSolo to v1.1.5 (was v1.1.0). New cloud-init flags surfaced:
  - `kubesolo.full` (v1.1.4+): disable edge-optimised overrides
  - `kubesolo.disable-ipv6` (v1.1.5+)
  - `kubesolo.db-wal-repair` (v1.1.5+): recover from unclean shutdowns
- Per-arch supply-chain verification: `KUBESOLO_SHA256_AMD64` and `KUBESOLO_SHA256_ARM64` in `versions.env`, applied to the tarball before extract.
- `docs/arm64-architecture.md`: defines the generic-vs-RPi two-track layout.
- `docs/arm64-status.md`: Phase 3 status snapshot, known limitations, what's needed to ship.
- `docs/ci-runners.md`: Gitea Actions runner setup (Odroid arm64-linux).
- Update agent state machine and observability (`update/pkg/state`):
  - Persistent on-disk `state.json` at `/var/lib/kubesolo/update/state.json` (atomic write via tmp + rename).
    Records Phase (Idle / Checking / Downloading / Staged / Activated / Verifying / Success / RolledBack / Failed), FromVersion, ToVersion, StartedAt, UpdatedAt, LastError, AttemptCount, HealthCheckFailures.
  - `apply`, `activate`, `healthcheck`, `rollback` all transition state explicitly on entry / exit / failure. Errors land in LastError so `status` can show why.
  - `kubesolo-update status --json` emits the full state for orchestration tooling. Human-readable mode adds an "Update Lifecycle" section when not idle.
  - New Prometheus metrics: `kubesolo_update_phase{phase="..."}` (all 9 phase labels always emitted), `kubesolo_update_attempts_total`, `kubesolo_update_last_attempt_timestamp_seconds`.
- Channels, maintenance windows, version policy (`update/pkg/config`):
  - `/etc/kubesolo/update.conf` (key=value, comments, missing-OK) configures server, channel, maintenance_window, pubkey, healthcheck_url, auto_rollback_after.
  - `cloud-init` top-level `updates:` block writes `update.conf` on first boot. Empty block leaves any existing file alone.
  - `apply` enforces four gates before download: maintenance window, channel match, runtime architecture match, min_compatible_version stepping-stone. All gate failures land in the state machine as Failed with a clear LastError. `--force` bypasses window + node-block-label.
  - `UpdateMetadata` JSON gains `channel`, `min_compatible_version`, `architecture` (all optional, omitempty).
- OCI registry distribution (`update/pkg/oci`, ~280 LOC, 9 tests):
  - `kubesolo-update apply --registry ghcr.io//kubesolo-os --tag stable` pulls update artifacts from any OCI-compliant registry. Multi-arch indexes resolve to the runtime.GOARCH-matching manifest automatically.
  - Custom media types: `application/vnd.kubesolo.os.kernel.v1+octet-stream` and `application/vnd.kubesolo.os.initramfs.v1+gzip`. Annotations: `io.kubesolo.os.{version,channel,architecture,min_compatible_version,release_notes,release_date}`.
  - End-to-end digest verification from manifest to blobs via oras-go/v2.
  - `build/scripts/push-oci-artifact.sh` publishes per-arch artifacts via `oras`. Multi-arch index composition documented inline.
  - Dependencies added (update module only): oras.land/oras-go/v2 and transitive opencontainers/{go-digest,image-spec} + golang.org/x/sync.
- Pre-flight gates and deeper healthcheck (`update/pkg/health` extended, `update/pkg/partition` extended):
  - Free-space pre-flight on the passive partition (image + 10% headroom) via `partition.FreeBytes` / `HasFreeSpaceFor`.
  - Node-block-label pre-flight: refuses if the local K8s node carries `updates.kubesolo.io/block=true`. Silently allowed when no kubeconfig (air-gap). Skipped by `--force`.
  - `CheckKubeSystemReady` waits until every kube-system pod has held Running for ≥ N seconds (configurable via `--kube-system-settle`).
  - `CheckProbeURL` GETs an operator-supplied URL; 200 = pass. Configurable via `--healthcheck-url` or `healthcheck_url=` in update.conf.
  - `CheckDiskWritable` writes / fsyncs / reads / deletes a probe file under `/var/lib/kubesolo` to catch a wedged data partition.
  - `--auto-rollback-after N` (also `auto_rollback_after=` in update.conf): after N consecutive post-activation healthcheck failures, the agent calls `ForceRollback()` and the operator/init reboots. Reset to 0 on a clean pass.
- `.gitea/workflows/build-arm64.yaml`: full ARM64 build on the Odroid self-hosted runner. Triggers on push to main, tags, and workflow_dispatch. Boot smoke test marked continue-on-error pending KVM or real-hardware validation.

### Changed

- `build/scripts/build-kernel-arm64.sh` is now the **generic ARM64** kernel build (mainline kernel.org LTS, generic UEFI/virtio).
- Renamed `build/scripts/build-kernel-rpi.sh` (was `build-kernel-arm64.sh`). RPi kernel build (raspberrypi/linux fork, bcm2711_defconfig) lives here now.
- Renamed `build/config/kernel-container.fragment` (was `rpi-kernel-config.fragment`).
  Misnomer: contents are arch-agnostic and now shared across x86, ARM64-generic, and RPi kernels.
- `build/scripts/build-kernel.sh` (x86) refactored to consume the shared fragment via a generic `apply_fragment` function. ~50 lines of duplication killed.
- `KUBESOLO_VERSION` moved out of `fetch-components.sh` defaults into `versions.env`. Bumping is now a one-line PR.

### Fixed

- Native ARM64 build hosts (e.g. an Odroid runner) no longer require the x86 cross-compiler. Both `build-kernel-arm64.sh` and `build-kernel-rpi.sh` detect `uname -m` and use the host's gcc directly when arch matches.
- ARM64 grub.cfg console ordering: `ttyAMA0` is now the primary console (`console=ttyS0,... console=ttyAMA0,...`). Init output is now visible on QEMU virt and most ARM64 SBCs without further configuration.
- ARM64 boot: replaced piCore64's `/init` with our staged init at `/init` and `/sbin/init`. Previously the kernel ran piCore's TCE handler which segfaulted in our environment.
- ARM64 boot: replaced piCore64's broken dynamic BusyBox with the build host's `busybox-static`. piCore's binary triggered EL0 instruction-abort panics on QEMU virt under both `-cpu cortex-a72` and `-cpu max`.
- POSIX-character-class portability: `tr -d '[:space:]'` in `30-kernel-modules.sh` and `40-sysctl.sh` replaced with explicit `' \t\r\n'`. Ubuntu's busybox-static 1.30.1 doesn't parse `[:space:]` and instead deletes the literal characters `[ : s p a c e ]`, which truncated module names (`virtio_net` → `virtio_nt`, etc.) and sysctl keys.
- `inject-kubesolo.sh` no longer copies `init/lib/functions.sh` into `init.d/`. Previously the main init loop tried to run it as a stage after stage 90 and panicked with "Init completed without exec'ing KubeSolo".
- ARM64 disk image: `TARGET_ARCH=arm64 create-disk-image.sh` produces `BOOTAA64.EFI` via `grub-mkimage -O arm64-efi` (not `bootx64.efi`). Skips the BIOS-only `grub-install --target=i386-pc` step.
- `build/Dockerfile.builder`: added `grub-efi-amd64-bin`, `grub-efi-arm64-bin`, `grub-pc-bin`, `grub-common`, `grub2-common`, and `busybox-static` so the Docker-based build flow can produce ARM64 disk images and gets the same BusyBox swap behaviour as native builds.

### Known limitations (deferred to follow-up)

- **ARM64 LABEL= resolution** doesn't work yet: piCore's `blkid`/`findfs` crash in QEMU and our static busybox lacks the applets. Hardcoded `/dev/vda4` as a workaround in `build/grub/grub-arm64.cfg`. Production fix: ship static `blkid`/`findfs` or replace LABEL resolution with a sysfs walk.
- **AppArmor profile load fails on ARM64** (apparmor_parser ABI mismatch). Init reports it; boot continues without enforcement.
- **OCI signature verification** is deferred. The HTTP transport still honours `--pubkey` for `.sig` files; the OCI transport is digest-verified end-to-end via oras-go but does not yet consume cosign-style referrer attestations. Targeted for v0.3.1.
- **Real-hardware validation** of the generic ARM64 image is still pending. Builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import deadline. Not a defect in the OS itself; real hardware and KVM-accelerated QEMU complete the import in seconds.

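
The `[:space:]` portability fix listed under Fixed for 0.3.0 can be illustrated with a short sketch. It assumes a POSIX `sh` and a `tr` that understands backslash escapes; the helper name `strip_ws` and the sample input are hypothetical and are not taken from `30-kernel-modules.sh` or `40-sysctl.sh`.

```shell
#!/bin/sh
# strip_ws (hypothetical helper): delete spaces, tabs, carriage returns and
# newlines from stdin by listing them explicitly, instead of relying on the
# [:space:] character class, which some BusyBox builds do not parse.
strip_ws() {
    tr -d ' \t\r\n'
}

# Example: normalise a module name read from a list with stray whitespace.
printf ' virtio_net\r\n' | strip_ws
echo    # prints: virtio_net
```

Listing the characters explicitly avoids depending on how a given `tr` implementation handles character classes. The trade-off is that rarer whitespace covered by `[:space:]`, such as vertical tab and form feed, is left in place.
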
## [0.2.0] - 2026-02-12

### Added

- Cloud-init: support all documented KubeSolo CLI flags (`--local-storage-shared-path`, `--debug`, `--pprof-server`, `--portainer-edge-id`, `--portainer-edge-key`, `--portainer-edge-async`)
- Cloud-init: `full-config.yaml` example showing all supported parameters
- Cloud-init: KubeSolo configuration reference table in docs/cloud-init.md
- Security hardening: mount hardening, sysctl, kernel module lock, AppArmor profiles
- ARM64 Raspberry Pi support with A/B boot via tryboot
- BootEnv abstraction for GRUB and RPi boot environments
- Go 1.25.5 installed on host for native builds

## [0.1.0] - 2026-02-12

First release with all 5 design-doc phases complete. ISO boots and runs K8s pods.

### Added

#### Custom Kernel

- Custom kernel build (6.18.2-tinycore64) with container-critical configs
- Added CONFIG_CGROUP_BPF, CONFIG_DEVTMPFS, CONFIG_DEVTMPFS_MOUNT, CONFIG_MEMCG, CONFIG_CFS_BANDWIDTH
- Stripped unnecessary subsystems (sound, GPU, wireless, Bluetooth, etc.)
- Selective kernel module install: only modules.list + transitive deps in initramfs

#### Init System (Phase 1)

- POSIX sh init system with staged boot (00-early-mount through 90-kubesolo)
- switch_root from initramfs to SquashFS root
- Persistent data partition mount with bind-mounts for K8s state
- Kernel module loading, sysctl tuning, network, hostname, NTP
- Emergency shell fallback on boot failure
- Device node creation via mknod fallback from sysfs

#### Cloud-Init (Phase 2)

- Go-based cloud-init parser (~2.7 MB static binary)
- Network configuration: DHCP and static IP modes
- Hostname and machine-id generation
- KubeSolo configuration (node-name, extra flags)
- Portainer Edge Agent integration via K8s manifest injection
- Persistent config saved to /mnt/data/ for next-boot fast path
- 22 Go tests

#### A/B Atomic Updates (Phase 3)

- 4-partition GPT disk image: EFI + System A + System B + Data
- GRUB 2 bootloader with A/B slot selection and boot counter rollback
- Go update agent (~6.0 MB static binary) with check, apply, activate, rollback commands
- Health check: containerd + K8s API + node Ready verification
- Update server protocol: HTTP serving latest.json + image files
- K8s CronJob for automated update checks (every 6 hours)
- Zero external Go dependencies: uses kubectl/ctr exec commands

#### Production Hardening (Phase 4)

- Ed25519 image signing with pure Go stdlib (zero external deps)
- Key generation, signing, and verification CLI commands
- Portainer Edge Agent deployment via cloud-init
- SSH extension injection for debugging (hack/inject-ssh.sh)
- Boot time and resource usage benchmarks
- Deployment guide documentation

#### Distribution & Fleet Management (Phase 5)

- Gitea Actions CI/CD (test + build + shellcheck on push, release on tags)
- OCI container image packaging (scratch-based)
- Prometheus metrics endpoint (zero-dependency text exposition format)
- USB provisioning script with cloud-init injection
- ARM64 cross-compilation support

#### Build System

- Makefile with full build orchestration
- Dockerized reproducible builds (build/Dockerfile.builder)
- Component fetching with version pinning
- ISO and raw disk image creation
- Fast rebuild path (`make quick`)

#### Documentation

- Architecture design document
- Boot flow reference
- A/B update flow reference
- Cloud-init configuration reference
- Deployment and operations guide

### Fixed

- Replaced `grep -oP` with POSIX-safe `sed` in functions.sh (BusyBox compatibility)
- Replaced `grep -qiE` with `grep -qi -e` pattern (POSIX compliance)
- Fixed KVM flag handling in dev-vm.sh (bash array context)
- Added iptables table pre-initialization before kube-proxy start (nf_tables issue)
- Added /dev/kmsg and /etc/machine-id creation for kubelet
- Added CA certificates bundle to initramfs (containerd TLS verification for Docker Hub)
- Added DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate resolv.conf
- Added headless Service to Portainer Edge Agent manifest (agent peer discovery DNS)
- Added kubesolo.edge_id/edge_key kernel boot parameters for Portainer Edge
- Added auto-format of unformatted data disks on first boot
- Rewrote dev-vm.sh for macOS: bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding
- Kubeconfig now served via HTTP on port 8080 (serial console truncates base64 lines)
- Added 127.0.0.1 and 10.0.2.15 to API server SANs for QEMU port forwarding
- dev-vm.sh now works on Linux: fallback ISO extraction via isoinfo or loop mount, KVM auto-detection, platform-aware error messages
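
The `grep -oP` replacement and the `kubesolo.edge_id` boot parameter listed under Fixed for 0.1.0 suggest a pattern along the following lines. This is a minimal sketch, not the actual contents of functions.sh: the helper name `get_cmdline_arg` and the sample command line are hypothetical, and it assumes values contain no spaces.

```shell
#!/bin/sh
# get_cmdline_arg KEY [CMDLINE] (hypothetical helper): print the value of the
# first KEY=VALUE word on a kernel command line, using only POSIX tr/sed/head
# rather than the Perl-regex mode of GNU grep (grep -oP), which BusyBox lacks.
get_cmdline_arg() {
    key=$1
    # Read /proc/cmdline only when no command line is passed explicitly.
    cmdline=${2:-$(cat /proc/cmdline)}
    # Escape dots so a key such as kubesolo.edge_id is matched literally.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    # One word per line, strip the "KEY=" prefix, keep the first match.
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n "s/^${pat}=//p" | head -n 1
}

# Example with an explicit command line (prints: abc123).
get_cmdline_arg kubesolo.edge_id 'quiet kubesolo.edge_id=abc123 console=ttyS0'
```

Passing the command line as an optional second argument keeps the helper testable on hosts without `/proc/cmdline`, such as a macOS development machine.
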