# Changelog

All notable changes to KubeSolo OS are documented in this file.

Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.1] - 2026-05-15

First fully-functional generic ARM64 release. v0.3.0 shipped the build scaffold; v0.3.1 makes it actually boot a Kubernetes cluster end-to-end on QEMU virt under HVF acceleration. Validated by deploying CoreDNS, local-path-provisioner, and an `nginx:alpine` workload — all reach Running, `kubectl get nodes` reports `Ready`.

### Fixed

- **Dual-glibc loading on ARM64** — piCore64's `/lib/libc.so.6` and the build host's `/lib/$LIB_ARCH/libc.so.6` could both be resolved into the same process by the dynamic linker, triggering `*** stack smashing detected ***` aborts when stack frames crossed between functions linked against different libcs. Fix: bundle the full glibc family (libc + libpthread + libdl + libm + libresolv + librt + libanl + libgcc_s + ld.so), delete piCore's duplicates in `/lib/`, and write `/etc/ld.so.conf` + `ldconfig -r` so the runtime linker has a deterministic search order. (`76ed2ff`)
- **`nft` binary not bundled** — KubeSolo v1.1.4+ runs `nft add table ip kubesolo-masq` for pod-masquerade setup, but `inject-kubesolo.sh` only bundled `xtables-nft-multi`. Without standalone `nft` in `$PATH`, KubeSolo FATAL'd at startup. Fix: copy `/usr/sbin/nft` + its non-shared libs (libnftables, libedit, libjansson, libgmp, libtinfo, libbsd, libmd) into the rootfs. (`51c1f78`)
- **nftables address-family handlers** — `nf_tables` core was loaded but no address families were registered, so `nft add table ip ...` returned `EOPNOTSUPP`. The bool Kconfigs `CONFIG_NF_TABLES_IPV4`, `CONFIG_NF_TABLES_IPV6`, `CONFIG_NF_TABLES_INET`, `CONFIG_NF_TABLES_NETDEV` are required and weren't in the fragment. Fix: add to `kernel-container.fragment` as `=y`.
  (`7e46f8f`)
- **kube-proxy nftables-backend expression modules** — Kubernetes 1.34's kube-proxy nft backend uses `numgen`, `hash`, `limit`, `log` expressions. The corresponding kernel modules (`CONFIG_NFT_NUMGEN`, etc.) were missing from the fragment AND the runtime module list, so even after a kernel rebuild stage 30 didn't load them and stage 85's `kernel.modules_disabled=1` lockdown prevented on-demand loads. Fix: add to both `kernel-container.fragment` (as `=m`) and `modules.list` / `modules-arm64.list`. (`31eee77`, `3bcf2e1`)
- **`modules.list` inline-comment parser bug** — the inject script's comment-strip only matched lines starting with `#`, not lines with inline `# comment` tails. So `nft_numgen # foo` was passed verbatim to modprobe, resolved to nothing, and the .ko never made it into the initramfs. Fix: parse with `mod="${mod%%#*}"` to strip inline tails. (`bc3300e`)
- **Banner only printed on kubeconfig success** — `90-kubesolo.sh` gated the host-access banner behind `if [ -f $KUBECONFIG_PATH ]`. When KubeSolo crashed early (bug #2 above) or the wait loop timed out, the user never saw the connection instructions. Fix: write the banner to `/etc/motd` AND print it unconditionally after the wait loop. (`51c1f78`)
- **`dev-vm-arm64.sh` missing port-8080 hostfwd** — the in-VM HTTP server that serves the kubeconfig listens on port 8080, but the QEMU `-net user` line only forwarded 6443 and 2222, so `curl http://localhost:8080` from the host machine connected to nothing. Fix: add the third hostfwd. (`fbe2d0b`)

### Fixed (CI)

- **`release.yaml` workflow** rewritten so v0.3.1+ tag pushes auto-publish a complete release page on Gitea: `actions/upload-artifact` pinned to `@v3` for act_runner compatibility, the `softprops/action-gh-release@v2` step replaced with a direct `curl` against `/api/v1/repos/.../releases` (`softprops` hard-codes `api.github.com` so it silently no-ops on Gitea), added a `build-disk-arm64` job that builds on the `arm64-linux` runner.
  v0.3.0's manual-upload-only release was the canary that exposed all three bugs. (`f8c308d`)

### Known issues carried forward to v0.3.2

These don't block normal operation but are tracked:

- `xt_comment` userspace extension load fails on the iptables-nft path, causing kubelet's KUBE-FIREWALL rule install to skip. Reported as `Couldn't load match 'comment'` in the boot log. kubelet continues without the localhost-drop rule.
- `containerd-shim-runc-v2 -info` probe reports `runc: executable file not found in $PATH`. Cosmetic — containerd uses the absolute path from its config when actually launching containers.
- `kube-proxy conntrack cleanup` logs `Failed to list conntrack entries: invalid argument` every cleanup cycle. Probably needs `CONFIG_NF_CONNTRACK_PROCFS` or netlink-glue tweaks.
- Several pods restart 1–2 times on first boot due to a PLEG / runtime-probe race in the kubelet startup path. Pods stabilise.

## [0.3.0] - 2026-05-14

The main themes: generic ARM64 (not just Raspberry Pi), an honest update lifecycle with state file + metrics, OCI multi-arch distribution via ghcr.io, and policy gates (channels, maintenance windows, version stepping-stones, pre-flight checks, auto-rollback).

### Added

- Generic ARM64 build track distinct from Raspberry Pi:
  - `make kernel-arm64` builds a mainline kernel.org LTS kernel (6.12.10 by default) from `arm64 defconfig` + shared `kernel-container.fragment` + arm64 virt-host enables (VIRTIO_*, EFI_STUB, NVMe).
  - `make disk-image-arm64` produces a UEFI-bootable raw GPT image with A/B system partitions and GRUB-EFI ARM64. Targets QEMU virt, Graviton, Ampere, or any UEFI ARM64 host.
  - `hack/dev-vm-arm64.sh --disk` boots the built image through QEMU UEFI for end-to-end testing.
  - `test/qemu/test-boot-arm64-disk.sh` automated boot smoke test.
- Bumped KubeSolo to v1.1.5 (was v1.1.0).
  New cloud-init flags surfaced:
  - `kubesolo.full` (v1.1.4+) — disable edge-optimised overrides
  - `kubesolo.disable-ipv6` (v1.1.5+)
  - `kubesolo.db-wal-repair` (v1.1.5+) — recover from unclean shutdowns
- Per-arch supply-chain verification: `KUBESOLO_SHA256_AMD64` and `KUBESOLO_SHA256_ARM64` in `versions.env`, applied to the tarball before extract.
- `docs/arm64-architecture.md` — defines the generic-vs-RPi two-track layout.
- `docs/arm64-status.md` — Phase 3 status snapshot, known limitations, what's needed to ship.
- `docs/ci-runners.md` — Gitea Actions runner setup (Odroid arm64-linux).
- Update agent state machine and observability (`update/pkg/state`):
  - Persistent on-disk `state.json` at `/var/lib/kubesolo/update/state.json` (atomic write via tmp + rename). Records Phase (Idle / Checking / Downloading / Staged / Activated / Verifying / Success / RolledBack / Failed), FromVersion, ToVersion, StartedAt, UpdatedAt, LastError, AttemptCount, HealthCheckFailures.
  - `apply`, `activate`, `healthcheck`, `rollback` all transition state explicitly on entry / exit / failure. Errors land in LastError so `status` can show why.
  - `kubesolo-update status --json` emits the full state for orchestration tooling. Human-readable mode adds an "Update Lifecycle" section when not idle.
  - New Prometheus metrics: `kubesolo_update_phase{phase="..."}` (all 9 phase labels always emitted), `kubesolo_update_attempts_total`, `kubesolo_update_last_attempt_timestamp_seconds`.
- Channels, maintenance windows, version policy (`update/pkg/config`):
  - `/etc/kubesolo/update.conf` (key=value, comments, missing-OK) configures server, channel, maintenance_window, pubkey, healthcheck_url, auto_rollback_after.
  - `cloud-init` top-level `updates:` block writes `update.conf` on first boot. Empty block leaves any existing file alone.
  - `apply` enforces four gates before download: maintenance window, channel match, runtime architecture match, min_compatible_version stepping-stone.
    All gate failures land in the state machine as Failed with a clear LastError. `--force` bypasses window + node-block-label.
  - `UpdateMetadata` JSON gains `channel`, `min_compatible_version`, `architecture` (all optional, omitempty).
- OCI registry distribution (`update/pkg/oci`, ~280 LOC, 9 tests):
  - `kubesolo-update apply --registry ghcr.io//kubesolo-os --tag stable` pulls update artifacts from any OCI-compliant registry. Multi-arch indexes resolve to the runtime.GOARCH-matching manifest automatically.
  - Custom media types: `application/vnd.kubesolo.os.kernel.v1+octet-stream` and `application/vnd.kubesolo.os.initramfs.v1+gzip`. Annotations: `io.kubesolo.os.{version,channel,architecture,min_compatible_version,release_notes,release_date}`.
  - End-to-end digest verification from manifest to blobs via oras-go/v2.
  - `build/scripts/push-oci-artifact.sh` publishes per-arch artifacts via `oras`. Multi-arch index composition documented inline.
  - Dependencies added (update module only): oras.land/oras-go/v2 and transitive opencontainers/{go-digest,image-spec} + golang.org/x/sync.
- Pre-flight gates and deeper healthcheck (`update/pkg/health` extended, `update/pkg/partition` extended):
  - Free-space pre-flight on the passive partition (image + 10% headroom) via `partition.FreeBytes` / `HasFreeSpaceFor`.
  - Node-block-label pre-flight: refuses if the local K8s node carries `updates.kubesolo.io/block=true`. Silently allowed when no kubeconfig (air-gap). Skipped by `--force`.
  - `CheckKubeSystemReady` waits until every kube-system pod has held Running for ≥ N seconds (configurable via `--kube-system-settle`).
  - `CheckProbeURL` GETs an operator-supplied URL; 200 = pass. Configurable via `--healthcheck-url` or `healthcheck_url=` in update.conf.
  - `CheckDiskWritable` writes / fsyncs / reads / deletes a probe file under `/var/lib/kubesolo` to catch a wedged data partition.
  - `--auto-rollback-after N` (also `auto_rollback_after=` in update.conf): after N consecutive post-activation healthcheck failures, the agent calls `ForceRollback()` and the operator/init reboots. Reset to 0 on a clean pass.
- `.gitea/workflows/build-arm64.yaml` — full ARM64 build on the Odroid self-hosted runner. Triggers on push to main, tags, and workflow_dispatch. Boot smoke test marked continue-on-error pending KVM or real-hardware validation.

### Changed

- `build/scripts/build-kernel-arm64.sh` is now the **generic ARM64** kernel build (mainline kernel.org LTS, generic UEFI/virtio).
- Renamed `build/scripts/build-kernel-rpi.sh` (was `build-kernel-arm64.sh`). RPi kernel build (raspberrypi/linux fork, bcm2711_defconfig) lives here now.
- Renamed `build/config/kernel-container.fragment` (was `rpi-kernel-config.fragment`). Misnomer: contents are arch-agnostic and now shared across x86, ARM64-generic, and RPi kernels.
- `build/scripts/build-kernel.sh` (x86) refactored to consume the shared fragment via a generic `apply_fragment` function. ~50 lines of duplication killed.
- `KUBESOLO_VERSION` moved out of `fetch-components.sh` defaults into `versions.env`. Bumping is now a one-line PR.

### Fixed

- Native ARM64 build hosts (e.g. an Odroid runner) no longer require the x86 cross-compiler. Both `build-kernel-arm64.sh` and `build-kernel-rpi.sh` detect `uname -m` and use the host's gcc directly when arch matches.
- ARM64 grub.cfg console ordering: `ttyAMA0` is now the primary console (`console=ttyS0,... console=ttyAMA0,...`). Init output is now visible on QEMU virt and most ARM64 SBCs without further configuration.
- ARM64 boot: replaced piCore64's `/init` with our staged init at `/init` and `/sbin/init`. Previously the kernel ran piCore's TCE handler which segfaulted in our environment.
- ARM64 boot: replaced piCore64's broken dynamic BusyBox with the build host's `busybox-static`.
  piCore's binary triggered EL0 instruction-abort panics on QEMU virt under both `-cpu cortex-a72` and `-cpu max`.
- POSIX-character-class portability: `tr -d '[:space:]'` in `30-kernel-modules.sh` and `40-sysctl.sh` replaced with explicit `' \t\r\n'`. Ubuntu's busybox-static 1.30.1 doesn't parse `[:space:]` and instead deletes the literal characters `[ : s p a c e ]`, which truncated module names (`virtio_net` → `virtio_nt`, etc.) and sysctl keys.
- `inject-kubesolo.sh` no longer copies `init/lib/functions.sh` into `init.d/`. Previously the main init loop tried to run it as a stage after stage 90 and panicked with "Init completed without exec'ing KubeSolo".
- ARM64 disk image: `TARGET_ARCH=arm64 create-disk-image.sh` produces `BOOTAA64.EFI` via `grub-mkimage -O arm64-efi` (not `bootx64.efi`). Skips the BIOS-only `grub-install --target=i386-pc` step.
- `build/Dockerfile.builder`: added `grub-efi-amd64-bin`, `grub-efi-arm64-bin`, `grub-pc-bin`, `grub-common`, `grub2-common`, and `busybox-static` so the Docker-based build flow can produce ARM64 disk images and gets the same BusyBox swap behaviour as native builds.

### Known limitations (deferred to follow-up)

- **ARM64 LABEL= resolution** doesn't work yet — piCore's `blkid`/`findfs` crash in QEMU and our static busybox lacks the applets. Hardcoded `/dev/vda4` as a workaround in `build/grub/grub-arm64.cfg`. Production fix: ship static `blkid`/`findfs` or replace LABEL resolution with a sysfs walk.
- **AppArmor profile load fails on ARM64** (apparmor_parser ABI mismatch). Init reports it; boot continues without enforcement.
- **OCI signature verification** is deferred. The HTTP transport still honours `--pubkey` for `.sig` files; the OCI transport is digest-verified end-to-end via oras-go but does not yet consume cosign-style referrer attestations. Targeted for v0.3.1.
- **Real-hardware validation** of the generic ARM64 image is still pending.
  Builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import deadline. Not a defect in the OS itself; real hardware and KVM-accelerated QEMU complete the import in seconds.

## [0.2.0] - 2026-02-12

### Added

- Cloud-init: support all documented KubeSolo CLI flags (`--local-storage-shared-path`, `--debug`, `--pprof-server`, `--portainer-edge-id`, `--portainer-edge-key`, `--portainer-edge-async`)
- Cloud-init: `full-config.yaml` example showing all supported parameters
- Cloud-init: KubeSolo configuration reference table in docs/cloud-init.md
- Security hardening: mount hardening, sysctl, kernel module lock, AppArmor profiles
- ARM64 Raspberry Pi support with A/B boot via tryboot
- BootEnv abstraction for GRUB and RPi boot environments
- Go 1.25.5 installed on host for native builds

## [0.1.0] - 2026-02-12

First release with all 5 design-doc phases complete. ISO boots and runs K8s pods.

### Added

#### Custom Kernel

- Custom kernel build (6.18.2-tinycore64) with container-critical configs
- Added CONFIG_CGROUP_BPF, CONFIG_DEVTMPFS, CONFIG_DEVTMPFS_MOUNT, CONFIG_MEMCG, CONFIG_CFS_BANDWIDTH
- Stripped unnecessary subsystems (sound, GPU, wireless, Bluetooth, etc.)
- Selective kernel module install — only modules.list + transitive deps in initramfs

#### Init System (Phase 1)

- POSIX sh init system with staged boot (00-early-mount through 90-kubesolo)
- switch_root from initramfs to SquashFS root
- Persistent data partition mount with bind-mounts for K8s state
- Kernel module loading, sysctl tuning, network, hostname, NTP
- Emergency shell fallback on boot failure
- Device node creation via mknod fallback from sysfs

#### Cloud-Init (Phase 2)

- Go-based cloud-init parser (~2.7 MB static binary)
- Network configuration: DHCP and static IP modes
- Hostname and machine-id generation
- KubeSolo configuration (node-name, extra flags)
- Portainer Edge Agent integration via K8s manifest injection
- Persistent config saved to /mnt/data/ for next-boot fast path
- 22 Go tests

#### A/B Atomic Updates (Phase 3)

- 4-partition GPT disk image: EFI + System A + System B + Data
- GRUB 2 bootloader with A/B slot selection and boot counter rollback
- Go update agent (~6.0 MB static binary) with check, apply, activate, rollback commands
- Health check: containerd + K8s API + node Ready verification
- Update server protocol: HTTP serving latest.json + image files
- K8s CronJob for automated update checks (every 6 hours)
- Zero external Go dependencies — uses kubectl/ctr exec commands

#### Production Hardening (Phase 4)

- Ed25519 image signing with pure Go stdlib (zero external deps)
- Key generation, signing, and verification CLI commands
- Portainer Edge Agent deployment via cloud-init
- SSH extension injection for debugging (hack/inject-ssh.sh)
- Boot time and resource usage benchmarks
- Deployment guide documentation

#### Distribution & Fleet Management (Phase 5)

- Gitea Actions CI/CD (test + build + shellcheck on push, release on tags)
- OCI container image packaging (scratch-based)
- Prometheus metrics endpoint (zero-dependency text exposition format)
- USB provisioning script with cloud-init injection
- ARM64 cross-compilation support

#### Build System

- Makefile with full build orchestration
- Dockerized reproducible builds (build/Dockerfile.builder)
- Component fetching with version pinning
- ISO and raw disk image creation
- Fast rebuild path (`make quick`)

#### Documentation

- Architecture design document
- Boot flow reference
- A/B update flow reference
- Cloud-init configuration reference
- Deployment and operations guide

### Fixed

- Replaced `grep -oP` with POSIX-safe `sed` in functions.sh (BusyBox compatibility)
- Replaced `grep -qiE` with `grep -qi -e` pattern (POSIX compliance)
- Fixed KVM flag handling in dev-vm.sh (bash array context)
- Added iptables table pre-initialization before kube-proxy start (nf_tables issue)
- Added /dev/kmsg and /etc/machine-id creation for kubelet
- Added CA certificates bundle to initramfs (containerd TLS verification for Docker Hub)
- Added DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate resolv.conf
- Added headless Service to Portainer Edge Agent manifest (agent peer discovery DNS)
- Added kubesolo.edge_id/edge_key kernel boot parameters for Portainer Edge
- Added auto-format of unformatted data disks on first boot
- Rewrote dev-vm.sh for macOS: bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding
- Kubeconfig now served via HTTP on port 8080 (serial console truncates base64 lines)
- Added 127.0.0.1 and 10.0.2.15 to API server SANs for QEMU port forwarding
- dev-vm.sh now works on Linux: fallback ISO extraction via isoinfo or loop mount, KVM auto-detection, platform-aware error messages
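
For reference, the `grep -oP` to `sed` change in the 0.1.0 fixes can be illustrated with a minimal sketch of reading a `kubesolo.edge_id`-style boot parameter using only POSIX `sed`. This is not the project's `functions.sh`: the helper name `get_cmdline_param` and the sample command line are hypothetical, chosen only to show the pattern.

```sh
#!/bin/sh
# Hypothetical helper (not taken from functions.sh): print the value of a
# key=value parameter from a kernel command line using POSIX sed only,
# so it also works with BusyBox, which lacks `grep -oP`.
get_cmdline_param() {
    # Escape dots in the key so "kubesolo.edge_id" is matched literally.
    key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Pad the command line with spaces so the key only matches as a whole
    # word, then capture everything up to the next space.
    printf ' %s \n' "$2" | sed -n "s/.* ${key_re}=\([^ ]*\) .*/\1/p"
}

# Sample command line (assumed for illustration).
cmdline="console=ttyAMA0 kubesolo.edge_id=abc123 kubesolo.edge_key=s3cret quiet"
get_cmdline_param kubesolo.edge_id "$cmdline"    # prints abc123
```

A missing parameter prints nothing, so callers can test the result with `[ -n "$value" ]`. On a running system the second argument would typically be the contents of `/proc/cmdline`.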