# Changelog
All notable changes to KubeSolo OS are documented in this file.
The format is based on Keep a Changelog, and versioning follows Semantic Versioning.
## [0.3.1] - 2026-05-15
First fully functional generic ARM64 release. v0.3.0 shipped the build
scaffold; v0.3.1 makes it actually boot a Kubernetes cluster end-to-end
on QEMU virt under HVF acceleration. Validated by deploying CoreDNS,
local-path-provisioner, and an nginx:alpine workload: all reach
`Running`, and `kubectl get nodes` reports the node as `Ready`.
### Fixed
- Dual-glibc loading on ARM64: piCore64's `/lib/libc.so.6` and the build host's `/lib/$LIB_ARCH/libc.so.6` could both be resolved into the same process by the dynamic linker, triggering `*** stack smashing detected ***` aborts when stack frames crossed between functions linked against different libcs. Fix: bundle the full glibc family (libc + libpthread + libdl + libm + libresolv + librt + libanl + libgcc_s + ld.so), delete piCore's duplicates in `/lib/`, and write `/etc/ld.so.conf` and run `ldconfig -r` so the runtime linker has a deterministic search order. (76ed2ff)
- `nft` binary not bundled: KubeSolo v1.1.4+ runs `nft add table ip kubesolo-masq` for pod-masquerade setup, but `inject-kubesolo.sh` only bundled `xtables-nft-multi`. Without a standalone `nft` in `$PATH`, KubeSolo FATAL'd at startup. Fix: copy `/usr/sbin/nft` plus the libraries it needs that were not already in the rootfs (libnftables, libedit, libjansson, libgmp, libtinfo, libbsd, libmd). (51c1f78)
- nftables address-family handlers: the `nf_tables` core was loaded but no address families were registered, so `nft add table ip ...` returned `EOPNOTSUPP`. The bool Kconfigs `CONFIG_NF_TABLES_IPV4`, `CONFIG_NF_TABLES_IPV6`, `CONFIG_NF_TABLES_INET`, and `CONFIG_NF_TABLES_NETDEV` are required and weren't in the fragment. Fix: add them to `kernel-container.fragment` as `=y`. (7e46f8f)
- kube-proxy nftables-backend expression modules: Kubernetes 1.34's kube-proxy nft backend uses the `numgen`, `hash`, `limit`, and `log` expressions. The corresponding kernel modules (`CONFIG_NFT_NUMGEN`, etc.) were missing from both the fragment and the runtime module list. Even after a kernel rebuild, stage 30 therefore didn't load them, and stage 85's `kernel.modules_disabled=1` lockdown prevented on-demand loads. Fix: add them to both `kernel-container.fragment` (as `=m`) and `modules.list` / `modules-arm64.list`. (31eee77, 3bcf2e1)
- `modules.list` inline-comment parser bug: the inject script's comment stripping only matched lines starting with `#`, not lines with inline `# comment` tails. As a result, `nft_numgen # foo` was passed verbatim to modprobe, resolved to nothing, and the `.ko` never made it into the initramfs. Fix: parse with `mod="${mod%%#*}"` to strip inline tails. (bc3300e)
- Banner only printed on kubeconfig success: `90-kubesolo.sh` gated the host-access banner behind `if [ -f $KUBECONFIG_PATH ]`. When KubeSolo crashed early (the missing-`nft` bug above) or the wait loop timed out, the user never saw the connection instructions. Fix: write the banner to `/etc/motd` and also print it unconditionally after the wait loop. (51c1f78)
- `dev-vm-arm64.sh` missing port-8080 hostfwd: the in-VM HTTP server that serves the kubeconfig listens on port 8080, but the QEMU `-net user` line only forwarded 6443 and 2222. As a result, `curl http://localhost:8080` from the host machine connected to nothing. Fix: add a third hostfwd. (fbe2d0b)
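
The inline-comment fix amounts to "cut at the first `#`, then trim whitespace". Below is a minimal Go sketch of that parsing rule. The real parser is a shell loop in the inject script, and `parseModulesList` is a hypothetical name used only for illustration.

Trimming matters as well. `${mod%%#*}` applied to `nft_numgen # foo` leaves a trailing space, and the shell only discards that space when the variable is expanded unquoted.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseModulesList returns module names from modules.list-style text:
// one module per line, with full-line and inline "#" comments allowed.
// Hypothetical helper; the real parser is a shell loop.
func parseModulesList(contents string) []string {
	var mods []string
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := sc.Text()
		// Same effect as the shell expansion ${mod%%#*}:
		// drop everything from the first '#' onwards.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		// Remove the whitespace left in front of an inline comment.
		line = strings.TrimSpace(line)
		if line == "" {
			continue // blank or comment-only line
		}
		mods = append(mods, line)
	}
	return mods
}

func main() {
	list := "# nftables expressions\nnft_numgen # used by kube-proxy\nnft_hash\n"
	fmt.Println(parseModulesList(list)) // [nft_numgen nft_hash]
}
```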
### Fixed (CI)
- `release.yaml` workflow rewritten so that v0.3.1+ tag pushes auto-publish a complete release page on Gitea. Changes:
  - `actions/upload-artifact` is pinned to `@v3` for act_runner compatibility.
  - The `softprops/action-gh-release@v2` step is replaced with a direct `curl` against `/api/v1/repos/.../releases`. (`softprops` hard-codes `api.github.com`, so it silently no-ops on Gitea.)
  - A new `build-disk-arm64` job builds on the `arm64-linux` runner.

  v0.3.0's manual-upload-only release was the canary that exposed all three bugs. (f8c308d)
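
For reference, the direct API call that replaces `softprops/action-gh-release` can be sketched in Go. The sketch makes these assumptions:
- It uses Gitea's documented `POST /api/v1/repos/{owner}/{repo}/releases` endpoint and its `Authorization: token ...` header.
- The host, owner, repo, and token are hypothetical placeholders, and `newReleaseRequest` is a made-up name.
- Uploading the ISO and image assets, a separate per-asset call, is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// releaseOptions is a subset of the JSON body Gitea accepts when creating a release.
type releaseOptions struct {
	TagName    string `json:"tag_name"`
	Name       string `json:"name"`
	Body       string `json:"body"`
	Draft      bool   `json:"draft"`
	Prerelease bool   `json:"prerelease"`
}

// newReleaseRequest builds, but does not send, the create-release request.
func newReleaseRequest(baseURL, owner, repo, token, tag string) (*http.Request, error) {
	payload, err := json.Marshal(releaseOptions{TagName: tag, Name: tag})
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/api/v1/repos/%s/%s/releases", baseURL, owner, repo)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Gitea accepts access tokens in the "token <value>" form.
	req.Header.Set("Authorization", "token "+token)
	return req, nil
}

func main() {
	// Hypothetical host, owner, repo and token.
	req, err := newReleaseRequest("https://gitea.example.com", "example-org", "kubesolo-os", "REDACTED", "v0.3.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST https://gitea.example.com/api/v1/repos/example-org/kubesolo-os/releases
	// Sending it would be: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it keeps the URL and header logic testable without a live Gitea instance.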
### Known issues carried forward to v0.3.2
These don't block normal operation but are tracked:
- The `xt_comment` userspace extension fails to load on the iptables-nft path, so kubelet skips installing its KUBE-FIREWALL rule. Reported as `Couldn't load match 'comment'` in the boot log. kubelet continues without the localhost-drop rule.
- The `containerd-shim-runc-v2 -info` probe reports `runc: executable file not found in $PATH`. Cosmetic: containerd uses the absolute path from its config when actually launching containers.
- kube-proxy's conntrack cleanup logs `Failed to list conntrack entries: invalid argument` every cleanup cycle. Probably needs `CONFIG_NF_CONNTRACK_PROCFS` or netlink-glue tweaks.
- Several pods restart 1–2 times on first boot due to a PLEG / runtime-probe race in the kubelet startup path. The pods stabilise afterwards.
## [0.3.0] - 2026-05-14
The main themes: generic ARM64 (not just Raspberry Pi), an honest update lifecycle with state file + metrics, OCI multi-arch distribution via ghcr.io, and policy gates (channels, maintenance windows, version stepping-stones, pre-flight checks, auto-rollback).
### Added
- Generic ARM64 build track distinct from Raspberry Pi:
  - `make kernel-arm64` builds a mainline kernel.org LTS kernel (6.12.10 by default). Inputs: arm64 `defconfig`, the shared `kernel-container.fragment`, and arm64 virt-host enables (`VIRTIO_*`, `EFI_STUB`, NVMe).
  - `make disk-image-arm64` produces a UEFI-bootable raw GPT image with A/B system partitions and GRUB-EFI ARM64. Targets QEMU virt, Graviton, Ampere, or any UEFI ARM64 host.
  - `hack/dev-vm-arm64.sh --disk` boots the built image through QEMU UEFI for end-to-end testing.
  - `test/qemu/test-boot-arm64-disk.sh`: automated boot smoke test.
- Bumped KubeSolo to v1.1.5 (was v1.1.0). New cloud-init flags surfaced:
  - `kubesolo.full` (v1.1.4+): disable edge-optimised overrides
  - `kubesolo.disable-ipv6` (v1.1.5+)
  - `kubesolo.db-wal-repair` (v1.1.5+): recover from unclean shutdowns
- Per-arch supply-chain verification: `KUBESOLO_SHA256_AMD64` and `KUBESOLO_SHA256_ARM64` in `versions.env`, applied to the tarball before extraction.
- `docs/arm64-architecture.md`: defines the generic-vs-RPi two-track layout.
- `docs/arm64-status.md`: Phase 3 status snapshot, known limitations, and what's needed to ship.
- `docs/ci-runners.md`: Gitea Actions runner setup (Odroid arm64-linux).
- Update agent state machine and observability (`update/pkg/state`):
  - Persistent on-disk `state.json` at `/var/lib/kubesolo/update/state.json` (atomic write via tmp + rename). Records:
    - Phase (Idle / Checking / Downloading / Staged / Activated / Verifying / Success / RolledBack / Failed)
    - FromVersion, ToVersion
    - StartedAt, UpdatedAt
    - LastError, AttemptCount, HealthCheckFailures
  - `apply`, `activate`, `healthcheck`, and `rollback` all transition state explicitly on entry / exit / failure. Errors land in LastError so `status` can show why.
  - `kubesolo-update status --json` emits the full state for orchestration tooling. Human-readable mode adds an "Update Lifecycle" section when not idle.
  - New Prometheus metrics: `kubesolo_update_phase{phase="..."}` (all 9 phase labels always emitted), `kubesolo_update_attempts_total`, `kubesolo_update_last_attempt_timestamp_seconds`.
- Channels, maintenance windows, version policy (`update/pkg/config`):
  - `/etc/kubesolo/update.conf` (key=value, comments, missing-OK) configures `server`, `channel`, `maintenance_window`, `pubkey`, `healthcheck_url`, and `auto_rollback_after`.
  - The `cloud-init` top-level `updates:` block writes `update.conf` on first boot. An empty block leaves any existing file alone.
  - `apply` enforces four gates before download:
    - maintenance window
    - channel match
    - runtime architecture match
    - the `min_compatible_version` stepping-stone

    All gate failures land in the state machine as Failed with a clear LastError.
  - `--force` bypasses the maintenance window and the node-block-label pre-flight.
  - `UpdateMetadata` JSON gains `channel`, `min_compatible_version`, and `architecture` (all optional, omitempty).
- OCI registry distribution (`update/pkg/oci`, ~280 LOC, 9 tests):
  - `kubesolo-update apply --registry ghcr.io/<org>/kubesolo-os --tag stable` pulls update artifacts from any OCI-compliant registry. Multi-arch indexes resolve to the `runtime.GOARCH`-matching manifest automatically.
  - Custom media types: `application/vnd.kubesolo.os.kernel.v1+octet-stream` and `application/vnd.kubesolo.os.initramfs.v1+gzip`. Annotations: `io.kubesolo.os.{version,channel,architecture,min_compatible_version,release_notes,release_date}`.
  - End-to-end digest verification from manifest to blobs via oras-go/v2.
  - `build/scripts/push-oci-artifact.sh` publishes per-arch artifacts via `oras`. Multi-arch index composition is documented inline.
  - Dependencies added (update module only): `oras.land/oras-go/v2` and transitive `opencontainers/{go-digest,image-spec}` + `golang.org/x/sync`.
- Pre-flight gates and deeper healthcheck (`update/pkg/health` extended, `update/pkg/partition` extended):
  - Free-space pre-flight on the passive partition (image + 10% headroom) via `partition.FreeBytes` / `HasFreeSpaceFor`.
  - Node-block-label pre-flight: refuses if the local K8s node carries `updates.kubesolo.io/block=true`. Silently allowed when there is no kubeconfig (air-gap). Skipped by `--force`.
  - `CheckKubeSystemReady` waits until every kube-system pod has held Running for ≥ N seconds (configurable via `--kube-system-settle`).
  - `CheckProbeURL` GETs an operator-supplied URL; 200 = pass. Configurable via `--healthcheck-url` or `healthcheck_url=` in update.conf.
  - `CheckDiskWritable` writes / fsyncs / reads / deletes a probe file under `/var/lib/kubesolo` to catch a wedged data partition.
  - `--auto-rollback-after N` (also `auto_rollback_after=` in update.conf): after N consecutive post-activation healthcheck failures, the agent calls `ForceRollback()` and the operator/init reboots. The counter resets to 0 on a clean pass.
- `.gitea/workflows/build-arm64.yaml`: full ARM64 build on the Odroid self-hosted runner. Triggers on push to main, tags, and workflow_dispatch. The boot smoke test is marked continue-on-error pending KVM or real-hardware validation.
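
The channel, architecture, and stepping-stone gates can be illustrated with a small Go sketch. Only the three optional JSON field names come from the changelog. Everything else is an assumption, not the agent's actual code:
- the `Version` field
- the `checkGates` and `versionLess` helpers
- the rule that an empty field means "no constraint"
- the simple dotted-number version comparison

The maintenance-window gate is omitted.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// UpdateMetadata keeps only the fields this sketch needs. The three optional
// fields are the ones named in the changelog; Version is a hypothetical extra.
type UpdateMetadata struct {
	Version              string `json:"version"`
	Channel              string `json:"channel,omitempty"`
	MinCompatibleVersion string `json:"min_compatible_version,omitempty"`
	Architecture         string `json:"architecture,omitempty"`
}

// versionLess compares dotted numeric versions ("v0.3.0" < "v0.10.0").
// Missing components count as 0; pre-release tags are not handled.
func versionLess(a, b string) bool {
	pa := strings.Split(strings.TrimPrefix(a, "v"), ".")
	pb := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x, _ = strconv.Atoi(pa[i])
		}
		if i < len(pb) {
			y, _ = strconv.Atoi(pb[i])
		}
		if x != y {
			return x < y
		}
	}
	return false
}

// checkGates returns an error describing the first gate that fails.
// An empty metadata field is treated as "no constraint" (an assumption).
func checkGates(m UpdateMetadata, channel, arch, running string) error {
	if m.Channel != "" && m.Channel != channel {
		return fmt.Errorf("channel mismatch: update is %q, node follows %q", m.Channel, channel)
	}
	if m.Architecture != "" && m.Architecture != arch {
		return fmt.Errorf("architecture mismatch: update is %q, node is %q", m.Architecture, arch)
	}
	if m.MinCompatibleVersion != "" && versionLess(running, m.MinCompatibleVersion) {
		return fmt.Errorf("running %s is older than min_compatible_version %s: install an intermediate release first",
			running, m.MinCompatibleVersion)
	}
	return nil
}

func main() {
	m := UpdateMetadata{Version: "v0.3.1", Channel: "stable", MinCompatibleVersion: "v0.3.0", Architecture: runtime.GOARCH}
	// A node on v0.2.0 is refused: it must install v0.3.0 first.
	fmt.Println(checkGates(m, "stable", runtime.GOARCH, "v0.2.0"))
}
```

The stepping-stone check refuses any jump that skips a required intermediate release. In that case, the operator installs the release named in `min_compatible_version` first.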
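
Emitting all 9 phase labels for `kubesolo_update_phase` means the metric behaves as a one-hot gauge. Every label is present on every scrape, with exactly one set to 1, so queries never see a series appear or vanish mid-update.

The Go sketch below renders that in the Prometheus text exposition format. Two things are assumptions: the label spelling (which copies the Phase names listed above) and the helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// phases lists the nine lifecycle phases, in lifecycle order.
var phases = []string{
	"Idle", "Checking", "Downloading", "Staged", "Activated",
	"Verifying", "Success", "RolledBack", "Failed",
}

// phaseSamples maps every phase to 0, except the current one, which gets 1.
// An unknown phase still yields all nine labels, all set to 0.
func phaseSamples(current string) map[string]int {
	s := make(map[string]int, len(phases))
	for _, p := range phases {
		s[p] = 0
	}
	if _, ok := s[current]; ok {
		s[current] = 1
	}
	return s
}

// renderPhaseMetric writes the gauge in Prometheus text exposition format,
// iterating the slice (not the map) so the output order is stable.
func renderPhaseMetric(current string) string {
	s := phaseSamples(current)
	var b strings.Builder
	b.WriteString("# TYPE kubesolo_update_phase gauge\n")
	for _, p := range phases {
		fmt.Fprintf(&b, "kubesolo_update_phase{phase=\"%s\"} %d\n", p, s[p])
	}
	return b.String()
}

func main() {
	fmt.Print(renderPhaseMetric("Staged"))
}
```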
### Changed
- `build/scripts/build-kernel-arm64.sh` is now the generic ARM64 kernel build (mainline kernel.org LTS, generic UEFI/virtio).
- Renamed `build/scripts/build-kernel-rpi.sh` (was `build-kernel-arm64.sh`). The RPi kernel build (raspberrypi/linux fork, `bcm2711_defconfig`) lives here now.
- Renamed `build/config/kernel-container.fragment` (was `rpi-kernel-config.fragment`). The old name was a misnomer: the contents are arch-agnostic and now shared across x86, ARM64-generic, and RPi kernels.
- `build/scripts/build-kernel.sh` (x86) refactored to consume the shared fragment via a generic `apply_fragment` function, removing ~50 lines of duplication.
- `KUBESOLO_VERSION` moved out of the `fetch-components.sh` defaults into `versions.env`. Bumping is now a one-line PR.
### Fixed
- Native ARM64 build hosts (e.g. an Odroid runner) no longer require the x86 cross-compiler. Both `build-kernel-arm64.sh` and `build-kernel-rpi.sh` check `uname -m` and use the host's gcc directly when the arch matches.
- ARM64 grub.cfg console ordering: `ttyAMA0` is now the primary console (`console=ttyS0,... console=ttyAMA0,...`). The kernel uses the last `console=` entry as `/dev/console`. Init output is now visible on QEMU virt and most ARM64 SBCs without further configuration.
- ARM64 boot: replaced piCore64's `/init` with our staged init at `/init` and `/sbin/init`. Previously the kernel ran piCore's TCE handler, which segfaulted in our environment.
- ARM64 boot: replaced piCore64's broken dynamic BusyBox with the build host's `busybox-static`. piCore's binary triggered EL0 instruction-abort panics on QEMU virt under both `-cpu cortex-a72` and `-cpu max`.
- POSIX-character-class portability: `tr -d '[:space:]'` in `30-kernel-modules.sh` and `40-sysctl.sh` replaced with an explicit `' \t\r\n'`. Ubuntu's busybox-static 1.30.1 doesn't parse `[:space:]`. It instead deletes the literal characters `[ : s p a c e ]`, which mangled module names (`virtio_net` → `virtio_nt`, etc.) and sysctl keys.
- `inject-kubesolo.sh` no longer copies `init/lib/functions.sh` into `init.d/`. Previously the main init loop tried to run it as a stage after stage 90 and panicked with "Init completed without exec'ing KubeSolo".
- ARM64 disk image: `TARGET_ARCH=arm64 create-disk-image.sh` produces `BOOTAA64.EFI` via `grub-mkimage -O arm64-efi` (not `bootx64.efi`) and skips the BIOS-only `grub-install --target=i386-pc` step.
- `build/Dockerfile.builder`: added `grub-efi-amd64-bin`, `grub-efi-arm64-bin`, `grub-pc-bin`, `grub-common`, `grub2-common`, and `busybox-static`. With these, the Docker-based build flow can produce ARM64 disk images and gets the same BusyBox swap behaviour as native builds.
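
The effect of the `[:space:]` bug is easy to reproduce. The Go sketch below treats the set argument literally, as the affected `tr` did. `deleteChars` is a hypothetical helper, not part of the init scripts.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteChars mimics `tr -d SET` for a literal set: every rune of s that
// appears anywhere in set is removed. No POSIX class parsing is done.
func deleteChars(s, set string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(set, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// Read literally, "[:space:]" is the set '[', ':', 's', 'p', 'a', 'c', 'e', ']'.
	fmt.Printf("%q\n", deleteChars("virtio_net\n", "[:space:]")) // "virtio_nt\n"
	// The explicit set from the fix removes only real whitespace.
	fmt.Printf("%q\n", deleteChars("virtio_net\n", " \t\r\n")) // "virtio_net"
}
```

Applied to `net.ipv4.ip_forward`, the literal set yields `nt.iv4.i_forwrd`, which is why sysctl keys broke as well.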
### Known limitations (deferred to follow-up)
- ARM64 `LABEL=` resolution doesn't work yet:
  - piCore's `blkid`/`findfs` crash in QEMU, and our static busybox lacks those applets.
  - As a workaround, `/dev/vda4` is hardcoded in `build/grub/grub-arm64.cfg`.
  - Production fix: ship static `blkid`/`findfs`, or replace LABEL resolution with a sysfs walk.
- AppArmor profile load fails on ARM64 (`apparmor_parser` ABI mismatch). Init reports it; boot continues without enforcement.
- OCI signature verification is deferred. The HTTP transport still honours `--pubkey` for `.sig` files. The OCI transport is digest-verified end-to-end via oras-go but does not yet consume cosign-style referrer attestations. Targeted for v0.3.1.
- Real-hardware validation of the generic ARM64 image is still pending. The image builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
- QEMU TCG performance can trigger KubeSolo's first-boot image-import deadline. This is not a defect in the OS itself; real hardware and KVM-accelerated QEMU complete the import in seconds.
## [0.2.0] - 2026-02-12
### Added
- Cloud-init: support all documented KubeSolo CLI flags (`--local-storage-shared-path`, `--debug`, `--pprof-server`, `--portainer-edge-id`, `--portainer-edge-key`, `--portainer-edge-async`)
- Cloud-init: `full-config.yaml` example showing all supported parameters
- Cloud-init: KubeSolo configuration reference table in `docs/cloud-init.md`
- Security hardening: mount hardening, sysctl, kernel module lock, AppArmor profiles
- ARM64 Raspberry Pi support with A/B boot via tryboot
- BootEnv abstraction for GRUB and RPi boot environments
- Go 1.25.5 installed on host for native builds
## [0.1.0] - 2026-02-12
First release with all 5 design-doc phases complete. ISO boots and runs K8s pods.
### Added
#### Custom Kernel
- Custom kernel build (6.18.2-tinycore64) with container-critical configs
- Added CONFIG_CGROUP_BPF, CONFIG_DEVTMPFS, CONFIG_DEVTMPFS_MOUNT, CONFIG_MEMCG, CONFIG_CFS_BANDWIDTH
- Stripped unnecessary subsystems (sound, GPU, wireless, Bluetooth, etc.)
- Selective kernel module install — only modules.list + transitive deps in initramfs
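
Selecting "modules.list + transitive deps" is a graph closure over `modules.dep`. Each line of that file maps a module path to its direct dependencies (`kernel/a.ko: kernel/b.ko kernel/c.ko`).

The Go sketch below works under simplifying assumptions:
- module files are uncompressed `.ko`
- there is no `-`/`_` name aliasing
- the sample data is made up

The function names are hypothetical, and the actual build script may resolve dependencies differently.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"sort"
	"strings"
)

// modName maps "kernel/net/netfilter/nft_numgen.ko" to "nft_numgen".
func modName(p string) string {
	return strings.TrimSuffix(path.Base(p), ".ko")
}

// parseModulesDep reads modules.dep text into two maps:
// module name -> path, and path -> direct dependency paths.
func parseModulesDep(dep string) (map[string]string, map[string][]string) {
	byName := map[string]string{}
	deps := map[string][]string{}
	sc := bufio.NewScanner(strings.NewReader(dep))
	for sc.Scan() {
		mod, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // not a "path: deps" line
		}
		mod = strings.TrimSpace(mod)
		byName[modName(mod)] = mod
		deps[mod] = strings.Fields(rest)
	}
	return byName, deps
}

// closure returns the sorted set of module paths needed for the wanted
// names, following dependencies transitively. Unknown names are skipped.
func closure(wanted []string, byName map[string]string, deps map[string][]string) []string {
	seen := map[string]bool{}
	var visit func(p string)
	visit = func(p string) {
		if seen[p] {
			return // already included; also guards against cycles
		}
		seen[p] = true
		for _, d := range deps[p] {
			visit(d)
		}
	}
	for _, name := range wanted {
		if p, ok := byName[name]; ok {
			visit(p)
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up sample in modules.dep format; snd.ko is not requested, so it is left out.
	dep := "kernel/net/netfilter/nft_numgen.ko: kernel/net/netfilter/nf_tables.ko\n" +
		"kernel/net/netfilter/nf_tables.ko: kernel/net/netfilter/nfnetlink.ko\n" +
		"kernel/net/netfilter/nfnetlink.ko:\n" +
		"kernel/sound/core/snd.ko:\n"
	byName, deps := parseModulesDep(dep)
	fmt.Println(closure([]string{"nft_numgen"}, byName, deps))
}
```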
#### Init System (Phase 1)
- POSIX sh init system with staged boot (00-early-mount through 90-kubesolo)
- switch_root from initramfs to SquashFS root
- Persistent data partition mount with bind-mounts for K8s state
- Kernel module loading, sysctl tuning, network, hostname, NTP
- Emergency shell fallback on boot failure
- Device node creation via mknod fallback from sysfs
#### Cloud-Init (Phase 2)
- Go-based cloud-init parser (~2.7 MB static binary)
- Network configuration: DHCP and static IP modes
- Hostname and machine-id generation
- KubeSolo configuration (node-name, extra flags)
- Portainer Edge Agent integration via K8s manifest injection
- Persistent config saved to /mnt/data/ for next-boot fast path
- 22 Go tests
#### A/B Atomic Updates (Phase 3)
- 4-partition GPT disk image: EFI + System A + System B + Data
- GRUB 2 bootloader with A/B slot selection and boot counter rollback
- Go update agent (~6.0 MB static binary) with check, apply, activate, rollback commands
- Health check: containerd + K8s API + node Ready verification
- Update server protocol: HTTP serving latest.json + image files
- K8s CronJob for automated update checks (every 6 hours)
- Zero external Go dependencies — uses kubectl/ctr exec commands
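
A/B slot selection with boot-counter rollback can be modelled as a tiny state machine. The real logic lives in the GRUB configuration and the Go update agent; the Go sketch below is only a model of the idea. The field names, the `maxAttempts` value, and the `activate`/`chooseSlot` helpers are all hypothetical.

```go
package main

import "fmt"

// bootEnv models the state an A/B scheme keeps across reboots (for example
// in the GRUB environment block). All field names are hypothetical.
type bootEnv struct {
	Active    string // slot to boot: "A" or "B"
	Passive   string // the other slot
	BootsLeft int    // boot attempts left before rollback
	Confirmed bool   // set once the post-boot health check passes
}

const maxAttempts = 3 // hypothetical attempt budget

// activate points the next boot at the passive slot and arms the counter.
func activate(env *bootEnv) {
	env.Active, env.Passive = env.Passive, env.Active
	env.BootsLeft = maxAttempts
	env.Confirmed = false
}

// chooseSlot runs once per boot and returns the slot to boot.
func chooseSlot(env *bootEnv) string {
	if env.Confirmed {
		return env.Active // healthy slot: boot it as-is
	}
	if env.BootsLeft <= 0 {
		// Attempts exhausted without a passing health check: roll back
		// to the other slot, which in this model was the last known good.
		env.Active, env.Passive = env.Passive, env.Active
		env.Confirmed = true
		return env.Active
	}
	env.BootsLeft-- // consume one attempt on the unconfirmed slot
	return env.Active
}

func main() {
	env := &bootEnv{Active: "A", Passive: "B", Confirmed: true}
	activate(env) // stage slot B
	for boot := 1; boot <= maxAttempts+1; boot++ {
		// The health check never passes in this run, so B is abandoned.
		fmt.Printf("boot %d -> slot %s\n", boot, chooseSlot(env))
	}
}
```

In this model, the health check (containerd + K8s API + node Ready) would set `Confirmed = true` after a good boot. A real implementation must also persist each change before the next reboot.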
#### Production Hardening (Phase 4)
- Ed25519 image signing with pure Go stdlib (zero external deps)
- Key generation, signing, and verification CLI commands
- Portainer Edge Agent deployment via cloud-init
- SSH extension injection for debugging (hack/inject-ssh.sh)
- Boot time and resource usage benchmarks
- Deployment guide documentation
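
Ed25519 signing needs nothing beyond Go's standard `crypto/ed25519` package, which is what makes a zero-dependency signer possible. A minimal sketch follows, with three caveats:
- The helper names are assumptions.
- Signing the image bytes directly, rather than a digest of a streamed file, is also an assumption.
- The fixed all-zero seed makes the demo reproducible and must never be used for real keys.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// demoKeyPair derives a key pair from a fixed all-zero seed so the demo is
// reproducible. Real keys should come from ed25519.GenerateKey with crypto/rand.
func demoKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	priv := ed25519.NewKeyFromSeed(make([]byte, ed25519.SeedSize))
	return priv.Public().(ed25519.PublicKey), priv
}

// signImage returns a detached 64-byte signature over the image bytes.
func signImage(priv ed25519.PrivateKey, image []byte) []byte {
	return ed25519.Sign(priv, image)
}

// verifyImage reports whether sig is a valid signature of image under pub.
// ed25519.Verify panics on a wrongly sized public key, so check that first.
func verifyImage(pub ed25519.PublicKey, image, sig []byte) bool {
	if len(pub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(pub, image, sig)
}

func main() {
	pub, priv := demoKeyPair()
	image := []byte("example system image contents")
	sig := signImage(priv, image)
	fmt.Println(len(sig), verifyImage(pub, image, sig)) // 64 true
}
```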
#### Distribution & Fleet Management (Phase 5)
- Gitea Actions CI/CD (test + build + shellcheck on push, release on tags)
- OCI container image packaging (scratch-based)
- Prometheus metrics endpoint (zero-dependency text exposition format)
- USB provisioning script with cloud-init injection
- ARM64 cross-compilation support
#### Build System
- Makefile with full build orchestration
- Dockerized reproducible builds (build/Dockerfile.builder)
- Component fetching with version pinning
- ISO and raw disk image creation
- Fast rebuild path (`make quick`)
#### Documentation
- Architecture design document
- Boot flow reference
- A/B update flow reference
- Cloud-init configuration reference
- Deployment and operations guide
### Fixed
- Replaced `grep -oP` with POSIX-safe `sed` in functions.sh (BusyBox compatibility)
- Replaced `grep -qiE` with a `grep -qi -e` pattern (POSIX compliance)
- Fixed KVM flag handling in dev-vm.sh (bash array context)
- Added iptables table pre-initialization before kube-proxy start (nf_tables issue)
- Added /dev/kmsg and /etc/machine-id creation for kubelet
- Added CA certificates bundle to initramfs (containerd TLS verification for Docker Hub)
- Added DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate resolv.conf
- Added headless Service to Portainer Edge Agent manifest (agent peer discovery DNS)
- Added kubesolo.edge_id/edge_key kernel boot parameters for Portainer Edge
- Added auto-format of unformatted data disks on first boot
- Rewrote dev-vm.sh for macOS: bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding
- Kubeconfig now served via HTTP on port 8080 (serial console truncates base64 lines)
- Added 127.0.0.1 and 10.0.2.15 to API server SANs for QEMU port forwarding
- dev-vm.sh now works on Linux: fallback ISO extraction via isoinfo or loop mount, KVM auto-detection, platform-aware error messages