Promote VERSION from 0.3.0-dev to 0.3.0.

Finalise CHANGELOG entry with phases 5-8 work (state machine + metrics, channels + maintenance windows, OCI multi-arch distribution, pre-flight gates + deeper healthcheck + auto-rollback). Refresh README quick-start to show both x86_64 and generic ARM64 paths; update the roadmap status table to mark all v0.3 phases complete and explicitly track the v0.3.1 follow-ups (OCI cosign, LABEL=KSOLODATA on ARM64, real-hardware validation). Add docs/release-notes-0.3.0.md as the operator-facing summary, including a v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages, shellcheck across init / build / test / hack scripts under the v0.3 severity policy.

Tagging is intentionally NOT done from this commit; that's a manual step so the operator can decide when v0.3.0 is final. After tagging:

    git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
    git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml which runs the full ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changelog
All notable changes to KubeSolo OS are documented in this file.
Format based on Keep a Changelog, versioning follows Semantic Versioning.
[0.3.0] - 2026-05-14
The main themes: generic ARM64 (not just Raspberry Pi), an honest update lifecycle with state file + metrics, OCI multi-arch distribution via ghcr.io, and policy gates (channels, maintenance windows, version stepping-stones, pre-flight checks, auto-rollback).
Added
- Generic ARM64 build track distinct from Raspberry Pi:
  - make kernel-arm64 builds a mainline kernel.org LTS kernel (6.12.10 by default) from arm64 defconfig + shared kernel-container.fragment + arm64 virt-host enables (VIRTIO_*, EFI_STUB, NVMe).
  - make disk-image-arm64 produces a UEFI-bootable raw GPT image with A/B system partitions and GRUB-EFI ARM64. Targets QEMU virt, Graviton, Ampere, or any UEFI ARM64 host.
  - hack/dev-vm-arm64.sh --disk boots the built image through QEMU UEFI for end-to-end testing.
  - test/qemu/test-boot-arm64-disk.sh: automated boot smoke test.
- Bumped KubeSolo to v1.1.5 (was v1.1.0). New cloud-init flags surfaced:
  - kubesolo.full (v1.1.4+): disable edge-optimised overrides
  - kubesolo.disable-ipv6 (v1.1.5+)
  - kubesolo.db-wal-repair (v1.1.5+): recover from unclean shutdowns
- Per-arch supply-chain verification: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in versions.env, applied to the tarball before extract.
- docs/arm64-architecture.md: defines the generic-vs-RPi two-track layout.
- docs/arm64-status.md: Phase 3 status snapshot, known limitations, what's needed to ship.
- docs/ci-runners.md: Gitea Actions runner setup (Odroid arm64-linux).
- Update agent state machine and observability (update/pkg/state):
  - Persistent on-disk state.json at /var/lib/kubesolo/update/state.json (atomic write via tmp + rename). Records Phase (Idle / Checking / Downloading / Staged / Activated / Verifying / Success / RolledBack / Failed), FromVersion, ToVersion, StartedAt, UpdatedAt, LastError, AttemptCount, HealthCheckFailures.
  - apply, activate, healthcheck, rollback all transition state explicitly on entry / exit / failure. Errors land in LastError so status can show why.
  - kubesolo-update status --json emits the full state for orchestration tooling. Human-readable mode adds an "Update Lifecycle" section when not idle.
  - New Prometheus metrics: kubesolo_update_phase{phase="..."} (all 9 phase labels always emitted), kubesolo_update_attempts_total, kubesolo_update_last_attempt_timestamp_seconds.
- Channels, maintenance windows, version policy (update/pkg/config):
  - /etc/kubesolo/update.conf (key=value, comments, missing-OK) configures server, channel, maintenance_window, pubkey, healthcheck_url, auto_rollback_after.
  - cloud-init top-level updates: block writes update.conf on first boot. Empty block leaves any existing file alone.
  - apply enforces four gates before download: maintenance window, channel match, runtime architecture match, min_compatible_version stepping-stone. All gate failures land in the state machine as Failed with a clear LastError.
  - --force bypasses window + node-block-label.
  - UpdateMetadata JSON gains channel, min_compatible_version, architecture (all optional, omitempty).
- OCI registry distribution (update/pkg/oci, ~280 LOC, 9 tests):
  - kubesolo-update apply --registry ghcr.io/<org>/kubesolo-os --tag stable pulls update artifacts from any OCI-compliant registry. Multi-arch indexes resolve to the runtime.GOARCH-matching manifest automatically.
  - Custom media types: application/vnd.kubesolo.os.kernel.v1+octet-stream and application/vnd.kubesolo.os.initramfs.v1+gzip. Annotations: io.kubesolo.os.{version,channel,architecture,min_compatible_version,release_notes,release_date}.
  - End-to-end digest verification from manifest to blobs via oras-go/v2.
  - build/scripts/push-oci-artifact.sh publishes per-arch artifacts via oras. Multi-arch index composition documented inline.
  - Dependencies added (update module only): oras.land/oras-go/v2 and transitive opencontainers/{go-digest,image-spec} + golang.org/x/sync.
- Pre-flight gates and deeper healthcheck (update/pkg/health extended, update/pkg/partition extended):
  - Free-space pre-flight on the passive partition (image + 10% headroom) via partition.FreeBytes/HasFreeSpaceFor.
  - Node-block-label pre-flight: refuses if the local K8s node carries updates.kubesolo.io/block=true. Silently allowed when no kubeconfig (air-gap). Skipped by --force.
  - CheckKubeSystemReady waits until every kube-system pod has held Running for ≥ N seconds (configurable via --kube-system-settle).
  - CheckProbeURL GETs an operator-supplied URL; 200 = pass. Configurable via --healthcheck-url or healthcheck_url= in update.conf.
  - CheckDiskWritable writes / fsyncs / reads / deletes a probe file under /var/lib/kubesolo to catch a wedged data partition.
  - --auto-rollback-after N (also auto_rollback_after= in update.conf): after N consecutive post-activation healthcheck failures, the agent calls ForceRollback() and the operator/init reboots. Reset to 0 on a clean pass.
- .gitea/workflows/build-arm64.yaml: full ARM64 build on the Odroid self-hosted runner. Triggers on push to main, tags, and workflow_dispatch. Boot smoke test marked continue-on-error pending KVM or real-hardware validation.
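The state.json item in the list above mentions an atomic write via tmp + rename. A minimal Go sketch of that pattern follows. The JSON key names, the helper names (saveAtomic, loadState, roundTrip) and the temp-file naming are assumptions made for illustration; this is not the actual update/pkg/state code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// State mirrors the fields the changelog lists for state.json.
// The JSON key names are assumptions; the real schema may differ.
type State struct {
	Phase               string    `json:"phase"`
	FromVersion         string    `json:"from_version"`
	ToVersion           string    `json:"to_version"`
	StartedAt           time.Time `json:"started_at"`
	UpdatedAt           time.Time `json:"updated_at"`
	LastError           string    `json:"last_error,omitempty"`
	AttemptCount        int       `json:"attempt_count"`
	HealthCheckFailures int       `json:"health_check_failures"`
}

// saveAtomic writes s to a temp file in the same directory as path and
// then renames it over path, so a reader never sees a half-written file.
func saveAtomic(path string, s State) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush contents before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadState reads the state file back from disk.
func loadState(path string) (State, error) {
	var s State
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	return s, json.Unmarshal(data, &s)
}

// roundTrip saves s into a throwaway directory and loads it again.
func roundTrip(s State) (State, error) {
	dir, err := os.MkdirTemp("", "kubesolo-state-")
	if err != nil {
		return State{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	if err := saveAtomic(path, s); err != nil {
		return State{}, err
	}
	return loadState(path)
}

func main() {
	now := time.Now().UTC()
	got, err := roundTrip(State{Phase: "Staged", FromVersion: "0.2.0", ToVersion: "0.3.0",
		StartedAt: now, UpdatedAt: now, AttemptCount: 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(got.Phase, got.FromVersion, "->", got.ToVersion)
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the replacement atomic on POSIX systems.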
Changed
- build/scripts/build-kernel-arm64.sh is now the generic ARM64 kernel build (mainline kernel.org LTS, generic UEFI/virtio).
- Renamed build/scripts/build-kernel-rpi.sh (was build-kernel-arm64.sh). RPi kernel build (raspberrypi/linux fork, bcm2711_defconfig) lives here now.
- Renamed build/config/kernel-container.fragment (was rpi-kernel-config.fragment). Misnomer: contents are arch-agnostic and now shared across x86, ARM64-generic, and RPi kernels.
- build/scripts/build-kernel.sh (x86) refactored to consume the shared fragment via a generic apply_fragment function. ~50 lines of duplication killed.
- KUBESOLO_VERSION moved out of fetch-components.sh defaults into versions.env. Bumping is now a one-line PR.
Fixed
- Native ARM64 build hosts (e.g. an Odroid runner) no longer require the x86 cross-compiler. Both build-kernel-arm64.sh and build-kernel-rpi.sh detect uname -m and use the host's gcc directly when arch matches.
- ARM64 grub.cfg console ordering: ttyAMA0 is now the primary console (console=ttyS0,... console=ttyAMA0,...). Init output is now visible on QEMU virt and most ARM64 SBCs without further configuration.
- ARM64 boot: replaced piCore64's /init with our staged init at /init and /sbin/init. Previously the kernel ran piCore's TCE handler, which segfaulted in our environment.
- ARM64 boot: replaced piCore64's broken dynamic BusyBox with the build host's busybox-static. piCore's binary triggered EL0 instruction-abort panics on QEMU virt under both -cpu cortex-a72 and -cpu max.
- POSIX-character-class portability: tr -d '[:space:]' in 30-kernel-modules.sh and 40-sysctl.sh replaced with explicit ' \t\r\n'. Ubuntu's busybox-static 1.30.1 doesn't parse [:space:] and instead deletes the literal characters [ : s p a c e ], which truncated module names (virtio_net → virtio_nt, etc.) and sysctl keys.
- inject-kubesolo.sh no longer copies init/lib/functions.sh into init.d/. Previously the main init loop tried to run it as a stage after stage 90 and panicked with "Init completed without exec'ing KubeSolo".
- ARM64 disk image: TARGET_ARCH=arm64 create-disk-image.sh produces BOOTAA64.EFI via grub-mkimage -O arm64-efi (not bootx64.efi). Skips the BIOS-only grub-install --target=i386-pc step.
- build/Dockerfile.builder: added grub-efi-amd64-bin, grub-efi-arm64-bin, grub-pc-bin, grub-common, grub2-common, and busybox-static so the Docker-based build flow can produce ARM64 disk images and gets the same BusyBox swap behaviour as native builds.
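The character-class fix in the list above is easier to see with a small model. The Go sketch below is not the init code (which is POSIX sh); it only imitates a tr -d that treats its argument as a plain set of characters, which is the behaviour the entry attributes to the affected busybox build. The function name deleteSet is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteSet removes every rune of s that appears in set, treating set as
// a literal list of characters with no character-class expansion.
func deleteSet(s, set string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(set, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// Class read literally: the set is '[', ':', 's', 'p', 'a', 'c', 'e', ']'.
	// The 'e' is stripped and the trailing newline survives.
	fmt.Printf("%q\n", deleteSet("virtio_net\n", "[:space:]")) // "virtio_nt\n"

	// Explicit whitespace set, as used by the fix.
	fmt.Printf("%q\n", deleteSet("virtio_net\n", " \t\r\n")) // "virtio_net"
}
```

The first call reproduces the virtio_net → virtio_nt truncation from the entry; the second shows why spelling out the whitespace characters sidesteps the parsing difference.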
Known limitations (deferred to follow-up)
- ARM64 LABEL= resolution doesn't work yet: piCore's blkid/findfs crash in QEMU and our static busybox lacks the applets. Hardcoded /dev/vda4 as a workaround in build/grub/grub-arm64.cfg. Production fix: ship static blkid/findfs or replace LABEL resolution with a sysfs walk.
- AppArmor profile load fails on ARM64 (apparmor_parser ABI mismatch). Init reports it; boot continues without enforcement.
- OCI signature verification is deferred. The HTTP transport still honours --pubkey for .sig files; the OCI transport is digest-verified end-to-end via oras-go but does not yet consume cosign-style referrer attestations. Targeted for v0.3.1.
- Real-hardware validation of the generic ARM64 image is still pending. Builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
- QEMU TCG performance can trigger KubeSolo's first-boot image-import deadline. Not a defect in the OS itself; real hardware and KVM-accelerated QEMU complete the import in seconds.
[0.2.0] - 2026-02-12
Added
- Cloud-init: support all documented KubeSolo CLI flags (--local-storage-shared-path, --debug, --pprof-server, --portainer-edge-id, --portainer-edge-key, --portainer-edge-async)
- Cloud-init: full-config.yaml example showing all supported parameters
- Cloud-init: KubeSolo configuration reference table in docs/cloud-init.md
- Security hardening: mount hardening, sysctl, kernel module lock, AppArmor profiles
- ARM64 Raspberry Pi support with A/B boot via tryboot
- BootEnv abstraction for GRUB and RPi boot environments
- Go 1.25.5 installed on host for native builds
[0.1.0] - 2026-02-12
First release with all 5 design-doc phases complete. ISO boots and runs K8s pods.
Added
Custom Kernel
- Custom kernel build (6.18.2-tinycore64) with container-critical configs
- Added CONFIG_CGROUP_BPF, CONFIG_DEVTMPFS, CONFIG_DEVTMPFS_MOUNT, CONFIG_MEMCG, CONFIG_CFS_BANDWIDTH
- Stripped unnecessary subsystems (sound, GPU, wireless, Bluetooth, etc.)
- Selective kernel module install — only modules.list + transitive deps in initramfs
Init System (Phase 1)
- POSIX sh init system with staged boot (00-early-mount through 90-kubesolo)
- switch_root from initramfs to SquashFS root
- Persistent data partition mount with bind-mounts for K8s state
- Kernel module loading, sysctl tuning, network, hostname, NTP
- Emergency shell fallback on boot failure
- Device node creation via mknod fallback from sysfs
Cloud-Init (Phase 2)
- Go-based cloud-init parser (~2.7 MB static binary)
- Network configuration: DHCP and static IP modes
- Hostname and machine-id generation
- KubeSolo configuration (node-name, extra flags)
- Portainer Edge Agent integration via K8s manifest injection
- Persistent config saved to /mnt/data/ for next-boot fast path
- 22 Go tests
A/B Atomic Updates (Phase 3)
- 4-partition GPT disk image: EFI + System A + System B + Data
- GRUB 2 bootloader with A/B slot selection and boot counter rollback
- Go update agent (~6.0 MB static binary) with check, apply, activate, rollback commands
- Health check: containerd + K8s API + node Ready verification
- Update server protocol: HTTP serving latest.json + image files
- K8s CronJob for automated update checks (every 6 hours)
- Zero external Go dependencies — uses kubectl/ctr exec commands
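The boot counter rollback mentioned in the list above is implemented in GRUB, not Go. As a rough model of how such a counter typically behaves, the sketch below walks a failing slot through its remaining attempts and then falls back. The type, field names and exact policy are hypothetical and are not taken from the project's grub.cfg.

```go
package main

import "fmt"

// BootEnv is a hypothetical model of the variables a bootloader keeps for
// A/B selection; the names are illustrative, not the real GRUB env keys.
type BootEnv struct {
	ActiveSlot  string // "A" or "B"
	BootCounter int    // attempts left before falling back
	BootSuccess bool   // set by the OS once a boot has been verified healthy
}

// other returns the opposite slot.
func other(slot string) string {
	if slot == "A" {
		return "B"
	}
	return "A"
}

// nextBoot returns the slot to boot and the updated environment.
func nextBoot(env BootEnv) (string, BootEnv) {
	if env.BootSuccess {
		return env.ActiveSlot, env // slot already confirmed, nothing to count
	}
	if env.BootCounter <= 0 {
		// Attempts exhausted: switch back to the other slot, which this
		// model assumes is the last known-good system.
		env.ActiveSlot = other(env.ActiveSlot)
		env.BootSuccess = true
		return env.ActiveSlot, env
	}
	env.BootCounter-- // consume one attempt on the unconfirmed slot
	return env.ActiveSlot, env
}

func main() {
	// Slot B was just activated and never reports a healthy boot.
	env := BootEnv{ActiveSlot: "B", BootCounter: 3}
	for i := 1; i <= 4; i++ {
		var slot string
		slot, env = nextBoot(env)
		fmt.Printf("boot %d: slot %s, counter %d\n", i, slot, env.BootCounter)
	}
}
```

In this model the first three boots try slot B while the counter runs down to 0, and the fourth boot returns to slot A.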
Production Hardening (Phase 4)
- Ed25519 image signing with pure Go stdlib (zero external deps)
- Key generation, signing, and verification CLI commands
- Portainer Edge Agent deployment via cloud-init
- SSH extension injection for debugging (hack/inject-ssh.sh)
- Boot time and resource usage benchmarks
- Deployment guide documentation
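For readers unfamiliar with the standard-library route mentioned in the list above, here is a minimal sketch of Ed25519 signing and verification using only crypto/ed25519. The helper names (signImage, verifyImage, selfCheck) are hypothetical, and signing the raw image bytes directly is an assumption; the project's CLI may hash or frame the data differently.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signImage returns a detached Ed25519 signature over image.
func signImage(priv ed25519.PrivateKey, image []byte) []byte {
	return ed25519.Sign(priv, image)
}

// verifyImage reports whether sig is a valid signature of image under pub.
func verifyImage(pub ed25519.PublicKey, image, sig []byte) bool {
	return ed25519.Verify(pub, image, sig)
}

// selfCheck generates a throwaway key pair, signs a placeholder payload,
// and verifies both the original and a tampered copy.
func selfCheck() (bool, bool, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	image := []byte("placeholder image bytes")
	sig := signImage(priv, image)

	tampered := append([]byte(nil), image...)
	tampered[0] ^= 0xff // flip bits in the first byte

	return verifyImage(pub, image, sig), verifyImage(pub, tampered, sig), nil
}

func main() {
	original, tampered, err := selfCheck()
	if err != nil {
		panic(err)
	}
	fmt.Println("original verifies:", original)
	fmt.Println("tampered verifies:", tampered)
}
```

Only the public key is needed on the device that verifies updates; the private key stays with whoever publishes images.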
Distribution & Fleet Management (Phase 5)
- Gitea Actions CI/CD (test + build + shellcheck on push, release on tags)
- OCI container image packaging (scratch-based)
- Prometheus metrics endpoint (zero-dependency text exposition format)
- USB provisioning script with cloud-init injection
- ARM64 cross-compilation support
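The zero-dependency metrics endpoint in the list above relies on the Prometheus text exposition format being plain text that the standard library can emit. The sketch below renders one metric that way, borrowing the kubesolo_update_phase name and the nine phase names from the 0.3.0 entry as sample data. The gauge type, the HELP string, the label-value casing and the function names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// phases lists the lifecycle phases named in the 0.3.0 entry.
var phases = []string{
	"Idle", "Checking", "Downloading", "Staged", "Activated",
	"Verifying", "Success", "RolledBack", "Failed",
}

// phaseValue returns 1 for the current phase and 0 for every other one.
func phaseValue(current, phase string) int {
	if current == phase {
		return 1
	}
	return 0
}

// renderPhaseMetric emits one sample per phase so that every label is
// always present, with only the current phase set to 1.
func renderPhaseMetric(current string) string {
	var b strings.Builder
	b.WriteString("# HELP kubesolo_update_phase Current update lifecycle phase.\n")
	b.WriteString("# TYPE kubesolo_update_phase gauge\n")
	for _, p := range phases {
		fmt.Fprintf(&b, "kubesolo_update_phase{phase=%q} %d\n", p, phaseValue(current, p))
	}
	return b.String()
}

func main() {
	fmt.Print(renderPhaseMetric("Staged"))
}
```

Emitting a 0 sample for each inactive phase keeps the series set stable, so dashboards and alerts do not have to cope with labels appearing and disappearing.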
Build System
- Makefile with full build orchestration
- Dockerized reproducible builds (build/Dockerfile.builder)
- Component fetching with version pinning
- ISO and raw disk image creation
- Fast rebuild path (make quick)
Documentation
- Architecture design document
- Boot flow reference
- A/B update flow reference
- Cloud-init configuration reference
- Deployment and operations guide
Fixed
- Replaced grep -oP with POSIX-safe sed in functions.sh (BusyBox compatibility)
- Replaced grep -qiE with grep -qi -e pattern (POSIX compliance)
- Fixed KVM flag handling in dev-vm.sh (bash array context)
- Added iptables table pre-initialization before kube-proxy start (nf_tables issue)
- Added /dev/kmsg and /etc/machine-id creation for kubelet
- Added CA certificates bundle to initramfs (containerd TLS verification for Docker Hub)
- Added DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate resolv.conf
- Added headless Service to Portainer Edge Agent manifest (agent peer discovery DNS)
- Added kubesolo.edge_id/edge_key kernel boot parameters for Portainer Edge
- Added auto-format of unformatted data disks on first boot
- Rewrote dev-vm.sh for macOS: bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding
- Kubeconfig now served via HTTP on port 8080 (serial console truncates base64 lines)
- Added 127.0.0.1 and 10.0.2.15 to API server SANs for QEMU port forwarding
- dev-vm.sh now works on Linux: fallback ISO extraction via isoinfo or loop mount, KVM auto-detection, platform-aware error messages