# KubeSolo OS v0.3.0 — Release Notes
Released: 2026-05-14
v0.3.0 is the second feature release after v0.2.0 and the first release that ships a generic ARM64 build alongside x86_64. The update agent grew up: it now has an explicit on-disk lifecycle, OCI registry distribution, and a fleet-friendly set of policy gates (channels, maintenance windows, version-stepping-stones, pre-flight checks, auto-rollback).
This document is the operator-facing summary. The full per-phase changelog lives in CHANGELOG.md.
## What's new
### Generic ARM64 build
The image you build with `make disk-image-arm64` now targets any UEFI-capable
ARM64 host: AWS Graviton, Oracle Ampere, generic ARM64 servers, and future SBCs
with UEFI-compatible firmware. The kernel comes from kernel.org mainline LTS
(6.12.10 by default, configurable via `MAINLINE_KERNEL_VERSION` in
`build/config/versions.env`).
This is distinct from the Raspberry Pi build path. RPi keeps its
specialised kernel from `raspberrypi/linux` with `bcm-defconfig` + custom DTBs;
the generic ARM64 path uses mainline + `arm64-defconfig` + UEFI/virtio. See
`docs/arm64-architecture.md` for the file-by-file split.
### KubeSolo v1.1.5

KubeSolo is bumped to v1.1.5 (was v1.1.0). New flags surfaced via cloud-init:

- `kubesolo.full` — disable edge-optimised k8s overrides
- `kubesolo.disable-ipv6` — disable IPv6 cluster-wide
- `kubesolo.db-wal-repair` — recover from unclean shutdowns
### Update lifecycle is now observable
The update agent writes a `state.json` at `/var/lib/kubesolo/update/state.json`
recording where the current attempt is in the lifecycle:

```
idle → checking → downloading → staged → activated → verifying → success
                                                             ↘ rolled_back
                                                             ↘ failed
```
`kubesolo-update status --json` emits the full state for orchestration tooling.
The Prometheus metrics endpoint gains three new series:

- `kubesolo_update_phase{phase="..."}` — 1 for the current phase, 0 for the others (all 9 are always emitted)
- `kubesolo_update_attempts_total`
- `kubesolo_update_last_attempt_timestamp_seconds`
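For quick inspection without a full Prometheus stack, the phase gauge can be read straight off a scrape. A minimal sketch: the series name is from this release, but the helper name and the metrics port in the usage line are assumptions, not part of the agent.

```shell
# Hypothetical helper: print the phase whose gauge is currently 1.
# Feed it a raw Prometheus text-format scrape on stdin.
current_phase() {
  awk -F'"' '/^kubesolo_update_phase\{/ && / 1$/ { print $2 }'
}

# Usage (the metrics port is an assumption; adjust for your deployment):
# curl -s http://node:9100/metrics | current_phase
```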
### OCI registry distribution
Update artifacts can now be pulled from any OCI-compliant registry alongside
the existing HTTP `latest.json` protocol:

```sh
# HTTP, unchanged from v0.2:
kubesolo-update apply --server https://updates.example.com

# New: OCI from ghcr.io (or quay.io, harbor, zot, ...)
kubesolo-update apply --registry ghcr.io/yourorg/kubesolo-os --tag stable
```
Multi-arch is handled transparently — the same `stable` tag points at a
manifest index, and the agent picks the manifest matching its `runtime.GOARCH`.
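To see which architectures a tag actually covers, you can fetch the index with the `oras` CLI and filter the standard OCI image-index fields. A sketch only: `list_archs` is a hypothetical helper, and the registry path reuses the example from this section.

```shell
# Hypothetical helper: pull the architecture list out of an OCI image index.
# "architecture" under "platform" is standard OCI image-index JSON.
list_archs() {
  grep -o '"architecture":[[:space:]]*"[^"]*"' | cut -d'"' -f4
}

# Usage (registry path from the example above):
# oras manifest fetch ghcr.io/yourorg/kubesolo-os:stable | list_archs
```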
Publish your own artifacts with build/scripts/push-oci-artifact.sh. See
the script's header comment for the full publishing flow.
### Policy gates
`apply` now enforces five gates before destroying the passive slot:
- Maintenance window — configurable, e.g. `03:00-05:00`; windows wrapping midnight are supported
- Node-block-label — refuses if the K8s node carries `updates.kubesolo.io/block=true` (workload-author kill switch)
- Channel — `stable`/`beta`/`edge` must match between the artifact metadata and the local channel
- Architecture — refuses cross-arch artifacts via a `runtime.GOARCH` check
- Min compatible version — stepping-stone enforcement; refuses an upgrade that bypasses a required intermediate version
`--force` bypasses the maintenance window and node-block label (channel /
arch / min-version are non-negotiable). Failures are recorded in `state.json`
with a clear `LastError` field.
### Healthcheck deepening + auto-rollback
`kubesolo-update healthcheck` grew three optional probes:
- Kube-system pods must hold `Running` for ≥ N seconds before passing
- Operator probe URL — `GET` an operator-supplied endpoint; 200 = pass
- Disk smoke test — write/fsync/read/delete a probe file under
  `/var/lib/kubesolo` to catch a wedged data partition
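The disk smoke test is simple enough to replicate by hand when debugging a suspect data partition. A hypothetical shell stand-in (the agent implements this internally; `sync FILE` relies on coreutils' per-file sync, which a minimal busybox may lack):

```shell
# Hypothetical stand-in for the disk smoke probe:
# write, fsync, read back, then delete a probe file.
disk_smoke() {   # disk_smoke /var/lib/kubesolo
  local f="$1/.probe.$$"
  printf 'probe' > "$f" || return 1
  sync "$f" || { rm -f "$f"; return 1; }        # coreutils sync(1) fsyncs the file
  [ "$(cat "$f")" = "probe" ] || { rm -f "$f"; return 1; }
  rm -f "$f"
}
```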
Plus auto-rollback: with `--auto-rollback-after N` (or `auto_rollback_after=`
in `update.conf`), after N consecutive post-activation failures the agent
calls `ForceRollback()` and the operator/init is expected to reboot. The
counter resets on a clean pass.
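The counter semantics matter: a single flaky probe must not trigger a rollback, only an unbroken run of failures. A minimal illustrative model of the reset-on-pass behaviour (hypothetical helper names; where the agent actually stores the counter is internal):

```shell
# Hypothetical model of the auto-rollback counter: increments on each
# consecutive failure, resets to 0 on a clean pass, fires at the threshold.
next_failures() {   # next_failures CURRENT pass|fail
  if [ "$2" = pass ]; then echo 0; else echo $(( $1 + 1 )); fi
}

should_rollback() {   # should_rollback FAILURES THRESHOLD
  [ "$1" -ge "$2" ]
}
```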
### Persistent configuration via `/etc/kubesolo/update.conf`
Cloud-init writes this file on first boot from a new `updates:` block; you
can also hand-edit it. Recognised keys:
```ini
server = https://updates.example.com    # or omit if using registry
registry =                              # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
pubkey = /etc/kubesolo/update-pubkey.hex
healthcheck_url = http://localhost:8000/ready
auto_rollback_after = 3
```
Cloud-init full reference: `cloud-init/examples/full-config.yaml`.
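As a first-boot starting point, the same settings can be seeded from cloud-init. A sketch only: the `updates:` block is new in this release, but the exact key names below are assumed to mirror `update.conf`; verify against `cloud-init/examples/full-config.yaml`.

```yaml
# Assumed shape — key names are an assumption, not confirmed by this document.
updates:
  server: https://updates.example.com
  channel: stable
  maintenance_window: "03:00-05:00"
  auto_rollback_after: 3
```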
## Migration from v0.2.x
This is a non-breaking release for live systems. v0.2.x → v0.3.0 changes:
- `state.json` will appear at `/var/lib/kubesolo/update/state.json` the first time a v0.3 agent runs `apply`. Pre-existing v0.2 deployments without this file are fine — the agent treats a missing file as a fresh `Idle` state.
- `update.conf` is optional. v0.2 deployments that pass everything via CLI flags keep working unchanged.
- The HTTP `latest.json` protocol is unchanged. Existing update servers don't need a rebuild.
- The GRUB env (boot counter, active slot) is unchanged. The bootloader's rollback behaviour is the same.
- No new mandatory kernel command-line parameters.
To opt into the new lifecycle, transports, and gates, drop in an
`update.conf` (or update cloud-init) and switch to `--registry` if you want
OCI distribution.
## Known limitations
These shipped intentionally with v0.3.0 and are explicitly tracked for v0.3.1+:
- OCI signature verification — the OCI transport is digest-verified end-to-end via oras-go, but does not yet consume cosign-style referrer attestations. The HTTP transport still honours `--pubkey` for `.sig` files.
- ARM64 `LABEL=KSOLODATA` resolution doesn't work yet — piCore's `blkid`/`findfs` crash on QEMU virt under our mainline kernel, and the `busybox-static` we ship doesn't include those applets. `build/grub/grub-arm64.cfg` hardcodes `kubesolo.data=/dev/vda4` as a workaround. On real ARM64 hardware the device path may differ.
- Real-hardware ARM64 validation is pending. The image builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
- AppArmor profile load fails on ARM64 (`apparmor_parser` ABI mismatch). Init reports the failure; boot continues without AppArmor enforcement.
- QEMU TCG performance can trigger KubeSolo's first-boot image-import deadline. Not an OS defect; real hardware and KVM-accelerated QEMU complete the import in seconds.
## How to upgrade your build host
```sh
git pull
make distclean   # optional — drops the build cache; a full rebuild takes ~30 min
make iso         # or disk-image, or disk-image-arm64
```
The Docker-based builder (`make docker-build`) regenerates its own image
from `build/Dockerfile.builder` on its next invocation; oras 1.2.3 and
`busybox-static` are now included.
## Acknowledgements
v0.3.0 work was driven by a single multi-week pair-programming session
working through Phases 0–9 of the v0.3 roadmap. The Odroid self-hosted
Gitea Actions runner (`odroid.local`, arm64-linux) carried every ARM64
build during development.