kubesolo-os/docs/release-notes-0.3.0.md
Adolfo Delorenzo 3b47e7af68
release: v0.3.0
Promote VERSION from 0.3.0-dev to 0.3.0. Finalise CHANGELOG entry with
phases 5-8 work (state machine + metrics, channels + maintenance windows,
OCI multi-arch distribution, pre-flight gates + deeper healthcheck +
auto-rollback). Refresh README quick-start to show both x86_64 and generic
ARM64 paths; update the roadmap status table to mark all v0.3 phases
complete and explicitly track the v0.3.1 follow-ups (OCI cosign,
LABEL=KSOLODATA on ARM64, real-hardware validation).

Add docs/release-notes-0.3.0.md as the operator-facing summary, including a
v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the
known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages,
shellcheck across init / build / test / hack scripts under the v0.3
severity policy.

Tagging is intentionally NOT done from this commit — that's a manual step
so the operator can decide when v0.3.0 is final. After tagging:

  git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
  git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml which runs the full
ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:13:09 -06:00

KubeSolo OS v0.3.0 — Release Notes

Released: 2026-05-14

v0.3.0 is the second feature release after v0.2.0 and the first release that ships a generic ARM64 build alongside x86_64. The update agent grew up: it now has an explicit on-disk lifecycle, OCI registry distribution, and a fleet-friendly set of policy gates (channels, maintenance windows, version stepping stones, pre-flight checks, auto-rollback).

This document is the operator-facing summary. The full per-phase changelog lives in CHANGELOG.md.

What's new

Generic ARM64 build

The image you build with make disk-image-arm64 now targets any UEFI-capable ARM64 host: AWS Graviton, Oracle Ampere, generic ARM64 servers, future SBCs with UEFI-compatible firmware. The kernel comes from kernel.org mainline LTS (6.12.10 by default, configurable via MAINLINE_KERNEL_VERSION in build/config/versions.env).

This is distinct from the Raspberry Pi build path. RPi keeps its specialised kernel from raspberrypi/linux with bcm-defconfig + custom DTBs; the generic ARM64 path uses mainline + arm64-defconfig + UEFI/virtio. See docs/arm64-architecture.md for the file-by-file split.

KubeSolo v1.1.5

KubeSolo is bumped to v1.1.5 (from v1.1.0), and three new flags are surfaced via cloud-init:

  • kubesolo.full — disable edge-optimised k8s overrides
  • kubesolo.disable-ipv6 — disable IPv6 cluster-wide
  • kubesolo.db-wal-repair — recover from unclean shutdowns
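
How these flags map into cloud-init is defined by the full-config example, not here; as a purely hypothetical fragment (the key layout below is a guess, see cloud-init/examples/full-config.yaml for the real schema):

```yaml
# Hypothetical shape only -- the real key layout is defined by
# cloud-init/examples/full-config.yaml, not by this sketch.
kubesolo:
  full: false          # keep the edge-optimised k8s overrides
  disable-ipv6: true   # disable IPv6 cluster-wide
  db-wal-repair: true  # recover from unclean shutdowns
```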

Update lifecycle is now observable

The update agent writes a state.json at /var/lib/kubesolo/update/state.json recording where the current attempt is in the lifecycle:

idle → checking → downloading → staged → activated → verifying → success
                                                              ↘ rolled_back
                                                              ↘ failed
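
Assuming the documented phase names and the LastError field, a mid-download state.json might look like this (the other field names are illustrative, not the agent's actual schema):

```json
{
  "Phase": "downloading",
  "Attempts": 4,
  "LastAttempt": "2026-05-14T03:12:09Z",
  "LastError": ""
}
```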

kubesolo-update status --json emits the full state for orchestration tooling. The Prometheus metrics endpoint gains three new series:

  • kubesolo_update_phase{phase="..."} — 1 for current phase, 0 for others (all 9 always emitted)
  • kubesolo_update_attempts_total
  • kubesolo_update_last_attempt_timestamp_seconds
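
A scrape taken mid-download might look like the following (values are hypothetical; only the three series names above come from this release):

```
kubesolo_update_phase{phase="idle"} 0
kubesolo_update_phase{phase="checking"} 0
kubesolo_update_phase{phase="downloading"} 1
kubesolo_update_phase{phase="staged"} 0
kubesolo_update_phase{phase="activated"} 0
kubesolo_update_phase{phase="verifying"} 0
kubesolo_update_phase{phase="success"} 0
kubesolo_update_phase{phase="rolled_back"} 0
kubesolo_update_phase{phase="failed"} 0
kubesolo_update_attempts_total 4
kubesolo_update_last_attempt_timestamp_seconds 1.778812329e+09
```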

OCI registry distribution

Update artifacts can now be pulled from any OCI-compliant registry alongside the existing HTTP latest.json protocol:

# HTTP, unchanged from v0.2:
kubesolo-update apply --server https://updates.example.com

# New: OCI from ghcr.io (or quay.io, harbor, zot, ...)
kubesolo-update apply --registry ghcr.io/yourorg/kubesolo-os --tag stable

Multi-arch is handled transparently — the same stable tag points at a manifest index, the agent picks the manifest matching its runtime.GOARCH.

Publish your own artifacts with build/scripts/push-oci-artifact.sh. See the script's header comment for the full publishing flow.

Policy gates

apply now enforces five gates before destroying the passive slot:

  1. Maintenance window (configurable, e.g. 03:00-05:00; wrapping midnight supported)
  2. Node-block-label — refuses if the K8s node carries updates.kubesolo.io/block=true (workload-author kill switch)
  3. Channel — stable / beta / edge must match between the artifact metadata and the local channel
  4. Architecture — refuses cross-arch artifacts via runtime.GOARCH check
  5. Min compatible version — stepping-stone enforcement; refuses an upgrade that bypasses a required intermediate version

--force bypasses the maintenance window and node-block label (channel / arch / min-version are non-negotiable). Failures are recorded in state.json with a clear LastError field.
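
The wrapping-midnight behaviour of the window gate can be sketched in a few lines of shell. This is illustrative only: the agent implements the check in Go.

```shell
# in_window "HH:MM-HH:MM" "HH:MM" -> exit 0 when the time is inside the
# window, including windows that wrap midnight. Sketch, not the agent's code.
in_window() {
  start=$(printf '%s' "${1%-*}" | tr -d :)
  end=$(printf '%s' "${1#*-}" | tr -d :)
  now=$(printf '%s' "$2" | tr -d :)
  if [ "$start" -le "$end" ]; then
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    # Wrapping window, e.g. 23:00-02:00: inside if after start OR before end.
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
  fi
}

in_window "03:00-05:00" "04:30" && echo inside || echo outside   # inside
in_window "23:00-02:00" "01:15" && echo inside || echo outside   # inside
in_window "23:00-02:00" "12:00" && echo inside || echo outside   # outside
```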

Healthcheck deepening + auto-rollback

kubesolo-update healthcheck grew three optional probes:

  • Kube-system pods must hold Running for ≥ N seconds before passing
  • Operator probe URL — GET an operator-supplied endpoint; 200 = pass
  • Disk smoke test — write/fsync/read/delete a probe file under /var/lib/kubesolo to catch a wedged data partition
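
The disk smoke test reduces to a small write/flush/read/delete loop. A sketch, with two deliberate substitutions: a temp dir stands in for /var/lib/kubesolo so it is safe to run anywhere, and plain sync(1) stands in for a proper per-file fsync.

```shell
# Sketch of the disk probe; not the agent's implementation.
probe_dir=$(mktemp -d)
probe="$probe_dir/.ksolo-probe"
payload="probe-$$"

echo "$payload" > "$probe"
sync                               # flush caches; Go code would fsync the fd
read_back=$(cat "$probe" 2>/dev/null)
rm -f "$probe"
rmdir "$probe_dir"

if [ "$read_back" = "$payload" ]; then
  echo "disk probe: pass"
else
  echo "disk probe: fail"          # a wedged data partition would land here
fi
```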

Plus auto-rollback: with --auto-rollback-after N (or auto_rollback_after= in update.conf), after N consecutive post-activation failures, the agent calls ForceRollback() and the operator/init is expected to reboot. The counter resets on a clean pass.
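
The consecutive-failure bookkeeping behind --auto-rollback-after can be sketched as below. The counter file location and the rollback hook are illustrative; the real agent keeps this state itself and calls ForceRollback().

```shell
# Sketch of the auto-rollback failure streak; not the agent's code.
COUNTER="${TMPDIR:-/tmp}/ksolo-hc-failures.$$"
THRESHOLD=3

record_result() {  # record_result pass|fail; prints "rollback" at the threshold
  if [ "$1" = pass ]; then
    rm -f "$COUNTER"              # a clean pass resets the streak
    return 0
  fi
  n=$(( $(cat "$COUNTER" 2>/dev/null || echo 0) + 1 ))
  echo "$n" > "$COUNTER"
  if [ "$n" -ge "$THRESHOLD" ]; then
    echo rollback                 # the real agent calls ForceRollback() here
  fi
}

record_result fail
record_result fail
record_result pass                # resets the counter
record_result fail
record_result fail
record_result fail                # third consecutive failure: prints "rollback"
rm -f "$COUNTER"
```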

Persistent configuration via /etc/kubesolo/update.conf

Cloud-init writes this file on first boot from a new updates: block; you can also hand-edit it. Recognised keys:

server = https://updates.example.com         # or omit if using registry
registry =                                   # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
pubkey = /etc/kubesolo/update-pubkey.hex
healthcheck_url = http://localhost:8000/ready
auto_rollback_after = 3
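
The format is plain key = value. As a rough illustration of how such a file parses (the agent itself is Go, and treating '#' as a comment marker is an assumption, not documented behaviour):

```shell
# Illustrative parser for update.conf-style files; shows the format's shape.
parse_conf() {
  while IFS= read -r line; do
    line=${line%%#*}                            # strip assumed comments
    case $line in *=*) ;; *) continue ;; esac   # skip blank/other lines
    key=$(printf '%s' "${line%%=*}" | tr -d ' \t')
    val=$(printf '%s' "${line#*=}" | sed 's/^ *//; s/ *$//')
    printf '%s=%s\n' "$key" "$val"
  done < "$1"
}

conf=$(mktemp)
printf 'channel = stable\nmaintenance_window = 03:00-05:00\n' > "$conf"
parse_conf "$conf"    # prints channel=stable, then maintenance_window=03:00-05:00
rm -f "$conf"
```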

Cloud-init full reference at cloud-init/examples/full-config.yaml.

Migration from v0.2.x

This is a non-breaking release for live systems. v0.2.x → v0.3.0 changes:

  • state.json will appear at /var/lib/kubesolo/update/state.json the first time a v0.3 agent runs apply. Pre-existing v0.2 deployments without this file are fine — the agent treats a missing file as fresh Idle state.
  • update.conf is optional. v0.2 deployments that pass everything via CLI flags keep working unchanged.
  • HTTP latest.json protocol unchanged. Existing update servers don't need a rebuild.
  • GRUB env (boot counter, active slot) unchanged. The bootloader's rollback behaviour is the same.
  • No new mandatory kernel command-line parameters.

To opt into the new lifecycle, transports, and gates, drop in an update.conf (or update cloud-init) and switch to --registry if you want OCI distribution.

Known limitations

These shipped intentionally with v0.3.0 and are explicitly tracked for v0.3.1+:

  • OCI signature verification — the OCI transport is digest-verified end-to-end via oras-go, but does not yet consume cosign-style referrer attestations. The HTTP transport still honours --pubkey for .sig files.
  • ARM64 LABEL=KSOLODATA resolution doesn't work yet — piCore's blkid/findfs crash on QEMU virt under our mainline kernel, and the busybox-static build we ship doesn't include those applets. build/grub/grub-arm64.cfg hardcodes kubesolo.data=/dev/vda4 as a workaround. On real ARM64 hardware the device path may differ.
  • Real-hardware ARM64 validation is pending. The image builds and boots end-to-end under QEMU virt; production certification waits on a Graviton / Ampere run.
  • AppArmor profile load fails on ARM64 (apparmor_parser ABI mismatch). Init reports the failure; boot continues without AppArmor enforcement.
  • QEMU TCG performance can trigger KubeSolo's first-boot image-import deadline. Not an OS defect; real hardware and KVM-accelerated QEMU complete the import in seconds.

How to upgrade your build host

git pull
make distclean   # optional — drops the build cache; full rebuild takes ~30 min
make iso         # or disk-image, or disk-image-arm64

The Docker-based builder (make docker-build) regenerates its own image from build/Dockerfile.builder on next invocation; oras 1.2.3 and busybox-static are now included.

Acknowledgements

v0.3.0 work was driven by a single multi-week pair-programming session working through Phases 0–9 of the v0.3 roadmap. The Odroid self-hosted Gitea Actions runner (odroid.local, arm64-linux) carried every ARM64 build during development.