kubesolo-os/docs/release-notes-0.3.0.md
Adolfo Delorenzo 3b47e7af68
release: v0.3.0
Promote VERSION from 0.3.0-dev to 0.3.0. Finalise CHANGELOG entry with
phases 5-8 work (state machine + metrics, channels + maintenance windows,
OCI multi-arch distribution, pre-flight gates + deeper healthcheck +
auto-rollback). Refresh README quick-start to show both x86_64 and generic
ARM64 paths; update the roadmap status table to mark all v0.3 phases
complete and explicitly track the v0.3.1 follow-ups (OCI cosign,
LABEL=KSOLODATA on ARM64, real-hardware validation).

Add docs/release-notes-0.3.0.md as the operator-facing summary, including a
v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the
known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages,
shellcheck across init / build / test / hack scripts under the v0.3
severity policy.

Tagging is intentionally NOT done from this commit — that's a manual step
so the operator can decide when v0.3.0 is final. After tagging:

  git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
  git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml which runs the full
ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:13:09 -06:00

# KubeSolo OS v0.3.0 — Release Notes
**Released:** 2026-05-14
v0.3.0 is the second feature release after v0.2.0 and the first release that
ships a generic ARM64 build alongside x86_64. The update agent grew up: it
now has an explicit on-disk lifecycle, OCI registry distribution, and a
fleet-friendly set of policy gates (channels, maintenance windows,
version-stepping-stones, pre-flight checks, auto-rollback).
This document is the operator-facing summary. The full per-phase changelog
lives in [CHANGELOG.md](../CHANGELOG.md).
## What's new
### Generic ARM64 build
The image you build with `make disk-image-arm64` now targets any UEFI-capable
ARM64 host: AWS Graviton, Oracle Ampere, generic ARM64 servers, future SBCs
with UEFI-compatible firmware. The kernel comes from kernel.org mainline LTS
(6.12.10 by default, configurable via `MAINLINE_KERNEL_VERSION` in
`build/config/versions.env`).
This is **distinct** from the Raspberry Pi build path. RPi keeps its
specialised kernel from `raspberrypi/linux` with bcm-defconfig + custom DTBs;
the generic ARM64 path uses mainline + arm64-defconfig + UEFI/virtio. See
[docs/arm64-architecture.md](arm64-architecture.md) for the file-by-file
split.
KubeSolo bumped to **v1.1.5** (was v1.1.0). New flags surfaced via cloud-init:
- `kubesolo.full` — disable edge-optimised k8s overrides
- `kubesolo.disable-ipv6` — disable IPv6 cluster-wide
- `kubesolo.db-wal-repair` — recover from unclean shutdowns
### Update lifecycle is now observable
The update agent writes a `state.json` at `/var/lib/kubesolo/update/state.json`
recording where the current attempt is in the lifecycle:
```
idle → checking → downloading → staged → activated → verifying → success
↘ rolled_back
↘ failed
```
`kubesolo-update status --json` emits the full state for orchestration tooling.
The Prometheus metrics endpoint gains three new series:
- `kubesolo_update_phase{phase="..."}` — 1 for current phase, 0 for others (all 9 always emitted)
- `kubesolo_update_attempts_total`
- `kubesolo_update_last_attempt_timestamp_seconds`
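As a rough illustration of how the one-hot `kubesolo_update_phase` series might be consumed, the sketch below scans Prometheus text-exposition output and returns the phase whose sample value is 1. Only the metric name and the `phase` label come from these notes; the helper name `current_phase`, the `gauge` type line, and the sample scrape are assumptions made for the example.

```python
import re

# Matches lines such as: kubesolo_update_phase{phase="staged"} 1
_PHASE_SAMPLE = re.compile(
    r'^kubesolo_update_phase\{phase="([^"]+)"\}\s+([0-9.eE+-]+)\s*$'
)

def current_phase(exposition_text):
    """Return the phase whose one-hot sample equals 1, or None if absent."""
    for line in exposition_text.splitlines():
        match = _PHASE_SAMPLE.match(line.strip())
        if match and float(match.group(2)) == 1.0:
            return match.group(1)
    return None

# Hypothetical scrape output, trimmed to three of the nine phases.
sample = """\
# TYPE kubesolo_update_phase gauge
kubesolo_update_phase{phase="idle"} 0
kubesolo_update_phase{phase="staged"} 1
kubesolo_update_phase{phase="failed"} 0
kubesolo_update_attempts_total 4
"""
print(current_phase(sample))
```

Because every phase is always emitted, an alerting rule or script can treat "no sample equal to 1" as a scrape problem rather than as a valid state.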
### OCI registry distribution
Update artifacts can now be pulled from any OCI-compliant registry alongside
the existing HTTP `latest.json` protocol:
```bash
# HTTP, unchanged from v0.2:
kubesolo-update apply --server https://updates.example.com
# New: OCI from ghcr.io (or quay.io, harbor, zot, ...)
kubesolo-update apply --registry ghcr.io/yourorg/kubesolo-os --tag stable
```
Multi-arch is handled transparently: the same `stable` tag points at a
manifest index, and the agent picks the manifest matching its `runtime.GOARCH`.
Publish your own artifacts with `build/scripts/push-oci-artifact.sh`. See
the script's header comment for the full publishing flow.
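To make the multi-arch selection concrete, here is a minimal sketch of picking one descriptor out of an OCI image index by platform. It is not the agent's code (the agent is a Go program and relies on oras-go); the function name `pick_manifest`, the placeholder digests, and the assumption that artifacts are published with `os: linux` are all illustrative.

```python
import json

def pick_manifest(index_json, goarch, goos="linux"):
    """Return the descriptor in an OCI image index matching the platform."""
    index = json.loads(index_json)
    for descriptor in index.get("manifests", []):
        platform = descriptor.get("platform", {})
        if platform.get("architecture") == goarch and platform.get("os") == goos:
            return descriptor
    raise LookupError(f"no manifest for {goos}/{goarch}")

# Hypothetical index with placeholder digests (not real artifacts).
index_json = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:" + "a" * 64,
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:" + "b" * 64,
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
})
print(pick_manifest(index_json, "arm64")["digest"][:10])
```

The OCI image specification uses Go's `GOARCH` values for `platform.architecture`, which is why a plain string comparison is enough here.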
### Policy gates
`apply` now enforces five gates before destroying the passive slot:
1. **Maintenance window** (configurable, e.g. `03:00-05:00`; wrapping
midnight supported)
2. **Node-block-label** — refuses if the K8s node carries
`updates.kubesolo.io/block=true` (workload-author kill switch)
3. **Channel** (`stable` / `beta` / `edge`) must match between the artifact
metadata and the local channel
4. **Architecture** — refuses cross-arch artifacts via `runtime.GOARCH` check
5. **Min compatible version** — stepping-stone enforcement; refuses an
upgrade that bypasses a required intermediate version
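The maintenance-window gate, including the wrap-around-midnight case, can be sketched as follows. This is an illustration under stated assumptions rather than the agent's implementation: the helper names are hypothetical, the end bound is treated as exclusive, and a window whose start equals its end is treated as empty.

```python
from datetime import time

def parse_window(spec):
    """Parse 'HH:MM-HH:MM' into a (start, end) pair of datetime.time."""
    start_text, end_text = spec.split("-")
    start_h, start_m = map(int, start_text.split(":"))
    end_h, end_m = map(int, end_text.split(":"))
    return time(start_h, start_m), time(end_h, end_m)

def in_window(spec, now):
    """True if `now` falls inside the window (end bound exclusive, assumed)."""
    start, end = parse_window(spec)
    if start <= end:
        return start <= now < end
    # Window wraps midnight, e.g. 23:00-01:00.
    return now >= start or now < end

print(in_window("03:00-05:00", time(4, 30)))
print(in_window("23:00-01:00", time(0, 15)))
print(in_window("23:00-01:00", time(12, 0)))
```

The three calls print `True`, `True`, and `False`: the second shows a time just after midnight falling inside a window that started the previous evening.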
`--force` bypasses the maintenance window and node-block label (channel /
arch / min-version are non-negotiable). Failures are recorded in `state.json`
with a clear `LastError` field.
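One way to picture how `--force` interacts with the gates is a list of checks, each flagged as bypassable or not. The sketch below is hypothetical throughout: the `Gate` type, the gate names, and the assumption that gates run in the order listed above are illustrative, and only the split between bypassable and non-negotiable gates is taken from these notes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Gate:
    name: str
    check: Callable[[], bool]   # returns True when the gate allows the update
    bypassable: bool            # True if --force may skip this gate

def first_blocking_gate(gates, force=False):
    """Return the name of the first gate that blocks, or None if all pass."""
    for gate in gates:
        if force and gate.bypassable:
            continue
        if not gate.check():
            return gate.name
    return None

# Hypothetical outcome: the window is closed, everything else is fine.
gates = [
    Gate("maintenance-window", lambda: False, bypassable=True),
    Gate("node-block-label", lambda: True, bypassable=True),
    Gate("channel", lambda: True, bypassable=False),
    Gate("architecture", lambda: True, bypassable=False),
    Gate("min-compatible-version", lambda: True, bypassable=False),
]
print(first_blocking_gate(gates))
print(first_blocking_gate(gates, force=True))
```

Without `force` the closed window blocks the update; with `force=True` it is skipped and the remaining gates still run, so a channel or architecture mismatch would still stop the update.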
### Healthcheck deepening + auto-rollback
`kubesolo-update healthcheck` grew three optional probes:
- **Kube-system pods** must hold Running for ≥ N seconds before passing
- **Operator probe URL** — GET an operator-supplied endpoint; 200 = pass
- **Disk smoke test** — write/fsync/read/delete a probe file under
`/var/lib/kubesolo` to catch a wedged data partition
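The write/fsync/read/delete sequence of the disk smoke test can be sketched as below. This is a minimal illustration, not the agent's probe: the function name, the probe file prefix, and the payload are assumptions, and the demo runs against a temporary directory instead of `/var/lib/kubesolo`.

```python
import os
import tempfile

def disk_smoke_test(directory, payload=b"kubesolo-probe"):
    """Write, fsync, read back and delete a probe file; True on success."""
    path = None
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())   # force the data to the device
        with open(path, "rb") as handle:
            return handle.read() == payload
    except OSError:
        return False
    finally:
        # Always clean up the probe file if it was created.
        if path is not None and os.path.exists(path):
            os.remove(path)

# Demonstrated against a temporary directory rather than /var/lib/kubesolo.
with tempfile.TemporaryDirectory() as scratch:
    print(disk_smoke_test(scratch))
print(disk_smoke_test("/nonexistent/kubesolo-probe-dir"))
```

A partition that hangs instead of returning an error would block this function, so a real probe would likely also need a timeout around it.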
Auto-rollback is new as well: with `--auto-rollback-after N` (or
`auto_rollback_after=` in `update.conf`), the agent calls `ForceRollback()`
after N consecutive post-activation failures, and the operator/init is
expected to reboot. The counter resets on a clean pass.
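The consecutive-failure counter behind auto-rollback can be modelled in a few lines. The class below is a hypothetical sketch of the counting rule only; how the real agent persists the counter between healthcheck runs is not covered here.

```python
class RollbackTracker:
    """Count consecutive healthcheck failures; signal rollback at a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, passed):
        """Record one healthcheck result; return True when rollback is due."""
        if passed:
            self.consecutive_failures = 0   # a clean pass resets the counter
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold

tracker = RollbackTracker(threshold=3)
results = [False, False, True, False, False, False]
print([tracker.record(r) for r in results])
```

With a threshold of 3, the two early failures are wiped out by the pass in third position, so rollback is signalled only on the final result.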
### Persistent configuration via `/etc/kubesolo/update.conf`
Cloud-init writes this file on first boot from a new `updates:` block; you
can also hand-edit it. Recognised keys:
```
server = https://updates.example.com # or omit if using registry
registry = # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
pubkey = /etc/kubesolo/update-pubkey.hex
healthcheck_url = http://localhost:8000/ready
auto_rollback_after = 3
```
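For tooling that needs to read this file, a minimal parser sketch follows. It assumes the format is plain `key = value` lines with `#` starting a comment, as in the sample above; the function name is hypothetical, the agent's own parser may behave differently, and values containing `#` would be truncated by this approach.

```python
def parse_update_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():                           # skip keys left empty
            settings[key.strip()] = value.strip()
    return settings

sample = """\
server = https://updates.example.com   # or omit if using registry
registry =                              # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
auto_rollback_after = 3
"""
conf = parse_update_conf(sample)
print(conf["channel"], conf["maintenance_window"], "registry" in conf)
```

All values stay strings; converting `auto_rollback_after` to an integer is left to the caller.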
The full cloud-init reference is at
[cloud-init/examples/full-config.yaml](../cloud-init/examples/full-config.yaml).
## Migration from v0.2.x
This is a non-breaking release for live systems. v0.2.x → v0.3.0 changes:
- **`state.json` will appear** at `/var/lib/kubesolo/update/state.json` the
first time a v0.3 agent runs `apply`. Pre-existing v0.2 deployments without
this file are fine — the agent treats a missing file as fresh Idle state.
- **`update.conf` is optional**. v0.2 deployments that pass everything via
CLI flags keep working unchanged.
- **HTTP `latest.json` protocol unchanged**. Existing update servers don't
need a rebuild.
- **GRUB env (boot counter, active slot)** unchanged. The bootloader's
rollback behaviour is the same.
- **No new mandatory kernel command-line parameters**.
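The "missing file means fresh Idle state" rule from the first bullet can be sketched as a loader with a default. The `phase` key is hypothetical, since the real `state.json` schema is not shown in these notes, and the demo uses a temporary directory instead of `/var/lib/kubesolo/update/`.

```python
import json
import os
import tempfile

def load_state(path):
    """Load the update state; a missing file means a fresh idle state."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # "phase" is a hypothetical key; the real schema is not shown here.
        return {"phase": "idle"}

with tempfile.TemporaryDirectory() as scratch:
    state_path = os.path.join(scratch, "state.json")
    print(load_state(state_path))            # file absent
    with open(state_path, "w", encoding="utf-8") as handle:
        json.dump({"phase": "staged"}, handle)
    print(load_state(state_path))            # file present
```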
To opt into the new lifecycle, transports, and gates, drop in an
`update.conf` (or update cloud-init) and switch to `--registry` if you want
OCI distribution.
## Known limitations
These shipped intentionally with v0.3.0 and are explicitly tracked for
v0.3.1+:
- **OCI signature verification** — the OCI transport is digest-verified
end-to-end via oras-go, but does not yet consume cosign-style referrer
attestations. The HTTP transport still honours `--pubkey` for `.sig`
files.
- **ARM64 LABEL=KSOLODATA** resolution doesn't work yet: piCore's
`blkid`/`findfs` crash on QEMU virt under our mainline kernel, and the
`busybox-static` binary we ship doesn't include those applets.
`build/grub/grub-arm64.cfg` hardcodes `kubesolo.data=/dev/vda4` as a
workaround. On real ARM64 hardware the device path may differ.
- **Real-hardware ARM64 validation** is pending. The image builds and
boots end-to-end under QEMU virt; production certification waits on a
Graviton / Ampere run.
- **AppArmor profile load fails on ARM64** (`apparmor_parser` ABI mismatch).
Init reports the failure; boot continues without AppArmor enforcement.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import
deadline. Not an OS defect; real hardware and KVM-accelerated QEMU
complete the import in seconds.
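On the first limitation above, it may help to see what digest verification does and does not establish. The sketch below checks a blob against an OCI-style `sha256:<hex>` digest string; the function name and sample payload are illustrative, and the agent itself relies on oras-go for this.

```python
import hashlib

def verify_digest(blob, digest):
    """Check a blob against an OCI digest string such as 'sha256:<hex>'."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected_hex

blob = b"example update artifact"
good = "sha256:" + hashlib.sha256(blob).hexdigest()
print(verify_digest(blob, good))
print(verify_digest(blob + b"tampered", good))
```

A matching digest shows the bytes are the ones the manifest refers to, but says nothing about who published the manifest, which is the gap signature verification is meant to close.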
## How to upgrade your build host
```bash
git pull
make distclean # optional — drops the build cache; full rebuild takes ~30 min
make iso # or disk-image, or disk-image-arm64
```
The Docker-based builder (`make docker-build`) regenerates its own image
from `build/Dockerfile.builder` on next invocation; oras 1.2.3 and
busybox-static are now included.
## Acknowledgements
v0.3.0 work was driven by a single multi-week pair-programming session
working through Phases 0-9 of the v0.3 roadmap. The Odroid self-hosted
Gitea Actions runner (`odroid.local`, arm64-linux) carried every ARM64
build during development.