release: v0.3.0
Some checks failed
CI / Go Tests (push) Successful in 1m29s
CI / Shellcheck (push) Successful in 46s
ARM64 Build / Build generic ARM64 disk image (push) Failing after 3s
Release / Test (push) Successful in 1m21s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m19s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m36s
Release / Build Binaries (amd64, linux, linux-amd64) (push) Failing after 1m27s
Release / Build Binaries (arm64, linux, linux-arm64) (push) Failing after 1m17s
Release / Build ISO (amd64) (push) Has been skipped
Release / Create Release (push) Has been skipped
Promote VERSION from 0.3.0-dev to 0.3.0.

Finalise CHANGELOG entry with phases 5-8 work (state machine + metrics, channels + maintenance windows, OCI multi-arch distribution, pre-flight gates + deeper healthcheck + auto-rollback).

Refresh README quick-start to show both x86_64 and generic ARM64 paths; update the roadmap status table to mark all v0.3 phases complete and explicitly track the v0.3.1 follow-ups (OCI cosign, LABEL=KSOLODATA on ARM64, real-hardware validation).

Add docs/release-notes-0.3.0.md as the operator-facing summary, including a v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages, shellcheck across init / build / test / hack scripts under the v0.3 severity policy.

Tagging is intentionally NOT done from this commit; that's a manual step so the operator can decide when v0.3.0 is final. After tagging:

    git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
    git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml which runs the full ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
181
docs/release-notes-0.3.0.md
Normal file
@@ -0,0 +1,181 @@
# KubeSolo OS v0.3.0 - Release Notes

**Released:** 2026-05-14

v0.3.0 is the second feature release after v0.2.0 and the first release that
ships a generic ARM64 build alongside x86_64. The update agent grew up: it
now has an explicit on-disk lifecycle, OCI registry distribution, and a
fleet-friendly set of policy gates (channels, maintenance windows,
version stepping stones, pre-flight checks, auto-rollback).

This document is the operator-facing summary. The full per-phase changelog
lives in [CHANGELOG.md](../CHANGELOG.md).

## What's new

### Generic ARM64 build

The image you build with `make disk-image-arm64` now targets any UEFI-capable
ARM64 host: AWS Graviton, Oracle Ampere, generic ARM64 servers, and future SBCs
with UEFI-compatible firmware. The kernel comes from kernel.org mainline LTS
(6.12.10 by default, configurable via `MAINLINE_KERNEL_VERSION` in
`build/config/versions.env`).

This is **distinct** from the Raspberry Pi build path. RPi keeps its
specialised kernel from `raspberrypi/linux` with bcm-defconfig + custom DTBs;
the generic ARM64 path uses mainline + arm64-defconfig + UEFI/virtio. See
[docs/arm64-architecture.md](arm64-architecture.md) for the file-by-file
split.

KubeSolo is bumped to **v1.1.5** (was v1.1.0). New flags surfaced via cloud-init:

- `kubesolo.full`: disable edge-optimised k8s overrides
- `kubesolo.disable-ipv6`: disable IPv6 cluster-wide
- `kubesolo.db-wal-repair`: recover from unclean shutdowns

### Update lifecycle is now observable

The update agent writes `/var/lib/kubesolo/update/state.json`, which records
where the current attempt is in the lifecycle:

```
idle → checking → downloading → staged → activated → verifying → success
↘ rolled_back
↘ failed
```

`kubesolo-update status --json` emits the full state for orchestration tooling.
The Prometheus metrics endpoint gains three new series:

- `kubesolo_update_phase{phase="..."}`: 1 for the current phase, 0 for the others (all 9 are always emitted)
- `kubesolo_update_attempts_total`
- `kubesolo_update_last_attempt_timestamp_seconds`
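
As a rough illustration of the one-hot `kubesolo_update_phase` series, the sketch below renders the nine samples for a given phase and recovers the current phase from scraped text. It assumes the `phase` label values are exactly the lifecycle names listed above; `render_phase_series` and `current_phase` are hypothetical helper names for monitoring-side tooling, not part of the agent.

```python
import re

# Phase names taken from the lifecycle listing; assumed to match the label values.
PHASES = [
    "idle", "checking", "downloading", "staged", "activated",
    "verifying", "success", "rolled_back", "failed",
]


def render_phase_series(current):
    """Render all nine samples: 1 for the current phase, 0 for the rest."""
    if current not in PHASES:
        raise ValueError(f"unknown phase: {current!r}")
    return [
        f'kubesolo_update_phase{{phase="{p}"}} {1 if p == current else 0}'
        for p in PHASES
    ]


def current_phase(exposition):
    """Return the phase whose sample value is 1, or None if none is set."""
    pattern = r'kubesolo_update_phase\{phase="([^"]+)"\}\s+(\S+)'
    for match in re.finditer(pattern, exposition):
        if float(match.group(2)) == 1.0:
            return match.group(1)
    return None
```

Because every phase is always emitted, an alert rule can match on a specific label value without having to handle a missing series.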

### OCI registry distribution

Update artifacts can now be pulled from any OCI-compliant registry alongside
the existing HTTP `latest.json` protocol:

```bash
# HTTP, unchanged from v0.2:
kubesolo-update apply --server https://updates.example.com

# New: OCI from ghcr.io (or quay.io, harbor, zot, ...)
kubesolo-update apply --registry ghcr.io/yourorg/kubesolo-os --tag stable
```

Multi-arch is handled transparently: the same `stable` tag points at a
manifest index, and the agent picks the manifest matching its `runtime.GOARCH`.
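
The selection step can be sketched as a lookup over an OCI image index. This is a minimal sketch, not the agent's code: it assumes each index entry carries the standard `platform.architecture` and `platform.os` fields, that the OS is also matched (an assumption; only the architecture match is stated here), and the function name and sample digests are made up for illustration.

```python
def select_manifest_digest(index, goarch, goos="linux"):
    """Return the digest of the first index entry matching goarch/goos."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == goarch and platform.get("os") == goos:
            return entry["digest"]
    raise LookupError(f"no manifest for {goos}/{goarch} in index")


# Hypothetical index, shaped like an OCI image index document.
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:" + "a" * 64,
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:" + "b" * 64,
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}
```

Calling `select_manifest_digest(EXAMPLE_INDEX, "arm64")` returns the second digest, while an architecture missing from the index raises `LookupError` rather than falling back to another platform.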

Publish your own artifacts with `build/scripts/push-oci-artifact.sh`. See
the script's header comment for the full publishing flow.

### Policy gates

`apply` now enforces five gates before destroying the passive slot:

1. **Maintenance window** (configurable, e.g. `03:00-05:00`; wrapping
   midnight supported)
2. **Node-block-label**: refuses if the K8s node carries
   `updates.kubesolo.io/block=true` (workload-author kill switch)
3. **Channel**: `stable` / `beta` / `edge` must match between the artifact
   metadata and the local channel
4. **Architecture**: refuses cross-arch artifacts via `runtime.GOARCH` check
5. **Min compatible version**: stepping-stone enforcement; refuses an
   upgrade that bypasses a required intermediate version
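
The midnight-wrapping behaviour of the maintenance window can be sketched as follows. This is an illustration under assumptions: the window is treated as start-inclusive and end-exclusive, times are local wall-clock `HH:MM`, and `in_window` is a hypothetical name; the agent's actual boundary handling is not documented here.

```python
from datetime import time


def parse_window(spec):
    """Split an 'HH:MM-HH:MM' spec into (start, end) times."""
    start_text, end_text = spec.split("-")
    return time.fromisoformat(start_text), time.fromisoformat(end_text)


def in_window(now, spec):
    """Return True if `now` falls inside the window, handling midnight wrap."""
    start, end = parse_window(spec)
    if start <= end:
        # Same-day window, e.g. 03:00-05:00.
        return start <= now < end
    # Wrapping window, e.g. 22:00-02:00: late evening OR early morning.
    return now >= start or now < end
```

With `22:00-02:00`, both `23:30` and `01:00` count as inside the window and `12:00` does not.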

`--force` bypasses the maintenance window and node-block label (channel /
arch / min-version are non-negotiable). Failures are recorded in `state.json`
in the `LastError` field.
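
One way to picture the `--force` semantics is a gate list where only some entries are bypassable. The sketch below is an assumption-laden model: the `GateResult` type, the gate names, and the evaluation order are hypothetical; only the split between bypassable and non-negotiable gates comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    bypassable: bool  # True only for gates that --force may skip


def first_blocking_gate(results, force=False):
    """Return the name of the first gate that blocks the update, or None."""
    for result in results:
        if result.passed:
            continue
        if force and result.bypassable:
            continue  # --force skips maintenance window and node-block label
        return result.name
    return None


# Hypothetical evaluation: outside the window, and the channel does not match.
EXAMPLE_RESULTS = [
    GateResult("maintenance_window", passed=False, bypassable=True),
    GateResult("node_block_label", passed=True, bypassable=True),
    GateResult("channel", passed=False, bypassable=False),
    GateResult("architecture", passed=True, bypassable=False),
    GateResult("min_compatible_version", passed=True, bypassable=False),
]
```

Without `force`, the example is blocked by `maintenance_window`; with `force=True` it is still blocked, this time by `channel`.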

### Healthcheck deepening + auto-rollback

`kubesolo-update healthcheck` grew three optional probes:

- **Kube-system pods** must hold Running for ≥ N seconds before passing
- **Operator probe URL**: GET an operator-supplied endpoint; 200 = pass
- **Disk smoke test**: write/fsync/read/delete a probe file under
  `/var/lib/kubesolo` to catch a wedged data partition
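
The disk smoke test amounts to a write/fsync/read/delete round trip. Here is a minimal sketch of that sequence, assuming a hypothetical probe file name and treating any `OSError` as a failed probe; a truly wedged partition may hang rather than error, so a real probe would also need a timeout, which this sketch omits.

```python
import os


def disk_smoke_test(directory, payload=b"kubesolo-probe"):
    """Write, fsync, read back, and delete a probe file; True on success."""
    path = os.path.join(directory, f".healthcheck-probe-{os.getpid()}")
    try:
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # force the data to the device
        with open(path, "rb") as handle:
            intact = handle.read() == payload
        os.remove(path)
        return intact
    except OSError:
        return False
```

The directory is a parameter so the same check can be exercised against a temporary directory in tests before pointing it at `/var/lib/kubesolo`.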

Auto-rollback is also new: with `--auto-rollback-after N` (or
`auto_rollback_after=` in `update.conf`), the agent calls `ForceRollback()`
after N consecutive post-activation failures, and the operator/init is
expected to reboot. The counter resets on a clean pass.
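
The counting rule can be sketched as a small consecutive-failure counter. The class name is hypothetical, and treating a threshold of zero or less as "disabled" is an assumption rather than documented behaviour.

```python
class RollbackCounter:
    """Tracks consecutive failed healthchecks after activation."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one healthcheck result; True means rollback should fire."""
        if healthy:
            self.failures = 0  # a clean pass resets the streak
            return False
        self.failures += 1
        # Assumption: threshold <= 0 disables auto-rollback entirely.
        return self.threshold > 0 and self.failures >= self.threshold
```

With a threshold of 3, the sequence fail, fail, pass, fail, fail, fail triggers only on the final failure, because the pass in the middle resets the streak.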

### Persistent configuration via `/etc/kubesolo/update.conf`

Cloud-init writes this file on first boot from a new `updates:` block; you
can also hand-edit it. Recognised keys:

```
server = https://updates.example.com    # or omit if using registry
registry =                              # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
pubkey = /etc/kubesolo/update-pubkey.hex
healthcheck_url = http://localhost:8000/ready
auto_rollback_after = 3
```
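
For tooling that needs to read this file, a small parser might look like the sketch below. The grammar is inferred from the example only: `key = value` pairs, with `#` starting a comment when it appears at the start of a line or after whitespace. Whether the agent itself accepts inline comments is an assumption, and `parse_update_conf` is a hypothetical helper.

```python
import re


def parse_update_conf(text):
    """Parse 'key = value' lines into a dict, dropping comments and blanks."""
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key/value line; ignored in this sketch
        # Strip an inline comment: '#' at the start or preceded by whitespace.
        value = re.sub(r"(?:^|\s)#.*$", "", value).strip()
        conf[key.strip()] = value
    return conf


EXAMPLE_CONF = """\
server = https://updates.example.com    # or omit if using registry
registry =                              # OCI registry ref (alt to server)
channel = stable
maintenance_window = 03:00-05:00
auto_rollback_after = 3
"""
```

All values come back as strings, so a caller would still convert `auto_rollback_after` to an integer and treat an empty `registry` as unset.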

The full cloud-init reference is at
[cloud-init/examples/full-config.yaml](../cloud-init/examples/full-config.yaml).

## Migration from v0.2.x

This is a non-breaking release for live systems. v0.2.x → v0.3.0 changes:

- **`state.json` will appear** at `/var/lib/kubesolo/update/state.json` the
  first time a v0.3 agent runs `apply`. Pre-existing v0.2 deployments without
  this file are fine: the agent treats a missing file as a fresh Idle state.
- **`update.conf` is optional**. v0.2 deployments that pass everything via
  CLI flags keep working unchanged.
- **HTTP `latest.json` protocol unchanged**. Existing update servers don't
  need a rebuild.
- **GRUB env (boot counter, active slot)** unchanged. The bootloader's
  rollback behaviour is the same.
- **No new mandatory kernel command-line parameters**.
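
The "missing file means fresh Idle state" rule from the first bullet can be mirrored in tooling that reads `state.json` directly. This is a sketch only: the `Phase` key and the default values are hypothetical, and `LastError` is the one field name taken from these notes.

```python
import json


def load_state(path):
    """Load state.json, treating a missing file as a fresh idle state."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # Hypothetical default shape; only LastError is named in the notes.
        return {"Phase": "idle", "LastError": ""}
```

A v0.2 node that has never run a v0.3 `apply` then reads as idle instead of raising an error in the reader.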

To opt into the new lifecycle, transports, and gates, drop in an
`update.conf` (or update cloud-init) and switch to `--registry` if you want
OCI distribution.

## Known limitations

These shipped intentionally with v0.3.0 and are explicitly tracked for
v0.3.1+:

- **OCI signature verification**: the OCI transport is digest-verified
  end-to-end via oras-go, but does not yet consume cosign-style referrer
  attestations. The HTTP transport still honours `--pubkey` for `.sig`
  files.
- **ARM64 LABEL=KSOLODATA** resolution doesn't work yet: piCore's
  `blkid`/`findfs` crash on QEMU virt under our mainline kernel, and the
  static `busybox-static` we ship doesn't include those applets.
  `build/grub/grub-arm64.cfg` hardcodes `kubesolo.data=/dev/vda4` as a
  workaround. On real ARM64 hardware the device path may differ.
- **Real-hardware ARM64 validation** is pending. The image builds and
  boots end-to-end under QEMU virt; production certification waits on a
  Graviton / Ampere run.
- **AppArmor profile load fails on ARM64** (`apparmor_parser` ABI mismatch).
  Init reports the failure; boot continues without AppArmor enforcement.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import
  deadline. Not an OS defect; real hardware and KVM-accelerated QEMU
  complete the import in seconds.

## How to upgrade your build host

```bash
git pull
make distclean   # optional; drops the build cache, full rebuild takes ~30 min
make iso         # or disk-image, or disk-image-arm64
```

The Docker-based builder (`make docker-build`) regenerates its own image
from `build/Dockerfile.builder` on next invocation; oras 1.2.3 and
busybox-static are now included.

## Acknowledgements

v0.3.0 work was driven by a single multi-week pair-programming session
working through Phases 0–9 of the v0.3 roadmap. The Odroid self-hosted
Gitea Actions runner (`odroid.local`, arm64-linux) carried every ARM64
build during development.