release: v0.3.0
Some checks failed
CI / Go Tests (push) Successful in 1m29s
CI / Shellcheck (push) Successful in 46s
ARM64 Build / Build generic ARM64 disk image (push) Failing after 3s
Release / Test (push) Successful in 1m21s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m19s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m36s
Release / Build Binaries (amd64, linux, linux-amd64) (push) Failing after 1m27s
Release / Build Binaries (arm64, linux, linux-arm64) (push) Failing after 1m17s
Release / Build ISO (amd64) (push) Has been skipped
Release / Create Release (push) Has been skipped

Promote VERSION from 0.3.0-dev to 0.3.0. Finalise CHANGELOG entry with
phases 5-8 work (state machine + metrics, channels + maintenance windows,
OCI multi-arch distribution, pre-flight gates + deeper healthcheck +
auto-rollback). Refresh README quick-start to show both x86_64 and generic
ARM64 paths; update the roadmap status table to mark all v0.3 phases
complete and explicitly track the v0.3.1 follow-ups (OCI cosign,
LABEL=KSOLODATA on ARM64, real-hardware validation).

Add docs/release-notes-0.3.0.md as the operator-facing summary, including a
v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the
known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages,
shellcheck across init / build / test / hack scripts under the v0.3
severity policy.

Tagging is intentionally NOT done from this commit — that's a manual step
so the operator can decide when v0.3.0 is final. After tagging:

  git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
  git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml, which runs the full
ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:13:09 -06:00
parent 9fb894c5af
commit 3b47e7af68
4 changed files with 295 additions and 23 deletions


@@ -5,7 +5,12 @@ All notable changes to KubeSolo OS are documented in this file.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.3.0-dev] - unreleased
## [0.3.0] - 2026-05-14
The main themes: generic ARM64 (not just Raspberry Pi), an observable update
lifecycle with state file + metrics, OCI multi-arch distribution (e.g. via
ghcr.io), and policy gates (channels, maintenance windows, version
stepping-stones, pre-flight checks, auto-rollback).
### Added
@@ -30,6 +35,68 @@ versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- `docs/arm64-status.md` — Phase 3 status snapshot, known limitations, what's
needed to ship.
- `docs/ci-runners.md` — Gitea Actions runner setup (Odroid arm64-linux).
- Update agent state machine and observability (`update/pkg/state`):
  - Persistent on-disk `state.json` at `/var/lib/kubesolo/update/state.json`
    (atomic write via tmp + rename). Records Phase (Idle / Checking /
    Downloading / Staged / Activated / Verifying / Success / RolledBack /
    Failed), FromVersion, ToVersion, StartedAt, UpdatedAt, LastError,
    AttemptCount, HealthCheckFailures.
  - `apply`, `activate`, `healthcheck`, `rollback` all transition state
    explicitly on entry / exit / failure. Errors land in LastError so
    `status` can show why.
  - `kubesolo-update status --json` emits the full state for
    orchestration tooling. Human-readable mode adds an "Update Lifecycle"
    section when not idle.
  - New Prometheus metrics: `kubesolo_update_phase{phase="..."}` (all 9
    phase labels always emitted), `kubesolo_update_attempts_total`,
    `kubesolo_update_last_attempt_timestamp_seconds`.
- Channels, maintenance windows, version policy (`update/pkg/config`):
  - `/etc/kubesolo/update.conf` (key=value, comments, missing-OK) configures
    server, channel, maintenance_window, pubkey, healthcheck_url,
    auto_rollback_after.
  - `cloud-init` top-level `updates:` block writes `update.conf` on first
    boot. Empty block leaves any existing file alone.
  - `apply` enforces four gates before download: maintenance window,
    channel match, runtime architecture match, min_compatible_version
    stepping-stone. All gate failures land in the state machine as Failed
    with a clear LastError. `--force` bypasses window + node-block-label.
  - `UpdateMetadata` JSON gains `channel`, `min_compatible_version`,
    `architecture` (all optional, omitempty).
- OCI registry distribution (`update/pkg/oci`, ~280 LOC, 9 tests):
  - `kubesolo-update apply --registry ghcr.io/<org>/kubesolo-os --tag stable`
    pulls update artifacts from any OCI-compliant registry. Multi-arch
    indexes resolve to the `runtime.GOARCH`-matching manifest automatically.
  - Custom media types: `application/vnd.kubesolo.os.kernel.v1+octet-stream`
    and `application/vnd.kubesolo.os.initramfs.v1+gzip`. Annotations:
    `io.kubesolo.os.{version,channel,architecture,min_compatible_version,release_notes,release_date}`.
  - End-to-end digest verification from manifest to blobs via oras-go/v2.
  - `build/scripts/push-oci-artifact.sh` publishes per-arch artifacts via
    `oras`. Multi-arch index composition documented inline.
  - Dependencies added (update module only): oras.land/oras-go/v2 and
    transitive opencontainers/{go-digest,image-spec} + golang.org/x/sync.
- Pre-flight gates and deeper healthcheck (`update/pkg/health` extended,
  `update/pkg/partition` extended):
  - Free-space pre-flight on the passive partition (image + 10% headroom)
    via `partition.FreeBytes` / `HasFreeSpaceFor`.
  - Node-block-label pre-flight: refuses if the local K8s node carries
    `updates.kubesolo.io/block=true`. Silently allowed when no kubeconfig
    is available (air-gapped nodes). Skipped by `--force`.
  - `CheckKubeSystemReady` waits until every kube-system pod has held
    Running for ≥ N seconds (configurable via `--kube-system-settle`).
  - `CheckProbeURL` GETs an operator-supplied URL; 200 = pass. Configurable
    via `--healthcheck-url` or `healthcheck_url=` in update.conf.
  - `CheckDiskWritable` writes / fsyncs / reads / deletes a probe file
    under `/var/lib/kubesolo` to catch a wedged data partition.
  - `--auto-rollback-after N` (also `auto_rollback_after=` in update.conf):
    after N consecutive post-activation healthcheck failures, the agent
    calls `ForceRollback()` and the operator/init reboots. The failure
    counter resets to 0 on a clean pass.
- `.gitea/workflows/build-arm64.yaml` — full ARM64 build on the Odroid
self-hosted runner. Triggers on push to main, tags, and workflow_dispatch.
Boot smoke test marked continue-on-error pending KVM or real-hardware
validation.
### Changed
@@ -78,13 +145,23 @@ versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### Known limitations (deferred to follow-up)
- ARM64 `kubesolo.data=LABEL=KSOLODATA` resolution doesn't work yet —
piCore's `blkid`/`findfs` crash in QEMU and our static busybox lacks the
applets. Hardcoded `/dev/vda4` as a workaround. Production fix: ship
static `blkid`/`findfs` or replace LABEL resolution with a sysfs walk.
- AppArmor profile load fails on ARM64 (apparmor_parser ABI mismatch).
- KubeSolo's image-import deadline can fire under QEMU TCG (software
emulation). On real hardware (or with KVM) the import finishes in seconds.
- **ARM64 LABEL= resolution** doesn't work yet — piCore's `blkid`/`findfs`
crash in QEMU and our static busybox lacks the applets. Hardcoded
`/dev/vda4` as a workaround in `build/grub/grub-arm64.cfg`. Production
fix: ship static `blkid`/`findfs` or replace LABEL resolution with a
sysfs walk.
- **AppArmor profile load fails on ARM64** (apparmor_parser ABI mismatch).
Init reports it; boot continues without enforcement.
- **OCI signature verification** is deferred. The HTTP transport still
honours `--pubkey` for `.sig` files; the OCI transport is digest-verified
end-to-end via oras-go but does not yet consume cosign-style referrer
attestations. Targeted for v0.3.1.
- **Real-hardware validation** of the generic ARM64 image is still
pending. Builds and boots end-to-end under QEMU virt; production
certification waits on a Graviton / Ampere run.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import
deadline. Not a defect in the OS itself; real hardware and KVM-accelerated
QEMU complete the import in seconds.
## [0.2.0] - 2026-02-12


@@ -2,7 +2,7 @@
An immutable, bootable Linux distribution purpose-built for [KubeSolo](https://github.com/portainer/kubesolo) — Portainer's ultra-lightweight single-node Kubernetes.
> **Status:** x86_64 is stable — boots and runs K8s workloads, Portainer Edge Agent tested and connected. ARM64 generic UEFI is the active focus for v0.3.0; ARM64 Raspberry Pi support is paused pending physical hardware testing.
> **Status (v0.3.0):** x86_64 and generic ARM64 (UEFI / virtio / mainline kernel) both build and boot end-to-end. Update agent has an explicit state machine, OCI registry distribution alongside HTTP, channel + maintenance-window + version-stepping-stone gates, and auto-rollback. ARM64 Raspberry Pi support remains paused pending physical hardware. See [docs/release-notes-0.3.0.md](docs/release-notes-0.3.0.md) for the full v0.3.0 changelog.
## What is this?
@@ -24,23 +24,34 @@ KubeSolo OS combines **Tiny Core Linux** (~11 MB) with **KubeSolo** (single-bina
## Quick Start
### x86_64 ISO
```bash
# Fetch Tiny Core ISO + KubeSolo binary
make fetch
# Build custom kernel (first time only, ~25 min, cached)
make kernel
# Build Go binaries
make fetch # Tiny Core ISO + KubeSolo binary
make kernel # Custom kernel (first time only, ~25 min, cached)
make build-cloudinit build-update-agent
# Build bootable ISO
make rootfs initramfs iso
# Test in QEMU
make dev-vm
```
### Generic ARM64 disk image (v0.3.0+)
For Graviton / Ampere / generic UEFI ARM64 hosts:
```bash
make kernel-arm64 # Mainline 6.12 LTS kernel (first time only, ~30-60 min)
make rootfs-arm64 # Mainline kernel modules + KubeSolo arm64
make disk-image-arm64 # UEFI-bootable A/B GPT image
make test-boot-arm64-disk # boot smoke test under qemu-system-aarch64
```
### Raspberry Pi (work in progress)
The build path is `make kernel-rpi` / `make rpi-image`; it needs physical
hardware to validate the firmware + autoboot.txt boot path. See
[docs/arm64-architecture.md](docs/arm64-architecture.md) for the two-track
build layout.
Or build everything at once inside Docker:
```bash
@@ -234,9 +245,12 @@ Metrics include: `kubesolo_os_info`, `boot_success`, `boot_counter`, `uptime_sec
| 5 | CI/CD, OCI distribution, Prometheus metrics, ARM64 cross-compile | Complete |
| 6 | Security hardening, AppArmor | Complete |
| - | Custom kernel build for container runtime fixes | Complete (x86_64) |
| 7 | ARM64 generic (mainline kernel, UEFI, virtio) | In progress (v0.3.0) |
| 8 | Update engine v2 (state machine, OCI distribution, channels) | In progress (v0.3.0) |
| 7 | ARM64 generic (mainline kernel, UEFI, virtio) | Complete (v0.3.0, QEMU validated) |
| 8 | Update engine v2 (state machine, channels, OCI, pre-flight gates) | Complete (v0.3.0) |
| - | ARM64 Raspberry Pi (custom kernel, firmware, SD card image) | Paused — needs hardware |
| - | OCI cosign signature verification | Planned for v0.3.1 |
| - | LABEL=KSOLODATA on ARM64 (replace blkid/findfs path) | Planned for v0.3.1 |
| - | Real-hardware ARM64 validation (Graviton / Ampere) | Planned for v0.3.1 |
## License


@@ -1 +1 @@
0.3.0-dev
0.3.0

docs/release-notes-0.3.0.md Normal file

@@ -0,0 +1,181 @@
# KubeSolo OS v0.3.0 — Release Notes
**Released:** 2026-05-14
v0.3.0 is the second feature release after v0.2.0 and the first release that
ships a generic ARM64 build alongside x86_64. The update agent grew up: it
now has an explicit on-disk lifecycle, OCI registry distribution, and a
fleet-friendly set of policy gates (channels, maintenance windows,
version stepping-stones, pre-flight checks, auto-rollback).
This document is the operator-facing summary. The full per-phase changelog
lives in [CHANGELOG.md](../CHANGELOG.md).
## What's new
### Generic ARM64 build
The image you build with `make disk-image-arm64` now targets any UEFI-capable
ARM64 host: AWS Graviton, Oracle Ampere, generic ARM64 servers, future SBCs
with UEFI-compatible firmware. The kernel comes from kernel.org mainline LTS
(6.12.10 by default, configurable via `MAINLINE_KERNEL_VERSION` in
`build/config/versions.env`).
This is **distinct** from the Raspberry Pi build path. RPi keeps its
specialised kernel from `raspberrypi/linux` with bcm-defconfig + custom DTBs;
the generic ARM64 path uses mainline + arm64-defconfig + UEFI/virtio. See
[docs/arm64-architecture.md](arm64-architecture.md) for the file-by-file
split.
KubeSolo bumped to **v1.1.5** (was v1.1.0). New flags surfaced via cloud-init:
- `kubesolo.full` — disable edge-optimised k8s overrides
- `kubesolo.disable-ipv6` — disable IPv6 cluster-wide
- `kubesolo.db-wal-repair` — recover from unclean shutdowns
### Update lifecycle is now observable
The update agent writes a `state.json` at `/var/lib/kubesolo/update/state.json`
recording where the current attempt is in the lifecycle:
```
idle → checking → downloading → staged → activated → verifying → success
↘ rolled_back
↘ failed
```
`kubesolo-update status --json` emits the full state for orchestration tooling.
The Prometheus metrics endpoint gains three new series:
- `kubesolo_update_phase{phase="..."}` — 1 for current phase, 0 for others (all 9 always emitted)
- `kubesolo_update_attempts_total`
- `kubesolo_update_last_attempt_timestamp_seconds`
### OCI registry distribution
Update artifacts can now be pulled from any OCI-compliant registry alongside
the existing HTTP `latest.json` protocol:
```bash
# HTTP, unchanged from v0.2:
kubesolo-update apply --server https://updates.example.com
# New: OCI from ghcr.io (or quay.io, harbor, zot, ...)
kubesolo-update apply --registry ghcr.io/yourorg/kubesolo-os --tag stable
```
Multi-arch is handled transparently: the same `stable` tag points at a
manifest index, and the agent picks the manifest matching its `runtime.GOARCH`.
Publish your own artifacts with `build/scripts/push-oci-artifact.sh`. See
the script's header comment for the full publishing flow.
### Policy gates
`apply` now enforces five gates before destroying the passive slot:
1. **Maintenance window** (configurable, e.g. `03:00-05:00`; wrapping
midnight supported)
2. **Node-block-label** — refuses if the K8s node carries
`updates.kubesolo.io/block=true` (workload-author kill switch)
3. **Channel**`stable` / `beta` / `edge` must match between the artifact
metadata and the local channel
4. **Architecture** — refuses cross-arch artifacts via `runtime.GOARCH` check
5. **Min compatible version** — stepping-stone enforcement; refuses an
upgrade that bypasses a required intermediate version
`--force` bypasses the maintenance window and node-block label (channel /
arch / min-version are non-negotiable). Failures are recorded in `state.json`
with a clear `LastError` field.
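The three gates that `--force` cannot bypass (channel, architecture, minimum compatible version) could look roughly like this Go sketch. The `metadata` struct, its `version` key, the helper names and the rule that an empty field means "no constraint" are assumptions; the simplified version comparison also ignores SemVer pre-release ordering.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// metadata holds only the gate-relevant fields; the json keys follow the
// names listed in the CHANGELOG ("version" is assumed).
type metadata struct {
	Version              string `json:"version"`
	Channel              string `json:"channel,omitempty"`
	Architecture         string `json:"architecture,omitempty"`
	MinCompatibleVersion string `json:"min_compatible_version,omitempty"`
}

// parseVersion accepts "MAJOR.MINOR.PATCH" with an optional leading "v".
// Simplification: suffixes such as "-dev" are dropped, so "0.3.0-dev"
// compares equal to "0.3.0" (full SemVer orders it lower).
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1, 0 or +1 as a is older than, equal to, or newer than b.
func compareVersions(a, b string) (int, error) {
	va, err := parseVersion(a)
	if err != nil {
		return 0, err
	}
	vb, err := parseVersion(b)
	if err != nil {
		return 0, err
	}
	for i := range va {
		if va[i] < vb[i] {
			return -1, nil
		}
		if va[i] > vb[i] {
			return 1, nil
		}
	}
	return 0, nil
}

// checkGates applies the gates that --force cannot bypass.
func checkGates(m metadata, localChannel, running, goarch string) error {
	if m.Channel != "" && m.Channel != localChannel {
		return fmt.Errorf("channel mismatch: artifact %q, node %q", m.Channel, localChannel)
	}
	if m.Architecture != "" && m.Architecture != goarch {
		return fmt.Errorf("architecture mismatch: artifact %q, node %q", m.Architecture, goarch)
	}
	if m.MinCompatibleVersion != "" {
		cmp, err := compareVersions(running, m.MinCompatibleVersion)
		if err != nil {
			return err
		}
		if cmp < 0 { // stepping-stone: the node must first reach the minimum version
			return fmt.Errorf("running %s is older than min_compatible_version %s; upgrade to %s first",
				running, m.MinCompatibleVersion, m.MinCompatibleVersion)
		}
	}
	return nil
}

func main() {
	m := metadata{Version: "0.3.0", Channel: "stable", Architecture: runtime.GOARCH, MinCompatibleVersion: "0.2.0"}
	fmt.Println(checkGates(m, "stable", "0.2.1", runtime.GOARCH)) // <nil>
	fmt.Println(checkGates(m, "stable", "0.1.4", runtime.GOARCH)) // stepping-stone error
}
```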
### Healthcheck deepening + auto-rollback
`kubesolo-update healthcheck` grew three optional probes:
- **Kube-system pods** must hold Running for ≥ N seconds before the check
passes (N is set with `--kube-system-settle`)
- **Operator probe URL**: GET an operator-supplied endpoint
(`--healthcheck-url` or `healthcheck_url=` in `update.conf`); 200 = pass
- **Disk smoke test**: write/fsync/read/delete a probe file under
`/var/lib/kubesolo` to catch a wedged data partition
Plus auto-rollback: with `--auto-rollback-after N` (or `auto_rollback_after=`
in `update.conf`), after N consecutive post-activation failures, the agent
calls `ForceRollback()` and the operator/init is expected to reboot. The
counter resets on a clean pass.
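The consecutive-failure rule behind `--auto-rollback-after N` reduces to a counter that resets on success. In this hypothetical Go sketch an injected function stands in for `ForceRollback()`, and a threshold of 0 is assumed to disable auto-rollback. The CHANGELOG lists `HealthCheckFailures` in `state.json`, which suggests the real agent persists the count between healthcheck runs rather than holding it in memory as this sketch does.

```go
package main

import "fmt"

// rollbackTracker counts consecutive post-activation healthcheck failures.
type rollbackTracker struct {
	threshold int          // N from --auto-rollback-after; 0 disables (assumption)
	failures  int          // consecutive failures so far
	rollback  func() error // stands in for ForceRollback()
}

// record feeds one healthcheck result and reports whether rollback was triggered.
func (t *rollbackTracker) record(healthy bool) (bool, error) {
	if healthy {
		t.failures = 0 // a clean pass resets the counter
		return false, nil
	}
	t.failures++
	if t.threshold > 0 && t.failures >= t.threshold {
		if err := t.rollback(); err != nil {
			return false, err
		}
		return true, nil
	}
	return false, nil
}

func main() {
	calls := 0
	t := &rollbackTracker{threshold: 3, rollback: func() error { calls++; return nil }}
	for _, healthy := range []bool{false, false, true, false, false, false} {
		rolled, err := t.record(healthy)
		fmt.Println(healthy, t.failures, rolled, err)
	}
	fmt.Println("rollbacks:", calls) // rollbacks: 1
}
```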
### Persistent configuration via `/etc/kubesolo/update.conf`
Cloud-init writes this file on first boot from a new `updates:` block; you
can also hand-edit it. Recognised keys:
```
# server (HTTP) and registry (OCI ref) are alternatives; set one of them
server = https://updates.example.com
# registry = ghcr.io/yourorg/kubesolo-os
channel = stable
maintenance_window = 03:00-05:00
pubkey = /etc/kubesolo/update-pubkey.hex
healthcheck_url = http://localhost:8000/ready
auto_rollback_after = 3
```
Cloud-init full reference at
[cloud-init/examples/full-config.yaml](../cloud-init/examples/full-config.yaml).
## Migration from v0.2.x
This is a non-breaking release for live systems. v0.2.x → v0.3.0 changes:
- **`state.json` will appear** at `/var/lib/kubesolo/update/state.json` the
first time a v0.3 agent runs `apply`. Pre-existing v0.2 deployments without
this file are fine — the agent treats a missing file as fresh Idle state.
- **`update.conf` is optional**. v0.2 deployments that pass everything via
CLI flags keep working unchanged.
- **HTTP `latest.json` protocol unchanged**. Existing update servers don't
need a rebuild.
- **GRUB env (boot counter, active slot)** unchanged. The bootloader's
rollback behaviour is the same.
- **No new mandatory kernel command-line parameters**.
To opt into the new lifecycle, transports, and gates, drop in an
`update.conf` (or update cloud-init) and switch to `--registry` if you want
OCI distribution.
## Known limitations
These shipped intentionally with v0.3.0 and are explicitly tracked for
v0.3.1+:
- **OCI signature verification** — the OCI transport is digest-verified
end-to-end via oras-go, but does not yet consume cosign-style referrer
attestations. The HTTP transport still honours `--pubkey` for `.sig`
files.
- **ARM64 LABEL=KSOLODATA** resolution doesn't work yet: piCore's
`blkid`/`findfs` crash on QEMU virt under our mainline kernel, and the
`busybox-static` build we ship doesn't include those applets.
`build/grub/grub-arm64.cfg` hardcodes `kubesolo.data=/dev/vda4` as a
workaround. On real ARM64 hardware the device path may differ.
- **Real-hardware ARM64 validation** is pending. The image builds and
boots end-to-end under QEMU virt; production certification waits on a
Graviton / Ampere run.
- **AppArmor profile load fails on ARM64** (`apparmor_parser` ABI mismatch).
Init reports the failure; boot continues without AppArmor enforcement.
- **QEMU TCG performance** can trigger KubeSolo's first-boot image-import
deadline. Not an OS defect; real hardware and KVM-accelerated QEMU
complete the import in seconds.
## How to upgrade your build host
```bash
git pull
make distclean # optional — drops the build cache; full rebuild takes ~30 min
make iso # or disk-image, or disk-image-arm64
```
The Docker-based builder (`make docker-build`) regenerates its own image
from `build/Dockerfile.builder` on next invocation; oras 1.2.3 and
busybox-static are now included.
## Acknowledgements
v0.3.0 work was driven by a single multi-week pair-programming session
working through Phases 0-9 of the v0.3 roadmap. The Odroid self-hosted
Gitea Actions runner (`odroid.local`, arm64-linux) carried every ARM64
build during development.