kubesolo-os/README.md
Adolfo Delorenzo 3b47e7af68
release: v0.3.0
Promote VERSION from 0.3.0-dev to 0.3.0. Finalise CHANGELOG entry with
phases 5-8 work (state machine + metrics, channels + maintenance windows,
OCI multi-arch distribution, pre-flight gates + deeper healthcheck +
auto-rollback). Refresh README quick-start to show both x86_64 and generic
ARM64 paths; update the roadmap status table to mark all v0.3 phases
complete and explicitly track the v0.3.1 follow-ups (OCI cosign,
LABEL=KSOLODATA on ARM64, real-hardware validation).

Add docs/release-notes-0.3.0.md as the operator-facing summary, including a
v0.2.x -> v0.3.0 migration section (non-breaking on live systems) and the
known-limitations list copied from CHANGELOG.

All tests green: cloud-init module, all 10 update-module packages,
shellcheck across init / build / test / hack scripts under the v0.3
severity policy.

Tagging is intentionally NOT done from this commit — that's a manual step
so the operator can decide when v0.3.0 is final. After tagging:

  git tag -a v0.3.0 -m "KubeSolo OS v0.3.0"
  git push origin v0.3.0

The push triggers .gitea/workflows/build-arm64.yaml which runs the full
ARM64 build on the Odroid runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:13:09 -06:00


# KubeSolo OS
An immutable, bootable Linux distribution purpose-built for [KubeSolo](https://github.com/portainer/kubesolo) — Portainer's ultra-lightweight single-node Kubernetes.
> **Status (v0.3.0):** x86_64 and generic ARM64 (UEFI / virtio / mainline kernel) both build and boot end-to-end. Update agent has an explicit state machine, OCI registry distribution alongside HTTP, channel + maintenance-window + version-stepping-stone gates, and auto-rollback. ARM64 Raspberry Pi support remains paused pending physical hardware. See [docs/release-notes-0.3.0.md](docs/release-notes-0.3.0.md) for the full v0.3.0 changelog.
## What is this?
KubeSolo OS combines **Tiny Core Linux** (~11 MB) with **KubeSolo** (single-binary Kubernetes) to create an appliance-like K8s node that:
- Boots to a functional Kubernetes cluster in ~35 seconds
- Runs entirely from RAM with a read-only SquashFS root
- Persists K8s state across reboots via a dedicated data partition
- Uses a custom kernel optimized for containers (6.18.2-tinycore64 on x86_64, mainline 6.12 LTS on generic ARM64)
- Supports first-boot configuration via cloud-init YAML
- Performs atomic A/B updates with automatic GRUB-based rollback
- Signs update images with Ed25519 for integrity verification
- Exposes Prometheus metrics for monitoring
- Integrates with Portainer Edge for fleet management
- Ships as ISO, raw disk image, or OCI container
- Requires no SSH, no package manager, no writable system files

**Target use cases:** IoT/IIoT edge, air-gapped deployments, single-node K8s appliances, kiosk/POS systems, resource-constrained hardware.
## Quick Start
### x86_64 ISO
```bash
make fetch # Tiny Core ISO + KubeSolo binary
make kernel # Custom kernel (first time only, ~25 min, cached)
make build-cloudinit build-update-agent
make rootfs initramfs iso
make dev-vm
```
### Generic ARM64 disk image (v0.3.0+)
For Graviton / Ampere / generic UEFI ARM64 hosts:
```bash
make kernel-arm64 # Mainline 6.12 LTS kernel (first time only, ~30-60 min)
make rootfs-arm64 # Mainline kernel modules + KubeSolo arm64
make disk-image-arm64 # UEFI-bootable A/B GPT image
make test-boot-arm64-disk # boot smoke test under qemu-system-aarch64
```
### Raspberry Pi (work in progress)
Build path lives at `make kernel-rpi` / `make rpi-image`; needs physical
hardware to validate the firmware + autoboot.txt path. See
[docs/arm64-architecture.md](docs/arm64-architecture.md) for the two-track
build layout.
### Docker build
To build everything at once inside Docker instead:
```bash
make docker-build
```
After boot, retrieve the kubeconfig and manage your cluster from the host:
```bash
curl -s http://localhost:8080 > ~/.kube/kubesolo-config
export KUBECONFIG=~/.kube/kubesolo-config
kubectl get nodes
```
### Portainer Edge Agent
Pass Edge credentials via boot parameters:
```bash
./hack/dev-vm.sh --edge-id=YOUR_EDGE_ID --edge-key=YOUR_EDGE_KEY
```
Or configure via [cloud-init YAML](cloud-init/examples/portainer-edge.yaml).
## Requirements
**Build host:**
- Linux x86_64 with root/sudo (for loop mounts)
- Go 1.22+ (for cloud-init and update agent)
- Tools: `cpio`, `gzip`, `wget`, `curl`, `syslinux` (or use `make docker-build`)

**Runtime:**
- x86_64 hardware or VM, or a generic UEFI ARM64 host (v0.3.0+, validated under QEMU so far)
- 512 MB RAM minimum (1 GB+ recommended)
- 8 GB disk (for persistent data partition)
## Architecture
```
Boot Media (ISO or Disk Image)
├── GRUB 2 bootloader (A/B slot selection, rollback counter)
└── Kernel + Initramfs (kubesolo-os.gz)
    ├── switch_root → SquashFS root (read-only, in RAM)
    ├── Persistent data partition (ext4, bind-mounted)
    │   ├── /var/lib/kubesolo (K8s state, certs, SQLite)
    │   ├── /var/lib/containerd (container images)
    │   └── /etc/kubesolo (node configuration)
    ├── Custom init (POSIX sh, staged boot 00→90)
    │   └── Stage 45: cloud-init (Go binary)
    ├── containerd (bundled with KubeSolo)
    └── KubeSolo (single-binary K8s)
```
### Partition Layout (Disk Image)
```
GPT Disk (minimum 8 GB):
Part 1: EFI/Boot (256 MB, FAT32) — GRUB + A/B boot logic
Part 2: System A (512 MB, ext4) — vmlinuz + kubesolo-os.gz (active)
Part 3: System B (512 MB, ext4) — vmlinuz + kubesolo-os.gz (passive)
Part 4: Data (remaining, ext4) — persistent K8s state
```
See [docs/design/kubesolo-os-design.md](docs/design/kubesolo-os-design.md) for the full architecture document.
## Custom Kernel
The stock Tiny Core 17.0 kernel lacks several config options required for containers. On x86_64, KubeSolo OS builds a custom kernel (6.18.2-tinycore64) that adds:
- `CONFIG_CGROUP_BPF` — cgroup v2 device control via BPF
- `CONFIG_DEVTMPFS` / `CONFIG_DEVTMPFS_MOUNT` — automatic /dev node creation
- `CONFIG_MEMCG` — memory cgroup controller
- `CONFIG_CFS_BANDWIDTH` — CPU bandwidth throttling

Unnecessary subsystems (sound, GPU, wireless, Bluetooth, etc.) are stripped to keep the kernel minimal. Build is cached in `build/cache/custom-kernel/`.
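
As a rough illustration of the requirement above, the following Go sketch scans kernel `.config` text and reports which of the listed options are not built in. It is not part of the project's build scripts; the function name `missingOptions` and the sample input are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requiredOptions lists the kernel options named in this section.
var requiredOptions = []string{
	"CONFIG_CGROUP_BPF",
	"CONFIG_DEVTMPFS",
	"CONFIG_DEVTMPFS_MOUNT",
	"CONFIG_MEMCG",
	"CONFIG_CFS_BANDWIDTH",
}

// missingOptions returns the required options that are not set to "y"
// in the given kernel .config text (hypothetical helper).
func missingOptions(config string) []string {
	enabled := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(config))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Disabled options appear as "# CONFIG_FOO is not set"; skip comments.
		if strings.HasPrefix(line, "#") {
			continue
		}
		if name, value, ok := strings.Cut(line, "="); ok && value == "y" {
			enabled[name] = true
		}
	}
	var missing []string
	for _, opt := range requiredOptions {
		if !enabled[opt] {
			missing = append(missing, opt)
		}
	}
	return missing
}

func main() {
	// Made-up sample input, not a real Tiny Core config.
	sample := "CONFIG_CGROUP_BPF=y\nCONFIG_DEVTMPFS=y\n# CONFIG_MEMCG is not set\n"
	fmt.Println("missing:", missingOptions(sample))
}
```

A check of this kind could run against the generated `.config` before a long kernel build starts, so a dropped option is caught in seconds rather than after boot.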
## Cloud-Init
First-boot configuration via a simple YAML schema. All [documented KubeSolo flags](https://www.kubesolo.io/documentation#install) are supported:
```yaml
hostname: edge-node-01
network:
  mode: static
  address: 192.168.1.100/24
  gateway: 192.168.1.1
  dns:
    - 8.8.8.8
kubesolo:
  local-storage: true
  local-storage-shared-path: "/mnt/shared"
  apiserver-extra-sans:
    - edge-node-01.local
  debug: false
  pprof-server: false
  portainer-edge-id: "your-edge-id"
  portainer-edge-key: "your-edge-key"
  portainer-edge-async: true
```
See [docs/cloud-init.md](docs/cloud-init.md) and the [examples](cloud-init/examples/).
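
For the static network block in the example, a sanity check along these lines can catch a gateway that sits outside the configured subnet. This is a minimal sketch using Go's standard `net/netip` package. Whether the project's cloud-init parser performs this check is not stated here, and `validateStatic` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateStatic checks that address is a valid CIDR (for example
// "192.168.1.100/24") and that gateway lies inside that network.
// Hypothetical helper, not the project's actual validation code.
func validateStatic(address, gateway string) error {
	prefix, err := netip.ParsePrefix(address)
	if err != nil {
		return fmt.Errorf("address: %w", err)
	}
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return fmt.Errorf("gateway: %w", err)
	}
	if !prefix.Contains(gw) {
		return fmt.Errorf("gateway %s is outside %s", gw, prefix.Masked())
	}
	return nil
}

func main() {
	// Values from the YAML example, then a deliberately wrong gateway.
	fmt.Println(validateStatic("192.168.1.100/24", "192.168.1.1"))
	fmt.Println(validateStatic("192.168.1.100/24", "10.0.0.1"))
}
```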
## Atomic Updates
A/B partition scheme with GRUB boot counter for automatic rollback:
1. Update agent downloads new image to passive partition
2. GRUB boots new partition with `boot_counter=3`
3. Health check verifies containerd + K8s API + node Ready → sets `boot_success=1`
4. On 3 consecutive boot failures, GRUB auto-rolls back to previous slot

Updates can be signed with Ed25519 for integrity verification. A K8s CronJob checks for updates every 6 hours.
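
The rollback steps above can be modelled as a small state transition. The Go sketch below is illustrative only: the real logic lives in GRUB configuration and the update agent, and the `bootEnv` type, the decrement-then-switch rule, and the assumption that the previous slot is treated as known-good after rollback are all hypothetical. In this model, three failed boots of the new slot are followed by a fourth boot from the previous slot.

```go
package main

import "fmt"

// bootEnv models the bootloader variables named in the steps above.
// Field names and semantics are assumptions for illustration.
type bootEnv struct {
	activeSlot  string // "A" or "B"
	bootCounter int    // remaining boot attempts for the active slot
	bootSuccess bool   // set once the health check has passed
}

func otherSlot(slot string) string {
	if slot == "A" {
		return "B"
	}
	return "A"
}

// stageUpdate points the bootloader at the freshly written passive slot
// and grants it three boot attempts.
func stageUpdate(env bootEnv) bootEnv {
	return bootEnv{activeSlot: otherSlot(env.activeSlot), bootCounter: 3}
}

// nextBoot applies the bootloader's decision at the start of one boot.
func nextBoot(env bootEnv) bootEnv {
	if env.bootSuccess {
		return env // slot already confirmed healthy
	}
	if env.bootCounter == 0 {
		// All attempts used without a passing health check: roll back.
		return bootEnv{activeSlot: otherSlot(env.activeSlot), bootCounter: 3, bootSuccess: true}
	}
	env.bootCounter--
	return env
}

func main() {
	env := stageUpdate(bootEnv{activeSlot: "A", bootSuccess: true})
	// Simulate boots where the health check never passes.
	for i := 1; i <= 4; i++ {
		env = nextBoot(env)
		fmt.Printf("boot %d: slot=%s counter=%d\n", i, env.activeSlot, env.bootCounter)
	}
}
```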
See [docs/update-flow.md](docs/update-flow.md).
## Monitoring
The update agent exposes Prometheus metrics on port 9100:
```bash
kubesolo-update metrics --listen :9100
```
Metrics include: `kubesolo_os_info`, `boot_success`, `boot_counter`, `uptime_seconds`, `update_available`, `memory_total_bytes`, `memory_available_bytes`.
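
On the consuming side, the endpoint's output can be read as Prometheus text exposition format. The sketch below parses unlabelled samples from such a payload. The sample text is invented, and the exact metric names on the wire (including any prefix) and the `version` label are assumptions, so treat it as an illustration of the format rather than of this agent's real output.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics reads Prometheus text exposition format and returns the
// value of each unlabelled sample. Comment lines (# HELP, # TYPE) and
// labelled samples are skipped to keep the sketch short.
func parseMetrics(body string) map[string]float64 {
	values := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || strings.Contains(fields[0], "{") {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			values[fields[0]] = v
		}
	}
	return values
}

func main() {
	// Invented payload; real names and labels may differ.
	sample := "# TYPE boot_success gauge\n" +
		"boot_success 1\n" +
		"boot_counter 3\n" +
		"kubesolo_os_info{version=\"0.3.0\"} 1\n"
	m := parseMetrics(sample)
	if m["boot_success"] != 1 {
		fmt.Println("node has not confirmed a successful boot")
		return
	}
	fmt.Println("boot confirmed, counter =", m["boot_counter"])
}
```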
## Project Structure
```
├── Makefile                      # Build orchestration
├── build/                        # Build scripts, kernel config, rootfs overlays
│   └── scripts/
│       ├── build-kernel.sh       # Custom kernel compilation
│       ├── fetch-components.sh   # Download components
│       ├── create-iso.sh         # Bootable ISO
│       ├── create-disk-image.sh  # A/B partition disk image
│       └── create-oci-image.sh   # OCI container image
├── init/                         # Custom init system (POSIX sh)
│   ├── init.sh                   # Main init + switch_root
│   └── lib/                      # Staged boot scripts (00-90)
├── cloud-init/                   # Go cloud-init parser
├── update/                       # Go atomic update agent
├── test/                         # QEMU-based automated tests + benchmarks
├── hack/                         # Developer utilities (dev-vm, SSH, USB)
├── docs/                         # Documentation
│   ├── design/                   # Architecture design document
│   ├── boot-flow.md              # Boot sequence reference
│   ├── update-flow.md            # A/B update reference
│   ├── cloud-init.md             # Cloud-init configuration reference
│   └── deployment-guide.md       # Deployment and operations guide
└── .gitea/workflows/             # CI/CD (Gitea Actions)
## Make Targets
| Target | Description |
|--------|-------------|
| `make fetch` | Download Tiny Core ISO + KubeSolo binary |
| `make kernel` | Build custom kernel (cached) |
| `make build-cloudinit` | Compile cloud-init Go binary |
| `make build-update-agent` | Compile update agent Go binary |
| `make rootfs` | Extract Tiny Core + inject KubeSolo |
| `make initramfs` | Pack initramfs (kubesolo-os.gz) |
| `make iso` | Create bootable ISO |
| `make disk-image` | Create A/B partition disk image |
| `make oci-image` | Package as OCI container |
| `make build-cross` | Cross-compile for amd64 + arm64 |
| `make docker-build` | Build everything in Docker |
| `make quick` | Fast rebuild (re-inject + repack + ISO) |
| `make dev-vm` | Launch QEMU dev VM (Linux + macOS) |
| `make test-all` | Run all tests |
## Documentation
- [Architecture Design](docs/design/kubesolo-os-design.md) — full research and technical specification
- [Boot Flow](docs/boot-flow.md) — boot sequence from GRUB to K8s Ready
- [Update Flow](docs/update-flow.md) — A/B atomic update mechanism
- [Cloud-Init](docs/cloud-init.md) — first-boot configuration reference
- [Deployment Guide](docs/deployment-guide.md) — installation, operations, troubleshooting
## Roadmap
| Phase | Scope | Status |
|-------|-------|--------|
| 1 | PoC: boot Tiny Core + KubeSolo, verify K8s | Complete (x86_64) |
| 2 | Cloud-init Go parser, network, hostname | Complete |
| 3 | A/B atomic updates, GRUB, rollback agent | Complete (x86_64) |
| 4 | Ed25519 signing, Portainer Edge, SSH extension | Complete |
| 5 | CI/CD, OCI distribution, Prometheus metrics, ARM64 cross-compile | Complete |
| 6 | Security hardening, AppArmor | Complete |
| - | Custom kernel build for container runtime fixes | Complete (x86_64) |
| 7 | ARM64 generic (mainline kernel, UEFI, virtio) | Complete (v0.3.0, QEMU validated) |
| 8 | Update engine v2 (state machine, channels, OCI, pre-flight gates) | Complete (v0.3.0) |
| - | ARM64 Raspberry Pi (custom kernel, firmware, SD card image) | Paused — needs hardware |
| - | OCI cosign signature verification | Planned for v0.3.1 |
| - | LABEL=KSOLODATA on ARM64 (replace blkid/findfs path) | Planned for v0.3.1 |
| - | Real-hardware ARM64 validation (Graviton / Ampere) | Planned for v0.3.1 |
## License
MIT License — see [LICENSE](LICENSE) for details.