kubesolo-os/docs/boot-flow.md
Adolfo Delorenzo e372df578b feat: initial Phase 1 PoC scaffolding for KubeSolo OS
Complete Phase 1 implementation of KubeSolo OS — an immutable, bootable
Linux distribution built on Tiny Core Linux for running KubeSolo
single-node Kubernetes.

Build system:
- Makefile with fetch, rootfs, initramfs, iso, disk-image targets
- Dockerfile.builder for reproducible builds
- Scripts to download Tiny Core, extract rootfs, inject KubeSolo,
  pack initramfs, and create bootable ISO/disk images

Init system (10 POSIX sh stages):
- Early mount (proc/sys/dev/cgroup2), cmdline parsing, persistent
  mount with bind-mounts, kernel module loading, sysctl, DHCP
  networking, hostname, clock sync, containerd prep, KubeSolo exec

Shared libraries:
- functions.sh (device wait, IP lookup, config helpers)
- network.sh (static IP, config persistence, interface detection)
- health.sh (containerd, API server, node readiness checks)
- Emergency shell for boot failure debugging

Testing:
- QEMU boot test with serial log marker detection
- K8s readiness test with kubectl verification
- Persistence test (reboot + verify state survives)
- Workload deployment test (nginx pod)
- Local storage test (PVC + local-path provisioner)
- Network policy test
- Reusable run-vm.sh launcher

Developer tools:
- dev-vm.sh (interactive QEMU with port forwarding)
- rebuild-initramfs.sh (fast iteration)
- inject-ssh.sh (dropbear SSH for debugging)
- extract-kernel-config.sh + kernel-audit.sh

Documentation:
- Full design document with architecture research
- Boot flow documentation covering all 10 init stages
- Cloud-init examples (DHCP, static IP, Portainer Edge, air-gapped)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:18:42 -06:00


# KubeSolo OS — Boot Flow
This document describes the boot sequence from power-on to a running Kubernetes node.
## Overview
```
BIOS/UEFI → Bootloader (isolinux) → Linux Kernel → initramfs → /sbin/init
→ Stage 00: Mount virtual filesystems
→ Stage 10: Parse boot parameters
→ Stage 20: Mount persistent storage
→ Stage 30: Load kernel modules
→ Stage 40: Apply sysctl settings
→ Stage 50: Configure networking
→ Stage 60: Set hostname
→ Stage 70: Set system clock
→ Stage 80: Prepare containerd prerequisites
→ Stage 90: exec KubeSolo (becomes PID 1)
```
## Stage Details
### Bootloader (isolinux/syslinux)
The ISO uses isolinux with several boot options:

| Label | Description |
|-------|-------------|
| `kubesolo` | Normal boot (default, 3s timeout) |
| `kubesolo-debug` | Boot with verbose init logging + serial console |
| `kubesolo-shell` | Drop to emergency shell right after cmdline parsing (Stage 10) |
| `kubesolo-nopersist` | Run fully in RAM, no persistent mount |

The kernel command line always includes `kubesolo.data=LABEL=KSOLODATA` to specify the persistent data partition.
### Stage 00 — Early Mount (`00-early-mount.sh`)
Mounts essential virtual filesystems before anything else can work:
- `/proc` — process information
- `/sys` — sysfs (device/driver info)
- `/dev` — devtmpfs (device nodes, including block devices)
- `/tmp`, `/run` — tmpfs scratch space
- `/dev/pts`, `/dev/shm` — pseudo-terminals, shared memory
- `/sys/fs/cgroup` — cgroup v2 unified hierarchy (v1 fallback if unavailable)
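
A minimal sketch of the Stage 00 mounts follows. It assumes BusyBox or util-linux `mount` and `grep`. The function names (`mount_once`, `cgroup_mode`, `early_mount`) are hypothetical. The v1 fallback only creates a parent tmpfs rather than the per-controller hierarchies, and the shipped `00-early-mount.sh` may differ.

```sh
#!/bin/sh
# Sketch of Stage 00 early mounts (hypothetical; not the shipped 00-early-mount.sh).

# Mount a filesystem unless something is already mounted on the target.
# Usage: mount_once <fstype> <source> <target> [mount-options]
mount_once() {
    _fs=$1; _src=$2; _dst=$3; _opts=${4:-defaults}
    mkdir -p "$_dst"
    # Before /proc is mounted, /proc/mounts does not exist; grep -s hides the
    # error, the test is false, and the mount goes ahead.
    if grep -qs " $_dst " /proc/mounts; then
        return 0
    fi
    mount -t "$_fs" -o "$_opts" "$_src" "$_dst"
}

# Print "v2" if the kernel lists cgroup2 as a supported filesystem, else "v1".
# The list file is a parameter so the check can be tried off-target.
cgroup_mode() {
    if grep -qw cgroup2 "${1:-/proc/filesystems}" 2>/dev/null; then
        echo v2
    else
        echo v1
    fi
}

early_mount() {
    mount_once proc proc /proc
    mount_once sysfs sysfs /sys
    mount_once devtmpfs devtmpfs /dev mode=0755
    mount_once devpts devpts /dev/pts gid=5,mode=0620
    mount_once tmpfs tmpfs /dev/shm mode=1777
    mount_once tmpfs tmpfs /tmp mode=1777
    mount_once tmpfs tmpfs /run mode=0755
    if [ "$(cgroup_mode)" = v2 ]; then
        mount_once cgroup2 cgroup2 /sys/fs/cgroup
    else
        # v1 fallback: parent tmpfs only; per-controller mounts omitted.
        mount_once tmpfs cgroup /sys/fs/cgroup mode=0755
    fi
}
```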
### Stage 10 — Parse Cmdline (`10-parse-cmdline.sh`)
Reads `/proc/cmdline` and sets environment variables:

| Boot Parameter | Variable | Description |
|---------------|----------|-------------|
| `kubesolo.data=<dev>` | `KUBESOLO_DATA_DEV` | Block device for persistent data |
| `kubesolo.debug` | `KUBESOLO_DEBUG` | Enables `set -x` for trace logging |
| `kubesolo.shell` | `KUBESOLO_SHELL` | Drop to shell after this stage |
| `kubesolo.nopersist` | `KUBESOLO_NOPERSIST` | Skip persistent mount |
| `kubesolo.cloudinit=<path>` | `KUBESOLO_CLOUDINIT` | Cloud-init config file path |
| `kubesolo.flags=<flags>` | `KUBESOLO_EXTRA_FLAGS` | Extra flags for KubeSolo binary |

If `kubesolo.data` is not specified, the stage auto-detects a partition labeled `KSOLODATA` via `blkid`. If none is found, it falls back to RAM-only mode.
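
The parsing step can be sketched as a `case` over the words of `/proc/cmdline`. This is an illustration, not the shipped `10-parse-cmdline.sh`. The helper names are hypothetical, and parameter values are assumed to contain no spaces, so quoted kernel parameters are not handled. The label lookup greps plain `blkid` output rather than relying on any particular `blkid` option.

```sh
#!/bin/sh
# Sketch of Stage 10 (hypothetical; not the shipped 10-parse-cmdline.sh).
# Assumes parameter values contain no spaces.

parse_cmdline() {
    _cmdline=${1:-$(cat /proc/cmdline)}
    KUBESOLO_DATA_DEV=""
    KUBESOLO_DEBUG=0
    KUBESOLO_SHELL=0
    KUBESOLO_NOPERSIST=0
    KUBESOLO_CLOUDINIT=""
    KUBESOLO_EXTRA_FLAGS=""
    set -f    # no globbing while word-splitting the cmdline
    for _arg in $_cmdline; do
        case "$_arg" in
            kubesolo.data=*)      KUBESOLO_DATA_DEV=${_arg#kubesolo.data=} ;;
            kubesolo.debug)       KUBESOLO_DEBUG=1 ;;
            kubesolo.shell)       KUBESOLO_SHELL=1 ;;
            kubesolo.nopersist)   KUBESOLO_NOPERSIST=1 ;;
            kubesolo.cloudinit=*) KUBESOLO_CLOUDINIT=${_arg#kubesolo.cloudinit=} ;;
            kubesolo.flags=*)     KUBESOLO_EXTRA_FLAGS=${_arg#kubesolo.flags=} ;;
        esac
    done
    set +f
    # The caller would enable `set -x` when KUBESOLO_DEBUG=1.
    export KUBESOLO_DATA_DEV KUBESOLO_DEBUG KUBESOLO_SHELL \
           KUBESOLO_NOPERSIST KUBESOLO_CLOUDINIT KUBESOLO_EXTRA_FLAGS
}

# Print the first device whose blkid line carries LABEL="<label>".
# Reads blkid output on stdin, e.g.: /dev/vda2: LABEL="KSOLODATA" TYPE="ext4"
label_to_dev() {
    grep " LABEL=\"$1\"" | head -n 1 | cut -d: -f1
}

# No kubesolo.data given: try the KSOLODATA label, else go RAM-only.
select_data_dev() {
    if [ -z "$KUBESOLO_DATA_DEV" ] && [ "$KUBESOLO_NOPERSIST" != 1 ]; then
        KUBESOLO_DATA_DEV=$(blkid 2>/dev/null | label_to_dev KSOLODATA)
        if [ -z "$KUBESOLO_DATA_DEV" ]; then
            KUBESOLO_NOPERSIST=1
        fi
    fi
    return 0
}
```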
### Stage 20 — Persistent Mount (`20-persistent-mount.sh`)
If not in RAM-only mode, the stage:

1. Waits up to 30s for the data device to appear (to tolerate slow USB and virtio devices)
2. Mounts the ext4 data partition at `/mnt/data`
3. Creates the directory structure on first boot
4. Bind-mounts the persistent directories:

| Source (data partition) | Mount Point | Content |
|------------------------|-------------|---------|
| `/mnt/data/kubesolo` | `/var/lib/kubesolo` | K8s state, certs, SQLite DB |
| `/mnt/data/containerd` | `/var/lib/containerd` | Container images + layers |
| `/mnt/data/etc-kubesolo` | `/etc/kubesolo` | Node configuration |
| `/mnt/data/log` | `/var/log` | System + K8s logs |
| `/mnt/data/usr-local` | `/usr/local` | User binaries |

In RAM-only mode, these directories are backed by tmpfs and lost on reboot.
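
A sketch of the wait, mount and bind-mount sequence follows. It assumes `findfs` (available in BusyBox and util-linux) resolves `LABEL=` and `UUID=` specs, and that `mount -o bind` is supported. `resolve_dev`, `wait_for_dev`, `bind_pairs` and `persistent_mount` are hypothetical names. The first-boot step here is simply `mkdir -p`, which creates missing directories and does nothing on later boots.

```sh
#!/bin/sh
# Sketch of Stage 20 (hypothetical; not the shipped 20-persistent-mount.sh).

DATA_MNT=/mnt/data
# <dir on data partition>:<mount point>, matching the table above.
BIND_MAP="kubesolo:/var/lib/kubesolo containerd:/var/lib/containerd etc-kubesolo:/etc/kubesolo log:/var/log usr-local:/usr/local"

# Resolve LABEL=/UUID= specs with findfs; plain paths must already exist.
resolve_dev() {
    case "$1" in
        LABEL=*|UUID=*) findfs "$1" 2>/dev/null ;;
        *) [ -e "$1" ] && echo "$1" ;;
    esac
}

# Print the resolved device once it appears; give up after $2 seconds.
wait_for_dev() {
    _spec=$1; _timeout=${2:-30}; _waited=0
    while :; do
        _dev=$(resolve_dev "$_spec")
        if [ -n "$_dev" ]; then
            echo "$_dev"
            return 0
        fi
        [ "$_waited" -ge "$_timeout" ] && return 1
        sleep 1
        _waited=$((_waited + 1))
    done
}

# Print "<source> <target>" pairs for the bind mounts.
bind_pairs() {
    for _pair in $BIND_MAP; do
        echo "${1:-$DATA_MNT}/${_pair%%:*} ${_pair#*:}"
    done
}

persistent_mount() {
    _dev=$(wait_for_dev "$KUBESOLO_DATA_DEV" 30) || return 1
    mkdir -p "$DATA_MNT"
    mount -t ext4 "$_dev" "$DATA_MNT" || return 1
    # mkdir -p builds the layout on first boot and is a no-op afterwards.
    set -- $(bind_pairs)
    while [ $# -ge 2 ]; do
        mkdir -p "$1" "$2" && mount -o bind "$1" "$2" || return 1
        shift 2
    done
}
```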
### Stage 30 — Kernel Modules (`30-kernel-modules.sh`)
Loads kernel modules listed in `/usr/lib/kubesolo-os/modules.list`:
- `br_netfilter`, `bridge`, `veth`, `vxlan` — K8s pod networking
- `ip_tables`, `iptable_nat`, `nf_nat`, `nf_conntrack` — service routing
- `overlay` — containerd storage driver
- `ip_vs`, `ip_vs_rr`, `ip_vs_wrr`, `ip_vs_sh` — optional IPVS mode

A module that fails to load is logged as a warning rather than treated as an error, since it may be built into the kernel.
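
The loader can be sketched as below. It assumes `modules.list` holds one module name per line, with `#` starting a comment. The `[WARN]` log tag and the function names are hypothetical. The loader command is a parameter only so the loop can be exercised without root.

```sh
#!/bin/sh
# Sketch of Stage 30 (hypothetical; not the shipped 30-kernel-modules.sh).

# Hypothetical warning helper; the [WARN] tag is an assumption.
log_warn() { echo "[kubesolo-init] [WARN] $*" >&2; }

# Load each listed module and print how many failed. Always returns 0,
# because a failure may only mean the module is built into the kernel.
load_modules() {
    _list=${1:-/usr/lib/kubesolo-os/modules.list}
    _loader=${2:-modprobe}    # overridable for testing
    _failed=0
    while IFS= read -r _line || [ -n "$_line" ]; do
        _line=${_line%%#*}            # drop comments
        for _mod in $_line; do        # word-splitting trims whitespace
            if ! $_loader "$_mod" >/dev/null 2>&1; then
                log_warn "could not load $_mod (may be built into the kernel)"
                _failed=$((_failed + 1))
            fi
        done
    done < "$_list"
    echo "$_failed"
    return 0
}
```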
### Stage 40 — Sysctl (`40-sysctl.sh`)
Applies kernel parameters from `/etc/sysctl.d/k8s.conf`:
- `net.bridge.bridge-nf-call-iptables = 1` — K8s requirement
- `net.ipv4.ip_forward = 1` — pod-to-pod routing
- `fs.inotify.max_user_watches = 524288` — kubelet/containerd watchers
- `net.netfilter.nf_conntrack_max = 131072` — service connection tracking
- `vm.swappiness = 0` — avoid swapping (the kubelet expects swap to be disabled)
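
Where `sysctl -p <file>` is available, it applies the file in one call. The sketch below shows what that amounts to: each `key = value` line is written to the matching file under `/proc/sys`, with the dots in the key turned into slashes. The function names are hypothetical. The `net.bridge.*` keys only exist once `br_netfilter` is loaded, which fits Stage 30 running before Stage 40.

```sh
#!/bin/sh
# Sketch of applying a sysctl.d file by hand (hypothetical helper names).

# Map a key to its /proc/sys path: net.ipv4.ip_forward -> net/ipv4/ip_forward.
# Keys containing literal dots (e.g. VLAN interface names) are not handled.
sysctl_path() {
    printf '%s/%s\n' "${2:-/proc/sys}" "$(printf '%s' "$1" | tr . /)"
}

# Apply "key = value" lines; '#' and ';' start comments. Returns non-zero if
# any key could not be written.
apply_sysctl_file() {
    _conf=${1:-/etc/sysctl.d/k8s.conf}
    _root=${2:-/proc/sys}
    _rc=0
    while IFS='=' read -r _key _value || [ -n "$_key" ]; do
        _key=$(printf '%s' "$_key" | tr -d ' \t')
        case "$_key" in ''|'#'*|';'*) continue ;; esac
        _value=$(printf '%s' "$_value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        _path=$(sysctl_path "$_key" "$_root")
        if ! { printf '%s\n' "$_value" > "$_path"; } 2>/dev/null; then
            echo "[kubesolo-init] [WARN] cannot set $_key" >&2
            _rc=1
        fi
    done < "$_conf"
    return "$_rc"
}
```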
### Stage 50 — Network (`50-network.sh`)
Priority order:

1. **Saved config**: `/mnt/data/network/interfaces.sh` (from a previous boot)
2. **Cloud-init**: parsed from `cloud-init.yaml` (Phase 2: Go parser)
3. **DHCP fallback**: `udhcpc` on the first non-virtual interface

The stage brings up loopback, finds the first physical interface (skipping `lo`, `docker`, `veth`, `br`, and `cni` interfaces), and runs DHCP on it.
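
A sketch of the priority order and interface selection follows. It assumes interfaces are discovered through `/sys/class/net` and that "first" means first in alphabetical (glob) order. The cloud-init step is left as a comment and the function names are hypothetical. `udhcpc -b` (go to the background if no lease is obtained yet) is one reasonable choice, not the documented invocation.

```sh
#!/bin/sh
# Sketch of Stage 50 (hypothetical; not the shipped 50-network.sh).

# Print the first interface that is not loopback or a container/bridge
# device; glob (alphabetical) order decides which one is "first".
first_physical_iface() {
    for _path in "${1:-/sys/class/net}"/*; do
        [ -e "$_path" ] || continue
        _iface=${_path##*/}
        case "$_iface" in
            lo|docker*|veth*|br*|cni*) continue ;;
        esac
        echo "$_iface"
        return 0
    done
    return 1
}

configure_network() {
    ip link set lo up
    # 1. Saved config from a previous boot.
    if [ -f /mnt/data/network/interfaces.sh ]; then
        . /mnt/data/network/interfaces.sh
        return
    fi
    # 2. Cloud-init would be handled here (Phase 2; omitted).
    # 3. DHCP fallback on the first physical interface.
    _iface=$(first_physical_iface) || return 1
    ip link set "$_iface" up
    udhcpc -i "$_iface" -b
}
```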
### Stage 60 — Hostname (`60-hostname.sh`)
Priority order:
1. Saved hostname from data partition
2. Generated from the MAC address of the primary interface (`kubesolo-XXXXXX`)

The chosen name is written to `/etc/hostname` and appended to `/etc/hosts`.
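
A sketch of the hostname logic follows, under two assumptions not stated above. First, the six hex digits are taken to be the last three octets of the MAC address. Second, the saved hostname is assumed to live in `/etc/kubesolo/hostname`, a hypothetical path persisted through the `/etc/kubesolo` bind mount. The `127.0.0.1` entry appended to `/etc/hosts` is also an assumption.

```sh
#!/bin/sh
# Sketch of Stage 60 (hypothetical; not the shipped 60-hostname.sh).

SAVED_HOSTNAME=/etc/kubesolo/hostname   # hypothetical location

# kubesolo-<last three MAC octets>, lower-case, colons removed (assumption).
hostname_from_mac() {
    _hex=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
    printf 'kubesolo-%s\n' "$(printf '%s' "$_hex" | cut -c7-12)"
}

# A non-empty saved hostname wins; otherwise derive one from the MAC in $2.
pick_hostname() {
    if [ -s "${1:-$SAVED_HOSTNAME}" ]; then
        head -n 1 "${1:-$SAVED_HOSTNAME}"
    else
        hostname_from_mac "$2"
    fi
}

set_hostname() {
    _iface=$1
    _name=$(pick_hostname "" "$(cat "/sys/class/net/$_iface/address")")
    echo "$_name" > /etc/hostname
    hostname "$_name"
    echo "127.0.0.1 $_name" >> /etc/hosts   # mapping is an assumption
}
```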
### Stage 70 — Clock (`70-clock.sh`)
Best-effort time synchronization:
1. Try `hwclock -s` (hardware clock)
2. Try NTP in background (non-blocking) via `ntpd` or `ntpdate`
3. Log a warning if no time source is available

NTP runs in the background so that an unreachable time server never blocks boot.
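
The best-effort sequence might look like the sketch below. It assumes BusyBox `ntpd` (`-n` stay in foreground, `-q` quit once the clock is set, `-p` peer) or `ntpdate` is installed. The NTP server `pool.ntp.org` and the function names are assumptions. Either client runs in the background with `&`, and the function always returns 0 so that time sync can never fail the boot.

```sh
#!/bin/sh
# Sketch of Stage 70 (hypothetical; not the shipped 70-clock.sh).

NTP_SERVER=${NTP_SERVER:-pool.ntp.org}   # assumed server

# Report which NTP client is on PATH: ntpd, ntpdate, or none.
pick_ntp_client() {
    if command -v ntpd >/dev/null 2>&1; then
        echo ntpd
    elif command -v ntpdate >/dev/null 2>&1; then
        echo ntpdate
    else
        echo none
    fi
}

sync_clock() {
    _have_time=0
    hwclock -s 2>/dev/null && _have_time=1
    _client=$(pick_ntp_client)
    if [ "$_client" = ntpd ]; then
        ntpd -n -q -p "$NTP_SERVER" >/dev/null 2>&1 &
    elif [ "$_client" = ntpdate ]; then
        ntpdate -b "$NTP_SERVER" >/dev/null 2>&1 &
    elif [ "$_have_time" = 0 ]; then
        echo "[kubesolo-init] [WARN] no time source available" >&2
    fi
    return 0   # never block or fail the boot on time sync
}
```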
### Stage 80 — Containerd (`80-containerd.sh`)
Ensures containerd prerequisites:
- Creates `/run/containerd`, `/var/lib/containerd`
- Creates CNI directories (`/etc/cni/net.d`, `/opt/cni/bin`)
- Loads a custom containerd config if one is present

KubeSolo manages the containerd lifecycle itself; this stage only prepares the prerequisites.
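
The directory setup amounts to a few `mkdir -p` calls. In this sketch the function name and the optional root prefix (used only for trying it against a scratch directory) are hypothetical. Loading the custom containerd config is omitted because its location is not documented here.

```sh
#!/bin/sh
# Sketch of Stage 80 (hypothetical; not the shipped 80-containerd.sh).
# $1 is an optional root prefix; on the real system it is empty.
prepare_containerd() {
    for _d in /run/containerd /var/lib/containerd /etc/cni/net.d /opt/cni/bin; do
        mkdir -p "${1:-}$_d" || return 1
    done
}
```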
### Stage 90 — KubeSolo (`90-kubesolo.sh`)
Final stage — **exec replaces the init process**:
1. Verifies `/usr/local/bin/kubesolo` exists
2. Builds command line: `--path /var/lib/kubesolo --local-storage true`
3. Adds the hostname as an extra SAN (Subject Alternative Name) on the API server certificate
4. Appends any extra flags from boot parameters or the config file
5. `exec kubesolo $ARGS` — KubeSolo becomes PID 1

After this, KubeSolo starts containerd, the kubelet, the API server, and the other Kubernetes components. The node should reach `Ready` status within 60-120 seconds.
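
A sketch of the argument assembly and the final `exec` follows. Only `--path /var/lib/kubesolo` and `--local-storage true` come from the steps above. The SAN flag name (`--apiserver-extra-sans`) and the extra-flags file (`/etc/kubesolo/extra-flags`) are hypothetical placeholders, so check `kubesolo --help` and the shipped `90-kubesolo.sh` before relying on them.

```sh
#!/bin/sh
# Sketch of Stage 90 (hypothetical; not the shipped 90-kubesolo.sh).
: "${SAN_FLAG:=--apiserver-extra-sans}"            # hypothetical flag name
: "${EXTRA_FLAGS_FILE:=/etc/kubesolo/extra-flags}"  # hypothetical path

# Print the argument list: base flags, hostname SAN, then extra flags.
build_kubesolo_args() {
    _args="--path /var/lib/kubesolo --local-storage true"
    [ -n "$1" ] && _args="$_args $SAN_FLAG $1"
    for _flag in $2; do       # word-split: flags must not contain spaces
        _args="$_args $_flag"
    done
    printf '%s\n' "$_args"
}

start_kubesolo() {
    _bin=/usr/local/bin/kubesolo
    if [ ! -x "$_bin" ]; then
        echo "[kubesolo-init] [ERROR] $_bin not found" >&2
        return 1
    fi
    _file_flags=""
    [ -f "$EXTRA_FLAGS_FILE" ] && _file_flags=$(cat "$EXTRA_FLAGS_FILE")
    _args=$(build_kubesolo_args "$(hostname)" "$KUBESOLO_EXTRA_FLAGS $_file_flags")
    # exec replaces this shell (PID 1); nothing after this line runs.
    exec "$_bin" $_args
}
```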
## Failure Handling
If any stage exits with a non-zero status, `/sbin/init` calls `emergency_shell()`, which:

1. Logs the failure to the serial console
2. Drops to `/bin/sh` for debugging
3. Lets the user type `exit` to retry the boot sequence

If `kubesolo.shell` is passed as a boot parameter, the system drops to a shell immediately after Stage 10 (cmdline parsing).
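
The stage loop and retry behavior can be sketched as follows. It assumes stages are sourced, so that variables from Stage 10 stay visible to later stages. It also assumes "retry" means rerunning the sequence from Stage 00. All function names other than `emergency_shell` are hypothetical.

```sh
#!/bin/sh
# Sketch of the /sbin/init stage loop (hypothetical; not the shipped init).
# Underscore-prefixed names reduce the chance of a sourced stage clobbering them.

STAGE_DIR=/usr/lib/kubesolo-os/init.d

log() { echo "[kubesolo-init] $*" >&2; }

drop_to_shell() {
    log "dropping to shell: $1 (type 'exit' to continue)"
    /bin/sh
}

emergency_shell() {
    log "[ERROR] Stage $1 failed"
    drop_to_shell "Stage $1 failed"
}

# Source every stage in order; return 1 after an emergency shell so the
# caller can retry the whole sequence.
run_stages() {
    for _stage in "${1:-$STAGE_DIR}"/*.sh; do
        [ -f "$_stage" ] || continue
        _name=${_stage##*/}
        if . "$_stage"; then
            log "[OK] Stage $_name complete"
        else
            emergency_shell "$_name"
            return 1
        fi
        if [ "$_name" = 10-parse-cmdline.sh ] && [ "${KUBESOLO_SHELL:-0}" = 1 ]; then
            drop_to_shell "kubesolo.shell requested"
        fi
    done
    return 0
}

main() {
    until run_stages; do
        log "retrying boot sequence"
    done
}
```

Rerunning from Stage 00 is only safe if the stages are idempotent, for example mounts that skip targets that are already mounted.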
## Debugging
### Serial Console
All init stages log to stderr with the prefix `[kubesolo-init]`. Boot with
`console=ttyS0,115200n8` (the default in debug mode) to see the output on the serial port.
### Boot Markers
Test scripts look for these markers in the serial log:
- `[kubesolo-init] [OK] Stage 90-kubesolo.sh complete` — full boot success
- `[kubesolo-init] [ERROR]` — stage failure
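
A test harness can poll the serial log for these two markers. This sketch assumes the QEMU serial output is written to a file. The function name and its return convention (0 success, 1 stage failure, 2 timeout) are hypothetical.

```sh
#!/bin/sh
# Sketch of a boot-marker check (hypothetical; not the shipped test scripts).

OK_MARKER='[kubesolo-init] [OK] Stage 90-kubesolo.sh complete'
ERR_MARKER='[kubesolo-init] [ERROR]'

# Poll a serial log once per second for up to $2 seconds (default 300).
check_boot_log() {
    _log=$1; _timeout=${2:-300}; _waited=0
    while :; do
        # -F: match markers literally (they contain '[' and ']').
        grep -qF "$ERR_MARKER" "$_log" 2>/dev/null && return 1
        grep -qF "$OK_MARKER" "$_log" 2>/dev/null && return 0
        [ "$_waited" -ge "$_timeout" ] && return 2
        sleep 1
        _waited=$((_waited + 1))
    done
}
```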
### Emergency Shell
From the emergency shell:
```sh
dmesg | tail -50 # Kernel messages
cat /proc/cmdline # Boot parameters
cat /proc/mounts # Current mounts
blkid # Block devices and labels
ip addr # Network interfaces
ls /usr/lib/kubesolo-os/init.d/ # Available init stages
```