feat: initial Phase 1 PoC scaffolding for KubeSolo OS
Complete Phase 1 implementation of KubeSolo OS — an immutable, bootable Linux distribution built on Tiny Core Linux for running KubeSolo single-node Kubernetes.

Build system:
- Makefile with fetch, rootfs, initramfs, iso, disk-image targets
- Dockerfile.builder for reproducible builds
- Scripts to download Tiny Core, extract rootfs, inject KubeSolo, pack initramfs, and create bootable ISO/disk images

Init system (10 POSIX sh stages):
- Early mount (proc/sys/dev/cgroup2), cmdline parsing, persistent mount with bind-mounts, kernel module loading, sysctl, DHCP networking, hostname, clock sync, containerd prep, KubeSolo exec

Shared libraries:
- functions.sh (device wait, IP lookup, config helpers)
- network.sh (static IP, config persistence, interface detection)
- health.sh (containerd, API server, node readiness checks)
- Emergency shell for boot failure debugging

Testing:
- QEMU boot test with serial log marker detection
- K8s readiness test with kubectl verification
- Persistence test (reboot + verify state survives)
- Workload deployment test (nginx pod)
- Local storage test (PVC + local-path provisioner)
- Network policy test
- Reusable run-vm.sh launcher

Developer tools:
- dev-vm.sh (interactive QEMU with port forwarding)
- rebuild-initramfs.sh (fast iteration)
- inject-ssh.sh (dropbear SSH for debugging)
- extract-kernel-config.sh + kernel-audit.sh

Documentation:
- Full design document with architecture research
- Boot flow documentation covering all 10 init stages
- Cloud-init examples (DHCP, static IP, Portainer Edge, air-gapped)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs/boot-flow.md — 181 lines (new file)
# KubeSolo OS — Boot Flow

This document describes the boot sequence from power-on to a running Kubernetes node.

## Overview

```
BIOS/UEFI → Bootloader (isolinux) → Linux Kernel → initramfs → /sbin/init
  → Stage 00: Mount virtual filesystems
  → Stage 10: Parse boot parameters
  → Stage 20: Mount persistent storage
  → Stage 30: Load kernel modules
  → Stage 40: Apply sysctl settings
  → Stage 50: Configure networking
  → Stage 60: Set hostname
  → Stage 70: Set system clock
  → Stage 80: Prepare containerd prerequisites
  → Stage 90: exec KubeSolo (becomes PID 1)
```

## Stage Details
### Bootloader (isolinux/syslinux)

The ISO uses isolinux with several boot options:

| Label | Description |
|-------|-------------|
| `kubesolo` | Normal boot (default, 3s timeout) |
| `kubesolo-debug` | Boot with verbose init logging + serial console |
| `kubesolo-shell` | Drop to emergency shell immediately |
| `kubesolo-nopersist` | Run fully in RAM, no persistent mount |

The kernel command line always includes `kubesolo.data=LABEL=KSOLODATA` to specify the persistent data partition.
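The labels above map onto isolinux config entries. A minimal sketch of what the `isolinux.cfg` might look like — the kernel/initrd paths are assumptions, not taken from the repo (note that syslinux's `TIMEOUT` is in tenths of a second, so 30 = 3s):

```
DEFAULT kubesolo
PROMPT 1
# TIMEOUT is in 1/10s units: 30 = 3 seconds
TIMEOUT 30

LABEL kubesolo
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/initramfs.gz kubesolo.data=LABEL=KSOLODATA

LABEL kubesolo-debug
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/initramfs.gz kubesolo.data=LABEL=KSOLODATA kubesolo.debug console=ttyS0,115200n8

LABEL kubesolo-shell
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/initramfs.gz kubesolo.data=LABEL=KSOLODATA kubesolo.shell

LABEL kubesolo-nopersist
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/initramfs.gz kubesolo.nopersist
```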
### Stage 00 — Early Mount (`00-early-mount.sh`)

Mounts essential virtual filesystems before anything else can work:

- `/proc` — process information
- `/sys` — sysfs (device/driver info)
- `/dev` — devtmpfs (block devices)
- `/tmp`, `/run` — tmpfs scratch space
- `/dev/pts`, `/dev/shm` — pseudo-terminals, shared memory
- `/sys/fs/cgroup` — cgroup v2 unified hierarchy (v1 fallback if unavailable)
### Stage 10 — Parse Cmdline (`10-parse-cmdline.sh`)

Reads `/proc/cmdline` and sets environment variables:

| Boot Parameter | Variable | Description |
|---------------|----------|-------------|
| `kubesolo.data=<dev>` | `KUBESOLO_DATA_DEV` | Block device for persistent data |
| `kubesolo.debug` | `KUBESOLO_DEBUG` | Enables `set -x` for trace logging |
| `kubesolo.shell` | `KUBESOLO_SHELL` | Drop to shell after this stage |
| `kubesolo.nopersist` | `KUBESOLO_NOPERSIST` | Skip persistent mount |
| `kubesolo.cloudinit=<path>` | `KUBESOLO_CLOUDINIT` | Cloud-init config file path |
| `kubesolo.flags=<flags>` | `KUBESOLO_EXTRA_FLAGS` | Extra flags for the KubeSolo binary |

If `kubesolo.data` is not specified, the stage auto-detects a partition labeled `KSOLODATA` via `blkid`; if none is found, it falls back to RAM-only mode.
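The table above can be sketched as a simple `case` loop. A hypothetical rendering (the real stage script may differ; the function takes the cmdline string as a parameter so the logic can be exercised outside of boot):

```shell
#!/bin/sh
# Hypothetical sketch of the Stage 10 parsing loop; variable names match the table.
parse_cmdline() {
    for arg in $1; do
        case "$arg" in
            kubesolo.data=*)      KUBESOLO_DATA_DEV="${arg#kubesolo.data=}" ;;
            kubesolo.debug)       KUBESOLO_DEBUG=1 ;;   # the stage would also run: set -x
            kubesolo.shell)       KUBESOLO_SHELL=1 ;;
            kubesolo.nopersist)   KUBESOLO_NOPERSIST=1 ;;
            kubesolo.cloudinit=*) KUBESOLO_CLOUDINIT="${arg#kubesolo.cloudinit=}" ;;
            kubesolo.flags=*)     KUBESOLO_EXTRA_FLAGS="${arg#kubesolo.flags=}" ;;
        esac
    done
}

# At boot the stage would call: parse_cmdline "$(cat /proc/cmdline)"
```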
### Stage 20 — Persistent Mount (`20-persistent-mount.sh`)

If not in RAM-only mode:

1. Waits up to 30s for the data device to appear (handles slow USB, virtio)
2. Mounts the ext4 data partition at `/mnt/data`
3. Creates the directory structure on first boot
4. Bind-mounts persistent directories:

| Source (data partition) | Mount Point | Content |
|------------------------|-------------|---------|
| `/mnt/data/kubesolo` | `/var/lib/kubesolo` | K8s state, certs, SQLite DB |
| `/mnt/data/containerd` | `/var/lib/containerd` | Container images + layers |
| `/mnt/data/etc-kubesolo` | `/etc/kubesolo` | Node configuration |
| `/mnt/data/log` | `/var/log` | System + K8s logs |
| `/mnt/data/usr-local` | `/usr/local` | User binaries |

In RAM-only mode, these directories are backed by tmpfs and lost on reboot.
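Step 1 can be sketched as a small polling helper. This is a hypothetical version (the commit message says the real device-wait helper lives in `functions.sh`):

```shell
#!/bin/sh
# Hypothetical sketch of the device-wait helper: poll until a block device
# node appears, up to a timeout, so slow USB/virtio devices are handled.
wait_for_device() {
    dev=$1
    timeout=${2:-30}
    waited=0
    while [ ! -b "$dev" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Stage 20 would then do roughly:
#   wait_for_device "$KUBESOLO_DATA_DEV" 30 && mount -t ext4 "$KUBESOLO_DATA_DEV" /mnt/data
```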
### Stage 30 — Kernel Modules (`30-kernel-modules.sh`)

Loads kernel modules listed in `/usr/lib/kubesolo-os/modules.list`:

- `br_netfilter`, `bridge`, `veth`, `vxlan` — K8s pod networking
- `ip_tables`, `iptable_nat`, `nf_nat`, `nf_conntrack` — service routing
- `overlay` — containerd storage driver
- `ip_vs`, `ip_vs_rr`, `ip_vs_wrr`, `ip_vs_sh` — optional IPVS mode

Modules that fail to load are logged as warnings (they may be built into the kernel).
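A minimal sketch of that loading loop (assumed, not the actual script; failures are demoted to warnings because a module may be compiled into the kernel):

```shell
#!/bin/sh
# Hypothetical sketch of the Stage 30 loop over modules.list.
load_modules() {
    while read -r mod; do
        case "$mod" in ''|"#"*) continue ;; esac  # skip blank lines and comments
        modprobe "$mod" 2>/dev/null \
            || echo "[kubesolo-init] [WARN] module $mod not loaded (may be built-in)" >&2
    done < "$1"
}

# Boot would call: load_modules /usr/lib/kubesolo-os/modules.list
```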
### Stage 40 — Sysctl (`40-sysctl.sh`)

Applies kernel parameters from `/etc/sysctl.d/k8s.conf`:

- `net.bridge.bridge-nf-call-iptables = 1` — K8s requirement
- `net.ipv4.ip_forward = 1` — pod-to-pod routing
- `fs.inotify.max_user_watches = 524288` — kubelet/containerd watchers
- `net.netfilter.nf_conntrack_max = 131072` — service connection tracking
- `vm.swappiness = 0` — minimize swapping (kubelet expects swap to be off)
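Collected into file form, `/etc/sysctl.d/k8s.conf` would look like this (reconstructed from the list above):

```
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
fs.inotify.max_user_watches = 524288
net.netfilter.nf_conntrack_max = 131072
vm.swappiness = 0
```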
### Stage 50 — Network (`50-network.sh`)

Priority order:

1. **Saved config** — `/mnt/data/network/interfaces.sh` (from previous boot)
2. **Cloud-init** — parsed from `cloud-init.yaml` (Phase 2: Go parser)
3. **DHCP fallback** — `udhcpc` on the first non-virtual interface

Brings up loopback, finds the first physical interface (skipping lo, docker, veth, br, cni), and runs DHCP.
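The interface selection described above can be sketched like this (hypothetical; the directory parameter exists only so the logic can be tested off-box):

```shell
#!/bin/sh
# Hypothetical sketch: pick the first interface that is not loopback,
# a Docker bridge, a veth pair, a bridge, or a CNI-managed interface.
first_phys_iface() {
    for path in "${1:-/sys/class/net}"/*; do
        name=${path##*/}
        case "$name" in lo|docker*|veth*|br*|cni*) continue ;; esac
        echo "$name"
        return 0
    done
    return 1
}

# Stage 50 would then run something like: udhcpc -i "$(first_phys_iface)"
```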
### Stage 60 — Hostname (`60-hostname.sh`)

Priority order:

1. Saved hostname from the data partition
2. Generated from the MAC address of the primary interface (`kubesolo-XXXXXX`)

Writes to `/etc/hostname` and appends to `/etc/hosts`.
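The MAC-derived fallback could be implemented roughly as follows (a hypothetical derivation; the real script may format the suffix differently):

```shell
#!/bin/sh
# Hypothetical sketch: derive kubesolo-XXXXXX from the last three octets
# of the primary interface's MAC address.
gen_hostname() {
    suffix=$(echo "$1" | tr -d ':' | tail -c 7)  # last 6 hex digits
    echo "kubesolo-$suffix"
}

# Stage 60 would read the MAC from /sys/class/net/<iface>/address
```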
### Stage 70 — Clock (`70-clock.sh`)

Best-effort time synchronization:

1. Try `hwclock -s` (hardware clock)
2. Try NTP in the background (non-blocking) via `ntpd` or `ntpdate`
3. Log a warning if no time source is available

Non-blocking because an NTP failure shouldn't prevent boot.
### Stage 80 — Containerd (`80-containerd.sh`)

Ensures containerd prerequisites:

- Creates `/run/containerd` and `/var/lib/containerd`
- Creates CNI directories (`/etc/cni/net.d`, `/opt/cni/bin`)
- Loads a custom containerd config if present

KubeSolo manages the actual containerd lifecycle internally.
### Stage 90 — KubeSolo (`90-kubesolo.sh`)

Final stage — **exec replaces the init process**:

1. Verifies `/usr/local/bin/kubesolo` exists
2. Builds the command line: `--path /var/lib/kubesolo --local-storage true`
3. Adds the hostname as an extra SAN for the API server certificate
4. Appends any extra flags from boot params or the config file
5. `exec kubesolo $ARGS` — KubeSolo becomes PID 1

After this, KubeSolo starts containerd, kubelet, the API server, and all K8s components. The node should reach Ready status within 60–120 seconds.
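Steps 2–4 can be sketched as an argument builder. This is a hypothetical helper: the `--extra-san` flag name in particular is a placeholder, not confirmed against the KubeSolo CLI (only `--path` and `--local-storage` appear in this document):

```shell
#!/bin/sh
# Hypothetical sketch of steps 2-4: assemble the KubeSolo command line.
build_kubesolo_args() {
    san=$1
    extra=$2
    args="--path /var/lib/kubesolo --local-storage true"
    # NOTE: "--extra-san" is a placeholder flag name, not confirmed KubeSolo CLI
    [ -n "$san" ] && args="$args --extra-san $san"
    [ -n "$extra" ] && args="$args $extra"
    echo "$args"
}

# Step 5 would then be roughly:
#   exec /usr/local/bin/kubesolo $(build_kubesolo_args "$(hostname)" "$KUBESOLO_EXTRA_FLAGS")
```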
## Failure Handling

If any stage returns non-zero, `/sbin/init` calls `emergency_shell()`, which:

1. Logs the failure to the serial console
2. Drops to `/bin/sh` for debugging
3. Lets the user type `exit` to retry the boot sequence

If `kubesolo.shell` is passed as a boot parameter, the system drops to a shell immediately after Stage 10 (cmdline parsing).
## Debugging

### Serial Console

All init stages log to stderr with the prefix `[kubesolo-init]`. Boot with `console=ttyS0,115200n8` (the default in debug mode) to see output on the serial console.
### Boot Markers

Test scripts look for these markers in the serial log:

- `[kubesolo-init] [OK] Stage 90-kubesolo.sh complete` — full boot success
- `[kubesolo-init] [ERROR]` — stage failure
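A test script might check for these markers like this (a sketch; the repo's actual test harness may differ). Fixed-string grep (`-F`) avoids having to escape the brackets:

```shell
#!/bin/sh
# Sketch: decide boot success from a captured serial log using the
# markers documented above.
boot_succeeded() {
    grep -qF '[kubesolo-init] [OK] Stage 90-kubesolo.sh complete' "$1" \
        && ! grep -qF '[kubesolo-init] [ERROR]' "$1"
}
```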
### Emergency Shell

From the emergency shell:

```sh
dmesg | tail -50                 # Kernel messages
cat /proc/cmdline                # Boot parameters
cat /proc/mounts                 # Current mounts
blkid                            # Block devices and labels
ip addr                          # Network interfaces
ls /usr/lib/kubesolo-os/init.d/  # Available init stages
```