# KubeSolo OS — Boot Flow
This document describes the boot sequence from power-on to a running Kubernetes node.
## Overview

```
BIOS/UEFI → Bootloader (isolinux) → Linux Kernel → initramfs → /sbin/init
    → Stage 00: Mount virtual filesystems
    → Stage 10: Parse boot parameters
    → Stage 20: Mount persistent storage
    → Stage 30: Load kernel modules
    → Stage 40: Apply sysctl settings
    → Stage 50: Configure networking
    → Stage 60: Set hostname
    → Stage 70: Set system clock
    → Stage 80: Prepare containerd prerequisites
    → Stage 90: exec KubeSolo (becomes PID 1)
```
## Stage Details

### Bootloader (isolinux/syslinux)
The ISO uses isolinux with several boot options:
| Label | Description |
|---|---|
| `kubesolo` | Normal boot (default, 3s timeout) |
| `kubesolo-debug` | Boot with verbose init logging + serial console |
| `kubesolo-shell` | Drop to emergency shell immediately |
| `kubesolo-nopersist` | Run fully in RAM, no persistent mount |
The kernel command line always includes `kubesolo.data=LABEL=KSOLODATA` to specify the persistent data partition.
### Stage 00 — Early Mount (`00-early-mount.sh`)
Mounts essential virtual filesystems before anything else can work:
- `/proc` — process information
- `/sys` — sysfs (device/driver info)
- `/dev` — devtmpfs (block devices)
- `/tmp`, `/run` — tmpfs scratch space
- `/dev/pts`, `/dev/shm` — pseudo-terminals, shared memory
- `/sys/fs/cgroup` — cgroup v2 unified hierarchy (v1 fallback if unavailable)
### Stage 10 — Parse Cmdline (`10-parse-cmdline.sh`)

Reads `/proc/cmdline` and sets environment variables:
| Boot Parameter | Variable | Description |
|---|---|---|
| `kubesolo.data=<dev>` | `KUBESOLO_DATA_DEV` | Block device for persistent data |
| `kubesolo.debug` | `KUBESOLO_DEBUG` | Enables `set -x` for trace logging |
| `kubesolo.shell` | `KUBESOLO_SHELL` | Drop to shell after this stage |
| `kubesolo.nopersist` | `KUBESOLO_NOPERSIST` | Skip persistent mount |
| `kubesolo.cloudinit=<path>` | `KUBESOLO_CLOUDINIT` | Cloud-init config file path |
| `kubesolo.flags=<flags>` | `KUBESOLO_EXTRA_FLAGS` | Extra flags for KubeSolo binary |
If `kubesolo.data` is not specified, the stage auto-detects a partition labelled `KSOLODATA` via `blkid`. If none is found, it falls back to RAM-only mode.
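The parsing above can be sketched in POSIX sh. This is a minimal sketch: the variable names come from the table, but `parse_cmdline`, the "=1" convention, and the sample command line are illustrative assumptions.

```sh
# Sketch of the Stage 10 loop: split the cmdline on whitespace and
# match each word. Function name and the "=1" convention are assumptions.
parse_cmdline() {
    for word in $1; do
        case "$word" in
            kubesolo.data=*)      KUBESOLO_DATA_DEV="${word#kubesolo.data=}" ;;
            kubesolo.debug)       KUBESOLO_DEBUG=1 ;;
            kubesolo.shell)       KUBESOLO_SHELL=1 ;;
            kubesolo.nopersist)   KUBESOLO_NOPERSIST=1 ;;
            kubesolo.cloudinit=*) KUBESOLO_CLOUDINIT="${word#kubesolo.cloudinit=}" ;;
            kubesolo.flags=*)     KUBESOLO_EXTRA_FLAGS="${word#kubesolo.flags=}" ;;
        esac
    done
}

# In the real stage this would be: parse_cmdline "$(cat /proc/cmdline)"
parse_cmdline "console=ttyS0 kubesolo.data=LABEL=KSOLODATA kubesolo.debug"
```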
### Stage 20 — Persistent Mount (`20-persistent-mount.sh`)

If not in RAM-only mode, the stage:

- Waits up to 30s for the data device to appear (handles slow USB, virtio)
- Mounts the ext4 data partition at `/mnt/data`
- Creates the directory structure on first boot
- Bind-mounts persistent directories:
| Source (data partition) | Mount Point | Content |
|---|---|---|
| `/mnt/data/kubesolo` | `/var/lib/kubesolo` | K8s state, certs, SQLite DB |
| `/mnt/data/containerd` | `/var/lib/containerd` | Container images + layers |
| `/mnt/data/etc-kubesolo` | `/etc/kubesolo` | Node configuration |
| `/mnt/data/log` | `/var/log` | System + K8s logs |
| `/mnt/data/usr-local` | `/usr/local` | User binaries |
In RAM-only mode, these directories are backed by tmpfs and lost on reboot.
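The 30s device wait can be sketched as a simple poll loop; `wait_for_device` is an illustrative name, not necessarily the helper in `functions.sh`.

```sh
# Sketch of the device wait: poll for the block device node, give up
# after a timeout (default 30s). Name and structure are illustrative.
wait_for_device() {
    dev="$1"
    timeout="${2:-30}"
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if [ -b "$dev" ]; then
            return 0            # block device node appeared
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1                    # not found within timeout
}
```

On success the stage can proceed to `mount -t ext4 "$dev" /mnt/data` and the bind-mounts listed above.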
### Stage 30 — Kernel Modules (`30-kernel-modules.sh`)

Loads the kernel modules listed in `/usr/lib/kubesolo-os/modules.list`:

- `br_netfilter`, `bridge`, `veth`, `vxlan` — K8s pod networking
- `ip_tables`, `iptable_nat`, `nf_nat`, `nf_conntrack` — service routing
- `overlay` — containerd storage driver
- `ip_vs`, `ip_vs_rr`, `ip_vs_wrr`, `ip_vs_sh` — optional IPVS mode
Modules that fail to load are logged as warnings (may be built into the kernel).
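A tolerant loading loop of this shape would produce that behavior (a sketch; `load_modules` and the warning text are illustrative):

```sh
# Sketch of a tolerant module loader: failures are warnings, since the
# module may be compiled into the kernel. Function name is illustrative.
load_modules() {
    list="$1"
    while read -r mod; do
        case "$mod" in ''|'#'*) continue ;; esac    # skip blanks and comments
        if ! modprobe "$mod" 2>/dev/null; then
            echo "[kubesolo-init] [WARN] could not load $mod (built-in?)" >&2
        fi
    done < "$list"
}
```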
### Stage 40 — Sysctl (`40-sysctl.sh`)

Applies kernel parameters from `/etc/sysctl.d/k8s.conf`:

- `net.bridge.bridge-nf-call-iptables = 1` — K8s requirement
- `net.ipv4.ip_forward = 1` — pod-to-pod routing
- `fs.inotify.max_user_watches = 524288` — kubelet/containerd watchers
- `net.netfilter.nf_conntrack_max = 131072` — service connection tracking
- `vm.swappiness = 0` — minimize swapping (K8s expects swap to be off)
### Stage 50 — Network (`50-network.sh`)

Priority order:

1. **Saved config** — `/mnt/data/network/interfaces.sh` (from a previous boot)
2. **Cloud-init** — parsed from `cloud-init.yaml` (Phase 2: Go parser)
3. **DHCP fallback** — `udhcpc` on the first non-virtual interface
Brings up loopback, finds the first physical interface (skipping `lo` and names with the `docker`, `veth`, `br`, `cni` prefixes), and runs DHCP on it.
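The interface selection can be sketched as a name filter. In the real stage the candidate names would come from `/sys/class/net`, not function arguments, and the helper name is illustrative.

```sh
# Sketch of the interface filter: skip loopback and virtual interface
# name prefixes, print the first remaining name.
first_physical_iface() {
    for iface in "$@"; do
        case "$iface" in
            lo|docker*|veth*|br*|cni*) continue ;;   # loopback / virtual
            *) echo "$iface"; return 0 ;;
        esac
    done
    return 1                                         # nothing physical found
}
```

The chosen interface would then be handed to `udhcpc` for DHCP.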
### Stage 60 — Hostname (`60-hostname.sh`)

Priority order:

1. Saved hostname from the data partition
2. Generated from the MAC address of the primary interface (`kubesolo-XXXXXX`)

Writes the result to `/etc/hostname` and appends it to `/etc/hosts`.
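The MAC-derived fallback can be sketched as follows. One assumption to flag: the `XXXXXX` suffix is taken here to be the last three octets of the MAC, which this document does not actually specify, and the function name is illustrative.

```sh
# Sketch of the MAC-derived hostname. Assumption: the XXXXXX suffix is
# the last three octets of the MAC address, with colons removed.
hostname_from_mac() {
    suffix=$(echo "$1" | awk -F: '{ print $4 $5 $6 }')
    echo "kubesolo-$suffix"
}

# In the real stage the MAC would come from /sys/class/net/$iface/address.
hostname_from_mac "aa:bb:cc:dd:ee:ff"    # → kubesolo-ddeeff
```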
### Stage 70 — Clock (`70-clock.sh`)

Best-effort time synchronization:

1. Try `hwclock -s` (hardware clock)
2. Try NTP in the background (non-blocking) via `ntpd` or `ntpdate`
3. Log a warning if no time source is available
Non-blocking because NTP failure shouldn't prevent boot.
### Stage 80 — Containerd (`80-containerd.sh`)

Ensures containerd prerequisites:

- Creates `/run/containerd` and `/var/lib/containerd`
- Creates CNI directories (`/etc/cni/net.d`, `/opt/cni/bin`)
- Loads a custom containerd config if present
KubeSolo manages the actual containerd lifecycle internally.
### Stage 90 — KubeSolo (`90-kubesolo.sh`)

Final stage — `exec` replaces the init process:

1. Verifies that `/usr/local/bin/kubesolo` exists
2. Builds the command line: `--path /var/lib/kubesolo --local-storage true`
3. Adds the hostname as an extra SAN for the API server certificate
4. Appends any extra flags from boot params or the config file
5. `exec kubesolo $ARGS` — KubeSolo becomes PID 1
After this, KubeSolo starts containerd, kubelet, API server, and all K8s components. The node should reach Ready status within 60-120 seconds.
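The argument assembly in steps 2 and 4 above can be sketched as follows. Only the `--path` and `--local-storage` flags come from this document; the helper name and the SAN handling being elsewhere are assumptions.

```sh
# Sketch of the Stage 90 argument assembly. Only --path and
# --local-storage are documented flags; the helper name is illustrative.
build_kubesolo_args() {
    args="--path /var/lib/kubesolo --local-storage true"
    if [ -n "$KUBESOLO_EXTRA_FLAGS" ]; then
        args="$args $KUBESOLO_EXTRA_FLAGS"    # from boot params or config
    fi
    echo "$args"
}

# The stage would end with: exec /usr/local/bin/kubesolo $(build_kubesolo_args)
```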
## Failure Handling

If any stage returns non-zero, `/sbin/init` calls `emergency_shell()`, which:

- Logs the failure to the serial console
- Drops to `/bin/sh` for debugging
- Lets the user type `exit` to retry the boot sequence
If `kubesolo.shell` is passed as a boot parameter, the system drops to a shell immediately after Stage 10 (cmdline parsing).
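The stage loop and its failure hook can be sketched as follows; both names are illustrative, and `emergency_shell` is stubbed since the real one opens `/bin/sh` on the console.

```sh
# Sketch of the /sbin/init stage loop with emergency-shell fallback.
emergency_shell() {
    echo "[kubesolo-init] dropping to emergency shell" >&2
    # The real init would run something like: sh </dev/console >/dev/console 2>&1
}

run_stages() {
    for stage in "$1"/*.sh; do
        if ! sh "$stage"; then
            echo "[kubesolo-init] [ERROR] $stage failed" >&2
            emergency_shell
        fi
    done
}
```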
## Debugging

### Serial Console

All init stages log to stderr with the prefix `[kubesolo-init]`. Boot with `console=ttyS0,115200n8` (the default in debug mode) to see output on the serial console.
### Boot Markers

Test scripts look for these markers in the serial log:

- `[kubesolo-init] [OK] Stage 90-kubesolo.sh complete` — full boot success
- `[kubesolo-init] [ERROR]` — stage failure
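A marker check of the kind the test scripts perform can be sketched like this (the helper name and log path are illustrative; the real scripts may differ):

```sh
# Sketch: boot succeeded if the final OK marker is present and no
# ERROR marker appears anywhere in the captured serial log.
boot_succeeded() {
    grep -q '\[kubesolo-init\] \[OK\] Stage 90-kubesolo.sh complete' "$1" &&
        ! grep -q '\[kubesolo-init\] \[ERROR\]' "$1"
}
```

Usage: `boot_succeeded serial.log && echo "boot OK"`.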
### Emergency Shell

From the emergency shell:

```sh
dmesg | tail -50                  # Kernel messages
cat /proc/cmdline                 # Boot parameters
cat /proc/mounts                  # Current mounts
blkid                             # Block devices and labels
ip addr                           # Network interfaces
ls /usr/lib/kubesolo-os/init.d/   # Available init stages
```