kubesolo-os/docs/boot-flow.md
Adolfo Delorenzo e372df578b feat: initial Phase 1 PoC scaffolding for KubeSolo OS
2026-02-11 10:18:42 -06:00


KubeSolo OS — Boot Flow

This document describes the boot sequence from power-on to a running Kubernetes node.

Overview

BIOS/UEFI → Bootloader (isolinux) → Linux Kernel → initramfs → /sbin/init
  → Stage 00: Mount virtual filesystems
  → Stage 10: Parse boot parameters
  → Stage 20: Mount persistent storage
  → Stage 30: Load kernel modules
  → Stage 40: Apply sysctl settings
  → Stage 50: Configure networking
  → Stage 60: Set hostname
  → Stage 70: Set system clock
  → Stage 80: Prepare containerd prerequisites
  → Stage 90: exec KubeSolo (becomes PID 1)

Stage Details

Bootloader (isolinux/syslinux)

The ISO uses isolinux with several boot options:

| Label | Description |
|-------|-------------|
| `kubesolo` | Normal boot (default, 3 s timeout) |
| `kubesolo-debug` | Boot with verbose init logging + serial console |
| `kubesolo-shell` | Drop to emergency shell immediately |
| `kubesolo-nopersist` | Run fully in RAM, no persistent mount |

Kernel command line always includes kubesolo.data=LABEL=KSOLODATA to specify the persistent data partition.
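As a concrete illustration, the menu above could be expressed in an isolinux.cfg along these lines (a sketch only: the kernel and initrd paths and the exact APPEND lines are assumptions, not the shipped config):

```
DEFAULT kubesolo
PROMPT 1
TIMEOUT 30    # syslinux counts in 1/10 s units, so 30 = 3 s

LABEL kubesolo
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/core.gz kubesolo.data=LABEL=KSOLODATA quiet

LABEL kubesolo-debug
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/core.gz kubesolo.data=LABEL=KSOLODATA kubesolo.debug console=ttyS0,115200n8

LABEL kubesolo-shell
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/core.gz kubesolo.data=LABEL=KSOLODATA kubesolo.shell

LABEL kubesolo-nopersist
  KERNEL /boot/vmlinuz
  APPEND initrd=/boot/core.gz kubesolo.nopersist
```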

Stage 00 — Early Mount (00-early-mount.sh)

Mounts essential virtual filesystems before anything else can work:

  • /proc — process information
  • /sys — sysfs (device/driver info)
  • /dev — devtmpfs (block devices)
  • /tmp, /run — tmpfs scratch space
  • /dev/pts, /dev/shm — pseudo-terminals, shared memory
  • /sys/fs/cgroup — cgroup v2 unified hierarchy (v1 fallback if unavailable)
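A minimal sketch of this stage, with helper names of my own (the real script may differ); `DRY_RUN` is a test hook, not a real boot parameter:

```sh
#!/bin/sh
# Illustrative sketch of stage 00; with DRY_RUN=1 it only prints the
# mount commands instead of running them.
: "${DRY_RUN:=0}"

do_mount() {  # $1=fstype $2=source $3=target [$4=options]
    if [ "$DRY_RUN" = 1 ]; then
        echo "mount -t $1 ${4:+-o $4 }$2 $3"
        return 0
    fi
    mkdir -p "$3"
    mount -t "$1" ${4:+-o "$4"} "$2" "$3"
}

mount_all() {
    do_mount proc     proc     /proc
    do_mount sysfs    sysfs    /sys
    do_mount devtmpfs devtmpfs /dev
    do_mount tmpfs    tmpfs    /tmp mode=1777
    do_mount tmpfs    tmpfs    /run
    do_mount devpts   devpts   /dev/pts
    do_mount tmpfs    tmpfs    /dev/shm mode=1777
    # cgroup v2 unified hierarchy, with a v1 fallback
    do_mount cgroup2 none /sys/fs/cgroup || do_mount cgroup cgroup /sys/fs/cgroup
}

# mount_all
```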

Stage 10 — Parse Cmdline (10-parse-cmdline.sh)

Reads /proc/cmdline and sets environment variables:

| Boot Parameter | Variable | Description |
|----------------|----------|-------------|
| `kubesolo.data=<dev>` | `KUBESOLO_DATA_DEV` | Block device for persistent data |
| `kubesolo.debug` | `KUBESOLO_DEBUG` | Enables `set -x` for trace logging |
| `kubesolo.shell` | `KUBESOLO_SHELL` | Drop to shell after this stage |
| `kubesolo.nopersist` | `KUBESOLO_NOPERSIST` | Skip persistent mount |
| `kubesolo.cloudinit=<path>` | `KUBESOLO_CLOUDINIT` | Cloud-init config file path |
| `kubesolo.flags=<flags>` | `KUBESOLO_EXTRA_FLAGS` | Extra flags for the KubeSolo binary |

If kubesolo.data is not specified, the init auto-detects a partition labeled KSOLODATA via blkid. If none is found, it falls back to RAM-only mode.
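The parsing logic can be sketched as a case statement over cmdline tokens (function name is illustrative, and the real script would also enable `set -x` on the debug flag):

```sh
#!/bin/sh
# Illustrative sketch of stage 10's parser.

parse_cmdline() {  # $1 = contents of /proc/cmdline
    for tok in $1; do
        case "$tok" in
            kubesolo.data=*)      KUBESOLO_DATA_DEV="${tok#*=}" ;;
            kubesolo.debug)       KUBESOLO_DEBUG=1 ;;
            kubesolo.shell)       KUBESOLO_SHELL=1 ;;
            kubesolo.nopersist)   KUBESOLO_NOPERSIST=1 ;;
            kubesolo.cloudinit=*) KUBESOLO_CLOUDINIT="${tok#*=}" ;;
            kubesolo.flags=*)     KUBESOLO_EXTRA_FLAGS="${tok#*=}" ;;
        esac
    done
}

# parse_cmdline "$(cat /proc/cmdline)"
```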

Stage 20 — Persistent Mount (20-persistent-mount.sh)

If not in RAM-only mode:

  1. Waits up to 30s for the data device to appear (handles slow USB, virtio)
  2. Mounts the ext4 data partition at /mnt/data
  3. Creates directory structure on first boot
  4. Bind-mounts persistent directories:
| Source (data partition) | Mount Point | Content |
|-------------------------|-------------|---------|
| `/mnt/data/kubesolo` | `/var/lib/kubesolo` | K8s state, certs, SQLite DB |
| `/mnt/data/containerd` | `/var/lib/containerd` | Container images + layers |
| `/mnt/data/etc-kubesolo` | `/etc/kubesolo` | Node configuration |
| `/mnt/data/log` | `/var/log` | System + K8s logs |
| `/mnt/data/usr-local` | `/usr/local` | User binaries |

In RAM-only mode, these directories are backed by tmpfs and lost on reboot.
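The device wait and bind-mount steps could be sketched like this (helper names are my own, not the shared-library API):

```sh
#!/bin/sh
# Illustrative sketch of stage 20.

wait_for_device() {  # $1 = block device, $2 = timeout in seconds
    i=0
    while [ "$i" -lt "$2" ]; do
        [ -b "$1" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

bind_persistent() {  # $1 = dir on the data partition, $2 = mount point
    mkdir -p "/mnt/data/$1" "$2"   # directory structure created on first boot
    mount --bind "/mnt/data/$1" "$2"
}

# wait_for_device "$KUBESOLO_DATA_DEV" 30 || emergency_shell "no data device"
# mount -t ext4 "$KUBESOLO_DATA_DEV" /mnt/data
# bind_persistent kubesolo     /var/lib/kubesolo
# bind_persistent containerd   /var/lib/containerd
# bind_persistent etc-kubesolo /etc/kubesolo
# bind_persistent log          /var/log
# bind_persistent usr-local    /usr/local
```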

Stage 30 — Kernel Modules (30-kernel-modules.sh)

Loads kernel modules listed in /usr/lib/kubesolo-os/modules.list:

  • br_netfilter, bridge, veth, vxlan — K8s pod networking
  • ip_tables, iptable_nat, nf_nat, nf_conntrack — service routing
  • overlay — containerd storage driver
  • ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh — optional IPVS mode

Modules that fail to load are logged as warnings (may be built into the kernel).
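The loader loop is essentially a read over the list file with a warning on failure; a sketch (function name illustrative):

```sh
#!/bin/sh
# Illustrative sketch of stage 30's loader loop.

load_modules() {  # $1 = module list file; blanks and '#' comments skipped
    while read -r mod _; do
        case "$mod" in ''|'#'*) continue ;; esac
        modprobe "$mod" 2>/dev/null ||
            echo "[kubesolo-init] [WARN] modprobe $mod failed (may be built-in)" >&2
    done < "$1"
}

# load_modules /usr/lib/kubesolo-os/modules.list
```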

Stage 40 — Sysctl (40-sysctl.sh)

Applies kernel parameters from /etc/sysctl.d/k8s.conf:

  • net.bridge.bridge-nf-call-iptables = 1 — K8s requirement
  • net.ipv4.ip_forward = 1 — pod-to-pod routing
  • fs.inotify.max_user_watches = 524288 — kubelet/containerd watchers
  • net.netfilter.nf_conntrack_max = 131072 — service connection tracking
  • vm.swappiness = 0 — no swap (K8s requirement)
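Applying such a file amounts to writing each value under /proc/sys (equivalent to `sysctl -p`); a sketch, where `SYSCTL_ROOT` is a test hook of my own, not a real knob:

```sh
#!/bin/sh
# Illustrative sketch of stage 40.
: "${SYSCTL_ROOT:=/proc/sys}"

apply_sysctl() {  # $1 = sysctl.conf-style file of key = value lines
    while IFS='=' read -r key val; do
        key=$(echo "$key" | tr -d ' ')
        case "$key" in ''|'#'*) continue ;; esac
        # net.ipv4.ip_forward -> net/ipv4/ip_forward
        path="$SYSCTL_ROOT/$(echo "$key" | tr '.' '/')"
        echo "$val" | tr -d ' ' 2>/dev/null >"$path" ||
            echo "[kubesolo-init] [WARN] sysctl $key failed" >&2
    done < "$1"
}

# apply_sysctl /etc/sysctl.d/k8s.conf
```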

Stage 50 — Network (50-network.sh)

Priority order:

  1. Saved config — /mnt/data/network/interfaces.sh (from a previous boot)
  2. Cloud-init — parsed from cloud-init.yaml (Phase 2: Go parser)
  3. DHCP fallback — udhcpc on the first non-virtual interface

Brings up loopback, finds the first physical interface (skipping lo, docker, veth, br, cni), and runs DHCP.
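The interface scan can be sketched as a walk over /sys/class/net with the virtual prefixes filtered out (function name and the directory parameter are illustrative):

```sh
#!/bin/sh
# Illustrative sketch of stage 50's interface detection.

first_physical_iface() {  # $1 = sysfs net dir (default /sys/class/net)
    for path in "${1:-/sys/class/net}"/*; do
        iface=${path##*/}
        case "$iface" in
            lo|docker*|veth*|br*|cni*) continue ;;
        esac
        echo "$iface"
        return 0
    done
    return 1
}

# ip link set lo up
# iface=$(first_physical_iface) && udhcpc -i "$iface"
```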

Stage 60 — Hostname (60-hostname.sh)

Priority order:

  1. Saved hostname from data partition
  2. Generated from MAC address of primary interface (kubesolo-XXXXXX)

Writes to /etc/hostname and appends to /etc/hosts.
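The MAC-derived fallback can be sketched as taking the last six hex digits of the address (function name is illustrative):

```sh
#!/bin/sh
# Illustrative sketch of stage 60's fallback name generation.

hostname_from_mac() {  # $1 = MAC address, e.g. 52:54:00:12:34:56
    echo "kubesolo-$(echo "$1" | tr -d ':' | tail -c 7)"
}

# mac=$(cat /sys/class/net/eth0/address)
# hostname_from_mac "$mac" > /etc/hostname
```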

Stage 70 — Clock (70-clock.sh)

Best-effort time synchronization:

  1. Try hwclock -s (hardware clock)
  2. Try NTP in background (non-blocking) via ntpd or ntpdate
  3. Log warning if no time source available

Non-blocking because NTP failure shouldn't prevent boot.
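A sketch of the best-effort ordering (the busybox-style `ntpd -n -q -p` flags are an assumption about the shipped tooling):

```sh
#!/bin/sh
# Illustrative sketch of stage 70.

sync_clock() {
    if hwclock -s 2>/dev/null; then
        echo "[kubesolo-init] clock set from RTC"
    fi
    if command -v ntpd >/dev/null 2>&1; then
        ntpd -n -q -p pool.ntp.org &   # background so boot never blocks on NTP
    else
        echo "[kubesolo-init] [WARN] no NTP client available" >&2
    fi
    return 0   # best effort: a missing time source never aborts boot
}
```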

Stage 80 — Containerd (80-containerd.sh)

Ensures containerd prerequisites:

  • Creates /run/containerd, /var/lib/containerd
  • Creates CNI directories (/etc/cni/net.d, /opt/cni/bin)
  • Loads custom containerd config if present

KubeSolo manages the actual containerd lifecycle internally.
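The preparation reduces to creating the runtime directories; a sketch, where `ROOT` is a test hook of my own:

```sh
#!/bin/sh
# Illustrative sketch of stage 80.
: "${ROOT:=}"

prep_containerd() {
    for d in /run/containerd /var/lib/containerd /etc/cni/net.d /opt/cni/bin; do
        mkdir -p "$ROOT$d"
    done
    # A custom containerd config, if present on the data partition, would be
    # copied into place here (exact path omitted; it isn't specified above).
}
```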

Stage 90 — KubeSolo (90-kubesolo.sh)

Final stage — exec replaces the init process:

  1. Verifies /usr/local/bin/kubesolo exists
  2. Builds command line: --path /var/lib/kubesolo --local-storage true
  3. Adds hostname as extra SAN for API server certificate
  4. Appends any extra flags from boot params or config file
  5. exec kubesolo $ARGS — KubeSolo becomes PID 1

After this, KubeSolo starts containerd, kubelet, API server, and all K8s components. The node should reach Ready status within 60-120 seconds.
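The flag assembly and handoff can be sketched as follows (the SAN flag is omitted because its exact name isn't given above; `build_args` is a name of my own):

```sh
#!/bin/sh
# Illustrative sketch of stage 90's command-line assembly.

build_args() {
    ARGS="--path /var/lib/kubesolo --local-storage true"
    if [ -n "${KUBESOLO_EXTRA_FLAGS:-}" ]; then
        ARGS="$ARGS $KUBESOLO_EXTRA_FLAGS"
    fi
    echo "$ARGS"
}

# [ -x /usr/local/bin/kubesolo ] || emergency_shell "kubesolo binary missing"
# exec /usr/local/bin/kubesolo $(build_args)   # KubeSolo becomes PID 1
```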

Failure Handling

If any stage returns non-zero, /sbin/init calls emergency_shell() which:

  1. Logs the failure to serial console
  2. Drops to /bin/sh for debugging
  3. User can type exit to retry the boot sequence

If kubesolo.shell is passed as a boot parameter, the system drops to shell immediately after Stage 10 (cmdline parsing).
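The runner/emergency-shell interplay could look like this (a sketch under assumed names; `SHELL_CMD` and the directory parameter are test hooks, and the retry here resumes at the failed stage):

```sh
#!/bin/sh
# Illustrative sketch of the stage runner; exiting the shell retries.
: "${SHELL_CMD:=/bin/sh}"

run_stages() {  # $1 = init.d directory
    for stage in "$1"/[0-9]*.sh; do
        until "$stage"; do
            echo "[kubesolo-init] [ERROR] ${stage##*/} failed" >&2
            $SHELL_CMD   # emergency shell; 'exit' retries the stage
        done
        echo "[kubesolo-init] [OK] Stage ${stage##*/} complete" >&2
    done
}

# run_stages /usr/lib/kubesolo-os/init.d
```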

Debugging

Serial Console

All init stages log to stderr with the prefix [kubesolo-init]. Boot with console=ttyS0,115200n8 (default in debug mode) to see output on serial.

Boot Markers

Test scripts look for these markers in the serial log:

  • [kubesolo-init] [OK] Stage 90-kubesolo.sh complete — full boot success
  • [kubesolo-init] [ERROR] — stage failure
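A test script might scan the serial log for these markers like this (function name and log path are illustrative, not the actual test harness):

```sh
#!/bin/sh
# Illustrative marker scan over a captured serial log.

marker_check() {  # $1 = serial log file
    if grep -q '\[kubesolo-init\] \[OK\] Stage 90-kubesolo.sh complete' "$1"; then
        echo PASS
    elif grep -q '\[kubesolo-init\] \[ERROR\]' "$1"; then
        echo FAIL
    else
        echo PENDING   # keep polling until a timeout expires
    fi
}

# marker_check serial.log
```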

Emergency Shell

From the emergency shell:

dmesg | tail -50       # Kernel messages
cat /proc/cmdline      # Boot parameters
cat /proc/mounts       # Current mounts
blkid                  # Block devices and labels
ip addr                # Network interfaces
ls /usr/lib/kubesolo-os/init.d/   # Available init stages