kubesolo-os/init/init.sh
Adolfo Delorenzo 39732488ef feat: custom kernel build + boot fixes for working container runtime
Build a custom Tiny Core 17.0 kernel (6.18.2) with the configs the
stock kernel lacks for container workloads:
- CONFIG_CGROUP_BPF=y (cgroup v2 device control via BPF)
- CONFIG_DEVTMPFS=y (auto-create /dev device nodes)
- CONFIG_DEVTMPFS_MOUNT=y (auto-mount devtmpfs)
- CONFIG_MEMCG=y (memory cgroup controller for memory.max)
- CONFIG_CFS_BANDWIDTH=y (CPU bandwidth throttling for cpu.max)
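
The resulting kernel can be sanity-checked against any kernel config file. A minimal sketch (the helper name `check_kconfig` is hypothetical, not part of this repo):

```shell
# check_kconfig CONFIG_FILE OPT...
# Prints OK/MISSING per option; returns non-zero if any option is not
# built in (=y). Note that =m does not count: these must be built in.
check_kconfig() {
    file=$1; shift
    rc=0
    for opt in "$@"; do
        if grep -q "^${opt}=y" "$file"; then
            echo "OK $opt"
        else
            echo "MISSING $opt"
            rc=1
        fi
    done
    return $rc
}

# Example (with CONFIG_IKCONFIG_PROC, decompress /proc/config.gz first):
# check_kconfig "/boot/config-$(uname -r)" \
#     CONFIG_CGROUP_BPF CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT \
#     CONFIG_MEMCG CONFIG_CFS_BANDWIDTH
```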

Also strips unnecessary subsystems (sound, GPU, wireless, Bluetooth,
KVM, etc.) for minimal footprint on a headless K8s edge appliance.

Init system fixes for successful boot-to-running-pods:
- Add switch_root in init.sh to escape initramfs (runc pivot_root)
- Add mountpoint guards in 00-early-mount.sh (skip if already mounted)
- Create essential device nodes after switch_root (kmsg, console, etc.)
- Enable cgroup v2 controller delegation with init process isolation
- Mount BPF filesystem for cgroup v2 device control
- Add mknod fallback from sysfs in 20-persistent-mount.sh for /dev/vda
- Move KubeSolo binary to /usr/bin (avoid /usr/local bind mount hiding)
- Generate /etc/machine-id in 60-hostname.sh (kubelet requires it)
- Pre-initialize iptables tables before kube-proxy starts
- Add nft_reject, nft_fib, xt_nfacct to kernel modules list
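
The mountpoint-guard pattern for 00-early-mount.sh can be sketched as follows (a minimal sketch, not the actual stage script; it checks /proc/mounts rather than relying on mountpoint(1), which BusyBox may not provide):

```shell
# is_mounted TARGET — true if TARGET appears as a mountpoint in /proc/mounts.
is_mounted() {
    awk -v t="$1" '$2 == t { found = 1 } END { exit !found }' /proc/mounts
}

# mount_once FSTYPE TARGET [OPTS] — mount only if TARGET is not already
# a mountpoint, so the stage stays idempotent across the switch_root re-exec.
mount_once() {
    fstype=$1; target=$2; opts=${3:-}
    if is_mounted "$target"; then
        return 0    # already mounted (e.g. moved in from the initramfs)
    fi
    mkdir -p "$target"
    mount -t "$fstype" ${opts:+-o "$opts"} "$fstype" "$target"
}
```

With this guard, the stage can call `mount_once proc /proc`, `mount_once sysfs /sys`, and `mount_once bpf /sys/fs/bpf` without failing on filesystems that `mount --move` already carried across from the initramfs.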

Build system changes:
- New build-kernel.sh script for custom kernel compilation
- Dockerfile.builder adds kernel build deps (flex, bison, libelf, etc.)
- Selective kernel module install (only modules.list + transitive deps)
- Install iptables-nft (xtables-nft-multi) + shared libs in rootfs
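
The transitive-dependency closure for the selective module install can be sketched by walking modules.dep directly (a hypothetical helper; the real build script may instead use `modprobe --show-depends` or depmod output):

```shell
# resolve_mod_deps MODULES_DEP MODULE...
# Prints the transitive closure of module file paths, one per line,
# by following "path/mod.ko: dep1.ko dep2.ko" entries in modules.dep.
resolve_mod_deps() {
    dep=$1; shift
    want="$*"
    seen=""
    while [ -n "$want" ]; do
        next=""
        for m in $want; do
            # Skip modules we have already emitted.
            case " $seen " in *" $m "*) continue ;; esac
            seen="$seen $m"
            # Match plain and compressed modules (.ko, .ko.gz, .ko.zst, ...).
            line=$(grep -E "/${m}\.ko(\.[a-z]+)?:" "$dep" | head -n1)
            [ -n "$line" ] || continue
            echo "${line%%:*}"
            # Queue this module's dependencies for the next pass.
            for d in ${line#*:}; do
                base=$(basename "$d")
                next="$next ${base%%.*}"
            done
        done
        want=$next
    done
}
```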

Tested: ISO boots in QEMU, node reaches Ready in ~35s, CoreDNS and
local-path-provisioner pods start and run successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:13:31 -06:00


#!/bin/sh
# /sbin/init — KubeSolo OS init system
# POSIX sh compatible (BusyBox ash)
#
# Boot stages are sourced from /usr/lib/kubesolo-os/init.d/ in numeric order.
# Each stage file must be a valid POSIX sh script.
# If any mandatory stage fails, the system drops to an emergency shell.
#
# Boot parameters (from kernel command line):
#   kubesolo.data=<device>     Persistent data partition (required)
#   kubesolo.debug             Enable verbose logging
#   kubesolo.shell             Drop to emergency shell immediately
#   kubesolo.nopersist         Run without persistent storage (RAM only)
#   kubesolo.cloudinit=<path>  Path to cloud-init config
#   kubesolo.flags=<flags>     Extra flags for KubeSolo binary
set -e
# --- Switch root: escape initramfs so runc pivot_root works ---
# The kernel boots into an initramfs (rootfs), which is a special mount that
# doesn't support pivot_root. Container runtimes (runc) need pivot_root to
# set up container root filesystems. To fix this, we copy the rootfs to a
# tmpfs and switch_root to it. The sentinel file prevents infinite loops.
if [ ! -f /etc/.switched_root ]; then
    mount -t proc proc /proc 2>/dev/null || true
    mount -t sysfs sysfs /sys 2>/dev/null || true
    mount -t devtmpfs devtmpfs /dev 2>/dev/null || true
    mkdir -p /mnt/newroot
    mount -t tmpfs -o size=400M,mode=755 tmpfs /mnt/newroot
    echo "[init] Copying rootfs to tmpfs..." >&2
    # Copy each top-level directory explicitly (BusyBox cp -ax on rootfs is broken)
    for d in bin sbin usr lib lib64 etc var opt; do
        [ -d "/$d" ] && cp -a "/$d" /mnt/newroot/ 2>/dev/null || true
    done
    # Recreate mount point and special directories
    mkdir -p /mnt/newroot/proc /mnt/newroot/sys /mnt/newroot/dev
    mkdir -p /mnt/newroot/run /mnt/newroot/tmp /mnt/newroot/mnt
    touch /mnt/newroot/etc/.switched_root
    mount --move /proc /mnt/newroot/proc
    mount --move /sys /mnt/newroot/sys
    mount --move /dev /mnt/newroot/dev
    echo "[init] Switching root..." >&2
    exec switch_root /mnt/newroot /sbin/init
fi
# --- PATH setup ---
# Ensure /usr/local paths are in PATH (iptables, KubeSolo, etc.)
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# iptables shared libraries live in /usr/local/lib
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# --- Constants ---
INIT_LIB="/usr/lib/kubesolo-os"
INIT_STAGES="/usr/lib/kubesolo-os/init.d"
LOG_PREFIX="[kubesolo-init]"
DATA_MOUNT="/mnt/data"
# --- Parsed boot parameters (populated by 10-parse-cmdline.sh) ---
export KUBESOLO_DATA_DEV=""
export KUBESOLO_DEBUG=""
export KUBESOLO_SHELL=""
export KUBESOLO_NOPERSIST=""
export KUBESOLO_CLOUDINIT=""
export KUBESOLO_EXTRA_FLAGS=""
# --- Logging ---
log() {
    echo "$LOG_PREFIX $*" >&2
}
log_ok() {
    echo "$LOG_PREFIX [OK] $*" >&2
}
log_err() {
    echo "$LOG_PREFIX [ERROR] $*" >&2
}
log_warn() {
    echo "$LOG_PREFIX [WARN] $*" >&2
}
# --- Emergency shell ---
emergency_shell() {
    log_err "Boot failed: $*"
    log_err "Dropping to emergency shell. Type 'exit' to retry boot."
    exec /bin/sh
}
# --- Main boot sequence ---
log "KubeSolo OS v$(cat /etc/kubesolo-os-version 2>/dev/null || echo 'dev') starting..."
# Source shared functions
if [ -f "$INIT_LIB/functions.sh" ]; then
    . "$INIT_LIB/functions.sh"
fi
# Run init stages in order
for stage in "$INIT_STAGES"/*.sh; do
    [ -f "$stage" ] || continue
    stage_name="$(basename "$stage")"
    log "Running stage: $stage_name"
    if ! . "$stage"; then
        emergency_shell "Stage $stage_name failed"
    fi
    # Check for early shell request (parsed in 10-parse-cmdline.sh)
    if [ "$KUBESOLO_SHELL" = "1" ] && [ "$stage_name" = "10-parse-cmdline.sh" ]; then
        log "Emergency shell requested via boot parameter"
        exec /bin/sh
    fi
    log_ok "Stage $stage_name complete"
done
# If we get here, all stages ran but KubeSolo should have exec'd.
# This means 90-kubesolo.sh didn't exec (shouldn't happen).
emergency_shell "Init completed without exec'ing KubeSolo — this is a bug"