feat: custom kernel build + boot fixes for working container runtime

Build a custom Tiny Core 17.0 kernel (6.18.2) that enables options the
stock kernel lacks for container workloads:
- CONFIG_CGROUP_BPF=y (cgroup v2 device control via BPF)
- CONFIG_DEVTMPFS=y (auto-create /dev device nodes)
- CONFIG_DEVTMPFS_MOUNT=y (auto-mount devtmpfs)
- CONFIG_MEMCG=y (memory cgroup controller for memory.max)
- CONFIG_CFS_BANDWIDTH=y (CPU bandwidth throttling for cpu.max)
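One way to confirm that a finished build really carries the options above is to check its .config. The helper below is a hypothetical sketch, not part of this commit; check_kconfig and REQUIRED_KCONFIG are illustrative names. It reports every listed option that is not set to =y and returns non-zero if any are missing.

```sh
# Hypothetical helper (not part of this commit): the container-related
# options this kernel build is meant to enable.
REQUIRED_KCONFIG="CONFIG_CGROUP_BPF CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT CONFIG_MEMCG CONFIG_CFS_BANDWIDTH"

# check_kconfig CONFIG_FILE
# Prints "missing: <option>" for each required option that is not "=y"
# and returns 1 if any were missing, 0 otherwise.
check_kconfig() {
    config_file=$1
    missing=0
    for opt in $REQUIRED_KCONFIG; do
        # -x: the whole line must be exactly "<option>=y"
        if ! grep -qx "${opt}=y" "$config_file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_kconfig path/to/linux-6.18.2/.config` after configuring the tree (the path is illustrative), or, on a running kernel built with CONFIG_IKCONFIG_PROC, `zcat /proc/config.gz > /tmp/running.config && check_kconfig /tmp/running.config`.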

Also strip unnecessary subsystems (sound, GPU, wireless, Bluetooth,
KVM, etc.) to minimize the footprint of a headless K8s edge appliance.

Init system fixes needed to boot all the way to running pods:
- Add switch_root in init.sh to escape initramfs, where runc's pivot_root fails
- Add mountpoint guards in 00-early-mount.sh (skip if already mounted)
- Create essential device nodes after switch_root (kmsg, console, etc.)
- Enable cgroup v2 controller delegation with init process isolation
- Mount BPF filesystem for cgroup v2 device control
- Add a sysfs-based mknod fallback for /dev/vda in 20-persistent-mount.sh
- Move the KubeSolo binary to /usr/bin (so the /usr/local bind mount can't hide it)
- Generate /etc/machine-id in 60-hostname.sh (kubelet requires it)
- Pre-initialize iptables tables before kube-proxy starts
- Add nft_reject, nft_fib, xt_nfacct to kernel modules list
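As an illustration of the machine-id item above, here is a minimal sketch of an idempotent generator. It is not the code in 60-hostname.sh, which this diff does not show; ensure_machine_id is a hypothetical name, and the sketch assumes od and tr are available (BusyBox typically provides both). The output follows the machine-id(5) format: 32 lowercase hexadecimal characters plus a trailing newline.

```sh
# Hypothetical sketch (not the actual 60-hostname.sh): create a machine-id
# only if the file is missing or empty, so it is safe to call on every boot.
# ensure_machine_id [PATH]   (PATH defaults to /etc/machine-id)
ensure_machine_id() {
    id_file=${1:-/etc/machine-id}
    if [ -s "$id_file" ]; then
        return 0    # keep an existing, non-empty ID
    fi
    # 16 random bytes rendered as 32 lowercase hex digits
    new_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s\n' "$new_id" > "$id_file"
}
```

Because the root filesystem is a tmpfs copy after switch_root, an ID generated this way would not survive a reboot unless it is also kept on persistent storage.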

Build system changes:
- New build-kernel.sh script for custom kernel compilation
- Dockerfile.builder adds kernel build deps (flex, bison, libelf, etc.)
- Selective kernel module install (only modules.list + transitive deps)
- Install iptables-nft (xtables-nft-multi) + shared libs in rootfs
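The selective module install amounts to taking the closure of the requested modules over modules.dep. The sketch below only illustrates that idea and is not build-kernel.sh's actual logic: resolve_modules is a hypothetical name, it assumes the names from modules.list are passed as arguments (the file's real format is not shown here), and it reads the `path: dep-path dep-path ...` lines that depmod writes.

```sh
# Hypothetical helper (not this commit's build-kernel.sh): print the
# modules.dep path of each named module plus all of its transitive
# dependencies, once each. Names absent from modules.dep (for example
# features built in with =y) are skipped.
# Usage: resolve_modules DEPFILE NAME...
resolve_modules() {
    depfile=$1
    shift
    awk -v want="$*" '
        # Module name from a path: basename without .ko or a compression
        # suffix, with "-" mapped to "_" (kmod treats them as equivalent).
        function modname(p) {
            sub(/.*\//, "", p)
            sub(/\.ko(\.(xz|gz|zst))?$/, "", p)
            gsub(/-/, "_", p)
            return p
        }
        # Each non-empty line looks like "path/mod.ko: dep1.ko dep2.ko ...".
        NF {
            path = $1
            sub(/:$/, "", path)
            name = modname(path)
            pathof[name] = path
            deps[name] = ""
            for (i = 2; i <= NF; i++) deps[name] = deps[name] " " modname($i)
        }
        # Breadth-first walk from the requested names using a work list.
        END {
            n = split(want, queue, " ")
            for (head = 1; head <= n; head++) {
                m = queue[head]
                gsub(/-/, "_", m)
                if ((m in seen) || !(m in pathof)) continue
                seen[m] = 1
                print pathof[m]
                k = split(deps[m], d, " ")
                for (i = 1; i <= k; i++) queue[++n] = d[i]
            }
        }
    ' "$depfile"
}
```

Typical use would be `resolve_modules "$MODDIR/modules.dep" $(cat modules.list)`, where MODDIR is a hypothetical variable pointing at the built module tree. The printed paths are relative to that directory. After copying them, running `depmod -b "$ROOTFS" "$KVER"` (both variables illustrative) regenerates modules.dep for the trimmed set. An explicit work list keeps the walk simple and avoids recursion in awk. On an installed tree, kmod's `modprobe --show-depends` performs a similar resolution.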

Tested: ISO boots in QEMU, node reaches Ready in ~35s, CoreDNS and
local-path-provisioner pods start and run successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:13:31 -06:00
parent 456aa8eb5b
commit 39732488ef
13 changed files with 794 additions and 77 deletions


@@ -16,6 +16,39 @@
set -e
# --- Switch root: escape initramfs so runc pivot_root works ---
# The kernel boots into an initramfs (rootfs), which is a special mount that
# doesn't support pivot_root. Container runtimes (runc) need pivot_root to
# set up container root filesystems. To fix this, we copy the rootfs to a
# tmpfs and switch_root to it. The sentinel file prevents infinite loops.
if [ ! -f /etc/.switched_root ]; then
    mount -t proc proc /proc 2>/dev/null || true
    mount -t sysfs sysfs /sys 2>/dev/null || true
    mount -t devtmpfs devtmpfs /dev 2>/dev/null || true
    mkdir -p /mnt/newroot
    mount -t tmpfs -o size=400M,mode=755 tmpfs /mnt/newroot
    echo "[init] Copying rootfs to tmpfs..." >&2
    # Copy each top-level directory explicitly (BusyBox cp -ax on rootfs is broken)
    for d in bin sbin usr lib lib64 etc var opt; do
        [ -d "/$d" ] && cp -a "/$d" /mnt/newroot/ 2>/dev/null || true
    done
    # Recreate mount point and special directories
    mkdir -p /mnt/newroot/proc /mnt/newroot/sys /mnt/newroot/dev
    mkdir -p /mnt/newroot/run /mnt/newroot/tmp /mnt/newroot/mnt
    touch /mnt/newroot/etc/.switched_root
    mount --move /proc /mnt/newroot/proc
    mount --move /sys /mnt/newroot/sys
    mount --move /dev /mnt/newroot/dev
    echo "[init] Switching root..." >&2
    exec switch_root /mnt/newroot /sbin/init
fi
# --- PATH setup ---
# Ensure /usr/local paths are in PATH (iptables, KubeSolo, etc.)
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# iptables shared libraries live in /usr/local/lib
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# --- Constants ---
INIT_LIB="/usr/lib/kubesolo-os"
INIT_STAGES="/usr/lib/kubesolo-os/init.d"