feat: custom kernel build + boot fixes for working container runtime

Build a custom Tiny Core 17.0 kernel (6.18.2) that enables the configs
the stock kernel lacks for container workloads:
- CONFIG_CGROUP_BPF=y (cgroup v2 device control via BPF)
- CONFIG_DEVTMPFS=y (auto-create /dev device nodes)
- CONFIG_DEVTMPFS_MOUNT=y (auto-mount devtmpfs)
- CONFIG_MEMCG=y (memory cgroup controller for memory.max)
- CONFIG_CFS_BANDWIDTH=y (CPU bandwidth throttling for cpu.max)
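
A quick way to confirm that a built kernel carries the options listed above
is to scan its .config for them. The following is a minimal sketch, not part
of this commit: the function name `missing_configs` and the variable
`REQUIRED_CONFIGS` are hypothetical, and the config file path is supplied by
the caller.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print every required
# option that is NOT set to "y" in the given kernel .config file.
# The option names are the five listed in the commit message.
REQUIRED_CONFIGS="CONFIG_CGROUP_BPF CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT CONFIG_MEMCG CONFIG_CFS_BANDWIDTH"

missing_configs() {
    config_file="$1"
    for opt in $REQUIRED_CONFIGS; do
        # Anchor both ends so CONFIG_DEVTMPFS does not match
        # CONFIG_DEVTMPFS_MOUNT, and "=m" or "is not set" do not count.
        if ! grep -q "^${opt}=y\$" "$config_file" 2>/dev/null; then
            echo "$opt"
        fi
    done
}
```

An empty result from `missing_configs /path/to/.config` means all five
options are built in. On a running kernel that exposes /proc/config.gz
(this requires CONFIG_IKCONFIG_PROC, which is an assumption here), the file
can be decompressed to a temporary path first.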

Also strip unnecessary subsystems (sound, GPU, wireless, Bluetooth,
KVM, etc.) for a minimal footprint on a headless K8s edge appliance.

Init system fixes for successful boot-to-running-pods:
- Add switch_root in init.sh to escape initramfs (runc needs pivot_root,
  which fails on the initramfs rootfs)
- Add mountpoint guards in 00-early-mount.sh (skip if already mounted)
- Create essential device nodes after switch_root (kmsg, console, etc.)
- Enable cgroup v2 controller delegation with init process isolation
- Mount BPF filesystem for cgroup v2 device control
- Add mknod fallback from sysfs in 20-persistent-mount.sh for /dev/vda
- Move KubeSolo binary to /usr/bin (avoid /usr/local bind mount hiding)
- Generate /etc/machine-id in 60-hostname.sh (kubelet requires it)
- Pre-initialize iptables tables before kube-proxy starts
- Add nft_reject, nft_fib, xt_nfacct to kernel modules list
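
As an illustration of the machine-id item above, the sketch below shows one
way such a file could be generated. It is not the actual content of
60-hostname.sh: the function name `ensure_machine_id` and the optional path
argument are hypothetical. It writes 32 lowercase hex characters plus a
newline, which is the usual /etc/machine-id format, and leaves an existing
ID untouched so the value stays stable across boots.

```shell
#!/bin/sh
# Hypothetical sketch (not the real 60-hostname.sh): create a machine-id
# file if none exists. The path argument exists only to make the function
# easy to try outside of /etc.
ensure_machine_id() {
    id_file="${1:-/etc/machine-id}"

    # Keep an existing non-empty ID unchanged.
    if [ -s "$id_file" ]; then
        return 0
    fi

    if [ -r /proc/sys/kernel/random/uuid ]; then
        # The kernel returns a random UUID; dropping the dashes leaves
        # 32 hex characters followed by a newline.
        tr -d '-' < /proc/sys/kernel/random/uuid > "$id_file"
    else
        # Fallback: 16 random bytes rendered as hex, newline added by hand.
        od -An -N16 -tx1 /dev/urandom | tr -d ' \n' > "$id_file"
        echo >> "$id_file"
    fi
}
```

Calling `ensure_machine_id` with no argument targets /etc/machine-id;
passing a path writes the ID there instead.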

Build system changes:
- New build-kernel.sh script for custom kernel compilation
- Dockerfile.builder adds kernel build deps (flex, bison, libelf, etc.)
- Selective kernel module install (only modules.list + transitive deps)
- Install iptables-nft (xtables-nft-multi) + shared libs in rootfs

Tested: ISO boots in QEMU, node reaches Ready in ~35s, CoreDNS and
local-path-provisioner pods start and run successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:13:31 -06:00
parent 456aa8eb5b
commit 39732488ef
13 changed files with 794 additions and 77 deletions


@@ -29,11 +29,16 @@ wait_for_file() {
     return 1
 }
-# Get IP address of an interface (POSIX-safe, no grep -P)
+# Get IP address of an interface (BusyBox-safe: prefer ifconfig, fall back to ip)
 get_iface_ip() {
     iface="$1"
-    ip -4 addr show "$iface" 2>/dev/null | \
-        sed -n 's/.*inet \([0-9.]*\).*/\1/p' | head -1
+    if command -v ifconfig >/dev/null 2>&1; then
+        ifconfig "$iface" 2>/dev/null | \
+            sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p;s/.*inet \([0-9.]*\).*/\1/p' | head -1
+    elif command -v ip >/dev/null 2>&1; then
+        ip -4 addr show "$iface" 2>/dev/null | \
+            sed -n 's/.*inet \([0-9.]*\).*/\1/p' | head -1
+    fi
 }
 # Check if running in a VM (useful for adjusting timeouts)
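
The two-part sed expression in the ifconfig branch is there because ifconfig
output comes in two layouts: the older `inet addr:10.0.2.15` form (BusyBox,
net-tools 1.x) and the newer `inet 10.0.2.15` form. The sketch below isolates
that expression so it can be exercised against sample lines. The wrapper name
`parse_inet_addr` and the sample addresses are illustrative only and do not
appear in the commit.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression used in get_iface_ip.
# Reads ifconfig- or ip-style output on stdin, prints the first IPv4 address.
parse_inet_addr() {
    # The first substitution handles "inet addr:A.B.C.D"; once it fires the
    # pattern space holds only the address, so the second one cannot match.
    # The second substitution handles "inet A.B.C.D" and "inet A.B.C.D/NN".
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p;s/.*inet \([0-9.]*\).*/\1/p' | head -1
}

# Older ifconfig layout
printf '          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0\n' | parse_inet_addr

# Newer ifconfig layout
printf '        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255\n' | parse_inet_addr

# iproute2 layout, as produced by "ip -4 addr show"
printf '    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0\n' | parse_inet_addr
```

All three sample lines yield `10.0.2.15`. The order of the two substitutions
matters: swapping them would let the generic `inet ` pattern match the older
layout first with an empty capture.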