feat: custom kernel build + boot fixes for working container runtime
Build a custom Tiny Core 17.0 kernel (6.18.2) with configs that the stock
kernel lacks for container workloads:
- CONFIG_CGROUP_BPF=y (cgroup v2 device control via BPF)
- CONFIG_DEVTMPFS=y (auto-create /dev device nodes)
- CONFIG_DEVTMPFS_MOUNT=y (auto-mount devtmpfs)
- CONFIG_MEMCG=y (memory cgroup controller for memory.max)
- CONFIG_CFS_BANDWIDTH=y (CPU bandwidth throttling for cpu.max)

Also strips unnecessary subsystems (sound, GPU, wireless, Bluetooth, KVM,
etc.) for a minimal footprint on a headless K8s edge appliance.

Init system fixes for successful boot-to-running-pods:
- Add switch_root in init.sh to escape initramfs (runc pivot_root)
- Add mountpoint guards in 00-early-mount.sh (skip if already mounted)
- Create essential device nodes after switch_root (kmsg, console, etc.)
- Enable cgroup v2 controller delegation with init process isolation
- Mount BPF filesystem for cgroup v2 device control
- Add mknod fallback from sysfs in 20-persistent-mount.sh for /dev/vda
- Move KubeSolo binary to /usr/bin (avoid /usr/local bind mount hiding)
- Generate /etc/machine-id in 60-hostname.sh (kubelet requires it)
- Pre-initialize iptables tables before kube-proxy starts
- Add nft_reject, nft_fib, xt_nfacct to kernel modules list

Build system changes:
- New build-kernel.sh script for custom kernel compilation
- Dockerfile.builder adds kernel build deps (flex, bison, libelf, etc.)
- Selective kernel module install (only modules.list + transitive deps)
- Install iptables-nft (xtables-nft-multi) + shared libs in rootfs

Tested: ISO boots in QEMU, node reaches Ready in ~35s, CoreDNS and
local-path-provisioner pods start and run successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
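The kernel options listed in the commit message can be checked against a kernel config file before building an image. The following is a minimal sketch, not part of this commit: the function name `check_kernel_config` and the variable `REQUIRED_CONFIGS` are hypothetical, and it assumes the options must be built in (`=y`) rather than built as modules.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): verify that a kernel config
# file enables the options this commit says container workloads need.
REQUIRED_CONFIGS="CONFIG_CGROUP_BPF CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT CONFIG_MEMCG CONFIG_CFS_BANDWIDTH"

# check_kernel_config FILE
# Prints one "missing: OPTION" line per option that is not set to =y.
# Returns 0 if every required option is built in, non-zero otherwise.
check_kernel_config() {
    config_file="$1"
    missing=0
    for opt in $REQUIRED_CONFIGS; do
        # Anchor both ends so CONFIG_DEVTMPFS does not match CONFIG_DEVTMPFS_MOUNT.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: ${opt}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

It could be run against the `.config` used for the kernel build. On a running system, a kernel built with CONFIG_IKCONFIG_PROC exposes its config as /proc/config.gz, which can be decompressed to a file first; whether this kernel enables that option is not stated in the commit.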
33
init/init.sh
@@ -16,6 +16,39 @@
set -e

# --- Switch root: escape initramfs so runc pivot_root works ---
# The kernel boots into an initramfs (rootfs), which is a special mount that
# doesn't support pivot_root. Container runtimes (runc) need pivot_root to
# set up container root filesystems. To fix this, we copy the rootfs to a
# tmpfs and switch_root to it. The sentinel file prevents infinite loops.
if [ ! -f /etc/.switched_root ]; then
    mount -t proc proc /proc 2>/dev/null || true
    mount -t sysfs sysfs /sys 2>/dev/null || true
    mount -t devtmpfs devtmpfs /dev 2>/dev/null || true
    mkdir -p /mnt/newroot
    mount -t tmpfs -o size=400M,mode=755 tmpfs /mnt/newroot
    echo "[init] Copying rootfs to tmpfs..." >&2
    # Copy each top-level directory explicitly (BusyBox cp -ax on rootfs is broken)
    for d in bin sbin usr lib lib64 etc var opt; do
        [ -d "/$d" ] && cp -a "/$d" /mnt/newroot/ 2>/dev/null || true
    done
    # Recreate mount point and special directories
    mkdir -p /mnt/newroot/proc /mnt/newroot/sys /mnt/newroot/dev
    mkdir -p /mnt/newroot/run /mnt/newroot/tmp /mnt/newroot/mnt
    touch /mnt/newroot/etc/.switched_root
    mount --move /proc /mnt/newroot/proc
    mount --move /sys /mnt/newroot/sys
    mount --move /dev /mnt/newroot/dev
    echo "[init] Switching root..." >&2
    exec switch_root /mnt/newroot /sbin/init
fi

# --- PATH setup ---
# Ensure /usr/local paths are in PATH (iptables, KubeSolo, etc.)
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# iptables shared libraries live in /usr/local/lib
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# --- Constants ---
INIT_LIB="/usr/lib/kubesolo-os"
INIT_STAGES="/usr/lib/kubesolo-os/init.d"