3bcf2e1 added nft_numgen / nft_hash / nft_limit / nft_log to both module
lists, but in a format the inject parser doesn't handle:
nft_numgen # numgen random/inc mod N vmap — Service endpoint LB
The parser's only comment skip is `case "$mod" in \#*|"") continue ;;`
which matches lines STARTING with #, not lines with inline #-comments.
So each new line was passed to modprobe verbatim as a single (invalid)
module name, modprobe returned nonzero, and the .ko never made it into
the initramfs. Listing the rebuilt rootfs confirmed:
ls .../lib/modules/*/kernel/net/netfilter/ | grep nft_numgen
<empty>
Two changes:
1. Strip inline comments from the new entries in modules.list and
modules-arm64.list. Each module name on its own line, matching the
convention the rest of the file uses.
2. Harden the parser in inject-kubesolo.sh to handle "name # comment"
regardless. Single-line tweak: `mod="${mod%%#*}"` before the
continue check. Prevents a future contributor's inline doc from
silently dropping a module the same way.
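The hardened loop, sketched here against a synthetic list (the surrounding inject-kubesolo.sh code is assumed; only the `${mod%%#*}` strip and the `\#*|""` skip come from this change):

```shell
#!/bin/sh
# Sketch of the hardened modules.list parser. The while/read shape is an
# assumption; the comment-strip and skip-pattern match this commit.
while IFS= read -r mod; do
  mod="${mod%%#*}"                                  # drop inline "# comment"
  mod="$(printf '%s' "$mod" | tr -d '[:space:]')"   # trim stray whitespace
  case "$mod" in \#*|"") continue ;; esac           # skip blank/comment lines
  echo "would modprobe: $mod"
done <<'EOF'
nft_numgen # numgen random/inc mod N vmap
# a full-line comment
nft_hash
EOF
```

With the inline comment stripped, both `nft_numgen` and `nft_hash` reach modprobe as clean names instead of one invalid whole-line name.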
After rebuilding the rootfs on the Odroid (no kernel rebuild needed —
this is a rootfs-only change), the four .ko files should appear at
build/rootfs-work/rootfs/lib/modules/*/kernel/net/netfilter/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After 31eee77 added CONFIG_NFT_NUMGEN=m and friends to the kernel
fragment, the rebuilt kernel does include nft_numgen.ko on disk in
build/cache/kernel-arm64-generic/modules/. But the runtime kernel
doesn't load it, and kube-proxy keeps failing with the same
"No such file or directory" pointing at `numgen` as before the
kernel rebuild.
Root cause is the boot-stage-vs-lockdown ordering combined with
inject-kubesolo.sh's selective module copy:
1. inject-kubesolo.sh ships modules listed in modules.list /
modules-arm64.list plus their transitive deps. nft_numgen wasn't
in either list, so its .ko is in the kernel build cache but
never makes it into the initramfs.
2. Stage 30 (kernel-modules) only modprobes from the same list, so
it wouldn't load nft_numgen even if the .ko were present.
3. Stage 85 (security-lockdown) writes 1 to
/proc/sys/kernel/modules_disabled, blocking any further module
loads — including the lazy request_module() that nftables would
otherwise do when kube-proxy first uses the `numgen` expression.
The kernel-side fix (=m in the fragment) is necessary but not
sufficient: we have to ship + load these in stage 30, before lockdown.
Add nft_numgen, nft_hash, nft_limit, nft_log to BOTH modules.list
(x86) and modules-arm64.list. The same justification applies on x86 —
KubeSolo's nftables kube-proxy backend uses numgen regardless of arch;
we just haven't exercised it on x86 because v0.2 deployments stuck with
the older iptables-restore backend.
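The resulting list entries, one bare name per line per the rest of the file's convention (no inline comments, so the existing parser passes them to modprobe verbatim):

```
nft_numgen
nft_hash
nft_limit
nft_log
```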
After this lands on the Odroid:
sudo make rootfs-arm64 disk-image-arm64 # kernel cached, rootfs only
# no kernel rebuild needed; this is a rootfs-only change
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourth round of the v0.3 nftables-on-arm64 debug saga. After the
NF_TABLES_IPV4 family fix from 7e46f8f, KubeSolo + containerd + a
CoreDNS pod all reach Running state, but kube-proxy fails to install
Service rules:
add rule ip kube-proxy service-2QRHZV4L-default/kubernetes/tcp/https
numgen random mod 1 vmap { 0 : goto ... }
^^^^^^^^^^^^^^^^^^^
Error: Could not process rule: No such file or directory
The caret points at `numgen random mod 1`. That's the nftables
NUMGEN expression — kube-proxy's nftables backend uses it for random
endpoint load-balancing across Service endpoints. Without
CONFIG_NFT_NUMGEN compiled into the kernel, every Service sync fails
and kube-dns / any ClusterIP is unreachable.
Cascade: kube-proxy sync fail -> kube-dns Service has no DNAT ->
CoreDNS readiness probe never goes Ready -> KubeSolo's coredns
deploy step times out after 15 attempts -> FTL -> kernel panic.
Fix: add NFT_NUMGEN to kernel-container.fragment, plus the small
family of expression modules kube-proxy and CNI plugins commonly use
so we don't repeat this debug loop for the next missing one:
CONFIG_NFT_NUMGEN=m   random / inc LB
CONFIG_NFT_HASH=m     consistent-hash LB (sessionAffinity=ClientIP)
CONFIG_NFT_OBJREF=m   references to named objects (counters, quotas) in rules
CONFIG_NFT_LIMIT=m    rate-limit expression
CONFIG_NFT_LOG=m      log expression (used by some CNI debug rules)
All =m so init's stage-30 loads them from modules.list / modules-arm64.list
alongside the existing nft_nat / nft_masq / nft_compat.
This needs another kernel rebuild (rm -rf build/cache/kernel-arm64-generic,
sudo make kernel-arm64) on the Odroid. After that we should have a fully
working KubeSolo OS v0.3 on ARM64 generic — at which point the only thing
left is to tag v0.3.1 and verify the rewritten release.yaml workflow
publishes both arches automatically.
Note on runc-PATH log noise: containerd-shim-runc-v2 -info probes for
runc in $PATH and fails because KubeSolo's runc lives at
/var/lib/kubesolo/containerd/runc. This is cosmetic — actual container
creation uses an absolute path from the containerd config and works
fine (CoreDNS container did start successfully). Will polish in v0.3.2.
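For reference, the containerd-side shape of that absolute path is the CRI runc option shown below; the exact table path inside KubeSolo's generated containerd config is an assumption:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  BinaryName = "/var/lib/kubesolo/containerd/runc"
```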
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third KubeSolo crash from the QEMU validation loop:
nft add table ip kubesolo-masq: exit status 1
Error: Could not process rule: Operation not supported
That's EOPNOTSUPP from netlink. nf_tables core is loaded (the binary
even runs cleanly now after the previous dual-glibc fix), but no address
families are registered with it — so any `nft add table ip ...`,
`add table inet ...`, etc. is rejected.
In modern Linux (5.x / 6.x) the nftables address families are gated by
separate BOOL Kconfigs:
CONFIG_NF_TABLES_IPV4 "ip" family
CONFIG_NF_TABLES_IPV6 "ip6" family
CONFIG_NF_TABLES_INET "inet" family (both)
CONFIG_NF_TABLES_NETDEV "netdev" family
These are bool (not tristate) — they must be built into the kernel; no
module to load at runtime. Our shared kernel-container.fragment had
CONFIG_NF_TABLES=m (the core) but none of the family Kconfigs, and the
arm64 defconfig leaves them off.
Fix: enable all four families as =y in kernel-container.fragment.
Also pin the NFT expression modules KubeSolo v1.1.4+'s masquerade
ruleset depends on (NFT_NAT, NFT_MASQ, NFT_CT, NFT_REDIR, NFT_REJECT,
NFT_REJECT_INET, NFT_COMPAT, NFT_FIB + FIB_IPV4/6) as =m — they're
already in modules-arm64.list / modules.list and get modprobed at boot;
this just makes sure olddefconfig doesn't strip them when applied on
top of a minimal defconfig.
NF_NAT_MASQUERADE pinned =y because NFT_MASQ select-depends on it; on
some kernels it would get auto-selected, on others it gets dropped by
olddefconfig if not pinned.
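A consolidated view of the fragment entries described above (Kconfig names from this commit; comments added here):

```
# nftables address families are bool: must be =y, no runtime module
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
# expression modules pinned =m so olddefconfig keeps them
CONFIG_NFT_NAT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_CT=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NFT_FIB_IPV6=m
# select-dependency of NFT_MASQ; pinned so olddefconfig can't drop it
CONFIG_NF_NAT_MASQUERADE=y
```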
This change requires a kernel rebuild — the configs are bool / module
defs, not runtime knobs. On the Odroid:
rm -rf build/cache/kernel-arm64-generic
sudo make kernel-arm64 # ~30-60 min from scratch
sudo make rootfs-arm64 disk-image-arm64
x86 needs the same treatment when we cut v0.3.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of v0.3 — KubeSolo version bump and CI gating.
KubeSolo v1.1.0 → v1.1.5 brings:
- New flag --disable-ipv6 (v1.1.5)
- New flag --db-wal-repair (v1.1.5) — important for power-loss resilience
on edge appliances; surfaced as kubesolo.db-wal-repair in cloud-init
- New flag --full (v1.1.4) — disables edge-optimised k8s overrides
- Pod egress connectivity fix after reboot (v1.1.4)
- Registry config persistence fix (v1.1.5)
- k8s 1.34.7, CoreDNS 1.14.3, Go 1.26.2
All three new flags wired into cloud-init: config.go fields, kubesolo.go
extra-flag emission, full-config.yaml example.
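The cloud-init surface, sketched as YAML (only `kubesolo.db-wal-repair` is named in this message; the other two key names and the nesting are assumed to follow the same pattern):

```yaml
kubesolo:
  db-wal-repair: true   # --db-wal-repair: WAL repair for power-loss resilience
  disable-ipv6: false   # --disable-ipv6 (v1.1.5)
  full: false           # --full: disable edge-optimised k8s overrides (v1.1.4)
```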
Supply-chain hygiene:
- Per-arch checksums: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in
versions.env. Replaces the single shared KUBESOLO_SHA256 that couldn't
meaningfully verify both binaries at once.
- Checksum now applied to the tarball (the immutable upstream artifact)
rather than the post-extract binary.
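A sketch of the per-arch tarball verification (variable names are from versions.env; the function shape is an assumption, not the actual build-script code):

```shell
#!/bin/sh
# Verify the downloaded tarball against the per-arch pin from versions.env.
# verify_tarball() is illustrative; only the variable names come from this
# commit. Hashing the tarball pins the immutable upstream artifact.
verify_tarball() {
  tarball="$1" arch="$2"
  case "$arch" in
    amd64) expected="$KUBESOLO_SHA256_AMD64" ;;
    arm64) expected="$KUBESOLO_SHA256_ARM64" ;;
    *) echo "unknown arch: $arch" >&2; return 1 ;;
  esac
  actual="$(sha256sum "$tarball" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $tarball ($arch)" >&2
    return 1
  fi
}
```

Verifying the tarball rather than the post-extract binary means the pin survives any change in extraction behaviour and matches what upstream actually publishes.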
CI:
- New .gitea/workflows/build-arm64.yaml routes the full kernel + rootfs +
disk-image build to the Odroid arm64-linux runner. Triggers on push to
main, tags, and manual workflow_dispatch. The boot smoke test is
continue-on-error because KubeSolo's first-boot image import deadline
fires under QEMU TCG on the Odroid.
VERSION bumped to 0.3.0-dev. CHANGELOG entry under [0.3.0-dev] captures all
Phase 1-4 work + the known limitations documented in arm64-status.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the ARM64 build into two tracks per docs/arm64-architecture.md:
Generic ARM64 (mainline kernel.org, UEFI, virtio, GRUB):
- New build/scripts/build-kernel-arm64.sh builds mainline LTS (6.12.x by default)
from arm64 defconfig + shared container fragment + arm64-virt enables
(VIRTIO_*, EFI_STUB, NVMe). Output: build/cache/kernel-arm64-generic/.
- New Makefile targets: kernel-arm64, rootfs-arm64 (now consumes the mainline
kernel modules via TARGET_VARIANT=generic).
- versions.env: pin MAINLINE_KERNEL_VERSION=6.12.10, declare cdn.kernel.org URL
and SHA256 placeholder.
Raspberry Pi (raspberrypi/linux fork, custom DTBs, autoboot.txt):
- build-kernel-arm64.sh (RPi-flavoured) renamed to build-kernel-rpi.sh; cache
dir renamed from custom-kernel-arm64 to custom-kernel-rpi.
- New Makefile targets: kernel-rpi, rootfs-arm64-rpi (uses TARGET_VARIANT=rpi).
- rpi-image now depends on rootfs-arm64-rpi + kernel-rpi instead of the generic
rootfs-arm64.
- create-rpi-image.sh + inject-kubesolo.sh updated to reference the new cache
path. inject-kubesolo.sh now takes a TARGET_VARIANT env var (rpi|generic) to
select which ARM64 kernel modules to consume.
Shared substrate:
- rpi-kernel-config.fragment renamed to kernel-container.fragment. The contents
were never RPi-specific (cgroup, namespaces, AppArmor, netfilter) — just
misnamed. Extended with extra subsystem disables (KVM, WLAN, CFG80211,
INFINIBAND, PCMCIA, HAMRADIO, ISDN, ATM, INPUT_JOYSTICK, INPUT_TABLET, FPGA)
and CONFIG_LSM=lockdown,yama,apparmor.
- build-kernel.sh (x86) refactored to apply the shared fragment via a generic
apply_fragment function (two-pass for the TC stock config security dance),
killing ~50 lines of inline config duplication.
Note: rename detection shows build-kernel-arm64.sh as 'modified' because the
new file at that path is the mainline build, while the old RPi-flavoured
content lives in build-kernel-rpi.sh (which appears as a new file). The git
log for build-kernel-rpi.sh is empty; the RPi history is preserved at the
original path until this commit.
No actual kernel build runs in this commit — that's Phase 3 work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add kpartx for reliable loop partition mapping in Docker containers
- Fix piCore64 download URL (changed from .img.gz to .zip format)
- Fix piCore64 boot partition mount (initramfs on p1, not p2)
- Fix tar --wildcards for RPi firmware extraction
- Add MIT license (same as KubeSolo)
- Add kpartx and unzip to Docker builder image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security hardening: bind kubeconfig server to localhost, mount hardening
(noexec/nosuid/nodev on tmpfs), sysctl network hardening, kernel module
loading lock after boot, SHA256 checksum verification for downloads,
kernel AppArmor + Audit support, complain-mode AppArmor profiles for
containerd and kubelet, and security integration test.
ARM64 Raspberry Pi support: piCore64 base extraction, RPi kernel build
from raspberrypi/linux fork, RPi firmware fetch, SD card image with 4-
partition GPT and tryboot A/B mechanism, BootEnv Go interface abstracting
GRUB vs RPi boot environments, architecture-aware build scripts, QEMU
aarch64 dev VM and boot test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>