kubesolo-os

Author	SHA1	Message	Date
Adolfo Delorenzo	51c1f78aea	fix(arm64): bundle nft binary + always show access banner Two real v0.3.0 bugs that surface on first-boot: 1. KubeSolo v1.1.4+ owns its pod-masquerade rules directly via nft add table ip kubesolo-masq instead of going through kube-proxy/CNI. Without the standalone nft CLI in PATH, KubeSolo FATALs at startup with: "nft": executable file not found in $PATH then the init exits and the kernel panics on PID 1 death. inject-kubesolo.sh now also copies /usr/sbin/nft and its non-shared libraries (libnftables, libedit, libjansson, libgmp, libtinfo, libbsd, libmd). The iptables-nft block above already covered libmnl, libnftnl, libxtables, libc, ld. 2. The host-access banner ("From your host machine, run: curl -s http://localhost:8080 ...") was gated on the kubeconfig appearing within 120s. When KubeSolo crashed early (bug 1 above) or simply took longer than the wait window, the user never saw the connection instructions. 90-kubesolo.sh now: - writes the banner to /etc/motd so it shows on any later shell (SSH ext, emergency shell, console login) - prints the banner to console unconditionally, after the wait loop, regardless of whether the kubeconfig was found Both fixes are pure rootfs changes — no kernel rebuild required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 07:16:12 -06:00
Adolfo Delorenzo	28de656b97	feat(update): OCI registry distribution for update artifacts Phase 7 of v0.3. The update agent can now pull update artifacts from any OCI-compliant registry (ghcr.io, quay.io, harbor, zot, etc.) alongside the existing HTTP latest.json protocol. Multi-arch artifacts are resolved through manifest indexes so the same tag (e.g. "stable") yields the right kernel + initramfs for runtime.GOARCH. New package update/pkg/oci (~280 LOC, 9 tests): - Client wraps oras-go/v2's remote.Repository. NewClient parses host/path references; WithPlainHTTP toggle for httptest. - FetchMetadata resolves a tag and returns image.UpdateMetadata from manifest annotations (io.kubesolo.os.{version,channel,architecture, min_compatible_version,release_notes,release_date}). No blobs fetched. - Pull resolves the tag, walks index → arch-specific manifest, downloads kernel + initramfs layers identified by their custom media types (application/vnd.kubesolo.os.kernel.v1+octet-stream and application/vnd.kubesolo.os.initramfs.v1+gzip), verifies their digests against the manifest, returns the same image.StagedImage shape the HTTP client produces. - Cross-arch single-arch manifests are refused via the AnnotArch check (defense in depth on top of the gates in cmd/apply.go). - Tests use a hand-rolled httptest registry implementing /v2/probe, manifest fetch by tag-or-digest, blob fetch by digest. Cover index arch-selection, single-arch manifests, missing-arch error, tampered blob rejection (digest mismatch), and reference parsing.
Dependencies added: oras.land/oras-go/v2 v2.6.0 plus its transitive opencontainers/{go-digest,image-spec} and golang.org/x/sync. All small and well-maintained; total binary size impact is negligible relative to the existing 6.1 MB update agent. cmd/apply.go: - New --registry and --tag flags; mutually exclusive with --server. - applyMetadataGates extracted as a helper, called from both transports so channel/arch/min-version policy is enforced identically regardless of how metadata was fetched. - State transitions identical to the HTTP path: Checking → Downloading → Staged, with RecordError on any failure. cmd/opts.go: --registry, --tag CLI flags. update.conf "server=" already accepts either an HTTP URL or an OCI ref; the agent distinguishes by which CLI/conf field carries the value. build/scripts/push-oci-artifact.sh: new tool that publishes a single-arch update artifact via the oras CLI with our custom media types and annotations. After running for each arch, the operator composes the multi-arch index with `oras manifest index create`. Documented inline. build/Dockerfile.builder: installs oras 1.2.3 from upstream releases so the Gitea Actions build container can run the new script. Signature verification on the OCI path is intentionally deferred — the artifact format is digest-verified end-to-end via oras-go, and Ed25519 signature consumption via OCI referrers is a follow-up. Plain HTTP clients keep their existing signature path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:58:38 -06:00
Adolfo Delorenzo	1b44c9d621	feat: bump KubeSolo to v1.1.5 + cross-arch CI workflow Phase 4 of v0.3 — KubeSolo version bump and CI gating. KubeSolo v1.1.0 → v1.1.5 brings: - New flag --disable-ipv6 (v1.1.5) - New flag --db-wal-repair (v1.1.5) — important for power-loss resilience on edge appliances; surfaced as kubesolo.db-wal-repair in cloud-init - New flag --full (v1.1.4) — disables edge-optimised k8s overrides - Pod egress connectivity fix after reboot (v1.1.4) - Registry config persistence fix (v1.1.5) - k8s 1.34.7, CoreDNS 1.14.3, Go 1.26.2 All three new flags wired into cloud-init: config.go fields, kubesolo.go extra-flag emission, full-config.yaml example. Supply-chain hygiene: - Per-arch checksums: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in versions.env. Replaces the single shared KUBESOLO_SHA256 that couldn't meaningfully verify both binaries at once. - Checksum now applied to the tarball (the immutable upstream artifact) rather than the post-extract binary. CI: - New .gitea/workflows/build-arm64.yaml routes the full kernel + rootfs + disk-image build to the Odroid arm64-linux runner. Triggers on push to main, tags, and manual workflow_dispatch. The boot smoke test is continue-on-error because KubeSolo's first-boot image import deadline fires under QEMU TCG on the Odroid. VERSION bumped to 0.3.0-dev. CHANGELOG entry under [0.3.0-dev] captures all Phase 1-4 work + the known limitations documented in arm64-status.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:26:20 -06:00
Adolfo Delorenzo	1de36289a5	fix(arm64): tr -d '[:space:]' is parsed as literal char-set by busybox 1.30.1 Ubuntu's busybox-static 1.30.1 (which we use for the ARM64 rootfs after piCore64's BusyBox crashes in QEMU virt) doesn't recognize POSIX character classes. `tr -d '[:space:]'` is interpreted as "delete any of the literal characters [, :, s, p, a, c, e, ]" — so every s/p/a/c/e in module names and sysctl keys gets eaten. Symptoms in the boot log: virtio_net -> virtio_nt (e dropped) overlay -> ovrly (e, a dropped) bridge -> bridg (e dropped) nf_conntrack -> nf_onntrk (c, a, c dropped) net.bridge.bridge-nf-call-iptables -> nt.bridg.bridg-nf-ll-itbl Fix: use explicit whitespace chars `tr -d ' \t\r\n'` in both 30-kernel-modules.sh and 40-sysctl.sh. Works under any tr implementation. Also: filter functions.sh out of the init.d stage-copy loop. It's a shared library (sourced by init.sh), not a numbered stage. With it in init.d the main loop runs it as a stage after stage 90, then panics with "Init completed without exec'ing KubeSolo". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:02:21 -06:00
Adolfo Delorenzo	31aac701db	debug(arm64): use /dev/vda4 directly instead of LABEL=KSOLODATA piCore64's blkid/findfs binaries (separate util-linux dynamics, NOT busybox symlinks) crash in QEMU virt with the same instruction-abort issue as the broken BusyBox. The host's static busybox doesn't include blkid/findfs applets either, so stage 20-persistent-mount.sh segfaults in a loop trying to resolve LABEL=KSOLODATA. Short-term: hardcode /dev/vda4 (the virtio data partition under QEMU) so the boot can progress past stage 20 and we can see what else needs fixing. Pre-v0.3 release we need to either: a) ship a real blkid/findfs binary that works (util-linux from upstream, statically built), or b) avoid LABEL= entirely and detect the data partition by walking /sys/class/block looking for our ext4 magic+label. Either way the LABEL= path needs to work on real ARM64 hosts where the device path varies (vda/sda/nvme0n1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:47:55 -06:00
Adolfo Delorenzo	06e12a79bd	fix(arm64): override piCore64's BusyBox with host's static busybox piCore64 v15.0.0 ships BusyBox built with ARM instructions that QEMU virt cannot emulate even under -cpu max — applets like mkdir, uname, readlink SIGILL on first invocation (el0_undef in the panic trace). mount works because piCore's busybox.suid happens to use a different code path. Fix: when building the arm64 rootfs, replace piCore's bin/busybox and bin/busybox.suid with /bin/busybox from the build host (Ubuntu's busybox-static, statically linked, built for generic ARMv8-A). Also add busybox-static to Dockerfile.builder so the Docker-based build flow has the same fallback available. Long-term: source a known-good ARM64 BusyBox build (Alpine, or our own from upstream BusyBox) so we don't depend on the build host's package manager. Tracked as future work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:38:05 -06:00
Adolfo Delorenzo	5cf81049f6	fix: install our staged init at /init too, not just /sbin/init The kernel ALWAYS runs /init when booting from an initramfs. If /init doesn't exist, the kernel falls back to the legacy root-mount path (looking for a real root partition via root= cmdline), which we don't want — our system IS the initramfs. Previous fix removed piCore's /init to stop it from being run; that caused the kernel to skip the initramfs entrypoint entirely and panic with 'Cannot open root device' (error -6). Correct fix: replace piCore's /init with a copy of our init.sh. The kernel runs /init -> our staged boot, which is exactly what we want. Keep /sbin/init as well (some boot paths exec it directly, e.g. via init= cmdline override) and the existing init=/sbin/init in grub-arm64.cfg as a belt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:01:20 -06:00
Adolfo Delorenzo	863f498cc2	fix: kernel must use /sbin/init, not piCore's /init Root cause of the 'Run /init as init process' -> immediate SIGSEGV panic on the generic ARM64 boot: piCore64's rootfs ships a /init script at the rootfs root, and the kernel's init search order picks /init over /sbin/init. piCore's init then exec's something incompatible with our environment and segfaults. Two fixes: 1. inject-kubesolo.sh now removes the upstream /init after replacing /sbin/init. This is the structural fix — the rootfs no longer has the conflicting entry-point. 2. grub-arm64.cfg passes init=/sbin/init explicitly. Belt-and-suspenders in case any future rootfs source re-introduces /init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 14:43:35 -06:00
Adolfo Delorenzo	05ab108de1	fix(grub): put ttyAMA0 last so it's the primary console on ARM64 Kernel takes the last `console=` argument as primary (where init's stdout/stderr land). The previous order had ttyS0 last, which is a dead device on QEMU virt and most ARM64 SBCs — so init output disappeared and we only saw kernel panic messages (which use earlycon, bypassing the console preference). Also drop `quiet` from the default boot entry while we stabilise — we need the kernel + init output visible right now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 14:11:58 -06:00
Adolfo Delorenzo	c20f5a2e8c	fix(build): detect native ARM64 host and skip cross-compiler requirement build-kernel-arm64.sh and build-kernel-rpi.sh both insisted on aarch64-linux-gnu-gcc (the cross-compiler from x86), which fails on a native ARM64 build host like the Odroid runner. Detect uname -m and use the host's gcc with an empty CROSS_COMPILE on aarch64 hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 10:56:39 -06:00
Adolfo Delorenzo	80aca5e372	feat: ARM64 generic UEFI disk image (GPT + GRUB A/B) Produces a UEFI-bootable raw disk image for generic ARM64 hosts (QEMU virt, Ampere/Graviton cloud, ARM64 SBCs with UEFI). Reuses the existing 4-partition A/B layout from x86 (EFI 256 MB FAT32 + System A 512 MB ext4 + System B 512 MB ext4 + Data ext4 remainder). Changes: - build/scripts/create-disk-image.sh: TARGET_ARCH env var (amd64 default, arm64). Selects kernel source path, grub-mkimage target (x86_64-efi vs arm64-efi), EFI binary name (bootx64.efi vs BOOTAA64.EFI), grub.cfg variant, and whether to also install BIOS GRUB (x86 only). - build/grub/grub-arm64.cfg: ARM64 variant of grub.cfg. Identical A/B logic; console=ttyAMA0+ttyS0 to cover QEMU virt PL011, Ampere PL011, and Graviton 16550-compat. - build/Dockerfile.builder: add grub-efi-amd64-bin, grub-efi-arm64-bin, grub-pc-bin, grub-common, grub2-common so the builder container can produce EFI images for both architectures. - hack/dev-vm-arm64.sh: split into kernel mode (direct -kernel/-initrd, fast iteration) and --disk mode (UEFI firmware + GRUB + disk image, full integration test). Probes common UEFI firmware paths on Ubuntu/Fedora/macOS. Default kernel path now points at kernel-arm64-generic/Image with fallback to the renamed custom-kernel-rpi/Image. - test/qemu/test-boot-arm64-disk.sh: new CI test for the full UEFI -> GRUB -> kernel -> stage-90 boot chain. Uses a scratch copy of the disk so grubenv writes don't mutate the source artifact. - Makefile: new disk-image-arm64 target (depends on rootfs-arm64 + kernel-arm64), new test-boot-arm64-disk target, .PHONY + help updates. Phase 3 scaffold is in place.
First real end-to-end ARM64 build runs in the next step on the Odroid runner — that's where we find out what's actually broken. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 10:36:08 -06:00
Adolfo Delorenzo	d51618badb	build: separate generic ARM64 from Raspberry Pi kernel builds Splits the ARM64 build into two tracks per docs/arm64-architecture.md: Generic ARM64 (mainline kernel.org, UEFI, virtio, GRUB): - New build/scripts/build-kernel-arm64.sh builds mainline LTS (6.12.x by default) from arm64 defconfig + shared container fragment + arm64-virt enables (VIRTIO_*, EFI_STUB, NVMe). Output: build/cache/kernel-arm64-generic/. - New Makefile targets: kernel-arm64, rootfs-arm64 (now consumes the mainline kernel modules via TARGET_VARIANT=generic). - versions.env: pin MAINLINE_KERNEL_VERSION=6.12.10, declare cdn.kernel.org URL and SHA256 placeholder. Raspberry Pi (raspberrypi/linux fork, custom DTBs, autoboot.txt): - build-kernel-arm64.sh (RPi-flavoured) renamed to build-kernel-rpi.sh; cache dir renamed from custom-kernel-arm64 to custom-kernel-rpi. - New Makefile targets: kernel-rpi, rootfs-arm64-rpi (uses TARGET_VARIANT=rpi). - rpi-image now depends on rootfs-arm64-rpi + kernel-rpi instead of the generic rootfs-arm64. - create-rpi-image.sh + inject-kubesolo.sh updated to reference the new cache path. inject-kubesolo.sh now takes a TARGET_VARIANT env var (rpi|generic) to select which ARM64 kernel modules to consume. Shared substrate: - rpi-kernel-config.fragment renamed to kernel-container.fragment. The contents were never RPi-specific (cgroup, namespaces, AppArmor, netfilter) — just misnamed. Extended with extra subsystem disables (KVM, WLAN, CFG80211, INFINIBAND, PCMCIA, HAMRADIO, ISDN, ATM, INPUT_JOYSTICK, INPUT_TABLET, FPGA) and CONFIG_LSM=lockdown,yama,apparmor. - build-kernel.sh (x86) refactored to apply the shared fragment via a generic apply_fragment function (two-pass for the TC stock config security dance), killing ~50 lines of inline config duplication.
Note: rename detection shows build-kernel-arm64.sh as 'modified' because the new file at that path is the mainline build, while the old RPi-flavoured content lives in build-kernel-rpi.sh (which appears as a new file). The git log for build-kernel-rpi.sh is empty; the RPi history is preserved at the original path until this commit. No actual kernel build runs in this commit — that's Phase 3 work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 10:30:11 -06:00
Adolfo Delorenzo	059ec7955f	chore: housekeeping for v0.3 prep - Pin KUBESOLO_VERSION in versions.env (was soft-defaulted in fetch-components.sh) - Gitignore screenshots, macOS resource forks, and common image extensions - Update README roadmap: x86_64 stable, ARM64 generic in progress (v0.3), ARM64 RPi paused pending hardware - Add docs/ci-runners.md documenting the Odroid arm64-linux Gitea runner Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 09:44:01 -06:00
Adolfo Delorenzo	a6c5d56ade	rpi: drop to interactive shell on boot failure, add initcall_debug Instead of returning 1 (which triggers kernel panic via set -e before emergency_shell runs), exec an interactive shell on /dev/console so the user can run dmesg and debug interactively. Add initcall_debug and loglevel=7 to cmdline.txt to show every driver probe during boot. Also dump last 60 lines of dmesg before dropping to shell. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 20:50:20 -06:00
Adolfo Delorenzo	6c6940afac	rpi: add boot diagnostics and remove quiet for debugging Remove 'quiet' from RPi cmdline.txt so kernel probe messages are visible on HDMI. Add comprehensive diagnostics to the data device error path: dmesg for MMC/SDHCI/regulators/firmware, /sys/class/block listing, and error message scanning. This will reveal why zero block devices appear despite all kernel configs being correct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 20:12:26 -06:00
Adolfo Delorenzo	4e3f1d6cf0	fix: use kernel-built DTBs for RPi SD card driver probe The sdhci-iproc driver (RPi 4 SD card controller) probes via Device Tree matching. Using DTBs from the firmware repo instead of the kernel build caused a mismatch — the driver silently failed to probe, resulting in zero block devices after boot. Changes: - Use DTBs from custom-kernel-arm64/dtbs/ (matches the kernel) - Firmware blobs (start4.elf, fixup4.dat) still from firmware repo - Also includes prior fix for LABEL= resolution in persistent mount Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 19:27:54 -06:00
Adolfo Delorenzo	a2764218fc	fix: make RPi partition 1 self-sufficient boot fallback The autoboot.txt A/B redirect requires newer RPi EEPROM firmware. On older EEPROMs, autoboot.txt is silently ignored and the firmware tries to boot from partition 1 directly — failing with a rainbow screen because partition 1 had no kernel or initramfs. Changes: - Increase partition 1 from 32 MB to 384 MB - Populate partition 1 with full boot files (kernel, initramfs, config.txt with kernel= directive, DTBs, overlays) - Keep autoboot.txt for A/B redirect on supported EEPROMs - When autoboot.txt works: boots from partition 2 (A/B scheme) - When autoboot.txt is unsupported: boots from partition 1 (fallback) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 18:52:21 -06:00
Adolfo Delorenzo	2ba816bf6e	fix: add config.txt and DTBs to RPi boot control partition The Raspberry Pi firmware reads config.txt from partition 1 BEFORE processing autoboot.txt. Without arm_64bit=1 on the boot control partition, the firmware defaults to 32-bit mode and shows only a rainbow square. Add minimal config.txt, device tree blobs, and overlays to partition 1 so the firmware can initialize correctly before redirecting to the A/B boot partitions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 18:29:28 -06:00
Adolfo Delorenzo	65dcddb47e	fix: RPi image uses MBR and firmware on boot partition - Switch from GPT to MBR (dos) partition table — GPT + autoboot.txt fails on many Pi 4 EEPROM versions - Copy firmware blobs (start.elf, fixup.dat) to partition 1 (KSOLOCTL) so the EEPROM can find and load them - Increase boot control partition from 16 MB to 32 MB to fit firmware - Mark partition 1 as bootable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 18:16:34 -06:00
Adolfo Delorenzo	ba4812f637	fix: complete ARM64 RPi build pipeline - fetch-components.sh: download ARM64 KubeSolo binary (kubesolo-arm64) - inject-kubesolo.sh: use arch-specific binaries for KubeSolo, cloud-init, and update agent; detect KVER from custom kernel when rootfs has none; cross-arch module resolution via find fallback when modprobe fails - create-rpi-image.sh: kpartx support for Docker container builds - Makefile: rootfs-arm64 depends on build-cross, includes pack-initramfs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 17:20:04 -06:00
Adolfo Delorenzo	09dcea84ef	fix: disk image build, piCore64 URL, license - Add kpartx for reliable loop partition mapping in Docker containers - Fix piCore64 download URL (changed from .img.gz to .zip format) - Fix piCore64 boot partition mount (initramfs on p1, not p2) - Fix tar --wildcards for RPi firmware extraction - Add MIT license (same as KubeSolo) - Add kpartx and unzip to Docker builder image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 17:05:03 -06:00
Adolfo Delorenzo	6c15ba7776	fix: kernel AppArmor 2-pass olddefconfig and QEMU test direct kernel boot The stock TinyCore kernel config has "# CONFIG_SECURITY is not set" which caused make olddefconfig to silently revert all security configs in a single pass. Fix by applying security configs (AppArmor, Audit, LSM) after the first olddefconfig resolves base dependencies, then running a second pass. Added mandatory verification that exits on missing critical configs. All QEMU test scripts converted from broken -cdrom + -append pattern to direct kernel boot (-kernel + -initrd) via shared test/lib/qemu-helpers.sh helper library. The -append flag only works with -kernel, not -cdrom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 14:11:38 -06:00
Adolfo Delorenzo	958524e6d8	fix: Go version, test scripts, and shellcheck warnings from validation - Dockerfile.builder: Go 1.24.0 → 1.25.5 (go.mod requires it) - test-boot.sh: use direct kernel boot via ISO extraction instead of broken -cdrom + -append; fix boot marker to "KubeSolo is running" (Stage 90 blocks on wait, never emits "complete") - test-security-hardening.sh: same direct kernel boot and marker fixes - run-vm.sh, dev-vm.sh, dev-vm-arm64.sh: quote QEMU -net args to silence shellcheck SC2054 - fetch-components.sh, fetch-rpi-firmware.sh, dev-vm-arm64.sh: fix trap quoting (SC2064) Validated: full Docker build, 94 Go tests pass, QEMU boot (73s), security hardening test (6/6 pass, 1 AppArmor skip pending kernel rebuild). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 13:30:55 -06:00
Adolfo Delorenzo	efc7f80b65	feat: add security hardening, AppArmor, and ARM64 Raspberry Pi support (Phase 6) Security hardening: bind kubeconfig server to localhost, mount hardening (noexec/nosuid/nodev on tmpfs), sysctl network hardening, kernel module loading lock after boot, SHA256 checksum verification for downloads, kernel AppArmor + Audit support, complain-mode AppArmor profiles for containerd and kubelet, and security integration test. ARM64 Raspberry Pi support: piCore64 base extraction, RPi kernel build from raspberrypi/linux fork, RPi firmware fetch, SD card image with 4-partition GPT and tryboot A/B mechanism, BootEnv Go interface abstracting GRUB vs RPi boot environments, architecture-aware build scripts, QEMU aarch64 dev VM and boot test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 13:08:17 -06:00
Adolfo Delorenzo	d9ac58418d	fix: macOS dev VM, CA certs, DNS fallback, Portainer Edge integration - dev-vm.sh: rewrite for macOS (bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding) - inject-kubesolo.sh: add CA certificates bundle from builder so containerd can verify TLS when pulling from registries (Docker Hub, etc.) - 50-network.sh: add DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate /etc/resolv.conf - 90-kubesolo.sh: serve kubeconfig via HTTP on port 8080 for reliable retrieval from host, add 127.0.0.1 and 10.0.2.15 to API server SANs - portainer.go: add headless Service to Edge Agent manifest (required for agent peer discovery DNS lookup) - 10-parse-cmdline.sh + init.sh: add kubesolo.edge_id/edge_key boot params - 20-persistent-mount.sh: auto-format unformatted data disks on first boot - hack/fix-portainer-service.sh: helper to patch running cluster Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 02:11:31 -06:00
Adolfo Delorenzo	39732488ef	feat: custom kernel build + boot fixes for working container runtime Build a custom Tiny Core 17.0 kernel (6.18.2) with missing configs that the stock kernel lacks for container workloads: - CONFIG_CGROUP_BPF=y (cgroup v2 device control via BPF) - CONFIG_DEVTMPFS=y (auto-create /dev device nodes) - CONFIG_DEVTMPFS_MOUNT=y (auto-mount devtmpfs) - CONFIG_MEMCG=y (memory cgroup controller for memory.max) - CONFIG_CFS_BANDWIDTH=y (CPU bandwidth throttling for cpu.max) Also strips unnecessary subsystems (sound, GPU, wireless, Bluetooth, KVM, etc.) for minimal footprint on a headless K8s edge appliance. Init system fixes for successful boot-to-running-pods: - Add switch_root in init.sh to escape initramfs (runc pivot_root) - Add mountpoint guards in 00-early-mount.sh (skip if already mounted) - Create essential device nodes after switch_root (kmsg, console, etc.) - Enable cgroup v2 controller delegation with init process isolation - Mount BPF filesystem for cgroup v2 device control - Add mknod fallback from sysfs in 20-persistent-mount.sh for /dev/vda - Move KubeSolo binary to /usr/bin (avoid /usr/local bind mount hiding) - Generate /etc/machine-id in 60-hostname.sh (kubelet requires it) - Pre-initialize iptables tables before kube-proxy starts - Add nft_reject, nft_fib, xt_nfacct to kernel modules list Build system changes: - New build-kernel.sh script for custom kernel compilation - Dockerfile.builder adds kernel build deps (flex, bison, libelf, etc.) - Selective kernel module install (only modules.list + transitive deps) - Install iptables-nft (xtables-nft-multi) + shared libs in rootfs Tested: ISO boots in QEMU, node reaches Ready in ~35s, CoreDNS and local-path-provisioner pods start and run successfully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 23:13:31 -06:00
Adolfo Delorenzo	456aa8eb5b	feat: add distribution and fleet management — CI/CD, OCI, metrics, ARM64 (Phase 5) - Gitea Actions CI pipeline: Go tests, build, shellcheck on push/PR - Gitea Actions release pipeline: full build + artifact upload on version tags - OCI container image builder for registry-based OS distribution - Zero-dependency Prometheus metrics endpoint (kubesolo_os_info, boot, memory, update status) with 10 tests - USB provisioning tool for air-gapped deployments with cloud-init injection - ARM64 cross-compilation support (TARGET_ARCH env var + build-cross.sh) - Updated build scripts to accept TARGET_ARCH for both amd64 and arm64 - New Makefile targets: oci-image, build-cross Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 11:36:53 -06:00
Adolfo Delorenzo	8d25e1890e	feat: add A/B partition updates with GRUB and Go update agent (Phase 3) Implement atomic OS updates via A/B partition scheme with automatic rollback. GRUB bootloader manages slot selection with a 3-attempt boot counter that auto-rolls back on repeated health check failures. GRUB boot config: - A/B slot selection with boot_counter/boot_success env vars - Automatic rollback when counter reaches 0 (3 failed boots) - Debug, emergency shell, and manual slot-switch menu entries Disk image (refactored): - 4-partition GPT layout: EFI + System A + System B + Data - GRUB EFI/BIOS installation with graceful fallbacks - Both system partitions populated during image creation Update agent (Go, zero external deps): - pkg/grubenv: read/write GRUB env vars (grub-editenv + manual fallback) - pkg/partition: find/mount/write system partitions by label - pkg/image: HTTP download with SHA256 verification - pkg/health: post-boot checks (containerd, API server, node Ready) - 6 CLI commands: check, apply, activate, rollback, healthcheck, status - 37 unit tests across all 4 packages Deployment: - K8s CronJob for automatic update checks (every 6 hours) - ConfigMap for update server URL - Health check Job for post-boot verification Build pipeline: - build-update-agent.sh compiles static Linux binary (~5.9 MB) - inject-kubesolo.sh includes update agent in initramfs - Makefile: build-update-agent, test-update-agent, test-update targets Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 11:12:46 -06:00
Adolfo Delorenzo	d900fa920e	feat: add cloud-init Go parser (Phase 2) Implement a lightweight cloud-init system for first-boot configuration: - Go parser for YAML config (hostname, network, KubeSolo settings) - Static/DHCP network modes with DNS override - KubeSolo extra flags and API server SAN configuration - Portainer Edge Agent and air-gapped deployment support - New init stage 45-cloud-init.sh runs before network/hostname stages - Stages 50/60 skip gracefully when cloud-init has already applied - Build script compiles static Linux/amd64 binary (~2.7 MB) - 17 unit tests covering parsing, validation, and example files - Full documentation at docs/cloud-init.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 10:39:05 -06:00
Adolfo Delorenzo	e372df578b	feat: initial Phase 1 PoC scaffolding for KubeSolo OS Complete Phase 1 implementation of KubeSolo OS — an immutable, bootable Linux distribution built on Tiny Core Linux for running KubeSolo single-node Kubernetes. Build system: - Makefile with fetch, rootfs, initramfs, iso, disk-image targets - Dockerfile.builder for reproducible builds - Scripts to download Tiny Core, extract rootfs, inject KubeSolo, pack initramfs, and create bootable ISO/disk images Init system (10 POSIX sh stages): - Early mount (proc/sys/dev/cgroup2), cmdline parsing, persistent mount with bind-mounts, kernel module loading, sysctl, DHCP networking, hostname, clock sync, containerd prep, KubeSolo exec Shared libraries: - functions.sh (device wait, IP lookup, config helpers) - network.sh (static IP, config persistence, interface detection) - health.sh (containerd, API server, node readiness checks) - Emergency shell for boot failure debugging Testing: - QEMU boot test with serial log marker detection - K8s readiness test with kubectl verification - Persistence test (reboot + verify state survives) - Workload deployment test (nginx pod) - Local storage test (PVC + local-path provisioner) - Network policy test - Reusable run-vm.sh launcher Developer tools: - dev-vm.sh (interactive QEMU with port forwarding) - rebuild-initramfs.sh (fast iteration) - inject-ssh.sh (dropbear SSH for debugging) - extract-kernel-config.sh + kernel-audit.sh Documentation: - Full design document with architecture research - Boot flow documentation covering all 10 init stages - Cloud-init examples (DHCP, static IP, Portainer Edge, air-gapped) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 10:18:42 -06:00

30 Commits