Remove [KSOLO-DBG] per-step echos from init.sh. The /dev/console redirect
stays — it's load-bearing for early-boot visibility on QEMU virt.
Add docs/arm64-status.md capturing the end-of-Phase-3 state:
- What works (full boot through 14 stages, KubeSolo + containerd start)
- Known limitations of the dev setup (QEMU TCG perf, /dev/vda4 hardcode,
busybox-static gaps)
- What's needed to ship v0.3 ARM64 as production-ready
Real-hardware validation (Graviton, Ampere, or similar) is the next gating
step before we can call ARM64 generic done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ARM64 Generic Status (v0.3 in-progress)
End-of-Phase-3 snapshot of the generic ARM64 build track.
What works
End-to-end boot through QEMU on an Odroid (aarch64 Ubuntu 22.04 build host):
- make kernel-arm64 produces a mainline 6.12.10 LTS kernel (44 MB Image, 868 modules)
- make rootfs-arm64 extracts piCore64 userland, replaces BusyBox with Ubuntu's static busybox-static, injects KubeSolo + Go agents + init scripts
- make disk-image-arm64 produces a UEFI-bootable 4 GB GPT image with GRUB A/B slots
- hack/dev-vm-arm64.sh --disk boots the image:
  - UEFI firmware loads GRUB
  - GRUB loads kernel + initramfs
  - Custom init runs all 14 stages (early-mount, parse-cmdline, persistent-mount, kernel-modules, apparmor, sysctl, cloud-init, network, hostname, clock, containerd, security-lockdown, kubesolo)
  - Data partition mounts (ext4 on vda4)
  - Network configured (DHCP on virtio eth0)
  - KubeSolo starts; containerd boots successfully; CoreDNS + pause images register
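The sequence above can be sketched as one wrapper function. The make targets and the dev-VM script come from the list; the function names and the DRY_RUN switch are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: run the documented build-and-boot steps in order.
# DRY_RUN is a hypothetical switch introduced for this example only.

run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of executing it.
        echo "+ $*"
    else
        "$@"
    fi
}

build_and_boot_arm64() {
    # Each step runs only if the previous one succeeded.
    run_step make kernel-arm64 &&
    run_step make rootfs-arm64 &&
    run_step make disk-image-arm64 &&
    run_step hack/dev-vm-arm64.sh --disk
}
```

Setting DRY_RUN=1 before calling build_and_boot_arm64 prints the four commands without running them, which is a cheap way to check the order on a host that lacks the toolchain.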
Known limitations of the current dev setup
These are debugging-environment issues, not production blockers:
1. QEMU TCG performance hits KubeSolo's image-import deadline
KubeSolo bundles its essential container images and imports them into containerd on first boot. Under QEMU TCG (software emulation on the Odroid's 1.8 GB / 6-core ARM64), the import takes longer than KubeSolo's internal deadline, so we see:
failed to import images: ... context deadline exceeded
shutdown requested before containerd was ready
On real ARM64 hardware (Graviton, Ampere, RPi 5, etc.) this import completes in seconds. KVM acceleration on the Odroid would also fix it, but the Odroid's vendor kernel (4.9.337-38) doesn't ship the KVM module — fixing that requires a host-kernel upgrade outside this project's scope.
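Whether a given host can avoid TCG can be checked before launching QEMU. The sketch below assumes the usual Linux convention that KVM is exposed as the character device /dev/kvm. The function name is hypothetical, and it is not known whether hack/dev-vm-arm64.sh already does something similar.

```shell
#!/bin/sh
# Print the QEMU accelerator to request: "kvm" when the KVM device node is
# usable by the current user, otherwise "tcg" (software emulation).
# The optional argument exists only so the check can be exercised in tests.
pick_qemu_accel() {
    kvm_dev="${1:-/dev/kvm}"
    if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        echo kvm
    else
        echo tcg
    fi
}
```

The result could be passed to QEMU's -accel option, for example `-accel "$(pick_qemu_accel)"`. On the Odroid described above it would print tcg, since the vendor kernel has no KVM module.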
2. Hardcoded /dev/vda4 data partition path
Stage 20 currently expects kubesolo.data=/dev/vda4 rather than
LABEL=KSOLODATA. The LABEL= path is preferred (works regardless of disk
naming on different hosts), but resolution depends on blkid and findfs,
which:
- piCore64 ships as dynamic util-linux binaries that crash in QEMU virt
- Ubuntu's busybox-static 1.30.1 doesn't include the applets
Production fix options (deferred to next phase):
- Build a more comprehensive static BusyBox (Alpine's, or upstream + custom config)
- Ship statically-linked blkid and findfs from util-linux
- Replace LABEL resolution with a sysfs walk that reads /sys/class/block/*/holders and /dev/<n> device numbers
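As a point of comparison for these options, a label lookup that needs neither blkid nor findfs can read the ext4 volume label straight from the superblock. This is a different technique from the sysfs walk listed above, not an implementation of it. It relies on the ext2/3/4 on-disk layout: the superblock starts at byte 1024 and the 16-byte s_volume_name field sits at offset 120 within it. The function names are hypothetical, the sketch does not verify the ext4 magic number, and it has not been tested against busybox-static 1.30.1.

```shell
#!/bin/sh
# Read the 16-byte volume label from an ext2/3/4 superblock.
# Byte 1144 = superblock start (1024) + s_volume_name offset (120).
ext4_label() {
    dd if="$1" bs=1 skip=1144 count=16 2>/dev/null | tr -d '\0'
}

# Resolve a kubesolo.data= value. Plain device paths pass through unchanged;
# LABEL=<name> is compared against the label of each candidate in turn.
# Usage: resolve_data_dev SPEC [CANDIDATE_DEVICE...]
resolve_data_dev() {
    spec="$1"
    shift
    case "$spec" in
        LABEL=*)
            want="${spec#LABEL=}"
            for dev in "$@"; do
                [ -r "$dev" ] || continue
                if [ "$(ext4_label "$dev")" = "$want" ]; then
                    echo "$dev"
                    return 0
                fi
            done
            # No candidate carried the requested label.
            return 1
            ;;
        *)
            echo "$spec"
            ;;
    esac
}
```

A caller might pass candidate globs such as `resolve_data_dev LABEL=KSOLODATA /dev/vd* /dev/sd* /dev/nvme*`; which globs are appropriate depends on the target hosts. A production version should also check the ext4 magic (0xEF53 at byte 1080) before trusting the label bytes.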
3. AppArmor profiles fail to load
apparmor_parser errors on the containerd and kubelet profiles, probably
because the parser binary or libraries copied from the build host don't
match the rootfs's libc layout. Boot proceeds without AppArmor enforcement.
Same fix path as #2 (better static binaries).
4. piCore64 BusyBox swap is a build-host dependency
inject-kubesolo.sh replaces piCore's /bin/busybox with the build host's
/bin/busybox (Ubuntu's busybox-static package). That binary must exist on
the build host or in the builder Docker image. Documented; works in CI
because the Dockerfile installs busybox-static.
A more reproducible approach (future work): ship a known-good ARM64 BusyBox binary as a tracked artifact rather than depending on the host package.
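One way a tracked artifact could be consumed is to pin it by checksum and refuse to install on mismatch. This is a minimal sketch, assuming sha256sum is available on the build host; the function names and the idea of storing the digest next to the binary are illustrative, not something inject-kubesolo.sh currently does.

```shell
#!/bin/sh
# Return 0 only if FILE's SHA-256 digest equals EXPECTED (lowercase hex).
verify_sha256() {
    file="$1"
    expected="$2"
    [ -n "$expected" ] || return 1
    actual="$(sha256sum "$file" 2>/dev/null | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# Copy a pinned BusyBox binary into the rootfs after verifying it.
# Usage: install_pinned_busybox SRC EXPECTED_SHA256 DEST
install_pinned_busybox() {
    src="$1"
    sum="$2"
    dest="$3"
    if ! verify_sha256 "$src" "$sum"; then
        echo "checksum mismatch: $src" >&2
        return 1
    fi
    cp "$src" "$dest" && chmod 0755 "$dest"
}
```

With this shape the build fails loudly if the artifact changes, instead of silently picking up whatever busybox-static the host package manager installed.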
5. busybox-static 1.30.1 has its own bugs
Even after the swap, some applets misbehave inside QEMU:
- modprobe triggers a "stack smashing detected" abort (kernel modules still load via direct write to /sys/... in stage 30, so this isn't fatal)
- tr doesn't parse POSIX character classes like [:space:]; already worked around by using explicit ' \t\r\n' in our scripts
- Missing applets: blkid, findfs, --version, etc.
These won't necessarily manifest on real hardware (different CPU, different glibc interaction) but they confirm that 1.30.1 isn't the right long-term BusyBox.
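The tr workaround and a check for missing applets can be sketched as two small helpers. The explicit character list is the one quoted above. The applet check relies on `busybox --list`, which prints one applet name per line. Both function names are hypothetical.

```shell
#!/bin/sh
# Delete whitespace using an explicit character list instead of [:space:].
strip_ws() {
    tr -d ' \t\r\n'
}

# Print each required applet that is absent from a newline-separated list.
# Usage: missing_applets "$(busybox --list)" blkid findfs
missing_applets() {
    have="$1"
    shift
    for applet in "$@"; do
        # -x matches whole lines, -F treats the name as a literal string.
        printf '%s\n' "$have" | grep -qxF -- "$applet" || echo "$applet"
    done
}
```

Running the applet check at rootfs build time would surface gaps like blkid and findfs before the image ever boots.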
What's needed to ship v0.3 ARM64 as production-ready
In order of priority:
- Validate on real ARM64 hardware — boot the image on a Graviton EC2 instance, Ampere VPS, RPi 5 (when hardware available), or any UEFI-capable ARM64 board. Confirm full KubeSolo bring-up: node Ready, pods schedule.
- Fix LABEL=KSOLODATA resolution — see option list in #2 above.
- Replace busybox-static with a curated build — see #4.
- Add a Gitea workflow that runs make kernel-arm64 + disk-image-arm64 on the Odroid runner and the QEMU boot-test as a smoke test (with the expectation that KubeSolo doesn't finish first-boot under TCG).
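For the smoke test in the last item, the pass condition has to tolerate the known TCG import timeout. A minimal sketch of that logic follows, assuming the boot test captures the serial console to a file. The function name is hypothetical, and the marker string proving that init reached the kubesolo stage is left as a parameter because the exact text init.sh prints is not stated here.

```shell
#!/bin/sh
# Classify a captured console log from the QEMU boot test.
# $1 = log file
# $2 = marker string proving init reached the kubesolo stage
#      (hypothetical: the real marker depends on what init.sh prints)
boot_log_verdict() {
    log="$1"
    marker="$2"
    if ! grep -qF -- "$marker" "$log"; then
        echo "fail: init never reached kubesolo"
        return 1
    fi
    if grep -qF 'context deadline exceeded' "$log"; then
        # Known TCG limitation: image import outran KubeSolo's deadline.
        echo "pass: expected TCG import timeout"
    else
        echo "pass: no import timeout seen"
    fi
    return 0
}
```

Treating the timeout as a pass keeps the workflow green on the Odroid runner while still failing if the boot regresses before KubeSolo starts.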
Files exercised by the Phase 3 work
| Path | Status |
|---|---|
| build/scripts/build-kernel-arm64.sh | New: mainline 6.12.10 kernel build, native or cross |
| build/scripts/build-kernel-rpi.sh | Renamed from old build-kernel-arm64.sh (RPi path) |
| build/config/kernel-container.fragment | Renamed from rpi-kernel-config.fragment |
| build/scripts/create-disk-image.sh | Refactored: accepts TARGET_ARCH=arm64 |
| build/grub/grub-arm64.cfg | New: ARM64 console + init=/sbin/init |
| build/scripts/inject-kubesolo.sh | Updated: BusyBox swap, /init install, variant routing |
| init/init.sh | Updated: output to /dev/console for early-boot visibility |
| init/lib/30-kernel-modules.sh | Fixed: tr -d ' \t\r\n' instead of [:space:] |
| init/lib/40-sysctl.sh | Same fix |
| hack/dev-vm-arm64.sh | Updated: -cpu max, UEFI --disk mode |
| test/qemu/test-boot-arm64-disk.sh | New: CI test for UEFI boot |
| Makefile | New targets: kernel-arm64, kernel-rpi, disk-image-arm64, test-boot-arm64-disk, rootfs-arm64-rpi |
| build/config/versions.env | Pinned MAINLINE_KERNEL_VERSION=6.12.10, KUBESOLO_VERSION=v1.1.0 |
| build/Dockerfile.builder | Added grub-efi-amd64-bin, grub-efi-arm64-bin, busybox-static |