# ARM64 Generic Status (v0.3 in-progress)

End-of-Phase-3 snapshot of the generic ARM64 build track.

## What works

End-to-end boot through QEMU on an Odroid (aarch64 Ubuntu 22.04 build host):

1. `make kernel-arm64` produces a mainline 6.12.10 LTS kernel (44 MB Image, 868 modules)
2. `make rootfs-arm64` extracts piCore64 userland, replaces BusyBox with Ubuntu's static busybox-static, injects KubeSolo + Go agents + init scripts
3. `make disk-image-arm64` produces a UEFI-bootable 4 GB GPT image with GRUB A/B slots
4. `hack/dev-vm-arm64.sh --disk` boots the image:
   - UEFI firmware loads GRUB
   - GRUB loads kernel + initramfs
   - Custom init runs all 14 stages (early-mount, parse-cmdline, persistent-mount, kernel-modules, apparmor, sysctl, cloud-init, network, hostname, clock, containerd, security-lockdown, kubesolo)
   - Data partition mounts (ext4 on vda4)
   - Network configured (DHCP on virtio eth0)
   - KubeSolo starts; containerd boots successfully; CoreDNS + pause images register

## Known limitations of the current dev setup

These are debugging-environment issues, not production blockers:

### 1. QEMU TCG performance hits KubeSolo's image-import deadline

KubeSolo bundles its essential container images and imports them into containerd on first boot. Under QEMU TCG (software emulation on the Odroid's 1.8 GB / 6-core ARM64), the import takes longer than KubeSolo's internal deadline, so we see:

```
failed to import images: ... context deadline exceeded
shutdown requested before containerd was ready
```

On real ARM64 hardware (Graviton, Ampere, RPi 5, etc.) this import completes in seconds. KVM acceleration on the Odroid would also fix it, but the Odroid's vendor kernel (4.9.337-38) doesn't ship the KVM module — fixing that requires a host-kernel upgrade outside this project's scope.

### 2. Hardcoded `/dev/vda4` data partition path

Stage 20 currently expects `kubesolo.data=/dev/vda4` rather than `LABEL=KSOLODATA`.
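A device path passed this way is straightforward to pick out of `/proc/cmdline`. A minimal sketch, assuming only the parameter name and default from the text above (`parse_data_dev` is a hypothetical name; the actual stage-20 script may differ):

```shell
#!/bin/sh
# Hypothetical sketch of stage-20-style cmdline parsing: extract
# kubesolo.data=... from a kernel command line, defaulting to /dev/vda4
# when the parameter is absent.
parse_data_dev() {
  dev=/dev/vda4
  # Word-split the command line on whitespace; intentionally unquoted.
  for arg in $1; do
    case "$arg" in
      kubesolo.data=*) dev="${arg#kubesolo.data=}" ;;
    esac
  done
  printf '%s\n' "$dev"
}

if [ -r /proc/cmdline ]; then
  parse_data_dev "$(cat /proc/cmdline)"
fi
```

Plain `case` pattern matching keeps this free of external binaries, which matters given the BusyBox applet gaps described below.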
The LABEL= path is preferred (it works regardless of disk naming on different hosts), but resolving it depends on `blkid` and `findfs`:

- piCore64 ships them as dynamic util-linux binaries that crash in QEMU virt
- Ubuntu's `busybox-static` 1.30.1 doesn't include the applets

Production fix options (deferred to next phase):

- Build a more comprehensive static BusyBox (Alpine's, or upstream + custom config)
- Ship statically-linked `blkid` and `findfs` from util-linux
- Replace LABEL resolution with a sysfs walk that reads `/sys/class/block/*/holders` and `/dev/` device numbers

### 3. AppArmor profiles fail to load

`apparmor_parser` errors on the containerd and kubelet profiles, probably because the parser binary or libraries copied from the build host don't match the rootfs's libc layout. Boot proceeds without AppArmor enforcement. Same fix path as #2 (better static binaries).

### 4. piCore64 BusyBox swap is a build-host dependency

`inject-kubesolo.sh` replaces piCore's `/bin/busybox` with the build host's `/bin/busybox` (Ubuntu's busybox-static package). That binary must exist on the build host or in the builder Docker image. Documented; works in CI because the Dockerfile installs busybox-static.

A more reproducible approach (future work): ship a known-good ARM64 BusyBox binary as a tracked artifact rather than depending on the host package.

### 5. busybox-static 1.30.1 has its own bugs

Even after the swap, some applets misbehave inside QEMU:

- `modprobe` triggers a "stack smashing detected" abort (kernel modules still load via direct write to /sys/... in stage 30, so this isn't fatal)
- `tr` doesn't parse POSIX character classes like `[:space:]` — already worked around by using explicit `' \t\r\n'` in our scripts
- Missing applets: `blkid`, `findfs`, `--version`, etc.

These won't necessarily manifest on real hardware (different CPU, different glibc interaction), but they confirm that 1.30.1 isn't the right long-term BusyBox.
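On that theme, the LABEL resolution fix from #2 can be sketched without `blkid` or `findfs` at all, using a variant of the sysfs walk that reads each block device's superblock directly. This is a hypothetical sketch, not the shipped stage script: `read_ext4_label` and `find_by_label` are invented names, the offsets follow the ext4 on-disk layout (superblock at byte 1024, 16-byte `s_volume_name` field 0x78 bytes into it, i.e. byte 1144), and a production version would also verify the ext4 magic (0xEF53 at superblock offset 0x38) before trusting the label field:

```shell
#!/bin/sh
# Hypothetical sketch: resolve a data partition by ext4 volume label
# using only dd and tr, no blkid/findfs.

read_ext4_label() {
  # Ext4 superblock starts at byte 1024; s_volume_name sits at offset
  # 0x78 within it, so the 16-byte label begins at byte 1144.
  # Trailing NUL padding is stripped.
  dd if="$1" bs=1 skip=1144 count=16 2>/dev/null | tr -d '\0'
}

find_by_label() {
  for sysdev in /sys/class/block/*; do
    node=/dev/${sysdev##*/}
    [ -b "$node" ] || continue
    if [ "$(read_ext4_label "$node")" = "$1" ]; then
      printf '%s\n' "$node"
      return 0
    fi
  done
  return 1
}

find_by_label KSOLODATA || echo "no device labelled KSOLODATA found"
```

Reading the superblock needs read access to the raw device (fine for early init, which runs as root) and sidesteps both the dynamic util-linux crash and the missing BusyBox applets.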
## What's needed to ship v0.3 ARM64 as production-ready

In order of priority:

1. **Validate on real ARM64 hardware** — boot the image on a Graviton EC2 instance, Ampere VPS, RPi 5 (when hardware is available), or any UEFI-capable ARM64 board. Confirm full KubeSolo bring-up: node Ready, pods schedule.
2. **Fix LABEL=KSOLODATA resolution** — see the option list in #2 above.
3. **Replace busybox-static with a curated build** — see #4.
4. **Add a Gitea workflow** that runs `make kernel-arm64 disk-image-arm64` on the Odroid runner, plus the QEMU boot-test as a smoke test (with the expectation that KubeSolo doesn't finish first-boot under TCG).

## Files exercised by the Phase 3 work

| Path | Status |
|------|--------|
| `build/scripts/build-kernel-arm64.sh` | New — mainline 6.12.10 kernel build, native or cross |
| `build/scripts/build-kernel-rpi.sh` | Renamed from old `build-kernel-arm64.sh` — RPi path |
| `build/config/kernel-container.fragment` | Renamed from `rpi-kernel-config.fragment` |
| `build/scripts/create-disk-image.sh` | Refactored — accepts `TARGET_ARCH=arm64` |
| `build/grub/grub-arm64.cfg` | New — ARM64 console + `init=/sbin/init` |
| `build/scripts/inject-kubesolo.sh` | Updated — BusyBox swap, `/init` install, variant routing |
| `init/init.sh` | Updated — output to `/dev/console` for early-boot visibility |
| `init/lib/30-kernel-modules.sh` | Fixed — `tr -d ' \t\r\n'` instead of `[:space:]` |
| `init/lib/40-sysctl.sh` | Same fix |
| `hack/dev-vm-arm64.sh` | Updated — `-cpu max`, UEFI `--disk` mode |
| `test/qemu/test-boot-arm64-disk.sh` | New — CI test for UEFI boot |
| `Makefile` | New targets: `kernel-arm64`, `kernel-rpi`, `disk-image-arm64`, `test-boot-arm64-disk`, `rootfs-arm64-rpi` |
| `build/config/versions.env` | Pinned `MAINLINE_KERNEL_VERSION=6.12.10`, `KUBESOLO_VERSION=v1.1.0` |
| `build/Dockerfile.builder` | Added `grub-efi-amd64-bin`, `grub-efi-arm64-bin`, `busybox-static` |
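For reference, the `hack/dev-vm-arm64.sh --disk` boot mode exercised above amounts to roughly the following QEMU invocation. This is a sketch, not the script's actual contents: the firmware path, image path, memory size, and SMP count here are assumptions.

```
qemu-system-aarch64 \
  -machine virt -cpu max -smp 4 -m 2048 \
  -bios /usr/share/qemu-efi-aarch64/QEMU_EFI.fd \
  -drive if=virtio,format=raw,file=out/arm64-disk.img \
  -netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
  -nographic
```

`-cpu max` gives TCG the widest feature set (matching the `hack/dev-vm-arm64.sh` update noted in the table), and the virtio drive/NIC pair is what produces the `vda4` and `eth0` names the init stages currently expect.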