5 Commits

Author SHA1 Message Date
31aac701db debug(arm64): use /dev/vda4 directly instead of LABEL=KSOLODATA
Some checks failed
CI / Go Tests (push) Successful in 1m28s
CI / Shellcheck (push) Failing after 46s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Failing after 1m18s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Failing after 1m15s
piCore64's blkid/findfs binaries (separate util-linux dynamics, NOT busybox
symlinks) crash in QEMU virt with the same instruction-abort issue as the
broken BusyBox. The host's static busybox doesn't include blkid/findfs
applets either, so stage 20-persistent-mount.sh segfaults in a loop trying
to resolve LABEL=KSOLODATA.

Short-term: hardcode /dev/vda4 (the virtio data partition under QEMU) so
the boot can progress past stage 20 and we can see what else needs fixing.

Pre-v0.3 release we need to either:
  a) ship a real blkid/findfs binary that works (util-linux from upstream,
     statically built), or
  b) avoid LABEL= entirely and detect the data partition by walking
     /sys/class/block looking for our ext4 magic+label.

Either way the LABEL= path needs to work on real ARM64 hosts where the
device path varies (vda/sda/nvme0n1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:47:55 -06:00
863f498cc2 fix: kernel must use /sbin/init, not piCore's /init
Some checks failed
CI / Go Tests (push) Failing after 53s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Has been skipped
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Has been skipped
CI / Shellcheck (push) Failing after 27s
Root cause of the 'Run /init as init process' -> immediate SIGSEGV panic on
the generic ARM64 boot: piCore64's rootfs ships a /init script at the rootfs
root, and the kernel's init search order picks /init over /sbin/init. piCore's
init then exec's something incompatible with our environment and segfaults.

Two fixes:
1. inject-kubesolo.sh now removes the upstream /init after replacing
   /sbin/init. This is the structural fix — the rootfs no longer has the
   conflicting entry-point.
2. grub-arm64.cfg passes init=/sbin/init explicitly. Belt-and-suspenders in
   case any future rootfs source re-introduces /init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:43:35 -06:00
05ab108de1 fix(grub): put ttyAMA0 last so it's the primary console on ARM64
Some checks failed
CI / Go Tests (push) Successful in 1m29s
CI / Shellcheck (push) Failing after 40s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Failing after 1m21s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Failing after 1m9s
Kernel takes the last `console=` argument as primary (where init's stdout/stderr
land). The previous order had ttyS0 last, which is a dead device on QEMU virt
and most ARM64 SBCs — so init output disappeared and we only saw kernel panic
messages (which use earlycon, bypassing the console preference).

Also drop `quiet` from the default boot entry while we stabilise — we need the
kernel + init output visible right now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:11:58 -06:00
80aca5e372 feat: ARM64 generic UEFI disk image (GPT + GRUB A/B)
Some checks failed
CI / Go Tests (push) Successful in 2m38s
CI / Shellcheck (push) Failing after 37s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Failing after 1m22s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Failing after 1m11s
Produces a UEFI-bootable raw disk image for generic ARM64 hosts (QEMU virt,
Ampere/Graviton cloud, ARM64 SBCs with UEFI). Reuses the existing 4-partition
A/B layout from x86 (EFI 256 MB FAT32 + System A 512 MB ext4 + System B 512 MB
ext4 + Data ext4 remainder).

Changes:
- build/scripts/create-disk-image.sh: TARGET_ARCH env var (amd64 default,
  arm64). Selects kernel source path, grub-mkimage target (x86_64-efi vs
  arm64-efi), EFI binary name (bootx64.efi vs BOOTAA64.EFI), grub.cfg variant,
  and whether to also install BIOS GRUB (x86 only).
- build/grub/grub-arm64.cfg: ARM64 variant of grub.cfg. Identical A/B logic;
  console=ttyAMA0+ttyS0 to cover QEMU virt PL011, Ampere PL011, and Graviton
  16550-compat.
- build/Dockerfile.builder: add grub-efi-amd64-bin, grub-efi-arm64-bin,
  grub-pc-bin, grub-common, grub2-common so the builder container can produce
  EFI images for both architectures.
- hack/dev-vm-arm64.sh: split into kernel mode (direct -kernel/-initrd, fast
  iteration) and --disk mode (UEFI firmware + GRUB + disk image, full
  integration test). Probes common UEFI firmware paths on Ubuntu/Fedora/macOS.
  Default kernel path now points at kernel-arm64-generic/Image with fallback
  to the renamed custom-kernel-rpi/Image.
- test/qemu/test-boot-arm64-disk.sh: new CI test for the full UEFI -> GRUB ->
  kernel -> stage-90 boot chain. Uses a scratch copy of the disk so grubenv
  writes don't mutate the source artifact.
- Makefile: new disk-image-arm64 target (depends on rootfs-arm64 + kernel-arm64),
  new test-boot-arm64-disk target, .PHONY + help updates.

Phase 3 scaffold is in place. First real end-to-end ARM64 build runs in the
next step on the Odroid runner — that's where we find out what's actually
broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:36:08 -06:00
8d25e1890e feat: add A/B partition updates with GRUB and Go update agent (Phase 3)
Implement atomic OS updates via A/B partition scheme with automatic
rollback. GRUB bootloader manages slot selection with a 3-attempt
boot counter that auto-rolls back on repeated health check failures.

GRUB boot config:
- A/B slot selection with boot_counter/boot_success env vars
- Automatic rollback when counter reaches 0 (3 failed boots)
- Debug, emergency shell, and manual slot-switch menu entries

Disk image (refactored):
- 4-partition GPT layout: EFI + System A + System B + Data
- GRUB EFI/BIOS installation with graceful fallbacks
- Both system partitions populated during image creation

Update agent (Go, zero external deps):
- pkg/grubenv: read/write GRUB env vars (grub-editenv + manual fallback)
- pkg/partition: find/mount/write system partitions by label
- pkg/image: HTTP download with SHA256 verification
- pkg/health: post-boot checks (containerd, API server, node Ready)
- 6 CLI commands: check, apply, activate, rollback, healthcheck, status
- 37 unit tests across all 4 packages

Deployment:
- K8s CronJob for automatic update checks (every 6 hours)
- ConfigMap for update server URL
- Health check Job for post-boot verification

Build pipeline:
- build-update-agent.sh compiles static Linux binary (~5.9 MB)
- inject-kubesolo.sh includes update agent in initramfs
- Makefile: build-update-agent, test-update-agent, test-update targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 11:12:46 -06:00