After v0.3.1 published successfully, run 524 stayed in 'queued' status
overall even though all 5 jobs that actually ran completed at success.
Cause: the gated build-iso-amd64 job is `if: false` with
`runs-on: amd64-linux`. No runner matches `amd64-linux`, so Gitea
queued the job indefinitely waiting for one. The `if:` expression
is only evaluated when a runner actually picks up the job, so the
skip never fires.
Switch the runs-on to `ubuntu-latest` (which our Odroid claims). The
runner picks the job up, evaluates `if: false`, marks it `skipped`,
and the run as a whole concludes properly.
Comment block updated to flag the two lines to flip when a real
amd64-linux runner is registered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.3.1 retag triggered BOTH .gitea/workflows/build-arm64.yaml AND
.gitea/workflows/release.yaml. Both build the ARM64 disk image from
scratch on the Odroid runner — each kernel build takes ~60 min. The
build-arm64 run finished first (uploaded as a workflow artifact, scoped
to that run), then release.yaml started another from-scratch build to
get the same artifact for the actual Gitea release. That's a wasted hour
on a constrained runner.
Limit build-arm64.yaml to push-to-main (for early breakage detection)
and manual workflow_dispatch. Tag-driven release pipelines are
release.yaml's job alone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.3.1's first release.yaml run exposed two issues:
1. The `ubuntu-latest` label resolved to the Odroid (only runner registered
with that label), which is arm64. apt-get install grub-efi-amd64-bin
then failed because ports.ubuntu.com only ships arm64 packages — the
amd64 grub binaries don't exist in the arm64 repo. Building x86 ISOs
on an arm64 host requires either a native amd64 runner or
qemu-user-static emulation; neither is set up.
2. The `arm64-linux:host` runner runs jobs directly on the Odroid host
(no Docker), and actions/checkout@v4 is a JS action needing Node 20+
in $PATH. The Odroid had no Node installed at all, so checkout failed.
Fixes:
- `build-iso-amd64` gated `if: false` and `runs-on: amd64-linux`. The job
stays in the workflow as a placeholder for when an amd64 runner is
eventually registered. Flip the `if: false` line at that time and it
starts working.
- `release` job no longer depends on build-iso-amd64, so the workflow
completes with just ARM64 + Go binaries. `if: always() && needs.X ==
'success'` for the jobs we actually require.
- Release body no longer promises x86 artifacts that aren't there.
Replaced with a clear note about how to build x86 from source at the
release tag.
Operator action required for the Odroid runner:
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second nft crash report from QEMU virt:
failed to set up pod masquerade
nft add table ip kubesolo-masq:
signal: aborted (output: *** stack smashing detected ***: terminated)
Root cause: two glibcs are visible to dynamically-linked binaries in the
rootfs. piCore64 ships glibc at /lib/libc.so.6; we copy the build host's
glibc (for the iptables-nft / nft / xtables-modules family) to
/lib/$LIB_ARCH/libc.so.6. The dynamic linker can resolve one binary's
NEEDED libc.so.6 to piCore's and another (via transitive load through
e.g. libnftables.so.1) to ours. Each libc has its own __stack_chk_guard
global; stack frames whose canary was written by code from libc-A and
checked by code from libc-B trip "stack smashing detected" → SIGABRT.
This didn't fire before nft was added because no host-installed dyn
binary actually got invoked before kubesolo crashed at first-boot
preflight.
Three layered fixes in inject-kubesolo.sh:
1. Bundle the full glibc family (was just libc.so.6 + ld). Now also
libpthread, libdl, libm, libresolv, librt, libanl, libgcc_s. Without
these, transitively-loaded host libs could pull them in from piCore's
/lib and re-introduce the split.
2. After bundling, delete piCore's duplicates from /lib/ where our copy
exists in /lib/$LIB_ARCH/. The dynamic linker's search now has
exactly one match per soname.
3. Write /etc/ld.so.conf giving /lib/$LIB_ARCH precedence over /lib, and
run `ldconfig -r "$ROOTFS"` to bake an explicit /etc/ld.so.cache.
The runtime linker uses the cache (when present) instead of falling
back to compiled-in default paths, making lookup order deterministic.
Also done (followups from previous commit):
- build/Dockerfile.builder gains nftables so docker-build picks up nft.
- .gitea/workflows/release.yaml's amd64 build job installs iptables +
nftables (previously only listed iptables-related libs but not the
CLIs themselves).
Verified by shellcheck. End-to-end QEMU verification on the Odroid next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes that should have happened pre-v0.3.0:
1. Add a build-disk-arm64 job that runs on the arm64-linux runner (Odroid),
building kernel + rootfs + disk-image then xz-compressing the .arm64.img.
The previous release.yaml shipped x86_64 only.
2. Replace softprops/action-gh-release@v2 with a direct curl against Gitea's
/api/v1/repos/<owner>/<repo>/releases endpoint. The softprops action
hard-codes api.github.com instead of honouring ${{ github.api_url }},
so on Gitea's act_runner it succeeds silently without creating a
release. The curl path uses the auto-populated ${{ secrets.GITHUB_TOKEN }}
for auth; doc note in ci-runners.md covers the GITEA_TOKEN fallback.
3. Downgrade actions/upload-artifact and actions/download-artifact from
@v4 to @v3 to match Gitea act_runner v1.0.x's compatibility — same fix
we applied to ci.yaml in 0c6e200.
Also compress the x86 disk image with xz before uploading (parity with
the arm64 path, saves ~95% on bandwidth), and emit SHA256SUMS over all
attached artifacts.
docs/ci-runners.md gains a "Workflows in this repo" table, a per-job
breakdown of the release pipeline, the rationale for direct-curl over
the marketplace action, and a "manually re-running a release" section
warning against force-updating published tags.
This commit fixes the workflow but does not retroactively rebuild v0.3.0.
v0.3.0's release page already has the manually-uploaded arm64 image and
SHA256SUMS; x86 users who want the v0.3.0 artifact build from source
(documented in the release body). v0.3.1 will be the first tag that
exercises the fixed workflow end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing ci.yaml had two unrelated breakages exposed by the recent runs:
1. actions/upload-artifact@v4 isn't fully implemented by Gitea's act_runner
yet. Downgrade to @v3 which works reliably.
2. Shellcheck fails on init scripts due to false-positive warnings (SC1090,
SC1091, SC2034) that are intrinsic to init-style code that sources other
files dynamically. The init scripts have always had these — they just
didn't fail builds before because... well, they did, this was already
failing.
Fix: run shellcheck with --severity=error and an exclude list. Real bugs
(errors) still fail CI; style/info findings (SC2002, SC2015, SC2012, SC2013)
don't. Validated locally: all four shellcheck steps exit 0 with this
configuration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of v0.3 — KubeSolo version bump and CI gating.
KubeSolo v1.1.0 → v1.1.5 brings:
- New flag --disable-ipv6 (v1.1.5)
- New flag --db-wal-repair (v1.1.5) — important for power-loss resilience
on edge appliances; surfaced as kubesolo.db-wal-repair in cloud-init
- New flag --full (v1.1.4) — disables edge-optimised k8s overrides
- Pod egress connectivity fix after reboot (v1.1.4)
- Registry config persistence fix (v1.1.5)
- k8s 1.34.7, CoreDNS 1.14.3, Go 1.26.2
All three new flags wired into cloud-init: config.go fields, kubesolo.go
extra-flag emission, full-config.yaml example.
Supply-chain hygiene:
- Per-arch checksums: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in
versions.env. Replaces the single shared KUBESOLO_SHA256 that couldn't
meaningfully verify both binaries at once.
- Checksum now applied to the tarball (the immutable upstream artifact)
rather than the post-extract binary.
CI:
- New .gitea/workflows/build-arm64.yaml routes the full kernel + rootfs +
disk-image build to the Odroid arm64-linux runner. Triggers on push to
main, tags, and manual workflow_dispatch. The boot smoke test is
continue-on-error because KubeSolo's first-boot image import deadline
fires under QEMU TCG on the Odroid.
VERSION bumped to 0.3.0-dev. CHANGELOG entry under [0.3.0-dev] captures all
Phase 1-4 work + the known limitations documented in arm64-status.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Gitea Actions CI pipeline: Go tests, build, shellcheck on push/PR
- Gitea Actions release pipeline: full build + artifact upload on version tags
- OCI container image builder for registry-based OS distribution
- Zero-dependency Prometheus metrics endpoint (kubesolo_os_info, boot,
memory, update status) with 10 tests
- USB provisioning tool for air-gapped deployments with cloud-init injection
- ARM64 cross-compilation support (TARGET_ARCH env var + build-cross.sh)
- Updated build scripts to accept TARGET_ARCH for both amd64 and arm64
- New Makefile targets: oci-image, build-cross
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>