6 Commits

**eb39787cf3** ci: gate x86 build until amd64 runner exists; ARM64 release self-sufficient

v0.3.1's first release.yaml run exposed two issues:

1. The `ubuntu-latest` label resolved to the Odroid (the only runner registered with that label), which is arm64. `apt-get install grub-efi-amd64-bin` then failed because ports.ubuntu.com only ships arm64 packages — the amd64 grub binaries don't exist in the arm64 repo. Building x86 ISOs on an arm64 host requires either a native amd64 runner or qemu-user-static emulation; neither is set up (a sketch of the emulation route follows this message).
2. The `arm64-linux:host` runner runs jobs directly on the Odroid host (no Docker), and actions/checkout@v4 is a JS action needing Node 20+ in `$PATH`. The Odroid had no Node installed at all, so checkout failed.

Fixes:

- `build-iso-amd64` is gated with `if: false` and `runs-on: amd64-linux`. The job stays in the workflow as a placeholder for when an amd64 runner is eventually registered; flip the `if: false` line at that time and it starts working.
- The `release` job no longer depends on build-iso-amd64, so the workflow completes with just ARM64 + Go binaries. We use `if: always() && needs.X.result == 'success'` for the jobs we actually require.
- The release body no longer promises x86 artifacts that aren't there; replaced with a clear note about how to build x86 from source at the release tag.

Operator action required for the Odroid runner:

    curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
    sudo apt install -y nodejs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
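
For reference, the qemu-user-static route mentioned in issue 1 would look roughly like this, assuming Docker were available on the runner (the `:host` setup above has none). The package names are the standard Ubuntu ones; the test image is illustrative:

```sh
# Register binfmt handlers so the arm64 kernel can exec amd64 binaries
# through qemu-user-static.
sudo apt-get install -y qemu-user-static binfmt-support

# Sanity check: an amd64 container on the arm64 host should report x86_64.
docker run --rm --platform linux/amd64 ubuntu:24.04 uname -m
```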

**76ed2ffc14** fix(arm64): resolve dual-glibc loading that triggers stack-canary aborts

Second nft crash report from QEMU virt:

    failed to set up pod masquerade
    nft add table ip kubesolo-masq:
    signal: aborted (output: *** stack smashing detected ***: terminated)

Root cause: two glibcs are visible to dynamically-linked binaries in the
rootfs. piCore64 ships glibc at /lib/libc.so.6; we copy the build host's
glibc (for the iptables-nft / nft / xtables-modules family) to
/lib/$LIB_ARCH/libc.so.6. The dynamic linker can resolve one binary's
NEEDED libc.so.6 to piCore's and another (via transitive load through
e.g. libnftables.so.1) to ours. Each libc has its own __stack_chk_guard
global; stack frames whose canary was written by code from libc-A and
checked by code from libc-B trip "stack smashing detected" → SIGABRT.
This didn't fire before nft was added because no host-installed
dynamically-linked binary was actually invoked before kubesolo crashed
at first-boot preflight.
Three layered fixes in inject-kubesolo.sh:
1. Bundle the full glibc family (was just libc.so.6 + ld). Now also
libpthread, libdl, libm, libresolv, librt, libanl, libgcc_s. Without
these, transitively-loaded host libs could pull them in from piCore's
/lib and re-introduce the split.
2. After bundling, delete piCore's duplicates from /lib/ where our copy
exists in /lib/$LIB_ARCH/. The dynamic linker's search now has
exactly one match per soname.
3. Write /etc/ld.so.conf giving /lib/$LIB_ARCH precedence over /lib, and
run `ldconfig -r "$ROOTFS"` to bake an explicit /etc/ld.so.cache.
The runtime linker uses the cache (when present) instead of falling
back to compiled-in default paths, making lookup order deterministic.
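
A condensed sketch of how fixes 2 and 3 might look in inject-kubesolo.sh; `$ROOTFS` and the exact `$LIB_ARCH` value are assumptions, and the script's real handling may differ:

```sh
ROOTFS=/tmp/rootfs            # assumed mount point of the piCore64 rootfs
LIB_ARCH=aarch64-linux-gnu    # assumed value for an arm64 target

# Fix 2: drop piCore's duplicate of every soname we bundled, so the
# dynamic linker sees exactly one candidate per soname.
for lib in "$ROOTFS/lib/$LIB_ARCH"/*.so*; do
  rm -f "$ROOTFS/lib/$(basename "$lib")"
done

# Fix 3: give /lib/$LIB_ARCH precedence over /lib, then bake an explicit
# ld.so.cache so lookup order no longer depends on built-in defaults.
printf '/lib/%s\n/lib\n' "$LIB_ARCH" > "$ROOTFS/etc/ld.so.conf"
ldconfig -r "$ROOTFS"
```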
Also done (follow-ups from the previous commit):
- build/Dockerfile.builder gains nftables so docker-build picks up nft.
- .gitea/workflows/release.yaml's amd64 build job installs iptables +
nftables (previously only listed iptables-related libs but not the
CLIs themselves).
Verified by shellcheck. End-to-end QEMU verification on the Odroid next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

**f8c308d9b7** ci: fix release.yaml so v0.3.1+ auto-publishes a complete release

Three changes that should have happened pre-v0.3.0:

1. Add a build-disk-arm64 job that runs on the arm64-linux runner (Odroid), building kernel + rootfs + disk-image, then xz-compressing the .arm64.img. The previous release.yaml shipped x86_64 only.
2. Replace softprops/action-gh-release@v2 with a direct curl against Gitea's /api/v1/repos/<owner>/<repo>/releases endpoint (see the sketch after this list). The softprops action hard-codes api.github.com instead of honouring ${{ github.api_url }}, so on Gitea's act_runner it succeeds silently without creating a release. The curl path uses the auto-populated ${{ secrets.GITHUB_TOKEN }} for auth; a doc note in ci-runners.md covers the GITEA_TOKEN fallback.
3. Downgrade actions/upload-artifact and actions/download-artifact from @v4 to @v3 to match Gitea act_runner v1.0.x's compatibility — same fix we applied to ci.yaml in 0c6e200585.
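
Change 2's direct API call takes roughly this shape (the endpoint is the one named above; instance URL, repo path, and tag are placeholders):

```sh
# Create a release via Gitea's REST API instead of softprops/action-gh-release.
# In the workflow these values come from ${{ github.api_url }} and
# ${{ secrets.GITHUB_TOKEN }}; shown here as plain variables.
API="https://gitea.example.com/api/v1"   # placeholder instance URL
REPO="owner/repo"                        # placeholder owner/repo
TAG="v0.3.1"

curl -fsS -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"tag_name\":\"$TAG\",\"name\":\"$TAG\",\"draft\":false,\"prerelease\":false}" \
  "$API/repos/$REPO/releases"
```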

**0c6e200585** ci: fix shellcheck + upload-artifact failures

The existing ci.yaml had two unrelated breakages exposed by the recent runs:

1. actions/upload-artifact@v4 isn't fully implemented by Gitea's act_runner yet. Downgrade to @v3, which works reliably.
2. Shellcheck fails on init scripts due to false-positive warnings (SC1090, SC1091, SC2034) that are intrinsic to init-style code that sources other files dynamically. The init scripts have always had these — they just didn't fail builds before because... well, they did; this was already failing.

Fix: run shellcheck with --severity=error and an exclude list (see the sketch below). Real bugs (errors) still fail CI; style/info findings (SC2002, SC2015, SC2012, SC2013) don't.

Validated locally: all four shellcheck steps exit 0 with this configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
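
The gated invocation plausibly looks like this; the file path is an assumption, and the excluded codes are the false positives quoted above:

```sh
# Fail CI only on error-severity findings, excluding the init-script
# false positives; the style/info codes already fall below the gate.
shellcheck --severity=error \
  --exclude=SC1090,SC1091,SC2034 \
  rootfs/etc/init.d/*.sh   # hypothetical path to the init scripts
```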

**1b44c9d621** feat: bump KubeSolo to v1.1.5 + cross-arch CI workflow

Phase 4 of v0.3 — KubeSolo version bump and CI gating.

KubeSolo v1.1.0 → v1.1.5 brings:

- New flag `--disable-ipv6` (v1.1.5)
- New flag `--db-wal-repair` (v1.1.5) — important for power-loss resilience on edge appliances; surfaced as kubesolo.db-wal-repair in cloud-init
- New flag `--full` (v1.1.4) — disables edge-optimised k8s overrides
- Pod egress connectivity fix after reboot (v1.1.4)
- Registry config persistence fix (v1.1.5)
- k8s 1.34.7, CoreDNS 1.14.3, Go 1.26.2

All three new flags are wired into cloud-init: config.go fields, kubesolo.go extra-flag emission, full-config.yaml example.

Supply-chain hygiene (see the checksum sketch below):

- Per-arch checksums: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in versions.env. Replaces the single shared KUBESOLO_SHA256 that couldn't meaningfully verify both binaries at once.
- The checksum is now applied to the tarball (the immutable upstream artifact) rather than the post-extract binary.

CI:

- New .gitea/workflows/build-arm64.yaml routes the full kernel + rootfs + disk-image build to the Odroid arm64-linux runner. Triggers on push to main, tags, and manual workflow_dispatch.
- The boot smoke test is continue-on-error because KubeSolo's first-boot image import deadline fires under QEMU TCG on the Odroid.

VERSION bumped to 0.3.0-dev. The CHANGELOG entry under [0.3.0-dev] captures all Phase 1-4 work plus the known limitations documented in arm64-status.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
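
A sketch of the per-arch tarball check (the variable names are from versions.env as quoted; the tarball filename is an assumption):

```sh
# Source the pinned versions, then verify the upstream tarball for the
# target arch before extracting the kubesolo binary from it.
. ./versions.env                      # defines KUBESOLO_SHA256_AMD64 / _ARM64

TARGET_ARCH=arm64                     # or amd64
case "$TARGET_ARCH" in
  amd64) EXPECTED="$KUBESOLO_SHA256_AMD64" ;;
  arm64) EXPECTED="$KUBESOLO_SHA256_ARM64" ;;
esac

TARBALL="kubesolo-$TARGET_ARCH.tar.gz"   # hypothetical artifact name
echo "$EXPECTED  $TARBALL" | sha256sum -c -
```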

**456aa8eb5b** feat: add distribution and fleet management — CI/CD, OCI, metrics, ARM64 (Phase 5)

- Gitea Actions CI pipeline: Go tests, build, shellcheck on push/PR
- Gitea Actions release pipeline: full build + artifact upload on version tags
- OCI container image builder for registry-based OS distribution
- Zero-dependency Prometheus metrics endpoint (kubesolo_os_info, boot, memory, update status) with 10 tests
- USB provisioning tool for air-gapped deployments with cloud-init injection
- ARM64 cross-compilation support (TARGET_ARCH env var + build-cross.sh); usage sketch below
- Updated build scripts to accept TARGET_ARCH for both amd64 and arm64
- New Makefile targets: oci-image, build-cross

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
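
Likely invocations for the new entry points; the script location, Makefile variable passing, and the metrics port/path are assumptions, while the target and metric names are quoted above:

```sh
# Cross-compile for arm64 via the new wrapper script or Makefile target.
TARGET_ARCH=arm64 ./build-cross.sh
make build-cross TARGET_ARCH=arm64

# Build the OCI image used for registry-based OS distribution.
make oci-image

# Probe the zero-dependency metrics endpoint; port/path are assumptions,
# the kubesolo_os_info metric name is from the commit message.
curl -s http://localhost:9100/metrics | grep '^kubesolo_os_info'
```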
- Gitea Actions CI pipeline: Go tests, build, shellcheck on push/PR - Gitea Actions release pipeline: full build + artifact upload on version tags - OCI container image builder for registry-based OS distribution - Zero-dependency Prometheus metrics endpoint (kubesolo_os_info, boot, memory, update status) with 10 tests - USB provisioning tool for air-gapped deployments with cloud-init injection - ARM64 cross-compilation support (TARGET_ARCH env var + build-cross.sh) - Updated build scripts to accept TARGET_ARCH for both amd64 and arm64 - New Makefile targets: oci-image, build-cross Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |