Commit Graph

3 Commits

Author SHA1 Message Date
76ed2ffc14 fix(arm64): resolve dual-glibc loading that triggers stack-canary aborts
Some checks failed
ARM64 Build / Build generic ARM64 disk image (push) Failing after 5s
CI / Go Tests (push) Successful in 1m49s
CI / Shellcheck (push) Successful in 56s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m43s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m54s
Second nft crash report from QEMU virt:

  failed to set up pod masquerade
    nft add table ip kubesolo-masq:
      signal: aborted (output: *** stack smashing detected ***: terminated)

Root cause: two glibcs are visible to dynamically-linked binaries in the
rootfs. piCore64 ships glibc at /lib/libc.so.6; we copy the build host's
glibc (for the iptables-nft / nft / xtables-modules family) to
/lib/$LIB_ARCH/libc.so.6. The dynamic linker can resolve one binary's
NEEDED libc.so.6 to piCore's and another (via transitive load through
e.g. libnftables.so.1) to ours. Each libc has its own __stack_chk_guard
global; stack frames whose canary was written by code from libc-A and
checked by code from libc-B trip "stack smashing detected" → SIGABRT.
This didn't fire before nft was added because no host-installed dyn
binary actually got invoked before kubesolo crashed at first-boot
preflight.

Three layered fixes in inject-kubesolo.sh:

1. Bundle the full glibc family (was just libc.so.6 + ld). Now also
   libpthread, libdl, libm, libresolv, librt, libanl, libgcc_s. Without
   these, transitively-loaded host libs could pull them in from piCore's
   /lib and re-introduce the split.

2. After bundling, delete piCore's duplicates from /lib/ where our copy
   exists in /lib/$LIB_ARCH/. The dynamic linker's search now has
   exactly one match per soname.

3. Write /etc/ld.so.conf giving /lib/$LIB_ARCH precedence over /lib, and
   run `ldconfig -r "$ROOTFS"` to bake an explicit /etc/ld.so.cache.
   The runtime linker uses the cache (when present) instead of falling
   back to compiled-in default paths, making lookup order deterministic.

Also done (followups from previous commit):

- build/Dockerfile.builder gains nftables so docker-build picks up nft.
- .gitea/workflows/release.yaml's amd64 build job installs iptables +
  nftables (previously only listed iptables-related libs but not the
  CLIs themselves).

Verified by shellcheck. End-to-end QEMU verification on the Odroid next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:56:49 -06:00
f8c308d9b7 ci: fix release.yaml so v0.3.1+ auto-publishes a complete release
Some checks failed
ARM64 Build / Build generic ARM64 disk image (push) Failing after 3s
CI / Go Tests (push) Successful in 1m40s
CI / Shellcheck (push) Successful in 55s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m16s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m21s
Three changes that should have happened pre-v0.3.0:

1. Add a build-disk-arm64 job that runs on the arm64-linux runner (Odroid),
   building kernel + rootfs + disk-image then xz-compressing the .arm64.img.
   The previous release.yaml shipped x86_64 only.

2. Replace softprops/action-gh-release@v2 with a direct curl against Gitea's
   /api/v1/repos/<owner>/<repo>/releases endpoint. The softprops action
   hard-codes api.github.com instead of honouring ${{ github.api_url }},
   so on Gitea's act_runner it succeeds silently without creating a
   release. The curl path uses the auto-populated ${{ secrets.GITHUB_TOKEN }}
   for auth; doc note in ci-runners.md covers the GITEA_TOKEN fallback.

3. Downgrade actions/upload-artifact and actions/download-artifact from
   @v4 to @v3 to match Gitea act_runner v1.0.x's compatibility — same fix
   we applied to ci.yaml in 0c6e200.

Also compress the x86 disk image with xz before uploading (parity with
the arm64 path, saves ~95% on bandwidth), and emit SHA256SUMS over all
attached artifacts.

docs/ci-runners.md gains a "Workflows in this repo" table, a per-job
breakdown of the release pipeline, the rationale for direct-curl over
the marketplace action, and a "manually re-running a release" section
warning against force-updating published tags.

This commit fixes the workflow but does not retroactively rebuild v0.3.0.
v0.3.0's release page already has the manually-uploaded arm64 image and
SHA256SUMS; x86 users who want the v0.3.0 artifact build from source
(documented in the release body). v0.3.1 will be the first tag that
exercises the fixed workflow end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 20:18:41 -06:00
456aa8eb5b feat: add distribution and fleet management — CI/CD, OCI, metrics, ARM64 (Phase 5)
Some checks failed
CI / Go Tests (push) Has been cancelled
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Has been cancelled
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Has been cancelled
CI / Shellcheck (push) Has been cancelled
- Gitea Actions CI pipeline: Go tests, build, shellcheck on push/PR
- Gitea Actions release pipeline: full build + artifact upload on version tags
- OCI container image builder for registry-based OS distribution
- Zero-dependency Prometheus metrics endpoint (kubesolo_os_info, boot,
  memory, update status) with 10 tests
- USB provisioning tool for air-gapped deployments with cloud-init injection
- ARM64 cross-compilation support (TARGET_ARCH env var + build-cross.sh)
- Updated build scripts to accept TARGET_ARCH for both amd64 and arm64
- New Makefile targets: oci-image, build-cross

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 11:36:53 -06:00