fix(arm64): resolve dual-glibc loading that triggers stack-canary aborts

Second nft crash report from QEMU virt:

  failed to set up pod masquerade
    nft add table ip kubesolo-masq:
      signal: aborted (output: *** stack smashing detected ***: terminated)

Root cause: two glibcs are visible to dynamically-linked binaries in the
rootfs. piCore64 ships glibc at /lib/libc.so.6; we copy the build host's
glibc (for the iptables-nft / nft / xtables-modules family) to
/lib/$LIB_ARCH/libc.so.6. The dynamic linker can resolve one binary's
NEEDED libc.so.6 to piCore's and another (via transitive load through
e.g. libnftables.so.1) to ours. Each libc has its own __stack_chk_guard
global; stack frames whose canary was written by code from libc-A and
checked by code from libc-B trip "stack smashing detected" → SIGABRT.
This didn't fire before nft was added because no host-installed
dynamically-linked binary was actually invoked before kubesolo crashed
at first-boot preflight.
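The split is easy to demonstrate without booting an image. A minimal sketch (hypothetical helper, not part of this commit; the scratch ROOTFS and LIB_ARCH merely stand in for inject-kubesolo.sh's variables) that flags any soname the dynamic linker could resolve from two different directories:

```shell
#!/bin/sh
# Sketch only: build a scratch rootfs reproducing the layout described
# above, then flag sonames visible in both linker search directories.
# All paths and names here are illustrative, not the real script's.
set -eu
ROOTFS=$(mktemp -d)          # stand-in for the image rootfs
LIB_ARCH=aarch64-linux-gnu   # stand-in for $LIB_ARCH

# piCore's libc in /lib, the build host's copy in /lib/$LIB_ARCH:
mkdir -p "$ROOTFS/lib/$LIB_ARCH"
touch "$ROOTFS/lib/libc.so.6" "$ROOTFS/lib/$LIB_ARCH/libc.so.6"

# Any soname present in both directories is ambiguous to ld.so.
found=""
for f in "$ROOTFS"/lib/*.so.*; do
  [ -e "$f" ] || continue
  soname=${f##*/}
  if [ -e "$ROOTFS/lib/$LIB_ARCH/$soname" ]; then
    echo "AMBIGUOUS: $soname is in both /lib and /lib/$LIB_ARCH"
    found="$found$soname "
  fi
done
rm -rf "$ROOTFS"
```

Fix 2 below is exactly what makes a check like this come up empty: piCore's copy is deleted whenever the bundled one exists.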

Three layered fixes in inject-kubesolo.sh:

1. Bundle the full glibc family (was just libc.so.6 + ld). Now also
   libpthread, libdl, libm, libresolv, librt, libanl, libgcc_s. Without
   these, transitively-loaded host libs could pull them in from piCore's
   /lib and re-introduce the split.

2. After bundling, delete piCore's duplicates from /lib/ where our copy
   exists in /lib/$LIB_ARCH/. The dynamic linker's search now has
   exactly one match per soname.

3. Write /etc/ld.so.conf giving /lib/$LIB_ARCH precedence over /lib, and
   run `ldconfig -r "$ROOTFS"` to bake an explicit /etc/ld.so.cache.
   The runtime linker uses the cache (when present) instead of falling
   back to compiled-in default paths, making lookup order deterministic.
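Taken together, fixes 2 and 3 make resolution deterministic. A rough sketch of the first-match semantics the generated ld.so.conf pins down (scratch paths only; the real runtime linker consults /etc/ld.so.cache, this just emulates the configured directory order):

```shell
#!/bin/sh
# Sketch only: emulate ld.so.conf first-match lookup on a scratch tree.
# ROOTFS/LIB_ARCH are illustrative stand-ins, not the real script's state.
set -eu
ROOTFS=$(mktemp -d)
LIB_ARCH=aarch64-linux-gnu
mkdir -p "$ROOTFS/lib/$LIB_ARCH" "$ROOTFS/etc"

# The conf written by fix 3: bundled $LIB_ARCH dirs before piCore's /lib.
cat > "$ROOTFS/etc/ld.so.conf" <<EOF
/lib/$LIB_ARCH
/usr/lib/$LIB_ARCH
/usr/local/lib
/lib
/usr/lib
EOF

# Even with both copies present, the first match must be the bundled one.
touch "$ROOTFS/lib/libc.so.6" "$ROOTFS/lib/$LIB_ARCH/libc.so.6"

resolve() { # resolve SONAME -> first hit in ld.so.conf order
  while IFS= read -r dir; do
    if [ -e "$ROOTFS$dir/$1" ]; then
      echo "$dir/$1"
      return 0
    fi
  done < "$ROOTFS/etc/ld.so.conf"
  return 1
}

hit=$(resolve libc.so.6)
echo "libc.so.6 resolves to $hit"
rm -rf "$ROOTFS"
```

With fix 2 applied there is only one candidate anyway; the conf and cache are belt-and-suspenders for anything a later change might reintroduce.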

Also done (follow-ups from the previous commit):

- build/Dockerfile.builder gains nftables so docker-build picks up nft.
- .gitea/workflows/release.yaml's amd64 build job now installs iptables +
  nftables (it previously listed iptables-related libraries but not the
  CLIs themselves).

Verified by shellcheck. End-to-end QEMU verification on the Odroid is next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:56:49 -06:00
parent 51c1f78aea
commit 76ed2ffc14
3 changed files with 65 additions and 2 deletions

build/Dockerfile.builder

@@ -30,6 +30,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     libarchive-tools \
     libelf-dev \
     libssl-dev \
+    nftables \
     make \
     parted \
     squashfs-tools \

inject-kubesolo.sh

@@ -397,7 +397,13 @@ if [ -f /usr/sbin/xtables-nft-multi ]; then
     ln -sf xtables-nft-multi "$ROOTFS/usr/sbin/$cmd"
   done
-  # Copy required shared libraries (architecture-aware paths)
+  # Copy required shared libraries (architecture-aware paths).
+  # We deliberately bundle the *full* glibc family from the build host —
+  # not just libc.so.6 — so dynamically-linked binaries we ship (nft,
+  # xtables-nft-multi, etc.) load a consistent set of libraries. Mixing
+  # glibc components across versions causes __stack_chk_guard mismatches
+  # ("stack smashing detected" aborts) when stack frames cross between
+  # functions linked against different libcs.
   mkdir -p "$ROOTFS/usr/lib/$LIB_ARCH" "$ROOTFS/lib/$LIB_ARCH"
   [ "$INJECT_ARCH" != "arm64" ] && mkdir -p "$ROOTFS/lib64"
   for lib in \
@@ -405,6 +411,13 @@ if [ -f /usr/sbin/xtables-nft-multi ]; then
     "/lib/$LIB_ARCH/libmnl.so.0"* \
     "/lib/$LIB_ARCH/libnftnl.so.11"* \
     "/lib/$LIB_ARCH/libc.so.6" \
+    "/lib/$LIB_ARCH/libpthread.so.0" \
+    "/lib/$LIB_ARCH/libdl.so.2" \
+    "/lib/$LIB_ARCH/libm.so.6" \
+    "/lib/$LIB_ARCH/libresolv.so.2" \
+    "/lib/$LIB_ARCH/librt.so.1" \
+    "/lib/$LIB_ARCH/libanl.so.1" \
+    "/lib/$LIB_ARCH/libgcc_s.so.1" \
     "$LD_SO"; do
     [ -e "$lib" ] && cp -aL "$lib" "$ROOTFS${lib}" 2>/dev/null || true
   done
@@ -541,6 +554,54 @@ nameserver 1.1.1.1
 EOF
 fi
+
+# --- Resolve dual-glibc ambiguity (ARM64) ---
+# piCore64's rootfs ships glibc at /lib/libc.so.6, and we've copied the
+# build host's glibc to /lib/$LIB_ARCH/libc.so.6. Two libc.so.6 in the
+# dynamic linker's search path can lead to a process loading both — one
+# directly, one transitively — and "stack smashing detected" aborts when
+# stack frames cross between them (each libc has its own
+# __stack_chk_guard). Remove piCore's copies so resolution is unambiguous
+# and write a proper /etc/ld.so.conf + cache pointing at our copies.
+if [ "$INJECT_ARCH" = "arm64" ] && [ -d "$ROOTFS/lib/$LIB_ARCH" ]; then
+  echo " Pruning duplicate glibc components in $ROOTFS/lib/..."
+  for lib in \
+    libc.so.6 \
+    libpthread.so.0 \
+    libdl.so.2 \
+    libm.so.6 \
+    libresolv.so.2 \
+    librt.so.1 \
+    libanl.so.1 \
+    libgcc_s.so.1; do
+    # Only delete piCore's copy when our version exists; otherwise
+    # we'd leave the binary unable to find any libc at all.
+    if [ -e "$ROOTFS/lib/$lib" ] && [ -e "$ROOTFS/lib/$LIB_ARCH/$lib" ]; then
+      rm -f "$ROOTFS/lib/$lib"
+    fi
+  done
+
+  # ld.so.conf gives our $LIB_ARCH paths precedence over piCore's /lib
+  # (defaults vary by glibc version; this makes the order explicit).
+  cat > "$ROOTFS/etc/ld.so.conf" <<EOF
+/lib/$LIB_ARCH
+/usr/lib/$LIB_ARCH
+/usr/local/lib
+/lib
+/usr/lib
+EOF
+
+  # Generate /etc/ld.so.cache. ldconfig -r treats $ROOTFS as the system
+  # root, so it reads ld.so.conf from there and writes the cache there.
+  # Works even cross-arch (it only parses ELF headers, doesn't execute).
+  if command -v ldconfig >/dev/null 2>&1; then
+    ldconfig -r "$ROOTFS" 2>/dev/null && \
+      echo " Generated /etc/ld.so.cache via ldconfig" || \
+      echo " WARN: ldconfig failed; falling back to default search order"
+  else
+    echo " WARN: ldconfig not on builder; cache not generated"
+  fi
+fi
 
 # --- Summary ---
 echo ""
 echo "==> Injection complete. Rootfs contents:"