fix(arm64): resolve dual-glibc loading that triggers stack-canary aborts
Some checks failed
ARM64 Build / Build generic ARM64 disk image (push) Failing after 5s
CI / Go Tests (push) Successful in 1m49s
CI / Shellcheck (push) Successful in 56s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m43s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m54s
Second nft crash report from QEMU virt:

    failed to set up pod masquerade
    nft add table ip kubesolo-masq:
    signal: aborted (output: *** stack smashing detected ***: terminated)
Root cause: two glibcs are visible to dynamically-linked binaries in the
rootfs. piCore64 ships glibc at /lib/libc.so.6; we copy the build host's
glibc (for the iptables-nft / nft / xtables-modules family) to
/lib/$LIB_ARCH/libc.so.6. The dynamic linker can resolve one binary's
NEEDED libc.so.6 to piCore's and another (via transitive load through
e.g. libnftables.so.1) to ours. Each libc has its own __stack_chk_guard
global; stack frames whose canary was written by code from libc-A and
checked by code from libc-B trip "stack smashing detected" → SIGABRT.
This didn't fire before nft was added because no host-installed
dynamically linked binary was actually invoked before kubesolo crashed at
first-boot preflight.
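One way to confirm the split on a booted image (a diagnostic sketch; the
nft path and the multiarch directory name are assumptions about the
target layout, not taken from this commit):

```shell
#!/bin/sh
# Diagnostic sketch: see which libc.so.6 the dynamic linker actually maps.
# LD_DEBUG=libs makes ld.so trace every library search/resolution step.
# The binary and library paths below are assumptions about the image.
LD_DEBUG=libs /usr/sbin/nft --version 2>&1 | grep 'libc.so.6'

# Statically, compare the direct NEEDED resolution against a transitive
# dependency's resolution:
ldd /usr/sbin/nft | grep 'libc.so.6'
ldd "/lib/$(uname -m)-linux-gnu/libnftables.so.1" 2>/dev/null | grep 'libc.so.6'
```

If the two ldd lines print different paths (/lib/libc.so.6 vs
/lib/$LIB_ARCH/libc.so.6), the process maps both libcs and the canary
mismatch described above follows.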
Three layered fixes in inject-kubesolo.sh:
1. Bundle the full glibc family (was just libc.so.6 + ld). Now also
libpthread, libdl, libm, libresolv, librt, libanl, libgcc_s. Without
these, transitively-loaded host libs could pull them in from piCore's
/lib and re-introduce the split.
2. After bundling, delete piCore's duplicates from /lib/ where our copy
exists in /lib/$LIB_ARCH/. The dynamic linker's search now has
exactly one match per soname.
3. Write /etc/ld.so.conf giving /lib/$LIB_ARCH precedence over /lib, and
run `ldconfig -r "$ROOTFS"` to bake an explicit /etc/ld.so.cache.
The runtime linker uses the cache (when present) instead of falling
back to compiled-in default paths, making lookup order deterministic.
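The baked cache can be spot-checked from the build host; a sketch,
assuming a glibc ldconfig on the builder and an illustrative rootfs path
(`-r` chroots, so it typically needs root):

```shell
#!/bin/sh
# Sketch: verify the cache baked into the rootfs resolves libc.so.6 to
# exactly one path. ROOTFS is illustrative; `ldconfig -r` treats it as
# the system root, so -p prints the rootfs's cache, not the host's.
ROOTFS=/tmp/kubesolo-rootfs
ldconfig -r "$ROOTFS" -p | grep 'libc.so.6'
# Expect a single line, pointing into /lib/$LIB_ARCH/.
```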
Also done (follow-ups from the previous commit):
- build/Dockerfile.builder gains nftables so docker-build picks up nft.
- .gitea/workflows/release.yaml's amd64 build job installs iptables +
nftables (previously only listed iptables-related libs but not the
CLIs themselves).
Verified by shellcheck. End-to-end QEMU verification on the Odroid next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build/Dockerfile.builder
@@ -30,6 +30,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     libarchive-tools \
     libelf-dev \
     libssl-dev \
+    nftables \
     make \
     parted \
     squashfs-tools \

inject-kubesolo.sh
@@ -397,7 +397,13 @@ if [ -f /usr/sbin/xtables-nft-multi ]; then
         ln -sf xtables-nft-multi "$ROOTFS/usr/sbin/$cmd"
     done

-    # Copy required shared libraries (architecture-aware paths)
+    # Copy required shared libraries (architecture-aware paths).
+    # We deliberately bundle the *full* glibc family from the build host —
+    # not just libc.so.6 — so dynamically-linked binaries we ship (nft,
+    # xtables-nft-multi, etc.) load a consistent set of libraries. Mixing
+    # glibc components across versions causes __stack_chk_guard mismatches
+    # ("stack smashing detected" aborts) when stack frames cross between
+    # functions linked against different libcs.
     mkdir -p "$ROOTFS/usr/lib/$LIB_ARCH" "$ROOTFS/lib/$LIB_ARCH"
     [ "$INJECT_ARCH" != "arm64" ] && mkdir -p "$ROOTFS/lib64"
     for lib in \
@@ -405,6 +411,13 @@ if [ -f /usr/sbin/xtables-nft-multi ]; then
         "/lib/$LIB_ARCH/libmnl.so.0"* \
         "/lib/$LIB_ARCH/libnftnl.so.11"* \
         "/lib/$LIB_ARCH/libc.so.6" \
+        "/lib/$LIB_ARCH/libpthread.so.0" \
+        "/lib/$LIB_ARCH/libdl.so.2" \
+        "/lib/$LIB_ARCH/libm.so.6" \
+        "/lib/$LIB_ARCH/libresolv.so.2" \
+        "/lib/$LIB_ARCH/librt.so.1" \
+        "/lib/$LIB_ARCH/libanl.so.1" \
+        "/lib/$LIB_ARCH/libgcc_s.so.1" \
         "$LD_SO"; do
         [ -e "$lib" ] && cp -aL "$lib" "$ROOTFS${lib}" 2>/dev/null || true
     done
@@ -541,6 +554,54 @@ nameserver 1.1.1.1
 EOF
 fi

+# --- Resolve dual-glibc ambiguity (ARM64) ---
+# piCore64's rootfs ships glibc at /lib/libc.so.6, and we've copied the
+# build host's glibc to /lib/$LIB_ARCH/libc.so.6. Two libc.so.6 in the
+# dynamic linker's search path can lead to a process loading both — one
+# directly, one transitively — and "stack smashing detected" aborts when
+# stack frames cross between them (each libc has its own
+# __stack_chk_guard). Remove piCore's copies so resolution is unambiguous
+# and write a proper /etc/ld.so.conf + cache pointing at our copies.
+if [ "$INJECT_ARCH" = "arm64" ] && [ -d "$ROOTFS/lib/$LIB_ARCH" ]; then
+    echo " Pruning duplicate glibc components in $ROOTFS/lib/..."
+    for lib in \
+        libc.so.6 \
+        libpthread.so.0 \
+        libdl.so.2 \
+        libm.so.6 \
+        libresolv.so.2 \
+        librt.so.1 \
+        libanl.so.1 \
+        libgcc_s.so.1; do
+        # Only delete piCore's copy when our version exists; otherwise
+        # we'd leave the binary unable to find any libc at all.
+        if [ -e "$ROOTFS/lib/$lib" ] && [ -e "$ROOTFS/lib/$LIB_ARCH/$lib" ]; then
+            rm -f "$ROOTFS/lib/$lib"
+        fi
+    done
+
+    # ld.so.conf gives our $LIB_ARCH paths precedence over piCore's /lib
+    # (defaults vary by glibc version; this makes the order explicit).
+    cat > "$ROOTFS/etc/ld.so.conf" <<EOF
+/lib/$LIB_ARCH
+/usr/lib/$LIB_ARCH
+/usr/local/lib
+/lib
+/usr/lib
+EOF
+
+    # Generate /etc/ld.so.cache. ldconfig -r treats $ROOTFS as the system
+    # root, so it reads ld.so.conf from there and writes the cache there.
+    # Works even cross-arch (it only parses ELF headers, doesn't execute).
+    if command -v ldconfig >/dev/null 2>&1; then
+        ldconfig -r "$ROOTFS" 2>/dev/null && \
+            echo " Generated /etc/ld.so.cache via ldconfig" || \
+            echo " WARN: ldconfig failed; falling back to default search order"
+    else
+        echo " WARN: ldconfig not on builder; cache not generated"
+    fi
+fi
+
 # --- Summary ---
 echo ""
 echo "==> Injection complete. Rootfs contents:"