Files
kubesolo-os/build/config
Adolfo Delorenzo 31eee77397
Some checks failed
ARM64 Build / Build generic ARM64 disk image (push) Failing after 5s
CI / Go Tests (push) Successful in 3m51s
CI / Shellcheck (push) Successful in 1m5s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 2m48s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 2m50s
fix(kernel): enable nftables NUMGEN + HASH + helper expressions
Fourth round of the v0.3 nftables-on-arm64 debug saga. After the
NF_TABLES_IPV4 family fix from 7e46f8f, KubeSolo + containerd + a
CoreDNS pod all reach Running state, but kube-proxy fails to install
Service rules:

  add rule ip kube-proxy service-2QRHZV4L-default/kubernetes/tcp/https
    numgen random mod 1 vmap { 0 : goto ... }
    ^^^^^^^^^^^^^^^^^^^
  Error: Could not process rule: No such file or directory

The caret points at `numgen random mod 1`. That's the nftables
NUMGEN expression — kube-proxy's nftables backend uses it for random
endpoint load-balancing across Service endpoints. Without
CONFIG_NFT_NUMGEN compiled into the kernel, every Service sync fails
and kube-dns / any ClusterIP is unreachable.

Cascade: kube-proxy sync fail -> kube-dns Service has no DNAT ->
CoreDNS readiness probe never goes Ready -> KubeSolo's coredns
deploy step times out after 15 attempts -> FTL -> kernel panic.

Fix: add NFT_NUMGEN to kernel-container.fragment, plus the small
family of expression modules kube-proxy and CNI plugins commonly use
so we don't repeat this debug loop for the next missing one:

  CONFIG_NFT_NUMGEN=m   random / inc LB
  CONFIG_NFT_HASH=m     consistent-hash LB (sessionAffinity=ClientIP)
  CONFIG_NFT_OBJREF=m   named objects (counters, quotas) refs in rules
  CONFIG_NFT_LIMIT=m    rate-limit expression
  CONFIG_NFT_LOG=m      log expression (used by some CNI debug rules)

All =m so init's stage-30 loads them from modules.list / modules-arm64.list
alongside the existing nft_nat / nft_masq / nft_compat.

This needs another kernel rebuild (rm -rf build/cache/kernel-arm64-generic,
sudo make kernel-arm64) on the Odroid. After that we should have a fully
working KubeSolo OS v0.3 on ARM64 generic — at which point the only thing
left is to tag v0.3.1 and verify the rewritten release.yaml workflow
publishes both arches automatically.

Note on runc-PATH log noise: containerd-shim-runc-v2 -info probes for
runc in $PATH and fails because KubeSolo's runc lives at
/var/lib/kubesolo/containerd/runc. This is cosmetic — actual container
creation uses an absolute path from the containerd config and works
fine (CoreDNS container did start successfully). Will polish in v0.3.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:48:43 -06:00
..