fix(kernel): enable nftables NUMGEN + HASH + helper expressions
Some checks failed
ARM64 Build / Build generic ARM64 disk image (push) Failing after 5s
CI / Go Tests (push) Successful in 3m51s
CI / Shellcheck (push) Successful in 1m5s
CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 2m48s
CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 2m50s
Fourth round of the v0.3 nftables-on-arm64 debug saga. After the
NF_TABLES_IPV4 family fix from 7e46f8f, KubeSolo + containerd + a
CoreDNS pod all reach Running state, but kube-proxy fails to install
Service rules:

    add rule ip kube-proxy service-2QRHZV4L-default/kubernetes/tcp/https
        numgen random mod 1 vmap { 0 : goto ... }
        ^^^^^^^^^^^^^^^^^^^
    Error: Could not process rule: No such file or directory

The caret points at `numgen random mod 1`, the nftables NUMGEN
expression. kube-proxy's nftables backend uses it to spread traffic
randomly across a Service's endpoints. Without CONFIG_NFT_NUMGEN
available in the kernel (built in or as a module), every Service sync
fails and kube-dns, like any other ClusterIP, is unreachable.
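To check a running kernel for this independently of kube-proxy, the same construct can be loaded into a throwaway table. This is a minimal sketch, assuming `nft` is installed and the script runs as root; the table name `numgen_probe`, the chain names, and the `NFT` override variable are made up for illustration and do not come from KubeSolo.

```shell
#!/bin/sh
# Probe whether the running kernel accepts a numgen rule.
# Assumption: NFT is a hypothetical override so the probe can be pointed
# at a different nft binary; by default it uses `nft` from $PATH.
NFT="${NFT:-nft}"

probe_numgen() {
    # Load a scratch table whose only rule mirrors kube-proxy's pattern.
    # nft applies the file atomically, so a failure leaves nothing behind.
    if "$NFT" -f - >/dev/null 2>&1 <<'EOF'
table ip numgen_probe {
    chain ep0 {
    }
    chain lb {
        numgen random mod 1 vmap { 0 : goto ep0 }
    }
}
EOF
    then
        # Clean up the scratch table again.
        "$NFT" delete table ip numgen_probe >/dev/null 2>&1
        echo "numgen probe: ok"
    else
        echo "numgen probe: failed"
        return 1
    fi
}
```

A failed probe only says the ruleset could not be loaded. Missing privileges or a kernel without nf_tables would fail the same way, so it is a hint to look at CONFIG_NFT_NUMGEN rather than proof.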

Cascade: kube-proxy sync fails -> kube-dns Service has no DNAT ->
CoreDNS readiness probe never passes -> KubeSolo's coredns deploy step
times out after 15 attempts -> FTL -> kernel panic.

Fix: add NFT_NUMGEN to kernel-container.fragment, plus the small
family of expression modules that kube-proxy and CNI plugins commonly
use, so we don't repeat this debug loop for the next missing one:

    CONFIG_NFT_NUMGEN=m   random / inc LB
    CONFIG_NFT_HASH=m     consistent-hash LB (sessionAffinity=ClientIP)
    CONFIG_NFT_OBJREF=m   references to named objects (counters, quotas) in rules
    CONFIG_NFT_LIMIT=m    rate-limit expression
    CONFIG_NFT_LOG=m      log expression (used by some CNI debug rules)

All are =m so that init's stage-30 loads them from modules.list /
modules-arm64.list alongside the existing nft_nat / nft_masq / nft_compat.
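Since olddefconfig can drop fragment entries without warning, it may be worth confirming that the five options survived into the final kernel config. A rough sketch, assuming the merged config is a regular `.config`-style file; the function name is illustrative and the config path is left to the caller.

```shell
#!/bin/sh
# Print every nftables expression option from this commit that is NOT
# enabled (=y or =m) in the given kernel config file.
# Usage: missing_nft_options path/to/.config
missing_nft_options() {
    config_file="$1"
    for opt in NFT_NUMGEN NFT_HASH NFT_OBJREF NFT_LIMIT NFT_LOG; do
        # Match the exact symbol, either built in or as a module.
        if ! grep -Eq "^CONFIG_${opt}=(y|m)\$" "$config_file"; then
            echo "CONFIG_${opt}"
        fi
    done
}
```

Empty output means all five are set. A symbol that a given kernel version no longer offers as a separate option would also be listed, so a hit is a prompt to check that kernel's Kconfig, not necessarily an error.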

This needs another kernel rebuild (rm -rf build/cache/kernel-arm64-generic,
sudo make kernel-arm64) on the Odroid. After that we should have a fully
working KubeSolo OS v0.3 on ARM64 generic. At that point the only thing
left is to tag v0.3.1 and verify that the rewritten release.yaml workflow
publishes both arches automatically.
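After booting the rebuilt image, a quick check that the new modules were actually loaded could look like the sketch below. It assumes the modules follow the usual naming (CONFIG_NFT_NUMGEN -> nft_numgen and so on, as with the existing nft_nat / nft_masq); the function name is illustrative.

```shell
#!/bin/sh
# Print each named module that is absent from a /proc/modules-style file.
# Usage: unloaded_modules /proc/modules nft_numgen nft_hash ...
unloaded_modules() {
    modules_file="$1"
    shift
    for mod in "$@"; do
        # /proc/modules lists the module name as the first field of each line.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}
```

On the device this would be called as `unloaded_modules /proc/modules nft_numgen nft_hash nft_limit nft_log`. Anything built into the kernel rather than built as a module will not appear in /proc/modules and would be reported here too.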

Note on runc-PATH log noise: containerd-shim-runc-v2 -info probes for
runc in $PATH and fails because KubeSolo's runc lives at
/var/lib/kubesolo/containerd/runc. This is cosmetic: actual container
creation uses an absolute path from the containerd config and works
fine (the CoreDNS container did start successfully). Will polish in v0.3.2.
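To tell this cosmetic case apart from a genuinely missing runc, the two lookups can be compared directly. A small sketch; the function name and output strings are illustrative, and the absolute path is passed in by the caller.

```shell
#!/bin/sh
# Report how runc can be found: via $PATH, only at the given absolute
# path, or not at all.
# Usage: runc_lookup /var/lib/kubesolo/containerd/runc
runc_lookup() {
    abs_path="$1"
    if command -v runc >/dev/null 2>&1; then
        echo "in PATH"
    elif [ -x "$abs_path" ]; then
        echo "absolute path only"
    else
        echo "not found"
    fi
}
```

On the system described above, the expected answer would be "absolute path only", which matches the harmless log noise.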
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -62,9 +62,9 @@ CONFIG_NF_TABLES_IPV6=y
 CONFIG_NF_TABLES_INET=y
 CONFIG_NF_TABLES_NETDEV=y
 
-# nftables expression modules used by KubeSolo's masquerade ruleset and
-# kube-proxy's nft-compat path. Listed in modules.list / modules-arm64.list
-# so init loads them at boot.
+# nftables expression modules used by KubeSolo's masquerade ruleset, the
+# kube-proxy nft backend (Kubernetes 1.34+), and the xtables compat path.
+# Listed in modules.list / modules-arm64.list so init loads them at boot.
 CONFIG_NFT_NAT=m
 CONFIG_NFT_MASQ=m
 CONFIG_NFT_CT=m
@@ -75,6 +75,18 @@ CONFIG_NFT_COMPAT=m
 CONFIG_NFT_FIB=m
 CONFIG_NFT_FIB_IPV4=m
 CONFIG_NFT_FIB_IPV6=m
+# numgen drives kube-proxy's random / round-robin endpoint LB:
+# `numgen random mod N vmap { ... }` in service rules.
+# Without it kube-proxy's nft sync fails with ENOENT on every service.
+CONFIG_NFT_NUMGEN=m
+# hash drives consistent-hash LB (sessionAffinity=ClientIP, etc.).
+CONFIG_NFT_HASH=m
+# objref / limit / log are used by various policy expressions kube-proxy and
+# CNI plugins emit. Including them pre-empts a future "could not process
+# rule" debug loop.
+CONFIG_NFT_OBJREF=m
+CONFIG_NFT_LIMIT=m
+CONFIG_NFT_LOG=m
 
 # IPv4 NAT bits NFT_MASQ depends on. Auto-selected on most kernels but we
 # pin them explicitly so olddefconfig doesn't strip them when the fragment