kubesolo-os

Author	SHA1	Message	Date
Adolfo Delorenzo	9fb894c5af	feat(update): pre-flight gates + deeper healthcheck + auto-rollback Some checks failed ARM64 Build / Build generic ARM64 disk image (push) Failing after 4s Details CI / Go Tests (push) Successful in 1m29s Details CI / Shellcheck (push) Successful in 48s Details CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m12s Details CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Has been cancelled Details Phase 8 of v0.3. Tightens the update lifecycle on both ends. Pre-flight (apply.go, before any download): - Free-space check on the passive partition: image size + 10% headroom must be available. Uses statfs(2) via the new pkg/partition.FreeBytes / HasFreeSpaceFor helpers (tests cover happy path, tiny request, huge request, missing path). Catches corrupted-FS and shrunk-partition cases before we destroy the existing slot data. - Node-block-label check: refuses if the local K8s node carries the updates.kubesolo.io/block=true label. New pkg/health.CheckNodeBlocked shells out to kubectl per the project's zero-deps stance. Silently bypassed when no kubeconfig is reachable (air-gap case). Skipped by --force. Healthcheck (extended via new pkg/health/extended.go + preflight.go): - CheckKubeSystemReady waits until every kube-system pod has held the Running phase for >= N seconds (default 30). Catches "started ok, will crash-loop" bugs that a single-shot phase check misses. - CheckProbeURL fetches an operator-supplied URL; 200 = pass. Wired through update.conf as healthcheck_url= and cloud-init updates.healthcheck_url. - CheckDiskWritable writes/fsyncs/reads a 1-KiB probe under /var/lib/kubesolo. Always runs in healthcheck so a wedged data partition fails fast. - pkg/health.Status grows KubeSystemReady, ProbeURL, DiskWritable booleans. Optional checks default to true in RunAll() so they don't block when unconfigured. health_test.go updated to the new 6-field shape. Auto-rollback (healthcheck.go): - state.UpdateState gains HealthCheckFailures (consecutive post-Activated failures). Reset on a clean pass. - --auto-rollback-after N (also auto_rollback_after= in update.conf) triggers env.ForceRollback() when the failure count reaches the threshold. State transitions to RolledBack with a descriptive LastError. The command still exits with the healthcheck error; the operator/init is expected to reboot. - Only fires while Phase == Activated. Doesn't second-guess a long-stable system that happens to fail one healthcheck. config / opts / cloud-init plumbing: - update.conf gains healthcheck_url= and auto_rollback_after= keys. - New CLI flags: --healthcheck-url, --auto-rollback-after, --kube-system-settle. - cloud-init full-config.yaml documents the new updates: subfields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 19:08:30 -06:00
Adolfo Delorenzo	dfed6ddba8	feat(update): channels, maintenance windows, min-version gate Some checks failed ARM64 Build / Build generic ARM64 disk image (push) Failing after 3s Details CI / Go Tests (push) Successful in 1m23s Details CI / Shellcheck (push) Successful in 46s Details CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Successful in 1m32s Details CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Successful in 1m15s Details Phase 6 of v0.3. The update agent now refuses to apply artifacts whose channel doesn't match local policy, whose architecture differs from the running host, or whose min_compatible_version is above the current version. It also refuses to apply outside a configured maintenance window unless --force is given. New package update/pkg/config: - config.Load parses /etc/kubesolo/update.conf (key=value, # comments, unknown keys ignored). Missing file is fine — fresh systems before cloud-init has run. - ParseWindow handles "HH:MM-HH:MM" plus the wrapping midnight case (e.g. "23:00-01:00"). Empty input -> AlwaysOpen (no constraint). Degenerate zero-length windows never match. - CompareVersions does a simple 3-component semver compare with the 'v' prefix optional and pre-release suffix ignored. - 14 unit tests total. update/pkg/image/image.UpdateMetadata gains three optional fields: - channel ("stable", "beta", ...) - min_compatible_version (refuse upgrade if current < this) - architecture ("amd64", "arm64", ...) update/cmd/opts.go reads update.conf and merges it into opts; explicit --server / --channel / --pubkey / --maintenance-window CLI flags override the file. New --force, --conf, --channel, --maintenance-window flags. Precedence: CLI > config file > package defaults. update/cmd/apply.go gains four gates in order: 1. Maintenance window — checked locally before any HTTP work; skipped with --force. 2. Channel — refused if metadata.channel doesn't match opts.Channel. 3. Architecture — refused if metadata.architecture != runtime.GOARCH. 4. Min compatible version — refused if FromVersion < min_compatible. All gate failures transition state to Failed with a clear LastError. cloud-init gains a top-level updates: block (Server, Channel, MaintenanceWindow, PubKey). cloud-init.ApplyUpdates writes /etc/kubesolo/update.conf from those fields on first boot. Empty block leaves any existing file alone (so hand-edited update.conf survives a reboot without cloud-init re-applying). 4 new tests cover empty / all / partial / parent-dir-creation cases. full-config.yaml example updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:21:46 -06:00
Adolfo Delorenzo	1b44c9d621	feat: bump KubeSolo to v1.1.5 + cross-arch CI workflow Some checks failed ARM64 Build / Build generic ARM64 disk image (push) Failing after 3s Details CI / Go Tests (push) Successful in 1m27s Details CI / Shellcheck (push) Failing after 50s Details CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Failing after 1m33s Details CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Failing after 1m15s Details Phase 4 of v0.3 — KubeSolo version bump and CI gating. KubeSolo v1.1.0 → v1.1.5 brings: - New flag --disable-ipv6 (v1.1.5) - New flag --db-wal-repair (v1.1.5) — important for power-loss resilience on edge appliances; surfaced as kubesolo.db-wal-repair in cloud-init - New flag --full (v1.1.4) — disables edge-optimised k8s overrides - Pod egress connectivity fix after reboot (v1.1.4) - Registry config persistence fix (v1.1.5) - k8s 1.34.7, CoreDNS 1.14.3, Go 1.26.2 All three new flags wired into cloud-init: config.go fields, kubesolo.go extra-flag emission, full-config.yaml example. Supply-chain hygiene: - Per-arch checksums: KUBESOLO_SHA256_AMD64 and KUBESOLO_SHA256_ARM64 in versions.env. Replaces the single shared KUBESOLO_SHA256 that couldn't meaningfully verify both binaries at once. - Checksum now applied to the tarball (the immutable upstream artifact) rather than the post-extract binary. CI: - New .gitea/workflows/build-arm64.yaml routes the full kernel + rootfs + disk-image build to the Odroid arm64-linux runner. Triggers on push to main, tags, and manual workflow_dispatch. The boot smoke test is continue-on-error because KubeSolo's first-boot image import deadline fires under QEMU TCG on the Odroid. VERSION bumped to 0.3.0-dev. CHANGELOG entry under [0.3.0-dev] captures all Phase 1-4 work + the known limitations documented in arm64-status.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:26:20 -06:00
Adolfo Delorenzo	61bd28c692	feat: cloud-init supports all documented KubeSolo CLI flags Some checks failed CI / Go Tests (push) Has been cancelled Details CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Has been cancelled Details CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Has been cancelled Details CI / Shellcheck (push) Has been cancelled Details Add missing flags (--local-storage-shared-path, --debug, --pprof-server, --portainer-edge-id, --portainer-edge-key, --portainer-edge-async) so all 10 documented KubeSolo parameters can be configured via cloud-init YAML. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 15:49:31 -06:00
Adolfo Delorenzo	d9ac58418d	fix: macOS dev VM, CA certs, DNS fallback, Portainer Edge integration Some checks failed CI / Go Tests (push) Has been cancelled Details CI / Build Go Binaries (amd64, linux, linux-amd64) (push) Has been cancelled Details CI / Build Go Binaries (arm64, linux, linux-arm64) (push) Has been cancelled Details CI / Shellcheck (push) Has been cancelled Details - dev-vm.sh: rewrite for macOS (bsdtar ISO extraction, Homebrew mkfs.ext4 detection, direct kernel boot, TCG acceleration, port 8080 forwarding) - inject-kubesolo.sh: add CA certificates bundle from builder so containerd can verify TLS when pulling from registries (Docker Hub, etc.) - 50-network.sh: add DNS fallback (10.0.2.3 + 8.8.8.8) when DHCP client doesn't populate /etc/resolv.conf - 90-kubesolo.sh: serve kubeconfig via HTTP on port 8080 for reliable retrieval from host, add 127.0.0.1 and 10.0.2.15 to API server SANs - portainer.go: add headless Service to Edge Agent manifest (required for agent peer discovery DNS lookup) - 10-parse-cmdline.sh + init.sh: add kubesolo.edge_id/edge_key boot params - 20-persistent-mount.sh: auto-format unformatted data disks on first boot - hack/fix-portainer-service.sh: helper to patch running cluster Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 02:11:31 -06:00
Adolfo Delorenzo	49a37e30e8	feat: add production hardening — Ed25519 signing, Portainer Edge, SSH extension (Phase 4) Image signing: - Ed25519 sign/verify package (pure Go stdlib, zero deps) - genkey and sign CLI subcommands for build system - Optional --pubkey flag for verifying updates on apply - Signature URLs in update metadata (latest.json) Portainer Edge Agent: - cloud-init portainer.go module writes K8s manifest - Auto-deploys Edge Agent when portainer.edge-agent.enabled - Full RBAC (ServiceAccount, ClusterRoleBinding, Deployment) - 5 Portainer tests in portainer_test.go Production tooling: - SSH debug extension builder (hack/build-ssh-extension.sh) - Boot performance benchmark (test/benchmark/bench-boot.sh) - Resource usage benchmark (test/benchmark/bench-resources.sh) - Deployment guide (docs/deployment-guide.md) Test results: 50 update agent tests + 22 cloud-init tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 11:26:23 -06:00
Adolfo Delorenzo	d900fa920e	feat: add cloud-init Go parser (Phase 2) Implement a lightweight cloud-init system for first-boot configuration: - Go parser for YAML config (hostname, network, KubeSolo settings) - Static/DHCP network modes with DNS override - KubeSolo extra flags and API server SAN configuration - Portainer Edge Agent and air-gapped deployment support - New init stage 45-cloud-init.sh runs before network/hostname stages - Stages 50/60 skip gracefully when cloud-init has already applied - Build script compiles static Linux/amd64 binary (~2.7 MB) - 17 unit tests covering parsing, validation, and example files - Full documentation at docs/cloud-init.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 10:39:05 -06:00
Adolfo Delorenzo	e372df578b	feat: initial Phase 1 PoC scaffolding for KubeSolo OS Complete Phase 1 implementation of KubeSolo OS — an immutable, bootable Linux distribution built on Tiny Core Linux for running KubeSolo single-node Kubernetes. Build system: - Makefile with fetch, rootfs, initramfs, iso, disk-image targets - Dockerfile.builder for reproducible builds - Scripts to download Tiny Core, extract rootfs, inject KubeSolo, pack initramfs, and create bootable ISO/disk images Init system (10 POSIX sh stages): - Early mount (proc/sys/dev/cgroup2), cmdline parsing, persistent mount with bind-mounts, kernel module loading, sysctl, DHCP networking, hostname, clock sync, containerd prep, KubeSolo exec Shared libraries: - functions.sh (device wait, IP lookup, config helpers) - network.sh (static IP, config persistence, interface detection) - health.sh (containerd, API server, node readiness checks) - Emergency shell for boot failure debugging Testing: - QEMU boot test with serial log marker detection - K8s readiness test with kubectl verification - Persistence test (reboot + verify state survives) - Workload deployment test (nginx pod) - Local storage test (PVC + local-path provisioner) - Network policy test - Reusable run-vm.sh launcher Developer tools: - dev-vm.sh (interactive QEMU with port forwarding) - rebuild-initramfs.sh (fast iteration) - inject-ssh.sh (dropbear SSH for debugging) - extract-kernel-config.sh + kernel-audit.sh Documentation: - Full design document with architecture research - Boot flow documentation covering all 10 init stages - Cloud-init examples (DHCP, static IP, Portainer Edge, air-gapped) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 10:18:42 -06:00

8 Commits