kubesolo-os/docs/deployment-guide.md
Adolfo Delorenzo 49a37e30e8 feat: add production hardening — Ed25519 signing, Portainer Edge, SSH extension (Phase 4)
Image signing:
- Ed25519 sign/verify package (pure Go stdlib, zero deps)
- genkey and sign CLI subcommands for build system
- Optional --pubkey flag for verifying updates on apply
- Signature URLs in update metadata (latest.json)

Portainer Edge Agent:
- cloud-init portainer.go module writes K8s manifest
- Auto-deploys Edge Agent when portainer.edge-agent.enabled
- Full RBAC (ServiceAccount, ClusterRoleBinding, Deployment)
- 5 Portainer tests in portainer_test.go

Production tooling:
- SSH debug extension builder (hack/build-ssh-extension.sh)
- Boot performance benchmark (test/benchmark/bench-boot.sh)
- Resource usage benchmark (test/benchmark/bench-resources.sh)
- Deployment guide (docs/deployment-guide.md)

Test results: 50 update agent tests + 22 cloud-init tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 11:26:23 -06:00

# KubeSolo OS — Deployment Guide
This guide covers deploying KubeSolo OS to physical hardware and virtual machines,
including first-boot configuration, update signing, and Portainer Edge integration.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Building](#building)
- [Installation Methods](#installation-methods)
- [First-Boot Configuration (Cloud-Init)](#first-boot-configuration)
- [Update Signing](#update-signing)
- [Portainer Edge Integration](#portainer-edge-integration)
- [SSH Debug Access](#ssh-debug-access)
- [Monitoring and Health Checks](#monitoring-and-health-checks)
- [Troubleshooting](#troubleshooting)
---
## Prerequisites
**Hardware requirements:**
- x86_64 processor
- 512 MB RAM minimum (1 GB recommended)
- 8 GB storage minimum (16 GB recommended)
- Network interface (wired or WiFi with supported chipset)

**Build requirements:**
- Linux or macOS host
- Docker (for reproducible builds) or: bash, make, cpio, gzip, xorriso, Go 1.22+
- QEMU (for testing)
---
## Building
### Quick build (ISO)
```bash
git clone https://github.com/portainer/kubesolo-os.git
cd kubesolo-os
make fetch # Download Tiny Core + KubeSolo
make iso # Build bootable ISO
```
Output: `output/kubesolo-os-<version>.iso`
### Disk image (for persistent installations)
```bash
make disk-image # Build raw disk with A/B partitions
```
Output: `output/kubesolo-os-<version>.img`
### Reproducible build (Docker)
```bash
make docker-build
```
---
## Installation Methods
### USB Flash Drive
```bash
# Write disk image to USB (replace /dev/sdX with your device)
sudo dd if=output/kubesolo-os-0.1.0.img of=/dev/sdX bs=4M status=progress
sync
```
### Virtual Machine (QEMU/KVM)
```bash
# Quick launch for testing
make dev-vm
# Or manually (-enable-kvm requires a Linux host with KVM):
qemu-system-x86_64 -m 1024 -smp 2 \
  -enable-kvm -cpu host \
  -drive file=output/kubesolo-os-0.1.0.img,format=raw,if=virtio \
  -net nic,model=virtio \
  -net user,hostfwd=tcp::6443-:6443,hostfwd=tcp::2222-:22 \
  -nographic
```
### Cloud / Hypervisor
Convert the raw image for your platform:
```bash
# VMware
qemu-img convert -f raw -O vmdk output/kubesolo-os-0.1.0.img kubesolo-os.vmdk
# VirtualBox
qemu-img convert -f raw -O vdi output/kubesolo-os-0.1.0.img kubesolo-os.vdi
# Hyper-V
qemu-img convert -f raw -O vhdx output/kubesolo-os-0.1.0.img kubesolo-os.vhdx
```
---
## First-Boot Configuration
KubeSolo OS uses a simplified cloud-init system for first-boot configuration.
Place the config file on the data partition before first boot.
### Config file location
```
/mnt/data/etc-kubesolo/cloud-init.yaml
```
For ISO boot, the config can be provided via a secondary drive or kernel parameter:
```
kubesolo.cloudinit=/path/to/cloud-init.yaml
```
### Basic DHCP configuration
```yaml
hostname: kubesolo-node-01
network:
  mode: dhcp
kubesolo:
  local-storage: true
```
### Static IP configuration
```yaml
hostname: kubesolo-prod-01
network:
  mode: static
  interface: eth0
  address: 192.168.1.100/24
  gateway: 192.168.1.1
  dns:
    - 8.8.8.8
    - 1.1.1.1
kubesolo:
  local-storage: true
  apiserver-extra-sans:
    - 192.168.1.100
    - kubesolo-prod-01.local
```
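
A static configuration is easy to get subtly wrong, for example with a gateway that lies outside the node's subnet. The following is a minimal pre-flight sketch, not part of KubeSolo OS: `validateStatic` is a hypothetical helper that checks the `address` and `gateway` values from the example above using Go's standard `net/netip` package.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateStatic is a hypothetical helper (not a KubeSolo OS API).
// It checks that address is a valid CIDR and that gateway is a
// different host inside the same subnet.
func validateStatic(address, gateway string) error {
	prefix, err := netip.ParsePrefix(address)
	if err != nil {
		return fmt.Errorf("address %q: %w", address, err)
	}
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return fmt.Errorf("gateway %q: %w", gateway, err)
	}
	if !prefix.Masked().Contains(gw) {
		return fmt.Errorf("gateway %s is outside subnet %s", gw, prefix.Masked())
	}
	if gw == prefix.Addr() {
		return fmt.Errorf("gateway %s equals the node address", gw)
	}
	return nil
}

func main() {
	// Values taken from the static IP example above.
	if err := validateStatic("192.168.1.100/24", "192.168.1.1"); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("static network settings look consistent")
}
```

Running a check like this on the build machine before writing `cloud-init.yaml` to the data partition catches addressing mistakes that would otherwise only surface as an unreachable node.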
### Air-gapped deployment
```yaml
hostname: airgap-node
network:
  mode: static
  address: 10.0.0.50/24
  gateway: 10.0.0.1
  dns:
    - 10.0.0.1
kubesolo:
  local-storage: true
  extra-flags: "--disable=traefik --disable=servicelb"
airgap:
  import-images: true
  images-dir: /mnt/data/images
```
Pre-load container images by placing tar archives in `/mnt/data/images/`.

---
## Update Signing
KubeSolo OS supports Ed25519 signature verification for update images.
This ensures only authorized images can be applied to your devices.
### Generate a signing key pair
```bash
# On your build machine (keep private key secure!)
cd update && go run . genkey
```
Output:
```
Public key (hex): <64-char hex string>
Private key (hex): <128-char hex string>
Save the public key to /etc/kubesolo/update-pubkey.hex on the device.
Keep the private key secure and offline - use it only for signing updates.
```
Save the private key to a secure location (e.g., `signing-key.hex`).
Save the public key to `update-pubkey.hex`.
### Sign update images
```bash
# Sign the kernel and initramfs
cd update && go run . sign --key /path/to/signing-key.hex \
../output/vmlinuz ../output/kubesolo-os.gz
```
This produces `.sig` files alongside each image.
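
To illustrate what key generation, signing, and verification amount to, here is a minimal sketch using Go's standard `crypto/ed25519` package. It is not the project's `update` package: the function names are hypothetical, and signing the raw file bytes and hex-encoding the signature are assumptions. The key sizes do match the `genkey` output above (32-byte public key = 64 hex characters, 64-byte private key = 128 hex characters).

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// generateKeyPairHex returns a fresh Ed25519 key pair, hex-encoded.
func generateKeyPairHex() (pubHex, privHex string, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return "", "", err
	}
	return hex.EncodeToString(pub), hex.EncodeToString(priv), nil
}

// signHex signs data with a hex-encoded private key and returns
// the signature as hex (the on-disk .sig format is an assumption).
func signHex(privHex string, data []byte) (string, error) {
	raw, err := hex.DecodeString(privHex)
	if err != nil {
		return "", err
	}
	if len(raw) != ed25519.PrivateKeySize {
		return "", fmt.Errorf("private key: got %d bytes, want %d", len(raw), ed25519.PrivateKeySize)
	}
	return hex.EncodeToString(ed25519.Sign(ed25519.PrivateKey(raw), data)), nil
}

// verifyHex reports whether sigHex is a valid signature of data
// under the hex-encoded public key. Malformed input yields false.
func verifyHex(pubHex, sigHex string, data []byte) bool {
	pub, err := hex.DecodeString(pubHex)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return false
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(pub), data, sig)
}

func main() {
	pub, priv, err := generateKeyPairHex()
	if err != nil {
		fmt.Println("keygen failed:", err)
		return
	}
	image := []byte("stand-in for the kernel or initramfs bytes")
	sig, err := signHex(priv, image)
	if err != nil {
		fmt.Println("sign failed:", err)
		return
	}
	fmt.Println("valid image accepted:", verifyHex(pub, sig, image))
	fmt.Println("tampered image accepted:", verifyHex(pub, sig, append(image, 'x')))
}
```

Only the public key is needed on the device, so a compromised device cannot be used to sign new images.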
### Deploy the public key
Place the public key on the device's data partition:
```
/mnt/data/etc-kubesolo/update-pubkey.hex
```
Or embed it in the cloud-init config on the data partition.
### Update server layout
Your update server should serve:
```
/latest.json # Update metadata
/vmlinuz # Kernel
/vmlinuz.sig # Kernel signature
/kubesolo-os.gz # Initramfs
/kubesolo-os.gz.sig # Initramfs signature
```
Example `latest.json`:
```json
{
  "version": "0.2.0",
  "vmlinuz_url": "https://updates.example.com/v0.2.0/vmlinuz",
  "vmlinuz_sha256": "<sha256-hex>",
  "vmlinuz_sig_url": "https://updates.example.com/v0.2.0/vmlinuz.sig",
  "initramfs_url": "https://updates.example.com/v0.2.0/kubesolo-os.gz",
  "initramfs_sha256": "<sha256-hex>",
  "initramfs_sig_url": "https://updates.example.com/v0.2.0/kubesolo-os.gz.sig",
  "release_notes": "Bug fixes and security updates",
  "release_date": "2025-01-15"
}
```
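
Before publishing `latest.json`, it can help to check it for missing or malformed fields. The sketch below decodes the fields shown in the example and applies a few sanity checks. The struct, the function names, and the validation policy (HTTPS-only URLs, signature URLs treated as optional) are assumptions for illustration, not the update agent's actual behavior.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/url"
)

// UpdateMetadata mirrors the fields of the example latest.json.
type UpdateMetadata struct {
	Version         string `json:"version"`
	VmlinuzURL      string `json:"vmlinuz_url"`
	VmlinuzSHA256   string `json:"vmlinuz_sha256"`
	VmlinuzSigURL   string `json:"vmlinuz_sig_url"`
	InitramfsURL    string `json:"initramfs_url"`
	InitramfsSHA256 string `json:"initramfs_sha256"`
	InitramfsSigURL string `json:"initramfs_sig_url"`
	ReleaseNotes    string `json:"release_notes"`
	ReleaseDate     string `json:"release_date"`
}

// validDigest reports whether s is a hex-encoded 32-byte SHA-256 digest.
func validDigest(s string) bool {
	b, err := hex.DecodeString(s)
	return err == nil && len(b) == 32
}

// validURL reports whether s is an absolute https URL (assumed policy).
func validURL(s string) bool {
	u, err := url.Parse(s)
	return err == nil && u.Scheme == "https" && u.Host != ""
}

// parseMetadata decodes raw JSON and rejects obviously broken metadata.
func parseMetadata(raw []byte) (UpdateMetadata, error) {
	var m UpdateMetadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return m, fmt.Errorf("decode metadata: %w", err)
	}
	if m.Version == "" {
		return m, fmt.Errorf("missing version")
	}
	images := []struct{ name, url, digest, sigURL string }{
		{"vmlinuz", m.VmlinuzURL, m.VmlinuzSHA256, m.VmlinuzSigURL},
		{"initramfs", m.InitramfsURL, m.InitramfsSHA256, m.InitramfsSigURL},
	}
	for _, img := range images {
		if !validURL(img.url) {
			return m, fmt.Errorf("%s: invalid URL %q", img.name, img.url)
		}
		if !validDigest(img.digest) {
			return m, fmt.Errorf("%s: invalid SHA-256 digest", img.name)
		}
		// Signature URLs are only checked when present.
		if img.sigURL != "" && !validURL(img.sigURL) {
			return m, fmt.Errorf("%s: invalid signature URL %q", img.name, img.sigURL)
		}
	}
	return m, nil
}

func main() {
	// Placeholder digest: SHA-256 of the string "abc".
	const d = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	raw := []byte(`{"version":"0.2.0",` +
		`"vmlinuz_url":"https://updates.example.com/v0.2.0/vmlinuz","vmlinuz_sha256":"` + d + `",` +
		`"initramfs_url":"https://updates.example.com/v0.2.0/kubesolo-os.gz","initramfs_sha256":"` + d + `"}`)
	m, err := parseMetadata(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("metadata OK for version", m.Version)
}
```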
### Apply a signed update
```bash
kubesolo-update apply \
--server https://updates.example.com \
--pubkey /etc/kubesolo/update-pubkey.hex
```
---
## Portainer Edge Integration
KubeSolo OS can automatically deploy the Portainer Edge Agent for remote
management through Portainer Business Edition.
### Setup in Portainer
1. Log in to your Portainer Business instance
2. Go to **Environments** → **Add Environment** → **Edge Agent**
3. Select **Kubernetes** as the environment type
4. Copy the **Edge ID** and **Edge Key** values
### Cloud-init configuration
```yaml
hostname: edge-node-01
network:
  mode: dhcp
kubesolo:
  local-storage: true
portainer:
  edge-agent:
    enabled: true
    edge-id: "your-edge-id-from-portainer"
    edge-key: "your-edge-key-from-portainer"
    portainer-url: "https://portainer.yourcompany.com"
    # Optional: pin agent version
    # image: portainer/agent:2.20.0
```
### Manual deployment
If not using cloud-init, deploy the Edge Agent manually after boot:
```bash
# Create namespace
kubesolo kubectl create namespace portainer
# Apply the edge agent manifest (generated from template)
kubesolo kubectl apply -f /path/to/portainer-edge-agent.yaml
```
### Verify connection
```bash
kubesolo kubectl -n portainer get pods
# Should show portainer-agent pod in Running state
```
The node should appear in your Portainer dashboard within a few minutes.

---
## SSH Debug Access
For development and debugging, you can add SSH access using the
optional ssh-debug extension.
### Build the SSH extension
```bash
./hack/build-ssh-extension.sh --pubkey ~/.ssh/id_ed25519.pub
```
### Load on a running system
```bash
# Copy to device
scp output/ssh-debug.tcz root@<device>:/mnt/data/extensions/
# Load (no reboot required)
unsquashfs -f -d / /mnt/data/extensions/ssh-debug.tcz
/usr/lib/kubesolo-os/init.d/85-ssh.sh
```
### Quick inject for development
```bash
# Inject into rootfs before building ISO
./hack/inject-ssh.sh
make initramfs iso
```
> **Warning:** SSH access should NEVER be enabled in production. The debug
> extension uses key-based auth only and has no password, but it still
> expands the attack surface.
---
## Monitoring and Health Checks
### Automatic health checks
KubeSolo OS runs a post-boot health check that verifies:
- containerd is running
- Kubernetes API server responds
- Node reports Ready status

On success, the health check marks the boot as successful in GRUB,
preventing automatic rollback.
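
The interaction between the health check and rollback can be modeled as a small state machine. The sketch below is a simplified in-memory model, not the actual GRUB environment handling: the `bootState` type, its field names, and the exact counter semantics are assumptions. The limit of 3 attempts is taken from the Troubleshooting section.

```go
package main

import "fmt"

// maxAttempts is the number of unconfirmed boots allowed before rollback.
const maxAttempts = 3

// bootState is a hypothetical stand-in for the variables kept in the
// GRUB environment; the real variable names may differ.
type bootState struct {
	Active   string // slot to boot, "A" or "B"
	Healthy  bool   // set once the health check has confirmed the slot
	Attempts int    // unconfirmed boot attempts remaining
}

// other returns the opposite slot.
func other(slot string) string {
	if slot == "A" {
		return "B"
	}
	return "A"
}

// selectSlot models the decision made at each boot and returns the
// slot that would be started.
func (b *bootState) selectSlot() string {
	switch {
	case b.Healthy:
		// Slot already confirmed: boot it without touching the counter.
	case b.Attempts > 0:
		// Unconfirmed slot: spend one attempt.
		b.Attempts--
	default:
		// Attempts exhausted: fall back to the previous, known good slot.
		b.Active = other(b.Active)
		b.Healthy = true
	}
	return b.Active
}

// markHealthy models the post-boot health check succeeding.
func (b *bootState) markHealthy() {
	b.Healthy = true
	b.Attempts = maxAttempts
}

func main() {
	// A freshly applied update in slot B that never passes its health check.
	s := &bootState{Active: "B", Attempts: maxAttempts}
	for boot := 1; boot <= 4; boot++ {
		fmt.Printf("boot %d: slot %s\n", boot, s.selectSlot())
	}
}
```

In this model the first three boots try slot B and the fourth falls back to slot A. Calling `markHealthy` after any successful boot stops the countdown.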
### Deploy the health check CronJob
```bash
kubesolo kubectl apply -f update/deploy/update-cronjob.yaml
```
This deploys:
- A CronJob that checks for updates every 6 hours
- A health check Job that runs at boot
### Manual health check
```bash
kubesolo-update healthcheck --timeout 120
```
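
Conceptually, a health check with a timeout is a polling loop: run every probe, and if any fails, wait and retry until the deadline. The sketch below shows that pattern only. The `check` type and `waitHealthy` function are hypothetical, and the probes are stubs rather than real containerd, API server, or node status queries.

```go
package main

import (
	"fmt"
	"time"
)

// check is a hypothetical named probe; Run returns true when healthy.
type check struct {
	Name string
	Run  func() bool
}

// waitHealthy polls all checks until they pass or the timeout elapses.
// It returns "" on success, or the name of the first failing check.
func waitHealthy(timeout, interval time.Duration, checks []check) string {
	deadline := time.Now().Add(timeout)
	for {
		failing := ""
		for _, c := range checks {
			if !c.Run() {
				failing = c.Name
				break
			}
		}
		if failing == "" {
			return ""
		}
		// Give up if another sleep would overshoot the deadline.
		if time.Now().Add(interval).After(deadline) {
			return failing
		}
		time.Sleep(interval)
	}
}

func main() {
	probes := 0
	checks := []check{
		{Name: "containerd", Run: func() bool { return true }},
		// Stub: the API server becomes reachable on the third probe.
		{Name: "apiserver", Run: func() bool { probes++; return probes >= 3 }},
		{Name: "node-ready", Run: func() bool { return true }},
	}
	if failing := waitHealthy(2*time.Second, 10*time.Millisecond, checks); failing != "" {
		fmt.Println("unhealthy:", failing)
		return
	}
	fmt.Println("healthy after", probes, "API server probes")
}
```

Reporting the first failing check by name makes a timeout actionable, since it points at the component to inspect in the Troubleshooting section.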
### Status check
```bash
kubesolo-update status
```
Shows:
- Active/passive slot
- Current version
- Boot counter status
- GRUB environment variables
---
## Troubleshooting
### Boot hangs at kernel loading
- Verify the ISO/image is not corrupted: check SHA256 against published hash
- Try adding `kubesolo.debug` to the kernel command line for verbose logging
- Try `kubesolo.shell` to drop to an emergency shell
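
For the integrity check in the first bullet, any SHA-256 tool works. As a portable alternative, here is a small Go sketch that streams a file through SHA-256 and compares the result with a published digest. The helper names are hypothetical, and the demo hashes a temporary file containing `abc` rather than a real image.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// fileSHA256 streams the file at path through SHA-256 and returns
// the digest as lowercase hex, without loading the file into memory.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// sameDigest compares two hex digests, ignoring case and surrounding
// whitespace (published hashes are often copied with a trailing newline).
func sameDigest(got, published string) bool {
	return strings.EqualFold(strings.TrimSpace(got), strings.TrimSpace(published))
}

func main() {
	// Demo input: a temporary file standing in for the ISO or disk image.
	tmp, err := os.CreateTemp("", "image-*.bin")
	if err != nil {
		fmt.Println("temp file:", err)
		return
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString("abc"); err != nil {
		fmt.Println("write:", err)
		return
	}
	tmp.Close()

	got, err := fileSHA256(tmp.Name())
	if err != nil {
		fmt.Println("hash:", err)
		return
	}
	// Well-known SHA-256 digest of "abc", used here as the published hash.
	const published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	fmt.Println("digest matches:", sameDigest(got, published))
}
```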
### KubeSolo fails to start
```bash
# Check KubeSolo logs
cat /var/log/kubesolo.log
# Verify containerd is running
pidof containerd
# Check if required kernel modules are loaded
lsmod | grep -E "overlay|br_netfilter|veth"
```
### Node not reaching Ready state
```bash
# Check node status
kubesolo kubectl get nodes -o wide
# Check system pods
kubesolo kubectl get pods -A
# Check the logs of a specific system pod
kubesolo kubectl logs -n kube-system <pod-name>
```
### Update fails with signature error
```bash
# Verify the public key matches the signing key
cat /etc/kubesolo/update-pubkey.hex
# Test signature verification manually
kubesolo-update check --server https://updates.example.com
```
### Rollback to previous version
```bash
# Force rollback to the other slot
kubesolo-update rollback --grubenv /boot/grub/grubenv
# Reboot to apply
reboot
```
### Emergency shell
Boot with the `kubesolo.shell` kernel parameter to drop to an emergency shell.
Separately, if boot fails after 3 attempts, GRUB automatically rolls back to
the last known good slot.