# Portainer AI Templates (v2)

> **26 production-ready AI/ML Docker Compose stacks for Portainer** — filling the AI gap in the official v3 template library. Aligned with an AI infrastructure positioning strategy for Portainer.

## Background

The official [Portainer v3 templates](https://raw.githubusercontent.com/portainer/templates/v3/templates.json) contain **71 templates** with **zero pure AI/ML deployments**. This repository provides a curated, Portainer-compatible template set covering the entire AI infrastructure stack — from edge inference to distributed training to governed ML pipelines.

See [docs/AI_GAP_ANALYSIS.md](docs/AI_GAP_ANALYSIS.md) for the full gap analysis.

## Homepage Alignment

These templates map directly to the AI infrastructure positioning pillars:

| Mock-Up Pillar | Templates Covering It |
|---|---|
| **GPU-Aware Fleet Management** | Triton, vLLM, NVIDIA NIM, Ray Cluster, Ollama, LocalAI |
| **Model Lifecycle Governance** | MLflow + MinIO (Production MLOps), Prefect, BentoML, Label Studio |
| **Edge AI Deployment** | ONNX Runtime (CPU/edge profile), Triton, DeepStream |
| **Self-Service AI Stacks** | Open WebUI, Langflow, Flowise, n8n AI, Jupyter GPU |
| **LLM Fine-Tune** (diagram) | Ray Cluster (distributed training) |
| **RAG Pipeline** (diagram) | Qdrant, ChromaDB, Weaviate + Langflow/Flowise |
| **Vision Model** (diagram) | DeepStream, ComfyUI, Stable Diffusion WebUI |
| **Anomaly Detection** (diagram) | DeepStream (video analytics), Triton (custom models) |
## Quick Start

### Option A: Use as Custom Template URL in Portainer

1. In Portainer, go to **Settings > App Templates**
2. Set the URL to:

   ```
   https://git.oe74.net/adelorenzo/portainer_scripts/raw/branch/master/ai-templates/portainer-ai-templates.json
   ```

3. Click **Save** — all 26 AI templates appear in your App Templates list
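
Before pointing Portainer at the URL, it can be worth sanity-checking the template file. Below is a minimal sketch, assuming the standard Portainer v3 template format (a top-level `version` key set to `"3"` and a `templates` array, with `type: 3` marking Docker Compose stacks); the inline sample is illustrative, not the real file:

```python
import json

def check_templates(raw: str) -> int:
    """Parse a Portainer v3 templates.json payload and return the
    number of Compose stack templates (type 3) it defines."""
    doc = json.loads(raw)
    assert doc.get("version") == "3", "expected a v3 template file"
    templates = doc["templates"]
    # Every template needs at least a title and a type.
    for t in templates:
        assert "title" in t and "type" in t
    return sum(1 for t in templates if t["type"] == 3)

# Tiny inline sample in the v3 shape (illustrative, not the real file)
sample = json.dumps({
    "version": "3",
    "templates": [
        {"type": 3, "title": "Ollama", "description": "Local LLM engine"},
        {"type": 3, "title": "Qdrant", "description": "Vector database"},
    ],
})
print(check_templates(sample))  # → 2
```

The same check can be pointed at the hosted JSON (e.g. fetched with `curl`) to confirm the file parses and that all 26 stacks are present before saving the URL in Portainer.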

### Option B: Deploy Individual Stacks

```bash
cd stacks/ollama
docker compose up -d
```
## Template Catalog

### LLM Inference and Model Serving

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 1 | **Ollama** | 11434 | Yes | Local LLM engine — Llama, Mistral, Qwen, Gemma, Phi |
| 2 | **Open WebUI + Ollama** | 3000 | Yes | ChatGPT-like UI bundled with Ollama backend |
| 3 | **LocalAI** | 8080 | Yes | Drop-in OpenAI API replacement |
| 4 | **vLLM** | 8000 | Yes | High-throughput serving with PagedAttention |
| 5 | **Text Gen WebUI** | 7860 | Yes | Comprehensive LLM interface (oobabooga) |
| 6 | **LiteLLM Proxy** | 4000 | No | Unified API gateway for 100+ LLM providers |
| 26 | **NVIDIA NIM** | 8000 | Yes | Enterprise TensorRT-LLM optimized inference |

### Production Inference Serving

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 19 | **NVIDIA Triton** | 8000 | Yes | Multi-framework inference server (TensorRT, ONNX, PyTorch, TF) |
| 20 | **ONNX Runtime** | 8001 | Optional | Lightweight inference with GPU and CPU/edge profiles |
| 24 | **BentoML** | 3000 | Yes | Model packaging and serving with metrics |

### Image and Video Generation

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 7 | **ComfyUI** | 8188 | Yes | Node-based Stable Diffusion workflow engine |
| 8 | **Stable Diffusion WebUI** | 7860 | Yes | AUTOMATIC1111 interface for image generation |

### Industrial AI and Computer Vision

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 21 | **NVIDIA DeepStream** | 8554 | Yes | Video analytics for inspection, anomaly detection, smart factory |

### Distributed Training

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 22 | **Ray Cluster** | 8265 | Yes | Head + workers for LLM fine-tuning, distributed training, Ray Serve |

### AI Agents and Workflows

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 9 | **Langflow** | 7860 | No | Visual multi-agent and RAG pipeline builder |
| 10 | **Flowise** | 3000 | No | Drag-and-drop LLM chatflow builder |
| 11 | **n8n (AI-Enabled)** | 5678 | No | Workflow automation with AI agent nodes |

### Vector Databases

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 12 | **Qdrant** | 6333 | No | High-performance vector similarity search |
| 13 | **ChromaDB** | 8000 | No | AI-native embedding database |
| 14 | **Weaviate** | 8080 | No | Vector DB with built-in vectorization modules |

### ML Operations and Governance

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 15 | **MLflow** | 5000 | No | Experiment tracking and model registry (SQLite) |
| 25 | **MLflow + MinIO** | 5000 | No | Production MLOps: PostgreSQL + S3 artifact store |
| 23 | **Prefect** | 4200 | No | Governed ML pipeline orchestration with audit logging |
| 16 | **Label Studio** | 8080 | No | Multi-type data labeling platform |
| 17 | **Jupyter (GPU/PyTorch)** | 8888 | Yes | GPU-accelerated notebooks |

### Speech and Audio

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 18 | **Whisper ASR** | 9000 | Yes | Speech-to-text API server |
## GPU Requirements

Templates marked **GPU: Yes** require:

- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker configured with the `nvidia` runtime

**Edge deployments (ONNX Runtime CPU profile):** No GPU required — runs on ARM or x86 with constrained CPU/memory limits.

For AMD GPUs (ROCm), modify the `deploy.resources` section to use ROCm-compatible images and remove the NVIDIA device reservation.
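
For reference, the NVIDIA device reservation these stacks rely on typically looks like the following Compose fragment (a sketch; the service name and image here are just an example, and each stack defines its own):

```yaml
services:
  ollama:                  # example service; each stack names its own
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1     # or reserve every GPU with `count: all`
              capabilities: [gpu]
```

This `deploy.resources.reservations.devices` block is what the ROCm note above refers to removing or replacing.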

## File Structure

```
ai-templates/
├── portainer-ai-templates.json   # Portainer v3 template definition (26 templates)
├── README.md
├── docs/
│   └── AI_GAP_ANALYSIS.md        # Analysis of official templates gap
└── stacks/
    ├── ollama/                   # LLM inference
    ├── open-webui/
    ├── localai/
    ├── vllm/
    ├── text-generation-webui/
    ├── litellm/
    ├── nvidia-nim/               # v2: Enterprise inference
    ├── triton/                   # v2: Production inference serving
    ├── onnx-runtime/             # v2: Edge-friendly inference
    ├── bentoml/                  # v2: Model packaging + serving
    ├── deepstream/               # v2: Industrial computer vision
    ├── ray-cluster/              # v2: Distributed training
    ├── prefect/                  # v2: Governed ML pipelines
    ├── minio-mlops/              # v2: Production MLOps stack
    ├── comfyui/                  # Image generation
    ├── stable-diffusion-webui/
    ├── langflow/                 # AI agents
    ├── flowise/
    ├── n8n-ai/
    ├── qdrant/                   # Vector databases
    ├── chromadb/
    ├── weaviate/
    ├── mlflow/                   # ML operations
    ├── label-studio/
    ├── jupyter-gpu/
    └── whisper/                  # Speech
```

## Changelog

### v2 (March 2026)

- Added 8 templates to close alignment gap with AI infrastructure positioning:
  - **NVIDIA Triton Inference Server** — production multi-framework inference
  - **ONNX Runtime Server** — lightweight edge inference with CPU/GPU profiles
  - **NVIDIA DeepStream** — industrial computer vision and video analytics
  - **Ray Cluster (GPU)** — distributed training and fine-tuning
  - **Prefect** — governed ML pipeline orchestration
  - **BentoML** — model packaging and serving
  - **MLflow + MinIO** — production MLOps with S3 artifact governance
  - **NVIDIA NIM** — enterprise-optimized LLM inference

### v1 (March 2026)

- Initial 18 AI templates covering LLM inference, image generation, agents, vector DBs, MLOps, and speech

## License

These templates reference publicly available Docker images from their respective maintainers. Each tool has its own license — refer to the individual project documentation.

---

*Portainer AI Templates by Adolfo De Lorenzo — March 2026*