# Portainer AI Templates (v2)

> **26 production-ready AI/ML Docker Compose stacks for Portainer** — filling the AI gap in the official v3 template library. Aligned with an AI infrastructure positioning strategy for Portainer.

## Background

The official [Portainer v3 templates](https://raw.githubusercontent.com/portainer/templates/v3/templates.json) contain **71 templates** with **zero pure AI/ML deployments**. This repository provides a curated, Portainer-compatible template set covering the entire AI infrastructure stack — from edge inference to distributed training to governed ML pipelines.

See [docs/AI_GAP_ANALYSIS.md](docs/AI_GAP_ANALYSIS.md) for the full gap analysis.

## Homepage Alignment

These templates map directly to the AI infrastructure positioning pillars:

| Mock-Up Pillar | Templates Covering It |
|---|---|
| **GPU-Aware Fleet Management** | Triton, vLLM, NVIDIA NIM, Ray Cluster, Ollama, LocalAI |
| **Model Lifecycle Governance** | MLflow + MinIO (Production MLOps), Prefect, BentoML, Label Studio |
| **Edge AI Deployment** | ONNX Runtime (CPU/edge profile), Triton, DeepStream |
| **Self-Service AI Stacks** | Open WebUI, Langflow, Flowise, n8n AI, Jupyter GPU |
| **LLM Fine-Tune** (diagram) | Ray Cluster (distributed training) |
| **RAG Pipeline** (diagram) | Qdrant, ChromaDB, Weaviate + Langflow/Flowise |
| **Vision Model** (diagram) | DeepStream, ComfyUI, Stable Diffusion WebUI |
| **Anomaly Detection** (diagram) | DeepStream (video analytics), Triton (custom models) |
## Quick Start

### Option A: Use as Custom Template URL in Portainer

1. In Portainer, go to **Settings > App Templates**
2. Set the URL to:

   ```
   https://git.oe74.net/adelorenzo/portainer_scripts/raw/branch/master/ai-templates/portainer-ai-templates.json
   ```

3. Click **Save** — all 26 AI templates appear in your App Templates list
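
Before pointing Portainer at the URL, it can be worth sanity-checking the template file. Below is a minimal sketch, assuming the standard Portainer v3 template format (a top-level `version` key set to `"3"` and a `templates` array, with `type: 3` marking Docker Compose stacks); the inline sample is illustrative, not the real file:

```python
import json

def check_templates(raw: str) -> int:
    """Parse a Portainer v3 templates.json payload and return the
    number of Compose stack templates (type 3) it defines."""
    doc = json.loads(raw)
    assert doc.get("version") == "3", "expected a v3 template file"
    templates = doc["templates"]
    # Every template needs at least a title and a type.
    for t in templates:
        assert "title" in t and "type" in t
    return sum(1 for t in templates if t["type"] == 3)

# Tiny inline sample in the v3 shape (illustrative, not the real file)
sample = json.dumps({
    "version": "3",
    "templates": [
        {"type": 3, "title": "Ollama", "description": "Local LLM engine"},
        {"type": 3, "title": "Qdrant", "description": "Vector database"},
    ],
})
print(check_templates(sample))  # → 2
```

The same check can be pointed at the hosted JSON (e.g. fetched with `curl`) to confirm the file parses and that all 26 stacks are present before saving the URL in Portainer.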

### Option B: Deploy Individual Stacks

```bash
cd stacks/ollama
docker compose up -d
```
## Template Catalog

### LLM Inference and Model Serving

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 1 | **Ollama** | 11434 | Yes | Local LLM engine — Llama, Mistral, Qwen, Gemma, Phi |
| 2 | **Open WebUI + Ollama** | 3000 | Yes | ChatGPT-like UI bundled with Ollama backend |
| 3 | **LocalAI** | 8080 | Yes | Drop-in OpenAI API replacement |
| 4 | **vLLM** | 8000 | Yes | High-throughput serving with PagedAttention |
| 5 | **Text Gen WebUI** | 7860 | Yes | Comprehensive LLM interface (oobabooga) |
| 6 | **LiteLLM Proxy** | 4000 | No | Unified API gateway for 100+ LLM providers |
| 26 | **NVIDIA NIM** | 8000 | Yes | Enterprise TensorRT-LLM optimized inference |

### Production Inference Serving

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 19 | **NVIDIA Triton** | 8000 | Yes | Multi-framework inference server (TensorRT, ONNX, PyTorch, TF) |
| 20 | **ONNX Runtime** | 8001 | Optional | Lightweight inference with GPU and CPU/edge profiles |
| 24 | **BentoML** | 3000 | Yes | Model packaging and serving with metrics |

### Image and Video Generation

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 7 | **ComfyUI** | 8188 | Yes | Node-based Stable Diffusion workflow engine |
| 8 | **Stable Diffusion WebUI** | 7860 | Yes | AUTOMATIC1111 interface for image generation |

### Industrial AI and Computer Vision

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 21 | **NVIDIA DeepStream** | 8554 | Yes | Video analytics for inspection, anomaly detection, smart factory |

### Distributed Training

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 22 | **Ray Cluster** | 8265 | Yes | Head + workers for LLM fine-tuning, distributed training, Ray Serve |

### AI Agents and Workflows

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 9 | **Langflow** | 7860 | No | Visual multi-agent and RAG pipeline builder |
| 10 | **Flowise** | 3000 | No | Drag-and-drop LLM chatflow builder |
| 11 | **n8n (AI-Enabled)** | 5678 | No | Workflow automation with AI agent nodes |

### Vector Databases

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 12 | **Qdrant** | 6333 | No | High-performance vector similarity search |
| 13 | **ChromaDB** | 8000 | No | AI-native embedding database |
| 14 | **Weaviate** | 8080 | No | Vector DB with built-in vectorization modules |

### ML Operations and Governance

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 15 | **MLflow** | 5000 | No | Experiment tracking and model registry (SQLite) |
| 25 | **MLflow + MinIO** | 5000 | No | Production MLOps: PostgreSQL + S3 artifact store |
| 23 | **Prefect** | 4200 | No | Governed ML pipeline orchestration with audit logging |
| 16 | **Label Studio** | 8080 | No | Multi-type data labeling platform |
| 17 | **Jupyter (GPU/PyTorch)** | 8888 | Yes | GPU-accelerated notebooks |

### Speech and Audio

| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 18 | **Whisper ASR** | 9000 | Yes | Speech-to-text API server |
## GPU Requirements

Templates marked **GPU: Yes** require:

- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker configured with the `nvidia` runtime

**Edge deployments (ONNX Runtime CPU profile):** No GPU required — runs on ARM or x86 with constrained CPU/memory limits.

For AMD GPUs (ROCm), modify the `deploy.resources` section to use ROCm-compatible images and remove the NVIDIA device reservation.
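
For reference, the NVIDIA device reservation these stacks rely on typically looks like the following Compose fragment (a sketch; the service name and image here are just an example, and each stack defines its own):

```yaml
services:
  ollama:                  # example service; each stack names its own
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1     # or reserve every GPU with `count: all`
              capabilities: [gpu]
```

This `deploy.resources.reservations.devices` block is what the ROCm note above refers to removing or replacing.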

## File Structure

```
ai-templates/
├── portainer-ai-templates.json   # Portainer v3 template definition (26 templates)
├── README.md
├── docs/
│   └── AI_GAP_ANALYSIS.md        # Analysis of official templates gap
└── stacks/
    ├── ollama/                   # LLM inference
    ├── open-webui/
    ├── localai/
    ├── vllm/
    ├── text-generation-webui/
    ├── litellm/
    ├── nvidia-nim/               # v2: Enterprise inference
    ├── triton/                   # v2: Production inference serving
    ├── onnx-runtime/             # v2: Edge-friendly inference
    ├── bentoml/                  # v2: Model packaging + serving
    ├── deepstream/               # v2: Industrial computer vision
    ├── ray-cluster/              # v2: Distributed training
    ├── prefect/                  # v2: Governed ML pipelines
    ├── minio-mlops/              # v2: Production MLOps stack
    ├── comfyui/                  # Image generation
    ├── stable-diffusion-webui/
    ├── langflow/                 # AI agents
    ├── flowise/
    ├── n8n-ai/
    ├── qdrant/                   # Vector databases
    ├── chromadb/
    ├── weaviate/
    ├── mlflow/                   # ML operations
    ├── label-studio/
    ├── jupyter-gpu/
    └── whisper/                  # Speech
```

## Changelog

### v2 (March 2026)

- Added 8 templates to close alignment gap with AI infrastructure positioning:
  - **NVIDIA Triton Inference Server** — production multi-framework inference
  - **ONNX Runtime Server** — lightweight edge inference with CPU/GPU profiles
  - **NVIDIA DeepStream** — industrial computer vision and video analytics
  - **Ray Cluster (GPU)** — distributed training and fine-tuning
  - **Prefect** — governed ML pipeline orchestration
  - **BentoML** — model packaging and serving
  - **MLflow + MinIO** — production MLOps with S3 artifact governance
  - **NVIDIA NIM** — enterprise-optimized LLM inference

### v1 (March 2026)

- Initial 18 AI templates covering LLM inference, image generation, agents, vector DBs, MLOps, and speech

## License

These templates reference publicly available Docker images from their respective maintainers. Each tool has its own license — refer to the individual project documentation.

---

*Portainer AI Templates by Adolfo De Lorenzo — March 2026*