# Portainer AI Templates (v2)
26 production-ready AI/ML Docker Compose stacks for Portainer — filling the AI gap in the official v3 template library. Aligned with an AI infrastructure positioning strategy for Portainer.
## Background
The official Portainer v3 templates contain 71 templates with zero pure AI/ML deployments. This repository provides a curated, Portainer-compatible template set covering the entire AI infrastructure stack — from edge inference to distributed training to governed ML pipelines.
See docs/AI_GAP_ANALYSIS.md for the full gap analysis.
## Homepage Alignment
These templates map directly to the AI infrastructure positioning pillars:
| Mock-Up Pillar | Templates Covering It |
| --- | --- |
| GPU-Aware Fleet Management | Triton, vLLM, NVIDIA NIM, Ray Cluster, Ollama, LocalAI |
| Model Lifecycle Governance | MLflow + MinIO (Production MLOps), Prefect, BentoML, Label Studio |
| Edge AI Deployment | ONNX Runtime (CPU/edge profile), Triton, DeepStream |
| Self-Service AI Stacks | Open WebUI, Langflow, Flowise, n8n AI, Jupyter GPU |
| LLM Fine-Tune (diagram) | Ray Cluster (distributed training) |
| RAG Pipeline (diagram) | Qdrant, ChromaDB, Weaviate + Langflow/Flowise |
| Vision Model (diagram) | DeepStream, ComfyUI, Stable Diffusion WebUI |
| Anomaly Detection (diagram) | DeepStream (video analytics), Triton (custom models) |
## Quick Start
### Option A: Use as Custom Template URL in Portainer
- In Portainer, go to Settings > App Templates
- Set the URL to:
- Click Save — all 26 AI templates appear in your App Templates list
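For reference, the served file follows Portainer's v3 app-template JSON schema. Below is a minimal sketch of a single compose-stack entry (`"type": 3`); the title, repository URL, and stackfile path are illustrative placeholders, not this repository's actual values:

```json
{
  "version": "3",
  "templates": [
    {
      "type": 3,
      "title": "Ollama",
      "description": "Local LLM engine",
      "categories": ["AI/ML"],
      "platform": "linux",
      "repository": {
        "url": "https://github.com/example/portainer-ai-templates",
        "stackfile": "stacks/ollama.yml"
      }
    }
  ]
}
```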
### Option B: Deploy Individual Stacks
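Any stack can also be deployed directly with Docker Compose, outside Portainer. A sketch, assuming the repository URL and a `stacks/` directory layout shown here (both illustrative):

```shell
# Clone the repository and bring up a single stack (paths are illustrative)
git clone https://github.com/<org>/portainer-ai-templates.git
cd portainer-ai-templates
docker compose -f stacks/ollama.yml up -d
```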
## Template Catalog
### LLM Inference and Model Serving
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 1 | Ollama | 11434 | Yes | Local LLM engine — Llama, Mistral, Qwen, Gemma, Phi |
| 2 | Open WebUI + Ollama | 3000 | Yes | ChatGPT-like UI bundled with Ollama backend |
| 3 | LocalAI | 8080 | Yes | Drop-in OpenAI API replacement |
| 4 | vLLM | 8000 | Yes | High-throughput serving with PagedAttention |
| 5 | Text Gen WebUI | 7860 | Yes | Comprehensive LLM interface (oobabooga) |
| 6 | LiteLLM Proxy | 4000 | No | Unified API gateway for 100+ LLM providers |
| 26 | NVIDIA NIM | 8000 | Yes | Enterprise TensorRT-LLM optimized inference |
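As a quick smoke test once the Ollama stack (#1) is running, its REST API on port 11434 can be exercised with curl. A sketch; the model name is an example and must be pulled before generation:

```shell
# Pull a model, then run a one-off (non-streaming) generation
curl http://localhost:11434/api/pull -d '{"name": "llama3.2"}'
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Hello", "stream": false}'
```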
### Production Inference Serving
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 19 | NVIDIA Triton | 8000 | Yes | Multi-framework inference server (TensorRT, ONNX, PyTorch, TF) |
| 20 | ONNX Runtime | 8001 | Optional | Lightweight inference with GPU and CPU/edge profiles |
| 24 | BentoML | 3000 | Yes | Model packaging and serving with metrics |
### Image and Video Generation
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 7 | ComfyUI | 8188 | Yes | Node-based Stable Diffusion workflow engine |
| 8 | Stable Diffusion WebUI | 7860 | Yes | AUTOMATIC1111 interface for image generation |
### Industrial AI and Computer Vision
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 21 | NVIDIA DeepStream | 8554 | Yes | Video analytics for inspection, anomaly detection, smart factory |
### Distributed Training
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 22 | Ray Cluster | 8265 | Yes | Head + workers for LLM fine-tuning, distributed training, Ray Serve |
### AI Agents and Workflows
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 9 | Langflow | 7860 | No | Visual multi-agent and RAG pipeline builder |
| 10 | Flowise | 3000 | No | Drag-and-drop LLM chatflow builder |
| 11 | n8n (AI-Enabled) | 5678 | No | Workflow automation with AI agent nodes |
### Vector Databases
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 12 | Qdrant | 6333 | No | High-performance vector similarity search |
| 13 | ChromaDB | 8000 | No | AI-native embedding database |
| 14 | Weaviate | 8080 | No | Vector DB with built-in vectorization modules |
### ML Operations and Governance
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 15 | MLflow | 5000 | No | Experiment tracking and model registry (SQLite) |
| 25 | MLflow + MinIO | 5000 | No | Production MLOps: PostgreSQL + S3 artifact store |
| 23 | Prefect | 4200 | No | Governed ML pipeline orchestration with audit logging |
| 16 | Label Studio | 8080 | No | Multi-type data labeling platform |
| 17 | Jupyter (GPU/PyTorch) | 8888 | Yes | GPU-accelerated notebooks |
### Speech and Audio
| # | Template | Port | GPU | Description |
| --- | --- | --- | --- | --- |
| 18 | Whisper ASR | 9000 | Yes | Speech-to-text API server |
## GPU Requirements
Templates marked **GPU: Yes** require:
- An NVIDIA GPU with a current driver installed on the host
- The NVIDIA Container Toolkit, so the Compose `deploy.resources` device reservation can expose the GPU to the container
Edge deployments (ONNX Runtime CPU profile): No GPU required — runs on ARM or x86 with constrained CPU/memory limits.
For AMD GPUs (ROCm), modify the deploy.resources section to use ROCm-compatible images and remove the NVIDIA device reservation.
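For NVIDIA hosts, the reservation these templates rely on follows the standard Compose device-request syntax. A minimal sketch; the service name and image are illustrative:

```yaml
services:
  inference:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1            # or "all" to reserve every GPU
              capabilities: [gpu]
```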
## File Structure
## Changelog
### v2 (March 2026)
- Added 8 templates to close the alignment gap with the AI infrastructure positioning:
  - NVIDIA Triton Inference Server — production multi-framework inference
  - ONNX Runtime Server — lightweight edge inference with CPU/GPU profiles
  - NVIDIA DeepStream — industrial computer vision and video analytics
  - Ray Cluster (GPU) — distributed training and fine-tuning
  - Prefect — governed ML pipeline orchestration
  - BentoML — model packaging and serving
  - MLflow + MinIO — production MLOps with S3 artifact governance
  - NVIDIA NIM — enterprise-optimized LLM inference
### v1 (March 2026)
- Initial 18 AI templates covering LLM inference, image generation, agents, vector DBs, MLOps, and speech
## License
These templates reference publicly available Docker images from their respective maintainers. Each tool has its own license — refer to the individual project documentation.
*Portainer AI Templates by Adolfo De Lorenzo — March 2026*