# Portainer AI Templates (v2)
> **26 production-ready AI/ML Docker Compose stacks for Portainer** — filling the AI gap in the official v3 template library. Aligned with an AI infrastructure positioning strategy for Portainer.
## Background
The official [Portainer v3 templates](https://raw.githubusercontent.com/portainer/templates/v3/templates.json) contain **71 templates** with **zero pure AI/ML deployments**. This repository provides a curated, Portainer-compatible template set covering the entire AI infrastructure stack — from edge inference to distributed training to governed ML pipelines.
See [docs/AI_GAP_ANALYSIS.md](docs/AI_GAP_ANALYSIS.md) for the full gap analysis.
## Homepage Alignment
These templates map directly to the AI infrastructure positioning pillars:
| Mock-Up Pillar | Templates Covering It |
|---|---|
| **GPU-Aware Fleet Management** | Triton, vLLM, NVIDIA NIM, Ray Cluster, Ollama, LocalAI |
| **Model Lifecycle Governance** | MLflow + MinIO (Production MLOps), Prefect, BentoML, Label Studio |
| **Edge AI Deployment** | ONNX Runtime (CPU/edge profile), Triton, DeepStream |
| **Self-Service AI Stacks** | Open WebUI, Langflow, Flowise, n8n AI, Jupyter GPU |
| **LLM Fine-Tune** (diagram) | Ray Cluster (distributed training) |
| **RAG Pipeline** (diagram) | Qdrant, ChromaDB, Weaviate + Langflow/Flowise |
| **Vision Model** (diagram) | DeepStream, ComfyUI, Stable Diffusion WebUI |
| **Anomaly Detection** (diagram) | DeepStream (video analytics), Triton (custom models) |
## Quick Start
### Option A: Use as Custom Template URL in Portainer
1. In Portainer, go to **Settings > App Templates**
2. Set the URL to:
```
https://git.oe74.net/adelorenzo/portainer_scripts/raw/branch/master/ai-templates/portainer-ai-templates.json
```
3. Click **Save** — all 26 AI templates appear in your App Templates list
### Option B: Deploy Individual Stacks
```bash
cd stacks/ollama
docker compose up -d
```
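Each stack directory holds a self-contained Compose file. As an illustration only (the actual file in `stacks/ollama/` may differ), a minimal Ollama stack might look like:

```yaml
# Hypothetical sketch — not necessarily the exact contents of stacks/ollama/docker-compose.yml.
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"               # API port listed in the catalog below
    volumes:
      - ollama_data:/root/.ollama   # persist downloaded models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  ollama_data:
```

After the stack is up, models are pulled on demand through the Ollama API (for example via `docker exec <container> ollama pull llama3`).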
## Template Catalog
### LLM Inference and Model Serving
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 1 | **Ollama** | 11434 | Yes | Local LLM engine — Llama, Mistral, Qwen, Gemma, Phi |
| 2 | **Open WebUI + Ollama** | 3000 | Yes | ChatGPT-like UI bundled with Ollama backend |
| 3 | **LocalAI** | 8080 | Yes | Drop-in OpenAI API replacement |
| 4 | **vLLM** | 8000 | Yes | High-throughput serving with PagedAttention |
| 5 | **Text Gen WebUI** | 7860 | Yes | Comprehensive LLM interface (oobabooga) |
| 6 | **LiteLLM Proxy** | 4000 | No | Unified API gateway for 100+ LLM providers |
| 26 | **NVIDIA NIM** | 8000 | Yes | Enterprise TensorRT-LLM optimized inference |
### Production Inference Serving
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 19 | **NVIDIA Triton** | 8000 | Yes | Multi-framework inference server (TensorRT, ONNX, PyTorch, TF) |
| 20 | **ONNX Runtime** | 8001 | Optional | Lightweight inference with GPU and CPU/edge profiles |
| 24 | **BentoML** | 3000 | Yes | Model packaging and serving with metrics |
### Image and Video Generation
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 7 | **ComfyUI** | 8188 | Yes | Node-based Stable Diffusion workflow engine |
| 8 | **Stable Diffusion WebUI** | 7860 | Yes | AUTOMATIC1111 interface for image generation |
### Industrial AI and Computer Vision
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 21 | **NVIDIA DeepStream** | 8554 | Yes | Video analytics for inspection, anomaly detection, smart factory |
### Distributed Training
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 22 | **Ray Cluster** | 8265 | Yes | Head + workers for LLM fine-tuning, distributed training, Ray Serve |
### AI Agents and Workflows
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 9 | **Langflow** | 7860 | No | Visual multi-agent and RAG pipeline builder |
| 10 | **Flowise** | 3000 | No | Drag-and-drop LLM chatflow builder |
| 11 | **n8n (AI-Enabled)** | 5678 | No | Workflow automation with AI agent nodes |
### Vector Databases
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 12 | **Qdrant** | 6333 | No | High-performance vector similarity search |
| 13 | **ChromaDB** | 8000 | No | AI-native embedding database |
| 14 | **Weaviate** | 8080 | No | Vector DB with built-in vectorization modules |
### ML Operations and Governance
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 15 | **MLflow** | 5000 | No | Experiment tracking and model registry (SQLite) |
| 25 | **MLflow + MinIO** | 5000 | No | Production MLOps: PostgreSQL + S3 artifact store |
| 23 | **Prefect** | 4200 | No | Governed ML pipeline orchestration with audit logging |
| 16 | **Label Studio** | 8080 | No | Multi-type data labeling platform |
| 17 | **Jupyter (GPU/PyTorch)** | 8888 | Yes | GPU-accelerated notebooks |
### Speech and Audio
| # | Template | Port | GPU | Description |
|---|---|---|---|---|
| 18 | **Whisper ASR** | 9000 | Yes | Speech-to-text API server |
## GPU Requirements
Templates marked **GPU: Yes** require:
- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker configured with `nvidia` runtime
**Edge deployments (ONNX Runtime CPU profile):** No GPU required — runs on ARM or x86 with constrained CPU/memory limits.
For AMD GPUs (ROCm), modify the `deploy.resources` section to use ROCm-compatible images and remove the NVIDIA device reservation.
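For reference, the NVIDIA device reservation that GPU-enabled stacks typically declare, with a hedged sketch of the ROCm substitution (the device paths assume a standard ROCm host; the image name is a placeholder):

```yaml
services:
  inference:
    # NVIDIA (default in these stacks): Compose GPU device reservation
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # AMD/ROCm alternative (sketch): remove the reservation above, switch to a
    # ROCm-compatible image, and pass the kernel devices through directly:
    # image: <rocm-compatible-image>
    # devices:
    #   - /dev/kfd
    #   - /dev/dri
```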
## File Structure
```
ai-templates/
├── portainer-ai-templates.json   # Portainer v3 template definition (26 templates)
├── README.md
├── docs/
│   └── AI_GAP_ANALYSIS.md        # Analysis of the official template gap
└── stacks/
    ├── ollama/                   # LLM inference
    ├── open-webui/
    ├── localai/
    ├── vllm/
    ├── text-generation-webui/
    ├── litellm/
    ├── nvidia-nim/               # v2: Enterprise inference
    ├── triton/                   # v2: Production inference serving
    ├── onnx-runtime/             # v2: Edge-friendly inference
    ├── bentoml/                  # v2: Model packaging + serving
    ├── deepstream/               # v2: Industrial computer vision
    ├── ray-cluster/              # v2: Distributed training
    ├── prefect/                  # v2: Governed ML pipelines
    ├── minio-mlops/              # v2: Production MLOps stack
    ├── comfyui/                  # Image generation
    ├── stable-diffusion-webui/
    ├── langflow/                 # AI agents
    ├── flowise/
    ├── n8n-ai/
    ├── qdrant/                   # Vector databases
    ├── chromadb/
    ├── weaviate/
    ├── mlflow/                   # ML operations
    ├── label-studio/
    ├── jupyter-gpu/
    └── whisper/                  # Speech
```
## Changelog
### v2 (March 2026)
- Added 8 templates to close the alignment gap with the AI infrastructure positioning:
- **NVIDIA Triton Inference Server** — production multi-framework inference
- **ONNX Runtime Server** — lightweight edge inference with CPU/GPU profiles
- **NVIDIA DeepStream** — industrial computer vision and video analytics
- **Ray Cluster (GPU)** — distributed training and fine-tuning
- **Prefect** — governed ML pipeline orchestration
- **BentoML** — model packaging and serving
- **MLflow + MinIO** — production MLOps with S3 artifact governance
- **NVIDIA NIM** — enterprise-optimized LLM inference
### v1 (March 2026)
- Initial 18 AI templates covering LLM inference, image generation, agents, vector DBs, MLOps, and speech
## License
These templates reference publicly available Docker images from their respective maintainers. Each tool has its own license — refer to the individual project documentation.
---
*Portainer AI Templates by Adolfo De Lorenzo — March 2026*