docs: Add README, VERSION, and CHANGELOG
CHANGELOG.md
@@ -0,0 +1,35 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.0.0] - 2026-02-09

### Added
- **Memory System** — Hybrid SQLite + ChromaDB storage for agent observations
  - `/memory/save` — Save observations with automatic vector embedding
  - `/memory/query` — Semantic search with progressive disclosure
  - `/memory/get` — Fetch full observations by IDs
  - `/memory/timeline` — Chronological context around specific time/observation
  - `/memory/preference` — Store user preferences
  - `/memory/stats` — Memory statistics per user
  - User isolation via `user_id` field (collections: `moxie_memory_{user_id}`)
  - Content deduplication via SHA256 hashing
  - Observation types: general, learning, decision, preference, tool_use, system_event

### Changed
- Updated Dockerfile to use PyTorch ROCm 7.2 nightly
- Bumped version to 2.0.0

## [1.0.0] - 2026-02-05

### Added
- Initial release
- Multi-tenant RAG with ChromaDB vector storage
- Document ingestion (PDF, DOCX, Excel, TXT, MD, CSV)
- Semantic search via sentence-transformers
- Audio/video transcription via Whisper API
- Email pollers for auto-ingestion (zeus@zz11.net, moxie@zz11.net)
- GPU acceleration with AMD ROCm
README.md
@@ -0,0 +1,142 @@
# Moxie RAG

Multi-tenant RAG (Retrieval-Augmented Generation) system with hybrid SQLite + ChromaDB memory storage. Built for GPU-accelerated semantic search on AMD ROCm.

## Features

- **Multi-tenant document storage** — Isolated collections per user/tenant
- **Hybrid memory system** — SQLite for structured data + ChromaDB for vector embeddings
- **GPU-accelerated embeddings** — Runs on AMD ROCm (RX 7900 XTX tested)
- **Progressive disclosure** — Token-efficient retrieval pattern
- **File ingestion** — PDF, DOCX, Excel, TXT, MD, CSV support
- **Audio/video transcription** — Whisper integration for media files
- **Email polling** — Auto-ingest attachments from configured IMAP accounts

## Quick Start

```bash
# Clone and start
git clone https://git.oe74.net/adelorenzo/moxie-rag.git
cd moxie-rag
docker compose up -d

# Check health
curl http://localhost:8899/health
```

## API Endpoints

### Document Operations

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Service info and collections |
| `/health` | GET | Health check |
| `/collections` | GET | List all collections |
| `/ingest` | POST | Ingest text content |
| `/ingest-file` | POST | Upload and ingest file |
| `/query` | POST | Semantic search |
| `/documents` | GET | List indexed documents |
| `/documents/{id}` | DELETE | Delete document |
| `/transcribe` | POST | Transcribe audio/video |

### Memory Operations

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/memory/save` | POST | Save observation (auto-embeds) |
| `/memory/query` | POST | Semantic search with progressive disclosure |
| `/memory/get` | POST | Fetch full observations by IDs |
| `/memory/timeline` | POST | Chronological context around time/ID |
| `/memory/preference` | POST | Save user preference |
| `/memory/preferences/{user_id}` | GET | Get all preferences |
| `/memory/stats/{user_id}` | GET | Memory statistics |

## Memory System

The memory system provides user-isolated observation storage with vector search:

```bash
# Save an observation
curl -X POST http://localhost:8899/memory/save \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "content": "Learned that the API requires auth header X-API-Key",
    "type": "learning",
    "title": "API Auth Discovery"
  }'

# Query memory (progressive disclosure - index first)
curl -X POST http://localhost:8899/memory/query \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "query": "API authentication",
    "include_content": false
  }'

# Get full details by IDs
curl -X POST http://localhost:8899/memory/get \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "ids": [1, 2, 3]
  }'
```
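
The same two-step flow (index first, then full content for selected IDs) can be sketched from Python. This is a minimal client-side sketch, not part of the project: the helper names `build_post`, `index_query`, `fetch_full`, and `send` are hypothetical, and it assumes the endpoints return JSON, which the README does not state. Only the paths and request fields shown in the curl examples above are used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8899"  # port taken from the Quick Start example


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a memory endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_query(user_id: str, query: str) -> urllib.request.Request:
    """Step 1: ask for the lightweight index only (include_content is false)."""
    return build_post(
        "/memory/query",
        {"user_id": user_id, "query": query, "include_content": False},
    )


def fetch_full(user_id: str, ids: list[int]) -> urllib.request.Request:
    """Step 2: fetch full observations for the IDs picked from the index."""
    return build_post("/memory/get", {"user_id": user_id, "ids": ids})


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request; assumes the response body is JSON."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from `send` makes the payloads easy to inspect or test without a running server. A caller would pass `index_query(...)` to `send`, choose IDs from the result, and then pass `fetch_full(...)` to `send`.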

### Observation Types

- `general` — Default type
- `learning` — Learned information
- `decision` — Decisions made
- `preference` — User preferences
- `tool_use` — Tool usage logs
- `system_event` — System events
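
A client can validate the `type` field before calling `/memory/save`. The sketch below is an illustration only: `ObservationType` and `parse_type` are hypothetical names, and how the server itself treats a missing or unknown type is not documented here. The fallback to `general` follows the "Default type" note in the list above.

```python
from enum import Enum
from typing import Optional


class ObservationType(str, Enum):
    """The six observation types listed in this README."""

    GENERAL = "general"
    LEARNING = "learning"
    DECISION = "decision"
    PREFERENCE = "preference"
    TOOL_USE = "tool_use"
    SYSTEM_EVENT = "system_event"


def parse_type(value: Optional[str] = None) -> ObservationType:
    """Return the matching type, falling back to `general` when none is given."""
    if value is None:
        return ObservationType.GENERAL
    # Enum lookup by value raises ValueError for anything outside the list.
    return ObservationType(value)
```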

## Configuration

Environment variables (set in docker-compose.yml):

| Variable | Default | Description |
|----------|---------|-------------|
| `CHROMA_DIR` | `/app/data/chromadb` | ChromaDB storage path |
| `MEMORY_DB` | `/app/data/memory.db` | SQLite database path |
| `WHISPER_URL` | `http://host.docker.internal:8081/transcribe` | Whisper API endpoint |
| `UPLOAD_DIR` | `/app/data/uploads` | File upload storage |
| `LOG_DIR` | `/app/logs` | Log directory |
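
One way a service could resolve these variables is to read each from the environment and fall back to the default in the table. This is a sketch under that assumption, not the project's actual loader: `Settings`, `DEFAULTS`, and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the configuration table above.
DEFAULTS = {
    "CHROMA_DIR": "/app/data/chromadb",
    "MEMORY_DB": "/app/data/memory.db",
    "WHISPER_URL": "http://host.docker.internal:8081/transcribe",
    "UPLOAD_DIR": "/app/data/uploads",
    "LOG_DIR": "/app/logs",
}


@dataclass(frozen=True)
class Settings:
    chroma_dir: str
    memory_db: str
    whisper_url: str
    upload_dir: str
    log_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read each variable from `env` (os.environ by default), else use its default."""
    source = os.environ if env is None else env
    values = {name.lower(): source.get(name, default) for name, default in DEFAULTS.items()}
    return Settings(**values)
```

Passing a plain dict as `env` makes it possible to check overrides without touching the real process environment.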

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                   FastAPI Server                    │
│                     (port 8899)                     │
├─────────────────────────────────────────────────────┤
│  Document Endpoints  │  Memory Endpoints            │
│  /ingest, /query     │  /memory/save, /memory/query │
├──────────────────────┴──────────────────────────────┤
│                     RAG Engine                      │
│         (sentence-transformers + ChromaDB)          │
├─────────────────────────────────────────────────────┤
│  ChromaDB (vectors)  │  SQLite (structured data)    │
│  /app/data/chromadb  │  /app/data/memory.db         │
└─────────────────────────────────────────────────────┘
```
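
The changelog for 2.0.0 mentions per-user collections named `moxie_memory_{user_id}` and content deduplication via SHA256 hashing. The sketch below shows how those two ideas might combine on the save path. It is an illustration under assumptions: `InMemoryStore` is a toy stand-in for the SQLite side, hashing the raw content without normalization is a guess, and scoping duplicates per user rather than globally is also a guess.

```python
import hashlib
from typing import Dict, Tuple


def collection_name(user_id: str) -> str:
    """Collection naming pattern quoted in the changelog."""
    return f"moxie_memory_{user_id}"


def content_hash(content: str) -> str:
    """SHA256 hex digest of the UTF-8 content (no normalization; an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Toy stand-in for the structured store, only to show the dedup check."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[str, str], int] = {}  # (collection, hash) -> id
        self._next_id = 1

    def save(self, user_id: str, content: str) -> Tuple[int, bool]:
        """Return (observation_id, created); duplicates reuse the existing id."""
        key = (collection_name(user_id), content_hash(content))
        if key in self._seen:
            return self._seen[key], False
        observation_id = self._next_id
        self._next_id += 1
        self._seen[key] = observation_id
        return observation_id, True
```

In a real deployment the duplicate check would presumably run against SQLite before anything is embedded, so that repeated content does not produce a second vector entry.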

## Email Pollers

Auto-ingest attachments from IMAP accounts:

- **zeus-email-poller** — Polls zeus@zz11.net → `zeus_docs` collection
- **moxie-email-poller** — Polls moxie@zz11.net → `adolfo_docs` collection
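
The attachment-extraction step of a poller can be sketched with the Python standard library. This is not the pollers' actual code: `extract_attachments` is a hypothetical helper, and fetching messages over IMAP and posting each file to `/ingest-file` are left out because their details are not described here.

```python
from email import message_from_bytes, policy
from typing import List, Tuple


def extract_attachments(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, payload) pairs for each named attachment in a raw message."""
    # policy.default yields an EmailMessage, which provides iter_attachments().
    message = message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in message.iter_attachments():
        filename = part.get_filename()
        if filename:  # skip attachment parts that carry no filename
            attachments.append((filename, part.get_payload(decode=True)))
    return attachments
```

Each returned pair could then be uploaded to the collection configured for that mailbox.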

## Requirements

- Docker with compose v2
- AMD GPU with ROCm 7.2+ (or modify Dockerfile for CUDA/CPU)
- Optional: Whisper API for transcription

## License

Private repository.