From 6635c861c8db37adec84e16e7b41d6722ac4789d Mon Sep 17 00:00:00 2001
From: Adolfo Delorenzo
Date: Mon, 9 Feb 2026 15:46:11 -0600
Subject: [PATCH] docs: Add README, VERSION, and CHANGELOG

---
 CHANGELOG.md |  35 +++++++++++++
 README.md    | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++
 VERSION      |   1 +
 3 files changed, 178 insertions(+)
 create mode 100644 CHANGELOG.md
 create mode 100644 README.md
 create mode 100644 VERSION

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..5ef281a
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,35 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [2.0.0] - 2026-02-09
+
+### Added
+- **Memory System** — Hybrid SQLite + ChromaDB storage for agent observations
+  - `/memory/save` — Save observations with automatic vector embedding
+  - `/memory/query` — Semantic search with progressive disclosure
+  - `/memory/get` — Fetch full observations by IDs
+  - `/memory/timeline` — Chronological context around a specific time or observation
+  - `/memory/preference` — Store user preferences
+  - `/memory/stats` — Memory statistics per user
+- User isolation via `user_id` field (collections: `moxie_memory_{user_id}`)
+- Content deduplication via SHA256 hashing
+- Observation types: general, learning, decision, preference, tool_use, system_event
+
+### Changed
+- Updated Dockerfile to use the PyTorch ROCm 7.2 nightly build
+- Bumped version to 2.0.0
+
+## [1.0.0] - 2026-02-05
+
+### Added
+- Initial release
+- Multi-tenant RAG with ChromaDB vector storage
+- Document ingestion (PDF, DOCX, Excel, TXT, MD, CSV)
+- Semantic search via sentence-transformers
+- Audio/video transcription via Whisper API
+- Email pollers for auto-ingestion (zeus@zz11.net, moxie@zz11.net)
+- GPU acceleration with AMD ROCm
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..40cae56
--- /dev/null
+++ b/README.md
@@ -0,0 +1,142 @@
+# Moxie RAG
+
+Multi-tenant RAG (Retrieval-Augmented Generation) system with hybrid SQLite + ChromaDB memory storage. Built for GPU-accelerated semantic search on AMD ROCm.
+
+## Features
+
+- **Multi-tenant document storage** — Isolated collections per user/tenant
+- **Hybrid memory system** — SQLite for structured data + ChromaDB for vector embeddings
+- **GPU-accelerated embeddings** — Runs on AMD ROCm (tested on RX 7900 XTX)
+- **Progressive disclosure** — Token-efficient retrieval pattern
+- **File ingestion** — PDF, DOCX, Excel, TXT, MD, CSV support
+- **Audio/video transcription** — Whisper integration for media files
+- **Email polling** — Auto-ingest attachments from configured IMAP accounts
+
+## Quick Start
+
+```bash
+# Clone and start
+git clone https://git.oe74.net/adelorenzo/moxie-rag.git
+cd moxie-rag
+docker compose up -d
+
+# Check health
+curl http://localhost:8899/health
+```
+
+## API Endpoints
+
+### Document Operations
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | GET | Service info and collections |
+| `/health` | GET | Health check |
+| `/collections` | GET | List all collections |
+| `/ingest` | POST | Ingest text content |
+| `/ingest-file` | POST | Upload and ingest file |
+| `/query` | POST | Semantic search |
+| `/documents` | GET | List indexed documents |
+| `/documents/{id}` | DELETE | Delete document |
+| `/transcribe` | POST | Transcribe audio/video |
+
+### Memory Operations
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/memory/save` | POST | Save observation (auto-embeds) |
+| `/memory/query` | POST | Semantic search with progressive disclosure |
+| `/memory/get` | POST | Fetch full observations by IDs |
+| `/memory/timeline` | POST | Chronological context around a time or observation ID |
+| `/memory/preference` | POST | Save user preference |
+| `/memory/preferences/{user_id}` | GET | Get all preferences |
+| `/memory/stats/{user_id}` | GET | Memory statistics |
+
+## Memory System
+
+The memory system provides user-isolated observation storage with vector search:
+
+```bash
+# Save an observation
+curl -X POST http://localhost:8899/memory/save \
+  -H "Content-Type: application/json" \
+  -d '{
+    "user_id": "alice",
+    "content": "Learned that the API requires auth header X-API-Key",
+    "type": "learning",
+    "title": "API Auth Discovery"
+  }'
+
+# Query memory (progressive disclosure: fetch the index first, without full content)
+curl -X POST http://localhost:8899/memory/query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "user_id": "alice",
+    "query": "API authentication",
+    "include_content": false
+  }'
+
+# Get full details by IDs
+curl -X POST http://localhost:8899/memory/get \
+  -H "Content-Type: application/json" \
+  -d '{
+    "user_id": "alice",
+    "ids": [1, 2, 3]
+  }'
+```
+
+### Observation Types
+
+- `general` — Default type
+- `learning` — Learned information
+- `decision` — Decisions made
+- `preference` — User preferences
+- `tool_use` — Tool usage logs
+- `system_event` — System events
+
+## Configuration
+
+Environment variables (set in docker-compose.yml):
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CHROMA_DIR` | `/app/data/chromadb` | ChromaDB storage path |
+| `MEMORY_DB` | `/app/data/memory.db` | SQLite database path |
+| `WHISPER_URL` | `http://host.docker.internal:8081/transcribe` | Whisper API endpoint |
+| `UPLOAD_DIR` | `/app/data/uploads` | File upload storage |
+| `LOG_DIR` | `/app/logs` | Log directory |
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   FastAPI Server                    │
+│                     (port 8899)                     │
+├──────────────────────┬──────────────────────────────┤
+│ Document Endpoints   │ Memory Endpoints             │
+│ /ingest, /query      │ /memory/save, /memory/query  │
+├──────────────────────┴──────────────────────────────┤
+│                     RAG Engine                      │
+│         (sentence-transformers + ChromaDB)          │
+├──────────────────────┬──────────────────────────────┤
+│ ChromaDB (vectors)   │ SQLite (structured data)     │
+│ /app/data/chromadb   │ /app/data/memory.db          │
+└──────────────────────┴──────────────────────────────┘
+```
+
+## Email Pollers
+
+Auto-ingest attachments from IMAP accounts:
+
+- **zeus-email-poller** — Polls zeus@zz11.net → `zeus_docs` collection
+- **moxie-email-poller** — Polls moxie@zz11.net → `adolfo_docs` collection
+
+## Requirements
+
+- Docker with Compose v2
+- AMD GPU with ROCm 7.2+ (or modify Dockerfile for CUDA/CPU)
+- Optional: Whisper API for transcription
+
+## License
+
+Private repository.
diff --git a/VERSION b/VERSION
new file mode 100644
index 0000000..227cea2
--- /dev/null
+++ b/VERSION
@@ -0,0 +1 @@
+2.0.0
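
The CHANGELOG in this patch states that memory is isolated per user in collections named `moxie_memory_{user_id}` and deduplicated via SHA256 hashing. The shell sketch below derives both values locally, for example to check which collection an observation should land in, or whether two texts would hash identically. It is an illustration under stated assumptions, not the service's implementation: the helper names `memory_collection` and `content_sha256` are hypothetical, and it assumes the server hashes the exact bytes of `content` with no trimming or normalization, which the patch does not confirm.

```bash
#!/usr/bin/env sh
# Hypothetical helpers, not part of moxie-rag; a sketch based on the CHANGELOG.

# Per-user ChromaDB collection name, following the moxie_memory_{user_id} pattern.
memory_collection() {
  printf 'moxie_memory_%s\n' "$1"
}

# SHA256 hex digest of the raw content bytes (no trailing newline added).
# ASSUMPTION: the server hashes `content` exactly as sent, without normalization.
content_sha256() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

memory_collection alice
content_sha256 "Learned that the API requires auth header X-API-Key"
```

`sha256sum` is the GNU coreutils tool; on macOS, `shasum -a 256` can stand in for it.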