feat(10-01): KB ingestion pipeline - migration, extractors, API router

- Migration 014: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- Add GOOGLE_CALENDAR to ChannelTypeEnum in tenant.py
- Add brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket to config
- Add text extractors for PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
- Add KB management API router with upload, list, delete, URL ingest, reindex endpoints
- Install pypdf, python-docx, python-pptx, openpyxl, pandas, firecrawl-py, youtube-transcript-api
- Update .env.example with new env vars
- Unit tests: test_extractors.py (10 tests) and test_kb_upload.py (7 tests) all pass
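The diff on this page only touches .env.example, so the extractor code itself is not visible here. As a rough illustration of one common way to route uploaded files to format-specific extractors, the minimal sketch below dispatches on file extension. The names `extract_text`, `EXTRACTORS` and the `_extract_*` helpers are hypothetical and not taken from the repository. It covers only TXT, MD, CSV, PDF (via pypdf) and DOCX (via python-docx); PPTX and XLSX/XLS are left out, and the PDF/DOCX branches import their libraries lazily so the module still loads when those packages are absent.

```python
import csv
import io
from pathlib import Path
from typing import Callable

# Hypothetical registry type: takes raw file bytes, returns plain text.
Extractor = Callable[[bytes], str]


def _extract_plain(data: bytes) -> str:
    # TXT and MD are decoded as UTF-8; undecodable bytes become U+FFFD.
    return data.decode("utf-8", errors="replace")


def _extract_csv(data: bytes) -> str:
    # Flatten each CSV row into one tab-separated line of text.
    reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join("\t".join(row) for row in reader)


def _extract_pdf(data: bytes) -> str:
    # Lazy import: pypdf is only needed when a PDF is actually processed.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(data))
    # extract_text() may return an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _extract_docx(data: bytes) -> str:
    # The python-docx package is imported as "docx".
    from docx import Document

    doc = Document(io.BytesIO(data))
    return "\n".join(p.text for p in doc.paragraphs)


EXTRACTORS: dict[str, Extractor] = {
    ".txt": _extract_plain,
    ".md": _extract_plain,
    ".csv": _extract_csv,
    ".pdf": _extract_pdf,
    ".docx": _extract_docx,
}


def extract_text(filename: str, data: bytes) -> str:
    """Pick an extractor by (case-insensitive) extension; reject unknown types."""
    suffix = Path(filename).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None
    return extractor(data)
```

For example, `extract_text("notes.MD", b"# Title\nbody")` returns the text unchanged and `extract_text("t.csv", b"a,b\n1,2\n")` returns `"a\tb\n1\t2"`, while an unknown extension raises `ValueError`, which an upload endpoint could translate into a client error.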
.env.example
@@ -62,6 +62,21 @@ DEBUG=false
 # Tenant rate limits (requests per minute defaults)
 DEFAULT_RATE_LIMIT_RPM=60
 
+# -----------------------------------------------------------------------------
+# Web Search / Knowledge Base Scraping
+# BRAVE_API_KEY: Get from https://brave.com/search/api/
+# FIRECRAWL_API_KEY: Get from https://firecrawl.dev
+# -----------------------------------------------------------------------------
+BRAVE_API_KEY=
+FIRECRAWL_API_KEY=
+
+# Google OAuth (Calendar integration)
+GOOGLE_CLIENT_ID=
+GOOGLE_CLIENT_SECRET=
+
+# MinIO KB bucket (for knowledge base documents)
+MINIO_KB_BUCKET=kb-documents
+
 # -----------------------------------------------------------------------------
 # Web Push Notifications (VAPID keys)
 # Generate with: cd packages/portal && npx web-push generate-vapid-keys
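The new variables correspond to the config fields named in the commit message (brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket). How the project actually loads them is not shown here. The sketch below assumes a plain stdlib dataclass populated from the environment, treats empty values such as `BRAVE_API_KEY=` as unset, and reuses the `kb-documents` default from .env.example. `KBSettings`, `from_env` and `google_calendar_enabled` are hypothetical names, not the repository's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _optional(env: Mapping[str, str], name: str) -> Optional[str]:
    # An empty value (e.g. "BRAVE_API_KEY=" as shipped in .env.example)
    # is treated the same as a missing variable.
    value = env.get(name, "").strip()
    return value or None


@dataclass(frozen=True)
class KBSettings:
    brave_api_key: Optional[str]
    firecrawl_api_key: Optional[str]
    google_client_id: Optional[str]
    google_client_secret: Optional[str]
    minio_kb_bucket: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "KBSettings":
        return cls(
            brave_api_key=_optional(env, "BRAVE_API_KEY"),
            firecrawl_api_key=_optional(env, "FIRECRAWL_API_KEY"),
            google_client_id=_optional(env, "GOOGLE_CLIENT_ID"),
            google_client_secret=_optional(env, "GOOGLE_CLIENT_SECRET"),
            # Fall back to the same bucket name used in .env.example.
            minio_kb_bucket=_optional(env, "MINIO_KB_BUCKET") or "kb-documents",
        )

    @property
    def google_calendar_enabled(self) -> bool:
        # Hypothetical helper: OAuth needs both the client ID and the secret.
        return bool(self.google_client_id and self.google_client_secret)
```

Accepting any mapping instead of reading `os.environ` directly keeps the loader easy to test: `KBSettings.from_env({"BRAVE_API_KEY": "abc"})` yields a settings object with Brave configured, Firecrawl unset, and the default bucket.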