feat(10-01): KB ingestion pipeline - migration, extractors, API router

- Migration 014: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- Add GOOGLE_CALENDAR to ChannelTypeEnum in tenant.py
- Add brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket to config
- Add text extractors for PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
- Add KB management API router with upload, list, delete, URL ingest, reindex endpoints
- Install pypdf, python-docx, python-pptx, openpyxl, pandas, firecrawl-py, youtube-transcript-api
- Update .env.example with new env vars
- Unit tests: test_extractors.py (10 tests) and test_kb_upload.py (7 tests) all pass
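
Note on the extractor bullet above: a minimal sketch of how extension-based dispatch for the text formats might look. This is not the code from this commit. The names `extract_text`, `EXTRACTORS`, `_extract_plain`, and `_extract_csv` are hypothetical, and only the stdlib-friendly formats (TXT, MD, CSV) are shown. The PDF, DOCX, PPTX, and XLSX/XLS extractors would plug into the same table using the libraries installed here.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict


def _extract_plain(data: bytes) -> str:
    # Hypothetical helper: TXT and MD are treated as UTF-8 text;
    # undecodable bytes are replaced rather than raising.
    return data.decode("utf-8", errors="replace")


def _extract_csv(data: bytes) -> str:
    # Hypothetical helper: flatten each CSV row into one comma-joined line.
    reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join(", ".join(row) for row in reader)


# Hypothetical registry mapping a lowercase file extension to its extractor.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    ".txt": _extract_plain,
    ".md": _extract_plain,
    ".csv": _extract_csv,
}


def extract_text(filename: str, data: bytes) -> str:
    """Pick an extractor by file extension and return the extracted text."""
    ext = Path(filename).suffix.lower()
    try:
        extractor = EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext or filename}") from None
    return extractor(data)
```

Keeping the mapping in one dictionary means an unsupported upload fails early with a clear error, which an API layer could record in a field such as error_message.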
2026-03-26 09:05:29 -06:00
parent eae4b0324d
commit e8d3e8a108
11 changed files with 1745 additions and 28 deletions


@@ -14,6 +14,15 @@ dependencies = [
"httpx>=0.28.0",
"sentence-transformers>=3.0.0",
"jsonschema>=4.26.0",
"pypdf>=6.9.2",
"python-docx>=1.2.0",
"python-pptx>=1.0.2",
"openpyxl>=3.1.5",
"pandas>=3.0.1",
"firecrawl-py>=4.21.0",
"youtube-transcript-api>=1.2.4",
"google-api-python-client>=2.193.0",
"google-auth-oauthlib>=1.3.0",
]
[tool.uv.sources]