feat(10-01): KB ingestion pipeline - migration, extractors, API router
- Migration 014: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- Add GOOGLE_CALENDAR to ChannelTypeEnum in tenant.py
- Add brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket to config
- Add text extractors for PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
- Add KB management API router with upload, list, delete, URL ingest, reindex endpoints
- Install pypdf, python-docx, python-pptx, openpyxl, pandas, firecrawl-py, youtube-transcript-api
- Update .env.example with new env vars
- Unit tests: test_extractors.py (10 tests) and test_kb_upload.py (7 tests) all pass
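
The extractor bullet above names one extractor per file type. A minimal sketch of how such a dispatch by extension could look is given here for orientation only; it is not the code from this commit. The names extract_plain, extract_csv, EXTRACTORS and extract_text are hypothetical, and only the stdlib-friendly formats (TXT, MD, CSV) are covered. The binary formats (PDF, DOCX, PPTX, XLSX/XLS) would presumably register their own functions built on the libraries added in the diff below.

```python
import csv
import io
from pathlib import Path


def extract_plain(data: bytes) -> str:
    # TXT and MD are treated as UTF-8 text; undecodable bytes are replaced.
    return data.decode("utf-8", errors="replace")


def extract_csv(data: bytes) -> str:
    # Flatten each CSV row into one comma-joined line of text.
    reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join(", ".join(row) for row in reader)


# Hypothetical registry (an assumption, not from the commit):
# lowercase file extension -> extractor function.
EXTRACTORS = {
    ".txt": extract_plain,
    ".md": extract_plain,
    ".csv": extract_csv,
}


def extract_text(filename: str, data: bytes) -> str:
    # Pick the extractor by file extension, case-insensitively.
    suffix = Path(filename).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or filename}") from None
    return extractor(data)
```

Keeping the mapping in one table means an upload endpoint can reject unsupported types before storing anything, and a new format only needs one more entry.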
@@ -14,6 +14,15 @@ dependencies = [
 "httpx>=0.28.0",
 "sentence-transformers>=3.0.0",
 "jsonschema>=4.26.0",
+"pypdf>=6.9.2",
+"python-docx>=1.2.0",
+"python-pptx>=1.0.2",
+"openpyxl>=3.1.5",
+"pandas>=3.0.1",
+"firecrawl-py>=4.21.0",
+"youtube-transcript-api>=1.2.4",
+"google-api-python-client>=2.193.0",
+"google-auth-oauthlib>=1.3.0",
 ]
 
 [tool.uv.sources]