| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | patterns-established | requirements-completed | duration | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10-agent-capabilities | 01 | api | | | | | | | | | | 11min | 2026-03-26 |
Phase 10 Plan 01: KB Ingestion Pipeline Summary
Document ingestion pipeline for KB search: text extractors (PDF/DOCX/PPTX/XLSX/CSV/TXT/MD), Celery async ingest task, executor tenant context injection, and KB management REST API
Performance
- Duration: 11 min
- Started: 2026-03-26T14:59:19Z
- Completed: 2026-03-26T15:10:06Z
- Tasks: 2
- Files modified: 16
Accomplishments
- Full document text extraction for 7 format families using pypdf, python-docx, python-pptx, pandas, plus CSV/TXT/MD decode
- KB management REST API with file upload, URL/YouTube ingest, list, delete, and reindex endpoints
- Celery `ingest_document` task runs async pipeline: MinIO download → extract → chunk (500 char sliding window) → embed (all-MiniLM-L6-v2) → store kb_chunks
- Tool executor now injects `tenant_id` and `agent_id` as string kwargs into every tool handler before invocation
- 31 unit tests pass across all 4 test files
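The chunking step of the pipeline can be illustrated with a minimal sketch. Only the 500-character window size comes from this summary; the function name `sliding_window_chunks`, the 50-character overlap, and the whitespace filter are assumptions made for illustration, and the real `chunk_text` in `ingest.py` may differ.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    chunk_size=500 matches the summary; overlap=50 is an assumed value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(text), step):
        window = text[start:start + chunk_size]
        if window.strip():  # skip whitespace-only windows
            chunks.append(window)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Consecutive chunks share their boundary characters, so a sentence that straddles a window edge still appears whole in at least one chunk when it is shorter than the overlap.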
Task Commits
- Task 1: Migration 013, ORM updates, config settings, text extractors, KB API router - `e8d3e8a` (feat)
- Task 2: Celery ingestion task, executor tenant_id injection, KB search wiring - `9c7686a` (feat)
Files Created/Modified
- `migrations/versions/014_kb_status.py` - Migration: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- `packages/shared/shared/models/kb.py` - Added status/error_message/chunk_count mapped columns, agent_id nullable
- `packages/shared/shared/models/tenant.py` - Added GOOGLE_CALENDAR and WEB to ChannelTypeEnum
- `packages/shared/shared/config.py` - Added brave_api_key, firecrawl_api_key, google_client_id, google_client_secret, minio_kb_bucket
- `packages/shared/shared/api/kb.py` - New KB management API router (5 endpoints)
- `packages/orchestrator/orchestrator/tools/extractors.py` - Text extraction for all 7 formats
- `packages/orchestrator/orchestrator/tools/ingest.py` - chunk_text + ingest_document_pipeline
- `packages/orchestrator/orchestrator/tasks.py` - Added ingest_document Celery task
- `packages/orchestrator/orchestrator/tools/executor.py` - tenant_id/agent_id injection after schema validation
- `packages/orchestrator/orchestrator/tools/builtins/web_search.py` - Migrated to settings.brave_api_key
- `packages/orchestrator/pyproject.toml` - Added 8 new dependencies
- `.env.example` - Added BRAVE_API_KEY, FIRECRAWL_API_KEY, GOOGLE_CLIENT_ID/SECRET, MINIO_KB_BUCKET
Decisions Made
- Migration numbered 014 (not 013) — 013 was already used by a google_calendar channel type migration from a prior session
- KB is per-tenant not per-agent — agent_id made nullable in kb_documents
- Executor injects tenant_id/agent_id as strings after schema validation to avoid triggering schema rejections
- Lazy import of ingest_document task in kb.py via `_get_ingest_task()` function, which avoids the shared→orchestrator circular dependency at module load time
- `ingest_document_pipeline` uses ORM `select(KnowledgeBaseDocument)` for document fetch (testable via mock) and raw SQL for chunk INSERTs (pgvector CAST pattern)
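The ordering in the executor decision (validate first, inject afterwards) can be sketched as follows. This is an illustration under assumptions, not the project's executor: `validate_args`, `execute_tool`, `echo_handler`, and the dict-of-types schema are hypothetical stand-ins for whatever schema validation the real code performs.

```python
import uuid
from typing import Any, Callable


def validate_args(args: dict[str, Any], schema: dict[str, type]) -> None:
    """Hypothetical strict validator: rejects unknown keys, missing keys, wrong types."""
    unknown = set(args) - set(schema)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    for name, expected_type in schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")


def execute_tool(
    handler: Callable[..., Any],
    schema: dict[str, type],
    args: dict[str, Any],
    tenant_id: Any,
    agent_id: Any,
) -> Any:
    # 1. Validate only the caller-supplied arguments against the tool schema.
    validate_args(args, schema)
    # 2. Inject tenant context afterwards, as strings, so the extra keys
    #    never pass through the strict validator.
    kwargs = {**args, "tenant_id": str(tenant_id), "agent_id": str(agent_id)}
    return handler(**kwargs)


def echo_handler(query: str, tenant_id: str, agent_id: str) -> dict[str, str]:
    """Hypothetical tool handler that just returns what it received."""
    return {"query": query, "tenant_id": tenant_id, "agent_id": agent_id}


result = execute_tool(
    echo_handler, {"query": str}, {"query": "refund policy"}, uuid.UUID(int=1), "agent-7"
)
print(result["tenant_id"])  # 00000000-0000-0000-0000-000000000001
```

Had the context been merged into `args` before validation, a validator that rejects unknown keys would refuse every call, which is the failure mode the decision above avoids.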
Deviations from Plan
Auto-fixed Issues
1. [Rule 3 - Blocking] Migration renumbered from 013 to 014
- Found during: Task 1 (Migration creation)
- Issue: Migration 013 already existed (`013_google_calendar_channel.py`) from a prior phase session
- Fix: Renamed migration file to `014_kb_status.py` with revision=014, down_revision=013
- Files modified: migrations/versions/014_kb_status.py
- Verification: File renamed, revision chain intact
- Committed in: `e8d3e8a` (Task 1 commit)
2. [Rule 2 - Missing Critical] Added WEB to ChannelTypeEnum alongside GOOGLE_CALENDAR
- Found during: Task 1 (tenant.py update)
- Issue: WEB channel type was missing from the enum (google_calendar was not the only new type)
- Fix: Added both `WEB = "web"` and `GOOGLE_CALENDAR = "google_calendar"` to ChannelTypeEnum
- Files modified: packages/shared/shared/models/tenant.py
- Committed in: `e8d3e8a` (Task 1 commit)
3. [Rule 1 - Bug] FastAPI Depends overrides required for KB upload tests
- Found during: Task 1 (test_kb_upload.py)
- Issue: Initial test approach used `patch()` to mock auth deps, but FastAPI resolves the dependency callable referenced by `Depends` directly, so the patch had no effect and the request returned 422
- Fix: Updated test to use `app.dependency_overrides` (correct FastAPI testing pattern)
- Files modified: tests/unit/test_kb_upload.py
- Committed in: `e8d3e8a` (Task 1 commit)
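Why `patch()` has no effect here while an override table does can be shown with a toy model in plain Python. This is not FastAPI code: `ToyApp`, `auth`, `real_auth`, and `fake_auth` are hypothetical names, and the sketch raises `PermissionError` where the real test saw a 422 response. It only mimics the mechanism, namely that a route keeps a reference to the dependency callable captured at registration time.

```python
from types import SimpleNamespace
from unittest.mock import patch


def real_auth():
    raise PermissionError("no credentials")


def fake_auth():
    return "test-user"


# Stand-in for the auth module whose attribute a test might try to patch.
auth = SimpleNamespace(get_current_user=real_auth)


class ToyApp:
    """Toy model: each route captures its dependency callable at registration."""

    def __init__(self):
        self.dependency_overrides = {}
        self._routes = {}

    def route(self, path, dependency):
        def register(handler):
            self._routes[path] = (handler, dependency)  # reference captured here
            return handler
        return register

    def call(self, path):
        handler, dependency = self._routes[path]
        # Overrides are looked up by the original callable at call time.
        dependency = self.dependency_overrides.get(dependency, dependency)
        return handler(dependency())


app = ToyApp()


@app.route("/kb/upload", dependency=auth.get_current_user)
def upload(user):
    return f"uploaded by {user}"


def call_with_patch():
    # Rebinding the attribute does not change the reference the route holds.
    with patch.object(auth, "get_current_user", fake_auth):
        try:
            return app.call("/kb/upload")
        except PermissionError:
            return "rejected"


def call_with_override():
    app.dependency_overrides[real_auth] = fake_auth
    try:
        return app.call("/kb/upload")
    finally:
        app.dependency_overrides.clear()  # avoid leaking into other tests


print(call_with_patch())     # rejected
print(call_with_override())  # uploaded by test-user
```

In a real FastAPI test the corresponding move is to set `app.dependency_overrides[original_dependency] = fake_dependency` before the request and clear the mapping afterwards.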
Total deviations: 3 auto-fixed (1 blocking, 1 missing critical, 1 bug)
Impact on plan: All fixes necessary for correctness. No scope creep.
Issues Encountered
None beyond the deviations documented above.
User Setup Required
New environment variables needed:
- `BRAVE_API_KEY` - Brave Search API key (https://brave.com/search/api/)
- `FIRECRAWL_API_KEY` - Firecrawl API key for URL scraping (https://firecrawl.dev)
- `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` - Google OAuth credentials
- `MINIO_KB_BUCKET` - MinIO bucket for KB documents (default: `kb-documents`)
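A minimal, standard-library sketch of reading these variables follows. The `KBSettings` class and its `from_env` helper are hypothetical and are not the project's `config.py`; the empty-string fallbacks are an assumption, and only the `kb-documents` default comes from this summary.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KBSettings:
    """Hypothetical settings holder for the variables listed above."""

    brave_api_key: str
    firecrawl_api_key: str
    google_client_id: str
    google_client_secret: str
    minio_kb_bucket: str = "kb-documents"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "KBSettings":
        # Empty-string fallbacks are an assumption; a real config may fail fast instead.
        return cls(
            brave_api_key=env.get("BRAVE_API_KEY", ""),
            firecrawl_api_key=env.get("FIRECRAWL_API_KEY", ""),
            google_client_id=env.get("GOOGLE_CLIENT_ID", ""),
            google_client_secret=env.get("GOOGLE_CLIENT_SECRET", ""),
            minio_kb_bucket=env.get("MINIO_KB_BUCKET", "kb-documents"),
        )


settings = KBSettings.from_env({"BRAVE_API_KEY": "example-key"})
print(settings.minio_kb_bucket)  # kb-documents
```

Passing the environment as a mapping keeps the loader testable without mutating `os.environ`.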
Next Phase Readiness
- KB ingestion pipeline is fully functional and tested
- kb_search tool already wired to query kb_chunks via pgvector (existing from Phase 2)
- Executor now injects tenant context — all context-aware tools (kb_search, calendar) will work correctly
- Ready for 10-02 (calendar tool) and 10-03 (any remaining agent capability work)
Self-Check: PASSED
All files found on disk. All commits verified in git log.
Phase: 10-agent-capabilities
Completed: 2026-03-26