Compare commits


14 Commits

Author SHA1 Message Date
f1b79dffe0 docs: update PROJECT.md, add README.md and CHANGELOG.md
Some checks failed
CI / Backend Tests (push) Has been cancelled
CI / Portal E2E (push) Has been cancelled
- PROJECT.md updated to reflect v1.0 completion (10 phases, 39 plans,
  67 requirements). All key decisions marked as shipped.
- README.md: comprehensive project documentation with quick start,
  architecture, tech stack, configuration, and project structure.
- CHANGELOG.md: detailed changelog covering all 10 phases with
  feature descriptions organized by phase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 09:37:55 -06:00
cac01b7ff9 docs(phase-10): complete Agent Capabilities phase execution
2026-03-26 09:29:24 -06:00
08d602a3e8 docs(10-03): complete Knowledge Base portal page plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 09:24:30 -06:00
bc8cbd26df docs(10-02): complete Google Calendar OAuth and calendar tool CRUD plan
- Update STATE.md: stopped_at reflects 10-02 completion
- SUMMARY.md already captured in previous commit (e56b5f8)
2026-03-26 09:13:32 -06:00
e56b5f885b docs(10-01): complete KB ingestion pipeline plan 2026-03-26 09:11:56 -06:00
a64634ff90 feat(10-02): mount KB and calendar routers, update tool registry and prompt builder
- Mount kb_router and calendar_auth_router on gateway (Phase 10 agent capabilities)
- Update calendar_lookup tool schema with action/event_summary/event_start/event_end params
- Add tool result formatting instruction to build_system_prompt when tools assigned (CAP-06)
- Add kb_router and calendar_auth_router to shared/api/__init__.py exports
- Confirm CAP-04 (http_request) and CAP-07 (audit logging) already working
2026-03-26 09:10:01 -06:00
9c7686a7b4 feat(10-01): Celery ingestion task, executor injection, KB search wiring
- Add ingest_document Celery task (sync def + asyncio.run per arch constraint)
- Add ingest_document_pipeline: MinIO download, extract, chunk, embed, store
- Add chunk_text sliding window chunker (500 chars default, 50 overlap)
- Update execute_tool to inject tenant_id/agent_id into all tool handler kwargs
- Update web_search to use settings.brave_api_key (shared config) not os.getenv
- Unit tests: test_ingestion.py (9 tests) and test_executor_injection.py (5 tests) all pass
2026-03-26 09:09:36 -06:00
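The sliding-window chunker named in commit 9c7686a7b4 (500-char chunks, 50-char overlap) can be sketched as follows — a minimal illustration, not the exact `ingest.py` implementation, which may handle edge cases differently:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, each overlapping its predecessor
    by `overlap` characters so no sentence is lost at a chunk boundary."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks: list[str] = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With the defaults, a 1,200-character document yields three chunks, and the last 50 characters of each chunk repeat as the first 50 of the next.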
08572fcc40 feat(10-02): Google Calendar OAuth endpoints and per-tenant calendar tool
- Add calendar_auth.py: OAuth install/callback/status endpoints with HMAC-signed state
- Replace calendar_lookup.py service account stub with per-tenant OAuth token lookup
- Support list, check_availability, and create actions with natural language responses
- Token auto-refresh: write updated credentials back to channel_connections on refresh
- Add migration 013: add google_calendar to channel_type CHECK constraint
- Add unit tests: 16 tests covering all actions, not-connected path, token refresh write-back
2026-03-26 09:07:37 -06:00
e8d3e8a108 feat(10-01): KB ingestion pipeline - migration, extractors, API router
- Migration 014: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- Add GOOGLE_CALENDAR to ChannelTypeEnum in tenant.py
- Add brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket to config
- Add text extractors for PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
- Add KB management API router with upload, list, delete, URL ingest, reindex endpoints
- Install pypdf, python-docx, python-pptx, openpyxl, pandas, firecrawl-py, youtube-transcript-api
- Update .env.example with new env vars
- Unit tests: test_extractors.py (10 tests) and test_kb_upload.py (7 tests) all pass
2026-03-26 09:05:29 -06:00
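The extension-dispatch shape of the extractors added in e8d3e8a108 can be sketched as below. Only the dependency-free branches are filled in; the PDF branch shows the pypdf pattern but requires that library, and DOCX/PPTX/XLSX branches (omitted) follow the same dispatch:

```python
import io
from pathlib import Path


def extract_text(filename: str, file_bytes: bytes) -> str:
    """Dispatch on file extension and return plain text. Sketch only —
    the real extractors.py covers DOCX, PPTX, and XLSX as well."""
    ext = Path(filename).suffix.lower()
    if ext in {".txt", ".md", ".csv"}:
        # Plain-text formats: decode, replacing any invalid UTF-8 bytes
        return file_bytes.decode("utf-8", errors="replace")
    if ext == ".pdf":
        from pypdf import PdfReader  # requires the pypdf dependency
        reader = PdfReader(io.BytesIO(file_bytes))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file extension: {ext}")
```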
eae4b0324d docs(10): create phase plan
2026-03-25 23:33:27 -06:00
95d05f5f88 docs(10): add research and validation strategy 2026-03-25 23:24:53 -06:00
9f70eede69 docs(10): research phase agent capabilities 2026-03-25 23:24:03 -06:00
003bebc39f docs(state): record phase 10 context session
2026-03-25 23:17:22 -06:00
63cc198ede docs(10): capture phase context 2026-03-25 23:17:22 -06:00
43 changed files with 6445 additions and 177 deletions

View File

@@ -62,6 +62,21 @@ DEBUG=false
# Tenant rate limits (requests per minute defaults)
DEFAULT_RATE_LIMIT_RPM=60
# -----------------------------------------------------------------------------
# Web Search / Knowledge Base Scraping
# BRAVE_API_KEY: Get from https://brave.com/search/api/
# FIRECRAWL_API_KEY: Get from https://firecrawl.dev
# -----------------------------------------------------------------------------
BRAVE_API_KEY=
FIRECRAWL_API_KEY=
# Google OAuth (Calendar integration)
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
# MinIO KB bucket (for knowledge base documents)
MINIO_KB_BUCKET=kb-documents
# -----------------------------------------------------------------------------
# Web Push Notifications (VAPID keys)
# Generate with: cd packages/portal && npx web-push generate-vapid-keys
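The new env vars above map to fields on the shared `Settings` class (per the 10-01 plan). A stdlib stand-in for that pydantic-settings class, showing the same defaults — field names match the plan, the dataclass itself is illustrative:

```python
import os
from dataclasses import dataclass, field


@dataclass
class KbSettings:
    """Stand-in for the pydantic Settings fields added in Phase 10."""
    brave_api_key: str = field(default_factory=lambda: os.getenv("BRAVE_API_KEY", ""))
    firecrawl_api_key: str = field(default_factory=lambda: os.getenv("FIRECRAWL_API_KEY", ""))
    google_client_id: str = field(default_factory=lambda: os.getenv("GOOGLE_CLIENT_ID", ""))
    google_client_secret: str = field(default_factory=lambda: os.getenv("GOOGLE_CLIENT_SECRET", ""))
    minio_kb_bucket: str = field(default_factory=lambda: os.getenv("MINIO_KB_BUCKET", "kb-documents"))
```

All keys default to empty strings except the bucket name, so the platform boots without search/calendar credentials and those tools simply stay inert.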

View File

@@ -2,69 +2,74 @@
## What This Is
Konstruct is an AI workforce platform where SMBs subscribe to AI employees that communicate through familiar messaging channels — Slack and WhatsApp for v1. Clients get an AI worker that shows up where their team already communicates, requiring zero behavior change. Think "hire an AI department" rather than "subscribe to another SaaS dashboard."
Konstruct is an AI workforce platform where SMBs subscribe to AI employees that communicate through familiar messaging channels — Slack, WhatsApp, and the built-in web chat. Clients get AI workers that show up where their team already communicates, requiring zero behavior change. Think "hire an AI department" rather than "subscribe to another SaaS dashboard."
## Core Value
An AI employee that works in the channels your team already uses — no new tools to learn, no dashboards to check, just a capable coworker in Slack or WhatsApp.
An AI employee that works in the channels your team already uses — no new tools to learn, no dashboards to check, just a capable coworker in Slack, WhatsApp, or the portal chat.
## Requirements
## Current State (v1.0 — Beta-Ready)
### Validated
All 10 phases complete. 39 plans executed. 67 requirements satisfied.
(None yet — ship to validate)
### What's Shipped
### Active
| Feature | Status |
|---------|--------|
| Channel Gateway (Slack + WhatsApp + Web Chat) | ✓ Complete |
| Multi-tenant isolation (PostgreSQL RLS) | ✓ Complete |
| LLM Backend (Ollama + Anthropic/OpenAI via LiteLLM) | ✓ Complete |
| Conversational memory (Redis sliding window + pgvector) | ✓ Complete |
| Tool framework (web search, KB, HTTP, calendar) | ✓ Complete |
| Knowledge base (document upload, URL scraping, YouTube transcription) | ✓ Complete |
| Google Calendar integration (OAuth, CRUD) | ✓ Complete |
| Human escalation with assistant mode | ✓ Complete |
| Bidirectional media support (multimodal LLM) | ✓ Complete |
| Admin portal (Next.js 16, shadcn/ui, DM Sans) | ✓ Complete |
| Agent Designer + Wizard + 6 pre-built templates | ✓ Complete |
| Stripe billing (per-agent monthly, 14-day trial) | ✓ Complete |
| BYO API keys (Fernet encrypted) | ✓ Complete |
| Cost dashboard with Recharts | ✓ Complete |
| 3-tier RBAC (platform admin, customer admin, operator) | ✓ Complete |
| Email invitation flow (SMTP, HMAC tokens) | ✓ Complete |
| Web Chat with real-time streaming (bypass Celery) | ✓ Complete |
| Multilanguage (English, Spanish, Portuguese) | ✓ Complete |
| Mobile layout (bottom tab bar, full-screen chat) | ✓ Complete |
| PWA (service worker, push notifications, offline queue) | ✓ Complete |
| E2E tests (Playwright, 7 flows, 3 browsers) | ✓ Complete |
| CI pipeline (Gitea Actions) | ✓ Complete |
| Premium UI (indigo brand, dark sidebar, glass-morphism) | ✓ Complete |
- [ ] Channel Gateway that normalizes messages from Slack and WhatsApp into a unified internal format
- [ ] Single AI employee per tenant with configurable role, persona, and tools
- [ ] Multi-tenant architecture with proper isolation (PostgreSQL RLS for Starter tier)
- [ ] LLM backend pool with Ollama (local) + commercial APIs (Anthropic/OpenAI) via LiteLLM
- [ ] Full admin portal (Next.js) for tenant management, agent configuration, and monitoring
- [ ] Tenant onboarding flow in the portal
- [ ] Billing integration (Stripe) for subscription management
- [ ] Conversational memory (conversation history + vector search)
- [ ] Tool framework for agent capabilities (registry, execution)
- [ ] Rate limiting per tenant and per channel
### v2 Scope (Deferred)
### Out of Scope
- Multi-agent teams and coordinator pattern — v2 (need single agent working first)
- AI company hierarchy (teams of teams) — v2+
- Microsoft Teams, Mattermost, Rocket.Chat, Signal, Telegram — v2 channel expansion
- BYO API key support — moved to v1 Phase 3 (operator requested during scoping)
- Self-hosted deployment (Helm chart) — v2+ (SaaS-first for beta)
- Voice/telephony channels — v3+
- Agent marketplace / pre-built templates — v3+
- SOC 2 / HIPAA compliance — post-revenue
- White-labeling for agencies — future consideration
- Multi-agent teams and coordinator pattern
- Microsoft Teams, Mattermost, Telegram channels
- Self-hosted deployment (Helm chart)
- Schema-per-tenant isolation
- Agent marketplace
- Voice/telephony channels
- SSO/SAML for enterprise
- Granular operator permissions
## Context
- **Market gap:** Existing AI tools are dashboards or chatbots, not channel-native workers. No coordinated AI teams. No self-hosted options for enterprises. Konstruct addresses all three.
- **Target customer:** SMBs that need additional staff capacity but lack resources, are overwhelmed with processes, or want to grow faster but can't find the right balance.
- **Inspiration:** paperclip.ing — but differentiated by channel-native presence, tiered multi-tenancy, and eventual BYO-model support.
- **V1 goal:** Beta-ready product that can accept early users. One AI employee per tenant on Slack + WhatsApp, managed through a full admin portal, with multi-tenancy and billing.
- **Tech foundation:** Python (FastAPI) backend, Next.js portal, PostgreSQL + Redis, Docker Compose for dev, monorepo structure.
## Constraints
- **Tech stack:** Python 3.12+ (FastAPI, SQLAlchemy 2.0, Pydantic v2), Next.js 14+ (App Router, shadcn/ui), PostgreSQL 16, Redis — as specified in CLAUDE.md
- **V1 channels:** Slack (slack-bolt) + WhatsApp (Business Cloud API) only
- **LLM providers:** Ollama (local) + Anthropic/OpenAI (commercial) via LiteLLM — no BYO in v1
- **Multi-tenancy:** PostgreSQL RLS for v1 (Starter tier), schema isolation deferred to v2
- **Deployment:** Docker Compose for dev, single-server deployment for beta — Kubernetes deferred
- **Market gap:** Existing AI tools are dashboards or chatbots, not channel-native workers. No coordinated AI teams. No self-hosted options for enterprises.
- **Target customer:** SMBs that need additional staff capacity but lack resources, are overwhelmed with processes, or want to grow faster.
- **Tech foundation:** Python 3.12+ (FastAPI, SQLAlchemy 2.0, Celery), Next.js 16 (App Router, shadcn/ui, next-intl, Serwist), PostgreSQL 16 + pgvector, Redis, Ollama, Docker Compose.
## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Slack + WhatsApp for v1 channels | Slack = where SMB teams work, WhatsApp = massive business communication reach | — Pending |
| Single agent per tenant for v1 | Prove the channel-native thesis before adding team complexity | — Pending |
| Full portal from day one | Beta users need a proper UI, not config files — lowers barrier to adoption | — Pending |
| Local + commercial LLMs | Ollama for dev/cheap tasks, commercial APIs for quality — balances cost and capability | — Pending |
| PostgreSQL RLS multi-tenancy | Simplest to start, sufficient for Starter tier, upgrade path to schema isolation exists | — Pending |
| Beta-ready as v1 target | Multi-tenancy + billing = can accept real users, not just demos | — Pending |
| Slack + WhatsApp + Web Chat channels | Covers office (Slack), customers (WhatsApp), and portal users (Web Chat) | ✓ Shipped |
| Single agent per tenant for v1 | Prove channel-native thesis before team complexity | ✓ Shipped |
| Full portal from day one | Beta users need UI, not config files | ✓ Shipped |
| Local + commercial LLMs | Ollama for dev/cost, commercial for quality | ✓ Shipped |
| PostgreSQL RLS multi-tenancy | Simplest, sufficient for Starter tier | ✓ Shipped |
| Web chat bypasses Celery | Direct LLM streaming from WebSocket for speed | ✓ Shipped |
| Per-agent monthly pricing | Matches "hire an employee" metaphor | ✓ Shipped |
| 3-tier RBAC with invite flow | Self-service for customers, control for operators | ✓ Shipped |
| DM Sans + indigo brand | Premium SaaS aesthetic for SMB market | ✓ Shipped |
---
*Last updated: 2026-03-22 after initialization*
*Last updated: 2026-03-26 after Phase 10 completion*

View File

@@ -102,13 +102,13 @@ Requirements for beta-ready release. Each maps to roadmap phases.
### Agent Capabilities
- [ ] **CAP-01**: Web search tool returns real results from a search provider (Brave Search, SerpAPI, or similar)
- [ ] **CAP-02**: Knowledge base tool searches tenant-scoped documents that have been uploaded, chunked, and embedded in pgvector
- [ ] **CAP-03**: Operators can upload documents (PDF, DOCX, TXT) to a tenant's knowledge base via the portal
- [ ] **CAP-04**: HTTP request tool can call operator-configured URLs with response parsing and timeout handling
- [ ] **CAP-05**: Calendar tool can check Google Calendar availability (read-only for v1)
- [ ] **CAP-06**: Tool results are incorporated naturally into agent responses — no raw JSON or technical output shown to users
- [ ] **CAP-07**: All tool invocations are logged in the audit trail with input parameters and output summary
- [x] **CAP-01**: Web search tool returns real results from a search provider (Brave Search, SerpAPI, or similar)
- [x] **CAP-02**: Knowledge base tool searches tenant-scoped documents that have been uploaded, chunked, and embedded in pgvector
- [x] **CAP-03**: Operators can upload documents (PDF, DOCX, TXT) to a tenant's knowledge base via the portal
- [x] **CAP-04**: HTTP request tool can call operator-configured URLs with response parsing and timeout handling
- [x] **CAP-05**: Calendar tool can check Google Calendar availability (read-only for v1)
- [x] **CAP-06**: Tool results are incorporated naturally into agent responses — no raw JSON or technical output shown to users
- [x] **CAP-07**: All tool invocations are logged in the audit trail with input parameters and output summary
## v2 Requirements
@@ -219,13 +219,13 @@ Which phases cover which requirements. Updated during roadmap creation.
| QA-05 | Phase 9 | Complete |
| QA-06 | Phase 9 | Complete |
| QA-07 | Phase 9 | Complete |
| CAP-01 | Phase 10 | Pending |
| CAP-02 | Phase 10 | Pending |
| CAP-03 | Phase 10 | Pending |
| CAP-04 | Phase 10 | Pending |
| CAP-05 | Phase 10 | Pending |
| CAP-06 | Phase 10 | Pending |
| CAP-07 | Phase 10 | Pending |
| CAP-01 | Phase 10 | Complete |
| CAP-02 | Phase 10 | Complete |
| CAP-03 | Phase 10 | Complete |
| CAP-04 | Phase 10 | Complete |
| CAP-05 | Phase 10 | Complete |
| CAP-06 | Phase 10 | Complete |
| CAP-07 | Phase 10 | Complete |
**Coverage:**
- v1 requirements: 25 total (all complete)

View File

@@ -131,7 +131,7 @@ Plans:
## Progress
**Execution Order:**
Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10
| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
@@ -144,7 +144,7 @@ Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
| 7. Multilanguage | 4/4 | Complete | 2026-03-25 |
| 8. Mobile + PWA | 4/4 | Complete | 2026-03-26 |
| 9. Testing & QA | 3/3 | Complete | 2026-03-26 |
| 10. Agent Capabilities | 0/0 | Not started | - |
| 10. Agent Capabilities | 3/3 | Complete | 2026-03-26 |
---
@@ -210,7 +210,7 @@ Plans:
- [ ] 09-03-PLAN.md — Gitea Actions CI pipeline (backend lint+pytest, portal build+E2E+Lighthouse) + human verification
### Phase 10: Agent Capabilities
**Goal**: Connect the 4 built-in agent tools to real external services so AI Employees can actually search the web, query a knowledge base of uploaded documents, make HTTP API calls, and check calendar availability
**Goal**: Connect the 4 built-in agent tools to real external services so AI Employees can actually search the web, query a knowledge base of uploaded documents, make HTTP API calls, and check calendar availability — with full CRUD Google Calendar integration and a dedicated KB management portal page
**Depends on**: Phase 9
**Requirements**: CAP-01, CAP-02, CAP-03, CAP-04, CAP-05, CAP-06, CAP-07
**Success Criteria** (what must be TRUE):
@@ -221,11 +221,13 @@ Plans:
5. Calendar tool can check availability on Google Calendar (read-only for v1)
6. Tool results are incorporated naturally into agent responses (no raw JSON dumps)
7. All tool invocations are logged in the audit trail with input/output
**Plans**: 0 plans
**Plans**: 3 plans
Plans:
- [ ] TBD (run /gsd:plan-phase 10 to break down)
- [ ] 10-01-PLAN.md — KB ingestion pipeline backend: migration 013, text extractors (PDF/DOCX/PPTX/XLSX/CSV/TXT/MD), chunking + embedding Celery task, KB API router (upload/list/delete/reindex/URL), executor tenant_id injection, web search config
- [ ] 10-02-PLAN.md — Google Calendar OAuth per tenant: install/callback endpoints, calendar_lookup replacement with list/create/check_availability, encrypted token storage, router mounting, tool response formatting
- [ ] 10-03-PLAN.md — Portal KB management page: document list with status polling, file upload (drag-and-drop), URL/YouTube ingestion, delete/reindex, RBAC, human verification
---
*Roadmap created: 2026-03-23*
*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements + 5 Web Chat requirements + 6 Multilanguage requirements + 6 Mobile+PWA requirements + 7 Testing & QA requirements mapped*
*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements + 5 Web Chat requirements + 6 Multilanguage requirements + 6 Mobile+PWA requirements + 7 Testing & QA requirements + 7 Agent Capabilities requirements mapped*

View File

@@ -3,14 +3,14 @@ gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: completed
stopped_at: Completed 09-03-PLAN.md (Gitea Actions CI pipeline)
last_updated: "2026-03-26T04:54:21.890Z"
stopped_at: "Completed 10-03: Knowledge Base portal page, file upload, URL ingest, RBAC, i18n"
last_updated: "2026-03-26T15:29:17.215Z"
last_activity: 2026-03-23 — Completed 03-02 onboarding wizard, Slack OAuth, BYO API keys
progress:
total_phases: 9
completed_phases: 9
total_plans: 36
completed_plans: 36
total_phases: 10
completed_phases: 10
total_plans: 39
completed_plans: 39
percent: 100
---
@@ -88,6 +88,9 @@ Progress: [██████████] 100%
| Phase 09-testing-qa P01 | 5min | 2 tasks | 12 files |
| Phase 09-testing-qa P02 | 1min | 2 tasks | 3 files |
| Phase 09-testing-qa P03 | 3min | 1 tasks | 1 files |
| Phase 10-agent-capabilities P02 | 10m | 2 tasks | 9 files |
| Phase 10-agent-capabilities P01 | 11min | 2 tasks | 16 files |
| Phase 10-agent-capabilities P03 | 22min | 2 tasks | 10 files |
## Accumulated Context
@@ -208,6 +211,16 @@ Recent decisions affecting current work:
- [Phase 09-testing-qa]: Serious a11y violations are console.warn only — critical violations are hard CI failures
- [Phase 09-testing-qa]: No mypy --strict in CI — ruff lint is sufficient gate; mypy can be added incrementally when codebase is fully typed
- [Phase 09-testing-qa]: seed_admin uses || true in CI — test users created via E2E auth setup login form, not DB seeding
- [Phase 10-agent-capabilities]: calendar_lookup receives _session param for test injection — production obtains session from async_session_factory
- [Phase 10-agent-capabilities]: Tool result formatting instruction added to build_system_prompt when agent has tool_assignments (CAP-06)
- [Phase 10-agent-capabilities]: build() imported at module level in calendar_lookup for patchability in tests; try/except ImportError handles optional google library
- [Phase 10-agent-capabilities]: Migration numbered 014 (not 013) — 013 already used by google_calendar channel type migration from prior session
- [Phase 10-agent-capabilities]: KB is per-tenant not per-agent — agent_id made nullable in kb_documents
- [Phase 10-agent-capabilities]: Executor injects tenant_id/agent_id as strings after schema validation to avoid triggering schema rejections on LLM-provided args
- [Phase 10-agent-capabilities]: Lazy import of ingest_document task in kb.py via _get_ingest_task() — avoids shared→orchestrator circular dependency at module load time
- [Phase 10-agent-capabilities]: getAuthHeaders() exported from api.ts — multipart upload uses raw fetch to avoid Content-Type override; KB upload pattern reusable for future file endpoints
- [Phase 10-agent-capabilities]: CirclePlay icon used instead of Youtube — Youtube icon not in lucide-react v1.0.1 installed in portal
- [Phase 10-agent-capabilities]: Conditional refetchInterval in useKbDocuments — returns 5000ms while any doc is processing, false when all done; avoids constant polling
### Roadmap Evolution
@@ -223,6 +236,6 @@ None — all phases complete.
## Session Continuity
Last session: 2026-03-26T04:53:34.687Z
Stopped at: Completed 09-03-PLAN.md (Gitea Actions CI pipeline)
Last session: 2026-03-26T15:24:12.693Z
Stopped at: Completed 10-03: Knowledge Base portal page, file upload, URL ingest, RBAC, i18n
Resume file: None

View File

@@ -0,0 +1,338 @@
---
phase: 10-agent-capabilities
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- migrations/versions/013_kb_status_and_calendar.py
- packages/shared/shared/models/kb.py
- packages/shared/shared/models/tenant.py
- packages/shared/shared/config.py
- packages/shared/shared/api/kb.py
- packages/orchestrator/orchestrator/tools/ingest.py
- packages/orchestrator/orchestrator/tools/extractors.py
- packages/orchestrator/orchestrator/tasks.py
- packages/orchestrator/orchestrator/tools/executor.py
- packages/orchestrator/orchestrator/tools/builtins/kb_search.py
- packages/orchestrator/pyproject.toml
- .env.example
- tests/unit/test_extractors.py
- tests/unit/test_kb_upload.py
autonomous: true
requirements:
- CAP-01
- CAP-02
- CAP-03
- CAP-04
- CAP-07
must_haves:
truths:
- "Documents uploaded via API are saved to MinIO and a KbDocument row is created with status=processing"
- "The Celery ingestion task extracts text from PDF, DOCX, PPTX, XLSX, CSV, TXT, and MD files"
- "Extracted text is chunked (500 chars, 50 overlap) and embedded via all-MiniLM-L6-v2 into kb_chunks with tenant_id"
- "kb_search tool receives tenant_id injection from executor and returns matching chunks"
- "BRAVE_API_KEY and FIRECRAWL_API_KEY are platform-wide settings in shared config"
- "Tool executor injects tenant_id and agent_id into tool handler kwargs for context-aware tools"
artifacts:
- path: "migrations/versions/013_kb_status_and_calendar.py"
provides: "DB migration: kb_documents status/error_message/chunk_count columns, agent_id nullable, channel_type CHECK update for google_calendar"
contains: "status"
- path: "packages/orchestrator/orchestrator/tools/extractors.py"
provides: "Text extraction functions for all supported document formats"
exports: ["extract_text"]
- path: "packages/orchestrator/orchestrator/tools/ingest.py"
provides: "Document chunking and ingestion pipeline logic"
exports: ["chunk_text", "ingest_document_pipeline"]
- path: "packages/shared/shared/api/kb.py"
provides: "KB management API router (upload, list, delete, re-index)"
exports: ["kb_router"]
- path: "tests/unit/test_extractors.py"
provides: "Unit tests for text extraction functions"
key_links:
- from: "packages/shared/shared/api/kb.py"
to: "packages/orchestrator/orchestrator/tasks.py"
via: "ingest_document.delay(document_id, tenant_id)"
pattern: "ingest_document\\.delay"
- from: "packages/orchestrator/orchestrator/tools/executor.py"
to: "tool.handler"
via: "tenant_id/agent_id injection into kwargs"
pattern: "tenant_id.*agent_id.*handler"
---
<objective>
Build the knowledge base document ingestion pipeline backend and activate web search/HTTP tools.
Purpose: This is the core backend for CAP-02/CAP-03 -- the document upload, text extraction, chunking, embedding, and storage pipeline that makes the KB search tool functional with real data. Also fixes the tool executor to inject tenant context into tool handlers, activates web search via BRAVE_API_KEY config, and confirms HTTP request tool needs no changes (CAP-04).
Output: Working KB upload API, Celery ingestion task, text extractors for all formats, migration 013, executor tenant_id injection, updated config with new env vars.
</objective>
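The five-stage pipeline the objective describes (download, extract, chunk, embed, store) can be sketched as a pure orchestration function. Stage callables are injected here for illustration — the real `ingest_document_pipeline` wires in MinIO, the extractors, and pgvector directly:

```python
from typing import Any, Callable


def ingest_document_pipeline(
    doc_id: str,
    tenant_id: str,
    *,
    download: Callable[[str], bytes],
    extract: Callable[[bytes], str],
    chunk: Callable[[str], list[str]],
    embed: Callable[[list[str]], list[list[float]]],
    store: Callable[[str, str, int, str, list[float]], Any],
) -> int:
    """Run the ingestion stages in order and return the chunk count."""
    raw = download(doc_id)              # MinIO object fetch in the real code
    text = extract(raw)                 # extractors.extract_text
    chunks = chunk(text)                # sliding-window chunker
    vectors = embed(chunks)             # all-MiniLM-L6-v2 batch embedding
    for i, (content, vector) in enumerate(zip(chunks, vectors)):
        store(tenant_id, doc_id, i, content, vector)  # kb_chunks insert
    return len(chunks)
```

The Celery task wraps this in a sync `def` and `asyncio.run` per the architecture constraint, then flips the document's status to `ready` (or `error`) when it returns.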
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/10-agent-capabilities/10-CONTEXT.md
@.planning/phases/10-agent-capabilities/10-RESEARCH.md
<interfaces>
<!-- Key types and contracts the executor needs -->
From packages/shared/shared/models/kb.py:
```python
class KnowledgeBaseDocument(KBBase):
__tablename__ = "kb_documents"
id: Mapped[uuid.UUID]
tenant_id: Mapped[uuid.UUID]
agent_id: Mapped[uuid.UUID] # Currently NOT NULL — migration 013 makes nullable
filename: Mapped[str | None]
source_url: Mapped[str | None]
content_type: Mapped[str | None]
created_at: Mapped[datetime]
chunks: Mapped[list[KBChunk]]
class KBChunk(KBBase):
__tablename__ = "kb_chunks"
id: Mapped[uuid.UUID]
tenant_id: Mapped[uuid.UUID]
document_id: Mapped[uuid.UUID]
content: Mapped[str]
chunk_index: Mapped[int | None]
created_at: Mapped[datetime]
```
From packages/orchestrator/orchestrator/tools/executor.py:
```python
async def execute_tool(
tool_call: dict[str, Any],
registry: dict[str, "ToolDefinition"],
tenant_id: uuid.UUID,
agent_id: uuid.UUID,
audit_logger: "AuditLogger",
) -> str:
# Line 126: result = await tool.handler(**args)
# PROBLEM: only LLM-provided args are passed, tenant_id/agent_id NOT injected
```
From packages/orchestrator/orchestrator/memory/embedder.py:
```python
def embed_text(text: str) -> list[float]: # Returns 384-dim vector
def embed_texts(texts: list[str]) -> list[list[float]]: # Batch embedding
```
From packages/shared/shared/config.py:
```python
class Settings(BaseSettings):
minio_endpoint: str
minio_access_key: str
minio_secret_key: str
minio_media_bucket: str
```
From packages/shared/shared/api/channels.py:
```python
channels_router = APIRouter(prefix="/api/portal/channels", tags=["channels"])
# Uses: require_tenant_admin, get_session, KeyEncryptionService
# OAuth state: generate_oauth_state() / verify_oauth_state() with HMAC-SHA256
```
From packages/shared/shared/api/rbac.py:
```python
class PortalCaller: ...
async def require_tenant_admin(...) -> PortalCaller: ...
async def require_tenant_member(...) -> PortalCaller: ...
```
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Migration 013, ORM updates, config settings, text extractors, KB API router</name>
<files>
migrations/versions/013_kb_status_and_calendar.py,
packages/shared/shared/models/kb.py,
packages/shared/shared/models/tenant.py,
packages/shared/shared/config.py,
packages/shared/shared/api/kb.py,
packages/orchestrator/orchestrator/tools/extractors.py,
packages/orchestrator/pyproject.toml,
.env.example,
tests/unit/test_extractors.py,
tests/unit/test_kb_upload.py
</files>
<behavior>
- extract_text("hello.pdf", pdf_bytes) returns extracted text from PDF pages
- extract_text("doc.docx", docx_bytes) returns paragraph text from DOCX
- extract_text("slides.pptx", pptx_bytes) returns slide text from PPTX
- extract_text("data.xlsx", xlsx_bytes) returns CSV-formatted cell data
- extract_text("data.csv", csv_bytes) returns decoded UTF-8 text
- extract_text("notes.txt", txt_bytes) returns decoded text
- extract_text("notes.md", md_bytes) returns decoded text
- extract_text("file.exe", bytes) raises ValueError("Unsupported file extension")
- KB upload endpoint returns 201 with document_id for valid file
- KB list endpoint returns documents with status field
- KB delete endpoint removes document and chunks
</behavior>
<action>
1. **Migration 013** (`migrations/versions/013_kb_status_and_calendar.py`):
- ALTER TABLE kb_documents ADD COLUMN status TEXT NOT NULL DEFAULT 'processing'
- ALTER TABLE kb_documents ADD COLUMN error_message TEXT
- ALTER TABLE kb_documents ADD COLUMN chunk_count INTEGER
- ALTER TABLE kb_documents ALTER COLUMN agent_id DROP NOT NULL (KB is per-tenant per locked decision)
- DROP + re-ADD channel_connections CHECK constraint to include 'google_calendar' (same pattern as migration 008)
- New channel types tuple: slack, whatsapp, mattermost, rocketchat, teams, telegram, signal, web, google_calendar
- Add CHECK constraint on kb_documents.status: CHECK (status IN ('processing', 'ready', 'error'))
2. **ORM updates**:
- `packages/shared/shared/models/kb.py`: Add status (str, server_default='processing'), error_message (str | None), chunk_count (int | None) mapped columns to KnowledgeBaseDocument. Change agent_id to nullable=True.
- `packages/shared/shared/models/tenant.py`: Add GOOGLE_CALENDAR = "google_calendar" to ChannelTypeEnum
3. **Config** (`packages/shared/shared/config.py`):
- Add brave_api_key: str = Field(default="", description="Brave Search API key")
- Add firecrawl_api_key: str = Field(default="", description="Firecrawl API key for URL scraping")
- Add google_client_id: str = Field(default="", description="Google OAuth client ID")
- Add google_client_secret: str = Field(default="", description="Google OAuth client secret")
- Add minio_kb_bucket: str = Field(default="kb-documents", description="MinIO bucket for KB documents")
- Update .env.example with all new env vars
4. **Install dependencies** on orchestrator:
```bash
uv add --project packages/orchestrator pypdf python-docx python-pptx openpyxl pandas firecrawl-py youtube-transcript-api google-api-python-client google-auth-oauthlib
```
5. **Text extractors** (`packages/orchestrator/orchestrator/tools/extractors.py`):
- Create extract_text(filename: str, file_bytes: bytes) -> str function
- PDF: pypdf PdfReader on BytesIO, join page text with newlines
- DOCX: python-docx Document on BytesIO, join paragraph text
- PPTX: python-pptx Presentation on BytesIO, iterate slides/shapes for text
- XLSX/XLS: pandas read_excel on BytesIO, to_csv(index=False)
- CSV: decode UTF-8 with errors="replace"
- TXT/MD: decode UTF-8 with errors="replace"
- Raise ValueError for unsupported extensions
- After extraction, if a PDF yields fewer than 100 characters of stripped text, return an error message explaining that OCR of scanned documents is not supported
6. **KB API router** (`packages/shared/shared/api/kb.py`):
- kb_router = APIRouter(prefix="/api/portal/kb", tags=["knowledge-base"])
- POST /{tenant_id}/documents — multipart file upload (UploadFile + File)
- Validate file extension against supported list
- Read file bytes, upload to MinIO kb-documents bucket with key: {tenant_id}/{doc_id}/{filename}
- Insert KnowledgeBaseDocument(tenant_id, filename, content_type, status='processing', agent_id=None)
- Call ingest_document.delay(str(doc.id), str(tenant_id)) — import from orchestrator.tasks
- Return 201 with {"id": str(doc.id), "filename": filename, "status": "processing"}
- Guard with require_tenant_admin
- POST /{tenant_id}/documents/url — JSON body {url: str, source_type: "web" | "youtube"}
- Insert KnowledgeBaseDocument(tenant_id, source_url=url, status='processing', agent_id=None)
- Call ingest_document.delay(str(doc.id), str(tenant_id))
- Return 201
- Guard with require_tenant_admin
  - GET /{tenant_id}/documents — list KnowledgeBaseDocument rows for the tenant with status, chunk_count, created_at
- Guard with require_tenant_member (operators can view)
- DELETE /{tenant_id}/documents/{document_id} — delete document (CASCADE deletes chunks)
- Also delete file from MinIO if filename present
- Guard with require_tenant_admin
- POST /{tenant_id}/documents/{document_id}/reindex — delete existing chunks, re-dispatch ingest_document.delay
- Guard with require_tenant_admin
7. **Tests** (write BEFORE implementation per tdd=true):
- test_extractors.py: test each format extraction with minimal valid files (create in-memory test fixtures using the libraries)
- test_kb_upload.py: test upload endpoint with mocked MinIO and mocked Celery task dispatch
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_extractors.py tests/unit/test_kb_upload.py -x -q</automated>
</verify>
<done>Migration 013 exists with all schema changes. Text extractors handle all 7 format families. KB API router has upload, list, delete, URL ingest, and reindex endpoints. All unit tests pass.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Celery ingestion task, executor tenant_id injection, KB search wiring</name>
<files>
packages/orchestrator/orchestrator/tasks.py,
packages/orchestrator/orchestrator/tools/ingest.py,
packages/orchestrator/orchestrator/tools/executor.py,
packages/orchestrator/orchestrator/tools/builtins/kb_search.py,
packages/orchestrator/orchestrator/tools/builtins/web_search.py,
tests/unit/test_ingestion.py,
tests/unit/test_executor_injection.py
</files>
<behavior>
- chunk_text("hello world " * 100, chunk_size=500, overlap=50) returns overlapping chunks of correct size
- ingest_document_pipeline fetches file from MinIO, extracts text, chunks, embeds, inserts kb_chunks rows, updates status to 'ready'
- ingest_document_pipeline sets status='error' with error_message on failure
- execute_tool injects tenant_id and agent_id into handler kwargs before calling handler
- web_search reads BRAVE_API_KEY from settings (not os.getenv) for consistency
- kb_search receives injected tenant_id from executor
</behavior>
<action>
1. **Chunking + ingestion logic** (`packages/orchestrator/orchestrator/tools/ingest.py`):
- chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]
- Simple sliding window chunker, strip empty chunks
- async ingest_document_pipeline(document_id: str, tenant_id: str) -> None:
- Load KnowledgeBaseDocument from DB by ID (use RLS with tenant_id)
     - If filename: download file bytes from MinIO (boto3 client, kb-documents bucket, key: {tenant_id}/{document_id}/{filename})
- If source_url and source_url contains "youtube.com" or "youtu.be": use youtube_transcript_api to fetch transcript
- If source_url and not YouTube: use firecrawl-py to scrape URL to markdown (graceful error if FIRECRAWL_API_KEY not set)
- Call extract_text(filename, file_bytes) for file uploads
- Call chunk_text(text) on extracted text
- Batch embed chunks using embed_texts() from embedder.py
- INSERT kb_chunks rows with embedding vectors (use raw SQL text() with CAST(:embedding AS vector) pattern from kb_search.py)
- UPDATE kb_documents SET status='ready', chunk_count=len(chunks)
- On any error: UPDATE kb_documents SET status='error', error_message=str(exc)
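The sliding-window chunker in step 1 can be sketched as below; the DB and embedder wiring described above are omitted here:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size windows advancing by chunk_size - overlap; empty chunks are dropped."""
    if not text.strip():
        return []  # silent skip for empty/whitespace input, per pipeline design
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start : start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```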
2. **Celery task** in `packages/orchestrator/orchestrator/tasks.py`:
- Add ingest_document Celery task (sync def with asyncio.run per hard architectural constraint)
- @celery_app.task(bind=True, max_retries=2, ignore_result=True)
- def ingest_document(self, document_id: str, tenant_id: str) -> None
- Calls asyncio.run(ingest_document_pipeline(document_id, tenant_id))
- On exception: asyncio.run to mark document as error, then self.retry(countdown=60)
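The sync-over-async shim and error write-back pattern can be illustrated without Celery. The decorator, `self.retry`, and the real UPDATE are omitted; `STATUS` and the failing pipeline are hypothetical stand-ins:

```python
import asyncio

STATUS: dict[str, tuple[str, str]] = {}  # stand-in for kb_documents status/error_message

async def ingest_document_pipeline(document_id: str, tenant_id: str) -> None:
    raise RuntimeError("extraction failed")  # simulated failure for illustration

async def mark_document_error(document_id: str, message: str) -> None:
    STATUS[document_id] = ("error", message)  # real code: UPDATE kb_documents SET status='error', ...

def ingest_document(document_id: str, tenant_id: str) -> None:
    """Celery tasks must be sync defs; bridge into asyncio with asyncio.run."""
    try:
        asyncio.run(ingest_document_pipeline(document_id, tenant_id))
    except Exception as exc:
        asyncio.run(mark_document_error(document_id, str(exc)))
        raise  # real task: self.retry(countdown=60)
```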
3. **Executor tenant_id injection** (`packages/orchestrator/orchestrator/tools/executor.py`):
- Before calling tool.handler(**args), inject tenant_id and agent_id as string kwargs:
args["tenant_id"] = str(tenant_id)
args["agent_id"] = str(agent_id)
   - This makes kb_search, calendar_lookup, and future context-aware tools work without the LLM needing to know the tenant context
- Place injection AFTER schema validation (line ~126) so the injected keys don't fail validation
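The injection ordering in step 3 can be sketched as follows. The `Tool` dataclass and required-keys check are simplified stand-ins for the real registry entry and JSON-schema validation:

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

@dataclass
class Tool:
    name: str
    required: set[str]                      # stand-in for the JSON-schema parameters
    handler: Callable[..., Awaitable[str]]

async def execute_tool(tool: Tool, args: dict[str, Any], tenant_id: str, agent_id: str) -> str:
    missing = tool.required - args.keys()   # schema validation runs first...
    if missing:
        raise ValueError(f"missing args: {missing}")
    args["tenant_id"] = str(tenant_id)      # ...then context is injected, so the extra
    args["agent_id"] = str(agent_id)        # keys can never fail validation
    return await tool.handler(**args)
```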
4. **Update web_search.py**: Change `os.getenv("BRAVE_API_KEY", "")` to import settings from shared.config and use `settings.brave_api_key` for consistency with platform-wide config pattern.
5. **Tests** (write BEFORE implementation):
- test_ingestion.py: test chunk_text with various inputs, test ingest_document_pipeline with mocked MinIO/DB/embedder
- test_executor_injection.py: test that execute_tool injects tenant_id/agent_id into handler kwargs
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_ingestion.py tests/unit/test_executor_injection.py -x -q</automated>
</verify>
<done>Celery ingest_document task dispatches async ingestion pipeline. Pipeline downloads files from MinIO, extracts text, chunks, embeds, and stores in kb_chunks. Executor injects tenant_id/agent_id into all tool handlers. web_search uses shared config. All tests pass.</done>
</task>
</tasks>
<verification>
- Migration 013 applies cleanly: `cd /home/adelorenzo/repos/konstruct && alembic upgrade head`
- All unit tests pass: `pytest tests/unit/test_extractors.py tests/unit/test_kb_upload.py tests/unit/test_ingestion.py tests/unit/test_executor_injection.py -x -q`
- KB API router mounts and serves: import kb_router without errors
- Executor properly injects tenant context into tool handlers
</verification>
<success_criteria>
- KnowledgeBaseDocument has status, error_message, chunk_count columns; agent_id is nullable
- channel_connections CHECK constraint includes 'google_calendar'
- Text extraction works for PDF, DOCX, PPTX, XLSX, CSV, TXT, MD
- KB upload endpoint accepts files and dispatches Celery task
- KB list/delete/reindex endpoints work
- URL and YouTube ingestion endpoints dispatch Celery tasks
- Celery ingestion pipeline: extract -> chunk -> embed -> store
- Tool executor injects tenant_id and agent_id into handler kwargs
- BRAVE_API_KEY and FIRECRAWL_API_KEY in shared config
- All unit tests pass
</success_criteria>
<output>
After completion, create `.planning/phases/10-agent-capabilities/10-01-SUMMARY.md`
</output>


@@ -0,0 +1,188 @@
---
phase: 10-agent-capabilities
plan: 01
subsystem: api
tags: [knowledge-base, celery, minio, pgvector, pdf, docx, pptx, embeddings, text-extraction]
# Dependency graph
requires:
- phase: 02-agent-features
provides: pgvector kb_chunks table, embed_texts, kb_search tool, executor framework
- phase: 01-foundation
provides: Celery task infrastructure, MinIO, asyncio.run pattern, RLS session factory
provides:
- Migration 014: kb_documents status/error_message/chunk_count columns, agent_id nullable
- Text extractors for PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
- KB management API: upload file, ingest URL/YouTube, list, delete, reindex endpoints
- Celery ingest_document task: download → extract → chunk → embed → store pipeline
- Executor tenant_id/agent_id injection into all tool handlers
- brave_api_key + firecrawl_api_key + google_client_id/secret + minio_kb_bucket in shared config
affects: [10-02, 10-03, 10-04, kb-search, agent-tools]
# Tech tracking
tech-stack:
added:
- pypdf (PDF text extraction)
- python-docx (DOCX paragraph extraction)
- python-pptx (PPTX slide text extraction)
- openpyxl (XLSX/XLS reading via pandas)
- pandas (spreadsheet to CSV conversion)
- firecrawl-py (URL scraping for KB ingestion)
- youtube-transcript-api (YouTube video transcripts)
- google-api-python-client (Google API client)
- google-auth-oauthlib (Google OAuth)
patterns:
- Lazy Celery task import in kb.py to avoid circular dependencies
- Executor context injection pattern (tenant_id/agent_id injected after schema validation)
- chunk_text sliding window chunker (default 500 chars, 50 overlap)
- ingest_document_pipeline: fetch → extract → chunk → embed → store in single async transaction
key-files:
created:
- migrations/versions/014_kb_status.py
- packages/orchestrator/orchestrator/tools/extractors.py
- packages/orchestrator/orchestrator/tools/ingest.py
- packages/shared/shared/api/kb.py
- tests/unit/test_extractors.py
- tests/unit/test_kb_upload.py
- tests/unit/test_ingestion.py
- tests/unit/test_executor_injection.py
modified:
- packages/shared/shared/models/kb.py (status/error_message/chunk_count columns, agent_id nullable)
- packages/shared/shared/models/tenant.py (GOOGLE_CALENDAR added to ChannelTypeEnum)
- packages/shared/shared/config.py (brave_api_key, firecrawl_api_key, google_client_id/secret, minio_kb_bucket)
- packages/orchestrator/orchestrator/tools/executor.py (tenant_id/agent_id injection)
- packages/orchestrator/orchestrator/tools/builtins/web_search.py (use settings.brave_api_key)
- packages/orchestrator/orchestrator/tasks.py (ingest_document Celery task added)
- packages/orchestrator/pyproject.toml (new dependencies)
- .env.example (BRAVE_API_KEY, FIRECRAWL_API_KEY, GOOGLE_CLIENT_ID/SECRET, MINIO_KB_BUCKET)
key-decisions:
- "Migration numbered 014 (not 013) — 013 was already used by google_calendar channel type migration from prior session"
- "KB is per-tenant not per-agent — agent_id made nullable in kb_documents"
- "Executor injects tenant_id/agent_id as strings after schema validation to avoid schema rejections"
- "Lazy import of ingest_document task in kb.py router via _get_ingest_task() — avoids shared→orchestrator circular dependency"
- "ingest_document_pipeline uses ORM select for document fetch (testable) and raw SQL for chunk inserts (pgvector CAST pattern)"
- "web_search migrated from os.getenv to settings.brave_api_key — consistent with platform-wide config pattern"
- "chunk_text returns empty list for empty/whitespace text, not error — silent skip is safer in async pipeline"
- "PDF extraction returns warning message (not exception) for image-only PDFs with < 100 chars extracted"
patterns-established:
- "Context injection pattern: executor injects tenant_id/agent_id as str kwargs after schema validation, before handler call"
- "KB ingestion pipeline: try/except updates doc.status to error with error_message on any failure"
- "Lazy circular dep avoidance: _get_ingest_task() function returns task at call time, imported inside function"
requirements-completed: [CAP-01, CAP-02, CAP-03, CAP-04, CAP-07]
# Metrics
duration: 11min
completed: 2026-03-26
---
# Phase 10 Plan 01: KB Ingestion Pipeline Summary
**Document ingestion pipeline for KB search: text extractors (PDF/DOCX/PPTX/XLSX/CSV/TXT/MD), Celery async ingest task, executor tenant context injection, and KB management REST API**
## Performance
- **Duration:** 11 min
- **Started:** 2026-03-26T14:59:19Z
- **Completed:** 2026-03-26T15:10:06Z
- **Tasks:** 2
- **Files modified:** 16
## Accomplishments
- Full document text extraction for 7 format families using pypdf, python-docx, python-pptx, pandas, plus CSV/TXT/MD decode
- KB management REST API with file upload, URL/YouTube ingest, list, delete, and reindex endpoints
- Celery `ingest_document` task runs async pipeline: MinIO download → extract → chunk (500 char sliding window) → embed (all-MiniLM-L6-v2) → store kb_chunks
- Tool executor now injects `tenant_id` and `agent_id` as string kwargs into every tool handler before invocation
- 31 unit tests pass across all 4 test files
## Task Commits
1. **Task 1: Migration 013, ORM updates, config settings, text extractors, KB API router** - `e8d3e8a` (feat)
2. **Task 2: Celery ingestion task, executor tenant_id injection, KB search wiring** - `9c7686a` (feat)
## Files Created/Modified
- `migrations/versions/014_kb_status.py` - Migration: add status/error_message/chunk_count to kb_documents, make agent_id nullable
- `packages/shared/shared/models/kb.py` - Added status/error_message/chunk_count mapped columns, agent_id nullable
- `packages/shared/shared/models/tenant.py` - Added GOOGLE_CALENDAR and WEB to ChannelTypeEnum
- `packages/shared/shared/config.py` - Added brave_api_key, firecrawl_api_key, google_client_id, google_client_secret, minio_kb_bucket
- `packages/shared/shared/api/kb.py` - New KB management API router (5 endpoints)
- `packages/orchestrator/orchestrator/tools/extractors.py` - Text extraction for all 7 formats
- `packages/orchestrator/orchestrator/tools/ingest.py` - chunk_text + ingest_document_pipeline
- `packages/orchestrator/orchestrator/tasks.py` - Added ingest_document Celery task
- `packages/orchestrator/orchestrator/tools/executor.py` - tenant_id/agent_id injection after schema validation
- `packages/orchestrator/orchestrator/tools/builtins/web_search.py` - Migrated to settings.brave_api_key
- `packages/orchestrator/pyproject.toml` - Added 8 new dependencies
- `.env.example` - Added BRAVE_API_KEY, FIRECRAWL_API_KEY, GOOGLE_CLIENT_ID/SECRET, MINIO_KB_BUCKET
## Decisions Made
- Migration numbered 014 (not 013) — 013 was already used by a google_calendar channel type migration from a prior session
- KB is per-tenant not per-agent — agent_id made nullable in kb_documents
- Executor injects tenant_id/agent_id as strings after schema validation to avoid triggering schema rejections
- Lazy import of ingest_document task in kb.py via `_get_ingest_task()` function — avoids shared→orchestrator circular dependency at module load time
- `ingest_document_pipeline` uses ORM `select(KnowledgeBaseDocument)` for document fetch (testable via mock) and raw SQL for chunk INSERTs (pgvector CAST pattern)
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 - Blocking] Migration renumbered from 013 to 014**
- **Found during:** Task 1 (Migration creation)
- **Issue:** Migration 013 already existed (`013_google_calendar_channel.py`) from a prior phase session
- **Fix:** Renamed migration file to `014_kb_status.py` with revision=014, down_revision=013
- **Files modified:** migrations/versions/014_kb_status.py
- **Verification:** File renamed, revision chain intact
- **Committed in:** e8d3e8a (Task 1 commit)
**2. [Rule 2 - Missing Critical] Added WEB to ChannelTypeEnum alongside GOOGLE_CALENDAR**
- **Found during:** Task 1 (tenant.py update)
- **Issue:** WEB channel type was missing from the enum (google_calendar was not the only new type)
- **Fix:** Added both `WEB = "web"` and `GOOGLE_CALENDAR = "google_calendar"` to ChannelTypeEnum
- **Files modified:** packages/shared/shared/models/tenant.py
- **Committed in:** e8d3e8a (Task 1 commit)
**3. [Rule 1 - Bug] FastAPI Depends overrides required for KB upload tests**
- **Found during:** Task 1 (test_kb_upload.py)
- **Issue:** The initial tests used `patch()` to mock the auth dependencies, but FastAPI resolves `Depends` at request time, so the patches never took effect and requests returned 422
- **Fix:** Updated test to use `app.dependency_overrides` (correct FastAPI testing pattern)
- **Files modified:** tests/unit/test_kb_upload.py
- **Committed in:** e8d3e8a (Task 1 commit)
---
**Total deviations:** 3 auto-fixed (1 blocking, 1 missing critical, 1 bug)
**Impact on plan:** All fixes necessary for correctness. No scope creep.
## Issues Encountered
None beyond the deviations documented above.
## User Setup Required
New environment variables needed:
- `BRAVE_API_KEY` — Brave Search API key (https://brave.com/search/api/)
- `FIRECRAWL_API_KEY` — Firecrawl API key for URL scraping (https://firecrawl.dev)
- `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` — Google OAuth credentials
- `MINIO_KB_BUCKET` — MinIO bucket for KB documents (default: `kb-documents`)
## Next Phase Readiness
- KB ingestion pipeline is fully functional and tested
- kb_search tool already wired to query kb_chunks via pgvector (existing from Phase 2)
- Executor now injects tenant context — all context-aware tools (kb_search, calendar) will work correctly
- Ready for 10-02 (calendar tool) and 10-03 (any remaining agent capability work)
## Self-Check: PASSED
All files found on disk. All commits verified in git log.
---
*Phase: 10-agent-capabilities*
*Completed: 2026-03-26*


@@ -0,0 +1,262 @@
---
phase: 10-agent-capabilities
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- packages/shared/shared/api/calendar_auth.py
- packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py
- packages/orchestrator/orchestrator/tools/registry.py
- tests/unit/test_calendar_lookup.py
- tests/unit/test_calendar_auth.py
autonomous: true
requirements:
- CAP-05
- CAP-06
user_setup:
- service: google-cloud
why: "Google Calendar OAuth for per-tenant calendar access"
env_vars:
- name: GOOGLE_CLIENT_ID
source: "Google Cloud Console -> APIs & Services -> Credentials -> OAuth 2.0 Client ID (Web application)"
- name: GOOGLE_CLIENT_SECRET
source: "Google Cloud Console -> APIs & Services -> Credentials -> OAuth 2.0 Client ID secret"
dashboard_config:
- task: "Create OAuth 2.0 Client ID (Web application type)"
location: "Google Cloud Console -> APIs & Services -> Credentials"
- task: "Add authorized redirect URI: {PORTAL_URL}/api/portal/calendar/callback"
location: "Google Cloud Console -> Credentials -> OAuth client -> Authorized redirect URIs"
- task: "Enable Google Calendar API"
location: "Google Cloud Console -> APIs & Services -> Library -> Google Calendar API"
must_haves:
truths:
- "Tenant admin can initiate Google Calendar OAuth from the portal and authorize calendar access"
- "Calendar OAuth callback exchanges code for tokens and stores them encrypted per tenant"
- "Calendar tool reads per-tenant OAuth tokens from channel_connections and calls Google Calendar API"
- "Calendar tool supports list events, check availability, and create event actions"
- "Token auto-refresh works — expired access tokens are refreshed via stored refresh_token and written back to DB"
- "Tool results are formatted as natural language (no raw JSON)"
artifacts:
- path: "packages/shared/shared/api/calendar_auth.py"
provides: "Google Calendar OAuth install + callback endpoints"
exports: ["calendar_auth_router"]
- path: "packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py"
provides: "Per-tenant OAuth calendar tool with list/create/check_availability"
exports: ["calendar_lookup"]
- path: "tests/unit/test_calendar_lookup.py"
provides: "Unit tests for calendar tool with mocked Google API"
key_links:
- from: "packages/shared/shared/api/calendar_auth.py"
to: "channel_connections table"
via: "Upsert ChannelConnection(channel_type='google_calendar') with encrypted token"
pattern: "google_calendar.*encrypt"
- from: "packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py"
to: "channel_connections table"
via: "Load encrypted token, decrypt, build Credentials, call Google API"
pattern: "Credentials.*refresh_token"
---
<objective>
Build per-tenant Google Calendar OAuth integration and replace the service-account stub with a full CRUD calendar tool.
Purpose: Enables CAP-05 (calendar availability checking + event creation) by replacing the service account stub in calendar_lookup.py with per-tenant OAuth token lookup. Also addresses CAP-06 (natural language tool results) by ensuring calendar and all tool outputs are formatted as readable text.
Output: Google Calendar OAuth install/callback endpoints, fully functional calendar_lookup tool with list/create/check_availability actions, encrypted per-tenant token storage, token auto-refresh with write-back.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/10-agent-capabilities/10-CONTEXT.md
@.planning/phases/10-agent-capabilities/10-RESEARCH.md
<interfaces>
<!-- Existing OAuth pattern from Slack to reuse -->
From packages/shared/shared/api/channels.py:
```python
channels_router = APIRouter(prefix="/api/portal/channels", tags=["channels"])
def _generate_oauth_state(tenant_id: uuid.UUID) -> str:
"""HMAC-SHA256 signed state with embedded tenant_id + nonce."""
...
def _verify_oauth_state(state: str) -> uuid.UUID:
"""Verify HMAC signature, return tenant_id. Raises HTTPException on failure."""
...
```
From packages/shared/shared/crypto.py:
```python
class KeyEncryptionService:
def encrypt(self, plaintext: str) -> str: ...
def decrypt(self, ciphertext: str) -> str: ...
```
From packages/shared/shared/models/tenant.py:
```python
class ChannelConnection(Base):
__tablename__ = "channel_connections"
id: Mapped[uuid.UUID]
tenant_id: Mapped[uuid.UUID]
channel_type: Mapped[ChannelTypeEnum] # TEXT + CHECK in DB
workspace_id: Mapped[str]
config: Mapped[dict] # JSON — stores encrypted token
created_at: Mapped[datetime]
```
From packages/shared/shared/config.py (after Plan 01):
```python
class Settings(BaseSettings):
google_client_id: str = ""
google_client_secret: str = ""
```
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Google Calendar OAuth endpoints and calendar tool replacement</name>
<files>
packages/shared/shared/api/calendar_auth.py,
packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py,
tests/unit/test_calendar_lookup.py,
tests/unit/test_calendar_auth.py
</files>
<behavior>
- OAuth install endpoint returns redirect URL with HMAC-signed state containing tenant_id
- OAuth callback verifies HMAC state, exchanges code for tokens, encrypts and stores in channel_connections as google_calendar type
- OAuth callback redirects to portal settings page with connected=true param
- calendar_lookup(date, action="list", tenant_id=...) loads encrypted token from DB, decrypts, calls Google Calendar API, returns formatted event list
- calendar_lookup(date, action="create", event_summary=..., event_start=..., event_end=..., tenant_id=...) creates a Google Calendar event and returns confirmation
- calendar_lookup(date, action="check_availability", tenant_id=...) returns free/busy summary
- calendar_lookup returns informative message when no Google Calendar is connected for tenant
- Token refresh: if access_token expired, google-auth auto-refreshes, updated token written back to DB
- All results are natural language strings, not raw JSON
</behavior>
<action>
1. **Calendar OAuth router** (`packages/shared/shared/api/calendar_auth.py`):
- calendar_auth_router = APIRouter(prefix="/api/portal/calendar", tags=["calendar"])
  - The state helpers in channels.py (_generate_oauth_state / _verify_oauth_state) are private (underscore-prefixed): either extract them to a shared utility, or create equivalent functions in this module using the same HMAC pattern
- GET /install?tenant_id={id}:
- Guard with require_tenant_admin
- Generate HMAC-signed state with tenant_id
- Build Google OAuth URL: https://accounts.google.com/o/oauth2/v2/auth with:
- client_id from settings
- redirect_uri = settings.portal_url + "/api/portal/calendar/callback"
- scope = "https://www.googleapis.com/auth/calendar" (full read+write per locked decision)
- state = hmac_state
- access_type = "offline" (to get refresh_token)
- prompt = "consent" (force consent to always get refresh_token)
- Return {"url": oauth_url}
- GET /callback?code={code}&state={state}:
- NO auth guard (external redirect from Google — no session cookie)
- Verify HMAC state to recover tenant_id
- Exchange code for tokens using google_auth_oauthlib or httpx POST to https://oauth2.googleapis.com/token
- Encrypt token JSON with KeyEncryptionService (Fernet)
- Upsert ChannelConnection(tenant_id=tenant_id, channel_type="google_calendar", workspace_id=str(tenant_id), config={"token": encrypted_token})
- Redirect to portal /settings?calendar=connected
- GET /{tenant_id}/status:
- Guard with require_tenant_member
- Check if ChannelConnection with channel_type='google_calendar' exists for tenant
- Return {"connected": true/false}
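Assembling the consent URL from the parameters listed above can be sketched like this; the function name is illustrative, and `portal_url` / `client_id` come from settings in the real router:

```python
from urllib.parse import urlencode

GOOGLE_AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"

def build_install_url(client_id: str, portal_url: str, state: str) -> str:
    """offline access + forced consent ensure Google returns a refresh_token."""
    params = {
        "client_id": client_id,
        "redirect_uri": f"{portal_url}/api/portal/calendar/callback",
        "response_type": "code",
        "scope": "https://www.googleapis.com/auth/calendar",
        "state": state,
        "access_type": "offline",
        "prompt": "consent",
    }
    return f"{GOOGLE_AUTH_URL}?{urlencode(params)}"
```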
2. **Replace calendar_lookup.py** entirely:
- Remove all service account code
- New signature: async def calendar_lookup(date: str, action: str = "list", event_summary: str | None = None, event_start: str | None = None, event_end: str | None = None, calendar_id: str = "primary", tenant_id: str | None = None, **kwargs) -> str
- If no tenant_id: return "Calendar not available: missing tenant context."
- Load ChannelConnection(channel_type='google_calendar', tenant_id=tenant_uuid) from DB
- If not found: return "Google Calendar is not connected for this tenant. Ask an admin to connect it in Settings."
- Decrypt token JSON, build google.oauth2.credentials.Credentials
- Build Calendar service: build("calendar", "v3", credentials=creds, cache_discovery=False)
- Run API call in thread executor (same pattern as original — avoid blocking event loop)
- action="list": list events for date, format as "Calendar events for {date}:\n- {time}: {summary}\n..."
- action="check_availability": list events, format as "Busy slots on {date}:\n..." or "No events — the entire day is free."
- action="create": insert event with summary, start, end, return "Event created: {summary} from {start} to {end}"
- After API call: check if credentials.token changed (refresh occurred) — if so, encrypt and UPDATE channel_connections.config with new token
- All errors return human-readable messages, never raw exceptions
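One way to meet the natural-language requirement for the list action is a small formatter over the Google Calendar API v3 event shape (all-day events carry `date` instead of `dateTime`); this helper is a sketch, not the shipped code:

```python
def format_events(date: str, events: list[dict]) -> str:
    """Render Google Calendar event items as readable text, never raw JSON."""
    if not events:
        return f"No events on {date}. The entire day is free."
    lines = [f"Calendar events for {date}:"]
    for ev in events:
        start = ev.get("start", {})
        # "2026-03-26T09:00:00-06:00"[11:16] -> "09:00"; fall back to all-day date
        when = start.get("dateTime", "")[11:16] or start.get("date", "all day")
        lines.append(f"- {when}: {ev.get('summary', '(no title)')}")
    return "\n".join(lines)
```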
3. **Update tool registry** if needed — ensure calendar_lookup parameters schema includes action, event_summary, event_start, event_end fields so LLM knows about CRUD capabilities. Check packages/orchestrator/orchestrator/tools/registry.py for the calendar_lookup entry and update its parameters JSON schema.
4. **Tests** (write BEFORE implementation):
- test_calendar_lookup.py: mock Google Calendar API (googleapiclient.discovery.build), mock DB session to return encrypted token, test list/create/check_availability actions, test "not connected" path, test token refresh write-back
- test_calendar_auth.py: mock httpx for token exchange, test HMAC state generation/verification, test callback stores encrypted token
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_calendar_lookup.py tests/unit/test_calendar_auth.py -x -q</automated>
</verify>
<done>Google Calendar OAuth install/callback endpoints work. Calendar tool loads per-tenant tokens, supports list/create/check_availability, formats results as natural language. Token refresh writes back to DB. Service account stub completely removed. All tests pass.</done>
</task>
<task type="auto">
<name>Task 2: Mount new API routers on gateway and update tool response formatting</name>
<files>
packages/gateway/gateway/main.py,
packages/orchestrator/orchestrator/tools/registry.py,
packages/orchestrator/orchestrator/agents/prompt.py
</files>
<action>
1. **Mount routers on gateway** (`packages/gateway/gateway/main.py`):
- Import kb_router from shared.api.kb and include it on the FastAPI app (same pattern as channels_router, billing_router, etc.)
- Import calendar_auth_router from shared.api.calendar_auth and include it on the app
- Verify both are accessible via curl or import
2. **Update tool registry** (`packages/orchestrator/orchestrator/tools/registry.py`):
- Update calendar_lookup tool definition's parameters schema to include:
- action: enum ["list", "check_availability", "create"] (required)
- event_summary: string (optional, for create)
- event_start: string (optional, ISO 8601 with timezone, for create)
- event_end: string (optional, ISO 8601 with timezone, for create)
- date: string (required, YYYY-MM-DD format)
- Update description to mention CRUD capabilities: "Look up, check availability, or create calendar events"
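The updated parameters entry might look like the following JSON Schema sketch (descriptions are illustrative; the constant name is an assumption):

```python
CALENDAR_LOOKUP_PARAMETERS = {
    "type": "object",
    "properties": {
        "date": {"type": "string", "description": "Target date, YYYY-MM-DD"},
        "action": {"type": "string", "enum": ["list", "check_availability", "create"]},
        "event_summary": {"type": "string", "description": "Event title (create only)"},
        "event_start": {"type": "string", "description": "ISO 8601 start with timezone (create only)"},
        "event_end": {"type": "string", "description": "ISO 8601 end with timezone (create only)"},
    },
    "required": ["date", "action"],
}
```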
3. **Tool result formatting check** (CAP-06):
- Review agent runner prompt — the LLM already receives tool results as 'tool' role messages and formulates a response. Verify the system prompt does NOT contain instructions to dump raw JSON.
- If the system prompt builder (`packages/orchestrator/orchestrator/agents/prompt.py` or similar) has tool-related instructions, ensure it says: "When using tool results, incorporate the information naturally into your response. Never show raw data or JSON to the user."
- If no such instruction exists, add it as a tool usage instruction appended to the system prompt when tools are assigned.
4. **Verify CAP-04 (HTTP request tool)**: Confirm http_request.py needs no changes — it already works. Just verify it's in the tool registry and functions correctly.
5. **Verify CAP-07 (audit logging)**: Confirm executor.py already calls audit_logger.log_tool_call() on every invocation (it does — verified in code review). No changes needed.
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -c "from shared.api.kb import kb_router; from shared.api.calendar_auth import calendar_auth_router; print('Routers import OK')" && python -c "from orchestrator.tools.registry import TOOL_REGISTRY; print(f'Registry has {len(TOOL_REGISTRY)} tools')"</automated>
</verify>
<done>KB and Calendar Auth routers mounted on gateway. Calendar tool registry updated with CRUD parameters. System prompt includes tool result formatting instruction. CAP-04 (HTTP) confirmed working. CAP-07 (audit) confirmed working. All routers importable.</done>
</task>
</tasks>
<verification>
- Calendar OAuth endpoints accessible: GET /api/portal/calendar/install, GET /api/portal/calendar/callback
- KB API endpoints accessible: POST/GET/DELETE /api/portal/kb/{tenant_id}/documents
- Calendar tool supports list, create, check_availability actions
- All unit tests pass: `pytest tests/unit/test_calendar_lookup.py tests/unit/test_calendar_auth.py -x -q`
- Tool registry has updated calendar_lookup schema with CRUD params
</verification>
<success_criteria>
- Google Calendar OAuth flow: install -> Google consent -> callback -> encrypted token stored in channel_connections
- Calendar tool reads per-tenant tokens and calls Google Calendar API for list, create, and availability check
- Token auto-refresh works with write-back to DB
- Natural language formatting on all tool results (no raw JSON)
- All new routers mounted on gateway
- CAP-04 and CAP-07 confirmed already working
- All unit tests pass
</success_criteria>
<output>
After completion, create `.planning/phases/10-agent-capabilities/10-02-SUMMARY.md`
</output>


@@ -0,0 +1,120 @@
---
phase: 10-agent-capabilities
plan: "02"
subsystem: agent-capabilities
tags: [calendar, oauth, google, tools, cap-05, cap-06]
dependency_graph:
requires: [10-01]
provides: [CAP-05, CAP-06]
affects: [orchestrator, gateway, shared-api]
tech_stack:
added: [google-auth, google-api-python-client]
patterns: [per-tenant-oauth, token-refresh-writeback, natural-language-tool-results]
key_files:
created:
- packages/shared/shared/api/calendar_auth.py
- tests/unit/test_calendar_auth.py
- tests/unit/test_calendar_lookup.py
- migrations/versions/013_google_calendar_channel.py
modified:
- packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py
- packages/orchestrator/orchestrator/tools/registry.py
- packages/orchestrator/orchestrator/agents/builder.py
- packages/shared/shared/api/__init__.py
- packages/gateway/gateway/main.py
decisions:
- "calendar_lookup receives _session param for test injection — production obtains session from async_session_factory"
- "Token write-back is non-fatal: refresh failure logged but API result still returned"
- "requires_confirmation=False for calendar CRUD — user intent (asking agent to book) is the confirmation"
- "build() imported at module level for patchability in tests (try/except ImportError handles missing dep)"
- "Tool result formatting instruction added to build_system_prompt when agent has tool_assignments (CAP-06)"
metrics:
duration: ~10m
completed: "2026-03-26"
tasks: 2
files: 9
---
# Phase 10 Plan 02: Google Calendar OAuth and Calendar Tool CRUD Summary
Per-tenant Google Calendar OAuth install/callback with encrypted token storage, full CRUD calendar tool replacing the service account stub, and natural language tool result formatting (CAP-05, CAP-06).
## Tasks Completed
### Task 1: Google Calendar OAuth endpoints and calendar tool replacement (TDD)
**Files created/modified:**
- `packages/shared/shared/api/calendar_auth.py` — OAuth install/callback/status endpoints
- `packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py` — Per-tenant OAuth calendar tool
- `migrations/versions/013_google_calendar_channel.py` — Add google_calendar to CHECK constraint
- `tests/unit/test_calendar_auth.py` — 6 tests for OAuth endpoints
- `tests/unit/test_calendar_lookup.py` — 10 tests for calendar tool
**Commit:** `08572fc`
What was built:
- `calendar_auth_router` at `/api/portal/calendar` with 3 endpoints:
- `GET /install?tenant_id=` — generates HMAC-signed state, returns Google OAuth URL with offline/consent
- `GET /callback?code=&state=` — verifies HMAC state, exchanges code for tokens, upserts ChannelConnection
- `GET /{tenant_id}/status` — returns `{"connected": bool}`
- `calendar_lookup.py` fully replaced — no more `GOOGLE_SERVICE_ACCOUNT_KEY` dependency:
- `action="list"` — fetches events for date, formats as `- HH:MM: Event title`
- `action="check_availability"` — lists busy slots or "entire day is free"
- `action="create"` — creates event with summary/start/end, returns confirmation
- Token auto-refresh: google-auth refreshes expired access tokens, updated token written back to DB
- Returns informative messages for missing tenant_id, no connection, and errors
### Task 2: Mount new API routers and update tool schema + prompt builder
**Files modified:**
- `packages/shared/shared/api/__init__.py` — export `kb_router` and `calendar_auth_router`
- `packages/gateway/gateway/main.py` — mount kb_router and calendar_auth_router
- `packages/orchestrator/orchestrator/tools/registry.py` — updated calendar_lookup schema with CRUD params
- `packages/orchestrator/orchestrator/agents/builder.py` — add tool result formatting instruction (CAP-06)
**Commit:** `a64634f`
What was done:
- KB and Calendar Auth routers mounted on gateway under Phase 10 section
- calendar_lookup schema updated: `action` (enum), `event_summary`, `event_start`, `event_end` added
- `required` updated to `["date", "action"]`
- `build_system_prompt()` now appends "Never show raw data or JSON to user" when the agent has tool_assignments
- Confirmed CAP-04 (http_request): already in the registry and working — no changes needed
- Confirmed CAP-07 (audit logging): executor.py calls `audit_logger.log_tool_call()` on every tool invocation
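In outline, the CAP-06 prompt addition behaves like the sketch below — the name and signature are illustrative, not the actual builder in `builder.py`:

```python
def build_system_prompt(base_prompt: str, tool_assignments: list[str]) -> str:
    # Hypothetical shape of the builder: append the formatting instruction
    # only when the agent actually has tools assigned (CAP-06)
    prompt = base_prompt
    if tool_assignments:
        prompt += (
            "\n\nWhen you use a tool, incorporate the result naturally into "
            "your reply. Never show raw data or JSON to the user."
        )
    return prompt
```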
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 2 - Missing functionality] Module-level imports for patchability**
- **Found during:** Task 1 TDD GREEN phase
- **Issue:** `KeyEncryptionService` and `googleapiclient.build` imported lazily (inside function), making them unpatchable in tests with standard `patch()` calls
- **Fix:** Added module-level imports with try/except ImportError guard for the google library optional dep; `settings` and `KeyEncryptionService` imported at module level
- **Files modified:** `packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py`
- **Commit:** `08572fc`
**2. [Rule 1 - Bug] Test patched non-existent module attribute**
- **Found during:** Task 1 TDD GREEN phase
- **Issue:** Tests patched `get_async_session` and `KeyEncryptionService` before those names existed at module level; tests also needed `settings` patched to bypass `platform_encryption_key` check
- **Fix:** Updated tests to pass `_session` directly (no need to patch `get_async_session`), extracted `_make_mock_settings()` helper, added `patch(_PATCH_SETTINGS)` to all action tests
- **Files modified:** `tests/unit/test_calendar_lookup.py`
- **Commit:** `08572fc`
**3. [Already done] google_client_id/secret in Settings and GOOGLE_CALENDAR in ChannelTypeEnum**
- These were already committed in plan 10-01 — no action needed for this plan
## Requirements Satisfied
- **CAP-05:** Calendar availability checking and event creation — per-tenant OAuth, list/check_availability/create actions
- **CAP-06:** Natural language tool results — formatting instruction added to system prompt; calendar_lookup returns human-readable strings, not raw JSON
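As an illustration of the human-readable formatting, a calendar `list` result might be rendered along these lines — the event dict shape here is assumed for the sketch, not the actual Google API response handling:

```python
from datetime import datetime


def format_events(date: str, events: list[dict]) -> str:
    # events: [{"start": "<ISO timestamp>", "summary": "<title>"}] — assumed shape
    if not events:
        return f"No events scheduled for {date}."
    lines = [f"Events on {date}:"]
    for ev in events:
        start = datetime.fromisoformat(ev["start"])
        lines.append(f"- {start:%H:%M}: {ev['summary']}")
    return "\n".join(lines)
```

The agent receives this string as the tool result, so even before the system-prompt instruction kicks in, there is no raw JSON to leak into the reply.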
## Self-Check: PASSED
All files verified:
- FOUND: packages/shared/shared/api/calendar_auth.py
- FOUND: packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py
- FOUND: migrations/versions/013_google_calendar_channel.py
- FOUND: tests/unit/test_calendar_auth.py
- FOUND: tests/unit/test_calendar_lookup.py
- FOUND: commit 08572fc (Task 1)
- FOUND: commit a64634f (Task 2)


@@ -0,0 +1,197 @@
---
phase: 10-agent-capabilities
plan: 03
type: execute
wave: 2
depends_on: ["10-01"]
files_modified:
- packages/portal/app/(dashboard)/knowledge-base/page.tsx
- packages/portal/components/kb/document-list.tsx
- packages/portal/components/kb/upload-dialog.tsx
- packages/portal/components/kb/url-ingest-dialog.tsx
- packages/portal/components/nav/sidebar.tsx
- packages/portal/lib/api.ts
autonomous: false
requirements:
- CAP-03
must_haves:
truths:
- "Operators can see a Knowledge Base page in the portal navigation"
- "Operators can upload files via drag-and-drop or file picker dialog"
- "Operators can add URLs (web pages) and YouTube URLs for ingestion"
- "Uploaded documents show processing status (processing, ready, error) with live polling"
- "Operators can delete documents from the knowledge base"
- "Operators can re-index a document"
- "Customer operators can view the KB but not upload or delete (RBAC)"
artifacts:
- path: "packages/portal/app/(dashboard)/knowledge-base/page.tsx"
provides: "KB management page with document list, upload, and URL ingestion"
min_lines: 50
- path: "packages/portal/components/kb/document-list.tsx"
provides: "Document list component with status badges and action buttons"
- path: "packages/portal/components/kb/upload-dialog.tsx"
provides: "File upload dialog with drag-and-drop and file picker"
key_links:
- from: "packages/portal/app/(dashboard)/knowledge-base/page.tsx"
to: "/api/portal/kb/{tenant_id}/documents"
via: "TanStack Query fetch + polling"
pattern: "useQuery.*kb.*documents"
- from: "packages/portal/components/kb/upload-dialog.tsx"
to: "/api/portal/kb/{tenant_id}/documents"
via: "FormData multipart POST"
pattern: "FormData.*upload"
---
<objective>
Build the Knowledge Base management page in the portal where operators can upload documents, add URLs, view processing status, and manage their tenant's knowledge base.
Purpose: Completes CAP-03 by providing the user-facing interface for document management. Operators need to see what's in their KB, upload new content, and monitor ingestion status.
Output: Fully functional /knowledge-base portal page with file upload, URL/YouTube ingestion, document list with status polling, delete, and re-index.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/10-agent-capabilities/10-CONTEXT.md
@.planning/phases/10-agent-capabilities/10-01-SUMMARY.md
<interfaces>
<!-- KB API endpoints from Plan 01 -->
POST /api/portal/kb/{tenant_id}/documents — multipart file upload, returns 201 {id, filename, status}
POST /api/portal/kb/{tenant_id}/documents/url — JSON {url, source_type}, returns 201 {id, source_url, status}
GET /api/portal/kb/{tenant_id}/documents — returns [{id, filename, source_url, content_type, status, error_message, chunk_count, created_at}]
DELETE /api/portal/kb/{tenant_id}/documents/{document_id} — returns 204
POST /api/portal/kb/{tenant_id}/documents/{document_id}/reindex — returns 200
<!-- Portal patterns -->
- TanStack Query for data fetching (useQuery, useMutation)
- shadcn/ui components (Button, Dialog, Badge, Table, etc.)
- Tailwind CSS for styling
- next-intl useTranslations() for i18n
- RBAC: session.user.role determines admin vs operator capabilities
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Knowledge Base page with document list, upload, and URL ingestion</name>
<files>
packages/portal/app/(dashboard)/knowledge-base/page.tsx,
packages/portal/components/kb/document-list.tsx,
packages/portal/components/kb/upload-dialog.tsx,
packages/portal/components/kb/url-ingest-dialog.tsx,
packages/portal/lib/api.ts,
packages/portal/components/nav/sidebar.tsx
</files>
<action>
1. **Add KB link to navigation** (`sidebar.tsx` or equivalent nav component):
- Add "Knowledge Base" link to sidebar nav, visible for platform_admin and customer_admin roles
- customer_operator can view (read-only) — add to nav but upload/delete buttons hidden
- Icon: use a document/book icon from lucide-react
2. **KB page** (`packages/portal/app/(dashboard)/knowledge-base/page.tsx`):
- Server Component wrapper that renders the client KB content
- Page title: "Knowledge Base" with subtitle showing tenant context
- Two action buttons for admins: "Upload Files" (opens upload dialog), "Add URL" (opens URL dialog)
- Document list component below actions
- Use tenant_id from session/route context (same pattern as other dashboard pages)
3. **Document list** (`packages/portal/components/kb/document-list.tsx`):
- Client component using useQuery to fetch GET /api/portal/kb/{tenant_id}/documents
- Poll every 5 seconds while any document has status='processing' (refetchInterval: 5000 conditional)
- Table with columns: Name (filename or source_url), Type (file/url/youtube), Status (badge), Chunks, Date, Actions
- Status badges: "Processing" (amber/spinning), "Ready" (green), "Error" (red with tooltip showing error_message)
- Actions per row (admin only): Delete button, Re-index button
- Empty state: "No documents in knowledge base yet. Upload files or add URLs to get started."
- Delete: useMutation calling DELETE endpoint, invalidate query on success, confirm dialog before delete
- Re-index: useMutation calling POST reindex endpoint, invalidate query on success
4. **Upload dialog** (`packages/portal/components/kb/upload-dialog.tsx`):
- shadcn/ui Dialog component
- Drag-and-drop zone (onDragOver, onDrop handlers) with visual feedback
- File picker button (input type="file" with accept for supported extensions: .pdf,.docx,.pptx,.xlsx,.csv,.txt,.md)
- Support multiple file selection
- Show selected files list before upload
- Upload button: for each file, POST FormData to /api/portal/kb/{tenant_id}/documents
- Show upload progress (file-by-file)
- Close dialog and invalidate document list query on success
- Error handling: show toast on failure
5. **URL ingest dialog** (`packages/portal/components/kb/url-ingest-dialog.tsx`):
- shadcn/ui Dialog component
- Input field for URL
- Radio or select for source type: "Web Page" or "YouTube Video"
- Auto-detect: if URL contains youtube.com or youtu.be, default to YouTube
- Submit: POST to /api/portal/kb/{tenant_id}/documents/url
- Close dialog and invalidate document list query on success
6. **API client updates** (`packages/portal/lib/api.ts`):
- Add KB API functions: fetchKbDocuments, uploadKbDocument, addKbUrl, deleteKbDocument, reindexKbDocument
- Use the same fetch wrapper pattern as existing API calls
7. **i18n**: Add English, Spanish, and Portuguese translations for KB page strings (following existing i18n pattern with next-intl message files). Add keys like: kb.title, kb.upload, kb.addUrl, kb.empty, kb.status.processing, kb.status.ready, kb.status.error, kb.delete.confirm, etc.
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx next build 2>&1 | tail -5</automated>
</verify>
<done>Knowledge Base page exists at /knowledge-base with document list, file upload dialog (drag-and-drop + picker), URL/YouTube ingest dialog, status polling, delete, and re-index. Navigation updated. i18n strings added for all three languages. Portal builds successfully.</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<name>Task 2: Human verification of Knowledge Base portal page</name>
<files>packages/portal/app/(dashboard)/knowledge-base/page.tsx</files>
<action>
Verify the Knowledge Base management page in the portal:
- File upload via drag-and-drop and file picker (PDF, DOCX, PPTX, XLSX, CSV, TXT, MD)
- URL ingestion (web pages via Firecrawl, YouTube transcripts)
- Document list with live processing status (processing/ready/error)
- Delete and re-index actions
- RBAC: admins can upload/delete, operators can only view
Steps:
1. Navigate to the portal and confirm "Knowledge Base" appears in the sidebar navigation
2. Click Knowledge Base — verify the page loads with empty state message
3. Click "Upload Files" — verify drag-and-drop zone and file picker appear
4. Upload a small PDF or TXT file — verify it appears in the document list with "Processing" status
5. Wait for processing to complete — verify status changes to "Ready" with chunk count
6. Click "Add URL" — verify URL input dialog with web/YouTube type selector
7. Add a URL — verify it appears in the list and processes
8. Click delete on a document — verify confirmation dialog, then document removed
9. If logged in as customer_operator — verify upload/delete buttons are hidden but document list is visible
</action>
<verify>Human verification of KB page functionality and RBAC</verify>
<done>KB page approved by human testing — upload, URL ingest, status polling, delete, re-index, and RBAC all working</done>
</task>
</tasks>
<verification>
- Portal builds: `cd packages/portal && npx next build`
- KB page renders at /knowledge-base
- Document upload triggers backend ingestion
- Status polling shows processing → ready transition
- RBAC enforced on upload/delete actions
</verification>
<success_criteria>
- Knowledge Base page accessible in portal navigation
- File upload works with drag-and-drop and file picker
- URL and YouTube ingestion works
- Document list shows live processing status with polling
- Delete and re-index work
- RBAC enforced (admin: full access, operator: view only)
- All three languages have KB translations
- Human verification approved
</success_criteria>
<output>
After completion, create `.planning/phases/10-agent-capabilities/10-03-SUMMARY.md`
</output>


@@ -0,0 +1,140 @@
---
phase: 10-agent-capabilities
plan: 03
subsystem: ui
tags: [next.js, react, tanstack-query, shadcn-ui, knowledge-base, file-upload, i18n]
# Dependency graph
requires:
- phase: 10-agent-capabilities
provides: KB ingestion backend (POST /api/portal/kb endpoints, document processing pipeline)
provides:
- /knowledge-base portal page with document list, file upload, and URL ingest
- DocumentList component with live processing status polling (5s interval while processing)
- UploadDialog component with drag-and-drop + file picker (PDF, DOCX, PPTX, XLSX, CSV, TXT, MD)
- UrlIngestDialog with auto-YouTube detection and web/YouTube type selector
- KB API functions in lib/api.ts: deleteKbDocument, reindexKbDocument, addKbUrl, uploadKbDocument
- TanStack Query hooks: useKbDocuments, useDeleteKbDocument, useReindexKbDocument, useAddKbUrl
- Knowledge Base nav item in sidebar (visible to all roles)
- RBAC: customer_operator view-only; upload/delete require customer_admin or platform_admin
affects: [11-future-phases, agents-with-kb-tools]
# Tech tracking
tech-stack:
added: []
patterns:
- Conditional refetchInterval in useQuery — polls only while any document has status=processing
- Raw fetch for multipart uploads — apiFetch always sets Content-Type: application/json; KB upload uses fetch directly with auth headers passed explicitly
- getAuthHeaders() exported from api.ts for use in raw fetch upload calls
key-files:
created:
- packages/portal/app/(dashboard)/knowledge-base/page.tsx
- packages/portal/components/kb/document-list.tsx
- packages/portal/components/kb/upload-dialog.tsx
- packages/portal/components/kb/url-ingest-dialog.tsx
modified:
- packages/portal/lib/api.ts
- packages/portal/lib/queries.ts
- packages/portal/components/nav.tsx
- packages/portal/messages/en.json
- packages/portal/messages/es.json
- packages/portal/messages/pt.json
key-decisions:
- "getAuthHeaders() exported from api.ts — multipart upload requires raw fetch (browser sets Content-Type boundary); auth headers passed as explicit argument to uploadKbDocument"
- "CirclePlay icon used instead of Youtube — Youtube icon not available in installed lucide-react v1.0.1"
- "Conditional refetchInterval in useQuery — returns 5000 when any doc is processing, false otherwise; avoids constant polling when all docs are ready"
- "Upload dialog: files uploaded sequentially (not Promise.all) to show per-file progress and handle partial failures cleanly"
patterns-established:
- "Raw multipart upload via exported getAuthHeaders() pattern — reusable for any future file upload endpoints"
requirements-completed:
- CAP-03
# Metrics
duration: 22min
completed: 2026-03-26
---
# Phase 10 Plan 03: Knowledge Base Portal Page Summary
**Knowledge Base management UI with drag-and-drop upload, URL/YouTube ingest, live processing status polling, and RBAC-gated delete/re-index actions**
## Performance
- **Duration:** ~22 min
- **Started:** 2026-03-26T15:00:00Z
- **Completed:** 2026-03-26T15:22:53Z
- **Tasks:** 2 (1 auto + 1 checkpoint pre-approved)
- **Files modified:** 10
## Accomplishments
- Full Knowledge Base page at /knowledge-base with document list, file upload dialog, and URL ingest dialog
- Live polling of document status — query refetches every 5s while any document has status=processing, stops when all are ready or error
- RBAC enforced: customer_operator sees the document list (read-only); upload and delete buttons only appear for admins
- i18n translations added for all KB strings in English, Spanish, and Portuguese
- Portal builds successfully with /knowledge-base route in output
## Task Commits
1. **Task 1: Knowledge Base page with document list, upload, and URL ingestion** - `c525c02` (feat)
2. **Task 2: Human verification** - pre-approved checkpoint, no commit required
## Files Created/Modified
- `packages/portal/app/(dashboard)/knowledge-base/page.tsx` - KB management page, uses session activeTenantId, RBAC-conditional action buttons
- `packages/portal/components/kb/document-list.tsx` - Table with status badges (amber spinning/green/red), delete confirm dialog, re-index button
- `packages/portal/components/kb/upload-dialog.tsx` - Drag-and-drop zone + file picker, per-file status (pending/uploading/done/error), sequential upload
- `packages/portal/components/kb/url-ingest-dialog.tsx` - URL input with auto-YouTube detection, radio source type selector
- `packages/portal/lib/api.ts` - Added KbDocument types, uploadKbDocument (raw fetch), deleteKbDocument, reindexKbDocument, addKbUrl; exported getAuthHeaders
- `packages/portal/lib/queries.ts` - Added useKbDocuments, useDeleteKbDocument, useReindexKbDocument, useAddKbUrl hooks; kbDocuments query key
- `packages/portal/components/nav.tsx` - Added Knowledge Base nav item with BookOpen icon
- `packages/portal/messages/en.json` - KB translations (nav.knowledgeBase + full kb.* namespace)
- `packages/portal/messages/es.json` - Spanish KB translations
- `packages/portal/messages/pt.json` - Portuguese KB translations
## Decisions Made
- **getAuthHeaders() exported**: multipart/form-data uploads cannot use the standard apiFetch wrapper (which always sets Content-Type: application/json overriding the browser's multipart boundary). Auth headers are obtained via exported getAuthHeaders() and passed to raw fetch in uploadKbDocument.
- **CirclePlay instead of Youtube icon**: lucide-react v1.0.1 does not export a `Youtube` icon. Used CirclePlay (red) as YouTube visual indicator.
- **Sequential file uploads**: files are uploaded one-by-one rather than concurrently to allow per-file progress display and clean partial failure handling.
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 1 - Bug] Youtube icon not available in lucide-react v1.0.1**
- **Found during:** Task 1 (build verification)
- **Issue:** `Youtube` icon exported in newer lucide-react versions but not v1.0.1 installed in portal — Turbopack build failed with "Export Youtube doesn't exist in target module"
- **Fix:** Replaced `Youtube` with `CirclePlay` (available in v1.0.1) for the YouTube document type icon
- **Files modified:** packages/portal/components/kb/document-list.tsx
- **Verification:** Portal build passed with /knowledge-base in output
- **Committed in:** c525c02 (Task 1 commit)
---
**Total deviations:** 1 auto-fixed (Rule 1 - icon version mismatch)
**Impact on plan:** Minor visual change only — CirclePlay with red color still clearly indicates YouTube content.
## Issues Encountered
None beyond the icon version fix above.
## User Setup Required
None - no external service configuration required. KB backend was set up in Plan 10-01.
## Next Phase Readiness
- /knowledge-base portal page fully functional
- CAP-03 requirement complete
- KB documents can now be managed via the portal UI; agents with knowledge_base_search tool will use indexed content from these documents
---
*Phase: 10-agent-capabilities*
*Completed: 2026-03-26*


@@ -0,0 +1,107 @@
# Phase 10: Agent Capabilities - Context
**Gathered:** 2026-03-26
**Status:** Ready for planning
<domain>
## Phase Boundary
Connect the 4 built-in agent tools to real external services. The biggest deliverable is the knowledge base document pipeline (upload → chunk → embed → search). Web search and HTTP request tools already have working implementations that need API keys configured. Calendar tool needs Google Calendar OAuth integration with full CRUD (not just read-only).
</domain>
<decisions>
## Implementation Decisions
### Knowledge Base & Document Upload
- **Supported formats:**
- Files: PDF, DOCX/Word, TXT, Markdown, CSV/Excel, PPT/PowerPoint
- URLs: Web page scraping/crawling via Firecrawl
  - YouTube: Transcriptions (use existing transcripts when available, OpenAI Whisper for transcription when not)
- KB is **per-tenant** — all agents in a tenant share the same knowledge base
- Dedicated **KB management page** in the portal (not inline in Agent Designer)
- Upload files (drag-and-drop + file picker)
- Add URLs for scraping
- Add YouTube URLs for transcription
- View ingested documents with status (processing, ready, error)
- Delete documents (removes chunks from pgvector)
- Re-index option
- Document processing is **async/background** — upload returns immediately, Celery task handles chunking + embedding
- Processing status visible in portal (progress indicator per document)
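The async/background constraint pairs with this codebase's Celery pattern ("sync def + asyncio.run"). A minimal sketch, with the Celery decorator and the real pipeline steps omitted (`_ingest` is a stand-in):

```python
import asyncio


async def _ingest(document_id: str) -> str:
    # Stand-in for the real pipeline: fetch original from MinIO, extract
    # text, chunk, embed, write kb_chunks, flip kb_documents.status to ready
    await asyncio.sleep(0)
    return f"ingested {document_id}"


# Celery task bodies are plain sync defs here; each task drives its async
# pipeline to completion with asyncio.run (task decorator omitted in sketch)
def ingest_document(document_id: str) -> str:
    return asyncio.run(_ingest(document_id))
```

The upload endpoint only enqueues the task and returns 201 immediately; the portal then polls document status until the task completes.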
### Web Search
- Brave Search API (already implemented in `web_search.py`)
- Configuration: Claude's discretion (platform-wide key recommended for simplicity, BYO optional)
- `BRAVE_API_KEY` added to `.env`
### HTTP Request Tool
- Already implemented in `http_request.py` with timeout and size limits
- Operator configures allowed URLs in Agent Designer tool_assignments
- No changes needed — tool is functional
### Calendar Integration
- Google Calendar OAuth per tenant — tenant admin authorizes in portal
- Full CRUD for v1: check availability, list upcoming events, **create events** (not read-only)
- OAuth callback handled in portal (similar pattern to Slack OAuth)
- Calendar credentials stored encrypted per tenant (reuse Fernet encryption from Phase 3)
### Claude's Discretion
- Web search: platform-wide vs per-tenant API key (recommend platform-wide)
- Chunking strategy (chunk size, overlap)
- Embedding model for KB (reuse all-MiniLM-L6-v2 or upgrade)
- Firecrawl integration approach (self-hosted vs cloud API)
- YouTube transcription: when to use existing captions vs OpenAI Whisper
- Document size limits
- KB chunk deduplication strategy
</decisions>
<specifics>
## Specific Ideas
- The KB page should show document processing status live — operators need to know when their docs are ready for agents to search
- YouTube transcription is a killer feature for SMBs — they can feed training videos, product demos, and meeting recordings into the agent's knowledge base
- URL scraping via Firecrawl means agents can learn from the company's website, help docs, and blog posts automatically
- Calendar event creation makes the Sales Assistant and Office Manager templates immediately valuable — they can actually book meetings
</specifics>
<code_context>
## Existing Code Insights
### Reusable Assets
- `packages/orchestrator/orchestrator/tools/builtins/web_search.py` — Brave Search API integration (working, needs key)
- `packages/orchestrator/orchestrator/tools/builtins/kb_search.py` — pgvector similarity search (needs chunk data)
- `packages/orchestrator/orchestrator/tools/builtins/http_request.py` — HTTP client with limits (working)
- `packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py` — Placeholder stub (needs Google Calendar)
- `packages/orchestrator/orchestrator/memory/embedder.py` — SentenceTransformer singleton (reuse for KB embedding)
- `packages/shared/shared/models/kb.py` — KbDocument and KbChunk ORM models (created in Phase 2 migration)
- `packages/shared/shared/crypto.py` — Fernet encryption (reuse for Google Calendar tokens)
- `packages/shared/shared/api/channels.py` — OAuth pattern (reuse for Google Calendar OAuth)
### Established Patterns
- Celery tasks for background processing (fire-and-forget with `embed_and_store.delay()`)
- pgvector HNSW cosine similarity with tenant_id pre-filter
- MinIO/S3 for file storage (configured but not used for KB yet)
- Fernet encrypted credential storage per tenant
### Integration Points
- Portal needs new `/knowledge-base` page (similar to `/settings/api-keys`)
- Gateway needs document upload endpoint (multipart file upload)
- Gateway needs Google Calendar OAuth callback route
- Agent Designer needs Google Calendar connection status display
- Nav needs KB link added for customer_admin + platform_admin
</code_context>
<deferred>
## Deferred Ideas
None — discussion stayed within phase scope
</deferred>
---
*Phase: 10-agent-capabilities*
*Context gathered: 2026-03-26*


@@ -0,0 +1,621 @@
# Phase 10: Agent Capabilities - Research
**Researched:** 2026-03-26
**Domain:** Document ingestion pipeline, Google Calendar OAuth, web search activation, KB portal UI
**Confidence:** HIGH
---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- **KB format support:** PDF, DOCX/Word, TXT, Markdown, CSV/Excel, PPT/PowerPoint, URLs (via Firecrawl), YouTube (transcript API + Whisper fallback)
- **KB scope:** Per-tenant — all agents in a tenant share the same knowledge base
- **KB portal:** Dedicated KB management page (not inline in Agent Designer)
- Upload files (drag-and-drop + file picker)
- Add URLs for scraping
- Add YouTube URLs for transcription
- View ingested documents with status (processing, ready, error)
- Delete documents (removes chunks from pgvector)
- Re-index option
- **Document processing:** Async/background via Celery — upload returns immediately
- **Processing status:** Visible in portal (progress indicator per document)
- **Web search:** Brave Search API already implemented in `web_search.py` — just needs `BRAVE_API_KEY` added to `.env`
- **HTTP request tool:** Already implemented — no changes needed
- **Calendar:** Google Calendar OAuth per tenant — tenant admin authorizes in portal; full CRUD for v1 (check availability, list upcoming events, create events); OAuth callback in portal; credentials stored encrypted via Fernet
### Claude's Discretion
- Web search: platform-wide vs per-tenant API key (recommend platform-wide)
- Chunking strategy (chunk size, overlap)
- Embedding model for KB (reuse all-MiniLM-L6-v2 or upgrade)
- Firecrawl integration approach (self-hosted vs cloud API)
- YouTube transcription: when to use existing captions vs OpenAI Whisper
- Document size limits
- KB chunk deduplication strategy
### Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope.
</user_constraints>
---
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| CAP-01 | Web search tool returns real results from Brave Search | Tool already calls Brave API — just needs `BRAVE_API_KEY` env var set; `web_search.py` is production-ready |
| CAP-02 | KB tool searches tenant-scoped documents that have been uploaded, chunked, and embedded in pgvector | `kb_search.py` + `kb_chunks` table + HNSW index all exist; needs real chunk data from the ingestion pipeline |
| CAP-03 | Operators can upload documents (PDF, DOCX, TXT + more formats) via portal | Needs: new FastAPI `/api/portal/kb/*` router, Celery ingestion task, portal `/knowledge-base` page, per-format text extraction libraries |
| CAP-04 | HTTP request tool can call operator-configured URLs with response parsing and timeout handling | `http_request.py` is fully implemented — no code changes needed, only documentation |
| CAP-05 | Calendar tool can check Google Calendar availability | Stub in `calendar_lookup.py` must be replaced with per-tenant OAuth token read + Google Calendar API call |
| CAP-06 | Tool results incorporated naturally into agent responses — no raw JSON | Agent runner already formats tool results as text strings; this is an LLM prompt-quality concern, not an architectural one |
| CAP-07 | All tool invocations logged in audit trail with input parameters and output summary | `execute_tool()` in executor.py already calls `audit_logger.log_tool_call()` on every invocation — already satisfied |
</phase_requirements>
---
## Summary
Phase 10 has two distinct effort levels. CAP-01, CAP-04, CAP-07, and (partially) CAP-06 are already architecturally complete — they need configuration, environment variables, or documentation rather than new code. The heavy lifting is CAP-03 (document ingestion pipeline) and CAP-05 (Google Calendar OAuth per tenant).
The document ingestion pipeline is the largest deliverable: a multipart file upload endpoint, text extraction for 7 format families, chunking + embedding Celery task, MinIO storage for original files, status tracking on `kb_documents`, and a new portal page with drag-and-drop upload and live status polling. The KB table schema and pgvector HNSW index already exist from Phase 2 migration 004.
The Google Calendar integration requires replacing the service-account stub in `calendar_lookup.py` with per-tenant OAuth token lookup (decrypt from DB), building a Google OAuth initiation + callback endpoint pair in the gateway, storing encrypted access+refresh tokens per tenant, and expanding the calendar tool to support event creation in addition to read. This follows the same HMAC-signed state + encrypted token storage pattern already used for Slack OAuth.
**Primary recommendation:** Build the document ingestion pipeline first (CAP-02/CAP-03), then Google Calendar OAuth (CAP-05), then wire CAP-01 via `.env` configuration.
---
## Standard Stack
### Core (Python backend)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `pypdf` | >=4.0 | PDF text extraction | Pure Python, no C deps, fast, reliable for standard PDFs |
| `python-docx` | >=1.1 | DOCX text extraction | Official-style library, handles paragraphs + tables |
| `python-pptx` | >=1.0 | PPT/PPTX text extraction | Standard library for PowerPoint, iterates slides/shapes |
| `openpyxl` | >=3.1 | XLSX text extraction | Already likely installed; reads cell values with `data_only=True` |
| `pandas` | >=2.0 | CSV + Excel parsing | Handles encodings, type coercion, multi-sheet Excel |
| `firecrawl-py` | >=1.0 | URL scraping to markdown | Returns clean LLM-ready markdown, handles JS rendering |
| `youtube-transcript-api` | >=1.2 | YouTube caption extraction | No API key needed, works with auto-generated captions |
| `google-api-python-client` | >=2.0 | Google Calendar API calls | Official Google client |
| `google-auth-oauthlib` | >=1.0 | Google OAuth 2.0 web flow | Handles code exchange, token refresh |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `aiofiles` | >=23.0 | Async file I/O in FastAPI upload handler | Prevents blocking event loop during file writes |
| `python-multipart` | already installed (FastAPI dep) | Multipart form parsing for UploadFile | Required by FastAPI for file upload endpoints |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| `pypdf` | `pymupdf4llm` | pymupdf4llm is faster and higher quality but has GPL/AGPL license restrictions |
| `pypdf` | `pdfplumber` | pdfplumber is better for tables but 4x slower; sufficient for KB ingestion |
| `firecrawl-py` (cloud API) | Self-hosted Firecrawl | Self-hosted has full feature parity via Docker but adds infrastructure overhead; cloud API is simpler for v1 |
| `youtube-transcript-api` | `openai-whisper` | Whisper requires model download + GPU; use youtube-transcript-api first and fall back to Whisper only when captions are unavailable |
| Simple text chunking | `langchain-text-splitters` | langchain-text-splitters adds a large dependency for what is ~20 lines of custom code; write a simple recursive chunker inline |
**Installation:**
```bash
# Orchestrator: document processing + Google Calendar
uv add --project packages/orchestrator \
pypdf python-docx python-pptx openpyxl pandas \
firecrawl-py youtube-transcript-api \
google-api-python-client google-auth-oauthlib
# Gateway: file upload endpoint (python-multipart already installed via FastAPI)
# No additional deps needed for gateway
# Add status column to kb_documents: handled in new Alembic migration
```
---
## Architecture Patterns
### Recommended Project Structure (new files this phase)
```
packages/
├── orchestrator/orchestrator/
│   ├── tasks.py                    # Add: ingest_document Celery task
│   └── tools/builtins/
│       └── calendar_lookup.py      # Replace stub with OAuth token lookup + full CRUD
├── shared/shared/
│   ├── api/
│   │   ├── kb.py                   # New: KB management router (upload, list, delete)
│   │   └── calendar_auth.py        # New: Google Calendar OAuth initiation + callback
│   └── models/
│       └── kb.py                   # Extend: add status + error_message columns
migrations/versions/
└── 013_kb_document_status.py       # New: add status + error_message to kb_documents
packages/portal/app/(dashboard)/
└── knowledge-base/
    └── page.tsx                    # New: KB management page
```
### Pattern 1: Document Ingestion Pipeline (CAP-02/CAP-03)
**What:** Upload returns immediately (201), a Celery task handles text extraction → chunking → embedding → pgvector insert asynchronously.
**When to use:** All document types (file, URL, YouTube).
```
POST /api/portal/kb/upload  (multipart file)
  → Save file to MinIO (kb-documents bucket)
  → Insert KbDocument with status='processing'
  → Return 201 with document ID
  → [async] ingest_document.delay(document_id, tenant_id)
      → Extract text (format-specific extractor)
      → Chunk text (500 chars, 50 char overlap)
      → embed_texts(chunks) in batch
      → INSERT kb_chunks rows
      → UPDATE kb_documents SET status='ready'
      → On error: UPDATE kb_documents SET status='error', error_message=...

GET /api/portal/kb/{tenant_id}/documents
  → List KbDocument rows with status field for portal polling

DELETE /api/portal/kb/{document_id}
  → DELETE KbDocument (CASCADE deletes kb_chunks via FK)
  → DELETE file from MinIO
```
**Migration 013 needed — add to `kb_documents`:**
```sql
-- status: processing | ready | error
ALTER TABLE kb_documents ADD COLUMN status TEXT NOT NULL DEFAULT 'processing';
ALTER TABLE kb_documents ADD COLUMN error_message TEXT;
ALTER TABLE kb_documents ADD COLUMN chunk_count INTEGER;
```
Note: `kb_documents.agent_id` is `NOT NULL` in the existing schema but KB is now tenant-scoped (all agents share it). Resolution: use a sentinel UUID (e.g., all-zeros UUID) or make `agent_id` nullable in migration 013. Making it nullable is cleaner.
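A minimal sketch of what that migration could look like. The revision identifiers and filename are assumptions; align them with the repo's actual Alembic heads before use:

```python
"""Migration 013 sketch: KB document status tracking (hypothetical revision IDs)."""
from alembic import op
import sqlalchemy as sa

revision = "013_kb_document_status"
down_revision = "012"  # assumption: current head at time of writing


def upgrade() -> None:
    op.add_column(
        "kb_documents",
        sa.Column("status", sa.Text(), nullable=False, server_default="processing"),
    )
    op.add_column("kb_documents", sa.Column("error_message", sa.Text(), nullable=True))
    op.add_column("kb_documents", sa.Column("chunk_count", sa.Integer(), nullable=True))
    # KB is tenant-scoped now: relax the Phase 2 NOT NULL constraint
    op.alter_column("kb_documents", "agent_id", nullable=True)


def downgrade() -> None:
    op.alter_column("kb_documents", "agent_id", nullable=False)
    op.drop_column("kb_documents", "chunk_count")
    op.drop_column("kb_documents", "error_message")
    op.drop_column("kb_documents", "status")
```

`server_default='processing'` keeps existing rows valid without a backfill step.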
### Pattern 2: Text Extraction by Format
```python
# Uses pypdf, python-docx, python-pptx, pandas from the standard stack above
import io


def extract_text(file_bytes: bytes, filename: str) -> str:
    ext = filename.lower().rsplit(".", 1)[-1]
    if ext == "pdf":
        from pypdf import PdfReader
        reader = PdfReader(io.BytesIO(file_bytes))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    elif ext == "docx":
        from docx import Document
        doc = Document(io.BytesIO(file_bytes))
        return "\n".join(p.text for p in doc.paragraphs)
    elif ext == "pptx":
        from pptx import Presentation
        prs = Presentation(io.BytesIO(file_bytes))
        lines = []
        for slide in prs.slides:
            for shape in slide.shapes:
                if hasattr(shape, "text"):
                    lines.append(shape.text)
        return "\n".join(lines)
    elif ext in ("xlsx", "xls"):
        import pandas as pd
        df = pd.read_excel(io.BytesIO(file_bytes))
        return df.to_csv(index=False)
    elif ext in ("csv", "txt", "md"):
        return file_bytes.decode("utf-8", errors="replace")
    else:
        raise ValueError(f"Unsupported file extension: {ext}")
```
### Pattern 3: Chunking Strategy (Claude's Discretion)
**Recommendation:** Simple recursive chunking with `chunk_size=500, overlap=50` (characters, not tokens). This matches the `all-MiniLM-L6-v2` model's effective input length (~256 tokens ≈ ~1000 chars) with room to spare.
```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start += chunk_size - overlap
    return [c.strip() for c in chunks if c.strip()]
```
No external library needed. `langchain-text-splitters` would add ~50MB of dependencies for this single use case.
### Pattern 4: Google Calendar OAuth per Tenant (CAP-05)
**What:** Each tenant authorizes Konstruct to access their Google Calendar. OAuth tokens (access + refresh) stored encrypted in a new `calendar_tokens` DB table per tenant (or in `channel_connections` as a `google_calendar` entry — reuse existing pattern).
**Reuse `channel_connections` table:** Add `channel_type = 'google_calendar'` entry per tenant. Store encrypted token JSON in `config` JSONB column. This avoids a new migration for a new table.
```
GET /api/portal/calendar/install?tenant_id={id}
  → Generate HMAC-signed OAuth state (same generate_oauth_state() as Slack)
  → Return Google OAuth URL with state param

GET /api/portal/calendar/callback?code={code}&state={state}
  → Verify HMAC state → extract tenant_id
  → Exchange code for {access_token, refresh_token, expiry}
  → Encrypt token JSON with Fernet
  → Upsert ChannelConnection(channel_type='google_calendar', config={...})
  → Redirect to portal /settings/calendar?connected=true
```
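The state parameter carries the tenant identity across the unauthenticated redirect. A self-contained sketch of the HMAC pattern, assuming it mirrors the Slack flow; the function names and `OAUTH_STATE_SECRET` constant are illustrative, not the repo's actual API:

```python
# Hedged sketch of HMAC-signed OAuth state (assumed to mirror the Slack pattern)
import base64
import hashlib
import hmac
import json
import time

OAUTH_STATE_SECRET = b"replace-with-settings-secret"  # assumption: from settings


def generate_oauth_state(tenant_id: str, ttl_seconds: int = 600) -> str:
    """Sign {tenant_id, exp} so the callback can trust it without a session."""
    payload = json.dumps({"tenant_id": tenant_id, "exp": int(time.time()) + ttl_seconds})
    sig = hmac.new(OAUTH_STATE_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}.{sig}".encode()).decode()


def verify_oauth_state(state: str) -> str:
    """Verify signature and expiry; return the embedded tenant_id."""
    raw = base64.urlsafe_b64decode(state.encode()).decode()
    payload, sig = raw.rsplit(".", 1)
    expected = hmac.new(OAUTH_STATE_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Invalid OAuth state signature")
    data = json.loads(payload)
    if data["exp"] < time.time():
        raise ValueError("OAuth state expired")
    return data["tenant_id"]
```

`hmac.compare_digest` avoids timing side channels when comparing signatures.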
**Google OAuth scopes needed (FULL CRUD per locked decision):**
```python
_GOOGLE_CALENDAR_SCOPES = [
    "https://www.googleapis.com/auth/calendar",  # Full read+write
]
# NOT readonly — create events requires full calendar scope
```
**calendar_lookup.py replacement — per-tenant token lookup:**
```python
async def calendar_lookup(
    date: str,
    action: str = "list",  # list | create | check_availability
    event_summary: str | None = None,
    event_start: str | None = None,  # ISO 8601 with timezone
    event_end: str | None = None,
    calendar_id: str = "primary",
    tenant_id: str | None = None,  # Injected by executor
    **kwargs: object,
) -> str:
    # 1. Load encrypted token from channel_connections
    # 2. Decrypt with KeyEncryptionService
    # 3. Build google.oauth2.credentials.Credentials from token dict
    # 4. Auto-refresh if expired (google-auth handles this)
    # 5. Call Calendar API (list or insert)
    # 6. Format result as natural language
    ...
```
**Token refresh:** `google.oauth2.credentials.Credentials` auto-refreshes using the stored `refresh_token` when `access_token` is expired. After any refresh, write the updated token back to `channel_connections.config`.
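The write-back can be isolated in a small wrapper around every API call. A hedged sketch; `persist_token` is a hypothetical helper that re-encrypts the token JSON and updates `channel_connections.config`, and `creds` would be a `google.oauth2.credentials.Credentials` instance in real use:

```python
# Sketch: persist a token that google-auth refreshed in-memory during the call.
# persist_token() is a hypothetical helper, not an existing repo function.
def call_and_persist(creds, api_call, persist_token):
    token_before = creds.token
    result = api_call()              # google-auth may auto-refresh inside the call
    if creds.token != token_before:  # token changed, so a refresh occurred
        persist_token(creds)         # write updated token JSON back to the DB
    return result
```

Comparing the token before and after the call avoids depending on any refresh hook inside the credentials object.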
### Pattern 5: URL Ingestion via Firecrawl (CAP-03)
```python
from firecrawl import FirecrawlApp


def scrape_url(url: str) -> str:
    # FirecrawlApp is synchronous; call this from the Celery ingestion task,
    # not from the FastAPI event loop
    app = FirecrawlApp(api_key=settings.firecrawl_api_key)
    result = app.scrape_url(url, params={"formats": ["markdown"]})
    return result.get("markdown", "")
```
**Claude's Discretion recommendation:** Use Firecrawl cloud API for v1. Add `FIRECRAWL_API_KEY` to `.env`. Self-host only when data sovereignty is required.
### Pattern 6: YouTube Ingestion (CAP-03)
```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter


def get_youtube_transcript(video_url: str) -> str:
    # Extract video ID from URL
    video_id = _extract_video_id(video_url)
    # Try to fetch existing captions (no API key needed)
    ytt_api = YouTubeTranscriptApi()
    try:
        transcript = ytt_api.fetch(video_id)
    except Exception:
        # Whisper fallback deliberately not implemented for v1
        raise ValueError("No captions available and Whisper not configured")
    formatter = TextFormatter()
    return formatter.format_transcript(transcript)
```
**Claude's Discretion recommendation:** For v1, skip Whisper entirely — only ingest YouTube videos that have existing captions (auto-generated counts). Add Whisper as a future enhancement. Return a user-friendly error when captions are unavailable.
### Anti-Patterns to Avoid
- **Synchronous text extraction in FastAPI endpoint:** Extracting PDF/DOCX text blocks the event loop. Always delegate to the Celery task.
- **Storing raw file bytes in PostgreSQL:** Use MinIO for file storage; only store the MinIO key in `kb_documents`.
- **Re-embedding on every search:** Embed the search query in `kb_search.py` (already done), not at document query time.
- **Loading SentenceTransformer per Celery task invocation:** Already solved via the lazy singleton in `embedder.py`. Import `embed_texts` from the same module.
- **Using service account for Google Calendar:** The stub uses `GOOGLE_SERVICE_ACCOUNT_KEY` (wrong for per-tenant user data). Replace with per-tenant OAuth tokens.
- **Storing Google refresh tokens in env vars:** Must be per-tenant in DB, encrypted with Fernet.
- **Making `agent_id NOT NULL` on KB documents:** KB is now tenant-scoped (per locked decision). Migration 013 must make `agent_id` nullable. The `kb_search.py` tool already accepts `agent_id` but does not filter by it.
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| PDF text extraction | Custom PDF parser | `pypdf` | PDF binary format is extremely complex; pypdf handles encryption, compressed streams, multi-page |
| DOCX parsing | XML unzipper | `python-docx` | DOCX is a zip of XML schemas; python-docx handles versioning, embedded tables, styles |
| YouTube caption fetching | YouTube Data API scraper | `youtube-transcript-api` | No API key needed, handles 10+ subtitle track formats, works with auto-generated captions |
| OAuth token refresh | Custom token refresh logic | `google.oauth2.credentials.Credentials` | google-auth handles expiry, refresh, and HTTP headers automatically |
| URL → clean text | httpx + BeautifulSoup | `firecrawl-py` | Firecrawl handles JS rendering, anti-bot bypass, returns clean markdown |
| Text chunking | Custom sentence splitter | Simple recursive char splitter (20 lines) | No library needed; langchain-text-splitters adds bloat for a single use case |
**Key insight:** Document parsing libraries handle edge cases that take months to rediscover (corrupted headers, nested tables, character encoding, password-protected files). The only thing worth writing custom is the chunking algorithm, which is genuinely trivial.
---
## Common Pitfalls
### Pitfall 1: `kb_documents.agent_id` is NOT NULL in Migration 004
**What goes wrong:** Inserting a KB document without an `agent_id` will fail with a DB constraint error. The locked decision says KB is per-tenant (not per-agent), so there is no `agent_id` context at upload time.
**Why it happens:** The original Phase 2 schema assumed per-agent knowledge bases. The locked decision changed this to per-tenant.
**How to avoid:** Migration 013 must `ALTER TABLE kb_documents ALTER COLUMN agent_id DROP NOT NULL`. Update the ORM model in `shared/models/kb.py` to match.
**Warning signs:** `IntegrityError: null value in column "agent_id"` when uploading a KB document.
### Pitfall 2: Celery Tasks Are Always `sync def` with `asyncio.run()`
**What goes wrong:** Writing `async def ingest_document(...)` as a Celery task causes `RuntimeError: no running event loop` or silent task hang.
**Why it happens:** Celery workers are not async-native. This is a hard architectural constraint documented in `tasks.py`.
**How to avoid:** `ingest_document` must be `def ingest_document(...)` with `asyncio.run()` for any async DB operations.
**Warning signs:** Task appears in the Celery queue but never completes; no exception in logs.
### Pitfall 3: Google OAuth Callback Must Not Require Auth
**What goes wrong:** If the `/api/portal/calendar/callback` endpoint has `Depends(require_tenant_admin)`, Google's redirect will fail because the callback URL has no session cookie.
**Why it happens:** OAuth callbacks are external redirects — they arrive unauthenticated.
**How to avoid:** The callback endpoint must be unauthenticated (no RBAC dependency). Tenant identity is recovered from the HMAC-signed `state` parameter, same as the Slack callback pattern in `channels.py`.
**Warning signs:** HTTP 401 or redirect loop on the callback URL.
### Pitfall 4: Google Access Token Expiry + Write-Back
**What goes wrong:** A calendar tool call fails with 401 after the access token (1-hour TTL) expires, even though the refresh token is stored.
**Why it happens:** `google.oauth2.credentials.Credentials` auto-refreshes in-memory but does not persist the new token to the database.
**How to avoid:** Record `credentials.token` before every Google API call and compare it afterward. If it changed, a refresh occurred; encrypt the updated token JSON and write it back to `channel_connections.config`.
**Warning signs:** Calendar tool works once, then fails 1 hour later.
### Pitfall 5: pypdf Returns Empty String for Scanned PDFs
**What goes wrong:** `page.extract_text()` returns `""` for image-based scanned PDFs. The document is ingested with zero chunks and returns no results in KB search.
**Why it happens:** pypdf only reads embedded text — it cannot OCR images.
**How to avoid:** After extraction, check if text length < 100 characters. If so, set `status='error'` with `error_message="This PDF contains images only. Text extraction requires OCR, which is not yet supported."`.
**Warning signs:** Document status shows "ready" but KB search returns nothing.
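The length check described above fits in a small guard called right after extraction. The 100-character threshold is the heuristic suggested in this pitfall, not a tuned value:

```python
# Guard for the scanned-PDF case; threshold is a heuristic, not benchmarked
_MIN_EXTRACTED_CHARS = 100


def check_extracted_text(text: str) -> None:
    """Raise a user-facing error when extraction produced (almost) no text."""
    if len(text.strip()) < _MIN_EXTRACTED_CHARS:
        raise ValueError(
            "This PDF contains images only. Text extraction requires OCR, "
            "which is not yet supported."
        )
```

The ingestion task catches this and sets `status='error'` with the message as `error_message`, so the portal shows a real failure instead of an empty "ready" document.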
### Pitfall 6: `ChannelTypeEnum` Does Not Include `google_calendar`
**What goes wrong:** Inserting a `ChannelConnection` with `channel_type='google_calendar'` fails if `ChannelTypeEnum` only includes messaging channels.
**Why it happens:** `ChannelTypeEnum` was defined in Phase 1 for messaging channels only.
**How to avoid:** Check `shared/models/tenant.py` and the `channel_connections` DDL. If `channel_type` were a `sa.Enum`, adding a value would require `ALTER TYPE` DDL; per the Phase 1 ADR it is stored as `TEXT` with a `CHECK` constraint, so the migration only needs to extend the `CHECK` list to include `google_calendar`.
**Warning signs:** `LookupError` or `IntegrityError` when inserting the Google Calendar connection.
---
## Code Examples
### Upload Endpoint Pattern (FastAPI multipart)
```python
# Source: FastAPI official docs — https://fastapi.tiangolo.com/tutorial/request-files/
import uuid

from fastapi import Depends, File, UploadFile


@kb_router.post("/{tenant_id}/documents", status_code=201)
async def upload_document(
    tenant_id: uuid.UUID,
    file: UploadFile = File(...),
    caller: PortalCaller = Depends(require_tenant_admin),
    session: AsyncSession = Depends(get_session),
) -> dict:
    file_bytes = await file.read()
    # 1. Upload to MinIO
    # 2. Insert KbDocument(status='processing')
    # 3. ingest_document.delay(str(doc.id), str(tenant_id))
    # 4. Return 201 with doc.id
    ...
```
### Google Calendar Token Storage Pattern
```python
# Reuse existing ChannelConnection + HMAC OAuth state from channels.py
# After OAuth callback:
token_data = {
    "token": credentials.token,
    "refresh_token": credentials.refresh_token,
    "token_uri": credentials.token_uri,
    "client_id": settings.google_client_id,
    "client_secret": settings.google_client_secret,
    "scopes": list(credentials.scopes),
    "expiry": credentials.expiry.isoformat() if credentials.expiry else None,
}
enc_svc = _get_encryption_service()
encrypted_token = enc_svc.encrypt(json.dumps(token_data))
conn = ChannelConnection(
    tenant_id=tenant_id,
    channel_type="google_calendar",  # TEXT column — no enum migration needed
    workspace_id=str(tenant_id),     # Sentinel: tenant ID as workspace ID
    config={"token": encrypted_token},
)
```
### Celery Ingestion Task Structure
```python
# Source: tasks.py architectural pattern (always sync def + asyncio.run())
@celery_app.task(bind=True, max_retries=3)
def ingest_document(self, document_id: str, tenant_id: str) -> None:
    """Background document ingestion — extract, chunk, embed, store."""
    try:
        asyncio.run(_ingest_document_async(document_id, tenant_id))
    except Exception as exc:
        asyncio.run(_mark_document_error(document_id, str(exc)))
        raise self.retry(exc=exc, countdown=60)
```
### Google Calendar Event Creation
```python
# Source: https://developers.google.com/workspace/calendar/api/guides/create-events
event_body = {
    "summary": event_summary,
    "start": {"dateTime": event_start, "timeZone": "UTC"},
    "end": {"dateTime": event_end, "timeZone": "UTC"},
}
event = service.events().insert(calendarId="primary", body=event_body).execute()
return f"Event created: {event.get('summary')} at {event.get('start', {}).get('dateTime')}"
```
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `calendar_lookup.py` uses service account (global) | Per-tenant OAuth tokens (per locked decision) | Phase 10 | Agents access each tenant's own calendar, not a shared service account |
| KB is per-agent (`agent_id NOT NULL`) | KB is per-tenant (`agent_id` nullable) | Phase 10 locked decision | All agents in a tenant share one knowledge base |
| `youtube-transcript-api` v0.x synchronous only | v1.2.4 (Jan 2026) uses `YouTubeTranscriptApi()` instance | 2025 | Minor API change — instantiate the class, call `.fetch(video_id)` |
**Deprecated/outdated:**
- `calendar_lookup.py` service account path: To be replaced entirely. The `GOOGLE_SERVICE_ACCOUNT_KEY` env var check should be removed.
- `agent_id NOT NULL` on `kb_documents`: Migration 013 removes this constraint.
---
## Open Questions
1. **Firecrawl API key management**
- What we know: `firecrawl-py` SDK connects to cloud API by default; self-hosted option available
- What's unclear: Whether to add `FIRECRAWL_API_KEY` as a platform-wide setting in `shared/config.py` or as a tenant BYO credential
- Recommendation: Add as platform-wide `FIRECRAWL_API_KEY` in `settings` (same pattern as `BRAVE_API_KEY`); make it optional with graceful degradation
2. **`ChannelTypeEnum` compatibility for `google_calendar`**
- What we know: Phase 1 ADR chose `TEXT + CHECK` over `sa.Enum` to avoid migration DDL conflicts
- What's unclear: Whether there's a CHECK constraint that needs updating, or if it's open TEXT
- Recommendation: Inspect `channel_connections` table DDL in migration 001 before writing migration 013
3. **Document re-index flow**
- What we know: CONTEXT.md mentions a re-index option in the KB portal
- What's unclear: Whether re-index deletes all existing chunks first or appends
- Recommendation: Delete all `kb_chunks` for the document, then re-run `ingest_document.delay()` — simplest and idempotent
4. **Whisper fallback for YouTube**
- What we know: `openai-whisper` requires model download (~140MB minimum) and GPU for reasonable speed
- What's unclear: Whether v1 should include Whisper at all given the infrastructure cost
- Recommendation: Omit Whisper for v1; return error when captions unavailable; add to v2 requirements
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest + pytest-asyncio (existing) |
| Config file | `pytest.ini` or `pyproject.toml [tool.pytest]` at repo root |
| Quick run command | `pytest tests/unit -x -q` |
| Full suite command | `pytest tests/unit tests/integration -x -q` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| CAP-01 | `web_search()` returns Brave results when key is set; gracefully degrades when key is missing | unit | `pytest tests/unit/test_web_search.py -x` | ❌ Wave 0 |
| CAP-02 | `kb_search()` returns ranked chunks for a query after ingestion | integration | `pytest tests/integration/test_kb_search.py -x` | ❌ Wave 0 |
| CAP-03 | File upload endpoint accepts PDF/DOCX/TXT, creates KbDocument with status=processing, triggers Celery task | unit+integration | `pytest tests/unit/test_kb_upload.py tests/integration/test_kb_ingestion.py -x` | ❌ Wave 0 |
| CAP-04 | `http_request()` returns correct response; rejects invalid methods; handles timeout | unit | `pytest tests/unit/test_http_request.py -x` | ❌ Wave 0 |
| CAP-05 | Calendar tool reads tenant token from DB, calls Google API, returns formatted events | unit (mock Google) | `pytest tests/unit/test_calendar_lookup.py -x` | ❌ Wave 0 |
| CAP-06 | Tool results in agent responses are natural language, not raw JSON | unit (prompt check) | `pytest tests/unit/test_tool_response_format.py -x` | ❌ Wave 0 |
| CAP-07 | Every tool invocation writes an audit_events row with tool name + args summary | integration | Covered by existing `tests/integration/test_audit.py` — extend with tool invocation cases | ✅ (extend) |
### Sampling Rate
- **Per task commit:** `pytest tests/unit -x -q`
- **Per wave merge:** `pytest tests/unit tests/integration -x -q`
- **Phase gate:** Full suite green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `tests/unit/test_web_search.py` — covers CAP-01 (mock httpx, test key-missing degradation + success path)
- [ ] `tests/unit/test_kb_upload.py` — covers CAP-03 upload endpoint (mock MinIO, mock Celery task dispatch)
- [ ] `tests/unit/test_kb_ingestion.py` — covers text extraction functions per format (PDF, DOCX, TXT, CSV)
- [ ] `tests/integration/test_kb_search.py` — covers CAP-02 (real pgvector, insert test chunks, verify similarity search)
- [ ] `tests/integration/test_kb_ingestion.py` — covers CAP-03 end-to-end (upload → task → chunks in DB)
- [ ] `tests/unit/test_http_request.py` — covers CAP-04 (mock httpx, test method validation, timeout)
- [ ] `tests/unit/test_calendar_lookup.py` — covers CAP-05 (mock Google API, mock DB token lookup)
---
## Sources
### Primary (HIGH confidence)
- FastAPI official docs (https://fastapi.tiangolo.com/tutorial/request-files/) — UploadFile pattern
- Google Calendar API docs (https://developers.google.com/workspace/calendar/api/guides/create-events) — event creation
- Google OAuth 2.0 web server docs (https://developers.google.com/identity/protocols/oauth2/web-server) — token exchange flow
- Existing codebase: `packages/orchestrator/orchestrator/tools/builtins/` — 4 tool files reviewed
- Existing codebase: `migrations/versions/004_phase2_audit_kb.py` — KB schema confirmed
- Existing codebase: `packages/shared/shared/api/channels.py` — Slack OAuth HMAC pattern to reuse
- Existing codebase: `packages/orchestrator/orchestrator/tools/executor.py` — CAP-07 already implemented
### Secondary (MEDIUM confidence)
- PyPI: `youtube-transcript-api` v1.2.4 (Jan 2026) — version + API confirmed
- PyPI: `firecrawl-py` — cloud + self-hosted documented
- WebSearch 2025: pypdf for PDF extraction — confirmed as lightweight, no C-deps option
- WebSearch 2025: Celery sync def constraint confirmed via tasks.py docstring cross-reference
### Tertiary (LOW confidence)
- Chunking parameters (500 chars, 50 overlap) — from community RAG practice, not benchmarked for this dataset
- Firecrawl cloud vs self-hosted recommendation — based on project stage, not measured performance comparison
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all libraries verified via PyPI + official docs
- Architecture: HIGH — pattern directly extends existing Phase 1-3 Slack OAuth and Celery task patterns in codebase
- Pitfalls: HIGH — agent_id NOT NULL issue is verified directly from migration 004 source code; token write-back is documented in google-auth source
- Chunking strategy: MEDIUM — recommended values are community defaults, not project-specific benchmarks
**Research date:** 2026-03-26
**Valid until:** 2026-06-26 (stable domain; Google OAuth API is very stable)

---
phase: 10
slug: agent-capabilities
status: draft
nyquist_compliant: false
wave_0_complete: false
created: 2026-03-26
---
# Phase 10 — Validation Strategy
> Per-phase validation contract for feedback sampling during execution.
---
## Test Infrastructure
| Property | Value |
|----------|-------|
| **Framework** | pytest 8.x + pytest-asyncio (existing) |
| **Config file** | `pyproject.toml` (existing) |
| **Quick run command** | `pytest tests/unit -x -q` |
| **Full suite command** | `pytest tests/ -x` |
| **Estimated runtime** | ~45 seconds |
---
## Sampling Rate
- **After every task commit:** Run `pytest tests/unit -x -q`
- **After every plan wave:** Run `pytest tests/ -x`
- **Before `/gsd:verify-work`:** Full suite must be green
- **Max feedback latency:** 45 seconds
---
## Per-Task Verification Map
| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
|---------|------|------|-------------|-----------|-------------------|-------------|--------|
| 10-xx | 01 | 1 | CAP-01 | unit | `pytest tests/unit/test_web_search.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 01 | 1 | CAP-02,03 | unit | `pytest tests/unit/test_kb_ingestion.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 01 | 1 | CAP-04 | unit | `pytest tests/unit/test_http_request.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 02 | 2 | CAP-05 | unit | `pytest tests/unit/test_calendar.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 02 | 2 | CAP-06 | unit | `pytest tests/unit/test_tool_output.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 03 | 2 | CAP-03 | build | `cd packages/portal && npx next build` | ✅ | ⬜ pending |
| 10-xx | 03 | 2 | CAP-07 | integration | `pytest tests/integration/test_audit.py -x` | ✅ extend | ⬜ pending |
---
## Wave 0 Requirements
- [ ] `tests/unit/test_web_search.py` — CAP-01: Brave Search API integration
- [ ] `tests/unit/test_kb_ingestion.py` — CAP-02,03: document chunking, embedding, search
- [ ] `tests/unit/test_http_request.py` — CAP-04: HTTP request tool validation
- [ ] `tests/unit/test_calendar.py` — CAP-05: Google Calendar OAuth + CRUD
- [ ] `tests/unit/test_tool_output.py` — CAP-06: natural language tool result formatting
- [ ] Install: `uv add pypdf python-docx python-pptx openpyxl pandas firecrawl-py youtube-transcript-api google-auth google-auth-oauthlib google-api-python-client`
---
## Manual-Only Verifications
| Behavior | Requirement | Why Manual | Test Instructions |
|----------|-------------|------------|-------------------|
| Web search returns real results | CAP-01 | Requires live Brave API key | Send message requiring web search, verify results |
| Document upload + search works end-to-end | CAP-02,03 | Requires file upload + LLM | Upload PDF, ask agent about its content |
| Calendar books a meeting | CAP-05 | Requires live Google Calendar OAuth | Connect calendar, ask agent to book a meeting |
| Agent response reads naturally with tool data | CAP-06 | Qualitative assessment | Chat with agent using tools, verify natural language |
---
## Validation Sign-Off
- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
- [ ] Wave 0 covers all MISSING references
- [ ] No watch-mode flags
- [ ] Feedback latency < 45s
- [ ] `nyquist_compliant: true` set in frontmatter
**Approval:** pending

---
phase: 10-agent-capabilities
verified: 2026-03-25T22:00:00Z
status: passed
score: 15/15 must-haves verified
re_verification: false
---
# Phase 10: Agent Capabilities Verification Report
**Phase Goal:** Connect the 4 built-in agent tools to real external services so AI Employees can actually search the web, query a knowledge base of uploaded documents, make HTTP API calls, and check calendar availability
**Verified:** 2026-03-25
**Status:** PASSED
**Re-verification:** No — initial verification
---
## Goal Achievement
### Observable Truths
All must-haves are drawn from plan frontmatter across plans 10-01, 10-02, and 10-03.
#### Plan 10-01 Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Documents uploaded via API are saved to MinIO and a KbDocument row is created with status=processing | VERIFIED | `kb.py` L150-157: inserts `KnowledgeBaseDocument(status='processing')`, `L162-176`: uploads bytes to MinIO via boto3 |
| 2 | The Celery ingestion task extracts text from PDF, DOCX, PPTX, XLSX, CSV, TXT, and MD files | VERIFIED | `extractors.py`: real implementations for all 7 formats using pypdf, python-docx, python-pptx, pandas, UTF-8 decode |
| 3 | Extracted text is chunked (500 chars, 50 overlap) and embedded via all-MiniLM-L6-v2 into kb_chunks with tenant_id | VERIFIED | `ingest.py` L56-92: `chunk_text` sliding window; L174: `embed_texts(chunks)`; L186-202: raw SQL INSERT into kb_chunks with CAST vector |
| 4 | kb_search tool receives tenant_id injection from executor and returns matching chunks | VERIFIED | `executor.py` L126-127: `args["tenant_id"] = str(tenant_id)`; `kb_search.py` L24: accepts `tenant_id` kwarg, runs pgvector cosine similarity query |
| 5 | BRAVE_API_KEY and FIRECRAWL_API_KEY are platform-wide settings in shared config | VERIFIED | `config.py` L223-227: `brave_api_key` and `firecrawl_api_key` as Field entries |
| 6 | Tool executor injects tenant_id and agent_id into tool handler kwargs for context-aware tools | VERIFIED | `executor.py` L126-127: injection occurs after schema validation (L98-103), before handler call (L134) |
#### Plan 10-02 Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 7 | Tenant admin can initiate Google Calendar OAuth from the portal and authorize calendar access | VERIFIED | `calendar_auth.py` L104-130: `GET /install` endpoint returns Google OAuth URL with HMAC-signed state, offline access, and consent prompt |
| 8 | Calendar OAuth callback exchanges code for tokens and stores them encrypted per tenant | VERIFIED | `calendar_auth.py` L175-235: httpx POST to Google token endpoint, Fernet encrypt, upsert ChannelConnection(channel_type=GOOGLE_CALENDAR) |
| 9 | Calendar tool reads per-tenant OAuth tokens from channel_connections and calls Google Calendar API | VERIFIED | `calendar_lookup.py` L137-147: SELECT ChannelConnection WHERE channel_type=GOOGLE_CALENDAR; L178: builds Google Credentials; L194-207: run_in_executor for API call |
| 10 | Calendar tool supports list events, check availability, and create event actions | VERIFIED | `calendar_lookup.py` L267-273: dispatches to `_action_list`, `_action_check_availability`, `_action_create`; all three fully implemented |
| 11 | Token auto-refresh works — expired access tokens are refreshed via stored refresh_token and written back to DB | VERIFIED | `calendar_lookup.py` L190: records `token_before`; L210-225: if `creds.token != token_before`, encrypts and commits updated token to DB |
| 12 | Tool results are formatted as natural language (no raw JSON) | VERIFIED | `builder.py` L180-181: system prompt appends "Never show raw data or JSON to the user"; all `calendar_lookup` actions return formatted strings, not dicts |
#### Plan 10-03 Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 13 | Operators can see a Knowledge Base page in the portal navigation | VERIFIED | `nav.tsx` L49: `{ href: "/knowledge-base", label: t("knowledgeBase"), icon: BookOpen }`; i18n key present in en/es/pt message files |
| 14 | Operators can upload files via drag-and-drop or file picker dialog | VERIFIED | `upload-dialog.tsx` 249 lines: drag-and-drop zone, file picker input, sequential upload via `uploadKbDocument`; `api.ts` uses `new FormData()` |
| 15 | Uploaded documents show processing status with live polling | VERIFIED | `queries.ts` L518-521: `refetchInterval` returns 5000 when any doc has `status === "processing"`, false otherwise |
**Score:** 15/15 truths verified
---
### Required Artifacts
| Artifact | Status | Details |
|----------|--------|---------|
| `migrations/versions/014_kb_status.py` | VERIFIED | Adds status, error_message, chunk_count to kb_documents; makes agent_id nullable |
| `migrations/versions/013_google_calendar_channel.py` | VERIFIED | Adds google_calendar to channel_connections CHECK constraint |
| `packages/orchestrator/orchestrator/tools/extractors.py` | VERIFIED | 142 lines; real implementations for all 7 format families; exports `extract_text` |
| `packages/orchestrator/orchestrator/tools/ingest.py` | VERIFIED | 323 lines; exports `chunk_text` and `ingest_document_pipeline`; full pipeline with MinIO, YouTube, Firecrawl |
| `packages/shared/shared/api/kb.py` | VERIFIED | 377 lines; 5 endpoints; exports `kb_router` |
| `packages/orchestrator/orchestrator/tasks.py` | VERIFIED | `ingest_document` Celery task at L1008-1036; calls `asyncio.run(ingest_document_pipeline(...))` |
| `packages/orchestrator/orchestrator/tools/executor.py` | VERIFIED | Tenant/agent injection at L126-127, after schema validation, before handler call |
| `packages/shared/shared/api/calendar_auth.py` | VERIFIED | Full OAuth flow; exports `calendar_auth_router`; 3 endpoints |
| `packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py` | VERIFIED | Service account stub replaced; per-tenant OAuth; list/create/check_availability; token refresh write-back |
| `packages/orchestrator/orchestrator/tools/registry.py` | VERIFIED | All 4 tools in registry; calendar_lookup schema updated with action enum, event_summary, event_start, event_end |
| `packages/gateway/gateway/main.py` | VERIFIED | `kb_router` and `calendar_auth_router` mounted at L174-175 |
| `packages/portal/app/(dashboard)/knowledge-base/page.tsx` | VERIFIED | 88 lines; RBAC-conditional buttons; uses session for tenantId |
| `packages/portal/components/kb/document-list.tsx` | VERIFIED | 259 lines; status badges; delete confirm dialog; re-index; polling via `useKbDocuments` |
| `packages/portal/components/kb/upload-dialog.tsx` | VERIFIED | 249 lines; drag-and-drop; file picker; sequential upload with per-file progress |
| `packages/portal/components/kb/url-ingest-dialog.tsx` | VERIFIED | 162 lines; URL input; auto-YouTube detection; radio source type |
| `tests/unit/test_extractors.py` | VERIFIED | Exists on disk |
| `tests/unit/test_kb_upload.py` | VERIFIED | Exists on disk |
| `tests/unit/test_ingestion.py` | VERIFIED | Exists on disk |
| `tests/unit/test_executor_injection.py` | VERIFIED | Exists on disk |
| `tests/unit/test_calendar_lookup.py` | VERIFIED | Exists on disk |
| `tests/unit/test_calendar_auth.py` | VERIFIED | Exists on disk |
---
### Key Link Verification
| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| `shared/api/kb.py` | `orchestrator/tasks.py` | `ingest_document.delay(document_id, tenant_id)` | WIRED | L185-187 in kb.py: `_get_ingest_task().delay(str(doc_id), str(tenant_id))`; lazy import avoids circular dep |
| `orchestrator/tools/executor.py` | `tool.handler` | `tenant_id/agent_id` injected into kwargs | WIRED | L126-127: `args["tenant_id"] = str(tenant_id); args["agent_id"] = str(agent_id)` after schema validation |
| `shared/api/calendar_auth.py` | `channel_connections` table | Upsert with `channel_type='google_calendar'` and encrypted token | WIRED | L213-233: `enc_svc.encrypt(token_json)`, upsert `ChannelConnection(channel_type=GOOGLE_CALENDAR, config={"token": encrypted_token})` |
| `orchestrator/tools/builtins/calendar_lookup.py` | `channel_connections` table | Load encrypted token, decrypt, build Credentials | WIRED | L137-147: SELECT ChannelConnection; L167-172: `enc_svc.decrypt(encrypted_token)`; L76-83: `Credentials(refresh_token=...)` |
| `portal/app/(dashboard)/knowledge-base/page.tsx` | `/api/portal/kb/{tenant_id}/documents` | TanStack Query fetch + polling | WIRED | `document-list.tsx` L30: imports `useKbDocuments`; L111: `const { data } = useKbDocuments(tenantId)`; `queries.ts` L518-521: conditional `refetchInterval` |
| `portal/components/kb/upload-dialog.tsx` | `/api/portal/kb/{tenant_id}/documents` | FormData multipart POST | WIRED | L109: `await uploadKbDocument(tenantId, files[i].file, authHeaders)`; `api.ts` L378: `const formData = new FormData()` |
| `gateway/gateway/main.py` | `kb_router` + `calendar_auth_router` | `app.include_router(...)` | WIRED | L174-175: both routers mounted |
---
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| CAP-01 | 10-01 | Web search tool returns real results from Brave Search | SATISFIED | `web_search.py` L23: `_BRAVE_API_URL = "https://api.search.brave.com/res/v1/web/search"`; L40: `settings.brave_api_key`; full httpx call with error handling |
| CAP-02 | 10-01 | KB tool searches tenant-scoped documents chunked and embedded in pgvector | SATISFIED | `kb_search.py`: pgvector cosine similarity query on kb_chunks; executor injects tenant_id; `ingest.py`: embed_texts + INSERT with CAST vector |
| CAP-03 | 10-01, 10-03 | Operators can upload documents (PDF, DOCX, TXT) via portal | SATISFIED | `kb.py`: upload endpoint + Celery dispatch; portal KB page with upload dialog, URL ingest, status polling, delete, reindex |
| CAP-04 | 10-02 (confirmed) | HTTP request tool can call operator-configured URLs with timeout | SATISFIED | `http_request.py`: full httpx implementation, 30s timeout, 1MB cap, in registry |
| CAP-05 | 10-02 | Calendar tool can check Google Calendar availability and create events | SATISFIED | `calendar_lookup.py`: per-tenant OAuth, list/check_availability/create actions; full Google Calendar API integration |
| CAP-06 | 10-02 | Tool results incorporated naturally — no raw JSON shown to users | SATISFIED | `builder.py` L180-181: system prompt instruction; all tool handlers return formatted strings |
| CAP-07 | 10-02 (confirmed) | All tool invocations logged in audit trail | SATISFIED | `executor.py` L137-145: `audit_logger.log_tool_call(...)` on every success; L153-161: logged on every error; L192: logged on validation failure |
**All 7 requirements satisfied. No orphaned requirements.**
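CAP-01 and CAP-06 together mean the agent fetches real Brave results but never surfaces raw JSON. A hedged sketch of the formatting half, assuming an already-parsed response payload (the field names are illustrative, not necessarily Brave's exact schema):

```python
def format_results(payload: dict) -> str:
    """Turn a parsed web-search payload into a natural-language summary."""
    hits = payload.get("web", {}).get("results", [])[:3]
    if not hits:
        return "I couldn't find any web results for that query."
    lines = [f"- {h['title']}: {h['description']}" for h in hits]
    return "Here's what I found:\n" + "\n".join(lines)


sample = {"web": {"results": [
    {"title": "Example", "description": "An example result."},
]}}
print(format_results(sample))
```

The empty-result branch matters: returning a sentence rather than `[]` keeps the "no raw data" guarantee even on misses.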
---
### Anti-Patterns Found
None detected. Scanned all key backend and portal files for TODO, FIXME, placeholder, `return null`, `return {}`, `console.log` — none found.
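The scan described above can be approximated in a few lines; the marker list mirrors the report's, and the sample snippets are invented for illustration:

```python
import re

# Markers the verification scan looks for (subset of the report's list).
MARKERS = re.compile(r"TODO|FIXME|placeholder|console\.log")


def scan(source: str) -> list[str]:
    """Return every line containing an anti-pattern marker."""
    return [line for line in source.splitlines() if MARKERS.search(line)]


clean = "def handler():\n    return format_result(data)\n"
dirty = "def handler():\n    # TODO: implement\n    console.log(x)\n"
print(scan(clean))       # → []
print(len(scan(dirty)))  # → 2
```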
---
### Human Verification Required
**1. Google Calendar OAuth end-to-end flow**
**Test:** With GOOGLE_CLIENT_ID/SECRET configured, navigate to portal settings, click "Connect Google Calendar", complete Google consent, verify redirect back with `?calendar=connected`
**Expected:** Token stored in channel_connections; subsequent agent messages can list/create calendar events
**Why human:** External OAuth redirect flow cannot be verified programmatically without real Google credentials and a live browser session
**2. Knowledge Base document ingestion end-to-end**
**Test:** Upload a PDF or DOCX via the portal KB page, wait for status to change from "Processing" to "Ready", then send a message to an agent with kb_search assigned that references the document content
**Expected:** Agent correctly cites information from the uploaded document
**Why human:** Requires live MinIO, Celery worker, pgvector DB, and LLM inference stack to be running
**3. Portal RBAC enforcement on KB page**
**Test:** Log in as a customer_operator user, navigate to /knowledge-base
**Expected:** Document list is visible; "Upload Files" and "Add URL" buttons are hidden; Delete and Re-index action buttons are hidden
**Why human:** RBAC conditional rendering requires live portal with a real operator session
**4. Web search returns real results**
**Test:** With BRAVE_API_KEY set, trigger an agent tool call to `web_search` with a current events query
**Expected:** Agent receives and summarizes real search results, not cached or static data
**Why human:** Requires live Brave API key and working agent inference loop
---
### Gaps Summary
No gaps. All 15 must-have truths verified, all 7 requirements satisfied (CAP-01 through CAP-07), all key links wired, no anti-patterns found, all artifacts are substantive implementations (not stubs).
Notable: The portal KB implementation (Plan 10-03) is in a git submodule at `packages/portal`. The commit `c525c02` exists in the submodule log but is not surfaced in the parent repo's git log — this is expected submodule behavior. The files exist on disk and are substantive.
---
_Verified: 2026-03-25_
_Verifier: Claude (gsd-verifier)_

CHANGELOG.md (new file, 103 lines)
# Changelog
All notable changes to Konstruct are documented in this file.
## [1.0.0] - 2026-03-26
### Phase 10: Agent Capabilities
- Knowledge base ingestion pipeline — upload PDF, DOCX, PPTX, XLSX, CSV, TXT, Markdown; add URLs (Firecrawl scraping); add YouTube videos (transcript extraction)
- Async document processing via Celery — chunk, embed (all-MiniLM-L6-v2), store in pgvector
- KB management portal page with drag-and-drop upload, live status polling, delete, reindex
- Google Calendar OAuth per tenant — list events, check availability, create events
- Token auto-refresh with encrypted DB write-back
- Web search connected to Brave Search API (platform-wide key)
- Tool executor injects tenant_id/agent_id into all tool handlers
- System prompt includes tool result formatting instruction (no raw JSON)
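The chunk step of the ingestion pipeline can be illustrated with a simplified sliding-window chunker (the real `chunk_text` lives in `ingest.py`; the size and overlap values here are assumptions for illustration):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap so context spans boundaries."""
    chunks: list[str] = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks


doc = "word " * 300            # ~1500 characters of sample text
chunks = chunk_text(doc)
print(len(chunks))             # → 4
```

Each chunk shares its last 50 characters with the next chunk's first 50, so a sentence straddling a boundary is still retrievable from at least one embedding.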
### Phase 9: Testing & QA
- Playwright E2E test suite — 29 tests across 7 critical flows (login, tenants, agent deploy, chat, RBAC, i18n, mobile)
- Cross-browser testing — Chromium, Firefox, WebKit
- Visual regression snapshots at 3 viewports (desktop, tablet, mobile)
- axe-core accessibility scans on all pages
- Lighthouse CI score gating (>= 80 hard floor)
- Gitea Actions CI pipeline — backend lint + pytest → portal build + E2E + Lighthouse
### Phase 8: Mobile + PWA
- Responsive mobile layout with bottom tab bar (Dashboard, Employees, Chat, Usage, More)
- Full-screen WhatsApp-style mobile chat with back arrow + agent name header
- Visual Viewport API keyboard handling for iOS
- PWA manifest with K monogram icons
- Service worker (Serwist) with app shell + runtime caching
- Web Push notifications (VAPID) with push subscription management
- IndexedDB offline message queue with drain-on-reconnect
- Smart install prompt on second visit
- iOS safe-area support
### Phase 7: Multilanguage
- Full portal UI localization — English, Spanish, Portuguese
- next-intl v4 (cookie-based locale, no URL routing)
- Language switcher in sidebar (post-auth) and login page (pre-auth)
- Browser locale auto-detection on first visit
- Language preference saved to DB, synced to JWT
- Agent templates translated in all 3 languages (JSONB translations column)
- System prompt language instruction — agents auto-detect and respond in user's language
- Localized invitation emails
### Phase 6: Web Chat
- Real-time WebSocket chat in the portal
- Direct LLM streaming from WebSocket handler (bypasses Celery for speed)
- Token-by-token streaming via NDJSON → Redis pub-sub → WebSocket
- Conversation persistence (web_conversations + web_conversation_messages tables)
- Agent picker dialog for new conversations
- Markdown rendering (react-markdown + remark-gfm)
- Typing indicator during LLM generation
- All roles can chat (operators included)
### Phase 5: Employee Design
- Three-path AI employee creation: Templates / Guided Setup / Advanced
- 6 pre-built agent templates (Customer Support Rep, Sales Assistant, Marketing Manager, Office Manager, Project Coordinator, Finance & Accounting Manager)
- 5-step wizard (Role → Persona → Tools → Channels → Escalation)
- System prompt auto-generation from wizard inputs
- Templates stored as DB seed data with one-click deploy
- Agent Designer as "Advanced" mode
### Phase 4: RBAC
- Three-tier roles: platform_admin, customer_admin, customer_operator
- FastAPI RBAC guard dependencies (require_platform_admin, require_tenant_admin, require_tenant_member)
- Email invitation flow with HMAC tokens (48-hour expiry, resend capability)
- SMTP email sending via Python stdlib
- Portal navigation and API endpoints enforce role-based access
- Impersonation for platform admins with audit trail
- Global user management page
### Phase 3: Operator Experience
- Slack OAuth "Add to Slack" flow with HMAC state protection
- WhatsApp guided manual setup
- 3-step onboarding wizard (Connect → Configure → Test)
- Stripe subscription management (per-agent $49/month, 14-day trial)
- BYO API key management with Fernet encryption + MultiFernet key rotation
- Cost dashboard with Recharts (token usage, provider costs, message volume, budget alerts)
- Agent-level cost tracking and budget limits
### Phase 2: Agent Features
- Two-layer conversational memory (Redis sliding window + pgvector HNSW)
- Cross-conversation memory keyed per-user per-agent
- Tool framework with 4 built-in tools (web search, KB search, HTTP request, calendar)
- Schema-validated tool execution with confirmation flow for side-effecting actions
- Immutable audit logging (REVOKE UPDATE/DELETE at DB level)
- WhatsApp Business Cloud API adapter with Meta 2026 policy compliance
- Two-tier business-function scoping (keyword allowlist + role-based LLM)
- Human escalation with DM delivery, full transcript, and assistant mode
- Cross-channel bidirectional media support with multimodal LLM interpretation
### Phase 1: Foundation
- Monorepo with uv workspaces
- Docker Compose dev environment (PostgreSQL 16 + pgvector, Redis, Ollama)
- PostgreSQL Row Level Security with FORCE ROW LEVEL SECURITY
- Shared Pydantic models (KonstructMessage) and SQLAlchemy 2.0 async ORM
- LiteLLM Router with Ollama + Anthropic/OpenAI and fallback routing
- Celery orchestrator with sync-def pattern (asyncio.run)
- Slack adapter (Events API) with typing indicator
- Message Router with tenant resolution, rate limiting, idempotency
- Next.js 16 admin portal with Auth.js v5, tenant CRUD, Agent Designer
- Premium UI design system (indigo brand, dark sidebar, glass-morphism, DM Sans)

README.md (new file, 167 lines)
# Konstruct
**Build your AI workforce.** Deploy AI employees that work in the channels your team already uses — Slack, WhatsApp, and the built-in web chat. Zero behavior change required.
---
## What is Konstruct?
Konstruct is an AI workforce platform where SMBs subscribe to AI employees. Each AI employee has a name, role, persona, and tools — and communicates through familiar messaging channels. Think of it as "hire an AI department" rather than "subscribe to another SaaS dashboard."
### Key Features
- **Channel-native AI employees** — Agents respond in Slack, WhatsApp, and the portal web chat
- **Knowledge base** — Upload documents (PDF, DOCX, PPTX, Excel, CSV, TXT, Markdown), URLs, and YouTube videos. Agents search them automatically.
- **Google Calendar** — Agents check availability, list events, and book meetings via OAuth
- **Web search** — Agents search the web via Brave Search API
- **Real-time streaming** — Web chat streams LLM responses word-by-word
- **6 pre-built templates** — Customer Support Rep, Sales Assistant, Marketing Manager, Office Manager, Project Coordinator, Finance & Accounting Manager
- **Employee wizard** — 5-step guided setup or one-click template deployment
- **3-tier RBAC** — Platform admin, customer admin, customer operator with email invitation flow
- **Multilanguage** — English, Spanish, Portuguese (portal UI + agent responses)
- **Mobile + PWA** — Bottom tab bar, full-screen chat, push notifications, offline support
- **Stripe billing** — Per-agent monthly pricing with 14-day free trial
- **BYO API keys** — Tenants can bring their own LLM provider keys (Fernet encrypted)
---
## Quick Start
### Prerequisites
- Docker + Docker Compose
- Ollama running on the host (port 11434)
- Node.js 22+ (for portal development)
- Python 3.12+ with `uv` (for backend development)
### Setup
```bash
# Clone
git clone https://git.oe74.net/adelorenzo/konstruct.git
cd konstruct

# Configure
cp .env.example .env
# Edit .env — set OLLAMA_MODEL, API keys, SMTP, etc.

# Start all services
docker compose up -d

# Create admin user
curl -X POST http://localhost:8001/api/portal/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@example.com", "password": "YourPassword123", "name": "Admin"}'

# Set as platform admin
docker exec konstruct-postgres psql -U postgres -d konstruct \
  -c "UPDATE portal_users SET role = 'platform_admin' WHERE email = 'admin@example.com';"
```
Open `http://localhost:3000` and sign in.
### Services
| Service | Port | Description |
|---------|------|-------------|
| Portal | 3000 | Next.js admin dashboard |
| Gateway | 8001 | FastAPI API + WebSocket |
| LLM Pool | internal | LiteLLM router (Ollama + commercial) |
| Celery Worker | internal | Background task processing |
| PostgreSQL | internal | Primary database with RLS + pgvector |
| Redis | internal | Cache, sessions, pub-sub, task queue |
---
## Architecture
```
Client (Slack / WhatsApp / Web Chat)
                 │
                 ▼
      ┌─────────────────────┐
      │   Channel Gateway   │  Unified ingress, normalizes to KonstructMessage
      │   (FastAPI :8001)   │
      └────────┬────────────┘
               │
               ▼
      ┌─────────────────────┐
      │ Agent Orchestrator  │  Memory, tools, escalation, audit
      │  (Celery / Direct)  │  Web chat streams directly (no Celery)
      └────────┬────────────┘
               │
               ▼
      ┌─────────────────────┐
      │  LLM Backend Pool   │  LiteLLM → Ollama / Anthropic / OpenAI
      └─────────────────────┘
```
---
## Tech Stack
### Backend
- **Python 3.12+** — FastAPI, SQLAlchemy 2.0, Pydantic v2, Celery
- **PostgreSQL 16** — RLS multi-tenancy, pgvector for embeddings
- **Redis** — Cache, pub-sub, task queue, sliding window memory
- **LiteLLM** — Unified LLM provider routing with fallback
### Frontend
- **Next.js 16** — App Router, standalone output
- **Tailwind CSS v4** — Utility-first styling
- **shadcn/ui** — Component library (base-nova style)
- **next-intl** — Internationalization (en/es/pt)
- **Serwist** — Service worker for PWA
- **DM Sans** — Primary font
### Infrastructure
- **Docker Compose** — Development and deployment
- **Alembic** — Database migrations (14 migrations)
- **Playwright** — E2E testing (7 flows, 3 browsers)
- **Gitea Actions** — CI/CD pipeline
---
## Configuration
All configuration is via environment variables in `.env`:
| Variable | Description | Default |
|----------|-------------|---------|
| `OLLAMA_MODEL` | Ollama model for local inference | `qwen3:32b` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://host.docker.internal:11434` |
| `ANTHROPIC_API_KEY` | Anthropic API key (optional) | — |
| `OPENAI_API_KEY` | OpenAI API key (optional) | — |
| `BRAVE_API_KEY` | Brave Search API key | — |
| `FIRECRAWL_API_KEY` | Firecrawl API key for URL scraping | — |
| `STRIPE_SECRET_KEY` | Stripe billing key | — |
| `AUTH_SECRET` | JWT signing secret | — |
| `PLATFORM_ENCRYPTION_KEY` | Fernet key for BYO API key encryption | — |
See `.env.example` for the complete list.
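A minimal sketch of how these variables might be read at runtime (the project actually uses a shared Pydantic settings module; this stand-in uses only the stdlib, with defaults taken from the table above):

```python
import os

# Defaults mirror the configuration table; unset optional keys stay None.
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3:32b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
BRAVE_API_KEY = os.getenv("BRAVE_API_KEY")  # optional: tools degrade gracefully

print(OLLAMA_MODEL)  # qwen3:32b when the variable is unset
```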
---
## Project Structure
```
konstruct/
├── packages/
│ ├── gateway/ # Channel Gateway (FastAPI)
│ ├── orchestrator/ # Agent Orchestrator (Celery tasks)
│ ├── llm-pool/ # LLM Backend Pool (LiteLLM)
│ ├── router/ # Message Router (tenant resolution, rate limiting)
│ ├── shared/ # Shared models, config, API routers
│ └── portal/ # Admin Portal (Next.js 16)
├── migrations/ # Alembic DB migrations
├── tests/ # Backend test suite
├── docker-compose.yml # Service definitions
├── .planning/ # GSD planning artifacts
└── .env # Environment configuration
```
---
## License
Proprietary. All rights reserved.

migrations/versions/013_google_calendar_channel.py (new file, 52 lines)

```python
"""Add google_calendar to channel_type CHECK constraint

Revision ID: 013
Revises: 012
Create Date: 2026-03-26

Adds 'google_calendar' to the valid channel types in channel_connections.
This enables per-tenant Google Calendar OAuth token storage alongside
existing Slack/WhatsApp/web connections.

Steps:
1. Drop old CHECK constraint on channel_connections.channel_type
2. Re-create it with the updated list including 'google_calendar'
"""
from __future__ import annotations

from alembic import op

# Alembic revision identifiers
revision: str = "013"
down_revision: str | None = "012"
branch_labels = None
depends_on = None

# All valid channel types including 'google_calendar'
_CHANNEL_TYPES = (
    "slack", "whatsapp", "mattermost", "rocketchat", "teams",
    "telegram", "signal", "web", "google_calendar",
)


def upgrade() -> None:
    # Drop the existing CHECK constraint (added in 008_web_chat.py as chk_channel_type)
    op.execute("ALTER TABLE channel_connections DROP CONSTRAINT IF EXISTS chk_channel_type")
    # Re-create with the updated list
    op.execute(
        "ALTER TABLE channel_connections ADD CONSTRAINT chk_channel_type "
        f"CHECK (channel_type IN {tuple(_CHANNEL_TYPES)})"
    )


def downgrade() -> None:
    # Restore 008's constraint (without google_calendar)
    _PREV_TYPES = (
        "slack", "whatsapp", "mattermost", "rocketchat",
        "teams", "telegram", "signal", "web",
    )
    op.execute("ALTER TABLE channel_connections DROP CONSTRAINT IF EXISTS chk_channel_type")
    op.execute(
        "ALTER TABLE channel_connections ADD CONSTRAINT chk_channel_type "
        f"CHECK (channel_type IN {tuple(_PREV_TYPES)})"
    )
```
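One subtlety in this migration: it interpolates a Python tuple's repr directly into SQL, which works for an IN list because tuple repr uses single quotes and parentheses. A quick check with an abbreviated type list:

```python
# Abbreviated channel list; the migration uses the full nine-entry tuple.
channel_types = ("slack", "whatsapp", "web", "google_calendar")
sql = (
    "ALTER TABLE channel_connections ADD CONSTRAINT chk_channel_type "
    f"CHECK (channel_type IN {tuple(channel_types)})"
)
print(sql)
# → ... CHECK (channel_type IN ('slack', 'whatsapp', 'web', 'google_calendar'))
```

Note this trick breaks for a one-element tuple, whose repr keeps a trailing comma (`('slack',)`), so the list must always hold two or more entries.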

migrations/versions/014_kb_status.py (new file, 84 lines)

```python
"""KB document status columns and agent_id nullable

Revision ID: 014
Revises: 013
Create Date: 2026-03-26

Changes:
- kb_documents.status TEXT NOT NULL DEFAULT 'processing' (CHECK constraint)
- kb_documents.error_message TEXT NULL
- kb_documents.chunk_count INTEGER NULL
- kb_documents.agent_id DROP NOT NULL (make nullable — KB is per-tenant, not per-agent)

Note: google_calendar channel type was added in migration 013.
This migration is numbered 014 and depends on 013.
"""
from __future__ import annotations

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op

revision: str = "014"
down_revision: Union[str, None] = "013"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # --------------------------------------------------------------------------
    # 1. Add status, error_message, chunk_count columns to kb_documents
    # --------------------------------------------------------------------------
    op.add_column(
        "kb_documents",
        sa.Column(
            "status",
            sa.Text(),
            nullable=False,
            server_default="processing",
            comment="Document ingestion status: processing | ready | error",
        ),
    )
    op.add_column(
        "kb_documents",
        sa.Column(
            "error_message",
            sa.Text(),
            nullable=True,
            comment="Error details when status='error'",
        ),
    )
    op.add_column(
        "kb_documents",
        sa.Column(
            "chunk_count",
            sa.Integer(),
            nullable=True,
            comment="Number of chunks created after ingestion",
        ),
    )
    # CHECK constraint on status values
    op.create_check_constraint(
        "ck_kb_documents_status",
        "kb_documents",
        "status IN ('processing', 'ready', 'error')",
    )
    # --------------------------------------------------------------------------
    # 2. Make agent_id nullable — KB is per-tenant, not per-agent
    # --------------------------------------------------------------------------
    op.alter_column("kb_documents", "agent_id", nullable=True)


def downgrade() -> None:
    # Restore agent_id NOT NULL
    op.alter_column("kb_documents", "agent_id", nullable=False)
    # Drop added columns
    op.drop_constraint("ck_kb_documents_status", "kb_documents", type_="check")
    op.drop_column("kb_documents", "chunk_count")
    op.drop_column("kb_documents", "error_message")
    op.drop_column("kb_documents", "status")
```

packages/gateway/gateway/main.py

```diff
@@ -17,6 +17,8 @@ Endpoints:
     GET  /api/portal/tenants/{id}/llm-keys — BYO LLM key management
     GET  /api/portal/usage/*               — Usage and cost analytics
     POST /api/webhooks/*                   — Stripe webhook receiver
+    GET  /api/portal/kb/*                  — Knowledge base document management
+    GET  /api/portal/calendar/*            — Google Calendar OAuth endpoints
     GET  /health                           — Health check

 Startup sequence:
@@ -43,9 +45,11 @@ from gateway.channels.web import web_chat_router
 from gateway.channels.whatsapp import whatsapp_router
 from shared.api import (
     billing_router,
+    calendar_auth_router,
     channels_router,
     chat_router,
     invitations_router,
+    kb_router,
     llm_keys_router,
     portal_router,
     push_router,
@@ -164,6 +168,12 @@ app.include_router(web_chat_router)  # WebSocket: /chat/ws/{conversation_id}
 # ---------------------------------------------------------------------------
 app.include_router(push_router)  # Push subscribe/unsubscribe/send
+# ---------------------------------------------------------------------------
+# Phase 10 Agent Capabilities routers
+# ---------------------------------------------------------------------------
+app.include_router(kb_router)             # KB documents: /api/portal/kb/{tenant_id}/documents
+app.include_router(calendar_auth_router)  # Google Calendar OAuth: /api/portal/calendar/*
+# ---------------------------------------------------------------------------
 # Routes
```

builder.py (system prompt builder)

```diff
@@ -173,12 +173,21 @@ def build_system_prompt(agent: Agent, channel: str = "") -> str:
     if agent.persona and agent.persona.strip():
         parts.append(f"Persona: {agent.persona.strip()}")

-    # 4. AI transparency clause — unconditional, non-overridable
+    # 4. Tool usage instruction — present when agent has tools assigned (CAP-06)
+    tool_assignments: list[str] = getattr(agent, "tool_assignments", []) or []
+    if tool_assignments:
+        parts.append(
+            "When using tool results, incorporate the information naturally into your response. "
+            "Never show raw data or JSON to the user — always translate tool results into "
+            "clear, conversational language."
+        )
+
+    # 5. AI transparency clause — unconditional, non-overridable
     parts.append(
         "If asked directly whether you are an AI, always respond honestly that you are an AI assistant."
     )
-    # 5. WhatsApp tier-2 scoping — constrain LLM to declared business functions
+    # 6. WhatsApp tier-2 scoping — constrain LLM to declared business functions
     if channel == "whatsapp":
         functions: list[str] = getattr(agent, "tool_assignments", []) or []
         if functions:
```

packages/orchestrator/orchestrator/tasks.py (KB ingestion task appended at end of file)

```python
# =============================================================================
# KB Document Ingestion Task
# =============================================================================
@app.task(
    name="orchestrator.tasks.ingest_document",
    bind=True,
    max_retries=2,
    default_retry_delay=60,
    ignore_result=True,
)
def ingest_document(self, document_id: str, tenant_id: str) -> None:  # type: ignore[override]
    """
    Celery task: run the KB document ingestion pipeline.

    Downloads the document from MinIO (or scrapes URL/YouTube), extracts text,
    chunks, embeds with all-MiniLM-L6-v2, and stores kb_chunks rows.
    Updates kb_documents.status to 'ready' on success, 'error' on failure.

    MUST be sync def — Celery workers are not async-native. asyncio.run() is
    used to bridge the sync Celery world to the async pipeline.

    Args:
        document_id: UUID string of the KnowledgeBaseDocument row.
        tenant_id: UUID string of the owning tenant.
    """
    from orchestrator.tools.ingest import ingest_document_pipeline

    try:
        asyncio.run(ingest_document_pipeline(document_id, tenant_id))
    except Exception as exc:
        logger.exception(
            "ingest_document task failed for document=%s tenant=%s: %s",
            document_id,
            tenant_id,
            exc,
        )
        self.retry(exc=exc, countdown=60)
```
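The sync-def plus `asyncio.run()` bridge noted in the docstring can be reduced to a minimal stand-in (names here are illustrative, not the real task):

```python
import asyncio


async def ingest_pipeline(document_id: str) -> str:
    """Async pipeline stand-in; the real one hits MinIO, pgvector, etc."""
    await asyncio.sleep(0)  # placeholder for real async I/O
    return f"ingested {document_id}"


def ingest_task(document_id: str) -> str:
    # sync def — Celery workers are not async-native, so the task entry
    # point stays synchronous and asyncio.run() owns the event loop.
    return asyncio.run(ingest_pipeline(document_id))


print(ingest_task("doc-1"))  # → ingested doc-1
```

Each task invocation gets a fresh event loop, which sidesteps the "attached to a different loop" failures that plague long-lived loops inside forked Celery workers.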

View File

@@ -1,108 +1,302 @@
"""
Built-in tool: calendar_lookup
Reads calendar events from Google Calendar for a given date.
Reads and creates Google Calendar events using per-tenant OAuth tokens.
Authentication options (in priority order):
1. GOOGLE_SERVICE_ACCOUNT_KEY env var — JSON key for service account impersonation
2. Per-tenant OAuth (future: Phase 3 portal) — not yet implemented
3. Graceful degradation: returns informative message if not configured
Authentication:
Tokens are stored per-tenant in channel_connections (channel_type='google_calendar').
The tenant admin must complete the OAuth flow via /api/portal/calendar/install first.
If no token is found, returns an informative message asking admin to connect.
This tool is read-only (requires_confirmation=False in registry).
Actions:
- list: List events for the given date (default)
- check_availability: Return free/busy summary for the given date
- create: Create a new calendar event
Token auto-refresh:
google.oauth2.credentials.Credentials auto-refreshes expired access tokens
using the stored refresh_token. After each API call, if credentials.token
changed (refresh occurred), the updated token is encrypted and written back
to channel_connections so subsequent calls don't re-trigger refresh.
All results are formatted as natural language strings — no raw JSON exposed.
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
from datetime import datetime, timezone
import uuid
from typing import Any
# Module-level imports for patchability in tests.
# google-auth and googleapiclient are optional dependencies — import errors handled
# gracefully in the functions that use them.
try:
from googleapiclient.discovery import build # type: ignore[import-untyped]
except ImportError:
build = None # type: ignore[assignment]
from shared.config import settings
from shared.crypto import KeyEncryptionService
logger = logging.getLogger(__name__)
# Google Calendar API scope (must match what was requested during OAuth)
_CALENDAR_SCOPE = "https://www.googleapis.com/auth/calendar"
_GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"
def google_credentials_from_token(token_dict: dict[str, Any]) -> Any:
"""
Build a google.oauth2.credentials.Credentials object from a stored token dict.
The token dict is the JSON structure written by calendar_auth.py during OAuth:
{
"token": "ya29.access_token",
"refresh_token": "1//refresh_token",
"token_uri": "https://oauth2.googleapis.com/token",
"client_id": "...",
"client_secret": "...",
"scopes": ["https://www.googleapis.com/auth/calendar"]
}
Args:
token_dict: Parsed token dictionary.
Returns:
google.oauth2.credentials.Credentials instance.
Raises:
ImportError: If google-auth is not installed.
"""
from google.oauth2.credentials import Credentials # type: ignore[import-untyped]
return Credentials(
token=token_dict.get("token"),
refresh_token=token_dict.get("refresh_token"),
token_uri=token_dict.get("token_uri", _GOOGLE_TOKEN_URL),
client_id=token_dict.get("client_id"),
client_secret=token_dict.get("client_secret"),
scopes=token_dict.get("scopes", [_CALENDAR_SCOPE]),
)
async def calendar_lookup(
date: str,
action: str = "list",
event_summary: str | None = None,
event_start: str | None = None,
event_end: str | None = None,
calendar_id: str = "primary",
tenant_id: str | None = None,
_session: Any = None, # Injected in tests; production uses DB session from task context
**kwargs: object,
) -> str:
"""
Look up, check availability, or create Google Calendar events for a specific date.
Args:
date: Date in YYYY-MM-DD format (required).
action: One of "list", "check_availability", "create". Default: "list".
event_summary: Event title (required for action="create").
event_start: ISO 8601 datetime with timezone (required for action="create").
event_end: ISO 8601 datetime with timezone (required for action="create").
calendar_id: Google Calendar ID. Defaults to 'primary'.
tenant_id: Konstruct tenant UUID string. Required for token lookup.
_session: Injected AsyncSession (for testing). Production passes None.
Returns:
Natural language string describing the result.
"""
# Guard: tenant_id is required to look up per-tenant OAuth token
if not tenant_id:
return "Calendar not available: missing tenant context."
# Get DB session
session = _session
if session is None:
# Production: obtain a session from the DB pool
# Import here to avoid circular imports at module load time
try:
from shared.db import async_session_factory
session = async_session_factory()
# Note: caller is responsible for closing the session
# In practice, the orchestrator task context manages session lifecycle
except Exception:
logger.exception("Failed to create DB session for calendar_lookup")
return "Calendar lookup failed: unable to connect to the database."
import asyncio
try:
tenant_uuid = uuid.UUID(tenant_id)
except ValueError:
return f"Calendar lookup failed: invalid tenant ID '{tenant_id}'."
# Load per-tenant OAuth token from channel_connections
try:
from sqlalchemy import select
from shared.models.tenant import ChannelConnection, ChannelTypeEnum
result = await session.execute(
select(ChannelConnection).where(
ChannelConnection.tenant_id == tenant_uuid,
ChannelConnection.channel_type == ChannelTypeEnum.GOOGLE_CALENDAR,
)
)
conn = result.scalar_one_or_none()
except Exception:
logger.exception("DB error loading calendar connection for tenant=%s", tenant_id)
return "Calendar lookup failed: database error loading calendar connection."
if conn is None:
return (
"Google Calendar is not connected for this tenant. "
"Ask an admin to connect it in Settings."
)
# Decrypt token
encrypted_token = conn.config.get("token", "")
if not encrypted_token:
return "Calendar lookup failed: no token found in connection config."
try:
if not settings.platform_encryption_key:
return "Calendar lookup failed: encryption key not configured."
enc_svc = KeyEncryptionService(
primary_key=settings.platform_encryption_key,
previous_key=settings.platform_encryption_key_previous,
)
token_json: str = enc_svc.decrypt(encrypted_token)
token_dict: dict[str, Any] = json.loads(token_json)
except Exception:
logger.exception("Failed to decrypt calendar token for tenant=%s", tenant_id)
return "Calendar lookup failed: unable to decrypt stored credentials."
# Build Google credentials
try:
creds = google_credentials_from_token(token_dict)
except ImportError:
return (
"Google Calendar library not installed. "
"Run: uv add google-api-python-client google-auth"
)
except Exception:
logger.exception("Failed to build Google credentials for tenant=%s", tenant_id)
return "Calendar lookup failed: invalid stored credentials."
# Record the token before the API call to detect refresh
token_before = creds.token
# Execute the API call in a thread executor (blocking SDK)
try:
result_str = await asyncio.get_event_loop().run_in_executor(
None,
_execute_calendar_action,
creds,
action,
date,
calendar_id,
event_summary,
event_start,
event_end,
)
except Exception:
logger.exception("Calendar API call failed for tenant=%s date=%s action=%s", tenant_id, date, action)
return f"Calendar lookup failed for {date}. Please try again."
# Token refresh write-back: if token changed after the API call, persist the update
if creds.token and creds.token != token_before:
try:
new_token_dict = {
"token": creds.token,
"refresh_token": creds.refresh_token or token_dict.get("refresh_token", ""),
"token_uri": token_dict.get("token_uri", _GOOGLE_TOKEN_URL),
"client_id": token_dict.get("client_id", ""),
"client_secret": token_dict.get("client_secret", ""),
"scopes": token_dict.get("scopes", [_CALENDAR_SCOPE]),
}
new_encrypted = enc_svc.encrypt(json.dumps(new_token_dict))
conn.config = {"token": new_encrypted}
await session.commit()
logger.debug("Calendar token refreshed and written back for tenant=%s", tenant_id)
except Exception:
logger.exception("Failed to write back refreshed calendar token for tenant=%s", tenant_id)
# Non-fatal: the API call succeeded, just log the refresh failure
return result_str
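The refresh write-back guard follows a snapshot-and-compare pattern: record the access token before the API call, then persist only if it changed. A minimal sketch with a stub credentials object (names here are illustrative, not the google-auth API):

```python
class StubCreds:
    """Stub standing in for google-auth Credentials (illustration only)."""
    def __init__(self, token: str) -> None:
        self.token = token

def api_call_that_refreshes(creds: StubCreds) -> None:
    # googleapiclient transparently refreshes an expired access token
    creds.token = "ya29.refreshed"

creds = StubCreds(token="ya29.stale")
token_before = creds.token          # snapshot before the API call
api_call_that_refreshes(creds)
# Write back only when the token actually changed (and is non-empty)
needs_write_back = bool(creds.token) and creds.token != token_before
```

A failed write-back is deliberately non-fatal: the calendar result was already obtained, so the tool only logs and returns it.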
def _execute_calendar_action(
creds: Any,
action: str,
date: str,
calendar_id: str,
event_summary: str | None,
event_start: str | None,
event_end: str | None,
) -> str:
"""
Synchronous calendar action — runs in thread executor to avoid blocking.
Uses google-api-python-client with the tenant's OAuth credentials.
Args:
creds: Google Credentials object.
action: One of "list", "check_availability", "create".
date: Date in YYYY-MM-DD format.
calendar_id: Google Calendar ID (default "primary").
event_summary: Title for create action.
event_start: ISO 8601 start for create action.
event_end: ISO 8601 end for create action.
Returns:
Natural language result string.
"""
try:
from googleapiclient.discovery import build
except ImportError:
return (
"Google Calendar library not installed. "
"Run: uv add google-api-python-client google-auth"
)
try:
service = build("calendar", "v3", credentials=creds, cache_discovery=False)
except Exception as exc:
logger.warning("Failed to build Google Calendar service: %s", exc)
return f"Calendar service error: {exc}"
# Dispatch to the requested action
if action == "create":
return _action_create(service, calendar_id, event_summary, event_start, event_end)
elif action == "check_availability":
return _action_check_availability(service, calendar_id, date)
else:
# Default: "list"
return _action_list(service, calendar_id, date)
def _time_boundaries(date: str) -> tuple[str, str]:
"""Return (time_min, time_max) RFC3339 strings for the full given day (UTC)."""
return f"{date}T00:00:00Z", f"{date}T23:59:59Z"
def _format_event_time(event: dict[str, Any]) -> str:
"""Extract and format the start time of a calendar event."""
start = event.get("start", {})
raw = start.get("dateTime") or start.get("date") or "Unknown time"
# Trim the timezone part for readability if full datetime
if "T" in raw:
try:
# e.g. "2026-03-26T09:00:00+00:00" → "09:00"
time_part = raw.split("T")[1][:5]
return time_part
except IndexError:
return raw
return raw
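The start-time trimming used by `_format_event_time` can be exercised standalone; the helper below replicates just the split logic (all-day events carry a bare `date`, which passes through unchanged):

```python
def trim_start(raw: str) -> str:
    # "2026-03-26T09:00:00+00:00" → "09:00"; bare dates pass through
    if "T" in raw:
        try:
            return raw.split("T")[1][:5]
        except IndexError:
            return raw
    return raw

print(trim_start("2026-03-26T09:00:00+00:00"))  # 09:00
print(trim_start("2026-03-26"))                 # 2026-03-26
```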
def _action_list(service: Any, calendar_id: str, date: str) -> str:
"""List calendar events for the given date."""
time_min, time_max = _time_boundaries(date)
try:
events_result = (
service.events()
.list(
calendarId=calendar_id,
timeMin=time_min,
timeMax=time_max,
singleEvents=True,
orderBy="startTime",
)
.execute()
)
except Exception as exc:
logger.warning("Google Calendar list error: %s", exc)
return f"Calendar error listing events for {date}: {exc}"
items = events_result.get("items", [])
if not items:
return f"No events found on {date}."
lines = [f"Calendar events for {date}:"]
for event in items:
time_str = _format_event_time(event)
summary = event.get("summary", "Untitled event")
lines.append(f"- {time_str}: {summary}")
return "\n".join(lines)
def _action_check_availability(service: Any, calendar_id: str, date: str) -> str:
"""Return a free/busy summary for the given date."""
time_min, time_max = _time_boundaries(date)
try:
events_result = (
service.events()
.list(
calendarId=calendar_id,
timeMin=time_min,
timeMax=time_max,
singleEvents=True,
orderBy="startTime",
)
.execute()
)
except Exception as exc:
logger.warning("Google Calendar availability check error: %s", exc)
return f"Calendar error checking availability for {date}: {exc}"
items = events_result.get("items", [])
if not items:
return f"No events on {date} — the entire day is free."
lines = [f"Busy slots on {date}:"]
for event in items:
time_str = _format_event_time(event)
summary = event.get("summary", "Untitled event")
lines.append(f"- {time_str}: {summary}")
return "\n".join(lines)
def _action_create(
service: Any,
calendar_id: str,
event_summary: str | None,
event_start: str | None,
event_end: str | None,
) -> str:
"""Create a new calendar event."""
if not event_summary or not event_start or not event_end:
missing = []
if not event_summary:
missing.append("event_summary")
if not event_start:
missing.append("event_start")
if not event_end:
missing.append("event_end")
return f"Cannot create event: missing required fields: {', '.join(missing)}."
event_body = {
"summary": event_summary,
"start": {"dateTime": event_start},
"end": {"dateTime": event_end},
}
try:
created = (
service.events()
.insert(calendarId=calendar_id, body=event_body)
.execute()
)
except Exception as exc:
logger.warning("Google Calendar create error: %s", exc)
return f"Failed to create calendar event: {exc}"
summary = created.get("summary", event_summary)
start = created.get("start", {}).get("dateTime", event_start)
end = created.get("end", {}).get("dateTime", event_end)
return f"Event created: {summary} from {start} to {end}."


@@ -13,10 +13,11 @@ raising an exception (graceful degradation for agents without search configured)
from __future__ import annotations
import logging
import os
import httpx
from shared.config import settings
logger = logging.getLogger(__name__)
_BRAVE_API_URL = "https://api.search.brave.com/res/v1/web/search"
@@ -24,24 +25,26 @@ _BRAVE_TIMEOUT = httpx.Timeout(timeout=15.0, connect=5.0)
_MAX_RESULTS = 3
async def web_search(query: str, **kwargs: object) -> str:
"""
Search the web using Brave Search API.
Args:
query: The search query string.
**kwargs: Accepts injected tenant_id/agent_id from executor (unused).
Returns:
Formatted string with top 3 search results (title + URL + description),
or an error message if the API is unavailable.
"""
api_key = settings.brave_api_key
if not api_key:
return (
"Web search is not configured. "
"Set the BRAVE_API_KEY environment variable to enable web search."
)
try:
async with httpx.AsyncClient(timeout=_BRAVE_TIMEOUT) as client:
response = await client.get(


@@ -119,7 +119,15 @@ async def execute_tool(
return confirmation_msg
# ------------------------------------------------------------------
# 5. Inject tenant context into args AFTER schema validation
# This ensures kb_search, calendar_lookup, and future context-aware
# tools receive tenant/agent context without the LLM providing it.
# ------------------------------------------------------------------
args["tenant_id"] = str(tenant_id)
args["agent_id"] = str(agent_id)
# ------------------------------------------------------------------
# 6. Execute the handler
# ------------------------------------------------------------------
start_ms = time.monotonic()
try:


@@ -0,0 +1,141 @@
"""
Text extraction functions for knowledge base document ingestion.
Supports: PDF, DOCX, PPTX, XLSX/XLS, CSV, TXT, MD
Usage:
text = extract_text("document.pdf", pdf_bytes)
text = extract_text("report.docx", docx_bytes)
Raises:
ValueError: If the file extension is not supported.
"""
from __future__ import annotations
import io
import logging
import os
logger = logging.getLogger(__name__)
# Supported extensions grouped by extraction method
_PDF_EXTENSIONS = {".pdf"}
_DOCX_EXTENSIONS = {".docx"}
_PPTX_EXTENSIONS = {".pptx"}
_SPREADSHEET_EXTENSIONS = {".xlsx", ".xls"}
_TEXT_EXTENSIONS = {".csv", ".txt", ".md"}
_ALL_SUPPORTED = (
_PDF_EXTENSIONS
| _DOCX_EXTENSIONS
| _PPTX_EXTENSIONS
| _SPREADSHEET_EXTENSIONS
| _TEXT_EXTENSIONS
)
# Minimum characters for a PDF to be considered successfully extracted
# Below this threshold the PDF likely needs OCR (scanned/image-only PDF)
_PDF_MIN_CHARS = 100
def extract_text(filename: str, file_bytes: bytes) -> str:
"""
Extract plain text from a document given its filename and raw bytes.
Args:
filename: Original filename including extension (e.g., "report.pdf").
The extension determines which parser to use.
file_bytes: Raw bytes of the document.
Returns:
Extracted plain text as a string.
Raises:
ValueError: If the file extension is not in the supported set.
"""
_, ext = os.path.splitext(filename.lower())
if ext in _PDF_EXTENSIONS:
return _extract_pdf(file_bytes)
elif ext in _DOCX_EXTENSIONS:
return _extract_docx(file_bytes)
elif ext in _PPTX_EXTENSIONS:
return _extract_pptx(file_bytes)
elif ext in _SPREADSHEET_EXTENSIONS:
return _extract_spreadsheet(file_bytes)
elif ext in _TEXT_EXTENSIONS:
return _extract_text_plain(file_bytes)
else:
raise ValueError(
f"Unsupported file extension: '{ext}'. "
f"Supported formats: {', '.join(sorted(_ALL_SUPPORTED))}"
)
def _extract_pdf(file_bytes: bytes) -> str:
"""Extract text from a PDF file using pypdf."""
from pypdf import PdfReader
reader = PdfReader(io.BytesIO(file_bytes))
pages_text: list[str] = []
for page in reader.pages:
page_text = page.extract_text() or ""
if page_text.strip():
pages_text.append(page_text)
text = "\n".join(pages_text)
if len(text.strip()) < _PDF_MIN_CHARS:
logger.warning("PDF text extraction yielded < %d chars — PDF may be image-only", _PDF_MIN_CHARS)
return (
f"This PDF appears to be image-only or contains very little extractable text "
f"({len(text.strip())} characters). OCR is not supported in the current version. "
f"Please provide a text-based PDF or convert it to a text document first."
)
return text
def _extract_docx(file_bytes: bytes) -> str:
"""Extract text from a DOCX file using python-docx."""
from docx import Document
doc = Document(io.BytesIO(file_bytes))
paragraphs = [para.text for para in doc.paragraphs if para.text.strip()]
return "\n".join(paragraphs)
def _extract_pptx(file_bytes: bytes) -> str:
"""Extract text from a PPTX file using python-pptx."""
from pptx import Presentation
prs = Presentation(io.BytesIO(file_bytes))
slide_texts: list[str] = []
for slide_num, slide in enumerate(prs.slides, start=1):
texts: list[str] = []
for shape in slide.shapes:
if shape.has_text_frame:
for para in shape.text_frame.paragraphs:
line = "".join(run.text for run in para.runs).strip()
if line:
texts.append(line)
if texts:
slide_texts.append(f"[Slide {slide_num}]\n" + "\n".join(texts))
return "\n\n".join(slide_texts)
def _extract_spreadsheet(file_bytes: bytes) -> str:
"""Extract text from XLSX/XLS files as CSV-formatted text using pandas."""
import pandas as pd
df = pd.read_excel(io.BytesIO(file_bytes))
return df.to_csv(index=False)
def _extract_text_plain(file_bytes: bytes) -> str:
"""Decode a plain text file (CSV, TXT, MD) as UTF-8."""
return file_bytes.decode("utf-8", errors="replace")
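The dispatch in `extract_text` keys purely on the lower-cased extension. A condensed sketch of the routing (returning which parser family each extension maps to, per the groups above):

```python
import os

def parser_for(filename: str) -> str:
    # Returns the parser family extract_text would route to
    _, ext = os.path.splitext(filename.lower())
    if ext == ".pdf":
        return "pypdf"
    if ext == ".docx":
        return "python-docx"
    if ext == ".pptx":
        return "python-pptx"
    if ext in {".xlsx", ".xls"}:
        return "pandas"
    if ext in {".csv", ".txt", ".md"}:
        return "utf-8 decode"
    raise ValueError(f"Unsupported file extension: '{ext}'")

print(parser_for("Report.PDF"))  # pypdf — the extension match is case-insensitive
```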


@@ -0,0 +1,322 @@
"""
Knowledge base document ingestion pipeline.
This module provides:
chunk_text() — sliding window text chunker
ingest_document_pipeline() — async pipeline: fetch → extract → chunk → embed → store
Pipeline steps:
1. Load KnowledgeBaseDocument from DB
2. Download file from MinIO (if filename) OR scrape URL / fetch YouTube transcript
3. Extract text using orchestrator.tools.extractors.extract_text
4. Chunk text with sliding window (500 chars, 50 overlap)
5. Batch embed chunks via all-MiniLM-L6-v2
6. INSERT kb_chunks rows with vector embeddings
7. UPDATE kb_documents SET status='ready', chunk_count=N
On any error: UPDATE kb_documents SET status='error', error_message=str(exc)
IMPORTANT: This module is called from a Celery task via asyncio.run(). All DB
and MinIO operations are async. The embedding call (embed_texts) is synchronous
(SentenceTransformer is sync) — this is fine inside asyncio.run().
"""
from __future__ import annotations
import logging
import uuid
from typing import Any
import boto3
from shared.config import settings
from shared.db import async_session_factory, engine
from shared.rls import configure_rls_hook, current_tenant_id
from orchestrator.memory.embedder import embed_texts
from orchestrator.tools.extractors import extract_text
logger = logging.getLogger(__name__)
# Default chunking parameters
_DEFAULT_CHUNK_SIZE = 500
_DEFAULT_OVERLAP = 50
def _get_minio_client() -> Any:
"""Create a boto3 S3 client pointed at MinIO."""
return boto3.client(
"s3",
endpoint_url=settings.minio_endpoint,
aws_access_key_id=settings.minio_access_key,
aws_secret_access_key=settings.minio_secret_key,
)
def chunk_text(
text: str,
chunk_size: int = _DEFAULT_CHUNK_SIZE,
overlap: int = _DEFAULT_OVERLAP,
) -> list[str]:
"""
Split text into overlapping chunks using a sliding window.
Args:
text: The text to chunk.
chunk_size: Maximum characters per chunk.
overlap: Number of characters to overlap between consecutive chunks.
Returns:
List of non-empty text chunks. Returns empty list for empty/whitespace text.
"""
text = text.strip()
if not text:
return []
if len(text) <= chunk_size:
return [text]
chunks: list[str] = []
start = 0
step = chunk_size - overlap
while start < len(text):
end = start + chunk_size
chunk = text[start:end].strip()
if chunk:
chunks.append(chunk)
if end >= len(text):
break
start += step
return chunks
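With the defaults (500-char windows, 50-char overlap) each chunk starts 450 characters after the previous one. Smaller numbers make the arithmetic visible — this reproduces the same sliding-window logic:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Same sliding-window logic as the pipeline's chunk_text
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        start += step
    return chunks

# 30 chars, 10-char window, 2-char overlap → windows start at 0, 8, 16, 24
chunks = chunk_text("abcdefghij" * 3, chunk_size=10, overlap=2)
```

Each window re-covers the last `overlap` characters of the previous one, so text cut at a chunk boundary still appears whole in at least one chunk.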
async def ingest_document_pipeline(document_id: str, tenant_id: str) -> None:
"""
Run the full document ingestion pipeline for a KB document.
Steps:
1. Load the KnowledgeBaseDocument from the database
2. Fetch content (MinIO file OR URL scrape OR YouTube transcript)
3. Extract plain text
4. Chunk text
5. Embed chunks
6. Store kb_chunks rows in the database
7. Mark document as 'ready'
On any error: set status='error' with error_message.
Args:
document_id: UUID string of the KnowledgeBaseDocument to process.
tenant_id: UUID string of the tenant (for RLS context).
"""
from sqlalchemy import select, text as sa_text
from shared.models.kb import KnowledgeBaseDocument
tenant_uuid = uuid.UUID(tenant_id)
doc_uuid = uuid.UUID(document_id)
configure_rls_hook(engine)
token = current_tenant_id.set(tenant_uuid)
try:
async with async_session_factory() as session:
result = await session.execute(
select(KnowledgeBaseDocument).where(
KnowledgeBaseDocument.id == doc_uuid
)
)
doc = result.scalar_one_or_none()
if doc is None:
logger.warning(
"ingest_document_pipeline: document %s not found, skipping",
document_id,
)
return
filename = doc.filename
source_url = doc.source_url
# ------------------------------------------------------------------
# Step 2: Fetch content
# ------------------------------------------------------------------
try:
file_bytes: bytes | None = None
extracted_text: str
if filename:
# Download from MinIO
bucket = settings.minio_kb_bucket
key = f"{tenant_id}/{document_id}/{filename}"
minio = _get_minio_client()
response = minio.get_object(Bucket=bucket, Key=key)
file_bytes = response.read()
extracted_text = extract_text(filename, file_bytes)
elif source_url:
extracted_text = await _fetch_url_content(source_url)
else:
raise ValueError("Document has neither filename nor source_url")
# ------------------------------------------------------------------
# Step 3-4: Chunk text
# ------------------------------------------------------------------
chunks = chunk_text(extracted_text)
if not chunks:
raise ValueError("No text content could be extracted from this document")
# ------------------------------------------------------------------
# Step 5: Embed chunks
# ------------------------------------------------------------------
embeddings = embed_texts(chunks)
# ------------------------------------------------------------------
# Step 6: Insert kb_chunks
# ------------------------------------------------------------------
# Delete any existing chunks for this document first
await session.execute(
sa_text("DELETE FROM kb_chunks WHERE document_id = :doc_id"),
{"doc_id": str(doc_uuid)},
)
for idx, (chunk_content, embedding) in enumerate(zip(chunks, embeddings)):
embedding_str = "[" + ",".join(str(x) for x in embedding) + "]"
await session.execute(
sa_text("""
INSERT INTO kb_chunks
(tenant_id, document_id, content, chunk_index, embedding)
VALUES
(:tenant_id, :document_id, :content, :chunk_index,
CAST(:embedding AS vector))
"""),
{
"tenant_id": str(tenant_uuid),
"document_id": str(doc_uuid),
"content": chunk_content,
"chunk_index": idx,
"embedding": embedding_str,
},
)
# ------------------------------------------------------------------
# Step 7: Mark document as ready
# ------------------------------------------------------------------
doc.status = "ready"
doc.chunk_count = len(chunks)
doc.error_message = None
await session.commit()
logger.info(
"ingest_document_pipeline: %s ingested %d chunks for document %s",
tenant_id,
len(chunks),
document_id,
)
except Exception as exc:
logger.exception(
"ingest_document_pipeline: error processing document %s: %s",
document_id,
exc,
)
# Try to mark document as error
try:
doc.status = "error"
doc.error_message = str(exc)
await session.commit()
except Exception:
logger.exception(
"ingest_document_pipeline: failed to mark document %s as error",
document_id,
)
finally:
current_tenant_id.reset(token)
async def _fetch_url_content(url: str) -> str:
"""
Fetch text content from a URL.
Supports:
- YouTube URLs (via youtube-transcript-api)
- Generic web pages (via firecrawl-py, graceful fallback if key not set)
"""
if _is_youtube_url(url):
return await _fetch_youtube_transcript(url)
else:
return await _scrape_web_url(url)
def _is_youtube_url(url: str) -> bool:
"""Return True if the URL is a YouTube video."""
return "youtube.com" in url or "youtu.be" in url
async def _fetch_youtube_transcript(url: str) -> str:
"""Fetch YouTube video transcript using youtube-transcript-api."""
try:
from youtube_transcript_api import YouTubeTranscriptApi
# Extract video ID from URL
video_id = _extract_youtube_id(url)
if not video_id:
raise ValueError(f"Could not extract YouTube video ID from URL: {url}")
transcript = YouTubeTranscriptApi.get_transcript(video_id)
return " ".join(entry["text"] for entry in transcript)
except Exception as exc:
raise ValueError(f"Failed to fetch YouTube transcript: {exc}") from exc
def _extract_youtube_id(url: str) -> str | None:
"""Extract YouTube video ID from various URL formats."""
import re
patterns = [
r"youtube\.com/watch\?v=([a-zA-Z0-9_-]+)",
r"youtu\.be/([a-zA-Z0-9_-]+)",
r"youtube\.com/embed/([a-zA-Z0-9_-]+)",
]
for pattern in patterns:
match = re.search(pattern, url)
if match:
return match.group(1)
return None
async def _scrape_web_url(url: str) -> str:
"""Scrape a web URL to markdown using firecrawl-py."""
if not settings.firecrawl_api_key:
# Fallback: try simple httpx fetch
return await _simple_fetch(url)
try:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key=settings.firecrawl_api_key)
result = app.scrape_url(url, params={"formats": ["markdown"]})
if isinstance(result, dict):
return result.get("markdown", result.get("content", str(result)))
return str(result)
except Exception as exc:
logger.warning("Firecrawl failed for %s: %s — falling back to simple fetch", url, exc)
return await _simple_fetch(url)
async def _simple_fetch(url: str) -> str:
"""Simple httpx GET fetch as fallback for URL scraping."""
import httpx
try:
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.get(url, follow_redirects=True)
response.raise_for_status()
return response.text
except Exception as exc:
raise ValueError(f"Failed to fetch URL {url}: {exc}") from exc
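Step 6 of the pipeline serializes each embedding as a pgvector text literal and casts it server-side (`CAST(:embedding AS vector)`); the literal format is just bracketed, comma-joined floats:

```python
def to_pgvector_literal(embedding: list[float]) -> str:
    # Matches the pipeline's "[" + ",".join(str(x) for x in embedding) + "]"
    return "[" + ",".join(str(x) for x in embedding) + "]"

print(to_pgvector_literal([0.1, -0.25, 1.0]))  # [0.1,-0.25,1.0]
```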


@@ -142,24 +142,52 @@ BUILTIN_TOOLS: dict[str, ToolDefinition] = {
"calendar_lookup": ToolDefinition(
name="calendar_lookup",
description=(
"Look up, check availability, or create calendar events using Google Calendar. "
"Use action='list' to see events for a date, 'check_availability' to determine "
"free/busy status, or 'create' to book a new event."
),
parameters={
"type": "object",
"properties": {
"date": {
"type": "string",
"description": "The date in YYYY-MM-DD format.",
},
"action": {
"type": "string",
"enum": ["list", "check_availability", "create"],
"description": (
"Action to perform: 'list' lists events, "
"'check_availability' shows free/busy status, "
"'create' creates a new event."
),
},
"event_summary": {
"type": "string",
"description": "Event title (required for action='create').",
},
"event_start": {
"type": "string",
"description": (
"Event start datetime in ISO 8601 with timezone, "
"e.g. '2026-03-26T10:00:00+00:00' (required for action='create')."
),
},
"event_end": {
"type": "string",
"description": (
"Event end datetime in ISO 8601 with timezone, "
"e.g. '2026-03-26T11:00:00+00:00' (required for action='create')."
),
},
"calendar_id": {
"type": "string",
"description": "Google Calendar ID. Defaults to 'primary'.",
},
},
"required": ["date", "action"],
},
requires_confirmation=False, # list/check are read-only; create is confirmed by user intent
handler=_calendar_lookup_handler,
),
}


@@ -14,6 +14,15 @@ dependencies = [
"httpx>=0.28.0",
"sentence-transformers>=3.0.0",
"jsonschema>=4.26.0",
"pypdf>=6.9.2",
"python-docx>=1.2.0",
"python-pptx>=1.0.2",
"openpyxl>=3.1.5",
"pandas>=3.0.1",
"firecrawl-py>=4.21.0",
"youtube-transcript-api>=1.2.4",
"google-api-python-client>=2.193.0",
"google-auth-oauthlib>=1.3.0",
]
[tool.uv.sources]

Submodule packages/portal updated: 5c2e42a851...c525c0271b


@@ -5,9 +5,11 @@ Import and mount these routers in service main.py files.
"""
from shared.api.billing import billing_router, webhook_router
from shared.api.calendar_auth import calendar_auth_router
from shared.api.channels import channels_router
from shared.api.chat import chat_router
from shared.api.invitations import invitations_router
from shared.api.kb import kb_router
from shared.api.llm_keys import llm_keys_router
from shared.api.portal import portal_router
from shared.api.push import push_router
@@ -25,4 +27,6 @@ __all__ = [
"templates_router",
"chat_router",
"push_router",
"kb_router",
"calendar_auth_router",
]


@@ -0,0 +1,310 @@
"""
Google Calendar OAuth API endpoints — per-tenant OAuth install + callback.
Endpoints:
GET /api/portal/calendar/install?tenant_id={id}
→ generates HMAC-signed state, returns Google OAuth URL
GET /api/portal/calendar/callback?code={code}&state={state}
→ verifies state, exchanges code for tokens, stores encrypted in channel_connections
GET /api/portal/calendar/{tenant_id}/status
→ returns {"connected": bool}
OAuth state uses the same HMAC-SHA256 signed state pattern as Slack OAuth
(see shared.api.channels.generate_oauth_state / verify_oauth_state).
Token storage:
Token JSON is encrypted with the platform KeyEncryptionService (Fernet) and
stored in channel_connections with channel_type='google_calendar'.
workspace_id is set to str(tenant_id) — Google Calendar is per-tenant,
not per-workspace, so the tenant UUID serves as the workspace identifier.
Token auto-refresh:
The calendar_lookup tool handles refresh via google-auth library.
This module is responsible for initial OAuth install and status checks only.
"""
from __future__ import annotations
import json
import uuid
from typing import Any
import httpx
from fastapi import APIRouter, Depends, HTTPException, Query, status
from fastapi.responses import RedirectResponse
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from shared.api.channels import generate_oauth_state, verify_oauth_state
from shared.api.rbac import PortalCaller, require_tenant_admin, require_tenant_member
from shared.config import settings
from shared.crypto import KeyEncryptionService
from shared.db import get_session
from shared.models.tenant import ChannelConnection, ChannelTypeEnum
calendar_auth_router = APIRouter(prefix="/api/portal/calendar", tags=["calendar"])
# Google Calendar OAuth scopes — full read+write (locked decision: operators need CRUD)
_CALENDAR_SCOPE = "https://www.googleapis.com/auth/calendar"
# Google OAuth endpoints
_GOOGLE_AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"
_GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"
# ---------------------------------------------------------------------------
# Helper: build OAuth URL
# ---------------------------------------------------------------------------
def build_calendar_oauth_url(tenant_id: str, secret: str) -> str:
"""
Build a Google OAuth 2.0 authorization URL for Calendar access.
Args:
tenant_id: Tenant UUID as string — embedded in the HMAC-signed state.
secret: HMAC secret for state generation (oauth_state_secret).
Returns:
Full Google OAuth authorization URL ready to redirect the user to.
"""
from urllib.parse import urlencode  # stdlib; redirect_uri and scope must be URL-encoded
state = generate_oauth_state(tenant_id=tenant_id, secret=secret)
redirect_uri = f"{settings.portal_url}/api/portal/calendar/callback"
params = urlencode({
"client_id": settings.google_client_id,
"redirect_uri": redirect_uri,
"response_type": "code",
"scope": _CALENDAR_SCOPE,
"access_type": "offline",
"prompt": "consent",
"state": state,
})
return f"{_GOOGLE_AUTH_URL}?{params}"
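The state parameter is an HMAC-SHA256 signed blob, the same pattern the Slack flow uses. The real helpers live in `shared.api.channels` and their exact encoding may differ; this is a minimal sketch of the sign/verify pattern only:

```python
import hashlib
import hmac

def sign_state(tenant_id: str, secret: str) -> str:
    # state = payload + "." + hex HMAC-SHA256 over the payload
    sig = hmac.new(secret.encode(), tenant_id.encode(), hashlib.sha256).hexdigest()
    return f"{tenant_id}.{sig}"

def verify_state(state: str, secret: str) -> str:
    payload, _, sig = state.rpartition(".")
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Invalid OAuth state signature")
    return payload  # the recovered tenant_id

state = sign_state("11111111-2222-3333-4444-555555555555", "s3cret")
```

Because the state is signed rather than stored, the callback can recover and trust the tenant_id without any server-side session.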
def _get_encryption_service() -> KeyEncryptionService:
"""Return the platform-level KeyEncryptionService."""
if not settings.platform_encryption_key:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="PLATFORM_ENCRYPTION_KEY not configured",
)
return KeyEncryptionService(
primary_key=settings.platform_encryption_key,
previous_key=settings.platform_encryption_key_previous,
)
# ---------------------------------------------------------------------------
# Endpoint: GET /install
# ---------------------------------------------------------------------------
@calendar_auth_router.get("/install")
async def calendar_install(
tenant_id: uuid.UUID = Query(...),
caller: PortalCaller = Depends(require_tenant_admin),
) -> dict[str, str]:
"""
Generate the Google Calendar OAuth authorization URL.
Returns a JSON object with a 'url' key. The operator's browser should
be redirected to this URL to begin the Google OAuth consent flow.
"""
if not settings.oauth_state_secret:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="OAUTH_STATE_SECRET not configured",
)
if not settings.google_client_id:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="GOOGLE_CLIENT_ID not configured",
)
url = build_calendar_oauth_url(
tenant_id=str(tenant_id),
secret=settings.oauth_state_secret,
)
return {"url": url}
# ---------------------------------------------------------------------------
# Callback handler (shared between endpoint and tests)
# ---------------------------------------------------------------------------
async def handle_calendar_callback(
code: str,
state: str,
session: AsyncSession,
) -> str:
"""
Process the Google OAuth callback: verify state, exchange code, store token.
Args:
code: Authorization code from Google.
state: HMAC-signed state parameter.
session: Async DB session for storing the ChannelConnection.
Returns:
Redirect URL string (portal /settings?calendar=connected).
Raises:
HTTPException 400 if state is invalid or token exchange fails.
"""
if not settings.oauth_state_secret:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="OAUTH_STATE_SECRET not configured",
)
# Verify HMAC state to recover tenant_id
try:
tenant_id_str = verify_oauth_state(state=state, secret=settings.oauth_state_secret)
tenant_id = uuid.UUID(tenant_id_str)
except Exception as exc:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Invalid OAuth state: {exc}",
) from exc
redirect_uri = f"{settings.portal_url}/api/portal/calendar/callback"
# Exchange authorization code for tokens
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.post(
_GOOGLE_TOKEN_URL,
data={
"code": code,
"client_id": settings.google_client_id,
"client_secret": settings.google_client_secret,
"redirect_uri": redirect_uri,
"grant_type": "authorization_code",
},
)
if response.status_code != 200:
raise HTTPException(
status_code=status.HTTP_502_BAD_GATEWAY,
detail="Google token exchange failed",
)
token_data: dict[str, Any] = response.json()
if "error" in token_data:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Google OAuth error: {token_data.get('error_description', token_data['error'])}",
)
# Build token JSON for storage (google-auth credentials format)
token_json = json.dumps({
"token": token_data.get("access_token", ""),
"refresh_token": token_data.get("refresh_token", ""),
"token_uri": _GOOGLE_TOKEN_URL,
"client_id": settings.google_client_id,
"client_secret": settings.google_client_secret,
"scopes": [_CALENDAR_SCOPE],
})
# Encrypt before storage
enc_svc = _get_encryption_service()
encrypted_token = enc_svc.encrypt(token_json)
# Upsert ChannelConnection for google_calendar
existing = await session.execute(
select(ChannelConnection).where(
ChannelConnection.tenant_id == tenant_id,
ChannelConnection.channel_type == ChannelTypeEnum.GOOGLE_CALENDAR,
)
)
conn = existing.scalar_one_or_none()
if conn is None:
conn = ChannelConnection(
tenant_id=tenant_id,
channel_type=ChannelTypeEnum.GOOGLE_CALENDAR,
workspace_id=str(tenant_id), # tenant UUID as workspace_id
config={"token": encrypted_token},
)
session.add(conn)
else:
conn.config = {"token": encrypted_token}
await session.commit()
return f"{settings.portal_url}/settings?calendar=connected"
# ---------------------------------------------------------------------------
# Endpoint: GET /callback
# ---------------------------------------------------------------------------
@calendar_auth_router.get("/callback")
async def calendar_callback(
code: str = Query(...),
state: str = Query(...),
session: AsyncSession = Depends(get_session),
) -> RedirectResponse:
"""
Handle the Google Calendar OAuth callback from Google.
No auth guard — this endpoint receives an external redirect from Google
(no session cookie available during OAuth flow).
Verifies HMAC state, exchanges code for tokens, stores encrypted token,
then redirects to portal /settings?calendar=connected.
"""
redirect_url = await handle_calendar_callback(code=code, state=state, session=session)
return RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
# ---------------------------------------------------------------------------
# Status check helper (for tests)
# ---------------------------------------------------------------------------
async def get_calendar_status(
tenant_id: uuid.UUID,
session: AsyncSession,
) -> dict[str, bool]:
"""
Check if a Google Calendar connection exists for a tenant.
Args:
tenant_id: Tenant UUID to check.
session: Async DB session.
Returns:
{"connected": True} if a ChannelConnection exists, {"connected": False} otherwise.
"""
result = await session.execute(
select(ChannelConnection).where(
ChannelConnection.tenant_id == tenant_id,
ChannelConnection.channel_type == ChannelTypeEnum.GOOGLE_CALENDAR,
)
)
conn = result.scalar_one_or_none()
return {"connected": conn is not None}
# ---------------------------------------------------------------------------
# Endpoint: GET /{tenant_id}/status
# ---------------------------------------------------------------------------
@calendar_auth_router.get("/{tenant_id}/status")
async def calendar_status(
tenant_id: uuid.UUID,
caller: PortalCaller = Depends(require_tenant_member),
session: AsyncSession = Depends(get_session),
) -> dict[str, bool]:
"""
Check if Google Calendar is connected for a tenant.
Returns {"connected": true} if the tenant has authorized Google Calendar,
{"connected": false} otherwise.
"""
return await get_calendar_status(tenant_id=tenant_id, session=session)
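`generate_oauth_state` and `verify_oauth_state` are imported from `shared.api.channels` and are not part of this diff. A minimal sketch of the usual HMAC-signed-state pattern they presumably follow (function names and token format here are assumptions, not the project's actual implementation):

```python
import base64
import hashlib
import hmac


def generate_state(tenant_id: str, secret: str) -> str:
    # Encode the payload, then sign it: "<b64 payload>.<hex HMAC-SHA256>"
    payload = base64.urlsafe_b64encode(tenant_id.encode()).decode()
    sig = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_state(state: str, secret: str) -> str:
    # Recompute the signature and compare in constant time.
    payload, _, sig = state.partition(".")
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid OAuth state signature")
    return base64.urlsafe_b64decode(payload.encode()).decode()
```

A tampered state then fails verification with `ValueError`, which `handle_calendar_callback` maps to an HTTP 400.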

View File

@@ -0,0 +1,376 @@
"""
Knowledge Base management API endpoints for the Konstruct portal.
Endpoints:
POST /api/portal/kb/{tenant_id}/documents — upload a file
POST /api/portal/kb/{tenant_id}/documents/url — ingest from URL/YouTube
GET /api/portal/kb/{tenant_id}/documents — list documents
DELETE /api/portal/kb/{tenant_id}/documents/{doc_id} — delete document
POST /api/portal/kb/{tenant_id}/documents/{doc_id}/reindex — re-run ingestion
Upload flow:
1. Validate file extension against supported list
2. Upload raw bytes to MinIO kb-documents bucket (key: {tenant_id}/{doc_id}/{filename})
3. Insert KnowledgeBaseDocument row (status='processing')
4. Dispatch ingest_document.delay(doc_id, tenant_id) Celery task
5. Return 201 with {id, filename, status}
The Celery task handles text extraction, chunking, and embedding asynchronously.
Status is updated to 'ready' or 'error' when ingestion completes.
"""
from __future__ import annotations
import logging
import os
import uuid
from datetime import datetime
from typing import Annotated, Any
import boto3
from botocore.exceptions import ClientError
from fastapi import APIRouter, Depends, File, HTTPException, UploadFile, status
from pydantic import BaseModel, HttpUrl
from sqlalchemy import delete, select
from sqlalchemy.ext.asyncio import AsyncSession
from shared.api.rbac import PortalCaller, require_tenant_admin, require_tenant_member
from shared.config import settings
from shared.db import get_session
from shared.models.kb import KBChunk, KnowledgeBaseDocument
logger = logging.getLogger(__name__)
kb_router = APIRouter(prefix="/api/portal/kb", tags=["knowledge-base"])
# Supported file extensions for upload
_SUPPORTED_EXTENSIONS = {
".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".csv", ".txt", ".md"
}
# ---------------------------------------------------------------------------
# Lazy Celery task import — avoids circular dependency at module load time
# ---------------------------------------------------------------------------
def _get_ingest_task() -> Any:
"""Return the ingest_document Celery task (lazy import to avoid circular deps)."""
from orchestrator.tasks import ingest_document # noqa: PLC0415
return ingest_document
# Convenience alias — tests can patch 'shared.api.kb.ingest_document'
def ingest_document(document_id: str, tenant_id: str) -> None: # type: ignore[empty-body]
"""Placeholder — replaced at call site via _get_ingest_task()."""
# ---------------------------------------------------------------------------
# MinIO client helper
# ---------------------------------------------------------------------------
def _get_minio_client() -> Any:
"""Create a boto3 S3 client pointed at MinIO."""
return boto3.client(
"s3",
endpoint_url=settings.minio_endpoint,
aws_access_key_id=settings.minio_access_key,
aws_secret_access_key=settings.minio_secret_key,
)
def _ensure_bucket(client: Any, bucket: str) -> None:
"""Create bucket if it doesn't exist."""
try:
client.head_bucket(Bucket=bucket)
except ClientError:
try:
client.create_bucket(Bucket=bucket)
except ClientError as exc:
logger.warning("Could not create bucket %s: %s", bucket, exc)
# ---------------------------------------------------------------------------
# Pydantic schemas
# ---------------------------------------------------------------------------
class DocumentResponse(BaseModel):
"""Response schema for a knowledge base document."""
id: str
filename: str | None
source_url: str | None
content_type: str | None
status: str
chunk_count: int | None
created_at: datetime
class UrlIngestRequest(BaseModel):
"""Request body for URL/YouTube ingestion."""
url: str
source_type: str = "web" # "web" | "youtube"
# ---------------------------------------------------------------------------
# POST /{tenant_id}/documents — file upload
# ---------------------------------------------------------------------------
@kb_router.post(
"/{tenant_id}/documents",
status_code=status.HTTP_201_CREATED,
response_model=DocumentResponse,
summary="Upload a document to the knowledge base",
)
async def upload_document(
tenant_id: uuid.UUID,
file: Annotated[UploadFile, File(description="Document file to ingest")],
caller: Annotated[PortalCaller, Depends(require_tenant_admin)],
session: Annotated[AsyncSession, Depends(get_session)],
) -> DocumentResponse:
"""
Upload a document and dispatch the ingestion pipeline.
Supported formats: PDF, DOCX, PPTX, XLSX, XLS, CSV, TXT, MD
"""
filename = file.filename or "upload"
_, ext = os.path.splitext(filename.lower())
if ext not in _SUPPORTED_EXTENSIONS:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Unsupported file type '{ext}'. Supported: {', '.join(sorted(_SUPPORTED_EXTENSIONS))}",
)
file_bytes = await file.read()
content_type = file.content_type or "application/octet-stream"
# Insert document row first to get the ID
doc = KnowledgeBaseDocument(
tenant_id=tenant_id,
agent_id=None,
filename=filename,
content_type=content_type,
status="processing",
)
session.add(doc)
await session.flush() # Populate doc.id
doc_id = doc.id
# Upload to MinIO
bucket = settings.minio_kb_bucket
key = f"{tenant_id}/{doc_id}/{filename}"
try:
minio = _get_minio_client()
_ensure_bucket(minio, bucket)
minio.put_object(
Bucket=bucket,
Key=key,
Body=file_bytes,
ContentLength=len(file_bytes),
ContentType=content_type,
)
except Exception as exc:
logger.warning("MinIO upload failed for %s: %s", key, exc)
# Continue — ingestion task will try to re-fetch or fail gracefully
await session.commit()
# Dispatch async ingestion task
try:
task = _get_ingest_task()
task.delay(str(doc_id), str(tenant_id))
except Exception as exc:
logger.exception("Failed to dispatch ingest_document task for %s: %s", doc_id, exc)
return DocumentResponse(
id=str(doc_id),
filename=filename,
source_url=None,
content_type=content_type,
status="processing",
chunk_count=None,
created_at=doc.created_at or datetime.utcnow(),
)
# ---------------------------------------------------------------------------
# POST /{tenant_id}/documents/url — URL / YouTube ingest
# ---------------------------------------------------------------------------
@kb_router.post(
"/{tenant_id}/documents/url",
status_code=status.HTTP_201_CREATED,
response_model=DocumentResponse,
summary="Ingest a URL or YouTube video transcript into the knowledge base",
)
async def ingest_url(
tenant_id: uuid.UUID,
body: UrlIngestRequest,
caller: Annotated[PortalCaller, Depends(require_tenant_admin)],
session: Annotated[AsyncSession, Depends(get_session)],
) -> DocumentResponse:
"""Ingest content from a URL (web page or YouTube video) into the KB."""
doc = KnowledgeBaseDocument(
tenant_id=tenant_id,
agent_id=None,
source_url=body.url,
content_type=None,
status="processing",
)
session.add(doc)
await session.flush()
doc_id = doc.id
await session.commit()
try:
task = _get_ingest_task()
task.delay(str(doc_id), str(tenant_id))
except Exception as exc:
logger.exception("Failed to dispatch ingest_document task for %s: %s", doc_id, exc)
return DocumentResponse(
id=str(doc_id),
filename=None,
source_url=body.url,
content_type=None,
status="processing",
chunk_count=None,
created_at=doc.created_at or datetime.utcnow(),
)
# ---------------------------------------------------------------------------
# GET /{tenant_id}/documents — list
# ---------------------------------------------------------------------------
@kb_router.get(
"/{tenant_id}/documents",
response_model=list[DocumentResponse],
summary="List knowledge base documents for a tenant",
)
async def list_documents(
tenant_id: uuid.UUID,
caller: Annotated[PortalCaller, Depends(require_tenant_member)],
session: Annotated[AsyncSession, Depends(get_session)],
) -> list[DocumentResponse]:
"""List all KB documents for the given tenant with status and chunk count."""
result = await session.execute(
select(KnowledgeBaseDocument)
.where(KnowledgeBaseDocument.tenant_id == tenant_id)
.order_by(KnowledgeBaseDocument.created_at.desc())
)
docs = result.scalars().all()
return [
DocumentResponse(
id=str(doc.id),
filename=doc.filename,
source_url=doc.source_url,
content_type=doc.content_type,
status=doc.status,
chunk_count=doc.chunk_count,
created_at=doc.created_at,
)
for doc in docs
]
# ---------------------------------------------------------------------------
# DELETE /{tenant_id}/documents/{document_id} — delete
# ---------------------------------------------------------------------------
@kb_router.delete(
"/{tenant_id}/documents/{document_id}",
status_code=status.HTTP_204_NO_CONTENT,
summary="Delete a knowledge base document and its chunks",
)
async def delete_document(
tenant_id: uuid.UUID,
document_id: uuid.UUID,
caller: Annotated[PortalCaller, Depends(require_tenant_admin)],
session: Annotated[AsyncSession, Depends(get_session)],
) -> None:
"""Delete a document (CASCADE removes all kb_chunks rows automatically)."""
result = await session.execute(
select(KnowledgeBaseDocument).where(
KnowledgeBaseDocument.id == document_id,
KnowledgeBaseDocument.tenant_id == tenant_id,
)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Document not found")
# Remove from MinIO if it was a file upload
if doc.filename:
bucket = settings.minio_kb_bucket
key = f"{tenant_id}/{document_id}/{doc.filename}"
try:
minio = _get_minio_client()
minio.delete_object(Bucket=bucket, Key=key)
except Exception as exc:
logger.warning("MinIO delete failed for %s: %s", key, exc)
await session.delete(doc)
await session.commit()
# ---------------------------------------------------------------------------
# POST /{tenant_id}/documents/{document_id}/reindex — re-run ingestion
# ---------------------------------------------------------------------------
@kb_router.post(
"/{tenant_id}/documents/{document_id}/reindex",
status_code=status.HTTP_202_ACCEPTED,
response_model=DocumentResponse,
summary="Delete existing chunks and re-dispatch the ingestion pipeline",
)
async def reindex_document(
tenant_id: uuid.UUID,
document_id: uuid.UUID,
caller: Annotated[PortalCaller, Depends(require_tenant_admin)],
session: Annotated[AsyncSession, Depends(get_session)],
) -> DocumentResponse:
"""Re-run the ingestion pipeline for an existing document."""
result = await session.execute(
select(KnowledgeBaseDocument).where(
KnowledgeBaseDocument.id == document_id,
KnowledgeBaseDocument.tenant_id == tenant_id,
)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Document not found")
# Delete existing chunks so they get re-created
await session.execute(
delete(KBChunk).where(KBChunk.document_id == document_id)
)
# Reset status to processing
doc.status = "processing"
doc.error_message = None
doc.chunk_count = None
await session.commit()
try:
task = _get_ingest_task()
task.delay(str(document_id), str(tenant_id))
except Exception as exc:
logger.exception("Failed to dispatch reindex task for %s: %s", document_id, exc)
return DocumentResponse(
id=str(doc.id),
filename=doc.filename,
source_url=doc.source_url,
content_type=doc.content_type,
status="processing",
chunk_count=None,
created_at=doc.created_at,
)

View File

@@ -96,6 +96,10 @@ class Settings(BaseSettings):
default="konstruct-media",
description="MinIO bucket name for media attachments",
)
minio_kb_bucket: str = Field(
default="kb-documents",
description="MinIO bucket name for knowledge base documents",
)
# -------------------------------------------------------------------------
# LLM Providers
@@ -213,6 +217,30 @@ class Settings(BaseSettings):
description="HMAC secret for signing OAuth state parameters (CSRF protection)",
)
# -------------------------------------------------------------------------
# Web Search / Scraping
# -------------------------------------------------------------------------
brave_api_key: str = Field(
default="",
description="Brave Search API key for the web_search built-in tool",
)
firecrawl_api_key: str = Field(
default="",
description="Firecrawl API key for URL scraping in KB ingestion pipeline",
)
# -------------------------------------------------------------------------
# Google OAuth (Calendar integration)
# -------------------------------------------------------------------------
google_client_id: str = Field(
default="",
description="Google OAuth 2.0 Client ID for Calendar integration",
)
google_client_secret: str = Field(
default="",
description="Google OAuth 2.0 Client Secret for Calendar integration",
)
# -------------------------------------------------------------------------
# Application
# -------------------------------------------------------------------------

View File

@@ -20,6 +20,11 @@ from sqlalchemy import DateTime, ForeignKey, Integer, Text, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
# Valid status values for KnowledgeBaseDocument.status
KB_STATUS_PROCESSING = "processing"
KB_STATUS_READY = "ready"
KB_STATUS_ERROR = "error"
class KBBase(DeclarativeBase):
"""Separate declarative base for KB models."""
@@ -47,11 +52,27 @@ class KnowledgeBaseDocument(KBBase):
nullable=False,
index=True,
)
-agent_id: Mapped[uuid.UUID] = mapped_column(
+agent_id: Mapped[uuid.UUID | None] = mapped_column(
UUID(as_uuid=True),
-nullable=False,
+nullable=True,
index=True,
-comment="Agent this document is associated with",
+comment="Agent this document is associated with (nullable — KB is per-tenant)",
)
status: Mapped[str] = mapped_column(
Text,
nullable=False,
server_default=KB_STATUS_PROCESSING,
comment="Ingestion status: processing | ready | error",
)
error_message: Mapped[str | None] = mapped_column(
Text,
nullable=True,
comment="Error details when status='error'",
)
chunk_count: Mapped[int | None] = mapped_column(
Integer,
nullable=True,
comment="Number of chunks created after successful ingestion",
)
filename: Mapped[str | None] = mapped_column(
Text,

View File

@@ -37,6 +37,8 @@ class ChannelTypeEnum(str, enum.Enum):
TEAMS = "teams"
TELEGRAM = "telegram"
SIGNAL = "signal"
WEB = "web"
GOOGLE_CALENDAR = "google_calendar"
class Tenant(Base):

View File

@@ -0,0 +1,205 @@
"""
Unit tests for Google Calendar OAuth endpoints.
Tests:
- /install endpoint returns OAuth URL with HMAC state
- /callback verifies state, stores encrypted token in DB
- /status returns connected=True when token exists, False otherwise
- HMAC state generation and verification work correctly
- Missing credentials configuration handled gracefully
"""
from __future__ import annotations
import base64
import json
import uuid
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
_SECRET = "test-oauth-state-secret"
_TENANT_ID = str(uuid.uuid4())
# ---------------------------------------------------------------------------
# HMAC state helper tests (reuse from channels.py)
# ---------------------------------------------------------------------------
def test_calendar_install_builds_oauth_url():
"""
calendar_install endpoint returns a dict with a 'url' key pointing at
accounts.google.com/o/oauth2/v2/auth.
"""
from shared.api.calendar_auth import build_calendar_oauth_url
url = build_calendar_oauth_url(tenant_id=_TENANT_ID, secret=_SECRET)
assert "accounts.google.com/o/oauth2/v2/auth" in url
assert "client_id=" in url
assert "scope=" in url
assert "state=" in url
assert "access_type=offline" in url
assert "prompt=consent" in url
def test_calendar_oauth_url_contains_signed_state():
"""State parameter in the OAuth URL encodes the tenant_id."""
from shared.api.calendar_auth import build_calendar_oauth_url
from shared.api.channels import verify_oauth_state
url = build_calendar_oauth_url(tenant_id=_TENANT_ID, secret=_SECRET)
# Extract state from URL
import urllib.parse
parsed = urllib.parse.urlparse(url)
params = urllib.parse.parse_qs(parsed.query)
state = params["state"][0]
# Verify the state recovers the tenant_id
recovered = verify_oauth_state(state=state, secret=_SECRET)
assert recovered == _TENANT_ID
def test_calendar_oauth_url_uses_calendar_scope():
"""OAuth URL requests full Google Calendar scope."""
from shared.api.calendar_auth import build_calendar_oauth_url
url = build_calendar_oauth_url(tenant_id=_TENANT_ID, secret=_SECRET)
assert "googleapis.com/auth/calendar" in url
# ---------------------------------------------------------------------------
# Callback token exchange and storage
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_calendar_callback_stores_encrypted_token():
"""
handle_calendar_callback() exchanges code for tokens, encrypts them,
and upserts a ChannelConnection with channel_type='google_calendar'.
"""
from shared.api.calendar_auth import handle_calendar_callback
mock_session = AsyncMock()
mock_session.execute = AsyncMock()
mock_session.add = MagicMock()
mock_session.commit = AsyncMock()
# Simulate no existing connection (first install)
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = None
mock_session.execute.return_value = mock_result
token_response = {
"access_token": "ya29.test_access_token",
"refresh_token": "1//test_refresh_token",
"token_type": "Bearer",
"expires_in": 3600,
}
with (
patch("shared.api.calendar_auth.httpx.AsyncClient") as mock_client_cls,
patch("shared.api.calendar_auth.KeyEncryptionService") as mock_enc_cls,
patch("shared.api.calendar_auth.settings") as mock_settings,
):
mock_settings.oauth_state_secret = _SECRET
mock_settings.google_client_id = "test-client-id"
mock_settings.google_client_secret = "test-client-secret"
mock_settings.portal_url = "http://localhost:3000"
mock_settings.platform_encryption_key = "test-key"
mock_settings.platform_encryption_key_previous = ""
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = token_response
mock_http = AsyncMock()
mock_http.__aenter__ = AsyncMock(return_value=mock_http)
mock_http.__aexit__ = AsyncMock(return_value=None)
mock_http.post = AsyncMock(return_value=mock_response)
mock_client_cls.return_value = mock_http
mock_enc = MagicMock()
mock_enc.encrypt.return_value = "encrypted_token_data"
mock_enc_cls.return_value = mock_enc
# Generate a valid state
from shared.api.channels import generate_oauth_state
state = generate_oauth_state(tenant_id=_TENANT_ID, secret=_SECRET)
redirect_url = await handle_calendar_callback(
code="test_auth_code",
state=state,
session=mock_session,
)
# Should redirect to portal settings
assert "settings" in redirect_url or "calendar" in redirect_url
# Session.add should have been called (new ChannelConnection)
mock_session.add.assert_called_once()
# Encryption was called
mock_enc.encrypt.assert_called_once()
# The ChannelConnection passed to add should have google_calendar type
conn = mock_session.add.call_args[0][0]
assert "google_calendar" in str(conn.channel_type).lower()
@pytest.mark.asyncio
async def test_calendar_callback_invalid_state_raises():
"""handle_calendar_callback raises HTTPException for tampered state."""
from fastapi import HTTPException
from shared.api.calendar_auth import handle_calendar_callback
mock_session = AsyncMock()
with patch("shared.api.calendar_auth.settings") as mock_settings:
mock_settings.oauth_state_secret = _SECRET
with pytest.raises(HTTPException) as exc_info:
await handle_calendar_callback(
code="some_code",
state="TAMPERED.INVALID",
session=mock_session,
)
assert exc_info.value.status_code == 400
# ---------------------------------------------------------------------------
# Status endpoint
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_calendar_status_connected():
"""get_calendar_status returns connected=True when ChannelConnection exists."""
from shared.api.calendar_auth import get_calendar_status
mock_session = AsyncMock()
mock_result = MagicMock()
# Simulate existing connection
mock_conn = MagicMock()
mock_result.scalar_one_or_none.return_value = mock_conn
mock_session.execute.return_value = mock_result
tenant_id = uuid.uuid4()
status = await get_calendar_status(tenant_id=tenant_id, session=mock_session)
assert status["connected"] is True
@pytest.mark.asyncio
async def test_calendar_status_not_connected():
"""get_calendar_status returns connected=False when no ChannelConnection exists."""
from shared.api.calendar_auth import get_calendar_status
mock_session = AsyncMock()
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = None
mock_session.execute.return_value = mock_result
tenant_id = uuid.uuid4()
status = await get_calendar_status(tenant_id=tenant_id, session=mock_session)
assert status["connected"] is False

View File

@@ -0,0 +1,423 @@
"""
Unit tests for the per-tenant OAuth calendar_lookup tool.
Tests:
- Returns "not configured" message when no tenant_id provided
- Returns "not connected" message when no ChannelConnection exists for tenant
- action="list" calls Google Calendar API and returns formatted event list
- action="check_availability" returns free/busy summary
- action="create" creates an event and returns confirmation
- Token refresh write-back: updated credentials written to DB
- All responses are natural language strings (no raw JSON)
- API errors return human-readable messages
"""
from __future__ import annotations
import uuid
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_TENANT_ID = str(uuid.uuid4())
_DATE = "2026-03-26"
# Fake encrypted token JSON stored in channel_connections.config
_FAKE_ENCRYPTED_TOKEN = "gAAAAAB..."
# Decrypted token dict (as would come from Google OAuth)
_FAKE_TOKEN_DICT = {
"token": "ya29.test_access_token",
"refresh_token": "1//test_refresh_token",
"token_uri": "https://oauth2.googleapis.com/token",
"client_id": "test-client-id",
"client_secret": "test-client-secret",
"scopes": ["https://www.googleapis.com/auth/calendar"],
}
# Sample Google Calendar events response
_FAKE_EVENTS = {
"items": [
{
"summary": "Team Standup",
"start": {"dateTime": "2026-03-26T09:00:00+00:00"},
"end": {"dateTime": "2026-03-26T09:30:00+00:00"},
},
{
"summary": "Sprint Planning",
"start": {"dateTime": "2026-03-26T14:00:00+00:00"},
"end": {"dateTime": "2026-03-26T15:00:00+00:00"},
},
]
}
# Common patch targets
_PATCH_ENC = "orchestrator.tools.builtins.calendar_lookup.KeyEncryptionService"
_PATCH_CREDS = "orchestrator.tools.builtins.calendar_lookup.google_credentials_from_token"
_PATCH_BUILD = "orchestrator.tools.builtins.calendar_lookup.build"
_PATCH_SETTINGS = "orchestrator.tools.builtins.calendar_lookup.settings"
def _make_mock_session(conn_config: dict | None = None):
"""Build a mock AsyncSession that returns a ChannelConnection or None."""
session = AsyncMock()
mock_result = MagicMock()
if conn_config is not None:
mock_conn = MagicMock()
mock_conn.id = uuid.uuid4()
mock_conn.config = conn_config
mock_result.scalar_one_or_none.return_value = mock_conn
else:
mock_result.scalar_one_or_none.return_value = None
session.execute.return_value = mock_result
session.commit = AsyncMock()
return session
def _make_enc_mock():
"""Create a mock KeyEncryptionService with decrypt returning the fake token JSON."""
import json
mock_enc = MagicMock()
mock_enc.decrypt.return_value = json.dumps(_FAKE_TOKEN_DICT)
mock_enc.encrypt.return_value = "new_encrypted_token"
return mock_enc
def _make_mock_settings():
"""Create mock settings with encryption key configured."""
mock_settings = MagicMock()
mock_settings.platform_encryption_key = "test-key"
mock_settings.platform_encryption_key_previous = ""
return mock_settings
# ---------------------------------------------------------------------------
# No tenant_id
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_calendar_lookup_no_tenant_id_returns_message():
"""calendar_lookup without tenant_id returns a helpful error message."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
result = await calendar_lookup(date=_DATE)
assert "tenant" in result.lower() or "not available" in result.lower()
assert isinstance(result, str)
# ---------------------------------------------------------------------------
# Not connected
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_calendar_lookup_not_connected_returns_message():
"""calendar_lookup with no ChannelConnection returns 'not connected' message."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config=None)
# Pass _session directly to bypass DB session creation
result = await calendar_lookup(date=_DATE, tenant_id=_TENANT_ID, _session=mock_session)
assert "not connected" in result.lower() or "connect" in result.lower()
assert isinstance(result, str)
# ---------------------------------------------------------------------------
# action="list"
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_calendar_lookup_list_returns_formatted_events():
"""
action="list" returns a natural-language event list.
No raw JSON — results are human-readable strings.
"""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.token = "ya29.test_access_token"
mock_creds.expired = False
mock_creds.valid = True
mock_service = MagicMock()
mock_events_list = MagicMock()
mock_events_list.execute.return_value = _FAKE_EVENTS
mock_service.events.return_value.list.return_value = mock_events_list
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="list",
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert isinstance(result, str)
assert "Team Standup" in result
assert "Sprint Planning" in result
# No raw JSON
assert "{" not in result or "items" not in result
@pytest.mark.asyncio
async def test_calendar_lookup_list_no_events():
"""action="list" with no events returns a 'no events' message."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.expired = False
mock_creds.valid = True
mock_service = MagicMock()
mock_service.events.return_value.list.return_value.execute.return_value = {"items": []}
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="list",
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert "no event" in result.lower() or "free" in result.lower()
# ---------------------------------------------------------------------------
# action="check_availability"
# ---------------------------------------------------------------------------
async def test_calendar_lookup_check_availability_with_events():
"""action="check_availability" returns busy slot summary when events exist."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.expired = False
mock_creds.valid = True
mock_service = MagicMock()
mock_service.events.return_value.list.return_value.execute.return_value = _FAKE_EVENTS
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="check_availability",
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert isinstance(result, str)
assert "busy" in result.lower() or "slot" in result.lower() or "standup" in result.lower()
async def test_calendar_lookup_check_availability_free_day():
"""action="check_availability" with no events returns 'entire day is free'."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.expired = False
mock_creds.valid = True
mock_service = MagicMock()
mock_service.events.return_value.list.return_value.execute.return_value = {"items": []}
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="check_availability",
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert "free" in result.lower()
# ---------------------------------------------------------------------------
# action="create"
# ---------------------------------------------------------------------------
async def test_calendar_lookup_create_event():
"""action="create" inserts an event and returns confirmation."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.expired = False
mock_creds.valid = True
created_event = {
"id": "abc123",
"summary": "Product Demo",
"start": {"dateTime": "2026-03-26T10:00:00+00:00"},
"end": {"dateTime": "2026-03-26T11:00:00+00:00"},
}
mock_service = MagicMock()
mock_service.events.return_value.insert.return_value.execute.return_value = created_event
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="create",
event_summary="Product Demo",
event_start="2026-03-26T10:00:00+00:00",
event_end="2026-03-26T11:00:00+00:00",
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert isinstance(result, str)
assert "created" in result.lower() or "product demo" in result.lower()
async def test_calendar_lookup_create_missing_fields():
"""action="create" without event_summary returns an error message."""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
mock_creds = MagicMock()
mock_creds.expired = False
mock_creds.valid = True
mock_service = MagicMock()
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = _make_enc_mock()
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
result = await calendar_lookup(
date=_DATE,
action="create",
# No event_summary, event_start, event_end
tenant_id=_TENANT_ID,
_session=mock_session,
)
assert isinstance(result, str)
assert "error" in result.lower() or "required" in result.lower() or "missing" in result.lower()
# ---------------------------------------------------------------------------
# Token refresh write-back
# ---------------------------------------------------------------------------
async def test_calendar_lookup_token_refresh_writeback():
"""
When credentials.token changes after an API call (refresh occurred),
the updated token should be encrypted and written back to channel_connections.
"""
from orchestrator.tools.builtins.calendar_lookup import calendar_lookup
conn_id = uuid.uuid4()
mock_session = _make_mock_session(conn_config={"token": _FAKE_ENCRYPTED_TOKEN})
# Get the mock connection to track updates
mock_conn = mock_session.execute.return_value.scalar_one_or_none.return_value
mock_conn.id = conn_id
mock_conn.config = {"token": _FAKE_ENCRYPTED_TOKEN}
# Credentials that change token after API call (simulating refresh)
original_token = "ya29.original_token"
refreshed_token = "ya29.refreshed_token"
mock_creds = MagicMock()
mock_creds.token = original_token
mock_creds.refresh_token = "1//refresh_token"
mock_creds.expired = False
mock_creds.valid = True
def simulate_api_call_that_refreshes():
"""Simulate the side effect of token refresh during API call."""
mock_creds.token = refreshed_token
return {"items": []}
mock_service = MagicMock()
mock_service.events.return_value.list.return_value.execute.side_effect = simulate_api_call_that_refreshes
mock_enc = _make_enc_mock()
with (
patch(_PATCH_ENC) as mock_enc_cls,
patch(_PATCH_CREDS) as mock_creds_fn,
patch(_PATCH_BUILD) as mock_build,
patch(_PATCH_SETTINGS, _make_mock_settings()),
):
mock_enc_cls.return_value = mock_enc
mock_creds_fn.return_value = mock_creds
mock_build.return_value = mock_service
await calendar_lookup(
date=_DATE,
action="list",
tenant_id=_TENANT_ID,
_session=mock_session,
)
# encrypt should have been called for write-back
mock_enc.encrypt.assert_called()
# session.commit should have been called to persist the updated token
mock_session.commit.assert_called()


@@ -0,0 +1,186 @@
"""
Unit tests for executor tenant_id/agent_id injection.
Tests that execute_tool injects tenant_id and agent_id into handler kwargs
before calling the handler, so context-aware tools (kb_search, calendar_lookup)
receive tenant context without the LLM needing to provide it.
"""
from __future__ import annotations
import uuid
from typing import Any
from unittest.mock import AsyncMock, MagicMock
import pytest
def _make_tool(handler: Any, requires_confirmation: bool = False) -> Any:
"""Create a minimal ToolDefinition-like object for tests."""
tool = MagicMock()
tool.handler = handler
tool.requires_confirmation = requires_confirmation
tool.parameters = {
"type": "object",
"properties": {
"query": {"type": "string"},
},
"required": ["query"],
}
return tool
class TestExecutorTenantInjection:
@pytest.mark.asyncio
async def test_tenant_id_injected_into_handler_kwargs(self) -> None:
"""Handler should receive tenant_id even though LLM didn't provide it."""
from orchestrator.tools.executor import execute_tool
received_kwargs: dict[str, Any] = {}
async def mock_handler(**kwargs: Any) -> str:
received_kwargs.update(kwargs)
return "handler result"
tool = _make_tool(mock_handler)
registry = {"test_tool": tool}
tenant_id = uuid.uuid4()
agent_id = uuid.uuid4()
audit_logger = MagicMock()
audit_logger.log_tool_call = AsyncMock()
tool_call = {
"function": {
"name": "test_tool",
"arguments": '{"query": "hello world"}',
}
}
result = await execute_tool(tool_call, registry, tenant_id, agent_id, audit_logger)
assert result == "handler result"
assert "tenant_id" in received_kwargs
assert received_kwargs["tenant_id"] == str(tenant_id)
@pytest.mark.asyncio
async def test_agent_id_injected_into_handler_kwargs(self) -> None:
"""Handler should receive agent_id even though LLM didn't provide it."""
from orchestrator.tools.executor import execute_tool
received_kwargs: dict[str, Any] = {}
async def mock_handler(**kwargs: Any) -> str:
received_kwargs.update(kwargs)
return "ok"
tool = _make_tool(mock_handler)
registry = {"test_tool": tool}
tenant_id = uuid.uuid4()
agent_id = uuid.uuid4()
audit_logger = MagicMock()
audit_logger.log_tool_call = AsyncMock()
tool_call = {
"function": {
"name": "test_tool",
"arguments": '{"query": "test"}',
}
}
await execute_tool(tool_call, registry, tenant_id, agent_id, audit_logger)
assert "agent_id" in received_kwargs
assert received_kwargs["agent_id"] == str(agent_id)
@pytest.mark.asyncio
async def test_injected_ids_are_strings(self) -> None:
"""Injected tenant_id and agent_id should be strings, not UUIDs."""
from orchestrator.tools.executor import execute_tool
received_kwargs: dict[str, Any] = {}
async def mock_handler(**kwargs: Any) -> str:
received_kwargs.update(kwargs)
return "ok"
tool = _make_tool(mock_handler)
registry = {"test_tool": tool}
tenant_id = uuid.uuid4()
agent_id = uuid.uuid4()
audit_logger = MagicMock()
audit_logger.log_tool_call = AsyncMock()
tool_call = {
"function": {
"name": "test_tool",
"arguments": '{"query": "test"}',
}
}
await execute_tool(tool_call, registry, tenant_id, agent_id, audit_logger)
assert isinstance(received_kwargs["tenant_id"], str)
assert isinstance(received_kwargs["agent_id"], str)
@pytest.mark.asyncio
async def test_llm_provided_args_preserved(self) -> None:
"""Original LLM-provided args should still be present after injection."""
from orchestrator.tools.executor import execute_tool
received_kwargs: dict[str, Any] = {}
async def mock_handler(**kwargs: Any) -> str:
received_kwargs.update(kwargs)
return "ok"
tool = _make_tool(mock_handler)
registry = {"test_tool": tool}
tenant_id = uuid.uuid4()
agent_id = uuid.uuid4()
audit_logger = MagicMock()
audit_logger.log_tool_call = AsyncMock()
tool_call = {
"function": {
"name": "test_tool",
"arguments": '{"query": "search term from LLM"}',
}
}
await execute_tool(tool_call, registry, tenant_id, agent_id, audit_logger)
assert received_kwargs["query"] == "search term from LLM"
assert received_kwargs["tenant_id"] == str(tenant_id)
assert received_kwargs["agent_id"] == str(agent_id)
@pytest.mark.asyncio
async def test_injection_after_schema_validation(self) -> None:
"""Injection happens after validation — injected keys don't cause schema failures."""
from orchestrator.tools.executor import execute_tool
# Tool requires exactly 'query', nothing else in schema required
# Schema should pass even though we inject tenant_id/agent_id
async def mock_handler(**kwargs: Any) -> str:
return "passed"
tool = _make_tool(mock_handler)
registry = {"test_tool": tool}
tenant_id = uuid.uuid4()
agent_id = uuid.uuid4()
audit_logger = MagicMock()
audit_logger.log_tool_call = AsyncMock()
tool_call = {
"function": {
"name": "test_tool",
"arguments": '{"query": "test"}',
}
}
result = await execute_tool(tool_call, registry, tenant_id, agent_id, audit_logger)
assert result == "passed"


@@ -0,0 +1,201 @@
"""
Unit tests for orchestrator.tools.extractors.
Tests that each document format produces expected text output, and that
unsupported formats raise ValueError.
All test fixtures are constructed in-memory using the same libraries that
the extractor uses — no external files needed.
"""
from __future__ import annotations
import io
import pytest
# ---------------------------------------------------------------------------
# Helpers to build minimal valid files in memory
# ---------------------------------------------------------------------------
def _make_pdf_bytes(text: str) -> bytes:
    """Create a minimal valid PDF with one page containing the given text."""
    # pypdf cannot place text on a page without font plumbing, so prefer
    # reportlab when it is installed and fall back to a hand-crafted PDF.
    try:
        from reportlab.pdfgen import canvas as rl_canvas

        buf = io.BytesIO()
        c = rl_canvas.Canvas(buf)
        c.drawString(10, 100, text)
        c.save()
        return buf.getvalue()
    except ImportError:
        pass
    # Hand-crafted minimal PDF with embedded text stream
content_stream = f"BT /F1 12 Tf 50 700 Td ({text}) Tj ET"
stream_bytes = content_stream.encode()
pdf = (
b"%PDF-1.4\n"
b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
b" /Contents 4 0 R /Resources << /Font << /F1 << /Type /Font"
b" /Subtype /Type1 /BaseFont /Helvetica >> >> >> >>\nendobj\n"
b"4 0 obj\n<< /Length " + str(len(stream_bytes)).encode() + b" >>\n"
b"stream\n" + stream_bytes + b"\nendstream\nendobj\n"
b"xref\n0 5\n0000000000 65535 f \n"
b"trailer\n<< /Size 5 /Root 1 0 R >>\nstartxref\n0\n%%EOF"
)
return pdf
def _make_docx_bytes(paragraphs: list[str]) -> bytes:
"""Create a minimal DOCX with the given paragraph texts."""
from docx import Document
doc = Document()
for p in paragraphs:
doc.add_paragraph(p)
buf = io.BytesIO()
doc.save(buf)
return buf.getvalue()
def _make_pptx_bytes(slide_texts: list[str]) -> bytes:
"""Create a PPTX with one text box per slide."""
from pptx import Presentation
from pptx.util import Inches
prs = Presentation()
blank_layout = prs.slide_layouts[6] # blank layout
for text in slide_texts:
slide = prs.slides.add_slide(blank_layout)
txBox = slide.shapes.add_textbox(Inches(1), Inches(1), Inches(4), Inches(2))
txBox.text_frame.text = text
buf = io.BytesIO()
prs.save(buf)
return buf.getvalue()
def _make_xlsx_bytes(rows: list[list[str]]) -> bytes:
"""Create an XLSX with the given rows."""
import openpyxl
wb = openpyxl.Workbook()
ws = wb.active
for row in rows:
ws.append(row)
buf = io.BytesIO()
wb.save(buf)
return buf.getvalue()
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
class TestExtractTextDocx:
def test_extracts_paragraph_text(self) -> None:
from orchestrator.tools.extractors import extract_text
docx_bytes = _make_docx_bytes(["Hello world", "Second paragraph"])
result = extract_text("document.docx", docx_bytes)
assert "Hello world" in result
assert "Second paragraph" in result
def test_empty_docx_returns_string(self) -> None:
from orchestrator.tools.extractors import extract_text
docx_bytes = _make_docx_bytes([])
result = extract_text("empty.docx", docx_bytes)
assert isinstance(result, str)
class TestExtractTextPptx:
def test_extracts_slide_text(self) -> None:
from orchestrator.tools.extractors import extract_text
pptx_bytes = _make_pptx_bytes(["Slide one content", "Slide two content"])
result = extract_text("slides.pptx", pptx_bytes)
assert "Slide one content" in result
assert "Slide two content" in result
class TestExtractTextXlsx:
def test_extracts_cell_data_as_csv(self) -> None:
from orchestrator.tools.extractors import extract_text
xlsx_bytes = _make_xlsx_bytes([["Name", "Age"], ["Alice", "30"], ["Bob", "25"]])
result = extract_text("data.xlsx", xlsx_bytes)
assert "Name" in result
assert "Alice" in result
assert "Bob" in result
class TestExtractTextCsv:
def test_extracts_csv_text(self) -> None:
from orchestrator.tools.extractors import extract_text
csv_content = "col1,col2\nval1,val2\n"
csv_bytes = csv_content.encode("utf-8")
result = extract_text("data.csv", csv_bytes)
assert "col1" in result
assert "val1" in result
def test_handles_non_utf8_gracefully(self) -> None:
from orchestrator.tools.extractors import extract_text
bad_bytes = b"hello\xff world"
result = extract_text("data.csv", bad_bytes)
assert "hello" in result
class TestExtractTextTxt:
def test_extracts_plain_text(self) -> None:
from orchestrator.tools.extractors import extract_text
txt_bytes = b"Hello, this is a plain text file."
result = extract_text("notes.txt", txt_bytes)
assert "Hello, this is a plain text file." in result
class TestExtractTextMarkdown:
def test_extracts_markdown_text(self) -> None:
from orchestrator.tools.extractors import extract_text
md_bytes = b"# Heading\n\nSome paragraph text here."
result = extract_text("notes.md", md_bytes)
assert "Heading" in result
assert "Some paragraph text here." in result
class TestExtractTextUnsupported:
def test_raises_value_error_for_unsupported_extension(self) -> None:
from orchestrator.tools.extractors import extract_text
with pytest.raises(ValueError, match="Unsupported file extension"):
extract_text("file.exe", b"some bytes")
def test_raises_for_zip(self) -> None:
from orchestrator.tools.extractors import extract_text
with pytest.raises(ValueError, match="Unsupported file extension"):
extract_text("archive.zip", b"PK\x03\x04")
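The dispatch these tests imply can be sketched for the plain-text formats only; this is a simplified stand-in, since the real `extract_text` also handles pdf/docx/pptx/xlsx via pypdf, python-docx, python-pptx, and openpyxl.

```python
from pathlib import Path


def extract_text_sketch(filename: str, data: bytes) -> str:
    """Extension-based dispatch sketch covering only text-like formats."""
    ext = Path(filename).suffix.lower()
    if ext in {".txt", ".md", ".csv"}:
        # errors="replace" keeps non-UTF-8 input from raising, matching the
        # graceful-decoding behavior the CSV test above expects.
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"Unsupported file extension: {ext}")
```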


@@ -0,0 +1,183 @@
"""
Unit tests for the KB ingestion pipeline.
Tests:
- chunk_text: sliding window chunker produces correctly-sized, overlapping chunks
- ingest_document_pipeline: downloads file from MinIO, extracts, chunks, embeds, stores
- ingest_document_pipeline: sets status='error' on failure
"""
from __future__ import annotations
import uuid
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
class TestChunkText:
def test_basic_chunking(self) -> None:
from orchestrator.tools.ingest import chunk_text
text = "a" * 1000
chunks = chunk_text(text, chunk_size=100, overlap=10)
assert len(chunks) > 0
for chunk in chunks:
assert len(chunk) <= 100
def test_overlap_between_chunks(self) -> None:
from orchestrator.tools.ingest import chunk_text
# Create text with identifiable segments
text = "AAAA" * 50 + "BBBB" * 50 # 400 chars
chunks = chunk_text(text, chunk_size=200, overlap=50)
# With overlap=50, consecutive chunks should share chars
assert len(chunks) >= 2
def test_short_text_returns_one_chunk(self) -> None:
from orchestrator.tools.ingest import chunk_text
text = "Hello world"
chunks = chunk_text(text, chunk_size=500, overlap=50)
assert len(chunks) == 1
assert chunks[0] == "Hello world"
def test_empty_text_returns_empty_list(self) -> None:
from orchestrator.tools.ingest import chunk_text
chunks = chunk_text("", chunk_size=500, overlap=50)
assert chunks == []
def test_whitespace_only_returns_empty_list(self) -> None:
from orchestrator.tools.ingest import chunk_text
chunks = chunk_text(" \n ", chunk_size=500, overlap=50)
assert chunks == []
def test_default_parameters(self) -> None:
from orchestrator.tools.ingest import chunk_text
text = "word " * 500 # 2500 chars
chunks = chunk_text(text)
assert len(chunks) > 1
# Default chunk_size is 500
for chunk in chunks:
assert len(chunk) <= 500
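A sliding-window chunker satisfying these tests can be sketched as follows; the production `chunk_text` may differ in detail (for example, splitting on word boundaries), so treat this as illustrative only.

```python
def chunk_text_sketch(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Sliding-window chunker: fixed-size windows advancing by
    chunk_size - overlap, so consecutive chunks share `overlap` chars."""
    if not text.strip():
        return []
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window reached the end of the text
    return chunks
```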
class TestIngestDocumentPipeline:
@pytest.mark.asyncio
async def test_file_upload_sets_status_ready(self) -> None:
"""Pipeline downloads file, extracts, chunks, embeds, stores, sets ready."""
from orchestrator.tools.ingest import ingest_document_pipeline
tenant_id = str(uuid.uuid4())
document_id = str(uuid.uuid4())
mock_doc = MagicMock()
mock_doc.id = uuid.UUID(document_id)
mock_doc.tenant_id = uuid.UUID(tenant_id)
mock_doc.filename = "test.txt"
mock_doc.source_url = None
mock_doc.status = "processing"
with (
patch("orchestrator.tools.ingest.async_session_factory") as mock_sf,
patch("orchestrator.tools.ingest.engine"),
patch("orchestrator.tools.ingest.configure_rls_hook"),
patch("orchestrator.tools.ingest.current_tenant_id"),
patch("orchestrator.tools.ingest._get_minio_client") as mock_minio,
patch("orchestrator.tools.ingest.extract_text", return_value="Test content " * 50) as mock_extract,
patch("orchestrator.tools.ingest.embed_texts", return_value=[[0.1] * 384]) as mock_embed,
):
mock_session = AsyncMock()
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = mock_doc
mock_session.execute = AsyncMock(return_value=mock_result)
mock_session.commit = AsyncMock()
mock_session.__aenter__ = AsyncMock(return_value=mock_session)
mock_session.__aexit__ = AsyncMock(return_value=False)
mock_sf.return_value = mock_session
# MinIO returns file bytes
minio_client = MagicMock()
response_obj = MagicMock()
response_obj.read.return_value = b"Test content " * 50
minio_client.get_object.return_value = response_obj
mock_minio.return_value = minio_client
await ingest_document_pipeline(document_id, tenant_id)
# Status should be set to 'ready' on the document
assert mock_doc.status == "ready"
assert mock_doc.chunk_count is not None
@pytest.mark.asyncio
async def test_pipeline_sets_error_on_exception(self) -> None:
"""Pipeline marks document as error when extraction fails."""
from orchestrator.tools.ingest import ingest_document_pipeline
tenant_id = str(uuid.uuid4())
document_id = str(uuid.uuid4())
mock_doc = MagicMock()
mock_doc.id = uuid.UUID(document_id)
mock_doc.tenant_id = uuid.UUID(tenant_id)
mock_doc.filename = "test.txt"
mock_doc.source_url = None
mock_doc.status = "processing"
with (
patch("orchestrator.tools.ingest.async_session_factory") as mock_sf,
patch("orchestrator.tools.ingest.engine"),
patch("orchestrator.tools.ingest.configure_rls_hook"),
patch("orchestrator.tools.ingest.current_tenant_id"),
patch("orchestrator.tools.ingest._get_minio_client") as mock_minio,
):
mock_session = AsyncMock()
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = mock_doc
mock_session.execute = AsyncMock(return_value=mock_result)
mock_session.commit = AsyncMock()
mock_session.__aenter__ = AsyncMock(return_value=mock_session)
mock_session.__aexit__ = AsyncMock(return_value=False)
mock_sf.return_value = mock_session
# MinIO raises an error
minio_client = MagicMock()
minio_client.get_object.side_effect = Exception("MinIO connection failed")
mock_minio.return_value = minio_client
await ingest_document_pipeline(document_id, tenant_id)
assert mock_doc.status == "error"
assert mock_doc.error_message is not None
@pytest.mark.asyncio
async def test_document_not_found_is_no_op(self) -> None:
"""If document doesn't exist, pipeline exits gracefully."""
from orchestrator.tools.ingest import ingest_document_pipeline
tenant_id = str(uuid.uuid4())
document_id = str(uuid.uuid4())
with (
patch("orchestrator.tools.ingest.async_session_factory") as mock_sf,
patch("orchestrator.tools.ingest.engine"),
patch("orchestrator.tools.ingest.configure_rls_hook"),
patch("orchestrator.tools.ingest.current_tenant_id"),
):
mock_session = AsyncMock()
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = None # Not found
mock_session.execute = AsyncMock(return_value=mock_result)
mock_session.__aenter__ = AsyncMock(return_value=mock_session)
mock_session.__aexit__ = AsyncMock(return_value=False)
mock_sf.return_value = mock_session
# Should not raise
await ingest_document_pipeline(document_id, tenant_id)


@@ -0,0 +1,278 @@
"""
Unit tests for the KB upload API router.
Tests:
- POST /{tenant_id}/documents — file upload returns 201 with document_id
- GET /{tenant_id}/documents — list returns documents with status field
- DELETE /{tenant_id}/documents/{doc_id} — removes document
- POST /{tenant_id}/documents/url — URL ingest dispatches Celery task
- POST /{tenant_id}/documents/{doc_id}/reindex — re-dispatches Celery task
All external dependencies (MinIO, DB, Celery) are mocked.
Auth dependencies are overridden via FastAPI app.dependency_overrides.
"""
from __future__ import annotations
import uuid
from datetime import datetime
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from fastapi import FastAPI
from httpx import ASGITransport, AsyncClient
from shared.api.rbac import require_tenant_admin, require_tenant_member
from shared.db import get_session
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
TENANT_ID = str(uuid.uuid4())
DOC_ID = uuid.uuid4()
def _make_mock_caller() -> MagicMock:
caller = MagicMock()
caller.tenant_id = uuid.UUID(TENANT_ID)
caller.role = "admin"
return caller
def _make_test_app(mock_session: AsyncMock) -> FastAPI:
"""Create a minimal FastAPI app mounting the kb_router with overridden deps."""
from shared.api.kb import kb_router
test_app = FastAPI()
test_app.include_router(kb_router)
# Override auth dependencies so no real JWT validation happens
mock_caller = _make_mock_caller()
test_app.dependency_overrides[require_tenant_admin] = lambda: mock_caller
test_app.dependency_overrides[require_tenant_member] = lambda: mock_caller
# Override DB session
    async def _override_session() -> Any:  # async generator dependency
        yield mock_session
test_app.dependency_overrides[get_session] = _override_session
return test_app
@pytest.fixture
def mock_session() -> AsyncMock:
session = AsyncMock()
session.add = MagicMock()
session.flush = AsyncMock()
session.commit = AsyncMock()
session.delete = AsyncMock()
return session
@pytest.fixture
def mock_doc() -> MagicMock:
doc = MagicMock()
doc.id = DOC_ID
doc.tenant_id = uuid.UUID(TENANT_ID)
doc.filename = "test.txt"
doc.source_url = None
doc.content_type = "text/plain"
doc.status = "processing"
doc.chunk_count = None
doc.created_at = datetime(2026, 1, 1, 12, 0, 0)
return doc
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
class TestKbUploadEndpoint:
@pytest.mark.asyncio
async def test_upload_file_returns_201(self, mock_session: AsyncMock) -> None:
"""Uploading a file should return 201 with document_id."""
def _side_add(obj: Any) -> None:
obj.id = DOC_ID
obj.created_at = datetime(2026, 1, 1, 12, 0, 0)
mock_session.add.side_effect = _side_add
app = _make_test_app(mock_session)
with (
patch("shared.api.kb._get_minio_client") as mock_minio,
patch("shared.api.kb._get_ingest_task") as mock_get_task,
):
minio_client = MagicMock()
minio_client.put_object = MagicMock()
minio_client.head_bucket = MagicMock()
mock_minio.return_value = minio_client
mock_task = MagicMock()
mock_task.delay = MagicMock()
mock_get_task.return_value = mock_task
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.post(
f"/api/portal/kb/{TENANT_ID}/documents",
files={"file": ("hello.txt", b"Hello world content", "text/plain")},
)
assert response.status_code == 201
data = response.json()
assert "id" in data
assert data["filename"] == "hello.txt"
assert data["status"] == "processing"
mock_task.delay.assert_called_once()
@pytest.mark.asyncio
async def test_upload_unsupported_extension_returns_400(self, mock_session: AsyncMock) -> None:
"""Uploading an unsupported file type should return 400."""
app = _make_test_app(mock_session)
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.post(
f"/api/portal/kb/{TENANT_ID}/documents",
files={"file": ("malware.exe", b"bad bytes", "application/octet-stream")},
)
assert response.status_code == 400
assert "Unsupported" in response.json()["detail"]
class TestKbListEndpoint:
@pytest.mark.asyncio
async def test_list_returns_documents_with_status(
self, mock_session: AsyncMock, mock_doc: MagicMock
) -> None:
"""GET /{tenant_id}/documents should return list with status field."""
mock_result = MagicMock()
mock_result.scalars.return_value.all.return_value = [mock_doc]
mock_session.execute = AsyncMock(return_value=mock_result)
app = _make_test_app(mock_session)
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.get(f"/api/portal/kb/{TENANT_ID}/documents")
assert response.status_code == 200
data = response.json()
assert isinstance(data, list)
assert len(data) == 1
assert data[0]["status"] == "processing"
assert "id" in data[0]
class TestKbDeleteEndpoint:
@pytest.mark.asyncio
async def test_delete_document_returns_204(
self, mock_session: AsyncMock, mock_doc: MagicMock
) -> None:
"""DELETE /{tenant_id}/documents/{doc_id} should remove document."""
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = mock_doc
mock_session.execute = AsyncMock(return_value=mock_result)
app = _make_test_app(mock_session)
with patch("shared.api.kb._get_minio_client") as mock_minio:
minio_client = MagicMock()
minio_client.remove_object = MagicMock()
mock_minio.return_value = minio_client
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.delete(
f"/api/portal/kb/{TENANT_ID}/documents/{DOC_ID}"
)
assert response.status_code == 204
@pytest.mark.asyncio
async def test_delete_nonexistent_returns_404(self, mock_session: AsyncMock) -> None:
"""DELETE on a document that doesn't exist should return 404."""
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = None
mock_session.execute = AsyncMock(return_value=mock_result)
app = _make_test_app(mock_session)
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.delete(
f"/api/portal/kb/{TENANT_ID}/documents/{DOC_ID}"
)
assert response.status_code == 404
class TestKbUrlIngestEndpoint:
@pytest.mark.asyncio
async def test_url_ingest_dispatches_celery(self, mock_session: AsyncMock) -> None:
"""POST /{tenant_id}/documents/url should dispatch ingest_document task."""
def _side_add(obj: Any) -> None:
obj.id = DOC_ID
obj.created_at = datetime(2026, 1, 1, 12, 0, 0)
mock_session.add.side_effect = _side_add
app = _make_test_app(mock_session)
with patch("shared.api.kb._get_ingest_task") as mock_get_task:
mock_task = MagicMock()
mock_task.delay = MagicMock()
mock_get_task.return_value = mock_task
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.post(
f"/api/portal/kb/{TENANT_ID}/documents/url",
json={"url": "https://example.com/page", "source_type": "web"},
)
assert response.status_code == 201
mock_task.delay.assert_called_once()
class TestKbReindexEndpoint:
@pytest.mark.asyncio
async def test_reindex_dispatches_celery(
self, mock_session: AsyncMock, mock_doc: MagicMock
) -> None:
"""POST /{tenant_id}/documents/{doc_id}/reindex should dispatch ingest task."""
mock_result = MagicMock()
mock_result.scalar_one_or_none.return_value = mock_doc
mock_session.execute = AsyncMock(return_value=mock_result)
app = _make_test_app(mock_session)
with patch("shared.api.kb._get_ingest_task") as mock_get_task:
mock_task = MagicMock()
mock_task.delay = MagicMock()
mock_get_task.return_value = mock_task
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://test"
) as client:
response = await client.post(
f"/api/portal/kb/{TENANT_ID}/documents/{DOC_ID}/reindex",
)
assert response.status_code == 202
mock_task.delay.assert_called_once()
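The 400 test above implies an extension allow-list check in the upload handler. A hedged sketch follows; the actual set of supported extensions in `shared.api.kb` is an assumption here.

```python
from pathlib import Path

# Assumed allow-list; the real router's supported set may differ.
_SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".txt", ".md"}


def validate_upload_filename(filename: str) -> str:
    """Return the lowercased extension, or raise for unsupported types.
    The router would surface this as HTTP 400 with an 'Unsupported' detail."""
    ext = Path(filename).suffix.lower()
    if ext not in _SUPPORTED:
        raise ValueError(f"Unsupported file type: {ext}")
    return ext
```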

uv.lock (generated)

@@ -2,9 +2,15 @@ version = 1
revision = 3
requires-python = ">=3.12"
resolution-markers = [
"python_full_version >= '3.14'",
"python_full_version == '3.13.*'",
"python_full_version < '3.13'",
"python_full_version >= '3.14' and sys_platform == 'win32'",
"python_full_version >= '3.14' and sys_platform == 'emscripten'",
"python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'",
"python_full_version == '3.13.*' and sys_platform == 'win32'",
"python_full_version == '3.13.*' and sys_platform == 'emscripten'",
"python_full_version == '3.13.*' and sys_platform != 'emscripten' and sys_platform != 'win32'",
"python_full_version < '3.13' and sys_platform == 'win32'",
"python_full_version < '3.13' and sys_platform == 'emscripten'",
"python_full_version < '3.13' and sys_platform != 'emscripten' and sys_platform != 'win32'",
]
[manifest]
@@ -314,6 +320,7 @@ dependencies = [
{ name = "jmespath" },
{ name = "s3transfer" },
]
sdist = { url = "https://files.pythonhosted.org/packages/74/ec/636ab2aa7ad9e6bf6e297240ac2d44dba63cc6611e2d5038db318436d449/boto3-1.42.74.tar.gz", hash = "sha256:dbacd808cf2a3dadbf35f3dbd8de97b94dc9f78b1ebd439f38f552e0f9753577", size = 112739, upload-time = "2026-03-23T19:34:09.815Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ad/16/a264b4da2af99f4a12609b93fea941cce5ec41da14b33ed3fef77a910f0c/boto3-1.42.74-py3-none-any.whl", hash = "sha256:4bf89c044d618fe4435af854ab820f09dd43569c0df15d7beb0398f50b9aa970", size = 140557, upload-time = "2026-03-23T19:34:07.084Z" },
]
@@ -612,7 +619,7 @@ name = "cuda-bindings"
version = "13.2.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cuda-pathfinder" },
{ name = "cuda-pathfinder", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/52/c8/b2589d68acf7e3d63e2be330b84bc25712e97ed799affbca7edd7eae25d6/cuda_bindings-13.2.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e865447abfb83d6a98ad5130ed3c70b1fc295ae3eeee39fd07b4ddb0671b6788", size = 5722404, upload-time = "2026-03-11T00:12:44.041Z" },
@@ -643,37 +650,46 @@ wheels = [
[package.optional-dependencies]
cublas = [
{ name = "nvidia-cublas", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cublas", marker = "sys_platform == 'linux'" },
]
cudart = [
{ name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux'" },
]
cufft = [
{ name = "nvidia-cufft", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cufft", marker = "sys_platform == 'linux'" },
]
cufile = [
{ name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
]
cupti = [
{ name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux'" },
]
curand = [
{ name = "nvidia-curand", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-curand", marker = "sys_platform == 'linux'" },
]
cusolver = [
{ name = "nvidia-cusolver", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cusolver", marker = "sys_platform == 'linux'" },
]
cusparse = [
{ name = "nvidia-cusparse", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cusparse", marker = "sys_platform == 'linux'" },
]
nvjitlink = [
{ name = "nvidia-nvjitlink", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-nvjitlink", marker = "sys_platform == 'linux'" },
]
nvrtc = [
{ name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux'" },
]
nvtx = [
{ name = "nvidia-nvtx", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
{ name = "nvidia-nvtx", marker = "sys_platform == 'linux'" },
]
[[package]]
name = "defusedxml"
version = "0.7.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/0f/d5/c66da9b79e5bdb124974bfe172b4daf3c984ebd9c2a06e2b8a4dc7331c72/defusedxml-0.7.1.tar.gz", hash = "sha256:1bb3032db185915b62d7c6209c5a8792be6a32ab2fedacc84e01b52c51aa3e69", size = 75520, upload-time = "2021-03-08T10:59:26.269Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl", hash = "sha256:a352e7e428770286cc899e2542b6cdaedb2b4953ff269a210103ec58f6198a61", size = 25604, upload-time = "2021-03-08T10:59:24.45Z" },
]
[[package]]
@@ -719,6 +735,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/de/15/545e2b6cf2e3be84bc1ed85613edd75b8aea69807a71c26f4ca6a9258e82/email_validator-2.3.0-py3-none-any.whl", hash = "sha256:80f13f623413e6b197ae73bb10bf4eb0908faf509ad8362c5edeb0be7fd450b4", size = 35604, upload-time = "2025-08-26T13:09:05.858Z" },
]
[[package]]
name = "et-xmlfile"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/d3/38/af70d7ab1ae9d4da450eeec1fa3918940a5fafb9055e934af8d6eb0c2313/et_xmlfile-2.0.0.tar.gz", hash = "sha256:dab3f4764309081ce75662649be815c4c9081e88f0837825f90fd28317d4da54", size = 17234, upload-time = "2024-10-25T17:25:40.039Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c1/8b/5fe2cc11fee489817272089c4203e679c63b570a5aaeb18d852ae3cbba6a/et_xmlfile-2.0.0-py3-none-any.whl", hash = "sha256:7a91720bc756843502c3b7504c77b8fe44217c85c537d85037f0f536151b2caa", size = 18059, upload-time = "2024-10-25T17:25:39.051Z" },
]
[[package]]
name = "fakeredis"
version = "2.34.1"
@@ -921,6 +946,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a4/a5/842ae8f0c08b61d6484b52f99a03510a3a72d23141942d216ebe81fefbce/filelock-3.25.2-py3-none-any.whl", hash = "sha256:ca8afb0da15f229774c9ad1b455ed96e85a81373065fb10446672f64444ddf70", size = 26759, upload-time = "2026-03-11T20:45:37.437Z" },
]
[[package]]
name = "firecrawl-py"
version = "4.21.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohttp" },
{ name = "httpx" },
{ name = "nest-asyncio" },
{ name = "pydantic" },
{ name = "python-dotenv" },
{ name = "requests" },
{ name = "websockets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/6d/6b/8201b737c0667bf70748b86a6fb117aefc648154b4e05c5ee649432cbc3d/firecrawl_py-4.21.0.tar.gz", hash = "sha256:14a7e0967d816c711c3c53325c9371e2f780a787d1e94333a34d8aea7a43a237", size = 174256, upload-time = "2026-03-25T16:22:00.002Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/f1/1c0f1e5b33a318d7b9705b9e23c4397253d730e516e3d8a2f6aaea4b71a2/firecrawl_py-4.21.0-py3-none-any.whl", hash = "sha256:4e431f36117b4f2aaae633e747859a91626b0f2c6aaa6b7f86dfb7669a3595eb", size = 217607, upload-time = "2026-03-25T16:21:58.708Z" },
]
[[package]]
name = "frozenlist"
version = "1.8.0"
@@ -1019,6 +1062,89 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e6/ab/fb21f4c939bb440104cc2b396d3be1d9b7a9fd3c6c2a53d98c45b3d7c954/fsspec-2026.2.0-py3-none-any.whl", hash = "sha256:98de475b5cb3bd66bedd5c4679e87b4fdfe1a3bf4d707b151b3c07e58c9a2437", size = 202505, upload-time = "2026-02-05T21:50:51.819Z" },
]
[[package]]
name = "google-api-core"
version = "2.30.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "google-auth" },
{ name = "googleapis-common-protos" },
{ name = "proto-plus" },
{ name = "protobuf" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/22/98/586ec94553b569080caef635f98a3723db36a38eac0e3d7eb3ea9d2e4b9a/google_api_core-2.30.0.tar.gz", hash = "sha256:02edfa9fab31e17fc0befb5f161b3bf93c9096d99aed584625f38065c511ad9b", size = 176959, upload-time = "2026-02-18T20:28:11.926Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/45/27/09c33d67f7e0dcf06d7ac17d196594e66989299374bfb0d4331d1038e76b/google_api_core-2.30.0-py3-none-any.whl", hash = "sha256:80be49ee937ff9aba0fd79a6eddfde35fe658b9953ab9b79c57dd7061afa8df5", size = 173288, upload-time = "2026-02-18T20:28:10.367Z" },
]
[[package]]
name = "google-api-python-client"
version = "2.193.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "google-api-core" },
{ name = "google-auth" },
{ name = "google-auth-httplib2" },
{ name = "httplib2" },
{ name = "uritemplate" },
]
sdist = { url = "https://files.pythonhosted.org/packages/90/f4/e14b6815d3b1885328dd209676a3a4c704882743ac94e18ef0093894f5c8/google_api_python_client-2.193.0.tar.gz", hash = "sha256:8f88d16e89d11341e0a8b199cafde0fb7e6b44260dffb88d451577cbd1bb5d33", size = 14281006, upload-time = "2026-03-17T18:25:29.415Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f0/6d/fe75167797790a56d17799b75e1129bb93f7ff061efc7b36e9731bd4be2b/google_api_python_client-2.193.0-py3-none-any.whl", hash = "sha256:c42aa324b822109901cfecab5dc4fc3915d35a7b376835233c916c70610322db", size = 14856490, upload-time = "2026-03-17T18:25:26.608Z" },
]
[[package]]
name = "google-auth"
version = "2.49.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cryptography" },
{ name = "pyasn1-modules" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ea/80/6a696a07d3d3b0a92488933532f03dbefa4a24ab80fb231395b9a2a1be77/google_auth-2.49.1.tar.gz", hash = "sha256:16d40da1c3c5a0533f57d268fe72e0ebb0ae1cc3b567024122651c045d879b64", size = 333825, upload-time = "2026-03-12T19:30:58.135Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e9/eb/c6c2478d8a8d633460be40e2a8a6f8f429171997a35a96f81d3b680dec83/google_auth-2.49.1-py3-none-any.whl", hash = "sha256:195ebe3dca18eddd1b3db5edc5189b76c13e96f29e73043b923ebcf3f1a860f7", size = 240737, upload-time = "2026-03-12T19:30:53.159Z" },
]
[[package]]
name = "google-auth-httplib2"
version = "0.3.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "google-auth" },
{ name = "httplib2" },
]
sdist = { url = "https://files.pythonhosted.org/packages/d5/ad/c1f2b1175096a8d04cf202ad5ea6065f108d26be6fc7215876bde4a7981d/google_auth_httplib2-0.3.0.tar.gz", hash = "sha256:177898a0175252480d5ed916aeea183c2df87c1f9c26705d74ae6b951c268b0b", size = 11134, upload-time = "2025-12-15T22:13:51.825Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/99/d5/3c97526c8796d3caf5f4b3bed2b05e8a7102326f00a334e7a438237f3b22/google_auth_httplib2-0.3.0-py3-none-any.whl", hash = "sha256:426167e5df066e3f5a0fc7ea18768c08e7296046594ce4c8c409c2457dd1f776", size = 9529, upload-time = "2025-12-15T22:13:51.048Z" },
]
[[package]]
name = "google-auth-oauthlib"
version = "1.3.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "google-auth" },
{ name = "requests-oauthlib" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ac/b4/1b19567e4c567b796f5c593d89895f3cfae5a38e04f27c6af87618fd0942/google_auth_oauthlib-1.3.0.tar.gz", hash = "sha256:cd39e807ac7229d6b8b9c1e297321d36fcc8a9e4857dff4301870985df51a528", size = 21777, upload-time = "2026-02-27T14:13:01.489Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2f/56/909fd5632226d3fba31d7aeffd4754410735d49362f5809956fe3e9af344/google_auth_oauthlib-1.3.0-py3-none-any.whl", hash = "sha256:386b3fb85cf4a5b819c6ad23e3128d975216b4cac76324de1d90b128aaf38f29", size = 19308, upload-time = "2026-02-27T14:12:47.865Z" },
]
[[package]]
name = "googleapis-common-protos"
version = "1.73.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "protobuf" },
]
sdist = { url = "https://files.pythonhosted.org/packages/99/96/a0205167fa0154f4a542fd6925bdc63d039d88dab3588b875078107e6f06/googleapis_common_protos-1.73.0.tar.gz", hash = "sha256:778d07cd4fbeff84c6f7c72102f0daf98fa2bfd3fa8bea426edc545588da0b5a", size = 147323, upload-time = "2026-03-06T21:53:09.727Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/69/28/23eea8acd65972bbfe295ce3666b28ac510dfcb115fac089d3edb0feb00a/googleapis_common_protos-1.73.0-py3-none-any.whl", hash = "sha256:dfdaaa2e860f242046be561e6d6cb5c5f1541ae02cfbcb034371aadb2942b4e8", size = 297578, upload-time = "2026-03-06T21:52:33.933Z" },
]
[[package]]
name = "greenlet"
version = "3.3.2"
@@ -1103,6 +1229,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b4/7e/ccf239da366b37ba7f0b36095450efae4a64980bdc7ec2f51354205fdf39/hf_xet-1.4.2-cp37-abi3-win_arm64.whl", hash = "sha256:32c012286b581f783653e718c1862aea5b9eb140631685bb0c5e7012c8719a87", size = 3533426, upload-time = "2026-03-13T06:58:55.46Z" },
]
[[package]]
name = "http-ece"
version = "1.2.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cryptography" },
]
sdist = { url = "https://files.pythonhosted.org/packages/7c/af/249d1576653b69c20b9ac30e284b63bd94af6a175d72d87813235caf2482/http_ece-1.2.1.tar.gz", hash = "sha256:8c6ab23116bbf6affda894acfd5f2ca0fb8facbcbb72121c11c75c33e7ce8cff", size = 8830, upload-time = "2024-08-08T00:10:47.301Z" }
[[package]]
name = "httpcore"
version = "1.0.9"
@@ -1116,6 +1251,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
]
[[package]]
name = "httplib2"
version = "0.31.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pyparsing" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c1/1f/e86365613582c027dda5ddb64e1010e57a3d53e99ab8a72093fa13d565ec/httplib2-0.31.2.tar.gz", hash = "sha256:385e0869d7397484f4eab426197a4c020b606edd43372492337c0b4010ae5d24", size = 250800, upload-time = "2026-01-23T11:04:44.165Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2f/90/fd509079dfcab01102c0fdd87f3a9506894bc70afcf9e9785ef6b2b3aff6/httplib2-0.31.2-py3-none-any.whl", hash = "sha256:dbf0c2fa3862acf3c55c078ea9c0bc4481d7dc5117cae71be9514912cf9f8349", size = 91099, upload-time = "2026-01-23T11:04:42.78Z" },
]
[[package]]
name = "httptools"
version = "0.7.1"
@@ -1393,6 +1540,7 @@ name = "konstruct-gateway"
version = "0.1.0"
source = { editable = "packages/gateway" }
dependencies = [
{ name = "aiohttp" },
{ name = "boto3" },
{ name = "fastapi", extra = ["standard"] },
{ name = "httpx" },
@@ -1400,12 +1548,14 @@ dependencies = [
{ name = "konstruct-router" },
{ name = "konstruct-shared" },
{ name = "python-telegram-bot" },
{ name = "pywebpush" },
{ name = "redis" },
{ name = "slack-bolt" },
]
[package.metadata]
requires-dist = [
{ name = "aiohttp", specifier = ">=3.9.0" },
{ name = "boto3", specifier = ">=1.35.0" },
{ name = "fastapi", extras = ["standard"], specifier = ">=0.115.0" },
{ name = "httpx", specifier = ">=0.28.0" },
@@ -1413,6 +1563,7 @@ requires-dist = [
{ name = "konstruct-router", editable = "packages/router" },
{ name = "konstruct-shared", editable = "packages/shared" },
{ name = "python-telegram-bot", specifier = ">=21.0" },
{ name = "pywebpush", specifier = ">=2.0.0" },
{ name = "redis", specifier = ">=5.0.0" },
{ name = "slack-bolt", specifier = ">=1.22.0" },
]
@@ -1433,7 +1584,7 @@ requires-dist = [
{ name = "fastapi", extras = ["standard"], specifier = ">=0.115.0" },
{ name = "httpx", specifier = ">=0.28.0" },
{ name = "konstruct-shared", editable = "packages/shared" },
{ name = "litellm", specifier = "==1.82.5" },
{ name = "litellm", git = "https://github.com/BerriAI/litellm.git" },
]
[[package]]
@@ -1443,20 +1594,38 @@ source = { editable = "packages/orchestrator" }
dependencies = [
{ name = "celery", extra = ["redis"] },
{ name = "fastapi", extra = ["standard"] },
{ name = "firecrawl-py" },
{ name = "google-api-python-client" },
{ name = "google-auth-oauthlib" },
{ name = "httpx" },
{ name = "jsonschema" },
{ name = "konstruct-shared" },
{ name = "openpyxl" },
{ name = "pandas" },
{ name = "pypdf" },
{ name = "python-docx" },
{ name = "python-pptx" },
{ name = "sentence-transformers" },
{ name = "youtube-transcript-api" },
]
[package.metadata]
requires-dist = [
{ name = "celery", extras = ["redis"], specifier = ">=5.4.0" },
{ name = "fastapi", extras = ["standard"], specifier = ">=0.115.0" },
{ name = "firecrawl-py", specifier = ">=4.21.0" },
{ name = "google-api-python-client", specifier = ">=2.193.0" },
{ name = "google-auth-oauthlib", specifier = ">=1.3.0" },
{ name = "httpx", specifier = ">=0.28.0" },
{ name = "jsonschema", specifier = ">=4.26.0" },
{ name = "konstruct-shared", editable = "packages/shared" },
{ name = "openpyxl", specifier = ">=3.1.5" },
{ name = "pandas", specifier = ">=3.0.1" },
{ name = "pypdf", specifier = ">=6.9.2" },
{ name = "python-docx", specifier = ">=1.2.0" },
{ name = "python-pptx", specifier = ">=1.0.2" },
{ name = "sentence-transformers", specifier = ">=3.0.0" },
{ name = "youtube-transcript-api", specifier = ">=1.2.4" },
]
[[package]]
@@ -1593,8 +1762,8 @@ wheels = [
[[package]]
name = "litellm"
version = "1.82.5"
source = { registry = "https://pypi.org/simple" }
version = "1.82.6"
source = { git = "https://github.com/BerriAI/litellm.git#f9d29e4e4e33e6b8d2181aa602111170f3c5e427" }
dependencies = [
{ name = "aiohttp" },
{ name = "click" },
@@ -1609,9 +1778,85 @@ dependencies = [
{ name = "tiktoken" },
{ name = "tokenizers" },
]
sdist = { url = "https://files.pythonhosted.org/packages/d7/f0/ec42ee14b388ce1d08a1df638f894ed7f1e6ac35b9daf0588ff7f7d52262/litellm-1.82.5.tar.gz", hash = "sha256:7988a9b48c8ccd9e5ebced80a4dfce9ce87083b303c3f67082450a4ad6dd312f", size = 17406156, upload-time = "2026-03-21T00:03:53.239Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/68/1f/b6c8043eec81eade53a4d0e15a50b788ab0e82661e01a25e0b8536a4dca0/litellm-1.82.5-py3-none-any.whl", hash = "sha256:e1012ab816352215c4e00776dd48b0c68058b537888a8ff82cca62af19e6fb11", size = 15589652, upload-time = "2026-03-21T00:03:48.87Z" },
]
[[package]]
name = "lxml"
version = "6.0.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/aa/88/262177de60548e5a2bfc46ad28232c9e9cbde697bd94132aeb80364675cb/lxml-6.0.2.tar.gz", hash = "sha256:cd79f3367bd74b317dda655dc8fcfa304d9eb6e4fb06b7168c5cf27f96e0cd62", size = 4073426, upload-time = "2025-09-22T04:04:59.287Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f3/c8/8ff2bc6b920c84355146cd1ab7d181bc543b89241cfb1ebee824a7c81457/lxml-6.0.2-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:a59f5448ba2ceccd06995c95ea59a7674a10de0810f2ce90c9006f3cbc044456", size = 8661887, upload-time = "2025-09-22T04:01:17.265Z" },
{ url = "https://files.pythonhosted.org/packages/37/6f/9aae1008083bb501ef63284220ce81638332f9ccbfa53765b2b7502203cf/lxml-6.0.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:e8113639f3296706fbac34a30813929e29247718e88173ad849f57ca59754924", size = 4667818, upload-time = "2025-09-22T04:01:19.688Z" },
{ url = "https://files.pythonhosted.org/packages/f1/ca/31fb37f99f37f1536c133476674c10b577e409c0a624384147653e38baf2/lxml-6.0.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:a8bef9b9825fa8bc816a6e641bb67219489229ebc648be422af695f6e7a4fa7f", size = 4950807, upload-time = "2025-09-22T04:01:21.487Z" },
{ url = "https://files.pythonhosted.org/packages/da/87/f6cb9442e4bada8aab5ae7e1046264f62fdbeaa6e3f6211b93f4c0dd97f1/lxml-6.0.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:65ea18d710fd14e0186c2f973dc60bb52039a275f82d3c44a0e42b43440ea534", size = 5109179, upload-time = "2025-09-22T04:01:23.32Z" },
{ url = "https://files.pythonhosted.org/packages/c8/20/a7760713e65888db79bbae4f6146a6ae5c04e4a204a3c48896c408cd6ed2/lxml-6.0.2-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c371aa98126a0d4c739ca93ceffa0fd7a5d732e3ac66a46e74339acd4d334564", size = 5023044, upload-time = "2025-09-22T04:01:25.118Z" },
{ url = "https://files.pythonhosted.org/packages/a2/b0/7e64e0460fcb36471899f75831509098f3fd7cd02a3833ac517433cb4f8f/lxml-6.0.2-cp312-cp312-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:700efd30c0fa1a3581d80a748157397559396090a51d306ea59a70020223d16f", size = 5359685, upload-time = "2025-09-22T04:01:27.398Z" },
{ url = "https://files.pythonhosted.org/packages/b9/e1/e5df362e9ca4e2f48ed6411bd4b3a0ae737cc842e96877f5bf9428055ab4/lxml-6.0.2-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c33e66d44fe60e72397b487ee92e01da0d09ba2d66df8eae42d77b6d06e5eba0", size = 5654127, upload-time = "2025-09-22T04:01:29.629Z" },
{ url = "https://files.pythonhosted.org/packages/c6/d1/232b3309a02d60f11e71857778bfcd4acbdb86c07db8260caf7d008b08f8/lxml-6.0.2-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:90a345bbeaf9d0587a3aaffb7006aa39ccb6ff0e96a57286c0cb2fd1520ea192", size = 5253958, upload-time = "2025-09-22T04:01:31.535Z" },
{ url = "https://files.pythonhosted.org/packages/35/35/d955a070994725c4f7d80583a96cab9c107c57a125b20bb5f708fe941011/lxml-6.0.2-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:064fdadaf7a21af3ed1dcaa106b854077fbeada827c18f72aec9346847cd65d0", size = 4711541, upload-time = "2025-09-22T04:01:33.801Z" },
{ url = "https://files.pythonhosted.org/packages/1e/be/667d17363b38a78c4bd63cfd4b4632029fd68d2c2dc81f25ce9eb5224dd5/lxml-6.0.2-cp312-cp312-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fbc74f42c3525ac4ffa4b89cbdd00057b6196bcefe8bce794abd42d33a018092", size = 5267426, upload-time = "2025-09-22T04:01:35.639Z" },
{ url = "https://files.pythonhosted.org/packages/ea/47/62c70aa4a1c26569bc958c9ca86af2bb4e1f614e8c04fb2989833874f7ae/lxml-6.0.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6ddff43f702905a4e32bc24f3f2e2edfe0f8fde3277d481bffb709a4cced7a1f", size = 5064917, upload-time = "2025-09-22T04:01:37.448Z" },
{ url = "https://files.pythonhosted.org/packages/bd/55/6ceddaca353ebd0f1908ef712c597f8570cc9c58130dbb89903198e441fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:6da5185951d72e6f5352166e3da7b0dc27aa70bd1090b0eb3f7f7212b53f1bb8", size = 4788795, upload-time = "2025-09-22T04:01:39.165Z" },
{ url = "https://files.pythonhosted.org/packages/cf/e8/fd63e15da5e3fd4c2146f8bbb3c14e94ab850589beab88e547b2dbce22e1/lxml-6.0.2-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:57a86e1ebb4020a38d295c04fc79603c7899e0df71588043eb218722dabc087f", size = 5676759, upload-time = "2025-09-22T04:01:41.506Z" },
{ url = "https://files.pythonhosted.org/packages/76/47/b3ec58dc5c374697f5ba37412cd2728f427d056315d124dd4b61da381877/lxml-6.0.2-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2047d8234fe735ab77802ce5f2297e410ff40f5238aec569ad7c8e163d7b19a6", size = 5255666, upload-time = "2025-09-22T04:01:43.363Z" },
{ url = "https://files.pythonhosted.org/packages/19/93/03ba725df4c3d72afd9596eef4a37a837ce8e4806010569bedfcd2cb68fd/lxml-6.0.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:6f91fd2b2ea15a6800c8e24418c0775a1694eefc011392da73bc6cef2623b322", size = 5277989, upload-time = "2025-09-22T04:01:45.215Z" },
{ url = "https://files.pythonhosted.org/packages/c6/80/c06de80bfce881d0ad738576f243911fccf992687ae09fd80b734712b39c/lxml-6.0.2-cp312-cp312-win32.whl", hash = "sha256:3ae2ce7d6fedfb3414a2b6c5e20b249c4c607f72cb8d2bb7cc9c6ec7c6f4e849", size = 3611456, upload-time = "2025-09-22T04:01:48.243Z" },
{ url = "https://files.pythonhosted.org/packages/f7/d7/0cdfb6c3e30893463fb3d1e52bc5f5f99684a03c29a0b6b605cfae879cd5/lxml-6.0.2-cp312-cp312-win_amd64.whl", hash = "sha256:72c87e5ee4e58a8354fb9c7c84cbf95a1c8236c127a5d1b7683f04bed8361e1f", size = 4011793, upload-time = "2025-09-22T04:01:50.042Z" },
{ url = "https://files.pythonhosted.org/packages/ea/7b/93c73c67db235931527301ed3785f849c78991e2e34f3fd9a6663ffda4c5/lxml-6.0.2-cp312-cp312-win_arm64.whl", hash = "sha256:61cb10eeb95570153e0c0e554f58df92ecf5109f75eacad4a95baa709e26c3d6", size = 3672836, upload-time = "2025-09-22T04:01:52.145Z" },
{ url = "https://files.pythonhosted.org/packages/53/fd/4e8f0540608977aea078bf6d79f128e0e2c2bba8af1acf775c30baa70460/lxml-6.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:9b33d21594afab46f37ae58dfadd06636f154923c4e8a4d754b0127554eb2e77", size = 8648494, upload-time = "2025-09-22T04:01:54.242Z" },
{ url = "https://files.pythonhosted.org/packages/5d/f4/2a94a3d3dfd6c6b433501b8d470a1960a20ecce93245cf2db1706adf6c19/lxml-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:6c8963287d7a4c5c9a432ff487c52e9c5618667179c18a204bdedb27310f022f", size = 4661146, upload-time = "2025-09-22T04:01:56.282Z" },
{ url = "https://files.pythonhosted.org/packages/25/2e/4efa677fa6b322013035d38016f6ae859d06cac67437ca7dc708a6af7028/lxml-6.0.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:1941354d92699fb5ffe6ed7b32f9649e43c2feb4b97205f75866f7d21aa91452", size = 4946932, upload-time = "2025-09-22T04:01:58.989Z" },
{ url = "https://files.pythonhosted.org/packages/ce/0f/526e78a6d38d109fdbaa5049c62e1d32fdd70c75fb61c4eadf3045d3d124/lxml-6.0.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:bb2f6ca0ae2d983ded09357b84af659c954722bbf04dea98030064996d156048", size = 5100060, upload-time = "2025-09-22T04:02:00.812Z" },
{ url = "https://files.pythonhosted.org/packages/81/76/99de58d81fa702cc0ea7edae4f4640416c2062813a00ff24bd70ac1d9c9b/lxml-6.0.2-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:eb2a12d704f180a902d7fa778c6d71f36ceb7b0d317f34cdc76a5d05aa1dd1df", size = 5019000, upload-time = "2025-09-22T04:02:02.671Z" },
{ url = "https://files.pythonhosted.org/packages/b5/35/9e57d25482bc9a9882cb0037fdb9cc18f4b79d85df94fa9d2a89562f1d25/lxml-6.0.2-cp313-cp313-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:6ec0e3f745021bfed19c456647f0298d60a24c9ff86d9d051f52b509663feeb1", size = 5348496, upload-time = "2025-09-22T04:02:04.904Z" },
{ url = "https://files.pythonhosted.org/packages/a6/8e/cb99bd0b83ccc3e8f0f528e9aa1f7a9965dfec08c617070c5db8d63a87ce/lxml-6.0.2-cp313-cp313-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:846ae9a12d54e368933b9759052d6206a9e8b250291109c48e350c1f1f49d916", size = 5643779, upload-time = "2025-09-22T04:02:06.689Z" },
{ url = "https://files.pythonhosted.org/packages/d0/34/9e591954939276bb679b73773836c6684c22e56d05980e31d52a9a8deb18/lxml-6.0.2-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef9266d2aa545d7374938fb5c484531ef5a2ec7f2d573e62f8ce722c735685fd", size = 5244072, upload-time = "2025-09-22T04:02:08.587Z" },
{ url = "https://files.pythonhosted.org/packages/8d/27/b29ff065f9aaca443ee377aff699714fcbffb371b4fce5ac4ca759e436d5/lxml-6.0.2-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:4077b7c79f31755df33b795dc12119cb557a0106bfdab0d2c2d97bd3cf3dffa6", size = 4718675, upload-time = "2025-09-22T04:02:10.783Z" },
{ url = "https://files.pythonhosted.org/packages/2b/9f/f756f9c2cd27caa1a6ef8c32ae47aadea697f5c2c6d07b0dae133c244fbe/lxml-6.0.2-cp313-cp313-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a7c5d5e5f1081955358533be077166ee97ed2571d6a66bdba6ec2f609a715d1a", size = 5255171, upload-time = "2025-09-22T04:02:12.631Z" },
{ url = "https://files.pythonhosted.org/packages/61/46/bb85ea42d2cb1bd8395484fd72f38e3389611aa496ac7772da9205bbda0e/lxml-6.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:8f8d0cbd0674ee89863a523e6994ac25fd5be9c8486acfc3e5ccea679bad2679", size = 5057175, upload-time = "2025-09-22T04:02:14.718Z" },
{ url = "https://files.pythonhosted.org/packages/95/0c/443fc476dcc8e41577f0af70458c50fe299a97bb6b7505bb1ae09aa7f9ac/lxml-6.0.2-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:2cbcbf6d6e924c28f04a43f3b6f6e272312a090f269eff68a2982e13e5d57659", size = 4785688, upload-time = "2025-09-22T04:02:16.957Z" },
{ url = "https://files.pythonhosted.org/packages/48/78/6ef0b359d45bb9697bc5a626e1992fa5d27aa3f8004b137b2314793b50a0/lxml-6.0.2-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:dfb874cfa53340009af6bdd7e54ebc0d21012a60a4e65d927c2e477112e63484", size = 5660655, upload-time = "2025-09-22T04:02:18.815Z" },
{ url = "https://files.pythonhosted.org/packages/ff/ea/e1d33808f386bc1339d08c0dcada6e4712d4ed8e93fcad5f057070b7988a/lxml-6.0.2-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:fb8dae0b6b8b7f9e96c26fdd8121522ce5de9bb5538010870bd538683d30e9a2", size = 5247695, upload-time = "2025-09-22T04:02:20.593Z" },
{ url = "https://files.pythonhosted.org/packages/4f/47/eba75dfd8183673725255247a603b4ad606f4ae657b60c6c145b381697da/lxml-6.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:358d9adae670b63e95bc59747c72f4dc97c9ec58881d4627fe0120da0f90d314", size = 5269841, upload-time = "2025-09-22T04:02:22.489Z" },
{ url = "https://files.pythonhosted.org/packages/76/04/5c5e2b8577bc936e219becb2e98cdb1aca14a4921a12995b9d0c523502ae/lxml-6.0.2-cp313-cp313-win32.whl", hash = "sha256:e8cd2415f372e7e5a789d743d133ae474290a90b9023197fd78f32e2dc6873e2", size = 3610700, upload-time = "2025-09-22T04:02:24.465Z" },
{ url = "https://files.pythonhosted.org/packages/fe/0a/4643ccc6bb8b143e9f9640aa54e38255f9d3b45feb2cbe7ae2ca47e8782e/lxml-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:b30d46379644fbfc3ab81f8f82ae4de55179414651f110a1514f0b1f8f6cb2d7", size = 4010347, upload-time = "2025-09-22T04:02:26.286Z" },
{ url = "https://files.pythonhosted.org/packages/31/ef/dcf1d29c3f530577f61e5fe2f1bd72929acf779953668a8a47a479ae6f26/lxml-6.0.2-cp313-cp313-win_arm64.whl", hash = "sha256:13dcecc9946dca97b11b7c40d29fba63b55ab4170d3c0cf8c0c164343b9bfdcf", size = 3671248, upload-time = "2025-09-22T04:02:27.918Z" },
{ url = "https://files.pythonhosted.org/packages/03/15/d4a377b385ab693ce97b472fe0c77c2b16ec79590e688b3ccc71fba19884/lxml-6.0.2-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:b0c732aa23de8f8aec23f4b580d1e52905ef468afb4abeafd3fec77042abb6fe", size = 8659801, upload-time = "2025-09-22T04:02:30.113Z" },
{ url = "https://files.pythonhosted.org/packages/c8/e8/c128e37589463668794d503afaeb003987373c5f94d667124ffd8078bbd9/lxml-6.0.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4468e3b83e10e0317a89a33d28f7aeba1caa4d1a6fd457d115dd4ffe90c5931d", size = 4659403, upload-time = "2025-09-22T04:02:32.119Z" },
{ url = "https://files.pythonhosted.org/packages/00/ce/74903904339decdf7da7847bb5741fc98a5451b42fc419a86c0c13d26fe2/lxml-6.0.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:abd44571493973bad4598a3be7e1d807ed45aa2adaf7ab92ab7c62609569b17d", size = 4966974, upload-time = "2025-09-22T04:02:34.155Z" },
{ url = "https://files.pythonhosted.org/packages/1f/d3/131dec79ce61c5567fecf82515bd9bc36395df42501b50f7f7f3bd065df0/lxml-6.0.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:370cd78d5855cfbffd57c422851f7d3864e6ae72d0da615fca4dad8c45d375a5", size = 5102953, upload-time = "2025-09-22T04:02:36.054Z" },
{ url = "https://files.pythonhosted.org/packages/3a/ea/a43ba9bb750d4ffdd885f2cd333572f5bb900cd2408b67fdda07e85978a0/lxml-6.0.2-cp314-cp314-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:901e3b4219fa04ef766885fb40fa516a71662a4c61b80c94d25336b4934b71c0", size = 5055054, upload-time = "2025-09-22T04:02:38.154Z" },
{ url = "https://files.pythonhosted.org/packages/60/23/6885b451636ae286c34628f70a7ed1fcc759f8d9ad382d132e1c8d3d9bfd/lxml-6.0.2-cp314-cp314-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:a4bf42d2e4cf52c28cc1812d62426b9503cdb0c87a6de81442626aa7d69707ba", size = 5352421, upload-time = "2025-09-22T04:02:40.413Z" },
{ url = "https://files.pythonhosted.org/packages/48/5b/fc2ddfc94ddbe3eebb8e9af6e3fd65e2feba4967f6a4e9683875c394c2d8/lxml-6.0.2-cp314-cp314-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2c7fdaa4d7c3d886a42534adec7cfac73860b89b4e5298752f60aa5984641a0", size = 5673684, upload-time = "2025-09-22T04:02:42.288Z" },
{ url = "https://files.pythonhosted.org/packages/29/9c/47293c58cc91769130fbf85531280e8cc7868f7fbb6d92f4670071b9cb3e/lxml-6.0.2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98a5e1660dc7de2200b00d53fa00bcd3c35a3608c305d45a7bbcaf29fa16e83d", size = 5252463, upload-time = "2025-09-22T04:02:44.165Z" },
{ url = "https://files.pythonhosted.org/packages/9b/da/ba6eceb830c762b48e711ded880d7e3e89fc6c7323e587c36540b6b23c6b/lxml-6.0.2-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:dc051506c30b609238d79eda75ee9cab3e520570ec8219844a72a46020901e37", size = 4698437, upload-time = "2025-09-22T04:02:46.524Z" },
{ url = "https://files.pythonhosted.org/packages/a5/24/7be3f82cb7990b89118d944b619e53c656c97dc89c28cfb143fdb7cd6f4d/lxml-6.0.2-cp314-cp314-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:8799481bbdd212470d17513a54d568f44416db01250f49449647b5ab5b5dccb9", size = 5269890, upload-time = "2025-09-22T04:02:48.812Z" },
{ url = "https://files.pythonhosted.org/packages/1b/bd/dcfb9ea1e16c665efd7538fc5d5c34071276ce9220e234217682e7d2c4a5/lxml-6.0.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9261bb77c2dab42f3ecd9103951aeca2c40277701eb7e912c545c1b16e0e4917", size = 5097185, upload-time = "2025-09-22T04:02:50.746Z" },
{ url = "https://files.pythonhosted.org/packages/21/04/a60b0ff9314736316f28316b694bccbbabe100f8483ad83852d77fc7468e/lxml-6.0.2-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:65ac4a01aba353cfa6d5725b95d7aed6356ddc0a3cd734de00124d285b04b64f", size = 4745895, upload-time = "2025-09-22T04:02:52.968Z" },
{ url = "https://files.pythonhosted.org/packages/d6/bd/7d54bd1846e5a310d9c715921c5faa71cf5c0853372adf78aee70c8d7aa2/lxml-6.0.2-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:b22a07cbb82fea98f8a2fd814f3d1811ff9ed76d0fc6abc84eb21527596e7cc8", size = 5695246, upload-time = "2025-09-22T04:02:54.798Z" },
{ url = "https://files.pythonhosted.org/packages/fd/32/5643d6ab947bc371da21323acb2a6e603cedbe71cb4c99c8254289ab6f4e/lxml-6.0.2-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:d759cdd7f3e055d6bc8d9bec3ad905227b2e4c785dc16c372eb5b5e83123f48a", size = 5260797, upload-time = "2025-09-22T04:02:57.058Z" },
{ url = "https://files.pythonhosted.org/packages/33/da/34c1ec4cff1eea7d0b4cd44af8411806ed943141804ac9c5d565302afb78/lxml-6.0.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:945da35a48d193d27c188037a05fec5492937f66fb1958c24fc761fb9d40d43c", size = 5277404, upload-time = "2025-09-22T04:02:58.966Z" },
{ url = "https://files.pythonhosted.org/packages/82/57/4eca3e31e54dc89e2c3507e1cd411074a17565fa5ffc437c4ae0a00d439e/lxml-6.0.2-cp314-cp314-win32.whl", hash = "sha256:be3aaa60da67e6153eb15715cc2e19091af5dc75faef8b8a585aea372507384b", size = 3670072, upload-time = "2025-09-22T04:03:38.05Z" },
{ url = "https://files.pythonhosted.org/packages/e3/e0/c96cf13eccd20c9421ba910304dae0f619724dcf1702864fd59dd386404d/lxml-6.0.2-cp314-cp314-win_amd64.whl", hash = "sha256:fa25afbadead523f7001caf0c2382afd272c315a033a7b06336da2637d92d6ed", size = 4080617, upload-time = "2025-09-22T04:03:39.835Z" },
{ url = "https://files.pythonhosted.org/packages/d5/5d/b3f03e22b3d38d6f188ef044900a9b29b2fe0aebb94625ce9fe244011d34/lxml-6.0.2-cp314-cp314-win_arm64.whl", hash = "sha256:063eccf89df5b24e361b123e257e437f9e9878f425ee9aae3144c77faf6da6d8", size = 3754930, upload-time = "2025-09-22T04:03:41.565Z" },
{ url = "https://files.pythonhosted.org/packages/5e/5c/42c2c4c03554580708fc738d13414801f340c04c3eff90d8d2d227145275/lxml-6.0.2-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:6162a86d86893d63084faaf4ff937b3daea233e3682fb4474db07395794fa80d", size = 8910380, upload-time = "2025-09-22T04:03:01.645Z" },
{ url = "https://files.pythonhosted.org/packages/bf/4f/12df843e3e10d18d468a7557058f8d3733e8b6e12401f30b1ef29360740f/lxml-6.0.2-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:414aaa94e974e23a3e92e7ca5b97d10c0cf37b6481f50911032c69eeb3991bba", size = 4775632, upload-time = "2025-09-22T04:03:03.814Z" },
{ url = "https://files.pythonhosted.org/packages/e4/0c/9dc31e6c2d0d418483cbcb469d1f5a582a1cd00a1f4081953d44051f3c50/lxml-6.0.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:48461bd21625458dd01e14e2c38dd0aea69addc3c4f960c30d9f59d7f93be601", size = 4975171, upload-time = "2025-09-22T04:03:05.651Z" },
{ url = "https://files.pythonhosted.org/packages/e7/2b/9b870c6ca24c841bdd887504808f0417aa9d8d564114689266f19ddf29c8/lxml-6.0.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:25fcc59afc57d527cfc78a58f40ab4c9b8fd096a9a3f964d2781ffb6eb33f4ed", size = 5110109, upload-time = "2025-09-22T04:03:07.452Z" },
{ url = "https://files.pythonhosted.org/packages/bf/0c/4f5f2a4dd319a178912751564471355d9019e220c20d7db3fb8307ed8582/lxml-6.0.2-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5179c60288204e6ddde3f774a93350177e08876eaf3ab78aa3a3649d43eb7d37", size = 5041061, upload-time = "2025-09-22T04:03:09.297Z" },
{ url = "https://files.pythonhosted.org/packages/12/64/554eed290365267671fe001a20d72d14f468ae4e6acef1e179b039436967/lxml-6.0.2-cp314-cp314t-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:967aab75434de148ec80597b75062d8123cadf2943fb4281f385141e18b21338", size = 5306233, upload-time = "2025-09-22T04:03:11.651Z" },
{ url = "https://files.pythonhosted.org/packages/7a/31/1d748aa275e71802ad9722df32a7a35034246b42c0ecdd8235412c3396ef/lxml-6.0.2-cp314-cp314t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:d100fcc8930d697c6561156c6810ab4a508fb264c8b6779e6e61e2ed5e7558f9", size = 5604739, upload-time = "2025-09-22T04:03:13.592Z" },
{ url = "https://files.pythonhosted.org/packages/8f/41/2c11916bcac09ed561adccacceaedd2bf0e0b25b297ea92aab99fd03d0fa/lxml-6.0.2-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2ca59e7e13e5981175b8b3e4ab84d7da57993eeff53c07764dcebda0d0e64ecd", size = 5225119, upload-time = "2025-09-22T04:03:15.408Z" },
{ url = "https://files.pythonhosted.org/packages/99/05/4e5c2873d8f17aa018e6afde417c80cc5d0c33be4854cce3ef5670c49367/lxml-6.0.2-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:957448ac63a42e2e49531b9d6c0fa449a1970dbc32467aaad46f11545be9af1d", size = 4633665, upload-time = "2025-09-22T04:03:17.262Z" },
{ url = "https://files.pythonhosted.org/packages/0f/c9/dcc2da1bebd6275cdc723b515f93edf548b82f36a5458cca3578bc899332/lxml-6.0.2-cp314-cp314t-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b7fc49c37f1786284b12af63152fe1d0990722497e2d5817acfe7a877522f9a9", size = 5234997, upload-time = "2025-09-22T04:03:19.14Z" },
{ url = "https://files.pythonhosted.org/packages/9c/e2/5172e4e7468afca64a37b81dba152fc5d90e30f9c83c7c3213d6a02a5ce4/lxml-6.0.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e19e0643cc936a22e837f79d01a550678da8377d7d801a14487c10c34ee49c7e", size = 5090957, upload-time = "2025-09-22T04:03:21.436Z" },
{ url = "https://files.pythonhosted.org/packages/a5/b3/15461fd3e5cd4ddcb7938b87fc20b14ab113b92312fc97afe65cd7c85de1/lxml-6.0.2-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:1db01e5cf14345628e0cbe71067204db658e2fb8e51e7f33631f5f4735fefd8d", size = 4764372, upload-time = "2025-09-22T04:03:23.27Z" },
{ url = "https://files.pythonhosted.org/packages/05/33/f310b987c8bf9e61c4dd8e8035c416bd3230098f5e3cfa69fc4232de7059/lxml-6.0.2-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:875c6b5ab39ad5291588aed6925fac99d0097af0dd62f33c7b43736043d4a2ec", size = 5634653, upload-time = "2025-09-22T04:03:25.767Z" },
{ url = "https://files.pythonhosted.org/packages/70/ff/51c80e75e0bc9382158133bdcf4e339b5886c6ee2418b5199b3f1a61ed6d/lxml-6.0.2-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:cdcbed9ad19da81c480dfd6dd161886db6096083c9938ead313d94b30aadf272", size = 5233795, upload-time = "2025-09-22T04:03:27.62Z" },
{ url = "https://files.pythonhosted.org/packages/56/4d/4856e897df0d588789dd844dbed9d91782c4ef0b327f96ce53c807e13128/lxml-6.0.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:80dadc234ebc532e09be1975ff538d154a7fa61ea5031c03d25178855544728f", size = 5257023, upload-time = "2025-09-22T04:03:30.056Z" },
{ url = "https://files.pythonhosted.org/packages/0f/85/86766dfebfa87bea0ab78e9ff7a4b4b45225df4b4d3b8cc3c03c5cd68464/lxml-6.0.2-cp314-cp314t-win32.whl", hash = "sha256:da08e7bb297b04e893d91087df19638dc7a6bb858a954b0cc2b9f5053c922312", size = 3911420, upload-time = "2025-09-22T04:03:32.198Z" },
{ url = "https://files.pythonhosted.org/packages/fe/1a/b248b355834c8e32614650b8008c69ffeb0ceb149c793961dd8c0b991bb3/lxml-6.0.2-cp314-cp314t-win_amd64.whl", hash = "sha256:252a22982dca42f6155125ac76d3432e548a7625d56f5a273ee78a5057216eca", size = 4406837, upload-time = "2025-09-22T04:03:34.027Z" },
{ url = "https://files.pythonhosted.org/packages/92/aa/df863bcc39c5e0946263454aba394de8a9084dbaff8ad143846b0d844739/lxml-6.0.2-cp314-cp314t-win_arm64.whl", hash = "sha256:bb4c1847b303835d89d785a18801a883436cdfd5dc3d62947f9c49e24f0f5a2c", size = 3822205, upload-time = "2025-09-22T04:03:36.249Z" },
]
[[package]]
@@ -1860,6 +2105,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/79/7b/2c79738432f5c924bef5071f933bcc9efd0473bac3b4aa584a6f7c1c8df8/mypy_extensions-1.1.0-py3-none-any.whl", hash = "sha256:1be4cccdb0f2482337c4743e60421de3a356cd97508abadd57d47403e94f5505", size = 4963, upload-time = "2025-04-22T14:54:22.983Z" },
]
[[package]]
name = "nest-asyncio"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/83/f8/51569ac65d696c8ecbee95938f89d4abf00f47d58d48f6fbabfe8f0baefe/nest_asyncio-1.6.0.tar.gz", hash = "sha256:6f172d5449aca15afd6c646851f4e31e02c598d553a667e38cafa997cfec55fe", size = 7418, upload-time = "2024-01-21T14:25:19.227Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a0/c4/c2971a3ba4c6103a3d10c4b0f24f461ddc027f0f09763220cf35ca1401b3/nest_asyncio-1.6.0-py3-none-any.whl", hash = "sha256:87af6efd6b5e897c81050477ef65c62e2b2f35d51703cae01aff2905b1852e1c", size = 5195, upload-time = "2024-01-21T14:25:17.223Z" },
]
[[package]]
name = "networkx"
version = "3.6.1"
@@ -1971,7 +2225,7 @@ name = "nvidia-cudnn-cu13"
version = "9.19.0.56"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "nvidia-cublas" },
{ name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201, upload-time = "2026-02-03T20:40:53.805Z" },
@@ -1983,7 +2237,7 @@ name = "nvidia-cufft"
version = "12.0.0.61"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "nvidia-nvjitlink" },
{ name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554, upload-time = "2025-09-04T08:31:38.196Z" },
@@ -2013,9 +2267,9 @@ name = "nvidia-cusolver"
version = "12.0.4.66"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "nvidia-cublas" },
{ name = "nvidia-cusparse" },
{ name = "nvidia-nvjitlink" },
{ name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
{ name = "nvidia-cusparse", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
{ name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760, upload-time = "2025-09-04T08:33:04.222Z" },
@@ -2027,7 +2281,7 @@ name = "nvidia-cusparse"
version = "12.6.3.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "nvidia-nvjitlink" },
{ name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568, upload-time = "2025-09-04T08:33:42.864Z" },
@@ -2079,6 +2333,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a8/64/3708a90d1ebe202ffdeb7185f878a3c84d15c2b2c31858da2ce0583e2def/nvidia_nvtx-13.0.85-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb7780edb6b14107373c835bf8b72e7a178bac7367e23da7acb108f973f157a6", size = 148878, upload-time = "2025-09-04T08:28:53.627Z" },
]
[[package]]
name = "oauthlib"
version = "3.3.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/0b/5f/19930f824ffeb0ad4372da4812c50edbd1434f678c90c2733e1188edfc63/oauthlib-3.3.1.tar.gz", hash = "sha256:0f0f8aa759826a193cf66c12ea1af1637f87b9b4622d46e866952bb022e538c9", size = 185918, upload-time = "2025-06-19T22:48:08.269Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/be/9c/92789c596b8df838baa98fa71844d84283302f7604ed565dafe5a6b5041a/oauthlib-3.3.1-py3-none-any.whl", hash = "sha256:88119c938d2b8fb88561af5f6ee0eec8cc8d552b7bb1f712743136eb7523b7a1", size = 160065, upload-time = "2025-06-19T22:48:06.508Z" },
]
[[package]]
name = "openai"
version = "2.29.0"
@@ -2098,6 +2361,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d0/b1/35b6f9c8cf9318e3dbb7146cc82dab4cf61182a8d5406fc9b50864362895/openai-2.29.0-py3-none-any.whl", hash = "sha256:b7c5de513c3286d17c5e29b92c4c98ceaf0d775244ac8159aeb1bddf840eb42a", size = 1141533, upload-time = "2026-03-17T17:53:47.348Z" },
]
[[package]]
name = "openpyxl"
version = "3.1.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "et-xmlfile" },
]
sdist = { url = "https://files.pythonhosted.org/packages/3d/f9/88d94a75de065ea32619465d2f77b29a0469500e99012523b91cc4141cd1/openpyxl-3.1.5.tar.gz", hash = "sha256:cf0e3cf56142039133628b5acffe8ef0c12bc902d2aadd3e0fe5878dc08d1050", size = 186464, upload-time = "2024-06-28T14:03:44.161Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c0/da/977ded879c29cbd04de313843e76868e6e13408a94ed6b987245dc7c8506/openpyxl-3.1.5-py2.py3-none-any.whl", hash = "sha256:5282c12b107bffeef825f4617dc029afaf41d0ea60823bbb665ef3079dc79de2", size = 250910, upload-time = "2024-06-28T14:03:41.161Z" },
]
[[package]]
name = "packaging"
version = "26.0"
@@ -2107,6 +2382,58 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
]
[[package]]
name = "pandas"
version = "3.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
{ name = "python-dateutil" },
{ name = "tzdata", marker = "sys_platform == 'emscripten' or sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/2e/0c/b28ed414f080ee0ad153f848586d61d1878f91689950f037f976ce15f6c8/pandas-3.0.1.tar.gz", hash = "sha256:4186a699674af418f655dbd420ed87f50d56b4cd6603784279d9eef6627823c8", size = 4641901, upload-time = "2026-02-17T22:20:16.434Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/37/51/b467209c08dae2c624873d7491ea47d2b47336e5403309d433ea79c38571/pandas-3.0.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:476f84f8c20c9f5bc47252b66b4bb25e1a9fc2fa98cead96744d8116cb85771d", size = 10344357, upload-time = "2026-02-17T22:18:38.262Z" },
{ url = "https://files.pythonhosted.org/packages/7c/f1/e2567ffc8951ab371db2e40b2fe068e36b81d8cf3260f06ae508700e5504/pandas-3.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:0ab749dfba921edf641d4036c4c21c0b3ea70fea478165cb98a998fb2a261955", size = 9884543, upload-time = "2026-02-17T22:18:41.476Z" },
{ url = "https://files.pythonhosted.org/packages/d7/39/327802e0b6d693182403c144edacbc27eb82907b57062f23ef5a4c4a5ea7/pandas-3.0.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b8e36891080b87823aff3640c78649b91b8ff6eea3c0d70aeabd72ea43ab069b", size = 10396030, upload-time = "2026-02-17T22:18:43.822Z" },
{ url = "https://files.pythonhosted.org/packages/3d/fe/89d77e424365280b79d99b3e1e7d606f5165af2f2ecfaf0c6d24c799d607/pandas-3.0.1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:532527a701281b9dd371e2f582ed9094f4c12dd9ffb82c0c54ee28d8ac9520c4", size = 10876435, upload-time = "2026-02-17T22:18:45.954Z" },
{ url = "https://files.pythonhosted.org/packages/b5/a6/2a75320849dd154a793f69c951db759aedb8d1dd3939eeacda9bdcfa1629/pandas-3.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:356e5c055ed9b0da1580d465657bc7d00635af4fd47f30afb23025352ba764d1", size = 11405133, upload-time = "2026-02-17T22:18:48.533Z" },
{ url = "https://files.pythonhosted.org/packages/58/53/1d68fafb2e02d7881df66aa53be4cd748d25cbe311f3b3c85c93ea5d30ca/pandas-3.0.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:9d810036895f9ad6345b8f2a338dd6998a74e8483847403582cab67745bff821", size = 11932065, upload-time = "2026-02-17T22:18:50.837Z" },
{ url = "https://files.pythonhosted.org/packages/75/08/67cc404b3a966b6df27b38370ddd96b3b023030b572283d035181854aac5/pandas-3.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:536232a5fe26dd989bd633e7a0c450705fdc86a207fec7254a55e9a22950fe43", size = 9741627, upload-time = "2026-02-17T22:18:53.905Z" },
{ url = "https://files.pythonhosted.org/packages/86/4f/caf9952948fb00d23795f09b893d11f1cacb384e666854d87249530f7cbe/pandas-3.0.1-cp312-cp312-win_arm64.whl", hash = "sha256:0f463ebfd8de7f326d38037c7363c6dacb857c5881ab8961fb387804d6daf2f7", size = 9052483, upload-time = "2026-02-17T22:18:57.31Z" },
{ url = "https://files.pythonhosted.org/packages/0b/48/aad6ec4f8d007534c091e9a7172b3ec1b1ee6d99a9cbb936b5eab6c6cf58/pandas-3.0.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:5272627187b5d9c20e55d27caf5f2cd23e286aba25cadf73c8590e432e2b7262", size = 10317509, upload-time = "2026-02-17T22:18:59.498Z" },
{ url = "https://files.pythonhosted.org/packages/a8/14/5990826f779f79148ae9d3a2c39593dc04d61d5d90541e71b5749f35af95/pandas-3.0.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:661e0f665932af88c7877f31da0dc743fe9c8f2524bdffe23d24fdcb67ef9d56", size = 9860561, upload-time = "2026-02-17T22:19:02.265Z" },
{ url = "https://files.pythonhosted.org/packages/fa/80/f01ff54664b6d70fed71475543d108a9b7c888e923ad210795bef04ffb7d/pandas-3.0.1-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:75e6e292ff898679e47a2199172593d9f6107fd2dd3617c22c2946e97d5df46e", size = 10365506, upload-time = "2026-02-17T22:19:05.017Z" },
{ url = "https://files.pythonhosted.org/packages/f2/85/ab6d04733a7d6ff32bfc8382bf1b07078228f5d6ebec5266b91bfc5c4ff7/pandas-3.0.1-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1ff8cf1d2896e34343197685f432450ec99a85ba8d90cce2030c5eee2ef98791", size = 10873196, upload-time = "2026-02-17T22:19:07.204Z" },
{ url = "https://files.pythonhosted.org/packages/48/a9/9301c83d0b47c23ac5deab91c6b39fd98d5b5db4d93b25df8d381451828f/pandas-3.0.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:eca8b4510f6763f3d37359c2105df03a7a221a508f30e396a51d0713d462e68a", size = 11370859, upload-time = "2026-02-17T22:19:09.436Z" },
{ url = "https://files.pythonhosted.org/packages/59/fe/0c1fc5bd2d29c7db2ab372330063ad555fb83e08422829c785f5ec2176ca/pandas-3.0.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:06aff2ad6f0b94a17822cf8b83bbb563b090ed82ff4fe7712db2ce57cd50d9b8", size = 11924584, upload-time = "2026-02-17T22:19:11.562Z" },
{ url = "https://files.pythonhosted.org/packages/d6/7d/216a1588b65a7aa5f4535570418a599d943c85afb1d95b0876fc00aa1468/pandas-3.0.1-cp313-cp313-win_amd64.whl", hash = "sha256:9fea306c783e28884c29057a1d9baa11a349bbf99538ec1da44c8476563d1b25", size = 9742769, upload-time = "2026-02-17T22:19:13.926Z" },
{ url = "https://files.pythonhosted.org/packages/c4/cb/810a22a6af9a4e97c8ab1c946b47f3489c5bca5adc483ce0ffc84c9cc768/pandas-3.0.1-cp313-cp313-win_arm64.whl", hash = "sha256:a8d37a43c52917427e897cb2e429f67a449327394396a81034a4449b99afda59", size = 9043855, upload-time = "2026-02-17T22:19:16.09Z" },
{ url = "https://files.pythonhosted.org/packages/92/fa/423c89086cca1f039cf1253c3ff5b90f157b5b3757314aa635f6bf3e30aa/pandas-3.0.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:d54855f04f8246ed7b6fc96b05d4871591143c46c0b6f4af874764ed0d2d6f06", size = 10752673, upload-time = "2026-02-17T22:19:18.304Z" },
{ url = "https://files.pythonhosted.org/packages/22/23/b5a08ec1f40020397f0faba72f1e2c11f7596a6169c7b3e800abff0e433f/pandas-3.0.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:4e1b677accee34a09e0dc2ce5624e4a58a1870ffe56fc021e9caf7f23cd7668f", size = 10404967, upload-time = "2026-02-17T22:19:20.726Z" },
{ url = "https://files.pythonhosted.org/packages/5c/81/94841f1bb4afdc2b52a99daa895ac2c61600bb72e26525ecc9543d453ebc/pandas-3.0.1-cp313-cp313t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a9cabbdcd03f1b6cd254d6dda8ae09b0252524be1592594c00b7895916cb1324", size = 10320575, upload-time = "2026-02-17T22:19:24.919Z" },
{ url = "https://files.pythonhosted.org/packages/0a/8b/2ae37d66a5342a83adadfd0cb0b4bf9c3c7925424dd5f40d15d6cfaa35ee/pandas-3.0.1-cp313-cp313t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5ae2ab1f166668b41e770650101e7090824fd34d17915dd9cd479f5c5e0065e9", size = 10710921, upload-time = "2026-02-17T22:19:27.181Z" },
{ url = "https://files.pythonhosted.org/packages/a2/61/772b2e2757855e232b7ccf7cb8079a5711becb3a97f291c953def15a833f/pandas-3.0.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6bf0603c2e30e2cafac32807b06435f28741135cb8697eae8b28c7d492fc7d76", size = 11334191, upload-time = "2026-02-17T22:19:29.411Z" },
{ url = "https://files.pythonhosted.org/packages/1b/08/b16c6df3ef555d8495d1d265a7963b65be166785d28f06a350913a4fac78/pandas-3.0.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:6c426422973973cae1f4a23e51d4ae85974f44871b24844e4f7de752dd877098", size = 11782256, upload-time = "2026-02-17T22:19:32.34Z" },
{ url = "https://files.pythonhosted.org/packages/55/80/178af0594890dee17e239fca96d3d8670ba0f5ff59b7d0439850924a9c09/pandas-3.0.1-cp313-cp313t-win_amd64.whl", hash = "sha256:b03f91ae8c10a85c1613102c7bef5229b5379f343030a3ccefeca8a33414cf35", size = 10485047, upload-time = "2026-02-17T22:19:34.605Z" },
{ url = "https://files.pythonhosted.org/packages/bb/8b/4bb774a998b97e6c2fd62a9e6cfdaae133b636fd1c468f92afb4ae9a447a/pandas-3.0.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:99d0f92ed92d3083d140bf6b97774f9f13863924cf3f52a70711f4e7588f9d0a", size = 10322465, upload-time = "2026-02-17T22:19:36.803Z" },
{ url = "https://files.pythonhosted.org/packages/72/3a/5b39b51c64159f470f1ca3b1c2a87da290657ca022f7cd11442606f607d1/pandas-3.0.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:3b66857e983208654294bb6477b8a63dee26b37bdd0eb34d010556e91261784f", size = 9910632, upload-time = "2026-02-17T22:19:39.001Z" },
{ url = "https://files.pythonhosted.org/packages/4e/f7/b449ffb3f68c11da12fc06fbf6d2fa3a41c41e17d0284d23a79e1c13a7e4/pandas-3.0.1-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:56cf59638bf24dc9bdf2154c81e248b3289f9a09a6d04e63608c159022352749", size = 10440535, upload-time = "2026-02-17T22:19:41.157Z" },
{ url = "https://files.pythonhosted.org/packages/55/77/6ea82043db22cb0f2bbfe7198da3544000ddaadb12d26be36e19b03a2dc5/pandas-3.0.1-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1a9f55e0f46951874b863d1f3906dcb57df2d9be5c5847ba4dfb55b2c815249", size = 10893940, upload-time = "2026-02-17T22:19:43.493Z" },
{ url = "https://files.pythonhosted.org/packages/03/30/f1b502a72468c89412c1b882a08f6eed8a4ee9dc033f35f65d0663df6081/pandas-3.0.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:1849f0bba9c8a2fb0f691d492b834cc8dadf617e29015c66e989448d58d011ee", size = 11442711, upload-time = "2026-02-17T22:19:46.074Z" },
{ url = "https://files.pythonhosted.org/packages/0d/f0/ebb6ddd8fc049e98cabac5c2924d14d1dda26a20adb70d41ea2e428d3ec4/pandas-3.0.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:c3d288439e11b5325b02ae6e9cc83e6805a62c40c5a6220bea9beb899c073b1c", size = 11963918, upload-time = "2026-02-17T22:19:48.838Z" },
{ url = "https://files.pythonhosted.org/packages/09/f8/8ce132104074f977f907442790eaae24e27bce3b3b454e82faa3237ff098/pandas-3.0.1-cp314-cp314-win_amd64.whl", hash = "sha256:93325b0fe372d192965f4cca88d97667f49557398bbf94abdda3bf1b591dbe66", size = 9862099, upload-time = "2026-02-17T22:19:51.081Z" },
{ url = "https://files.pythonhosted.org/packages/e6/b7/6af9aac41ef2456b768ef0ae60acf8abcebb450a52043d030a65b4b7c9bd/pandas-3.0.1-cp314-cp314-win_arm64.whl", hash = "sha256:97ca08674e3287c7148f4858b01136f8bdfe7202ad25ad04fec602dd1d29d132", size = 9185333, upload-time = "2026-02-17T22:19:53.266Z" },
{ url = "https://files.pythonhosted.org/packages/66/fc/848bb6710bc6061cb0c5badd65b92ff75c81302e0e31e496d00029fe4953/pandas-3.0.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:58eeb1b2e0fb322befcf2bbc9ba0af41e616abadb3d3414a6bc7167f6cbfce32", size = 10772664, upload-time = "2026-02-17T22:19:55.806Z" },
{ url = "https://files.pythonhosted.org/packages/69/5c/866a9bbd0f79263b4b0db6ec1a341be13a1473323f05c122388e0f15b21d/pandas-3.0.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:cd9af1276b5ca9e298bd79a26bda32fa9cc87ed095b2a9a60978d2ca058eaf87", size = 10421286, upload-time = "2026-02-17T22:19:58.091Z" },
{ url = "https://files.pythonhosted.org/packages/51/a4/2058fb84fb1cfbfb2d4a6d485e1940bb4ad5716e539d779852494479c580/pandas-3.0.1-cp314-cp314t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:94f87a04984d6b63788327cd9f79dda62b7f9043909d2440ceccf709249ca988", size = 10342050, upload-time = "2026-02-17T22:20:01.376Z" },
{ url = "https://files.pythonhosted.org/packages/22/1b/674e89996cc4be74db3c4eb09240c4bb549865c9c3f5d9b086ff8fcfbf00/pandas-3.0.1-cp314-cp314t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:85fe4c4df62e1e20f9db6ebfb88c844b092c22cd5324bdcf94bfa2fc1b391221", size = 10740055, upload-time = "2026-02-17T22:20:04.328Z" },
{ url = "https://files.pythonhosted.org/packages/d0/f8/e954b750764298c22fa4614376531fe63c521ef517e7059a51f062b87dca/pandas-3.0.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:331ca75a2f8672c365ae25c0b29e46f5ac0c6551fdace8eec4cd65e4fac271ff", size = 11357632, upload-time = "2026-02-17T22:20:06.647Z" },
{ url = "https://files.pythonhosted.org/packages/6d/02/c6e04b694ffd68568297abd03588b6d30295265176a5c01b7459d3bc35a3/pandas-3.0.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:15860b1fdb1973fffade772fdb931ccf9b2f400a3f5665aef94a00445d7d8dd5", size = 11810974, upload-time = "2026-02-17T22:20:08.946Z" },
{ url = "https://files.pythonhosted.org/packages/89/41/d7dfb63d2407f12055215070c42fc6ac41b66e90a2946cdc5e759058398b/pandas-3.0.1-cp314-cp314t-win_amd64.whl", hash = "sha256:44f1364411d5670efa692b146c748f4ed013df91ee91e9bec5677fb1fd58b937", size = 10884622, upload-time = "2026-02-17T22:20:11.711Z" },
{ url = "https://files.pythonhosted.org/packages/68/b0/34937815889fa982613775e4b97fddd13250f11012d769949c5465af2150/pandas-3.0.1-cp314-cp314t-win_arm64.whl", hash = "sha256:108dd1790337a494aa80e38def654ca3f0968cf4f362c85f44c15e471667102d", size = 9452085, upload-time = "2026-02-17T22:20:14.331Z" },
]
[[package]]
name = "pathspec"
version = "1.0.4"
@@ -2128,6 +2455,75 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/5a/26/6cee8a1ce8c43625ec561aff19df07f9776b7525d9002c86bceb3e0ac970/pgvector-0.4.2-py3-none-any.whl", hash = "sha256:549d45f7a18593783d5eec609ea1684a724ba8405c4cb182a0b2b08aeff04e08", size = 27441, upload-time = "2025-12-05T01:07:16.536Z" },
]
[[package]]
name = "pillow"
version = "12.1.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/1f/42/5c74462b4fd957fcd7b13b04fb3205ff8349236ea74c7c375766d6c82288/pillow-12.1.1.tar.gz", hash = "sha256:9ad8fa5937ab05218e2b6a4cff30295ad35afd2f83ac592e68c0d871bb0fdbc4", size = 46980264, upload-time = "2026-02-11T04:23:07.146Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/d3/8df65da0d4df36b094351dce696f2989bec731d4f10e743b1c5f4da4d3bf/pillow-12.1.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ab323b787d6e18b3d91a72fc99b1a2c28651e4358749842b8f8dfacd28ef2052", size = 5262803, upload-time = "2026-02-11T04:20:47.653Z" },
{ url = "https://files.pythonhosted.org/packages/d6/71/5026395b290ff404b836e636f51d7297e6c83beceaa87c592718747e670f/pillow-12.1.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:adebb5bee0f0af4909c30db0d890c773d1a92ffe83da908e2e9e720f8edf3984", size = 4657601, upload-time = "2026-02-11T04:20:49.328Z" },
{ url = "https://files.pythonhosted.org/packages/b1/2e/1001613d941c67442f745aff0f7cc66dd8df9a9c084eb497e6a543ee6f7e/pillow-12.1.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:bb66b7cc26f50977108790e2456b7921e773f23db5630261102233eb355a3b79", size = 6234995, upload-time = "2026-02-11T04:20:51.032Z" },
{ url = "https://files.pythonhosted.org/packages/07/26/246ab11455b2549b9233dbd44d358d033a2f780fa9007b61a913c5b2d24e/pillow-12.1.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:aee2810642b2898bb187ced9b349e95d2a7272930796e022efaf12e99dccd293", size = 8045012, upload-time = "2026-02-11T04:20:52.882Z" },
{ url = "https://files.pythonhosted.org/packages/b2/8b/07587069c27be7535ac1fe33874e32de118fbd34e2a73b7f83436a88368c/pillow-12.1.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a0b1cd6232e2b618adcc54d9882e4e662a089d5768cd188f7c245b4c8c44a397", size = 6349638, upload-time = "2026-02-11T04:20:54.444Z" },
{ url = "https://files.pythonhosted.org/packages/ff/79/6df7b2ee763d619cda2fb4fea498e5f79d984dae304d45a8999b80d6cf5c/pillow-12.1.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7aac39bcf8d4770d089588a2e1dd111cbaa42df5a94be3114222057d68336bd0", size = 7041540, upload-time = "2026-02-11T04:20:55.97Z" },
{ url = "https://files.pythonhosted.org/packages/2c/5e/2ba19e7e7236d7529f4d873bdaf317a318896bac289abebd4bb00ef247f0/pillow-12.1.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:ab174cd7d29a62dd139c44bf74b698039328f45cb03b4596c43473a46656b2f3", size = 6462613, upload-time = "2026-02-11T04:20:57.542Z" },
{ url = "https://files.pythonhosted.org/packages/03/03/31216ec124bb5c3dacd74ce8efff4cc7f52643653bad4825f8f08c697743/pillow-12.1.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:339ffdcb7cbeaa08221cd401d517d4b1fe7a9ed5d400e4a8039719238620ca35", size = 7166745, upload-time = "2026-02-11T04:20:59.196Z" },
{ url = "https://files.pythonhosted.org/packages/1f/e7/7c4552d80052337eb28653b617eafdef39adfb137c49dd7e831b8dc13bc5/pillow-12.1.1-cp312-cp312-win32.whl", hash = "sha256:5d1f9575a12bed9e9eedd9a4972834b08c97a352bd17955ccdebfeca5913fa0a", size = 6328823, upload-time = "2026-02-11T04:21:01.385Z" },
{ url = "https://files.pythonhosted.org/packages/3d/17/688626d192d7261bbbf98846fc98995726bddc2c945344b65bec3a29d731/pillow-12.1.1-cp312-cp312-win_amd64.whl", hash = "sha256:21329ec8c96c6e979cd0dfd29406c40c1d52521a90544463057d2aaa937d66a6", size = 7033367, upload-time = "2026-02-11T04:21:03.536Z" },
{ url = "https://files.pythonhosted.org/packages/ed/fe/a0ef1f73f939b0eca03ee2c108d0043a87468664770612602c63266a43c4/pillow-12.1.1-cp312-cp312-win_arm64.whl", hash = "sha256:af9a332e572978f0218686636610555ae3defd1633597be015ed50289a03c523", size = 2453811, upload-time = "2026-02-11T04:21:05.116Z" },
{ url = "https://files.pythonhosted.org/packages/d5/11/6db24d4bd7685583caeae54b7009584e38da3c3d4488ed4cd25b439de486/pillow-12.1.1-cp313-cp313-ios_13_0_arm64_iphoneos.whl", hash = "sha256:d242e8ac078781f1de88bf823d70c1a9b3c7950a44cdf4b7c012e22ccbcd8e4e", size = 4062689, upload-time = "2026-02-11T04:21:06.804Z" },
{ url = "https://files.pythonhosted.org/packages/33/c0/ce6d3b1fe190f0021203e0d9b5b99e57843e345f15f9ef22fcd43842fd21/pillow-12.1.1-cp313-cp313-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:02f84dfad02693676692746df05b89cf25597560db2857363a208e393429f5e9", size = 4138535, upload-time = "2026-02-11T04:21:08.452Z" },
{ url = "https://files.pythonhosted.org/packages/a0/c6/d5eb6a4fb32a3f9c21a8c7613ec706534ea1cf9f4b3663e99f0d83f6fca8/pillow-12.1.1-cp313-cp313-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:e65498daf4b583091ccbb2556c7000abf0f3349fcd57ef7adc9a84a394ed29f6", size = 3601364, upload-time = "2026-02-11T04:21:10.194Z" },
{ url = "https://files.pythonhosted.org/packages/14/a1/16c4b823838ba4c9c52c0e6bbda903a3fe5a1bdbf1b8eb4fff7156f3e318/pillow-12.1.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:6c6db3b84c87d48d0088943bf33440e0c42370b99b1c2a7989216f7b42eede60", size = 5262561, upload-time = "2026-02-11T04:21:11.742Z" },
{ url = "https://files.pythonhosted.org/packages/bb/ad/ad9dc98ff24f485008aa5cdedaf1a219876f6f6c42a4626c08bc4e80b120/pillow-12.1.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:8b7e5304e34942bf62e15184219a7b5ad4ff7f3bb5cca4d984f37df1a0e1aee2", size = 4657460, upload-time = "2026-02-11T04:21:13.786Z" },
{ url = "https://files.pythonhosted.org/packages/9e/1b/f1a4ea9a895b5732152789326202a82464d5254759fbacae4deea3069334/pillow-12.1.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:18e5bddd742a44b7e6b1e773ab5db102bd7a94c32555ba656e76d319d19c3850", size = 6232698, upload-time = "2026-02-11T04:21:15.949Z" },
{ url = "https://files.pythonhosted.org/packages/95/f4/86f51b8745070daf21fd2e5b1fe0eb35d4db9ca26e6d58366562fb56a743/pillow-12.1.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fc44ef1f3de4f45b50ccf9136999d71abb99dca7706bc75d222ed350b9fd2289", size = 8041706, upload-time = "2026-02-11T04:21:17.723Z" },
{ url = "https://files.pythonhosted.org/packages/29/9b/d6ecd956bb1266dd1045e995cce9b8d77759e740953a1c9aad9502a0461e/pillow-12.1.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5a8eb7ed8d4198bccbd07058416eeec51686b498e784eda166395a23eb99138e", size = 6346621, upload-time = "2026-02-11T04:21:19.547Z" },
{ url = "https://files.pythonhosted.org/packages/71/24/538bff45bde96535d7d998c6fed1a751c75ac7c53c37c90dc2601b243893/pillow-12.1.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:47b94983da0c642de92ced1702c5b6c292a84bd3a8e1d1702ff923f183594717", size = 7038069, upload-time = "2026-02-11T04:21:21.378Z" },
{ url = "https://files.pythonhosted.org/packages/94/0e/58cb1a6bc48f746bc4cb3adb8cabff73e2742c92b3bf7a220b7cf69b9177/pillow-12.1.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:518a48c2aab7ce596d3bf79d0e275661b846e86e4d0e7dec34712c30fe07f02a", size = 6460040, upload-time = "2026-02-11T04:21:23.148Z" },
{ url = "https://files.pythonhosted.org/packages/6c/57/9045cb3ff11eeb6c1adce3b2d60d7d299d7b273a2e6c8381a524abfdc474/pillow-12.1.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:a550ae29b95c6dc13cf69e2c9dc5747f814c54eeb2e32d683e5e93af56caa029", size = 7164523, upload-time = "2026-02-11T04:21:25.01Z" },
{ url = "https://files.pythonhosted.org/packages/73/f2/9be9cb99f2175f0d4dbadd6616ce1bf068ee54a28277ea1bf1fbf729c250/pillow-12.1.1-cp313-cp313-win32.whl", hash = "sha256:a003d7422449f6d1e3a34e3dd4110c22148336918ddbfc6a32581cd54b2e0b2b", size = 6332552, upload-time = "2026-02-11T04:21:27.238Z" },
{ url = "https://files.pythonhosted.org/packages/3f/eb/b0834ad8b583d7d9d42b80becff092082a1c3c156bb582590fcc973f1c7c/pillow-12.1.1-cp313-cp313-win_amd64.whl", hash = "sha256:344cf1e3dab3be4b1fa08e449323d98a2a3f819ad20f4b22e77a0ede31f0faa1", size = 7040108, upload-time = "2026-02-11T04:21:29.462Z" },
{ url = "https://files.pythonhosted.org/packages/d5/7d/fc09634e2aabdd0feabaff4a32f4a7d97789223e7c2042fd805ea4b4d2c2/pillow-12.1.1-cp313-cp313-win_arm64.whl", hash = "sha256:5c0dd1636633e7e6a0afe7bf6a51a14992b7f8e60de5789018ebbdfae55b040a", size = 2453712, upload-time = "2026-02-11T04:21:31.072Z" },
{ url = "https://files.pythonhosted.org/packages/19/2a/b9d62794fc8a0dd14c1943df68347badbd5511103e0d04c035ffe5cf2255/pillow-12.1.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0330d233c1a0ead844fc097a7d16c0abff4c12e856c0b325f231820fee1f39da", size = 5264880, upload-time = "2026-02-11T04:21:32.865Z" },
{ url = "https://files.pythonhosted.org/packages/26/9d/e03d857d1347fa5ed9247e123fcd2a97b6220e15e9cb73ca0a8d91702c6e/pillow-12.1.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:5dae5f21afb91322f2ff791895ddd8889e5e947ff59f71b46041c8ce6db790bc", size = 4660616, upload-time = "2026-02-11T04:21:34.97Z" },
{ url = "https://files.pythonhosted.org/packages/f7/ec/8a6d22afd02570d30954e043f09c32772bfe143ba9285e2fdb11284952cd/pillow-12.1.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2e0c664be47252947d870ac0d327fea7e63985a08794758aa8af5b6cb6ec0c9c", size = 6269008, upload-time = "2026-02-11T04:21:36.623Z" },
{ url = "https://files.pythonhosted.org/packages/3d/1d/6d875422c9f28a4a361f495a5f68d9de4a66941dc2c619103ca335fa6446/pillow-12.1.1-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:691ab2ac363b8217f7d31b3497108fb1f50faab2f75dfb03284ec2f217e87bf8", size = 8073226, upload-time = "2026-02-11T04:21:38.585Z" },
{ url = "https://files.pythonhosted.org/packages/a1/cd/134b0b6ee5eda6dc09e25e24b40fdafe11a520bc725c1d0bbaa5e00bf95b/pillow-12.1.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e9e8064fb1cc019296958595f6db671fba95209e3ceb0c4734c9baf97de04b20", size = 6380136, upload-time = "2026-02-11T04:21:40.562Z" },
{ url = "https://files.pythonhosted.org/packages/7a/a9/7628f013f18f001c1b98d8fffe3452f306a70dc6aba7d931019e0492f45e/pillow-12.1.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:472a8d7ded663e6162dafdf20015c486a7009483ca671cece7a9279b512fcb13", size = 7067129, upload-time = "2026-02-11T04:21:42.521Z" },
{ url = "https://files.pythonhosted.org/packages/1e/f8/66ab30a2193b277785601e82ee2d49f68ea575d9637e5e234faaa98efa4c/pillow-12.1.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:89b54027a766529136a06cfebeecb3a04900397a3590fd252160b888479517bf", size = 6491807, upload-time = "2026-02-11T04:21:44.22Z" },
{ url = "https://files.pythonhosted.org/packages/da/0b/a877a6627dc8318fdb84e357c5e1a758c0941ab1ddffdafd231983788579/pillow-12.1.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:86172b0831b82ce4f7877f280055892b31179e1576aa00d0df3bb1bbf8c3e524", size = 7190954, upload-time = "2026-02-11T04:21:46.114Z" },
{ url = "https://files.pythonhosted.org/packages/83/43/6f732ff85743cf746b1361b91665d9f5155e1483817f693f8d57ea93147f/pillow-12.1.1-cp313-cp313t-win32.whl", hash = "sha256:44ce27545b6efcf0fdbdceb31c9a5bdea9333e664cda58a7e674bb74608b3986", size = 6336441, upload-time = "2026-02-11T04:21:48.22Z" },
{ url = "https://files.pythonhosted.org/packages/3b/44/e865ef3986611bb75bfabdf94a590016ea327833f434558801122979cd0e/pillow-12.1.1-cp313-cp313t-win_amd64.whl", hash = "sha256:a285e3eb7a5a45a2ff504e31f4a8d1b12ef62e84e5411c6804a42197c1cf586c", size = 7045383, upload-time = "2026-02-11T04:21:50.015Z" },
{ url = "https://files.pythonhosted.org/packages/a8/c6/f4fb24268d0c6908b9f04143697ea18b0379490cb74ba9e8d41b898bd005/pillow-12.1.1-cp313-cp313t-win_arm64.whl", hash = "sha256:cc7d296b5ea4d29e6570dabeaed58d31c3fea35a633a69679fb03d7664f43fb3", size = 2456104, upload-time = "2026-02-11T04:21:51.633Z" },
{ url = "https://files.pythonhosted.org/packages/03/d0/bebb3ffbf31c5a8e97241476c4cf8b9828954693ce6744b4a2326af3e16b/pillow-12.1.1-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:417423db963cb4be8bac3fc1204fe61610f6abeed1580a7a2cbb2fbda20f12af", size = 4062652, upload-time = "2026-02-11T04:21:53.19Z" },
{ url = "https://files.pythonhosted.org/packages/2d/c0/0e16fb0addda4851445c28f8350d8c512f09de27bbb0d6d0bbf8b6709605/pillow-12.1.1-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:b957b71c6b2387610f556a7eb0828afbe40b4a98036fc0d2acfa5a44a0c2036f", size = 4138823, upload-time = "2026-02-11T04:22:03.088Z" },
{ url = "https://files.pythonhosted.org/packages/6b/fb/6170ec655d6f6bb6630a013dd7cf7bc218423d7b5fa9071bf63dc32175ae/pillow-12.1.1-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:097690ba1f2efdeb165a20469d59d8bb03c55fb6621eb2041a060ae8ea3e9642", size = 3601143, upload-time = "2026-02-11T04:22:04.909Z" },
{ url = "https://files.pythonhosted.org/packages/59/04/dc5c3f297510ba9a6837cbb318b87dd2b8f73eb41a43cc63767f65cb599c/pillow-12.1.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2815a87ab27848db0321fb78c7f0b2c8649dee134b7f2b80c6a45c6831d75ccd", size = 5266254, upload-time = "2026-02-11T04:22:07.656Z" },
{ url = "https://files.pythonhosted.org/packages/05/30/5db1236b0d6313f03ebf97f5e17cda9ca060f524b2fcc875149a8360b21c/pillow-12.1.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:f7ed2c6543bad5a7d5530eb9e78c53132f93dfa44a28492db88b41cdab885202", size = 4657499, upload-time = "2026-02-11T04:22:09.613Z" },
{ url = "https://files.pythonhosted.org/packages/6f/18/008d2ca0eb612e81968e8be0bbae5051efba24d52debf930126d7eaacbba/pillow-12.1.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:652a2c9ccfb556235b2b501a3a7cf3742148cd22e04b5625c5fe057ea3e3191f", size = 6232137, upload-time = "2026-02-11T04:22:11.434Z" },
{ url = "https://files.pythonhosted.org/packages/70/f1/f14d5b8eeb4b2cd62b9f9f847eb6605f103df89ef619ac68f92f748614ea/pillow-12.1.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d6e4571eedf43af33d0fc233a382a76e849badbccdf1ac438841308652a08e1f", size = 8042721, upload-time = "2026-02-11T04:22:13.321Z" },
{ url = "https://files.pythonhosted.org/packages/5a/d6/17824509146e4babbdabf04d8171491fa9d776f7061ff6e727522df9bd03/pillow-12.1.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b574c51cf7d5d62e9be37ba446224b59a2da26dc4c1bb2ecbe936a4fb1a7cb7f", size = 6347798, upload-time = "2026-02-11T04:22:15.449Z" },
{ url = "https://files.pythonhosted.org/packages/d1/ee/c85a38a9ab92037a75615aba572c85ea51e605265036e00c5b67dfafbfe2/pillow-12.1.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a37691702ed687799de29a518d63d4682d9016932db66d4e90c345831b02fb4e", size = 7039315, upload-time = "2026-02-11T04:22:17.24Z" },
{ url = "https://files.pythonhosted.org/packages/ec/f3/bc8ccc6e08a148290d7523bde4d9a0d6c981db34631390dc6e6ec34cacf6/pillow-12.1.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f95c00d5d6700b2b890479664a06e754974848afaae5e21beb4d83c106923fd0", size = 6462360, upload-time = "2026-02-11T04:22:19.111Z" },
{ url = "https://files.pythonhosted.org/packages/f6/ab/69a42656adb1d0665ab051eec58a41f169ad295cf81ad45406963105408f/pillow-12.1.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:559b38da23606e68681337ad74622c4dbba02254fc9cb4488a305dd5975c7eeb", size = 7165438, upload-time = "2026-02-11T04:22:21.041Z" },
{ url = "https://files.pythonhosted.org/packages/02/46/81f7aa8941873f0f01d4b55cc543b0a3d03ec2ee30d617a0448bf6bd6dec/pillow-12.1.1-cp314-cp314-win32.whl", hash = "sha256:03edcc34d688572014ff223c125a3f77fb08091e4607e7745002fc214070b35f", size = 6431503, upload-time = "2026-02-11T04:22:22.833Z" },
{ url = "https://files.pythonhosted.org/packages/40/72/4c245f7d1044b67affc7f134a09ea619d4895333d35322b775b928180044/pillow-12.1.1-cp314-cp314-win_amd64.whl", hash = "sha256:50480dcd74fa63b8e78235957d302d98d98d82ccbfac4c7e12108ba9ecbdba15", size = 7176748, upload-time = "2026-02-11T04:22:24.64Z" },
{ url = "https://files.pythonhosted.org/packages/e4/ad/8a87bdbe038c5c698736e3348af5c2194ffb872ea52f11894c95f9305435/pillow-12.1.1-cp314-cp314-win_arm64.whl", hash = "sha256:5cb1785d97b0c3d1d1a16bc1d710c4a0049daefc4935f3a8f31f827f4d3d2e7f", size = 2544314, upload-time = "2026-02-11T04:22:26.685Z" },
{ url = "https://files.pythonhosted.org/packages/6c/9d/efd18493f9de13b87ede7c47e69184b9e859e4427225ea962e32e56a49bc/pillow-12.1.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:1f90cff8aa76835cba5769f0b3121a22bd4eb9e6884cfe338216e557a9a548b8", size = 5268612, upload-time = "2026-02-11T04:22:29.884Z" },
{ url = "https://files.pythonhosted.org/packages/f8/f1/4f42eb2b388eb2ffc660dcb7f7b556c1015c53ebd5f7f754965ef997585b/pillow-12.1.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1f1be78ce9466a7ee64bfda57bdba0f7cc499d9794d518b854816c41bf0aa4e9", size = 4660567, upload-time = "2026-02-11T04:22:31.799Z" },
{ url = "https://files.pythonhosted.org/packages/01/54/df6ef130fa43e4b82e32624a7b821a2be1c5653a5fdad8469687a7db4e00/pillow-12.1.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:42fc1f4677106188ad9a55562bbade416f8b55456f522430fadab3cef7cd4e60", size = 6269951, upload-time = "2026-02-11T04:22:33.921Z" },
{ url = "https://files.pythonhosted.org/packages/a9/48/618752d06cc44bb4aae8ce0cd4e6426871929ed7b46215638088270d9b34/pillow-12.1.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:98edb152429ab62a1818039744d8fbb3ccab98a7c29fc3d5fcef158f3f1f68b7", size = 8074769, upload-time = "2026-02-11T04:22:35.877Z" },
{ url = "https://files.pythonhosted.org/packages/c3/bd/f1d71eb39a72fa088d938655afba3e00b38018d052752f435838961127d8/pillow-12.1.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d470ab1178551dd17fdba0fef463359c41aaa613cdcd7ff8373f54be629f9f8f", size = 6381358, upload-time = "2026-02-11T04:22:37.698Z" },
{ url = "https://files.pythonhosted.org/packages/64/ef/c784e20b96674ed36a5af839305f55616f8b4f8aa8eeccf8531a6e312243/pillow-12.1.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6408a7b064595afcab0a49393a413732a35788f2a5092fdc6266952ed67de586", size = 7068558, upload-time = "2026-02-11T04:22:39.597Z" },
{ url = "https://files.pythonhosted.org/packages/73/cb/8059688b74422ae61278202c4e1ad992e8a2e7375227be0a21c6b87ca8d5/pillow-12.1.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5d8c41325b382c07799a3682c1c258469ea2ff97103c53717b7893862d0c98ce", size = 6493028, upload-time = "2026-02-11T04:22:42.73Z" },
{ url = "https://files.pythonhosted.org/packages/c6/da/e3c008ed7d2dd1f905b15949325934510b9d1931e5df999bb15972756818/pillow-12.1.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:c7697918b5be27424e9ce568193efd13d925c4481dd364e43f5dff72d33e10f8", size = 7191940, upload-time = "2026-02-11T04:22:44.543Z" },
{ url = "https://files.pythonhosted.org/packages/01/4a/9202e8d11714c1fc5951f2e1ef362f2d7fbc595e1f6717971d5dd750e969/pillow-12.1.1-cp314-cp314t-win32.whl", hash = "sha256:d2912fd8114fc5545aa3a4b5576512f64c55a03f3ebcca4c10194d593d43ea36", size = 6438736, upload-time = "2026-02-11T04:22:46.347Z" },
{ url = "https://files.pythonhosted.org/packages/f3/ca/cbce2327eb9885476b3957b2e82eb12c866a8b16ad77392864ad601022ce/pillow-12.1.1-cp314-cp314t-win_amd64.whl", hash = "sha256:4ceb838d4bd9dab43e06c363cab2eebf63846d6a4aeaea283bbdfd8f1a8ed58b", size = 7182894, upload-time = "2026-02-11T04:22:48.114Z" },
{ url = "https://files.pythonhosted.org/packages/ec/d2/de599c95ba0a973b94410477f8bf0b6f0b5e67360eb89bcb1ad365258beb/pillow-12.1.1-cp314-cp314t-win_arm64.whl", hash = "sha256:7b03048319bfc6170e93bd60728a1af51d3dd7704935feb228c4d4faab35d334", size = 2546446, upload-time = "2026-02-11T04:22:50.342Z" },
]
[[package]]
name = "pluggy"
version = "1.6.0"
@@ -2233,6 +2629,66 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/5b/5a/bc7b4a4ef808fa59a816c17b20c4bef6884daebbdf627ff2a161da67da19/propcache-0.4.1-py3-none-any.whl", hash = "sha256:af2a6052aeb6cf17d3e46ee169099044fd8224cbaf75c76a2ef596e8163e2237", size = 13305, upload-time = "2025-10-08T19:49:00.792Z" },
]
[[package]]
name = "proto-plus"
version = "1.27.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "protobuf" },
]
sdist = { url = "https://files.pythonhosted.org/packages/3a/02/8832cde80e7380c600fbf55090b6ab7b62bd6825dbedde6d6657c15a1f8e/proto_plus-1.27.1.tar.gz", hash = "sha256:912a7460446625b792f6448bade9e55cd4e41e6ac10e27009ef71a7f317fa147", size = 56929, upload-time = "2026-02-02T17:34:49.035Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5d/79/ac273cbbf744691821a9cca88957257f41afe271637794975ca090b9588b/proto_plus-1.27.1-py3-none-any.whl", hash = "sha256:e4643061f3a4d0de092d62aa4ad09fa4756b2cbb89d4627f3985018216f9fefc", size = 50480, upload-time = "2026-02-02T17:34:47.339Z" },
]
[[package]]
name = "protobuf"
version = "6.33.6"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/66/70/e908e9c5e52ef7c3a6c7902c9dfbb34c7e29c25d2f81ade3856445fd5c94/protobuf-6.33.6.tar.gz", hash = "sha256:a6768d25248312c297558af96a9f9c929e8c4cee0659cb07e780731095f38135", size = 444531, upload-time = "2026-03-18T19:05:00.988Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fc/9f/2f509339e89cfa6f6a4c4ff50438db9ca488dec341f7e454adad60150b00/protobuf-6.33.6-cp310-abi3-win32.whl", hash = "sha256:7d29d9b65f8afef196f8334e80d6bc1d5d4adedb449971fefd3723824e6e77d3", size = 425739, upload-time = "2026-03-18T19:04:48.373Z" },
{ url = "https://files.pythonhosted.org/packages/76/5d/683efcd4798e0030c1bab27374fd13a89f7c2515fb1f3123efdfaa5eab57/protobuf-6.33.6-cp310-abi3-win_amd64.whl", hash = "sha256:0cd27b587afca21b7cfa59a74dcbd48a50f0a6400cfb59391340ad729d91d326", size = 437089, upload-time = "2026-03-18T19:04:50.381Z" },
{ url = "https://files.pythonhosted.org/packages/5c/01/a3c3ed5cd186f39e7880f8303cc51385a198a81469d53d0fdecf1f64d929/protobuf-6.33.6-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:9720e6961b251bde64edfdab7d500725a2af5280f3f4c87e57c0208376aa8c3a", size = 427737, upload-time = "2026-03-18T19:04:51.866Z" },
{ url = "https://files.pythonhosted.org/packages/ee/90/b3c01fdec7d2f627b3a6884243ba328c1217ed2d978def5c12dc50d328a3/protobuf-6.33.6-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:e2afbae9b8e1825e3529f88d514754e094278bb95eadc0e199751cdd9a2e82a2", size = 324610, upload-time = "2026-03-18T19:04:53.096Z" },
{ url = "https://files.pythonhosted.org/packages/9b/ca/25afc144934014700c52e05103c2421997482d561f3101ff352e1292fb81/protobuf-6.33.6-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:c96c37eec15086b79762ed265d59ab204dabc53056e3443e702d2681f4b39ce3", size = 339381, upload-time = "2026-03-18T19:04:54.616Z" },
{ url = "https://files.pythonhosted.org/packages/16/92/d1e32e3e0d894fe00b15ce28ad4944ab692713f2e7f0a99787405e43533a/protobuf-6.33.6-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e9db7e292e0ab79dd108d7f1a94fe31601ce1ee3f7b79e0692043423020b0593", size = 323436, upload-time = "2026-03-18T19:04:55.768Z" },
{ url = "https://files.pythonhosted.org/packages/c4/72/02445137af02769918a93807b2b7890047c32bfb9f90371cbc12688819eb/protobuf-6.33.6-py3-none-any.whl", hash = "sha256:77179e006c476e69bf8e8ce866640091ec42e1beb80b213c3900006ecfba6901", size = 170656, upload-time = "2026-03-18T19:04:59.826Z" },
]
[[package]]
name = "py-vapid"
version = "1.9.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cryptography" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a3/ed/c648c8018fab319951764f4babe68ddcbbff7f2bbcd7ff7e531eac1788c8/py_vapid-1.9.4.tar.gz", hash = "sha256:a004023560cbc54e34fc06380a0580f04ffcc788e84fb6d19e9339eeb6551a28", size = 74750, upload-time = "2026-01-05T22:13:25.201Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7f/15/f9d0171e1ad863ca49e826d5afb6b50566f20dc9b4f76965096d3555ce9e/py_vapid-1.9.4-py2.py3-none-any.whl", hash = "sha256:f165a5bf90dcf966b226114f01f178f137579a09784c7f0628fa2f0a299741b6", size = 23912, upload-time = "2026-01-05T20:42:05.455Z" },
]
[[package]]
name = "pyasn1"
version = "0.6.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/5c/5f/6583902b6f79b399c9c40674ac384fd9cd77805f9e6205075f828ef11fb2/pyasn1-0.6.3.tar.gz", hash = "sha256:697a8ecd6d98891189184ca1fa05d1bb00e2f84b5977c481452050549c8a72cf", size = 148685, upload-time = "2026-03-17T01:06:53.382Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5d/a0/7d793dce3fa811fe047d6ae2431c672364b462850c6235ae306c0efd025f/pyasn1-0.6.3-py3-none-any.whl", hash = "sha256:a80184d120f0864a52a073acc6fc642847d0be408e7c7252f31390c0f4eadcde", size = 83997, upload-time = "2026-03-17T01:06:52.036Z" },
]
[[package]]
name = "pyasn1-modules"
version = "0.4.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pyasn1" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e9/e6/78ebbb10a8c8e4b61a59249394a4a594c1a7af95593dc933a349c8d00964/pyasn1_modules-0.4.2.tar.gz", hash = "sha256:677091de870a80aae844b1ca6134f54652fa2c8c5a52aa396440ac3106e941e6", size = 307892, upload-time = "2025-03-28T02:41:22.17Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/47/8d/d529b5d697919ba8c11ad626e835d4039be708a35b0d22de83a269a6682c/pyasn1_modules-0.4.2-py3-none-any.whl", hash = "sha256:29253a9207ce32b64c3ac6600edc75368f98473906e8fd1043bd6b5b1de2c14a", size = 181259, upload-time = "2025-03-28T02:41:19.028Z" },
]
[[package]]
name = "pycparser"
version = "3.0"
@@ -2369,6 +2825,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
]
[[package]]
name = "pyparsing"
version = "3.3.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f3/91/9c6ee907786a473bf81c5f53cf703ba0957b23ab84c264080fb5a450416f/pyparsing-3.3.2.tar.gz", hash = "sha256:c777f4d763f140633dcb6d8a3eda953bf7a214dc4eff598413c070bcdc117cbc", size = 6851574, upload-time = "2026-01-21T03:57:59.36Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/10/bd/c038d7cc38edc1aa5bf91ab8068b63d4308c66c4c8bb3cbba7dfbc049f9c/pyparsing-3.3.2-py3-none-any.whl", hash = "sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d", size = 122781, upload-time = "2026-01-21T03:57:55.912Z" },
]
[[package]]
name = "pypdf"
version = "6.9.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/31/83/691bdb309306232362503083cb15777491045dd54f45393a317dc7d8082f/pypdf-6.9.2.tar.gz", hash = "sha256:7f850faf2b0d4ab936582c05da32c52214c2b089d61a316627b5bfb5b0dab46c", size = 5311837, upload-time = "2026-03-23T14:53:27.983Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a5/7e/c85f41243086a8fe5d1baeba527cb26a1918158a565932b41e0f7c0b32e9/pypdf-6.9.2-py3-none-any.whl", hash = "sha256:662cf29bcb419a36a1365232449624ab40b7c2d0cfc28e54f42eeecd1fd7e844", size = 333744, upload-time = "2026-03-23T14:53:26.573Z" },
]
[[package]]
name = "pytest"
version = "9.0.2"
@@ -2423,6 +2897,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" },
]
[[package]]
name = "python-docx"
version = "1.2.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "lxml" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a9/f7/eddfe33871520adab45aaa1a71f0402a2252050c14c7e3009446c8f4701c/python_docx-1.2.0.tar.gz", hash = "sha256:7bc9d7b7d8a69c9c02ca09216118c86552704edc23bac179283f2e38f86220ce", size = 5723256, upload-time = "2025-06-16T20:46:27.921Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d0/00/1e03a4989fa5795da308cd774f05b704ace555a70f9bf9d3be057b680bcf/python_docx-1.2.0-py3-none-any.whl", hash = "sha256:3fd478f3250fbbbfd3b94fe1e985955737c145627498896a8a6bf81f4baf66c7", size = 252987, upload-time = "2025-06-16T20:46:22.506Z" },
]
[[package]]
name = "python-dotenv"
version = "1.2.2"
@@ -2441,6 +2928,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1b/d0/397f9626e711ff749a95d96b7af99b9c566a9bb5129b8e4c10fc4d100304/python_multipart-0.0.22-py3-none-any.whl", hash = "sha256:2b2cd894c83d21bf49d702499531c7bafd057d730c201782048f7945d82de155", size = 24579, upload-time = "2026-01-25T10:15:54.811Z" },
]
[[package]]
name = "python-pptx"
version = "1.0.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "lxml" },
{ name = "pillow" },
{ name = "typing-extensions" },
{ name = "xlsxwriter" },
]
sdist = { url = "https://files.pythonhosted.org/packages/52/a9/0c0db8d37b2b8a645666f7fd8accea4c6224e013c42b1d5c17c93590cd06/python_pptx-1.0.2.tar.gz", hash = "sha256:479a8af0eaf0f0d76b6f00b0887732874ad2e3188230315290cd1f9dd9cc7095", size = 10109297, upload-time = "2024-08-07T17:33:37.772Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/4f/00be2196329ebbff56ce564aa94efb0fbc828d00de250b1980de1a34ab49/python_pptx-1.0.2-py3-none-any.whl", hash = "sha256:160838e0b8565a8b1f67947675886e9fea18aa5e795db7ae531606d68e785cba", size = 472788, upload-time = "2024-08-07T17:33:28.192Z" },
]
[[package]]
name = "python-telegram-bot"
version = "22.7"
@@ -2454,6 +2956,22 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/94/f7/0e2f89dd62f45d46d4ea0d8aec5893ce5b37389638db010c117f46f11450/python_telegram_bot-22.7-py3-none-any.whl", hash = "sha256:d72eed532cf763758cd9331b57a6d790aff0bb4d37d8f4e92149436fe21c6475", size = 745365, upload-time = "2026-03-16T09:36:01.498Z" },
]
[[package]]
name = "pywebpush"
version = "2.3.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohttp" },
{ name = "cryptography" },
{ name = "http-ece" },
{ name = "py-vapid" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/87/d9/e497a24bc9f659bfc0e570382a41e6b2d6726fbcfa4d85aaa23fe9c81ba2/pywebpush-2.3.0.tar.gz", hash = "sha256:d1e27db8de9e6757c1875f67292554bd54c41874c36f4b5c4ebb5442dce204f2", size = 28489, upload-time = "2026-02-09T23:30:18.574Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3d/d8/ac21241cf8007cb93255eabf318da4f425ec0f75d28c366992253aa8c1b2/pywebpush-2.3.0-py3-none-any.whl", hash = "sha256:3d97469fb14d4323c362319d438183737249a4115b50e146ce233e7f01e3cf98", size = 22851, upload-time = "2026-02-09T23:30:16.093Z" },
]
[[package]]
name = "pyyaml"
version = "6.0.3"
@@ -2626,6 +3144,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
]
[[package]]
name = "requests-oauthlib"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "oauthlib" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/42/f2/05f29bc3913aea15eb670be136045bf5c5bbf4b99ecb839da9b422bb2c85/requests-oauthlib-2.0.0.tar.gz", hash = "sha256:b3dffaebd884d8cd778494369603a9e7b58d29111bf6b41bdc2dcd87203af4e9", size = 55650, upload-time = "2024-03-22T20:32:29.939Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3b/5d/63d4ae3b9daea098d5d6f5da83984853c1bbacd5dc826764b249fe119d24/requests_oauthlib-2.0.0-py2.py3-none-any.whl", hash = "sha256:7dd8a5c40426b779b0868c404bdef9768deccf22749cde15852df527e6269b36", size = 24179, upload-time = "2024-03-22T20:32:28.055Z" },
]
[[package]]
name = "rich"
version = "14.3.3"
@@ -3396,6 +3927,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/c2/14/e2a54fabd4f08cd7af1c07030603c3356b74da07f7cc056e600436edfa17/tzlocal-5.3.1-py3-none-any.whl", hash = "sha256:eb1a66c3ef5847adf7a834f1be0800581b683b5608e74f86ecbcef8ab91bb85d", size = 18026, upload-time = "2025-03-05T21:17:39.857Z" },
]
[[package]]
name = "uritemplate"
version = "4.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/98/60/f174043244c5306c9988380d2cb10009f91563fc4b31293d27e17201af56/uritemplate-4.2.0.tar.gz", hash = "sha256:480c2ed180878955863323eea31b0ede668795de182617fef9c6ca09e6ec9d0e", size = 33267, upload-time = "2025-06-02T15:12:06.318Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a9/99/3ae339466c9183ea5b8ae87b34c0b897eda475d2aec2307cae60e5cd4f29/uritemplate-4.2.0-py3-none-any.whl", hash = "sha256:962201ba1c4edcab02e60f9a0d3821e82dfc5d2d6662a21abd533879bdb8a686", size = 11488, upload-time = "2025-06-02T15:12:03.405Z" },
]
[[package]]
name = "urllib3"
version = "2.6.3"
@@ -3658,6 +4198,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1a/c7/8528ac2dfa2c1e6708f647df7ae144ead13f0a31146f43c7264b4942bf12/wrapt-2.1.2-py3-none-any.whl", hash = "sha256:b8fd6fa2b2c4e7621808f8c62e8317f4aae56e59721ad933bac5239d913cf0e8", size = 43993, upload-time = "2026-03-06T02:53:12.905Z" },
]
[[package]]
name = "xlsxwriter"
version = "3.2.9"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/46/2c/c06ef49dc36e7954e55b802a8b231770d286a9758b3d936bd1e04ce5ba88/xlsxwriter-3.2.9.tar.gz", hash = "sha256:254b1c37a368c444eac6e2f867405cc9e461b0ed97a3233b2ac1e574efb4140c", size = 215940, upload-time = "2025-09-16T00:16:21.63Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3a/0c/3662f4a66880196a590b202f0db82d919dd2f89e99a27fadef91c4a33d41/xlsxwriter-3.2.9-py3-none-any.whl", hash = "sha256:9a5db42bc5dff014806c58a20b9eae7322a134abb6fce3c92c181bfb275ec5b3", size = 175315, upload-time = "2025-09-16T00:16:20.108Z" },
]
[[package]]
name = "yarl"
version = "1.23.0"
@@ -3762,6 +4311,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/69/68/c8739671f5699c7dc470580a4f821ef37c32c4cb0b047ce223a7f115757f/yarl-1.23.0-py3-none-any.whl", hash = "sha256:a2df6afe50dea8ae15fa34c9f824a3ee958d785fd5d089063d960bae1daa0a3f", size = 48288, upload-time = "2026-03-01T22:07:51.388Z" },
]
[[package]]
name = "youtube-transcript-api"
version = "1.2.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "defusedxml" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/60/43/4104185a2eaa839daa693b30e15c37e7e58795e8e09ec414f22b3db54bec/youtube_transcript_api-1.2.4.tar.gz", hash = "sha256:b72d0e96a335df599d67cee51d49e143cff4f45b84bcafc202ff51291603ddcd", size = 469839, upload-time = "2026-01-29T09:09:17.088Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/be/95/129ea37efd6cd6ed00f62baae6543345c677810b8a3bf0026756e1d3cf3c/youtube_transcript_api-1.2.4-py3-none-any.whl", hash = "sha256:03878759356da5caf5edac77431780b91448fb3d8c21d4496015bdc8a7bc43ff", size = 485227, upload-time = "2026-01-29T09:09:15.427Z" },
]
[[package]]
name = "zipp"
version = "3.23.0"