Commit Graph

18 Commits

Author SHA256 Message Date
tlg
17818a3860 feat: FastAPI app assembly with all routes and backend wiring
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:04:56 +02:00
tlg
d55c80ae35 feat: API routes for models, chat, transcription, speech, and admin
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:04:45 +02:00
tlg
ef44bc09b9 feat: Chatterbox TTS backend with turbo/multilingual/default variants 2026-04-04 09:40:42 +02:00
tlg
c6677dcab3 feat: llama-cpp-python backend with GGUF, vision, and tool support 2026-04-04 09:40:40 +02:00
tlg
de25b5e2a7 feat: transformers ASR backend for cohere-transcribe 2026-04-04 09:40:39 +02:00
tlg
449e37d318 feat: abstract base class for model backends
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:29:35 +02:00
tlg
813bbe0ad0 fix: VRAM eviction cascades through all tiers for large LLM loads
The original eviction logic blocked ASR eviction even when an LLM
genuinely needed all 16GB VRAM (e.g., gpt-oss-20b at 13GB). Now uses
two-pass eviction: first evicts lower/same priority, then cascades to
higher priority as last resort. Added tests for ASR-survives and
full-cascade scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:22:14 +02:00
tlg
d7a091df8c feat: VRAM manager with priority-based model eviction
Tracks GPU VRAM usage (16GB) and handles model loading/unloading with
priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected).
Uses asyncio Lock for concurrency safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:14:41 +02:00
tlg
969bcb3292 feat: API key authentication dependency
Implements create_api_key_dependency() FastAPI dependency that validates
Bearer tokens against a configured list of ApiKey objects (401 on missing,
malformed, or unknown tokens). Includes 5 TDD tests covering all cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:31:30 +02:00
tlg
c4eaf5088b feat: model registry with virtual-to-physical resolution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:31:10 +02:00
tlg
690ad46d88 feat: config loading for models.yaml and api_keys.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:30:13 +02:00
tlg
a64f32b590 feat: project scaffolding with config files and test fixtures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:23:14 +02:00
tlg
cf7c77b3b5 Add llmux implementation plan (30 tasks)
Covers project scaffolding, config, auth, VRAM manager, all four
backends, API routes, Dockerfile, deployment scripts, and four
phases of testing (integration, functional, VRAM, performance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:43:37 +02:00
tlg
45947e80a4 Update manual steps: DNS done, Open WebUI config automated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:25:51 +02:00
tlg
7187c58c5e Add llmux product requirements in StrictDoc format
42 requirements covering architecture, runtimes, models, VRAM
management, API, authentication, configuration, integration,
and four-phase testing plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:11:05 +02:00
tlg
bd0ed74d32 Clarify VRAM eviction rule for cross-priority edge case
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:20:53 +02:00
tlg
e6be9dcb85 Add llmux design specification
Covers architecture, model registry, VRAM management, API endpoints,
container setup, Open WebUI integration, Traefik routing, and
four-phase testing plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:15:46 +02:00
tlg
e7cf075e2f Initial commit with .gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:58:54 +02:00