DesTEngSsv006_swd

Author	SHA256	Message	Date
tlg	17818a3860	feat: FastAPI app assembly with all routes and backend wiring [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-05 10:04:56 +02:00
tlg	d55c80ae35	feat: API routes for models, chat, transcription, speech, and admin [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-05 10:04:45 +02:00
tlg	ef44bc09b9	feat: Chatterbox TTS backend with turbo/multilingual/default variants	2026-04-04 09:40:42 +02:00
tlg	c6677dcab3	feat: llama-cpp-python backend with GGUF, vision, and tool support	2026-04-04 09:40:40 +02:00
tlg	de25b5e2a7	feat: transformers ASR backend for cohere-transcribe	2026-04-04 09:40:39 +02:00
tlg	449e37d318	feat: abstract base class for model backends [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-04 09:29:35 +02:00
tlg	813bbe0ad0	fix: VRAM eviction cascades through all tiers for large LLM loads. The original eviction logic blocked ASR eviction even when an LLM genuinely needed all 16GB VRAM (e.g., gpt-oss-20b at 13GB). Now uses two-pass eviction: first evicts lower/same priority, then cascades to higher priority as last resort. Added tests for ASR-survives and full-cascade scenarios. [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-04 09:22:14 +02:00
tlg	d7a091df8c	feat: VRAM manager with priority-based model eviction. Tracks GPU VRAM usage (16GB) and handles model loading/unloading with priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected). Uses asyncio Lock for concurrency safety. [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-04 09:14:41 +02:00
tlg	969bcb3292	feat: API key authentication dependency. Implements create_api_key_dependency() FastAPI dependency that validates Bearer tokens against a configured list of ApiKey objects (401 on missing, malformed, or unknown tokens). Includes 5 TDD tests covering all cases. [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-04 07:31:30 +02:00
tlg	c4eaf5088b	feat: model registry with virtual-to-physical resolution [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-04 07:31:10 +02:00
tlg	690ad46d88	feat: config loading for models.yaml and api_keys.yaml [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-04 07:30:13 +02:00
tlg	a64f32b590	feat: project scaffolding with config files and test fixtures [Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>]	2026-04-04 07:23:14 +02:00
tlg	cf7c77b3b5	Add llmux implementation plan (30 tasks). Covers project scaffolding, config, auth, VRAM manager, all four backends, API routes, Dockerfile, deployment scripts, and four phases of testing (integration, functional, VRAM, performance). [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-03 22:43:37 +02:00
tlg	45947e80a4	Update manual steps: DNS done, Open WebUI config automated [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-03 22:25:51 +02:00
tlg	7187c58c5e	Add llmux product requirements in StrictDoc format. 42 requirements covering architecture, runtimes, models, VRAM management, API, authentication, configuration, integration, and four-phase testing plan. [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-03 21:11:05 +02:00
tlg	bd0ed74d32	Clarify VRAM eviction rule for cross-priority edge case [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-03 13:20:53 +02:00
tlg	e6be9dcb85	Add llmux design specification. Covers architecture, model registry, VRAM management, API endpoints, container setup, Open WebUI integration, Traefik routing, and four-phase testing plan. [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-04-03 13:15:46 +02:00
tlg	e7cf075e2f	Initial commit with .gitignore [Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>]	2026-03-31 17:58:54 +02:00

18 Commits