Commit Graph

7 Commits

Author SHA256 Message Date
tlg
d55c80ae35 feat: API routes for models, chat, transcription, speech, and admin
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:04:45 +02:00
tlg
813bbe0ad0 fix: VRAM eviction cascades through all tiers for large LLM loads
The original eviction logic blocked ASR eviction even when an LLM
genuinely needed all 16GB VRAM (e.g., gpt-oss-20b at 13GB). Now uses
two-pass eviction: first evicts lower/same priority, then cascades to
higher priority as last resort. Added tests for ASR-survives and
full-cascade scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:22:14 +02:00
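
For illustration, the two-pass eviction described in 813bbe0ad0 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the commit: `Priority`, `Loaded`, and `plan_eviction` are hypothetical names, sizes are whole gigabytes, and the 16 GB capacity is taken from the commit message.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Sequence


class Priority(IntEnum):
    # Ordering from the log: LLM (lowest) -> TTS -> ASR (highest).
    LLM = 0
    TTS = 1
    ASR = 2


@dataclass(frozen=True)
class Loaded:
    # Hypothetical record for a model currently resident in VRAM.
    name: str
    priority: Priority
    vram_gb: int


def plan_eviction(
    loaded: Sequence[Loaded],
    incoming_priority: Priority,
    needed_gb: int,
    capacity_gb: int = 16,
) -> Optional[List[str]]:
    """Return the names to evict, or None if the model cannot fit at all."""
    free = capacity_gb - sum(m.vram_gb for m in loaded)

    # Pass 1: models of lower or equal priority, lowest priority first.
    first_pass = sorted(
        (m for m in loaded if m.priority <= incoming_priority),
        key=lambda m: m.priority,
    )
    # Pass 2 (last resort): higher-priority models, again lowest first.
    second_pass = sorted(
        (m for m in loaded if m.priority > incoming_priority),
        key=lambda m: m.priority,
    )

    victims: List[str] = []
    for model in first_pass + second_pass:
        if free >= needed_gb:
            break  # enough room already, stop evicting
        victims.append(model.name)
        free += model.vram_gb

    return victims if free >= needed_gb else None
```

With a small LLM and an ASR model loaded, a second LLM only displaces the first LLM. A 13 GB LLM arriving when only TTS and ASR are resident falls through to the second pass and may evict both.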
tlg
d7a091df8c feat: VRAM manager with priority-based model eviction
Tracks GPU VRAM usage (16GB) and handles model loading/unloading with
priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected).
Uses asyncio Lock for concurrency safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:14:41 +02:00
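
As a rough sketch of what d7a091df8c describes (priority-ordered eviction with ASR protected, guarded by an asyncio Lock), something like the following would fit the message. The class name `VRAMManager`, its methods, and the whole-gigabyte sizes are assumptions made for illustration. The actual interface is not shown in the log.

```python
import asyncio
from enum import IntEnum
from typing import Dict, List, Tuple


class Priority(IntEnum):
    # Ordering from the commit message: LLM (lowest) -> TTS -> ASR (highest).
    LLM = 0
    TTS = 1
    ASR = 2


class VRAMManager:
    """Hypothetical tracker for models resident on a single GPU."""

    def __init__(self, capacity_gb: int = 16) -> None:
        self.capacity_gb = capacity_gb
        self._loaded: Dict[str, Tuple[Priority, int]] = {}
        self._lock = asyncio.Lock()  # serializes load decisions

    @property
    def loaded(self) -> List[str]:
        return list(self._loaded)

    @property
    def used_gb(self) -> int:
        return sum(size for _, size in self._loaded.values())

    async def load(self, name: str, priority: Priority, size_gb: int) -> bool:
        """Reserve VRAM for a model, evicting unprotected models if needed."""
        async with self._lock:
            if name in self._loaded:
                return True
            # Eviction candidates, lowest priority first; ASR is never evicted.
            victims = sorted(
                (n for n, (p, _) in self._loaded.items() if p is not Priority.ASR),
                key=lambda n: self._loaded[n][0],
            )
            evictable = sum(self._loaded[n][1] for n in victims)
            if self.capacity_gb - self.used_gb + evictable < size_gb:
                return False  # would not fit even after all allowed evictions
            for victim in victims:
                if self.capacity_gb - self.used_gb >= size_gb:
                    break
                del self._loaded[victim]  # a real manager would unload here
            self._loaded[name] = (priority, size_gb)
            return True
```

Holding the lock across the whole check-then-evict-then-reserve sequence keeps two concurrent requests from both believing the same free space is theirs. The sketch only does bookkeeping; the actual model loading and unloading are left out.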
tlg
969bcb3292 feat: API key authentication dependency
Implements create_api_key_dependency() FastAPI dependency that validates
Bearer tokens against a configured list of ApiKey objects (401 on missing,
malformed, or unknown tokens). Includes 5 TDD tests covering all cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:31:30 +02:00
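
The Bearer-token check described in 969bcb3292 could look roughly like the sketch below. To stay dependency-free it is written without FastAPI. `Unauthorized` is a stand-in for the 401 that a FastAPI dependency would normally signal with `HTTPException(status_code=401)`, and the fields on `ApiKey` are assumed, since the log does not show them.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ApiKey:
    # Hypothetical shape; the real ApiKey fields are not shown in the log.
    name: str
    key: str


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 response."""

    status_code = 401


def create_api_key_dependency(
    api_keys: Sequence[ApiKey],
) -> Callable[[Optional[str]], ApiKey]:
    """Build a checker that maps an Authorization header value to an ApiKey."""

    def require_api_key(authorization: Optional[str]) -> ApiKey:
        if not authorization:
            raise Unauthorized("missing Authorization header")

        scheme, _, token = authorization.partition(" ")
        token = token.strip()
        if scheme.lower() != "bearer" or not token:
            raise Unauthorized("malformed Authorization header")

        for candidate in api_keys:
            # Constant-time comparison avoids leaking key prefixes via timing.
            if hmac.compare_digest(candidate.key.encode(), token.encode()):
                return candidate
        raise Unauthorized("unknown API key")

    return require_api_key
```

Returning the matched `ApiKey` rather than a boolean lets route handlers see which caller was authenticated. All three failure cases named in the commit message (missing, malformed, unknown) map to the same 401-style error.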
tlg
c4eaf5088b feat: model registry with virtual-to-physical resolution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:31:10 +02:00
tlg
690ad46d88 feat: config loading for models.yaml and api_keys.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:30:13 +02:00
tlg
a64f32b590 feat: project scaffolding with config files and test fixtures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:23:14 +02:00