Tracks GPU VRAM usage (16 GB) and handles model loading/unloading with
priority-based eviction. Eviction order: LLM (lowest priority, evicted
first) -> TTS -> ASR (highest priority, protected).
Uses an asyncio.Lock to serialize load/unload operations.
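
A minimal sketch of the eviction logic described above, for
illustration only. The names VRAMManager, Priority, and load(), the
megabyte accounting, and the model sizes in demo() are assumptions,
not the code in this commit. It assumes "protected" means ASR is
never chosen as an eviction candidate.

```python
from __future__ import annotations

import asyncio
from enum import IntEnum


class Priority(IntEnum):
    LLM = 0  # lowest priority: evicted first
    TTS = 1
    ASR = 2  # highest priority: protected, never evicted in this sketch


class VRAMManager:
    """Hypothetical bookkeeping-only manager; it does not touch the GPU."""

    def __init__(self, capacity_mb: int = 16 * 1024) -> None:
        self.capacity_mb = capacity_mb
        self.loaded = {}  # model name -> (priority, size in MB)
        self._lock = asyncio.Lock()

    def used_mb(self) -> int:
        return sum(size for _, size in self.loaded.values())

    async def load(self, name: str, priority: Priority, size_mb: int) -> list[str]:
        """Reserve VRAM for a model and return the names evicted for it."""
        async with self._lock:  # one load/unload decision at a time
            if name in self.loaded:
                return []
            free = self.capacity_mb - self.used_mb()
            # Eviction candidates, lowest priority first; ASR is excluded.
            candidates = sorted(
                (n for n, (p, _) in self.loaded.items() if p != Priority.ASR),
                key=lambda n: self.loaded[n][0],
            )
            victims = []
            for n in candidates:
                if free >= size_mb:
                    break
                free += self.loaded[n][1]
                victims.append(n)
            if free < size_mb:
                # Plan first, evict second: nothing is unloaded on failure.
                raise MemoryError(f"cannot fit {name} ({size_mb} MB)")
            for n in victims:
                del self.loaded[n]  # a real manager would free GPU memory here
            self.loaded[name] = (priority, size_mb)
            return victims


async def demo() -> list[str]:
    mgr = VRAMManager()
    await mgr.load("asr", Priority.ASR, 4096)
    await mgr.load("llm", Priority.LLM, 8192)
    await mgr.load("tts", Priority.TTS, 2048)
    # Only 2048 MB remain, so the lowest-priority model has to go.
    return await mgr.load("tts-large", Priority.TTS, 6144)
```

Running asyncio.run(demo()) returns the list of evicted model names.
The sketch builds the eviction plan before unloading anything, so a
request that cannot fit leaves the loaded set unchanged.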

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>