Clarify VRAM eviction rule for cross-priority edge case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tlg
2026-04-03 13:20:53 +02:00
parent e6be9dcb85
commit bd0ed74d32


@@ -93,7 +93,7 @@ When a request arrives for a model whose physical model is not loaded:
- Evict LLM first
- Evict TTS second
- Evict ASR only as last resort
- - Never evict a higher-priority model to load a lower-priority one
+ - Never evict a higher-priority model to load a lower-priority one (e.g., never evict ASR to make room for TTS; in that case, evict the LLM instead)
4. Load the requested model.
### Concurrency
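
For reviewers, the eviction rule changed in this hunk can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `PRIORITY`, `Loaded`, and `pick_evictions`, the VRAM figures, and the choice to allow evicting a model of equal priority are all assumptions made for the example.

```python
from dataclasses import dataclass

# Higher number = higher priority = evicted later.
# The order follows the rule list: LLM first, TTS second, ASR last.
PRIORITY = {"llm": 0, "tts": 1, "asr": 2}


@dataclass(frozen=True)
class Loaded:
    """A model currently resident in VRAM (hypothetical record type)."""
    name: str
    kind: str      # "llm", "tts", or "asr"
    vram_mb: int


def pick_evictions(loaded, requested_kind, needed_mb, free_mb):
    """Return the models to evict, lowest priority first.

    Returns None when the request cannot fit without evicting a
    higher-priority model, which the rule forbids.
    """
    # Only models whose priority does not exceed the requested model's
    # priority are eligible (equal priority allowed: an assumption).
    candidates = sorted(
        (m for m in loaded if PRIORITY[m.kind] <= PRIORITY[requested_kind]),
        key=lambda m: PRIORITY[m.kind],
    )
    victims = []
    for model in candidates:
        if free_mb >= needed_mb:
            break
        victims.append(model)
        free_mb += model.vram_mb
    return victims if free_mb >= needed_mb else None


asr = Loaded("asr-model", "asr", 2000)
llm = Loaded("llm-model", "llm", 6000)
tts = Loaded("tts-model", "tts", 2000)

# Cross-priority edge case: loading TTS evicts the LLM, never ASR.
tts_case = pick_evictions([asr, llm], "tts", needed_mb=3000, free_mb=500)

# Even if evicting the LLM is not enough, ASR stays loaded and the load fails.
too_big_case = pick_evictions([asr, llm], "tts", needed_mb=8000, free_mb=500)

# Loading ASR may evict both lower-priority models, LLM before TTS.
asr_case = pick_evictions([tts, llm], "asr", needed_mb=8000, free_mb=500)
```

In this sketch a `None` result means the caller has to reject or queue the request; how the real system handles that case is not stated in this diff.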