Clarify VRAM eviction rule for cross-priority edge case
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -93,7 +93,7 @@ When a request arrives for a model whose physical model is not loaded:
- Evict LLM first
- Evict TTS second
- Evict ASR only as last resort
- - Never evict a higher-priority model to load a lower-priority one
+ - Never evict a higher-priority model to load a lower-priority one (e.g., never evict ASR to make room for TTS; in that case, evict the LLM instead)
4. Load the requested model.
### Concurrency
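As a rough illustration of the eviction rule in the hunk above, here is a minimal Python sketch. The `PRIORITY` table and the `pick_eviction_victim` helper are hypothetical names, not taken from the repository. The ranking (LLM lowest, then TTS, then ASR highest) is inferred from the eviction order in the list, and returning `None` when nothing may be evicted is an assumption, since the text does not say what happens in that case.

```python
from typing import Iterable, Optional

# Assumed ranking inferred from the eviction order: a higher number means
# higher priority, so the LLM is evicted first and ASR last.
PRIORITY = {"llm": 0, "tts": 1, "asr": 2}


def pick_eviction_victim(requested: str, loaded: Iterable[str]) -> Optional[str]:
    """Pick the loaded model kind to evict so `requested` can be loaded.

    Returns None when every loaded model outranks the requested one,
    because a higher-priority model is never evicted for a lower-priority one.
    """
    # Only models strictly below the requested model's priority are candidates.
    candidates = [
        kind
        for kind in loaded
        if kind != requested and PRIORITY[kind] < PRIORITY[requested]
    ]
    # Among the candidates, evict the lowest-priority one first.
    return min(candidates, key=PRIORITY.__getitem__, default=None)
```

Under these assumptions, a TTS request with ASR and the LLM loaded selects the LLM, which is the cross-priority case the commit clarifies. A TTS request with only ASR loaded selects nothing.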