From bd0ed74d323927a6ecd315746e0b50fbcb08e320b624115759fe12b5fec17915 Mon Sep 17 00:00:00 2001
From: tlg
Date: Fri, 3 Apr 2026 13:20:53 +0200
Subject: [PATCH] Clarify VRAM eviction rule for cross-priority edge case

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../llmux/docs/superpowers/specs/2026-04-03-llmux-design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kischdle/llmux/docs/superpowers/specs/2026-04-03-llmux-design.md b/kischdle/llmux/docs/superpowers/specs/2026-04-03-llmux-design.md
index d1e33be..fca5bac 100644
--- a/kischdle/llmux/docs/superpowers/specs/2026-04-03-llmux-design.md
+++ b/kischdle/llmux/docs/superpowers/specs/2026-04-03-llmux-design.md
@@ -93,7 +93,7 @@ When a request arrives for a model whose physical model is not loaded:
    - Evict LLM first
    - Evict TTS second
    - Evict ASR only as last resort
-   - Never evict a higher-priority model to load a lower-priority one
+   - Never evict a higher-priority model to load a lower-priority one (e.g., never evict ASR to make room for TTS; in that case, evict the LLM instead)
 4. Load the requested model.
 
 ### Concurrency