GGML_TYPE_Q8_0 for type_k/type_v is not supported in this llama-cpp-python version; keep the reduced n_ctx=4096 for VRAM savings.
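For reference, a minimal sketch of the kept configuration, assuming a placeholder model path; n_gpu_layers is an illustrative assumption, not part of this change:

    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # placeholder path, not the real model
        n_ctx=4096,               # reduced context window kept for VRAM savings
        n_gpu_layers=-1,          # assumption: offload all layers to the GPU
        # type_k / type_v are left at the default f16 KV cache;
        # passing GGML_TYPE_Q8_0 here is not supported by this
        # llama-cpp-python version, hence the revert.
    )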
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>