DesTEngSsv006_swd/kischdle/llmux/Dockerfile at 7a0ff55eb5dcdc8e386e2aea7864cba2581093db324df6bcb3da17b12cba1e2a

tlg da35e94b16 fix: add triton kernels for MXFP4, fix GGUF KV cache quantization

- Add 'kernels' package to Dockerfile for native MXFP4 execution
  (fixes gpt-oss-20b OOM: 15.2GB→13.5GB)
- Reduce GGUF n_ctx from 8192 to 4096 and quantize KV cache to Q8_0
  to reduce VRAM usage
- Use GGML_TYPE_Q8_0 constant instead of string for type_k/type_v
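A back-of-the-envelope calculation shows how much VRAM the second bullet saves. The sketch below is an illustration and not code from this repository. The helper `kv_cache_bytes` and the model dimensions (32 layers, 8 KV heads, head dimension 128) are hypothetical. It assumes llama.cpp's Q8_0 layout of 32 int8 values plus one fp16 scale per block, which is 34 bytes per 32 elements, compared with 2 bytes per element for the default F16 cache.

```python
# Hypothetical back-of-the-envelope KV cache estimate (not from this repo).
# Storage cost per 32 elements, assuming llama.cpp/ggml layouts:
#   f16  -> 32 x 2-byte halves                 = 64 bytes
#   q8_0 -> 32 x int8 values + one fp16 scale  = 34 bytes
BLOCK_ELEMS = 32
BYTES_PER_BLOCK = {"f16": 64, "q8_0": 34}


def kv_cache_bytes(n_layer: int, n_ctx: int, n_head_kv: int,
                   head_dim: int, cache_type: str) -> int:
    """Estimate the K + V cache size in bytes for a cache of n_ctx tokens."""
    row = n_head_kv * head_dim  # K (or V) width per token per layer
    if row % BLOCK_ELEMS:
        raise ValueError("row width must be a multiple of the 32-element block")
    # Factor 2: separate K and V tensors in every layer.
    elems = 2 * n_layer * n_ctx * row
    return elems // BLOCK_ELEMS * BYTES_PER_BLOCK[cache_type]


# Hypothetical model dimensions, chosen only for illustration.
dims = dict(n_layer=32, n_head_kv=8, head_dim=128)
before = kv_cache_bytes(n_ctx=8192, cache_type="f16", **dims)
after = kv_cache_bytes(n_ctx=4096, cache_type="q8_0", **dims)
print(before / 2**20, after / 2**20)  # → 1024.0 272.0
```

The ratio does not depend on the model dimensions. Halving n_ctx and switching from F16 to Q8_0 shrinks the cache to 0.5 × 34/64 = 17/64 (about 27%) of its former size. The estimate ignores compute buffers and other allocations that llama.cpp makes alongside the cache.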

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-05 22:49:16 +02:00

2.3 KiB