GGML_TYPE_Q8_0 for type_k/type_v is not supported in this llama-cpp-python version; keep the reduced n_ctx=4096 for VRAM savings.
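For reference, a minimal sketch of the kept configuration, assuming a placeholder model path; n_gpu_layers is an illustrative assumption, not part of this change:

    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # placeholder path, not the real model
        n_ctx=4096,               # reduced context window kept for VRAM savings
        n_gpu_layers=-1,          # assumption: offload all layers to the GPU
        # type_k / type_v are left at the default f16 KV cache;
        # passing GGML_TYPE_Q8_0 here is not supported by this
        # llama-cpp-python version, hence the revert.
    )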
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>