- Thinking/Instruct toggle via Jinja template patching in the llama-cpp
  backend: creates separate handlers for thinking-enabled and
  thinking-disabled modes
- Replace lovedheart/Qwen3.5-9B-FP8 (safetensors, 15.8GB, OOM) with
  unsloth/Qwen3.5-9B-GGUF Q8_0 (9.2GB, fits)
- Enable flash_attn in llama-cpp for better performance
- GGUF path resolution falls back to flat gguf/ directory
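The thinking toggle above can be sketched with a minimal Qwen-style Jinja template. The template and function names here (`CHAT_TEMPLATE`, `render_prompt`) are illustrative, not the actual ones in the backend, and real Qwen3 chat templates are far longer; this only shows the mechanism of patching an `enable_thinking` flag into the template so that a pre-filled empty `<think>` block suppresses the thinking phase:

```python
# Illustrative sketch: toggling thinking via a Jinja chat template.
from jinja2 import Template

# Minimal Qwen-style template honoring an `enable_thinking` flag
# (assumption: real templates are much longer; mechanism is the same).
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
    "<|im_start|>assistant\n"
    "{% if not enable_thinking %}<think>\n\n</think>\n\n{% endif %}"
)

def render_prompt(messages, enable_thinking):
    """Render the prompt; when enable_thinking is False, an empty
    <think></think> block is pre-filled, which skips the thinking phase."""
    return Template(CHAT_TEMPLATE).render(
        messages=messages, enable_thinking=enable_thinking
    )
```

Two handlers would then simply render with `enable_thinking=True` and `enable_thinking=False` respectively.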
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChatterboxTTS and ChatterboxMultilingualTTS are separate classes.
Turbo variant doesn't exist in chatterbox-tts 0.1.7.
Multilingual generate() requires a language_id parameter.
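A small dispatch helper makes the class distinction above explicit. The helper name `synthesize` is hypothetical; the class names and the `language_id` requirement follow chatterbox-tts 0.1.7 as described above, but treat them as assumptions if your version differs:

```python
# Hypothetical dispatch sketch: ChatterboxTTS and
# ChatterboxMultilingualTTS have different generate() signatures.
def synthesize(model, text, language_id=None):
    """Pass language_id only to the multilingual variant, which
    requires it; the monolingual class takes text alone."""
    if type(model).__name__ == "ChatterboxMultilingualTTS":
        if language_id is None:
            raise ValueError("ChatterboxMultilingualTTS requires language_id")
        return model.generate(text, language_id=language_id)
    return model.generate(text)
```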
The llama-cpp-python backend now uses huggingface_hub to resolve GGUF
file paths within the HF cache structure instead of assuming a flat
/models/ directory.
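A sketch of that resolution order, using `try_to_load_from_cache` from huggingface_hub to check the HF cache without touching the network, then falling back to a flat gguf/ directory as in the earlier commit. The function name `resolve_gguf_path` and the error message are illustrative, not the backend's actual API:

```python
# Illustrative sketch: resolve a GGUF file from the HF cache layout,
# falling back to a flat gguf/ directory.
import os

from huggingface_hub import try_to_load_from_cache

def resolve_gguf_path(repo_id, filename, flat_dir="gguf"):
    """Return a local path for `filename`: HF cache first, flat dir second."""
    # Returns the cached path (a str) if the file is already in
    # ~/.cache/huggingface/hub; None or a sentinel otherwise. No download.
    cached = try_to_load_from_cache(repo_id=repo_id, filename=filename)
    if isinstance(cached, str):
        return cached
    fallback = os.path.join(flat_dir, filename)
    if os.path.exists(fallback):
        return fallback
    raise FileNotFoundError(f"{filename} not found in HF cache or {flat_dir}/")
```

The resolved path can then be handed to the llama-cpp-python `Llama(model_path=...)` constructor.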