5 Commits

Author SHA256 Message Date
tlg
61308703dc feat: replace gpt-oss-20b-uncensored with HauhauCS MXFP4 GGUF
The aoxo model was unquantized (BF16, ~40GB) and ran out of memory. The
HauhauCS model uses the MXFP4 GGUF format and loads at 11.9GB via the
llama-cpp backend. All three reasoning levels (Low/Medium/High) work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:41:41 +02:00
tlg
7c4bbe0b29 feat: Jinja template thinking toggle, Qwen3.5-9B GGUF Q8_0
- Thinking/Instruct toggle via Jinja template patching in llama-cpp
  backend: creates separate handlers for thinking-enabled and
  thinking-disabled modes
- Replace lovedheart/Qwen3.5-9B-FP8 (safetensors, 15.8GB OOM) with
  unsloth/Qwen3.5-9B-GGUF Q8_0 (9.2GB, fits)
- Enable flash_attn in llama-cpp for better performance
- GGUF path resolution falls back to flat gguf/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:44:02 +02:00
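
The thinking/instruct toggle described in 7c4bbe0b29 could be sketched roughly as follows. This is a minimal illustration, not the commit's actual code: the function names are hypothetical, and the `enable_thinking` variable name is an assumption about what the model's chat template reads. The idea is to produce two patched copies of one Jinja chat template, so that each mode can back its own handler.

```python
# Sketch only: patch_template, build_variants and the `enable_thinking`
# variable name are assumptions, not taken from the commit.

def patch_template(chat_template: str, thinking: bool) -> str:
    """Return a copy of the template with the thinking flag set up front."""
    flag = "true" if thinking else "false"
    # A top-level Jinja `set` assigns the variable before the rest of the
    # template is evaluated.
    return "{%- set enable_thinking = " + flag + " -%}\n" + chat_template


def build_variants(chat_template: str) -> dict[str, str]:
    """One patched template per mode; each would back a separate handler."""
    return {
        "thinking": patch_template(chat_template, thinking=True),
        "instruct": patch_template(chat_template, thinking=False),
    }
```

Keeping one patched template per mode means a request only has to pick a handler; nothing is re-patched at request time.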
tlg
d615bb4553 fix: Chatterbox uses separate classes per variant, remove turbo
ChatterboxTTS and ChatterboxMultilingualTTS are separate classes.
The Turbo variant doesn't exist in chatterbox-tts 0.1.7.
The multilingual generate() requires a language_id parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:43:40 +02:00
tlg
f24a225baf fix: resolve GGUF paths through HF cache, add model_id to GGUF config
The llama-cpp-python backend now uses huggingface_hub to resolve GGUF
file paths within the HF cache structure instead of assuming a flat
/models/ directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:33:36 +02:00
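
The path resolution described in f24a225baf can be illustrated with a standard-library sketch. The commit says the backend uses huggingface_hub; this example instead walks the hub's on-disk cache layout (`models--<org>--<name>/snapshots/<revision>/<file>`) directly, so it runs without the library. The function name, its parameters, and the optional flat-directory fallback (modelled on the one mentioned in 7c4bbe0b29) are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_gguf_path(cache_dir: Path, model_id: str, filename: str,
                      flat_dir: Optional[Path] = None) -> Path:
    """Find a GGUF file in an HF-style cache, with an optional flat fallback."""
    # Hub cache layout: models--<org>--<name>/snapshots/<revision>/<filename>
    repo_dir = cache_dir / ("models--" + model_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if snapshots.is_dir():
        for revision in sorted(snapshots.iterdir()):
            candidate = revision / filename
            if candidate.is_file():
                return candidate
    # Hypothetical fallback: a flat directory holding the file directly.
    if flat_dir is not None and (flat_dir / filename).is_file():
        return flat_dir / filename
    raise FileNotFoundError(f"{filename} not found for {model_id}")
```

In real code, `huggingface_hub.hf_hub_download(repo_id, filename, local_files_only=True)` is probably a better fit than walking the cache by hand, since it also handles revisions and symlinked blobs.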
tlg
a64f32b590 feat: project scaffolding with config files and test fixtures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:23:14 +02:00