Commit Graph

6 Commits

Author · SHA256 · Message · Date
tlg
3edc055299 fix: Open WebUI integration — Harmony stripping, VRAM eviction, concurrency lock
- Add harmony.py: strip GPT-OSS-20B analysis/thinking channel from both
  streaming and non-streaming responses (HarmonyStreamFilter + extract_final_text)
- Add per-model asyncio.Lock in llamacpp backend to prevent concurrent C++
  access that caused container segfaults (exit 139)
- Fix chat handler swap for streaming: move it inside _stream_generate within
  the lock scope (it was broken by try/finally running before the stream was consumed)
- Filter /v1/models to return only LLM models (hide ASR/TTS from chat dropdown)
- Correct Qwen3.5-4B estimated_vram_gb: 4 → 9 (actual allocation ~8GB)
- Add GPU memory verification after eviction with retry loop in vram_manager
- Add HF_TOKEN_PATH support in main.py for gated model access
- Add /v1/audio/models and /v1/audio/voices discovery endpoints (no auth)
- Add OOM error handling in both backends and chat route
- Add AUDIO_STT_SUPPORTED_CONTENT_TYPES for webm/wav/mp3/ogg
- Add performance test script (scripts/perf_test.py)
- Update tests to match current config (42 tests pass)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:50:39 +02:00
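The non-streaming half of the Harmony stripping can be pictured as follows. The marker tokens (`<|channel|>`, `<|message|>`, `<|end|>`, `<|return|>`) come from the published Harmony response format; the function body is a sketch under that assumption, not the repo's actual `extract_final_text`.

```python
# Sketch: strip the GPT-OSS analysis/thinking channel from a complete
# Harmony-formatted response, keeping only the last "final" message.
CHANNEL = "<|channel|>"
MESSAGE = "<|message|>"
STOPS = ("<|end|>", "<|return|>")

def extract_final_text(raw: str) -> str:
    marker = CHANNEL + "final" + MESSAGE
    idx = raw.rfind(marker)
    if idx == -1:
        # No channel markers at all: pass the text through unchanged.
        return raw
    text = raw[idx + len(marker):]
    # Trim at the first stop token, if the model emitted one.
    for stop in STOPS:
        cut = text.find(stop)
        if cut != -1:
            text = text[:cut]
    return text.strip()
```

The streaming case (the `HarmonyStreamFilter` named in the commit) additionally has to buffer partial marker tokens across chunk boundaries, which this one-shot version sidesteps.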
tlg
61308703dc feat: replace gpt-oss-20b-uncensored with HauhauCS MXFP4 GGUF
The aoxo model shipped without quantization (BF16, ~40GB, OOM). The
HauhauCS model uses the MXFP4 GGUF format and loads at 11.9GB via the
llama-cpp backend. All three reasoning levels (Low/Medium/High) work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:41:41 +02:00
tlg
7c4bbe0b29 feat: Jinja template thinking toggle, Qwen3.5-9B GGUF Q8_0
- Thinking/Instruct toggle via Jinja template patching in llama-cpp
  backend: creates separate handlers for thinking-enabled and
  thinking-disabled modes
- Replace lovedheart/Qwen3.5-9B-FP8 (safetensors, 15.8GB OOM) with
  unsloth/Qwen3.5-9B-GGUF Q8_0 (9.2GB, fits)
- Enable flash_attn in llama-cpp for better performance
- GGUF path resolution falls back to flat gguf/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:44:02 +02:00
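The thinking/instruct toggle can be sketched as deriving two template variants from one Jinja chat template. Qwen3-style templates gate the thinking block on an `enable_thinking` flag, so a disabled variant can be made by forcing that flag off; the function name and patching strategy here are illustrative, not the backend's actual code.

```python
# Sketch: build thinking-enabled and thinking-disabled chat template
# variants from a Qwen3-style Jinja template, assuming the template
# checks an enable_thinking flag. Each variant would then get its own
# llama-cpp chat handler, as the commit describes.
def make_template_variants(template: str) -> dict:
    disabled = "{% set enable_thinking = false %}" + template
    return {"thinking": template, "instruct": disabled}
```

Patching the template once at load time, rather than per request, matches the commit's "separate handlers" framing: the toggle selects a handler instead of re-rendering templates on every call.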
tlg
d615bb4553 fix: Chatterbox uses separate classes per variant, remove turbo
ChatterboxTTS and ChatterboxMultilingualTTS are separate classes. The
Turbo variant doesn't exist in chatterbox-tts 0.1.7, and the
multilingual generate() requires a language_id parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:43:40 +02:00
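The variant split can be pictured as a small dispatcher. The two class names come from the commit; their bodies below are minimal stand-ins (the real chatterbox-tts classes load models and return audio), so only the dispatch logic is the point.

```python
# Sketch: dispatch TTS generation per variant. ChatterboxTTS and
# ChatterboxMultilingualTTS are separate classes, not modes of one
# class; only the multilingual one takes language_id. The generate()
# bodies here are stand-ins for testing the dispatch.
class ChatterboxTTS:
    def generate(self, text):
        return ("en", text)

class ChatterboxMultilingualTTS:
    def generate(self, text, language_id):
        return (language_id, text)

def synthesize(model, text, language_id=None):
    if isinstance(model, ChatterboxMultilingualTTS):
        if language_id is None:
            raise ValueError("language_id is required for multilingual TTS")
        return model.generate(text, language_id=language_id)
    # The base class does not accept language_id at all.
    return model.generate(text)
```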
tlg
f24a225baf fix: resolve GGUF paths through HF cache, add model_id to GGUF config
The llama-cpp-python backend now uses huggingface_hub to resolve GGUF
file paths within the HF cache structure instead of assuming a flat
/models/ directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:33:36 +02:00
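The cache layout being resolved looks like `models--{org}--{name}/snapshots/{revision}/…`. The commit says the backend delegates this to huggingface_hub; the walk below only illustrates the directory structure being searched, with a hypothetical function name.

```python
from pathlib import Path
from typing import Optional

# Sketch: locate a GGUF file inside the Hugging Face hub cache layout
# (models--{org}--{name}/snapshots/{revision}/<file>). Illustrative
# only; the real backend uses huggingface_hub for this resolution.
def resolve_gguf(cache_dir: str, repo_id: str, filename: str) -> Optional[Path]:
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    # Each snapshot directory is named after a commit revision; take
    # any snapshot that actually contains the requested file.
    for snap in sorted(snapshots.iterdir()):
        candidate = snap / filename
        if candidate.is_file():
            return candidate
    return None
```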
tlg
a64f32b590 feat: project scaffolding with config files and test fixtures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 07:23:14 +02:00