-
3edc055299
fix: Open WebUI integration — Harmony stripping, VRAM eviction, concurrency lock
main
tlg
2026-04-08 21:50:39 +02:00
-
06923d51b4
fix: streaming response fix + GPT-OSS-20B-Uncensored MXFP4 GGUF
tlg
2026-04-06 22:21:22 +02:00
-
61308703dc
feat: replace gpt-oss-20b-uncensored with HauhauCS MXFP4 GGUF
tlg
2026-04-06 16:41:41 +02:00
-
7c4bbe0b29
feat: Jinja template thinking toggle, Qwen3.5-9B GGUF Q8_0
tlg
2026-04-06 09:44:02 +02:00
-
7a0ff55eb5
fix: remove unsupported KV cache quantization in llama-cpp backend
tlg
2026-04-05 23:35:05 +02:00
-
da35e94b16
fix: add triton kernels for MXFP4, fix GGUF KV cache quantization
tlg
2026-04-05 22:49:16 +02:00
-
a88f0afb8a
chore: add .gitignore for venv, caches, and local dirs
tlg
2026-04-05 22:17:42 +02:00
-
d615bb4553
fix: Chatterbox uses separate classes per variant, remove turbo
tlg
2026-04-05 21:43:40 +02:00
-
f24a225baf
fix: resolve GGUF paths through HF cache, add model_id to GGUF config
tlg
2026-04-05 21:33:36 +02:00
-
38e1523d7e
feat: proper VRAM cleanup and admin clear-vram endpoint
tlg
2026-04-05 21:03:39 +02:00
-
aa7a160118
fix: proper VRAM cleanup on model unload + CUDA alloc config
tlg
2026-04-05 17:59:23 +02:00
-
d3285bad8a
fix: add accelerate package for transformers device_map support
tlg
2026-04-05 17:19:17 +02:00
-
f2f73d204c
fix: Dockerfile multi-stage build with working dependency resolution
tlg
2026-04-05 15:46:34 +02:00
-
d6a3fe5427
fix: Dockerfile uses explicit pip install, skip pre-installed packages
tlg
2026-04-05 14:10:07 +02:00
-
8816a06369
fix: add --break-system-packages for pip in container
tlg
2026-04-05 14:07:14 +02:00
-
8a6f6a5097
fix: use LLMUX_SRC env var for Dockerfile path in pod creation script
tlg
2026-04-05 13:05:38 +02:00
-
d5a98879c9
fix: use full Docker Hub registry path in Dockerfile
tlg
2026-04-05 13:04:53 +02:00
-
2f4d242f55
fix: use llm venv paths for huggingface-cli and python in download script
tlg
2026-04-05 12:52:09 +02:00
-
1a26d34ea5
feat: Dockerfile, model download script, and pod creation script
tlg
2026-04-05 10:09:34 +02:00
-
17818a3860
feat: FastAPI app assembly with all routes and backend wiring
tlg
2026-04-05 10:04:56 +02:00
-
d55c80ae35
feat: API routes for models, chat, transcription, speech, and admin
tlg
2026-04-05 10:04:45 +02:00
-
ef44bc09b9
feat: Chatterbox TTS backend with turbo/multilingual/default variants
tlg
2026-04-04 09:40:42 +02:00
-
c6677dcab3
feat: llama-cpp-python backend with GGUF, vision, and tool support
tlg
2026-04-04 09:40:40 +02:00
-
de25b5e2a7
feat: transformers ASR backend for cohere-transcribe
tlg
2026-04-04 09:40:39 +02:00
-
449e37d318
feat: abstract base class for model backends
tlg
2026-04-04 09:29:35 +02:00
-
813bbe0ad0
fix: VRAM eviction cascades through all tiers for large LLM loads
tlg
2026-04-04 09:22:14 +02:00
-
d7a091df8c
feat: VRAM manager with priority-based model eviction
tlg
2026-04-04 09:14:41 +02:00
-
969bcb3292
feat: API key authentication dependency
tlg
2026-04-04 07:31:30 +02:00
-
c4eaf5088b
feat: model registry with virtual-to-physical resolution
tlg
2026-04-04 07:31:10 +02:00
-
690ad46d88
feat: config loading for models.yaml and api_keys.yaml
tlg
2026-04-04 07:30:13 +02:00
-
a64f32b590
feat: project scaffolding with config files and test fixtures
tlg
2026-04-04 07:23:14 +02:00
-
cf7c77b3b5
Add llmux implementation plan (30 tasks)
tlg
2026-04-03 22:43:37 +02:00
-
45947e80a4
Update manual steps: DNS done, Open WebUI config automated
tlg
2026-04-03 22:25:51 +02:00
-
7187c58c5e
Add llmux product requirements in StrictDoc format
tlg
2026-04-03 21:11:05 +02:00
-
bd0ed74d32
Clarify VRAM eviction rule for cross-priority edge case
tlg
2026-04-03 13:20:53 +02:00
-
e6be9dcb85
Add llmux design specification
tlg
2026-04-03 13:15:46 +02:00
-
e7cf075e2f
Initial commit with .gitignore
tlg
2026-03-31 17:58:54 +02:00