Commit Graph

  • 3edc055299 fix: Open WebUI integration — Harmony stripping, VRAM eviction, concurrency lock (main) tlg 2026-04-08 21:50:39 +02:00
  • 06923d51b4 fix: streaming response fix + GPT-OSS-20B-Uncensored MXFP4 GGUF tlg 2026-04-06 22:21:22 +02:00
  • 61308703dc feat: replace gpt-oss-20b-uncensored with HauhauCS MXFP4 GGUF tlg 2026-04-06 16:41:41 +02:00
  • 7c4bbe0b29 feat: Jinja template thinking toggle, Qwen3.5-9B GGUF Q8_0 tlg 2026-04-06 09:44:02 +02:00
  • 7a0ff55eb5 fix: remove unsupported KV cache quantization in llama-cpp backend tlg 2026-04-05 23:35:05 +02:00
  • da35e94b16 fix: add triton kernels for MXFP4, fix GGUF KV cache quantization tlg 2026-04-05 22:49:16 +02:00
  • a88f0afb8a chore: add .gitignore for venv, caches, and local dirs tlg 2026-04-05 22:17:42 +02:00
  • d615bb4553 fix: Chatterbox uses separate classes per variant, remove turbo tlg 2026-04-05 21:43:40 +02:00
  • f24a225baf fix: resolve GGUF paths through HF cache, add model_id to GGUF config tlg 2026-04-05 21:33:36 +02:00
  • 38e1523d7e feat: proper VRAM cleanup and admin clear-vram endpoint tlg 2026-04-05 21:03:39 +02:00
  • aa7a160118 fix: proper VRAM cleanup on model unload + CUDA alloc config tlg 2026-04-05 17:59:23 +02:00
  • d3285bad8a fix: add accelerate package for transformers device_map support tlg 2026-04-05 17:19:17 +02:00
  • f2f73d204c fix: Dockerfile multi-stage build with working dependency resolution tlg 2026-04-05 15:46:34 +02:00
  • d6a3fe5427 fix: Dockerfile uses explicit pip install, skip pre-installed packages tlg 2026-04-05 14:10:07 +02:00
  • 8816a06369 fix: add --break-system-packages for pip in container tlg 2026-04-05 14:07:14 +02:00
  • 8a6f6a5097 fix: use LLMUX_SRC env var for Dockerfile path in pod creation script tlg 2026-04-05 13:05:38 +02:00
  • d5a98879c9 fix: use full Docker Hub registry path in Dockerfile tlg 2026-04-05 13:04:53 +02:00
  • 2f4d242f55 fix: use llm venv paths for huggingface-cli and python in download script tlg 2026-04-05 12:52:09 +02:00
  • 1a26d34ea5 feat: Dockerfile, model download script, and pod creation script tlg 2026-04-05 10:09:34 +02:00
  • 17818a3860 feat: FastAPI app assembly with all routes and backend wiring tlg 2026-04-05 10:04:56 +02:00
  • d55c80ae35 feat: API routes for models, chat, transcription, speech, and admin tlg 2026-04-05 10:04:45 +02:00
  • ef44bc09b9 feat: Chatterbox TTS backend with turbo/multilingual/default variants tlg 2026-04-04 09:40:42 +02:00
  • c6677dcab3 feat: llama-cpp-python backend with GGUF, vision, and tool support tlg 2026-04-04 09:40:40 +02:00
  • de25b5e2a7 feat: transformers ASR backend for cohere-transcribe tlg 2026-04-04 09:40:39 +02:00
  • 449e37d318 feat: abstract base class for model backends tlg 2026-04-04 09:29:35 +02:00
  • 813bbe0ad0 fix: VRAM eviction cascades through all tiers for large LLM loads tlg 2026-04-04 09:22:14 +02:00
  • d7a091df8c feat: VRAM manager with priority-based model eviction tlg 2026-04-04 09:14:41 +02:00
  • 969bcb3292 feat: API key authentication dependency tlg 2026-04-04 07:31:30 +02:00
  • c4eaf5088b feat: model registry with virtual-to-physical resolution tlg 2026-04-04 07:31:10 +02:00
  • 690ad46d88 feat: config loading for models.yaml and api_keys.yaml tlg 2026-04-04 07:30:13 +02:00
  • a64f32b590 feat: project scaffolding with config files and test fixtures tlg 2026-04-04 07:23:14 +02:00
  • cf7c77b3b5 Add llmux implementation plan (30 tasks) tlg 2026-04-03 22:43:37 +02:00
  • 45947e80a4 Update manual steps: DNS done, Open WebUI config automated tlg 2026-04-03 22:25:51 +02:00
  • 7187c58c5e Add llmux product requirements in StrictDoc format tlg 2026-04-03 21:11:05 +02:00
  • bd0ed74d32 Clarify VRAM eviction rule for cross-priority edge case tlg 2026-04-03 13:20:53 +02:00
  • e6be9dcb85 Add llmux design specification tlg 2026-04-03 13:15:46 +02:00
  • e7cf075e2f Initial commit with .gitignore tlg 2026-03-31 17:58:54 +02:00