Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

☆ Yσɠƚԋσʂ ☆@lemmy.ml · edit-2 10 hours ago

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

doo@sh.itjust.works · 2 hours ago

The irony. Before llamacpp the only way to run llama was using other and on Nvidia GPUs. Then llamacpp expanded to other models, introduced gguf, added backends to run on GPUs and now we’re taking about running qwen using just python on a single Nvidia. Ouroboros is complete.

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

GitHub - Luce-Org/lucebox-hub: Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.