I wanted Claude Code-style workflows without sending code to the cloud, so I built Loki

Dark-Alex-17@lemmy.world · 4 hours ago

Dark-Alex-17@lemmy.world · 3 hours ago

I’m using a ton of different ones, but the main ones I use daily are:

gemma4:26b
deepseek-coder
deepseek-r1:32b
devstral:24b
granite-code:34b
openthinker:latest
phi4:latest
qwen3:30b
mixtral:8x22b

I’m also going to use this opportunity to plug a project that helps me figure out which models will work well on my hardware: https://github.com/AlexsJones/llmfit. It’s amazing!

Blue_Morpho@lemmy.world · 3 hours ago

Isn’t it a huge delay to swap out to a different ~30b model every few minutes depending on the use case?

Dark-Alex-17@lemmy.world · 2 hours ago

Unfortunately, yes. It’s one reason I’m trying to figure out a good mechanism for spreading work across multiple Ollama hosts. Right now you can specify exactly which model an agent uses, but if an agent delegates to a sub-agent that uses a different model, the current model gets unloaded and the new one gets loaded. I’m trying to figure out if there’s a way to “alternate” between multiple hosts (say, Ollama running locally and another instance running on your server), so that when a switch happens, it happens on the secondary host, while also looking ahead to see what, if anything, needs to be switched on the primary host.
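To make the “alternate and look ahead” idea concrete, here is a rough Python sketch of the scheduling logic only. It is not how Loki works today; the names `Step`, `plan_alternation`, and `lookahead_loads` are made up for illustration, and it assumes each host keeps exactly one model in memory. Actually warming a model is left out; as far as I know, a request to Ollama’s `/api/generate` endpoint with a model name and no prompt is one way to do that.

```python
from dataclasses import dataclass


@dataclass
class Step:
    model: str        # model the agent or sub-agent wants
    host: str         # host chosen to serve it
    needs_load: bool  # True if the host has to load the model first


def plan_alternation(models, hosts):
    """Assign each step in a delegation chain to a host.

    A host that already holds the requested model is reused. Otherwise the
    load goes to the *next* host in rotation, so the host that was just in
    use keeps its model resident. Assumes one resident model per host.
    """
    loaded = {h: None for h in hosts}  # hypothetical view of each host's memory
    plan = []
    current = 0
    first = True
    for model in models:
        host = next((h for h in hosts if loaded[h] == model), None)
        if host is not None:
            current = hosts.index(host)
            plan.append(Step(model, host, needs_load=False))
        else:
            if not first:
                current = (current + 1) % len(hosts)
            host = hosts[current]
            loaded[host] = model
            plan.append(Step(model, host, needs_load=True))
        first = False
    return plan


def lookahead_loads(plan):
    """For each step, the (host, model) that could be warmed while it runs.

    A load can only overlap with the current step if it targets a different
    host than the one currently serving requests.
    """
    hints = []
    for i, step in enumerate(plan):
        nxt = plan[i + 1] if i + 1 < len(plan) else None
        if nxt is not None and nxt.needs_load and nxt.host != step.host:
            hints.append((nxt.host, nxt.model))
        else:
            hints.append(None)
    return hints
```

For a chain like `qwen3:30b`, `devstral:24b`, `qwen3:30b`, `phi4:latest` across two hosts, the planner keeps `qwen3:30b` resident on the first host and puts both other loads on the second, and each of those loads can start while the first host is still busy. With more distinct models than hosts, swaps still happen; they just move off the host that is currently in use.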

Loki already supports multiple Ollama hosts as-is, so what I’ve honestly been doing for the time being is specifying which model on which host each agent uses, so the only model loading happens once at the beginning of a session. After that there’s no unloading or reloading. The other thing I’ve been trying is to see how small I can make the models without losing performance. While the tricks implemented in Loki help dramatically, I know there’s still a lot more I can do to improve it further.
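The pinning workaround above boils down to a simple invariant: no host is ever asked for more than one model. Here is a small Python sketch of checking that. The agent names, host URLs, and the `AGENTS` layout are all hypothetical and are not Loki’s actual config format; `11434` is just Ollama’s default port.

```python
from collections import defaultdict

# Hypothetical agent -> (Ollama host, model) pinning; not Loki's real config.
AGENTS = {
    "coder": ("http://localhost:11434", "devstral:24b"),
    "reviewer": ("http://server.lan:11434", "qwen3:30b"),
    "planner": ("http://server.lan:11434", "qwen3:30b"),
}


def hosts_that_would_swap(agents):
    """Return the hosts pinned to more than one model, with those models.

    An empty result means every host serves a single model, so nothing has
    to be unloaded or reloaded after the first load of the session.
    """
    models_per_host = defaultdict(set)
    for host, model in agents.values():
        models_per_host[host].add(model)
    return {h: sorted(m) for h, m in models_per_host.items() if len(m) > 1}
```

Running `hosts_that_would_swap(AGENTS)` on the mapping above returns an empty dict. Pin a second model to either host and that host shows up in the result, which is exactly the case that brings the swap delay back.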