I have used Ollama so far and it’s indeed quite slow. Can you recommend a good guide for setting up llama.cpp (on Linux)? I have Ollama running in a Docker container with OpenWebUI; that kind of setup would be ideal.
I just run the llama-swap docker container with a config file mounted, set to listen for config changes so I don’t have to restart it to add new models. I don’t have a guide besides the README for llama-swap.
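A rough sketch of that setup, assuming the upstream llama-swap Docker image and its YAML config format (the model name, file paths, image tag, and the `--watch-config` flag below are illustrative; check the llama-swap README for the exact options and tags for your hardware):

```shell
# Minimal llama-swap config (illustrative; see the llama-swap README for real options).
cat > ./config.yaml <<'EOF'
models:
  "qwen2.5-7b":                       # name clients request via the OpenAI-compatible API
    cmd: |
      /app/llama-server
      --model /models/qwen2.5-7b-instruct-q4_k_m.gguf
      --port ${PORT}
EOF

# Run llama-swap with the config mounted; watching the config file means
# new models can be added without restarting the container.
docker run -d --name llama-swap \
  -p 8080:8080 \
  -v "$PWD/config.yaml:/app/config.yaml" \
  -v "$PWD/models:/models" \
  ghcr.io/mostlygeek/llama-swap:cpu \
  --config /app/config.yaml --watch-config
```

OpenWebUI can then be pointed at `http://localhost:8080/v1` as an OpenAI-compatible endpoint, and llama-swap starts and swaps llama-server instances on demand as different models are requested.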