Has anyone tried in organization to use self hosted llm models for agentic programming?
Im curious if it makes any sense. My organization spends fortune on tokens from us companies. I want to recommend something…
Has anyone tried in organization to use self hosted llm models for agentic programming?
Im curious if it makes any sense. My organization spends fortune on tokens from us companies. I want to recommend something…
Qwen 3.6 and gemma4 models are the only ones usable for agentic prog sessions that I and my employer run locally. It’s less stable and slower than third-party services, even on much better hardware (as it’s with my employer). The best way is to go with a provider hosting deepseek flash/pro if your privacy policy allows though. It’s going to be hard to beat their price.
I thought those didn’t support tool calling. Has that changed?
they do
How many concurrent users and what hardware if i may ask?
it’s an h100, I think, no idea about how many users
in my personal setup i use quantized versions on a 3080, which is not great, so I still lean a lot on APIs