Has anyone in an organization tried using self-hosted LLMs for agentic programming?

I'm curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend something…

  • MagicShel@lemmy.zip
    2 days ago

    I run this setup with 36GB (32+4). Local LLMs can be really effective BUT you are constrained by context size in a way you aren’t on cloud services.

    Cline supports running a local model through LM Studio, but in my experience, when I feed it any significant task, it just can't read and hold enough context to build components for enterprise-scale applications.

    I use Claude to write a lot of one-off utility scripts. With a maximum window of 1M tokens, I can hit 30+% context just writing Python scripts. That context goes to API contracts, development standards, existing reusable modules, and sometimes the code/documentation of the services I'm going to be calling.

    My MacBook can’t handle 300k token contexts. 30k seems doable. I should see how it handles my utility script folder…
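    For a rough check of whether a folder like that would fit in a given window, something like the sketch below might do. It assumes roughly 4 characters per token, which is only a rule of thumb and not a real tokenizer; `estimate_tokens`, `folder_tokens`, `context_share`, and the `utility_scripts` path are all made-up names for illustration.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def folder_tokens(folder: str, pattern: str = "*.py") -> int:
    """Sum the estimated tokens of every matching file under a folder."""
    total = 0
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def context_share(tokens: int, window: int) -> float:
    """Fraction of a context window that the given token count uses."""
    return tokens / window


if __name__ == "__main__":
    used = folder_tokens("utility_scripts")  # hypothetical folder name
    # Compare the local ~30k window with the 1M cloud window mentioned above.
    for window in (30_000, 1_000_000):
        print(f"{window:>9} token window: {context_share(used, window):.0%} used")
```

    Anything well over 100% on the 30k line means the agent would have to read the folder piecemeal, before counting the prompt and the model's own output.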

    Anyway, that's still no Claude, but if you need a cheaper model and you can afford for developers to spend time on it before ultimately deciding they need to pay for Claude or Codex or Gemini, then running a local model on a beefy MacBook is 100% an option.

    Stepping up from there to building a shared, locally hosted LLM server is probably the worst of all worlds. It will be a hefty CapEx outlay, prone to saturation by all the users, and you will most likely still have to punt the hardest jobs to cloud AI. It can certainly be done and done well, but the best example I know of runs on $250k-500k worth of hardware (to service a pretty big number of users, to be fair).
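    To put the CapEx question in numbers, a back-of-the-envelope break-even might look like the sketch below. Only the $250k-500k hardware range comes from the example above; the monthly cloud spend and the share of work the local hardware can take over are placeholders to swap for your own figures. It also ignores power, staffing, and depreciation, so treat the result as optimistic.

```python
def breakeven_months(capex: float, monthly_cloud_spend: float, offload_fraction: float) -> float:
    """Months until the hardware cost equals the cloud spend it displaces.

    offload_fraction is the share of current cloud spend the local hardware
    actually replaces; the rest still goes to cloud AI for the hardest jobs.
    """
    saved_per_month = monthly_cloud_spend * offload_fraction
    if saved_per_month <= 0:
        raise ValueError("no monthly savings, so the hardware never pays off")
    return capex / saved_per_month


if __name__ == "__main__":
    monthly_spend = 20_000  # hypothetical: replace with your real token bill
    offload = 0.7           # hypothetical: share of work the local model can handle
    for capex in (250_000, 500_000):  # hardware range from the example above
        months = breakeven_months(capex, monthly_spend, offload)
        print(f"${capex:,} hardware: about {months:.1f} months to break even")
```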