Let me preface by saying I despise corpo llm use and slop creation. I hate it.
However, it does seem like it could be an interesting, helpful tool if run locally in the CLI. I’ve seen quite a few people doing this. Again, it personally makes me feel like a lazy asshole when I use it, but it’s not much different from web searching commands every minute (other than that the data used to train it is obtained by pure theft).
Have any of you tried this out?


I’m running gpt-oss-120b and glm-4.5-air locally in llama.cpp.
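If anyone wants to try it, this is roughly how I start the server (the model path is just an example; tune -c and -ngl to your hardware):

```
# example GGUF path; any quant of the model works the same way
# -c sets the context size, -ngl offloads layers to the GPU
llama-server -m ./models/gpt-oss-120b-mxfp4.gguf -c 16384 -ngl 99 --port 8080
```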
It’s pretty useful for shell commands and has replaced a lot of web searching for me.
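For example, instead of web searching “how to extract a tar.zst”, I ask the OpenAI-compatible endpoint that llama-server exposes (assuming the server from above is running on port 8080):

```
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"One-line tar command to extract archive.tar.zst into /tmp"}]}' \
  | jq -r '.choices[0].message.content'
```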
The smaller models (4b, 8b, 20b) are not all that useful without giving them data to search through (e.g. via RAG), and even then they have a poor “understanding” of more complicated prompts.
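You can squeeze some use out of them by pasting the relevant data straight into the prompt instead of setting up full RAG. A minimal sketch (the model and file paths are made up):

```
# poor man's RAG: inline the file via command substitution
llama-cli -m ./models/qwen3-8b-q4_k_m.gguf \
  -p "Here is my config:
$(cat /etc/myservice.conf)

Which port does the service listen on?"
```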
The 100b+ models are much more interesting since they have a lot more knowledge in them. They are still not useful for very complicated tasks, but they can get you started quite quickly with regular shell commands and scripts.
The catch: you need about 128GB of VRAM/RAM to run these. The easiest way to get that locally is either a Strix Halo mini PC with 128GB of unified memory or a server/PC with 128GB of RAM.
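Rough math, assuming ~4-bit quants: a ~110B-parameter model at roughly 0.5 bytes per weight is already ~55-65GB for the weights alone, and you still need room for the KV cache plus the OS and everything else, so 128GB is about the comfortable minimum.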