For some reason, these local LLMs are straight up stupid. I tried DeepSeek R1 through Ollama and it got basically everything wrong. Anyone get the same results? I tried the 7b and 14b (if I remember those numbers correctly); the 32b straight up didn’t install because I didn’t have enough RAM.
I had more success with Qwen3 14b/8b, but it still makes small mistakes. For example, I asked it to compare GStreamer and FFmpeg and it got the licensing wrong.
Did you use a heavily quantized version? Those models are much smaller than the state-of-the-art ones to begin with, and if you chop their weights from 16-bit floats down to 2-bit quantization or something, it reduces their capabilities a lot more.
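For anyone curious what that loss looks like, here's a rough sketch of symmetric round-to-nearest quantization (illustrative only; real GGUF/llama.cpp quantization uses block-wise scales and fancier schemes, so treat the names and numbers here as made up):

```python
# Toy sketch: quantize a weight tensor to n-bit integers, map back to float,
# and measure how much of the original signal survives.
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.abs(weights).max() / qmax   # single scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float16).astype(np.float32)  # pretend float16 weights

for bits in (8, 4, 2):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Even this toy version shows the reconstruction error jumping sharply as you go from 8-bit down to 2-bit, and the small models have much less redundancy to absorb that damage.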
I’ve had good experience with smollm2:135m. The test case I used was determining why an HTTP request from one system was not received by another system. In total, there are 10 DB tables it must examine, not only for logging but for configuration, to understand if/how the request should be processed or blocked. Some of those are mapping tables designed such that table B must be used to join table A to table C, and table D must be used to join table C to table E. That gives a path to traverse a complete configuration set (table A <-> table E).
I had to describe each field being pulled (~150 fields total), but it was able to determine the correct reason for the request failure.
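To make that join path concrete, here's roughly the shape of the traversal. The table and column names are invented for illustration; the real schema had ~10 tables and ~150 fields:

```python
# Hypothetical sketch of the mapping-table join path described above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a_id INTEGER PRIMARY KEY, endpoint TEXT);
    CREATE TABLE table_b (a_id INTEGER, c_id INTEGER);          -- maps A -> C
    CREATE TABLE table_c (c_id INTEGER PRIMARY KEY, rule TEXT);
    CREATE TABLE table_d (c_id INTEGER, e_id INTEGER);          -- maps C -> E
    CREATE TABLE table_e (e_id INTEGER PRIMARY KEY, enabled INTEGER);

    INSERT INTO table_a VALUES (1, '/orders');
    INSERT INTO table_b VALUES (1, 10);
    INSERT INTO table_c VALUES (10, 'block_unauthenticated');
    INSERT INTO table_d VALUES (10, 100);
    INSERT INTO table_e VALUES (100, 0);   -- config disabled: request dropped
""")

# Traverse the full configuration set A <-> E via the mapping tables B and D.
row = conn.execute("""
    SELECT a.endpoint, c.rule, e.enabled
    FROM table_a a
    JOIN table_b b ON b.a_id = a.a_id
    JOIN table_c c ON c.c_id = b.c_id
    JOIN table_d d ON d.c_id = c.c_id
    JOIN table_e e ON e.e_id = d.e_id
""").fetchone()

print(row)   # ('/orders', 'block_unauthenticated', 0)
```

The model had to reason over that chain (with the field descriptions I gave it) to work out why the request was dropped.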
The only issue I’ve had was a separate incident with a different LLM, when I tried to use AI to generate Go template code for a database library I wanted to use.
It didn’t use it and recommended a different library.
When instructed that it must use this specific library, it refused (politely).
That caught me off-guard. I shouldn’t have to create a scenario where the AI goes to jail if it fails to use something.
I should just have to provide the instruction and, if that instruction is reasonable, await output.
The performance is relative to the user. Could it be that you’re a god damned genius? :/