cross-posted from: https://sh.itjust.works/post/61139432
I seriously can’t believe how much progress he’s made for the FOSS community. He might actually take a bite out of the big 3’s profits with this.
I have an RX 5600 XT (6 GB), 32 GB of RAM, and a Ryzen 3600. I haven’t upgraded the system since I built it during COVID. QwenV3-vl35B is the heftiest thing I can run; it gets around 2 tokens/sec in LM Studio. It’s easier than most people seem to think.
How do you not run out of VRAM? Does it offload to system RAM?
Yes, it offloads into system RAM. Oh, and I forgot to mention that’s with the context length set to around 25k tokens. The right setting varies a lot per model, though, and it took some experimentation to figure that out.
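For anyone wondering how a model that size fits alongside a 6 GB card, here’s a rough back-of-envelope sketch of the split. It assumes the “35B” in the model name is the parameter count, about 4.5 bits per weight (typical of a ~4-bit quant), and 1 GB of VRAM held back for overhead; those numbers and the `split_weights_gb` helper are hypothetical, not anything LM Studio reports. It also ignores the KV cache, which grows with the context length and is why a 25k context eats extra memory.

```python
def split_weights_gb(params_billion: float, bits_per_weight: float,
                     vram_gb: float, vram_reserved_gb: float = 1.0) -> tuple[float, float]:
    """Hypothetical helper: rough split of model weights between VRAM and system RAM.

    Assumption: ignores the KV cache and any overhead beyond a fixed reserve.
    Real runtimes offload whole layers, so the actual split is coarser.
    """
    # 1e9 params * bits per weight / 8 bits per byte ~= size in GB
    weights_gb = params_billion * bits_per_weight / 8
    usable_vram = max(vram_gb - vram_reserved_gb, 0.0)
    on_gpu = min(weights_gb, usable_vram)  # fill the GPU first
    in_ram = weights_gb - on_gpu           # whatever is left spills to system RAM
    return on_gpu, in_ram


gpu, ram = split_weights_gb(35, 4.5, 6)
print(f"GPU: {gpu:.1f} GB, system RAM: {ram:.1f} GB")
```

Under those assumptions it works out to about 5 GB on the card and about 14.7 GB in system RAM, which fits in 32 GB. Having most of the weights in system RAM is also a big part of why generation runs at only a couple of tokens/sec.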
Thank you. That’s good to know.