Switzerland’s government releases full FOSS LLM under Apache 2.0, argues for AI as Public Utility

Cooper8@feddit.online · 1 day ago

ABetterTomorrow@sh.itjust.works · 21 hours ago

I can’t find any hardware requirements for this. What will it take to run this smoothly?

General_Effort@lemmy.world · 10 hours ago

For fastest inference, you want to fit the entire model in VRAM. Plus, you need a few GB extra for context.

Context means the text (plus images, etc.) it works on. In the case of a chatbot, that’s the chat log, plus any texts you might want summarized or translated, or want to ask questions about.
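To put a rough number on "a few GB extra for context": the sketch below estimates the memory needed to cache keys and values for a given number of tokens, assuming a standard transformer with a 16-bit cache. The layer count, KV-head count, and head size are hypothetical 8B-class values picked for illustration; they are not taken from this model's published config.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size: one key and one value vector
    per token, per layer, per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens

# Hypothetical 8B-class shape (an assumption, not the real config).
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for tokens in (4096, 32768):
    gib = kv_cache_bytes(tokens, **SHAPE) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.2f} GiB")
```

With these assumed numbers the cache grows linearly with the length of the chat log, which is why long contexts eat into VRAM.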

Models can be quantized, which is a kind of lossy compression. They get smaller but also dumber. As with JPGs, the quality loss is insignificant at first and absolutely worth it.
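A toy illustration of the "smaller but dumber" trade-off, assuming the simplest possible scheme: one shared scale for all weights, rounded to signed integers. Real quantization formats are more elaborate than this, and the random "weights" here are made up, so treat it only as a sketch of why fewer bits means more rounding error.

```python
import random

def quantize_absmax(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Turn the integers back into approximate floats."""
    return [q * scale for q in quantized]

def mean_abs_error(weights, bits):
    """Average absolute difference after a quantize/dequantize round trip."""
    quantized, scale = quantize_absmax(weights, bits)
    restored = dequantize(quantized, scale)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)

random.seed(0)
# Made-up stand-in for a weight tensor (illustrative only).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

for bits in (8, 4, 2):
    print(f"{bits}-bit: mean abs error {mean_abs_error(weights, bits):.6f}")
```

The error stays small at 8 bits and climbs as the bit width drops, which is the JPG-like behaviour described above.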

Inference can be split between GPU and CPU, substituting VRAM with normal RAM. That makes it slower, but you’ll probably still feel that it’s smooth.
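A minimal sketch of how such a split could be planned, assuming the model is a stack of equally sized layers and that whatever doesn't fit in VRAM runs from system RAM. The function name and all the numbers are hypothetical; actual runtimes decide this in their own ways.

```python
def split_layers(n_layers, layer_gb, vram_gb, reserve_gb):
    """Return (gpu_layers, cpu_layers): fill VRAM first, spill the rest to RAM.

    reserve_gb is VRAM kept free for context and other overhead.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    gpu_layers = min(n_layers, int(usable // layer_gb))
    return gpu_layers, n_layers - gpu_layers

# Hypothetical numbers: 80 layers of 0.5 GB each, a 24 GB card,
# and 2 GB held back for context.
gpu, cpu = split_layers(n_layers=80, layer_gb=0.5, vram_gb=24, reserve_gb=2)
print(f"{gpu} layers on GPU, {cpu} layers on CPU")
```

The more layers that land on the CPU side, the slower each token gets, so this is the speed end of the trade-off.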

Basically, it’s all trade-offs between quality, context size, and speed.

IngeniousRocks (They/She) @lemmy.dbzer0.com · edit-2 9 hours ago

8B-parameter models are relatively fast on 3rd gen RTX hardware with at least 8 GB of VRAM. CPU inferencing is slower and requires boatloads of RAM, but it’s doable on older hardware. These models really aren’t designed to run on consumer hardware, but the 8B model should do fine on a relatively powerful consumer machine.

If you have something that would’ve been a high end gaming rig 4 years ago, you’re good.

If you wanna be more specific, check Hugging Face; they have charts. If you’re using Linux with Nvidia hardware, you’ll be better off doing CPU inferencing.

Edit: Omg y’all, I didn’t think I needed to include my sources, but this is quite literally a huge issue on Nvidia. Nvidia works fine on Linux, but you’re limited to whatever VRAM is on your video card, with no RAM sharing. Y’all can disagree all you want, but those are the facts. That’s why AMD and CPU inferencing are more reliable and allow for higher context limits. They are not faster, though.

Sources for the Nvidia stuff:

https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618

https://forums.developer.nvidia.com/t/shared-vram-on-linux-super-huge-problem/336867/

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/758

https://forums.opensuse.org/t/is-anyone-getting-vram-backed-by-system-memory-with-nvidia-drivers/185902

Jakeroxs@sh.itjust.works · 9 hours ago

Disagree on Linux nvidia support, it works fine

frongt@lemmy.zip · 8 hours ago

So that’s only in the case of sharing RAM? Because the vast majority of people use Nvidia and Linux without issue, myself included.

IngeniousRocks (They/She) @lemmy.dbzer0.com · edit-2 7 hours ago

Yes, precisely.

If you’re trying to use large models, you need more VRAM than consumer-grade Nvidia cards can supply. Without system RAM sharing, the models error out and start repeating themselves, or just crash and need to be restarted.

This can be fixed with CPU inferencing, but that would be much slower.

An 8B model will run fine on an RTX 30-series card; a 70B model absolutely will not. BUT you can do CPU inferencing with the 70B model if you don’t mind the wait.
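The back-of-the-envelope arithmetic behind that, as a sketch: weight size is roughly parameter count times bits per weight. The 12 GB and 24 GB budgets and the 2 GB context allowance are illustrative assumptions, and the estimate ignores format overhead.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

def fits(params_billion, bits_per_weight, vram_gb, context_gb=2.0):
    """True if weights plus an assumed context allowance fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + context_gb <= vram_gb

for params in (8, 70):
    for bits in (16, 4):
        size = weights_gb(params, bits)
        print(f"{params}B at {bits}-bit: ~{size:.0f} GB of weights, "
              f"fits in 12 GB: {fits(params, bits, 12)}, "
              f"fits in 24 GB: {fits(params, bits, 24)}")
```

Under these assumptions even a 4-bit 70B model is around 35 GB of weights, which is past both budgets, while a 4-bit 8B model is around 4 GB.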

frongt@lemmy.zip · 5 hours ago

Yeah for most people that’s not an issue, because they aren’t trying to run 70b models.

Jakeroxs@sh.itjust.works · 8 hours ago

No shit lmao, are you going to tell me a horse can’t pull an 18 wheeler trailer next?

IngeniousRocks (They/She) @lemmy.dbzer0.com · 7 hours ago

You don’t need to be rude.
My original comment was in reply to someone looking for this type of information, the conversation then continued.
Disengage: I don’t want to deal with it today frankly, I don’t have time for rude people.

Jakeroxs@sh.itjust.works · edit-2 6 hours ago

It’s a leap to say “nvidia AI support on Linux is bad” when you mean that a very particular set of circumstances (which don’t apply to someone just getting into it, as they’re using consumer-grade hardware) causes issues.

I have a 3080 Ti and run 8B to 12B models entirely in VRAM just fine, which is what the majority of people getting into it would be doing as well. Again, to my point about pulling an 18-wheeler trailer with a horse: if you’re trying to run a 70B model on consumer hardware, you’ve got worse problems than Nvidia on Linux.

ABetterTomorrow@sh.itjust.works · 12 hours ago

Thanks for the reply. I’ve never been on the HF site, and doing it on mobile for the first time, I seem lost. I couldn’t find it, but I’m sure I will.

m532@lemmygrad.ml · 14 hours ago

A desktop CPU and 20 GB of RAM.

Apertus: a fully open, transparent, multilingual language model