Off-and-on trying out an account over at @tal@oleo.cafe due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 17 Posts
  • 815 Comments
Joined 2 years ago
Cake day: October 4th, 2023


  • The incident began in June 2025. Multiple independent security researchers have assessed that the threat actor is likely a Chinese state-sponsored group, which would explain the highly selective targeting observed during the campaign.

    I do kind of wonder about the Emacs package management infrastructure. Like, whether attacking the things that text editors fetch online is an actively-used vector.


  • If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used, but equally likely we could see the chips (and DIMMs and CPUs) deliberately shredded to prevent them from falling into competitor hands.

    By the time the things are cycled out, they may not be terribly compute-competitive, in which case…shrugs

    Also, a major unknown is where models go. Say that a bunch of people decide that they can’t get access to parallel compute hardware or a lot of memory, and research shifts towards models split up into MoEs (mixtures of experts) or otherwise broken into pieces. Recent LLMs have been oriented towards MoEs. llama.cpp, and I assume the other engines capable of running LLMs, can offload experts that don’t fit in GPU memory to main memory while they aren’t actively being used; there’s a sketch of that below. Then maybe having a bank of consumer-level 24GB GPUs or something like that is fine, and having chips with direct access to very large amounts of memory isn’t all that interesting. Then what it takes to be competitive changes.
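
    As a concrete sketch of that expert-offload setup (the flags are from memory of recent llama.cpp builds and the model name is a placeholder, so check llama-server --help before trusting the details):

    $ # keep the shared layers on the GPU, but park the per-expert
    $ # feed-forward tensors in system RAM until they're needed
    $ llama-server -m some-moe-model.gguf \
          --n-gpu-layers 99 \
          --override-tensor "ffn_.*_exps.*=CPU"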

    EDIT: I also think that it’s safe to say that more memory will probably always help. But I’ll also say that it’s probably very likely that our existing models are staggeringly inefficient with memory. We are still doing early passes at this.

    Let me give you an example. I have a Llama 3-based model currently loaded on my Framework Desktop that’s using 96GB of memory for the model and associated storage.

    Prompt: What is 1+1?

    Response: I can answer that. 1+1 = 2.

    Prompt: How about 37 times 12?

    Response: 37 times 12 is 444.

    Now, those are correct answers. But…in order to make an LLM capable of providing those correct responses purely by running a neural net trained on natural language, we had to stick a wildly inefficient amount of data into memory. The same hardware that I’m running it on can do billions of integer computations per second. As of today, the software running that model doesn’t give it access to that hardware, and the model was never trained to use it. But…it could be. And if it were, suddenly a lot of the need to spend neural-net edges on arithmetic goes away.

    Plus, we could get better results:

    Prompt: What about 783901/76523?

    Response: 783901 divided by 76523 is approximately 10.23.

    That’s not far off, but it’s not quite right: the actual value is about 10.244, so it should have rounded to 10.24 rather than 10.23.

    $ maxima -q
    
    (%i1) float(783901/76523);
    
    (%o1)                         10.243992002404506
    (%i2) 
    

    So we could probably get more-useful models that don’t waste a ton of space if we gave them access to the computational hardware that’s presently sitting idle and trained them to use it. That’s an off-the-cuff example, but I think that it highlights how we’re solving problems inefficiently in terms of memory.

    Same sort of thing with a lot of other problems for which we already have immensely-more-efficient (and probably more accurate) software packages. If you can train the model to use those, and run the software in an isolated sandbox rather than having the model try to do it itself, then we don’t need to blow space in the LLM on those capabilities and can shrink it. A rough sketch of what that delegation might look like is below.
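
    As a toy illustration (the plumbing is entirely hypothetical; how the model would signal a tool request is made up here), the host could take an arithmetic expression the model emits and hand it to boring, existing serial software instead of burning parameters on it:

    $ # pretend the model emitted something like: TOOL_CALL calc "783901/76523"
    $ echo "scale=6; 783901/76523" | bc    # runs in microseconds on ordinary hardware
    10.243992

    The point isn’t bc specifically; it’s that the arithmetic costs effectively nothing and zero model parameters.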

    If we reduce the memory requirements enough to solve a lot of the problems that people want solved with a much smaller amount of memory, or with a much-less-densely-connected set of neural networks, the hardware that people care about may radically change. In early 2026, the most-in-demand hardware is hugely-power-hungry parallel processors with immense amounts of memory directly connected to them. But maybe, in 2028, we figure out how to get models to use existing software packages designed for mostly-serial computation, and suddenly, what everyone is falling over themselves to get ahold of is more-traditional computer hardware. Maybe the neural net isn’t even where most of the computation is happening for most workloads.

    Maybe the future is training a model to use a library of software and to write tiny, throwaway programs that run on completely different hardware optimized for this “model scratch computation” purpose, with the model mostly consulting those.

    Lot of unknowns there.


  • I posted in a thread a bit back about this, but I can’t find it right now, annoyingly.

    You can use the memory on GPUs as swap, though on Linux, that’s currently through FUSE — going through userspace — and probably not terribly efficient.

    https://wiki.archlinux.org/title/Swap_on_video_RAM
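
    From memory, the recipe on that wiki page is roughly the following (sizes and paths are placeholders, and the loop device is there because swapon can’t use a FUSE-backed file directly):

    # vramfs /mnt/vram 4G &                  # FUSE filesystem backed by video RAM
    # truncate -s 3800M /mnt/vram/swapfile   # leave some headroom for overhead
    # losetup /dev/loop0 /mnt/vram/swapfile
    # mkswap /dev/loop0
    # swapon -p 1 /dev/loop0                 # lower priority than RAM-backed swap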

    Linux apparently can use it via HMM: the memory will show up as system memory.

    https://www.kernel.org/doc/html/latest/mm/hmm.html

    Provide infrastructure and helpers to integrate non-conventional memory (device memory like GPU on board memory) into regular kernel path

    It will have higher latency due to the PCI bus. It sounds like it basically uses main memory as a cache, and all attempts to directly access a page on the device trigger an MMU page fault:

    Note that any CPU access to a device page triggers a page fault and a migration back to main memory. For example, when a page backing a given CPU address A is migrated from a main memory page to a device page, then any CPU access to address A triggers a page fault and initiates a migration back to main memory.

    I don’t know how efficiently Linux deals with this for various workloads; if it can accurately predict the next access, it might be able to pre-request pages and do this pretty quickly. That is, it’s not that the throughput is so bad, but the latency is, so you’d want to mitigate that where possible. There are going to be some workloads for which that’s impossible: an example case would be just allocating a ton of memory, and then accessing random pages. The kernel can’t mitigate the PCI latency in that case.
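
    If anyone does try it, the standard swap tooling is enough to see whether a workload is thrashing against that latency; nothing here is specific to VRAM-backed swap:

    $ swapon --show      # confirm the device is active and check its priority
    $ vmstat 1           # watch the si/so columns for sustained swap-in/swap-out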

    There’s someone who wrote a driver to do this for old Nvidia cards, something that starts with a “P”, which I also can’t find at the moment; I’d thought that was the only place it worked, but it sounds like it can also be done on newer Nvidia and AMD hardware. I haven’t dug into it, but I’m sure that it’d be possible.

    A second problem with using a card as swap is going to be that a Blackwell card uses extreme amounts of power, enough to overload a typical consumer desktop PSU. That power presumably only gets drawn if you’re using the compute hardware, which you wouldn’t be if you’re just moving memory around; existing GPUs normally use much less power than they do when crunching numbers. But if you’re running a GPU on a PSU that cannot actually provide enough power for it at full blast, you have to be sure that you never actually power up that hardware.

    EDIT: For an H200 (141 GB memory):

    https://www.techpowerup.com/gpu-specs/h200-nvl.c4254

    TDP: 600 W

    EDIT2: Just to drive home the power issue:

    https://www.financialcontent.com/article/tokenring-2025-12-30-the-great-chill-how-nvidias-1000w-blackwell-and-rubin-chips-ended-the-era-of-air-cooled-data-centers

    NVIDIA’s Blackwell B200 GPUs, which became the industry standard earlier this year, operate at a TDP of 1,200W, while the GB200 Superchip modules—combining two Blackwell GPUs with a Grace CPU—demand a staggering 2,700W per unit. However, it is the Rubin architecture, slated for broader rollout in 2026 but already being integrated into early-access “AI Factories,” that has truly broken the thermal ceiling. Rubin chips are reaching 1,800W to 2,300W, with the “Ultra” variants projected to hit 3,600W.

    A standard 120V, 15A US household circuit tops out at 1,800W, and that’s with the circuit completely maxed out. Even if you get a PSU capable of delivering that and dedicate the entire household circuit to it, going beyond that means something like multiple PSUs on independent circuits, or 240V service, or the like.

    I have a space heater in my bedroom that can do either 400W or 800W.

    So one question, if one wants to use the card just for extra memory, is going to be what ceiling on power draw you can guarantee while most of the card’s on-board hardware sits idle.


  • I mean, the article is talking about providing public inbound access, rather than having the software go outbound.

    I suspect that in some cases, people just aren’t aware that they are providing access to the world, and it’s unintentional. Or maybe they just don’t know how to set up a VPN or SSH tunnel or some kind of authenticated reverse proxy or something like that, and want to provide public access for remote use from, say, a phone or laptop or something, which is a legit use case.

    ollama targets being easy to set up. I do kinda think that there’s an argument that maybe it should try to facilitate configuration for that remote-access setup, even though it expands the scope of what they’re doing, since I figure that a lot of the people setting these up don’t have much networking familiarity and just want to play with local LLMs.
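
    For what it’s worth, my understanding is that ollama binds to loopback by default and only becomes reachable from other machines if you point OLLAMA_HOST at a non-loopback address (or forward the port some other way), roughly:

    $ ollama serve                              # default: listens on 127.0.0.1:11434 only
    $ OLLAMA_HOST=0.0.0.0:11434 ollama serve    # listens on every interface; now only the
                                                # router/firewall stands between it and the world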

    EDIT: I do think that there’s a good argument that the consumer router plus personal firewall situation is not great today. Like, “I want to have a computer at my house that I can access remotely via some secure, authenticated mechanism without dicking it up via misconfiguration” is something that people understandably want to do, and it should be more straightforward.

    I mean, we did it with Bluetooth: a consumer-friendly way to establish secure communication over insecure airwaves. We don’t really have that for accessing hardware remotely via the Internet.




  • (10^100) + 1 − (10^100) is 1, not 0.

    A “computer algebra system” would have accomplished a similar goal, but been much slower and much more complicated

    $ maxima -q
    
    (%i1) (10^100)+1-(10^100);
    
    (%o1)                                  1
    (%i2) 
    

    There’s no perceptible delay on my laptop here, and I use maxima on my phone and my computers. And a CAS gives you a lot more power to do other things.
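
    For contrast, the naive double-precision-float version of the same expression silently loses the 1, since 1 is far below the precision of a 64-bit float at that magnitude (a quick shell check using standard tools):

    $ awk 'BEGIN { print 1e100 + 1 - 1e100 }'    # doubles: the 1 vanishes
    0
    $ echo '10^100 + 1 - 10^100' | bc            # arbitrary-precision integers: correct
    1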






  • First, the Linux kernel doesn’t support resource forks at all. They aren’t part of POSIX nor do they really fit the unix file philosophy.

    The resource fork isn’t going to be meaningful to essentially any Linux software, but there have been ways to access filesystems that do have resource forks. IIRC, there was some client for mounting some Apple file server protocol that exposed the resource fork as a file with a different name and the data fork as just a regular file.

    https://www.kernel.org/doc/html/latest/filesystems/hfsplus.html

    Linux does support HFS+, which has resource forks, via the hfsplus driver, so I imagine that it provides access to them one way or another.

    searches

    https://superuser.com/questions/363602/how-to-access-resource-fork-of-hfs-filesystem-on-linux

    Add /..namedfork/rsrc to the end of the file name to access the resource fork.

    Also, pretty esoteric, but NTFS, the current Windows filesystem, has resource forks as well, though they’re not typically used.

    searches

    Ah, the WP article that OP (@evol@lemmy.today) linked to describes it.

    The Windows NT NTFS can support forks (and so can be a file server for Mac files), the native feature providing that support is called an alternate data stream. Windows operating system features (such as the standard Summary tab in the Properties page for non-Office files) and Windows applications use them and Microsoft was developing a next-generation file system that has this sort of feature as basis.
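
    If you want to poke at those streams from Linux, my recollection (worth checking against the ntfs-3g man page; the option name is from memory) is that ntfs-3g has a streams_interface mount option, and with streams_interface=windows you can use the file:stream syntax directly:

    $ sudo mount -t ntfs-3g -o streams_interface=windows /dev/sdb1 /mnt/win
    $ echo "hidden note" > "/mnt/win/report.txt:notes"   # write an alternate data stream
    $ cat "/mnt/win/report.txt:notes"
    hidden note
    $ cat /mnt/win/report.txt                            # the main data stream is unaffected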