• Hoimo@ani.social
    7 hours ago

    Can someone explain Laurie Wired to me? I see her in my recommendations sometimes, but I don’t click obvious clickbait.

    Take this one, is it actually a design flaw or is it just a compromise that was made for good reasons and is kept around for those same reasons?

    Maybe I’ll watch the video and report back, can always remove it from my watch history.

    Edit: It’s an hour? Not like I won’t watch hour-long videos, but that’s a lot to figure out if it was clickbait or not.

      • ProdigalFrog@slrpnk.netOP
        21 minutes ago

        90% of YouTube thumbnails have a face in them, usually with an exaggerated expression, and that goes for male and female YouTubers alike. Many YouTubers have confirmed time and again that the algorithm favors faces by a pretty wide margin, so most play that game.

        I’m not a fan of it; I wish they didn’t do it, or that the algorithm were changed to not favor it, but I understand why they do. I don’t think it’s particularly gendered, though, as your image claims.

      • Hoimo@ani.social
        6 minutes ago

        The title is objectively clickbait, though, even if she does eventually explain the design flaw. If she’s doing an hour on the history of RAM design, she could be honest about that.

        This is probably a matter of taste, but I can’t sit through 58 minutes of slow buildup just to get to “RAM has to refresh, which sometimes takes 300 nanoseconds; you could eliminate that at the hardware level by making all RAM twice as expensive.”

        Thanks Laurie, but you don’t have to pretend all RAM is fundamentally broken to make me watch an hour of maths and engineering. 3blue1brown does that all the time with titles like “What is a Laplace transform?” and thumbnails of plain formulas on black backgrounds.

    • Redjard@reddthat.com
      6 hours ago

      In those cases it’s less painful to use a website to extract the transcript and read that.
      You can skim text far more easily than a video.


      TL;DR: DDR RAM refreshes itself, which sometimes makes CPUs stall when reading RAM. High-speed traders don’t want that, so they figure out ways to keep data live as two copies in two different portions of RAM that refresh at different times. This is impractical for normal programs. Most of the effort goes into working around multiple abstraction layers, where the OS and then the RAM itself change where specifically the data ends up.

      Every 3.9 microseconds, your RAM goes blind. Your RAM physically has to shut down to recharge.
      This lockout is defined by the JEDEC spec as tRFC, or refresh cycle time. Now, a regular read on DDR5 might take you like 80 nanoseconds. But if you happen to get caught by this lockout, that’s going to bump you up to about 400 nanoseconds.

      Think for a second. What industry might really care about wasting a couple hundred nanoseconds, where one incorrectly timed stall would cost you millions of dollars? That’s right, the world of high-frequency trading.

      [custom benchmark program on DDR4 RAM and a 2.65 GHz CPU:] When you plot the gaps between the slow reads, they’re all the same: 7.82 microseconds [20,720 cycles] apart, every single time. […] So the question is, if this is so periodic, can we potentially predict when the refresh cycle is going to happen and then try to read around it?

      See, it’s not like the whole stick of RAM gets locked when the refresh cycle happens. It’s a lot more granular than that. With DDR4, for example, the refresh happens at the rank level. And DDR5 gets even more complicated, where you can subdivide even further than that.

      The memory controller does what’s called opportunistic refresh scheduling, which basically means that it can postpone up to eight refreshes and then catch up later if we happen to be in a busy period. […] how the heck are you going to predict opportunistic refresh scheduling?

      Then some material on virtual memory management in modern OSes.

      And I take two copies of my data and I space them nicely 128 bytes apart. And I’m feeling pretty good about myself, but for all I know, it could be straddling a page boundary and then the OS could have decided to put them wherever it felt like putting them.

      Physical RAM address issues:

      So the XOR hashing phase kind of acts like a load balancer baked directly into the silicon itself. It takes in your physical address, does a little bit of scrambling, and tries to spread it out evenly across all of the banks and channels.

      This also helps against rowhammer attacks, where repeatedly accessing rows physically close to a victim address can flip bits at that address.

      So, DRAM XOR hashing strategies were already not documented publicly. But then, after the entire rowhammer thing, obviously there was even less incentive to publish these load-balancing math strategies publicly.

      If AMD and Intel documented this kind of stuff, they’d be locking themselves into a strategy, because customers would start to build against it. And then next year when it comes around, it’s really going to make your life difficult, because you’re not going to be able to change things nearly as easily. But if you just don’t document it, well, who’s going to complain? Only weirdos doing crazy stuff like me.

      Inside of your CPU, right next to the memory controllers, there are actually tiny little hardware counters, one for every channel. […] If we do a simple sudo modprobe amd_uncore, it reveals those hardware counters to the standard Linux perf tool. […] If I write a tight loop of code that constantly flushes the cache and hammers one particular memory address, then one counter should start to light up. And theoretically, this should tell us exactly what channel our data is living on.

      Can’t really tell what’s going on here. Well, that, my friend, is OS noise. […] The problem is these counters are pretty dumb, so you can’t tell them to only count the reads from this particular process. […] All we need to do is run it 50,000 times. […] See that spike? Super cool. And now I really know where my data lives.

      So, to me, I don’t really care which channel I’m ending up on. Whether that’s channel 3, channel 7, whatever, doesn’t matter to me. All I need to do is make sure I’m ending up on different channels. […] The mathematical answer is that XOR is linear over GF(2), which is actually really simple. Basically that means that no matter what scrambling the memory controller does, flipping a given address bit will always flip the output the same way, no matter how many XOR stages are chained together.

      The video goes on to write low-latency benchmarks, which do show lower latency.