I am trying to do what should be a very simple task. I have two HDDs (spinning drives) and I am trying to move the data from one to the other using rsync.

The command itself is very simple:

rsync -r --info=progress2 /mnt/disk1/backupfolder /mnt/disk2/backupfolder

The amount of data to move is around 4 TB.

Twice now, once at around 89% and once at 94%, the process has died and hung the server itself, leaving it completely unresponsive (pings don’t work, nothing hosted works, SSH does not work). Only a reset via the button on the case brings it back.

At first I suspected temperature. After checking constantly with beszel during the second attempt, everything seems to be within normal ranges.

Has anyone else experienced such bizarre system shutdowns/hangs? In the meantime I am going to test the memory with memtest, just to be sure it is not that.

Edit: forgot to mention, both drives pass their SMART checks, although they were bought second hand with warranty.

Edit2: memtest finished and found nothing (thank goodness, because RAM right now is just stupidly priced). Some commenters suggested the problem might be the disks. I will now follow this lead.

  • SpikesOtherDog@ani.social
    6 hours ago

    Sounds like a bad drive, TBH. Not so much the platters as the electronics.

    If you can move all the data off and do a secure erase on it, it will tell you a lot.

    • ZeDoTelhado@lemmy.worldOP
      35 minutes ago

      I am also leaning in this direction. I just ordered a new 8 TB drive, and will proceed with SMART long tests. When you talk about secure erase, are we talking about using dd with /dev/null?

    • tal@lemmy.today
      5 hours ago

      I’d suspect that too. Try just reading from the source drive or just writing to the destination drive and see which causes the problems. Could also be a corrupt filesystem; probably not a bad idea to try to fsck it.
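A minimal sketch of how the two drives could be exercised separately, plus a read-only filesystem check. The function names are made up for illustration, `/dev/sdX1` is a placeholder rather than a real device, and the `fsck -n` step assumes an ext2/3/4 filesystem (where `-n` means "open read-only and answer no to everything") on an unmounted partition.

```shell
# Read every file under a directory and throw the data away.
# This touches only the source drive.
read_only_test() {
    src_dir="$1"
    find "$src_dir" -type f -exec cat {} + > /dev/null
}

# Write a scratch file of zeros, flush it to disk, then delete it.
# This touches only the destination drive.
write_only_test() {
    dest_dir="$1"
    size_mib="$2"
    dd if=/dev/zero of="$dest_dir/write_test.bin" bs=1M count="$size_mib" conv=fsync 2>/dev/null
    rm -f "$dest_dir/write_test.bin"
}

# Read-only filesystem check; refuses to run on a mounted device.
fsck_readonly() {
    dev="$1"
    if grep -q "^$dev " /proc/mounts 2>/dev/null; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    fsck -n "$dev"
}

# Example, using the paths from the original post:
#   read_only_test /mnt/disk1/backupfolder
#   write_only_test /mnt/disk2 100000      # roughly 100 GB of zeros
#   sudo sh -c '...; fsck_readonly /dev/sdX1'   # placeholder device name
```

If the hang shows up during only one of the two tests, that points at the drive (or cable/port) involved in that test.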

      IME, on a failing disk, you can get I/O blocking as the system retries, but it usually won’t freeze the system unless your swap partition/file is on that drive. Then, as soon as the kernel goes to pull something from swap on the failing drive, everything blocks. If you have a way to view the kernel log (e.g. you’re looking at a Linux console or have serial access or something else that keeps working), you’ll probably see kernel log messages. Might try swapoff -a before doing the rsync to disable swap.
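As a rough sketch of the swap check: the helper below (a made-up name) lists active swap areas from `/proc/swaps`, so it is easy to see whether any of them live on one of the two drives. The `journalctl` line assumes a systemd system with a persistent journal; without that, the previous boot's kernel messages are not kept.

```shell
# Print the device or file name of every active swap area.
# /proc/swaps has one header line, then one line per swap area.
list_swap() {
    swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 { print $1 }' "$swaps_file"
}

# Usage on the server:
#   list_swap              # is any entry on one of the two drives?
#   sudo swapoff -a        # disable all swap before starting rsync
#   journalctl -k -b -1    # after a forced reset: kernel log of the previous boot
```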

      At first I suspected temperature.

      I’ve never had it happen, but it is possible for heat to cause issues for hard drives; I’m assuming that OP is checking CPU temperature. If you’ve ever copied the contents of a full disk, the case will tend to get pretty toasty. I don’t know if the firmware will slow down operation to keep temperature sane — all the rotational drives I’ve used in the past have had temperature sensors, so I’d think that it would. Could try aiming a fan at the things. I doubt that that’s it, though.
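To check the drive's own sensor rather than the CPU's, something like the following could work. It is a sketch that assumes smartmontools is installed and that the drive reports an attribute named `Temperature_Celsius` (many do, but some use a different name such as `Airflow_Temperature_Cel`); `drive_temp` is a made-up helper name and `/dev/sdX` is a placeholder.

```shell
# Read `smartctl -A` output on stdin and print the raw value of the
# Temperature_Celsius attribute (the 10th column of the attribute table).
drive_temp() {
    awk '$2 == "Temperature_Celsius" { print $10 }'
}

# Usage:
#   sudo smartctl -A /dev/sdX | drive_temp
```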

      • ZeDoTelhado@lemmy.worldOP
        36 minutes ago

        The reason I suspected temps is that I very recently switched to a Define R6 case (got it second hand), and since the start I have been a bit suspicious of how it performs thermally (in terms of noise it is actually quite OK).

        I do have a fan on the drives, but one of them still goes up to 40 °C (even with the front door open).

        Also, when you talk about fsck, what would be good options to use when checking the drive?

    • frongt@lemmy.zip
      6 hours ago

      If both drives exhibit the behavior, I’d suspect the drive controller.

      • SpikesOtherDog@ani.social
        4 hours ago

        True, but it’s not clear to me that both drives are exhibiting the behavior and it sounds more like a copy between two drives. I wouldn’t rule it out and do think it is a possibility, but in my professional experience drives fail much more frequently than controllers.

        It makes sense to me to test the drives individually, preferably in another system, using a SMART long test, which is non-destructive. Next, test other drives in this system. If there are errors, try swapping out the SATA cables, too. If you can shuffle the data off the drives, do so and then try running them through a secure erase in another system. A bad drive should fail the same way in another system.

        My other reason for doubting the controller is that getting most of the way through a 4 TB sustained transfer is a very long time for a flaky component to hold up before failing. Also, there are no reports of other errors.
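For the SMART long test, a minimal sketch with smartmontools might look like this. The wrapper names are made up and `/dev/sdX` is a placeholder for the actual drive. The test runs inside the drive in the background and can take many hours on a large disk.

```shell
# Start the extended (long) self-test on a drive.
start_long_test() {
    smartctl -t long "$1"
}

# Show the self-test log, which lists past tests and the LBA of the
# first error, if there was one.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# Usage (as root):
#   start_long_test /dev/sdX
#   show_selftest_log /dev/sdX    # once the estimated time has passed
#   smartctl -a /dev/sdX          # full report, including attributes
```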