Proxmox somehow just dies during rsync

ZeDoTelhado@lemmy.world · edited 38 minutes ago

SpikesOtherDog@ani.social · edited 6 hours ago

Sounds like a bad drive, TBH. Not so much the platters as the electronics.

If you can move all the data off and do a secure erase on it, it will tell you a lot.

ZeDoTelhado@lemmy.world · edited 35 minutes ago

I am also leaning in this direction. I just ordered a new 8 TB drive and will proceed with SMART long tests. When you talk about secure erase, do you mean using dd with /dev/null?

tal@lemmy.today · edited 5 hours ago

I’d suspect that too. Try just reading from the source drive or just writing to the destination drive and see which causes the problems. Could also be a corrupt filesystem; probably not a bad idea to try to fsck it.
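A minimal Python sketch of the read-only half of that test, assuming it is run as root against the source device or a large file on it. The function name read_through and the 4 MiB chunk size are my own choices for illustration, not anything from rsync or Proxmox. It reads sequentially, writes nothing, and stops at the first I/O error, reporting how far it got.

```python
def read_through(path, chunk_size=4 * 1024 * 1024):
    """Read `path` from start to end without writing anything.

    Returns (bytes_read, error), where error is None if the end was
    reached cleanly, or the OSError raised by the first failed read.
    """
    bytes_read = 0
    # buffering=0 gives a raw file object, so each read() is one OS-level read
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                # The kernel gave up on this region; report the offset reached
                return bytes_read, exc
            if not chunk:
                # Empty result means end of file or end of device
                return bytes_read, None
            bytes_read += len(chunk)
```

For example, read_through("/dev/sdX"), with /dev/sdX standing in for the real source device. If the machine hangs during a pure read like this, the source side is the one to look at.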

IME, on a failing disk, you can get I/O blocking as the system retries, but it usually won’t freeze the system unless your swap partition/file is on that drive. Then, as soon as the kernel goes to pull something from swap on the failing drive, everything blocks. If you have a way to view the kernel log (e.g. you’re looking at a Linux console or have serial access or something else that keeps working), you’ll probably see kernel log messages. Might try swapoff -a before doing the rsync to disable swap.
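To check up front whether any swap lives on the suspect drive, something like this could parse /proc/swaps. It is a rough sketch: the helper names are made up for illustration, and the simple prefix match will not catch swap on LVM or other device-mapper volumes, which show up as /dev/dm-N rather than under the physical drive's name.

```python
def swap_areas(proc_swaps_text):
    """Return the Filename column of every active swap area."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line of /proc/swaps is the column header, so skip it
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_on_drive(proc_swaps_text, drive):
    """Return swap areas whose path starts with `drive`, e.g. '/dev/sdb'.

    Assumption: swap sits directly on a partition of the drive. Swap files
    and device-mapper volumes need to be traced back to a disk by hand.
    """
    return [area for area in swap_areas(proc_swaps_text) if area.startswith(drive)]


# On the live system, something like:
#   with open("/proc/swaps") as f:
#       print(swap_on_drive(f.read(), "/dev/sdX"))   # /dev/sdX is a placeholder
```

An empty list from swap_on_drive only rules out the simple case; any swap file or /dev/dm-N entry in swap_areas still has to be checked manually.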

At first my suspicion was temperature.

I’ve never had it happen, but it is possible for heat to cause issues for hard drives; I’m assuming that OP is checking CPU temperature. If you’ve ever copied the contents of a full disk, the case will tend to get pretty toasty. I don’t know if the firmware will slow down operation to keep temperature sane — all the rotational drives I’ve used in the past have had temperature sensors, so I’d think that it would. Could try aiming a fan at the things. I doubt that that’s it, though.

ZeDoTelhado@lemmy.world · 36 minutes ago

The reason I suspected temps is that I very recently changed to a Define R6 (got it second hand). Since the start I have been a bit suspicious of how it performs thermally (in terms of noise it is actually quite OK).

I do have a fan on the drives, but one of the drives still goes up to 40 °C (even with the front door open).

Also, when you talk about fsck, what would be good options to use for checking the drive?

frongt@lemmy.zip · 6 hours ago

If both drives exhibit the behavior, I’d suspect the drive controller.

SpikesOtherDog@ani.social · 4 hours ago

True, but it’s not clear to me that both drives are exhibiting the behavior and it sounds more like a copy between two drives. I wouldn’t rule it out and do think it is a possibility, but in my professional experience drives fail much more frequently than controllers.

It makes sense to me to test the drives individually, preferably in another system, using a SMART long test, which is non-destructive. Next, test the other drives in this system. If there are errors, try changing out the SATA cables, too. If you can shuffle the data off the drives, do so and then try running them through a secure erase in another system. A bad drive should fail the same way in another system.
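For the SMART long test step, a small sketch of the smartctl calls involved, assuming smartmontools is installed and the script runs as root. The function names are hypothetical and /dev/sdX is a placeholder; the flags themselves (-t long, -l selftest, -H) are standard smartctl options.

```python
import subprocess


def smart_long_test_commands(device):
    """Build the smartctl invocations for a long self-test on `device`."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log
        ["smartctl", "-H", device],              # overall health assessment
    ]


def run(cmd):
    """Run one command and return (exit_code, stdout).

    smartctl uses its exit code as a bitmask of problems found, so a
    non-zero value is returned to the caller instead of raising.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

The long test runs inside the drive and can take many hours on an 8 TB disk, so the first command returns right away; the log and health commands are the ones to rerun later, e.g. run(smart_long_test_commands("/dev/sdX")[1]).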

My other reason for thinking it is probably not the controller is that 4 TB is a very long sustained transfer for a flaky component to get through before failing. Also, there are no reports of other errors.