Hi all, my system crashes when I run software with a high GPU load (in this case an AI model). It also happens with games, after a seemingly random amount of time (I can play for about 30 minutes and then crash, or sometimes not crash at all).
Here is my journalctl log:
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State Completed
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 timeout, signaled seq=618, emitted seq=620
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: Process python pid 4571 thread python pid 5777
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: device lost from bus!
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: [drm] device wedged, but recovered through reset
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: [drm] *ERROR* [CRTC:61:crtc-0] flip_done timed out
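For anyone reading along: the lines that usually matter in a log like this are the ring timeout (which ring hung, and how far the signaled sequence number lags the emitted one) and the process the kernel names right after it. Here "comp" suggests a compute ring, and signaled seq=618 against emitted seq=620 suggests two submitted jobs never completed. Below is a minimal Python sketch for pulling those fields out of saved journalctl output. The regular expressions are modelled only on the messages quoted above, so treat them as an assumption; other kernel versions may word these lines differently. The function name is made up for illustration.

```python
import re

# Patterns modelled on the amdgpu messages quoted in this thread (assumption:
# other kernel versions may format these lines differently).
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS = re.compile(r"Process (?P<name>\S+) pid (?P<pid>\d+)")


def summarize_gpu_hangs(lines):
    """Return one dict per ring timeout, plus the process named afterwards, if any."""
    hangs = []
    for line in lines:
        m = RING_TIMEOUT.search(line)
        if m:
            hangs.append({
                "ring": m["ring"],
                "signaled": int(m["signaled"]),
                "emitted": int(m["emitted"]),
                "process": None,
            })
            continue
        m = PROCESS.search(line)
        # Attach the first "Process ..." line that follows a timeout to that timeout.
        if m and hangs and hangs[-1]["process"] is None:
            hangs[-1]["process"] = f"{m['name']} (pid {m['pid']})"
    return hangs
```

One way to use it is to save the kernel log of the previous boot with `journalctl -k -b -1 > crash.log` and pass `open("crash.log")` to `summarize_gpu_hangs`. If the same ring or the same process shows up in every crash, that narrows things down a little.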
I tried to check the path /sys/class/drm/card1/device/devcoredump/data after a reboot, but there isn't anything there (in fact, the devcoredump folder doesn't even exist).
My specs: GPU: RX 580, CPU: Ryzen 5 5500 (I am on the latest version of my BIOS).
Is there anything I can do to diagnose the issue? Any help is appreciated. Thanks, everyone.
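On the missing devcoredump folder: as far as I know, the dump lives only in kernel memory and is exposed through sysfs, so it does not survive a reboot, and the kernel also discards it on its own after a few minutes. It has to be copied while the machine is still up after the reset. A rough Python sketch of a watcher that does this follows. The glob pattern is an assumption based on the path in the log above, the function names and output file names are made up for illustration, and reading the file will most likely need root.

```python
import glob
import time
from pathlib import Path

# Assumption: dumps appear under this sysfs pattern (the log above points at
# card1; the wildcard covers other card numbers).
DUMP_GLOB = "/sys/class/drm/card*/device/devcoredump/data"


def save_new_dumps(pattern, dest_dir, seen):
    """Copy dumps matching `pattern` that are not yet in `seen`; return the new copies."""
    current = set(glob.glob(pattern))
    seen.intersection_update(current)  # forget dumps the kernel has since removed
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in sorted(current - seen):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        index = sum(1 for _ in dest.iterdir())  # keeps file names unique
        target = dest / f"coredump-{stamp}-{index}.dump"
        # Plain read/write, since sysfs files do not report a useful size.
        target.write_bytes(Path(src).read_bytes())
        seen.add(src)
        saved.append(target)
    return saved


def watch(dest_dir, poll_seconds=5):
    """Poll forever and save each dump once; stop with Ctrl+C."""
    seen = set()
    while True:
        for path in save_new_dumps(DUMP_GLOB, dest_dir, seen):
            print("saved", path)
        time.sleep(poll_seconds)
```

Leaving something like `watch("/var/tmp/amdgpu-dumps")` running in a terminal before starting the AI workload (the directory is just an example) should leave a copy behind the next time the reset happens, provided the system stays alive long enough to write it.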
I have an AMD Ryzen-based system. I used to have this issue; it was caused by the CPU overheating. I think the fix was installing auto-cpufreq (Arch), but I tried several things and am not sure exactly what did it. I also cut a hole in the desk cabinet I keep the computer in and installed a fan - increasing airflow may have helped. Anyway, I haven't had any crashes since I got it to stop overheating. Whatever defaults Arch came with weren't sufficient to prevent overheating; I'd bet dollars to donuts that's your issue, too.
Can you tell me the name of the Arch package? I am not able to find it in the Arch repo.
That's because it isn't in the official repos; it's in the AUR (https://aur.archlinux.org/packages/auto-cpufreq, upstream: https://github.com/AdnanHodzic/auto-cpufreq).
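Before installing anything, it may be worth checking whether overheating is really what is happening. The kernel exposes temperature sensors under /sys/class/hwmon, with values in millidegrees Celsius. Here is a small Python sketch that reads them and logs them to a file, so the last reading before a crash is still on disk afterwards. The function names are made up for illustration. On a Ryzen CPU with an RX 580 I would expect chips named `k10temp` and `amdgpu` to show up, but that is an assumption about this particular machine.

```python
import os
import time
from pathlib import Path


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_celsius} for every temp*_input file found."""
    readings = {}
    for chip_dir in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            base = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / f"{base}_label"
            label = label_file.read_text().strip() if label_file.exists() else base
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse to be read; skip them
            readings[f"{chip}/{label}"] = millidegrees / 1000.0
    return readings


def log_forever(log_path, interval_s=2.0):
    """Append a timestamped reading every few seconds; stop with Ctrl+C."""
    with open(log_path, "a") as log:
        while True:
            log.write(f"{time.strftime('%H:%M:%S')} {read_temperatures()}\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so a hard crash loses little
            time.sleep(interval_s)
```

Running `log_forever("temps.log")` in the background during the workload and looking at the last few lines after a crash should show whether the CPU or GPU was running hot at that moment. If the temperatures look ordinary when the reset happens, overheating is probably not the cause.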