What happened before the error:
I've been trying to diagnose a restart issue that only happens when playing Skyrim (modded).
Restarts have happened before, but this is the first time I have seen this error on boot:
❯ journalctl -p 3 -b
lip 04 04:55:02 cachyos kernel: [Hardware Error]: System Fatal error.
lip 04 04:55:02 cachyos kernel: [Hardware Error]: CPU:14 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
lip 04 04:55:02 cachyos kernel: [Hardware Error]: Error Addr: 0x00006ffffaf0ffd7
lip 04 04:55:02 cachyos kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
lip 04 04:55:02 cachyos kernel: [Hardware Error]: Execution Unit Ext. Error Code: 0
lip 04 04:55:02 cachyos kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
lip 04 04:55:02 cachyos kernel: amdgpu: Overdrive is enabled, please disable it before reporting any bugs unrelated to overdrive.
lip 04 04:55:03 cachyos kernel: Bluetooth: hci0: No support for _PRR ACPI method
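The flag list in the MC5_STATUS line, the "Execution Unit" label and the "cache level / tx / mem-tx" line are all the kernel's reading of the raw hex values in the same log. Below is a minimal Python sketch that decodes those values by hand. It assumes the AMD SMCA bit layout used by the Linux kernel's MCE decoding code. The bit positions, the labels "Val" and "EN" (which the kernel does not print), and the small bank table are assumptions to check against the kernel source for your version. The sketch only restates what the CPU reported. It cannot say why the error happened.

```python
"""Hand-decode the fields of an AMD SMCA machine-check report.

Bit positions are assumed to follow the MCI_STATUS_* definitions and the
SMCA IPID layout used by the Linux kernel; verify them for your kernel.
"""

# (bit, short label, meaning); labels match the kernel's output where it prints the bit
STATUS_FLAGS = [
    (63, "Val", "the register holds a valid error"),
    (62, "Over", "an earlier error was overwritten"),
    (61, "UE", "uncorrected error"),
    (60, "EN", "error reporting was enabled"),
    (59, "MiscV", "MCA_MISC is valid"),
    (58, "AddrV", "MCA_ADDR is valid"),
    (57, "PCC", "processor context corrupt"),
    (55, "TCC", "task context corrupt"),
    (53, "SyndV", "MCA_SYND is valid"),
    (44, "Deferred", "deferred error"),
    (43, "Poison", "poisoned data was consumed"),
    (40, "Scrub", "detected during a scrub operation"),
]

# Sub-fields of a memory-hierarchy error code: 0000 0001 RRRR TTLL
LL = ["RESV", "L1", "L2", "L3/GEN"]
TT = ["INSN", "DATA", "GEN", "RESV"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]

# A few Zen core banks, keyed by (HWID, McaType) taken from MCA_IPID
SMCA_BANKS = {
    (0xB0, 0x0): "LS (load/store unit)",
    (0xB0, 0x1): "IF (instruction fetch unit)",
    (0xB0, 0x2): "L2 cache",
    (0xB0, 0x3): "DE (decode unit)",
    (0xB0, 0x5): "EX (execution unit)",
    (0xB0, 0x6): "FP (floating point unit)",
    (0xB0, 0x7): "L3 cache",
}


def decode_status(status: int) -> list[str]:
    """Labels of the flag bits that are set in an MCA_STATUS value."""
    return [label for bit, label, _ in STATUS_FLAGS if (status >> bit) & 1]


def decode_error_code(status: int) -> dict[str, object]:
    """Split the low bits of MCA_STATUS into error code fields."""
    code = status & 0xFFFF          # architectural error code, bits 15:0
    ext = (status >> 16) & 0x3F     # extended error code, bits 21:16
    fields: dict[str, object] = {"code": code, "ext": ext}
    if (code & 0xFF00) == 0x0100:   # memory-hierarchy error pattern
        rrrr = (code >> 4) & 0xF
        fields["cache level"] = LL[code & 0x3]
        fields["tx"] = TT[(code >> 2) & 0x3]
        fields["mem-tx"] = RRRR[rrrr] if rrrr < len(RRRR) else "RESV"
    return fields


def decode_ipid(ipid: int) -> str:
    """Name the reporting bank from HWID (bits 43:32) and McaType (bits 63:48)."""
    hwid = (ipid >> 32) & 0xFFF
    mcatype = (ipid >> 48) & 0xFFFF
    return SMCA_BANKS.get(
        (hwid, mcatype), f"unknown (HWID={hwid:#x}, McaType={mcatype:#x})"
    )


if __name__ == "__main__":
    status = 0xBEA0000001000108  # MC5_STATUS from the log
    print(decode_status(status))
    print(decode_error_code(status))
    print(decode_ipid(0x000500B000000000))  # IPID from the log
```

For the values in this log, the sketch finds UE, MiscV, AddrV, PCC, TCC and SyndV set, plus Val and EN. It also finds a memory-hierarchy error code with extended code 0 (cache level RESV, tx GEN, mem-tx GEN) and the EX bank. All of this matches the kernel's own decode.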
The only thing that has changed is that I have a new CPU. These errors appeared only after the forced restart. When I rebooted again myself, they were gone.
What does this mean?
For reference, this is the previous boot (the one that ended in the forced restart):
❯ journalctl -p 3 -b -1
lip 04 02:36:48 cachyos kernel: amdgpu: Overdrive is enabled, please disable it before reporting any bugs unrelated to overdrive.
lip 04 02:36:48 cachyos kernel: Bluetooth: hci0: No support for _PRR ACPI method
lip 04 02:36:48 cachyos kernel: Bluetooth: hci0: FW download error recovery failed (-19)
lip 04 02:36:48 cachyos kernel: Bluetooth: hci0: sending frame failed (-19)
lip 04 02:36:48 cachyos kernel: Bluetooth: hci0: Failed to read MSFT supported features (-19)
lip 04 02:36:49 cachyos kernel: Bluetooth: hci0: No support for _PRR ACPI method
lip 04 02:39:17 cachyos plasmashell[1247]: qt.network.http2.connection: [0x7075f404e5f0] Connection error: HPACK decompression failed (9)
lip 04 02:48:03 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 02:59:11 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:01:59 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:03:31 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:06:16 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:08:25 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:08:33 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:08:59 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 03:21:29 cachyos kernel: playstation 0005:054C:0CE6.000E: DualSense input CRC's check failed
lip 04 04:29:38 cachyos systemd-coredump[25951]: [🡕] Process 25946 (sed) of user 1000 dumped core.
Stack trace of thread 25946:
#0 0x000070009ca00d2b n/a (/usr/lib/ld-linux-x86-64.so.2 + 0x26d2b)
#1 0x000070009c9fae23 n/a (/usr/lib/ld-linux-x86-64.so.2 + 0x20e23)
#2 0x000070009c9fc6d2 n/a (/usr/lib/ld-linux-x86-64.so.2 + 0x226d2)
#3 0x000070009c9fb488 n/a (/usr/lib/ld-linux-x86-64.so.2 + 0x21488)
ELF object binary architecture: AMD x86-64
inxi -b:
System:
  Host: cachyos Kernel: 6.15.0-2-cachyos arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 6.3.5 Distro: CachyOS
Machine:
  Type: Desktop Mobo: ASRock model: B550M Pro4 serial: <superuser required>
  UEFI: American Megatrends LLC. v: P3.40 date: 01/18/2024
CPU:
  Info: 8-core AMD Ryzen 7 5700X3D [MT MCP] speed (MHz): avg: 3592 min/max: 575/4151
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] driver: amdgpu v: kernel
  Display: wayland server: X.org v: 1.21.1.16 with: Xwayland v: 24.1.6 compositor: kwin_wayland driver: gpu: amdgpu resolution: 1: 2560x1440~75Hz 2: 2560x1440~75Hz
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.1.1-cachyos1.3 renderer: AMD Radeon RX 7800 XT (radeonsi navi32 LLVM 19.1.7 DRM 3.63 6.15.0-2-cachyos)
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo de: kscreen-console,kscreen-doctor gpu: lact wl: wayland-info x11: xdpyinfo, xprop, xrandr
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet driver: r8169
  Device-2: Intel Wi-Fi 6E AX210/AX1675 2x2 [Typhoon Peak] driver: iwlwifi
  Device-3: ASUSTek TUF GAMING M4 WIRELESS driver: hid-generic,usbhid type: USB
Drives:
  Local Storage: total: 2.96 TiB used: 769.19 GiB (25.4%)
Info:
  Memory: total: 32 GiB available: 31.26 GiB used: 5.07 GiB (16.2%)
  Processes: 414 Uptime: 1h 5m Shell: fish inxi: 3.3.38
Edit: It seems my title for this issue was a little sensational. Folks in this thread are saying that the clock boost is expected, normal behavior. My original post noted that I had worked around the problem by manually setting my GPU clock, but after a day of testing I crashed again with the same error messages in syslog (detailed below). There is still an underlying problem somewhere. I hope folks can fix it soon. Sadly, this type of low-level programming is way out of my wheelhouse, so all I can do is post on Reddit. </3
TL;DR: see https://gitlab.freedesktop.org/drm/amd/-/issues/3131
I found that when I tried to play Stranded: Alien Dawn, the screen would go black. Looking through syslog, I found:
amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501430
amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x0
amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0x3
amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:0d:00.0: amdgpu: RW: 0x0
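The amdgpu lines above are the driver's breakdown of the single GCVM_L2_PROTECTION_FAULT_STATUS word. Below is a minimal Python sketch of that breakdown using plain shift-and-mask field extraction. The field names come from the log. The bit positions are assumptions modelled on the amdgpu register headers, chosen because they reproduce the values printed above for 0x00501430. The sketch does not translate client IDs into names; the driver reports 0xa as SQC (data).

```python
"""Split a GCVM_L2_PROTECTION_FAULT_STATUS value into the fields amdgpu prints.

Bit positions are assumptions modelled on the amdgpu register headers.
"""

# field name -> (shift, width in bits)
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Extract each field by shifting it down and masking off its width."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00501430).items():
        print(f"{name}: {value:#x}")
```

Run on the logged value, the sketch gives MORE_FAULTS 0x0, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0, CID 0xa and RW 0x0, the same values the driver printed.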
Some searching turned up https://gitlab.freedesktop.org/drm/amd/-/issues/3067, which directed me to https://gitlab.freedesktop.org/drm/amd/-/issues/3131.
Reading through the comments, I learned that LACT (https://github.com/ilya-zlobintsev/LACT) exists. I installed it to monitor my GPU clocks and noticed that the maximum GPU clock was set 400 MHz above the manufacturer's clock (I have a Sapphire Pulse 7900 XTX).
I've been able to work around it by manually setting my clocks, as suggested in the comments. FWIW, I'm running kernel 6.9.3, but the comments in that GitLab issue seem to point to a bug in linux-firmware, which I guess is separate from the kernel? (Forgive me, I don't really know how this works; I'm just trying to piece it together myself.)
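Checking how far the maximum shader clock sits above a reference clock comes down to reading the overdrive clock table that amdgpu exposes in sysfs, in the pp_od_clk_voltage file under /sys/class/drm/card*/device/. Below is a minimal Python sketch that parses the OD_SCLK section of that file's text and computes the gap to a reference clock. The exact file layout is an assumption and may differ between GPU generations and kernel versions. The sample text and the 2500 MHz reference are hypothetical values for illustration only; they are not figures from the post or from the card's specifications. The sketch only parses a string, so it changes nothing on the system.

```python
"""Parse the OD_SCLK section of amdgpu's pp_od_clk_voltage text.

The text layout is an assumption; real files may contain other sections.
"""

import re

# Hypothetical file contents, not taken from the post.
SAMPLE = """\
OD_SCLK:
0: 500Mhz
1: 2900Mhz
OD_MCLK:
0: 97Mhz
1: 1250MHz
"""


def parse_od_sclk(text: str) -> dict[int, int]:
    """Return {level: MHz} for the lines under the OD_SCLK: header."""
    levels: dict[int, int] = {}
    in_sclk = False
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"([A-Z_]+):", line)
        if header:
            # Track which section we are in; only OD_SCLK is collected.
            in_sclk = header.group(1) == "OD_SCLK"
            continue
        entry = re.fullmatch(r"(\d+):\s*(\d+)\s*mhz", line, re.IGNORECASE)
        if in_sclk and entry:
            levels[int(entry.group(1))] = int(entry.group(2))
    return levels


def overclock_mhz(levels: dict[int, int], reference_mhz: int) -> int:
    """How far the highest OD_SCLK level sits above a reference clock."""
    return max(levels.values()) - reference_mhz


if __name__ == "__main__":
    levels = parse_od_sclk(SAMPLE)
    print(levels)
    print(overclock_mhz(levels, 2500))  # hypothetical reference clock
```

With the sample text, parse_od_sclk returns {0: 500, 1: 2900}, and overclock_mhz with the hypothetical 2500 MHz reference returns 400. Memory clock levels are skipped because only the OD_SCLK section is collected.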