This seems to be the answer to my issues. The card is an RX550, so it shouldn't need that line, but it does. What exactly am I losing by using it? I know it affects audio over HDMI/DisplayPort, but I don't need that anyway.
TL;DR: With amdgpu.dc=0, my 1440p monitor works, but intensive games will eventually crash my system. With amdgpu.dc=1, intensive games are perfectly stable, but my 1440p monitor doesn't work.
I'll start from the beginning.
When I upgraded to Lubuntu 18.04 with kernel 4.15 back in April, my primary monitor, a 1440p DVI monitor, was receiving no signal. xrandr could detect that the monitor was plugged in, but no modes were found. After some Googling I found that setting amdgpu.dc=0 fixed this problem.
Fast forward to Wednesday: with the recent release of Steam Play, I'm eager to start testing some games. I launch a number of titles and run a few benchmarks, and everything seems fine. Then I try to play GTA V for an extended period of time. That is when I discover that after about 30 minutes of play, the graphics start to get corrupted and eventually my system freezes.
I look at my kernel log and discover that at the exact moment the graphical corruption begins, I start getting spammed with variations of the following lines:
Aug 25 23:54:59 THE-GIANT-SLAB kernel: [10462.962860] amdgpu 0000:23:00.0: GPU fault detected: 146 0x0ff0480c
Aug 25 23:54:59 THE-GIANT-SLAB kernel: [10462.962864] amdgpu 0000:23:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003FE
Aug 25 23:54:59 THE-GIANT-SLAB kernel: [10462.962866] amdgpu 0000:23:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
Aug 25 23:54:59 THE-GIANT-SLAB kernel: [10462.962869] amdgpu 0000:23:00.0: VM fault (0x0c, vmid 2) at page 1022, read from 'TC4' (0x54433400) (72)
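A rough way to check whether a gaming session produced these faults is to count the matching lines in the kernel log. This is only a sketch: count_amdgpu_faults is a hypothetical shell function, not an existing tool, and it assumes the fault reports look like the ones above ("GPU fault detected" and "VM fault (...)" lines from amdgpu).

```shell
#!/usr/bin/env bash
# count_amdgpu_faults
# Hypothetical helper: read kernel log lines on stdin and print how many of
# them look like amdgpu fault reports of the kind shown above.
count_amdgpu_faults() {
  # grep -c prints the number of matching lines. It exits with status 1 when
  # nothing matches (but still prints 0), so "|| true" keeps that from
  # counting as a failure.
  grep -c -E 'amdgpu .*(GPU fault detected|VM fault \()' || true
}
```

For example, dmesg | count_amdgpu_faults checks the current boot, and journalctl -k -b -1 | count_amdgpu_faults checks the kernel messages of the previous boot (assuming a persistent journal). A non-zero count only tells you the faults happened, not why.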
I spend a couple of days running various tests to either confirm or rule out a faulty GPU. I had a breakthrough today when I decided to try booting the kernel without amdgpu.dc=0. As expected, it boots up and my 1440p monitor is not receiving a signal. I have now been playing GTA V for about two hours without any errors in dmesg.
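When switching between the two setups, it helps to confirm which setting the current boot actually used. The kernel command line is readable from /proc/cmdline; amdgpu_dc_setting below is a hypothetical helper (not part of any package) that prints the amdgpu.dc value if one was passed at boot, or "unset" if the driver was left at its default. The optional file argument only exists so it can be tried on a saved copy.

```shell
#!/usr/bin/env bash
# amdgpu_dc_setting [CMDLINE_FILE]
# Hypothetical helper: print the amdgpu.dc= value from the kernel command
# line (default: /proc/cmdline), or "unset" if it was not passed at boot.
amdgpu_dc_setting() {
  local file=${1:-/proc/cmdline} word value=""
  local -a words
  # Split the single command-line string into words without glob expansion.
  read -r -a words < "$file"
  for word in "${words[@]}"; do
    case $word in
      # If the parameter appears more than once, this keeps the last one.
      amdgpu.dc=*) value=${word#amdgpu.dc=} ;;
    esac
  done
  echo "${value:-unset}"
}
```

Run with no arguments, it reports on the running kernel.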
So that's the situation. I can either have my monitor working, and experience full system crashes, or I can have perfectly stable gameplay, but only on my smaller secondary monitor. As I'm sure you can imagine, neither situation is particularly ideal.
How do I even start trying to come up with a solution to this?
Thanks in advance for your advice.
EDIT: Dunno how I completely forgot to post my specs. Derp.
GPU: 8GB Sapphire RX 480 Nitro+
CPU: AMD Ryzen 5 1600
RAM: 16GB Corsair Vengeance LPX 3200MHz
Mobo: MSI Tomahawk B350
Kernel: 4.15.0-33-generic
Mesa version: 18.3.0-devel (padoka PPA)
EDIT 2: Installing a fresher kernel sorted it out. Currently using my 1440p monitor with AMDGPU DC enabled. Thanks everyone! Now if you'll excuse me, I'm off to play some GTA V...
My Linux Mint computer is not booting. It wouldn't boot in normal mode either when I tried the live USB installer (I had to use compatibility mode). I added amdgpu.dc=0 by pressing TAB. My question is: how can I edit the boot configuration before Linux Mint starts? I know that in Fedora you press E, but what about Linux Mint?
June 14, 2018 Update
Based on an Arch Linux forum thread, it appears you need to add:
amdgpu.dc=0
to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, after quiet splash, so that it reads GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.dc=0". Then run sudo update-grub.
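If you would rather script the edit than open an editor, here is a minimal sketch. add_grub_param is a hypothetical function name; it assumes the stock Ubuntu/Mint layout where GRUB_CMDLINE_LINUX_DEFAULT holds a double-quoted value, and it does nothing if the parameter is already there. It does not run update-grub for you.

```shell
#!/usr/bin/env bash
# add_grub_param FILE PARAM
# Hypothetical helper: append PARAM (e.g. amdgpu.dc=0) to the quoted
# GRUB_CMDLINE_LINUX_DEFAULT="..." value in FILE, unless it is already there.
# Assumes the value is wrapped in double quotes, as on stock Ubuntu/Mint.
add_grub_param() {
  local file=$1 param=$2 line value new tmp status=0
  line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || {
    echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $file" >&2
    return 1
  }
  # Strip the variable name and the surrounding double quotes.
  value=${line#GRUB_CMDLINE_LINUX_DEFAULT=}
  value=${value#\"}
  value=${value%\"}
  # Nothing to do if PARAM is already one of the space-separated words.
  case " $value " in
    *" $param "*) return 0 ;;
  esac
  if [ -z "$value" ]; then new=$param; else new="$value $param"; fi
  # Rewrite only the first matching line; every other line is copied as-is.
  tmp=$(mktemp) || return 1
  awk -v new="$new" '
    !done && /^GRUB_CMDLINE_LINUX_DEFAULT=/ {
      print "GRUB_CMDLINE_LINUX_DEFAULT=\"" new "\""; done = 1; next
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || status=1  # cat keeps FILE's permissions
  rm -f "$tmp"
  return "$status"
}
```

You can try it on a copy first, e.g. cp /etc/default/grub /tmp/grub.test && add_grub_param /tmp/grub.test amdgpu.dc=0, and inspect the result. Applying it to the real /etc/default/grub has to be done from a root shell, and sudo update-grub is still needed afterwards.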
Since this is a new install of Ubuntu 18.04, you are one of the lucky ones who can use journalctl to look at the previous boot (the one that locked up). Use:
journalctl -b-1
Then press the End key to jump to EOF (End Of File). On my last boot, which shut down normally, the end of the log reads:
Jun 10 16:18:51 alien systemd[1]: Unmounting /mnt/d...
Jun 10 16:18:51 alien systemd[1]: Unmounted /run/user/1000.
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/Ubuntu 18.04 LTS amd64.
Jun 10 16:18:51 alien systemd[1]: Unmounted /boot/efi.
Jun 10 16:18:51 alien ntfs-3g[648]: Unmounting /dev/nvme0n1p8 (Shared_WSL+Linux)
Jun 10 16:18:51 alien ntfs-3g[648]: Permissions cache : 21 writes, 4033288 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/casper-rw.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/e.
Jun 10 16:18:51 alien ntfs-3g[736]: Unmounting /dev/sda3 (HGST_Win10)
Jun 10 16:18:51 alien ntfs-3g[736]: Permissions cache : 754 writes, 4108560 reads, 99.9% hits
Jun 10 16:18:51 alien ntfs-3g[637]: Unmounting /dev/nvme0n1p4 (NVMe_Win10)
Jun 10 16:18:51 alien ntfs-3g[637]: Permissions cache : 987 writes, 4983239 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/d.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/c.
Jun 10 16:18:51 alien systemd[1]: Reached target Unmount All Filesystems.
Jun 10 16:18:51 alien systemd[1]: Stopped target Local File Systems (Pre).
Jun 10 16:18:51 alien systemd[1]: Stopped Remount Root and Kernel File Systems.
Jun 10 16:18:51 alien systemd[1]: Stopped Create Static Device Nodes in /dev.
Jun 10 16:18:51 alien systemd[1]: Reached target Shutdown.
Jun 10 16:18:51 alien systemd[1]: Reached target Final Step.
Jun 10 16:18:51 alien systemd[1]: dev-disk-by\x2dpartlabel-Basic\x5cx20data\x5cx20partition.device: Dev dev-
Jun 10 16:18:51 alien systemd[1]: Received SIGRTMIN+20 from PID 18665 (plymouthd).
Jun 10 16:18:51 alien systemd[1]: Started Show Plymouth Reboot Screen.
Jun 10 16:18:51 alien systemd[1]: Starting Reboot...
Jun 10 16:18:51 alien systemd[1]: Shutting down.
Jun 10 16:18:51 alien kernel: systemd-shutdow: 36 output lines suppressed due to ratelimiting
Jun 10 16:18:51 alien systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 10 16:18:51 alien dnsmasq[1393]: exiting on receipt of SIGTERM
Jun 10 16:18:51 alien systemd-journald[288]: Journal stopped
lines 46804-46832/46832 (END)
In your output, look for error messages.
You may have to use the Page Up key to see them.
When you have found what you are looking for (or have given up looking) press Q to exit.
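Instead of paging up through the whole log, journalctl can filter by priority: -p err shows only messages at error level or more severe, and adding -k limits the output to kernel messages. The wrapper below is just a convenience sketch (prev_boot_errors is a made-up name); like the command above, it assumes the journal is persistent, so that -b -1 refers to the boot that locked up. journalctl --list-boots shows which boots are available.

```shell
#!/usr/bin/env bash
# prev_boot_errors [PRIORITY]
# Hypothetical convenience wrapper: print messages from the previous boot
# (-b -1) at PRIORITY or more severe (-p, default "err"), without a pager.
prev_boot_errors() {
  journalctl -b -1 -p "${1:-err}" --no-pager
}
```

For example, prev_boot_errors shows errors only, while prev_boot_errors warning | less also includes warnings.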
If overheating was causing the shutdown, you can install Intel Powerclamp (see "Stop CPU from overheating").
Besides lm-sensors, you can get temperature readings for all thermal zones directly from the command line using this one-liner:
$ paste <(cat /sys/class/thermal/thermal_zone*/type) <(cat /sys/class/thermal/thermal_zone*/temp) | column -s $'\t' -t | sed 's/\(.\)..$/.\1°C/'
INT3400 Thermal  20.0°C
SEN1             44.0°C
SEN2             52.0°C
SEN3             64.0°C
SEN4             59.0°C
B0D4             73.0°C
pch_skylake      76.5°C
x86_pkg_temp     73.0°C
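The kernel reports these values in millidegrees Celsius; the sed expression inserts a decimal point and drops the last two digits, so 76500 is shown as 76.5°C.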
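The same readings can be produced with a plain loop, which may be easier to adapt. This is a sketch, not a drop-in replacement: print_thermal_zones is a hypothetical name, and it assumes each thermal_zone* directory contains the usual type and temp files, with temp in millidegrees Celsius. The optional directory argument only exists so the function can be tried against a fake tree.

```shell
#!/usr/bin/env bash
# print_thermal_zones [BASE_DIR]
# Hypothetical helper: for every thermal_zone* under BASE_DIR (default
# /sys/class/thermal), print the zone type and its temperature in °C.
# temp holds millidegrees Celsius, so 76500 is printed as 76.5°C.
print_thermal_zones() {
  local base=${1:-/sys/class/thermal} zone name temp
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue         # no zones, or temp not readable
    name=$(cat "$zone/type")
    # Some zones return an error when read; skip those instead of failing.
    temp=$(cat "$zone/temp" 2>/dev/null) || continue
    # Divide by 1000 and keep one decimal place.
    awk -v t="$name" -v m="$temp" 'BEGIN { printf "%-16s %.1f°C\n", t, m / 1000 }'
  done
}
```

Like the one-liner, it lists zones in glob order, so thermal_zone10 would sort before thermal_zone2.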
In addition to the amdgpu.dc=0 kernel option, upgrading to Ubuntu 18.10, which is based on Linux 4.18, fixed this issue on AMD Stoney hardware: the amdgpu.dc=0 boot parameter is no longer required for graphics to work correctly.