So, let me summarize this for you all.
My GPU is a 1660 Super, and since NVIDIA has a lot of problems on Linux I tend to look for solutions and alternatives, but nothing seems to fix this one.
I want to set a fan curve for my GPU, and Green With Envy sadly seems to just never work.
Even after setting the optimal settings and enabling Coolbits, it still refuses to work no matter what. If there is some way to control GPU fans on Linux, I would be very glad and it would help me a lot.
NVIDIA in general is still awful on Linux from my point of view, but I'm hoping those kinds of things will improve in the future.
In the terminal run:
sudo nvidia-xconfig
sudo nvidia-xconfig --cool-bits=4
Restart your computer and search for NVIDIA X Server Settings in the Dash. There should be an option to change fan speed under Thermal Settings.
To control Nvidia GPU fan speed via Terminal on Linux Mint 20 with a 1070 Ti:
sudo nvidia-xconfig --cool-bits=4
This writes the Coolbits option into xorg.conf so that the fan can be controlled from the command line. You may need to reboot here.
nvidia-smi
Gives information about the GPU(s) and their numbers. Importantly I note that my 1070 Ti is GPU 0.
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=55"
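If you want to issue that command from a script, a minimal Python sketch could look like the following. The function name fan_speed_command and the 0 to 100 range check are my own assumptions for illustration; the two attribute names are the ones used in the command above. It only builds the argument list, so nothing touches the GPU until you pass the result to subprocess.run yourself.

```python
import shlex


def fan_speed_command(gpu_index, fan_index, percent):
    """Build the nvidia-settings argv that fixes one fan at `percent`.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if not 0 <= percent <= 100:
        raise ValueError("fan speed must be between 0 and 100 percent")
    return [
        "nvidia-settings",
        # Switch the GPU from automatic to manual fan control.
        "-a", f"[gpu:{gpu_index}]/GPUFanControlState=1",
        # Ask the given fan to run at the requested percentage.
        "-a", f"[fan:{fan_index}]/GPUTargetFanSpeed={percent}",
    ]


# Print a shell-quoted version of the command for GPU 0, fan 0 at 55%.
print(shlex.join(fan_speed_command(0, 0, 55)))
```

To actually apply it, something like subprocess.run(fan_speed_command(0, 0, 55), check=True) should work, assuming an X session with Coolbits enabled is running.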
For a much more detailed overview of this feature including multiple GPU fans, check out this thorough documentation Nvidia Overclocking and Cooling
For a somewhat rambling and wayward thread which led me to the above link, check out Set Fanspeed in Linux from Terminal
==================END OF ANSWER==================
And as an extra tidbit not asked for in this question, you can also adjust the power output of your Nvidia GPU with:
sudo nvidia-smi -i 0 -pl 90
Here 0 is my GPU number and 90 is the maximum power in watts. If you set this too low, you will get an error. In my limited experience, setting it too high had no effect. On my mining rig I get 95%+ of the performance for ~75% of the energy cost by setting the power limit to 100 in the above command, and I imagine other power-conscious users would appreciate this too.
I know it's possible to get GPU fan RPM values via nvidia-settings, but my gnome-extension switched to nvidia-smi due to high CPU usage when polling for GPU temp.
$ nvidia-smi --help-query-gpu | grep fan
"fan.speed"
The fan speed value is the percent of the product's maximum noise tolerance fan speed that the device's fan is currently intended to run at. This value may exceed 100% in certain cases. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure.
Sadly, it appears that nvidia-smi has just one option for fan speed, which reports it as a percentage (e.g. 51%).
I understand it would be possible to approximate RPM values by comparing against the RPM speeds from nvidia-settings, but I was trying to avoid an ugly hack like that in my PR.
Is it possible to get RPM values natively from nvidia-smi at all?
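For reference, here is a rough Python sketch of what reading the percentage and applying the approximation described above might look like. The helper names and the 3000 RPM maximum in the usage note are assumptions for illustration only; nvidia-smi does not report a maximum RPM, so that number would have to come from somewhere else, such as nvidia-settings. The linear scaling is just a guess, since the help text says the value is the intended speed and may exceed 100%.

```python
import subprocess


def parse_fan_percent(field):
    """Parse one fan.speed CSV field; return None if it is not a number."""
    text = field.strip().rstrip("%").strip()
    try:
        return int(text)
    except ValueError:
        # Parts without their own fan do not report a numeric value.
        return None


def estimate_rpm(percent, max_rpm):
    """Rough linear RPM guess; max_rpm is an assumed, user-supplied value."""
    return round(percent * max_rpm / 100)


def read_fan_percents():
    """Ask nvidia-smi for fan.speed, one list entry per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=fan.speed", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_fan_percent(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling estimate_rpm(percent, 3000) on each non-None entry from read_fan_percents() would then give a ballpark figure, with 3000 standing in for whatever maximum your card really has.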
Is there a specific reason you need the RPM other than wanting the RPM?
You can have a look at the NVIDIA Management Library (NVML). The nvmlUnitGetFanSpeedInfo API should return the fan speed in RPM via the nvmlUnitFanInfo_t structure.
This is a C library. It has official Perl and Python bindings.
The following is a simple method that does not require scripting, connecting fake monitors, or fiddling and can be executed over SSH to control multiple NVIDIA GPUs' fans. It has been tested on Arch Linux.
Create xorg.conf
sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7
This will create an /etc/X11/xorg.conf with an entry for each GPU, similar to the manual method.
Note: Some distributions (Fedora, CentOS, Manjaro) have additional config files (e.g. in /etc/X11/xorg.conf.d/ or /usr/share/X11/xorg.conf.d/), which override xorg.conf and set AllowNVIDIAGPUScreens. This option is not compatible with this guide. The extra config files should be modified or deleted. The X11 log file shows which config files have been loaded.
Alternative: Create xorg.conf manually
Identify your cards' PCI IDs:
nvidia-xconfig --query-gpu-info
Find the PCI BusID fields. Note that these are not the same as the bus IDs reported in the kernel.
Alternatively, run sudo startx, open /var/log/Xorg.0.log (or whatever location startx lists in its output under the line "Log file:"), and look for the line NVIDIA(0): Valid display device(s) on GPU-<GPU number> at PCI:<PCI ID>.
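One common reason the IDs look different is notation: tools like lspci print the bus, device and function in hexadecimal (for example 0a:00.0), while the BusID in xorg.conf is written in decimal as PCI:bus:device:function. If you only have the lspci-style address, a small Python sketch like this can convert it. The function name xorg_bus_id is made up for illustration, and the sketch ignores the PCI domain, which assumes a single-domain machine.

```python
import re


def xorg_bus_id(pci_address):
    """Convert a hex address like '0000:0a:00.0' to Xorg's 'PCI:10:0:0'.

    The optional leading domain is accepted but dropped (assumes domain 0).
    """
    match = re.fullmatch(
        r"(?:[0-9a-fA-F]{4,8}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])",
        pci_address.strip(),
    )
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    # Each component is hexadecimal in the input and decimal in xorg.conf.
    bus, device, function = (int(part, 16) for part in match.groups())
    return f"PCI:{bus}:{device}:{function}"


print(xorg_bus_id("0000:0a:00.0"))
```

It is still worth checking the result against nvidia-xconfig --query-gpu-info or the X log before relying on it.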
Edit /etc/X11/xorg.conf
Here is an example of xorg.conf for a three-GPU machine:
Section "ServerLayout"
Identifier "dual"
Screen 0 "Screen0"
Screen 1 "Screen1" RightOf "Screen0"
Screen 2 "Screen2" RightOf "Screen1"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:5:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:6:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
Identifier "Device2"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:9:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
EndSection
Section "Screen"
Identifier "Screen1"
Device "Device1"
EndSection
Section "Screen"
Identifier "Screen2"
Device "Device2"
EndSection
The BusID must match the bus IDs we identified in the previous step. The option AllowEmptyInitialConfiguration allows X to start even if no monitor is connected. The option Coolbits allows fans to be controlled. It can also allow overclocking.
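Writing these sections by hand gets tedious with many cards, so here is a hedged Python sketch that generates the same kind of file from a list of bus IDs. The function name build_xorg_conf and the "layout" identifier are my own choices, not anything the driver requires, and the output is indented for readability. Treat it as a starting point and compare the result with what nvidia-xconfig produces on your machine.

```python
def build_xorg_conf(bus_ids, coolbits=7):
    """Return xorg.conf text with one Device and one Screen section per GPU."""
    layout = ['Section "ServerLayout"', '    Identifier "layout"']
    devices = []
    screens = []
    for index, bus_id in enumerate(bus_ids):
        # Every screen after the first sits to the right of the previous one.
        placement = f' RightOf "Screen{index - 1}"' if index > 0 else ""
        layout.append(f'    Screen {index} "Screen{index}"{placement}')
        devices.append(
            'Section "Device"\n'
            f'    Identifier "Device{index}"\n'
            '    Driver "nvidia"\n'
            '    VendorName "NVIDIA Corporation"\n'
            f'    BusID "{bus_id}"\n'
            f'    Option "Coolbits" "{coolbits}"\n'
            '    Option "AllowEmptyInitialConfiguration"\n'
            'EndSection'
        )
        screens.append(
            'Section "Screen"\n'
            f'    Identifier "Screen{index}"\n'
            f'    Device "Device{index}"\n'
            'EndSection'
        )
    layout.append("EndSection")
    return "\n\n".join(["\n".join(layout)] + devices + screens) + "\n"


# Bus IDs taken from the three-GPU example above.
print(build_xorg_conf(["PCI:5:0:0", "PCI:6:0:0", "PCI:9:0:0"]))
```

Redirect the output to a scratch file first and review it before copying it over /etc/X11/xorg.conf.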
Note: Some distributions (Fedora, CentOS, Manjaro) have additional config files (e.g. in /etc/X11/xorg.conf.d/ or /usr/share/X11/xorg.conf.d/), which override xorg.conf and set AllowNVIDIAGPUScreens. This option is not compatible with this guide. The extra config files should be modified or deleted. The X11 log file shows which config files have been loaded.
Edit /root/.xinitrc
nvidia-settings -q fans
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=75"
nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=75"
nvidia-settings -a "[gpu:2]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=75"
I use .xinitrc to execute nvidia-settings for convenience, although there are probably other ways. The first line prints every GPU fan in the system. The remaining lines set each fan to 75%.
Launch X
sudo startx -- :0
You can execute this command from SSH. The output will be:
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat May 27 02:22:08 2017
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:1]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:1]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:2]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:2]) assigned value 75.
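If you launch this over SSH from a script, you may want to confirm that every fan really was assigned. A small Python sketch along these lines can parse the "assigned value" lines shown above. The regular expression is based only on the output format in this post, so other driver versions may print something different, and the function name is invented for illustration.

```python
import re

# Matches lines like:
# Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
ASSIGNED = re.compile(
    r"Attribute '(?P<name>\w+)' \((?P<host>[^\[]+)\[(?P<target>\w+:\d+)\]\) "
    r"assigned value (?P<value>\d+)\."
)


def parse_assignments(output):
    """Return {(target, attribute): value} for each 'assigned value' line."""
    assignments = {}
    for line in output.splitlines():
        match = ASSIGNED.search(line)
        if match:
            key = (match["target"], match["name"])
            assignments[key] = int(match["value"])
    return assignments


sample = """\
Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
"""
print(parse_assignments(sample))
```

A missing key for one of your fans would suggest that the corresponding nvidia-settings call did not take effect.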
Monitor temperatures and clock speeds
nvidia-smi and nvtop can be used to observe temperatures and power draw. Lower temperatures will allow the card to clock higher and increase its power draw. You can use sudo nvidia-smi -pl 150 to limit power draw and keep the cards cool, or use sudo nvidia-smi -pl 300 to let them clock higher. My 1080 Ti runs at 1480 MHz if given 150W, and over 1800 MHz if given 300W, but this depends on the workload. You can monitor the clock speeds with nvidia-smi -q or, more specifically, watch 'nvidia-smi -q | grep -E "Utilization| Graphics|Power Draw"'
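To put the 1080 Ti figures above in perspective, here is a tiny Python calculation of clock speed per watt at the two power limits. It uses only the numbers quoted in this post and treats 1800 MHz as a lower bound, since the card ran "over" that. Clock speed is only a loose proxy for real performance, so take the result as a rough illustration.

```python
def mhz_per_watt(clock_mhz, power_limit_w):
    """Clock speed obtained per watt of configured power limit."""
    return clock_mhz / power_limit_w


low = mhz_per_watt(1480, 150)    # 150 W limit
high = mhz_per_watt(1800, 300)   # 300 W limit, at least 1800 MHz

# Relative clock gain from doubling the power limit.
clock_gain = 1800 / 1480 - 1

print(f"150 W: {low:.2f} MHz/W")
print(f"300 W: {high:.2f} MHz/W")
print(f"clock gain for twice the power: {clock_gain:.1%}")
```

In other words, doubling the power limit buys roughly a fifth more clock speed on these numbers, which is why capping power can be attractive when efficiency matters more than peak speed.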
Returning to automatic fan management
Reboot. I haven't found another way to make the fans automatic.
I've written a pip-installable Python script to do something similar to @AlexsandrDubinsky's suggestion.
When you run fans.py, it sets up a temporary X server for each GPU with a fake display attached. Then, it loops over the GPUs every few seconds and sets the fan speed according to their temperature. When the script dies, it returns control of the fans to the drivers and cleans up the X servers.
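To illustrate the "fan speed according to temperature" part, here is a minimal Python sketch of a fan curve using linear interpolation between a few points. This is not the code from fans.py, and the curve points are made-up values for illustration, so pick your own based on what your card tolerates.

```python
# Hypothetical curve: (temperature in degrees C, fan speed in percent).
CURVE = [(30, 30), (50, 40), (70, 70), (85, 100)]


def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate a fan percentage; clamp outside the curve."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    # Find the segment that contains temp_c and interpolate within it.
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return round(p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0))
    raise ValueError("curve points must be sorted by temperature")


for temp in (25, 60, 90):
    print(temp, fan_percent(temp))
```

A control loop would read each GPU's temperature every few seconds, call fan_percent on it, and pass the result to nvidia-settings as GPUTargetFanSpeed.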