AMDGPU

From ArchWiki
(Redirected from AMD)

AMDGPU is the open source graphics driver for AMD Radeon graphics cards since the Graphics Core Next family.

Selecting the right driver

Depending on the card you have, find the right driver in Xorg#AMD. This driver supports Southern Islands (SI) cards and later. AMD has no plans to support pre-GCN GPUs. Owners of unsupported GPUs may use the open source ATI driver.

Installation

Install the mesa package, which provides both the DRI driver for 3D acceleration and VA-API/VDPAU drivers for accelerated video decoding.

Experimental

It may be worthwhile for some users to use the upstream experimental build of mesa.

Install the mesa-gitAUR package, which provides the DRI driver for 3D acceleration.

  • For 32-bit application support, also install the lib32-mesa-gitAUR package from the mesa-git repository or the AUR.
  • For the DDX driver (which provides 2D acceleration in Xorg), install the xf86-video-amdgpu-gitAUR package.
  • For Vulkan support using the mesa-git repository, install the vulkan-radeon-git package. Optionally install the lib32-vulkan-radeon-git package for 32-bit application support. This should not be required if building mesa-gitAUR from the AUR.
Tip: Users who do not wish to go through the process of compiling the mesa-gitAUR package can use the mesa-git unofficial repository.

Enable Southern Islands (SI) and Sea Islands (CIK) support

The linux package enables AMDGPU support for cards of the Southern Islands (HD 7000 Series, SI, ie. GCN 1) and Sea Islands (HD 8000 Series, CIK, ie. GCN 2). The amdgpu kernel driver needs to be loaded before the radeon one. You can check which kernel driver is loaded by running lspci -k. It should be like this:

$ lspci -k -d ::03xx
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao PRO [Radeon R7 370 / R9 270/370 OEM]
	Subsystem: Gigabyte Technology Co., Ltd Device 226c
	Kernel driver in use: amdgpu
	Kernel modules: radeon, amdgpu

If the amdgpu driver is not in use, follow instructions in the next section.

Load amdgpu driver

The module parameters of both amdgpu and radeon modules are cik_support= and si_support=.

They need to be set as kernel parameters or in a modprobe configuration file, and depend on the cards GCN version.

You can use both parameters if you are unsure which kernel card you have.

Tip: dmesg may indicate the correct kernel parameter to use: [..] amdgpu 0000:01:00.0: Use radeon.cik_support=0 amdgpu.cik_support=1 to override.
Set module parameters in kernel command line

Set one of the following kernel parameters:

  • Southern Islands (SI): radeon.si_support=0 amdgpu.si_support=1
  • Sea Islands (CIK): radeon.cik_support=0 amdgpu.cik_support=1

Specify the correct module order

Make sure amdgpu has been set as first module in the Mkinitcpio#MODULES array, e.g. MODULES=(amdgpu radeon).

Set kernel module parameters

For Southern Islands (SI) use the si_support=1 kernel module parameter, for Sea Islands (CIK) use cik_support=1:

/etc/modprobe.d/amdgpu.conf
options amdgpu si_support=1
options amdgpu cik_support=1
/etc/modprobe.d/radeon.conf
options radeon si_support=0
options radeon cik_support=0

Make sure modconf is in the HOOKS array in /etc/mkinitcpio.conf and regenerate the initramfs.

Compile kernel which supports amdgpu driver

When building or compiling a kernel, CONFIG_DRM_AMDGPU_SI=Y and/or CONFIG_DRM_AMDGPU_CIK=Y should be set in the config.

Disable loading radeon completely at boot

The kernel may still probe and load the radeon module depending on the specific graphics chip involved, but the module is unnecessary to have loaded after confirming amdgpu works as expected. Reboot between each step to confirm it works before moving to the next step:

  1. Use the module parameters on the kernel commandline method to ensure amdgpu works as expected
  2. Use the MODULES=(amdgpu) mkinitcpio method but do not add radeon to the configuration
  3. Test that modprobe -r radeon will remove the kernel module cleanly after logged into the desktop
  4. Blacklist the radeon module from being probed by the kernel during second stage boot:
/etc/modprobe.d/radeon.conf
blacklist radeon

The output of lsmod and dmesg should now only show the amdgpu driver loading, radeon should not be present. The directory /sys/module/radeon should not exist.

ACO compiler

The ACO compiler is an open source shader compiler created and developed by Valve Corporation to directly compete with the LLVM compiler, the AMDVLK drivers, as well as Windows 10. It offers lesser compilation time and also performs better while gaming than LLVM and AMDVLK.

Some benchmarks can be seen on GitHub and Phoronix (1) (2) (3).

Since mesa version 20.2 ACO is the default shader compiler.

Loading

The amdgpu kernel module is supposed to load automatically on system boot.

If it does not:

It is possible it loads, but late, after the X server requires it. In this case see Kernel mode setting#Early KMS start.

Xorg configuration

Xorg will automatically load the driver and it will use your monitor's EDID to set the native resolution. Configuration is only required for tuning the driver.

If you want manual configuration, create /etc/X11/xorg.conf.d/20-amdgpu.conf, and add the following:

/etc/X11/xorg.conf.d/20-amdgpu.conf
Section "OutputClass"
     Identifier "AMD"
     MatchDriver "amdgpu"
     Driver "amdgpu"
EndSection

Using this section, you can enable features and tweak the driver settings, see amdgpu(4) first before setting driver options.

Tear free rendering

TearFree controls tearing prevention using the hardware page flipping mechanism. By default, TearFree will be on for rotated outputs, outputs with RandR transforms applied, and for RandR 1.4 slave outputs, and off for everything else. Or you can configure it to be always on or always off with true or false respectively.

Option "TearFree" "true"

You can also enable TearFree temporarily with xrandr:

$ xrandr --output output --set TearFree on

Where output should look like DisplayPort-0 or HDMI-A-0 and can be acquired by running xrandr -q.

DRI level

DRI sets the maximum level of DRI to enable. Valid values are 2 for DRI2 or 3 for DRI3. The default is 3 for DRI3 if the Xorg version is >= 1.18.3, otherwise DRI2 is used:

Option "DRI" "3"

Variable refresh rate

See Variable refresh rate.

10-bit color

Warning: Many applications may have graphical artifacts or crash when 10-bit color is enabled. This notably includes Steam, which crashes with an X Error.

Newer AMD cards support 10bpc color, but the default is 24-bit color and 30-bit color must be explicitly enabled. Enabling it can reduce visible banding/artifacts in gradients, if the applications support this too. To check if your monitor supports it search for "EDID" in your Xorg log file (e.g. /var/log/Xorg.0.log or ~/.local/share/xorg/Xorg.0.log):

[   336.695] (II) AMDGPU(0): EDID for output DisplayPort-0
[   336.695] (II) AMDGPU(0): EDID for output DisplayPort-1
[   336.695] (II) AMDGPU(0): Manufacturer: DEL  Model: a0ec  Serial#: 123456789
[   336.695] (II) AMDGPU(0): Year: 2018  Week: 23
[   336.695] (II) AMDGPU(0): EDID Version: 1.4
[   336.695] (II) AMDGPU(0): Digital Display Input
[   336.695] (II) AMDGPU(0): 10 bits per channel

To check whether it is currently enabled search for "Depth"):

[   336.618] (**) AMDGPU(0): Depth 30, (--) framebuffer bpp 32
[   336.618] (II) AMDGPU(0): Pixel depth = 30 bits stored in 4 bytes (32 bpp pixmaps)

With the default configuration it will instead say the depth is 24, with 24 bits stored in 4 bytes.

To check whether 10-bit works, exit Xorg if you have it running and run Xorg -retro which will display a black and white grid, then press Ctrl-Alt-F1 and Ctrl-C to exit X, and run Xorg -depth 30 -retro. If this works fine, then 10-bit is working.

To launch in 10-bit via startx, use startx -- -depth 30. To permanently enable it, create or add to:

/etc/X11/xorg.conf.d/20-amdgpu.conf
Section "Screen"
	Identifier "asdf"
	DefaultDepth 30
EndSection

Reduce output latency

If you want to minimize latency you can disable page flipping and tear free:

/etc/X11/xorg.conf.d/20-amdgpu.conf
Section "OutputClass"
     Identifier "AMD"
     MatchDriver "amdgpu"
     Driver "amdgpu"
     Option "EnablePageFlip" "off"
     Option "TearFree" "false"
EndSection

See Gaming#Reducing DRI latency to further reduce latency.

Note: Setting these options may cause tearing and short-lived artifacts to appear.

Features

Video acceleration

See Hardware video acceleration#AMD/ATI.

Monitoring

Monitoring your GPU is often used to check the temperature and also the P-states of your GPU.

CLI

  • amdgpu_top — Tool to display AMDGPU usage
https://github.com/Umio-Yasuno/amdgpu_top || amdgpu_topAUR
  • nvtop — GPUs process monitoring for AMD, Intel and NVIDIA
https://github.com/Syllo/nvtop || nvtop
  • radeontop — A GPU utilization viewer, both for the total activity percent and individual blocks
https://github.com/clbr/radeontop || radeontop

GUI

  • amdgpu_top — Tool to display AMDGPU usage
https://github.com/Umio-Yasuno/amdgpu_top || amdgpu_topAUR
  • AmdGuid — A basic fan control GUI fully written in Rust.
https://github.com/Eraden/amdgpud || amdguid-wayland-binAUR, amdguid-glow-binAUR
  • TuxClocker — A Qt5 monitoring and overclocking tool.
https://github.com/Lurkki14/tuxclocker || tuxclockerAUR

Manually

To check your GPU's P-states, execute:

$ cat /sys/class/drm/card0/device/pp_od_clk_voltage

To monitor your GPU, execute:

# watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info

To check your GPU utilization, execute:

$ cat /sys/class/drm/card0/device/gpu_busy_percent

To check your GPU frequency, execute:

$ cat /sys/class/drm/card0/device/pp_dpm_sclk

To check your GPU temperature, execute:

$ cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input

To check your VRAM frequency, execute:

$ cat /sys/class/drm/card0/device/pp_dpm_mclk

To check your VRAM usage, execute:

$ cat /sys/class/drm/card0/device/mem_info_vram_used

To check your VRAM size, execute:

$ cat /sys/class/drm/card0/device/mem_info_vram_total

Overclocking

Since Linux 4.17, once you have enabled the features at boot below, it is possible to adjust clocks and voltages of the graphics card via /sys/class/drm/card0/device/pp_od_clk_voltage.

Boot parameter

It is required to unlock access to adjust clocks and voltages in sysfs by appending the Kernel parameter amdgpu.ppfeaturemask=0xffffffff.

Not all bits are defined, and new features may be added over time. Setting all 32 bits may enable unstable features that cause problems such as screen flicker or broken resume from suspend. It should be sufficient to set the PP_OVERDRIVE_MASK bit, 0x4000, in combination with the default ppfeaturemask. To compute a reasonable parameter for your system, execute:

$ printf 'amdgpu.ppfeaturemask=0x%x\n' "$(($(cat /sys/module/amdgpu/parameters/ppfeaturemask) | 0x4000))"

Manual (default)

Note: In sysfs, paths like /sys/class/drm/... are just symlinks and may change between reboots. Persistent locations can be found in /sys/devices/, e.g. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/. Adjust the commands accordingly for a reliable result.

For in-depth information on all possible options, read the kernel documentation for amdgpu thermal control.

To set the GPU clock for the maximum P-state 7 on e.g. a Polaris GPU to 1209MHz and 900mV voltage, run:

# echo "s 7 1209 900" > /sys/class/drm/card0/device/pp_od_clk_voltage

The same procedure can be applied to the VRAM, e.g. maximum P-state 2 on Polaris 5xx series cards:

# echo "m 2 1850 850" > /sys/class/drm/card0/device/pp_od_clk_voltage
Warning: Double check the entered values, as mistakes might instantly cause fatal hardware damage!

To apply, run:

# echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage

To check if it worked out, read out clocks and voltage under 3D load:

# watch -n 0.5  cat /sys/kernel/debug/dri/0/amdgpu_pm_info

You can reset to the default values using:

# echo "r" > /sys/class/drm/card0/device/pp_od_clk_voltage

It is also possible to forbid the driver from switching to certain P-states, e.g. to workaround problems with deep powersaving P-states, such as flickering artifacts or stutter. To force the highest VRAM P-state on a card, while still allowing the GPU itself to run with lower clocks, first find the highest possible P-state, then set it:

# cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 96Mhz *
1: 456Mhz 
2: 675Mhz 
3: 1000Mhz 
# echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
# echo "3" >  /sys/class/drm/card0/device/pp_dpm_mclk

Allow only the three highest GPU P-states:

# echo "5 6 7" >  /sys/class/drm/card0/device/pp_dpm_sclk

To set the allowed maximum power consumption of the GPU to e.g. 50 Watts, run:

# echo 50000000 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap
Note: The above procedure was tested with a Polaris RX 560 card. There may be different behavior or bugs with different GPUs.

Assisted

If you are not inclined to fully manually overclock your GPU, there are some overclocking tools that are offered by the community to assist you to overclock and monitor your AMD GPU.

CLI tools
  • amdgpu-clocks — A script that can be used to monitor and set custom power states for AMD GPUs. It also offers a Systemd service to apply the settings automatically upon boot.
https://github.com/sibradzic/amdgpu-clocks || amdgpu-clocks-gitAUR
GUI tools
  • TuxClocker — A Qt5 monitoring and overclocking tool.
https://github.com/Lurkki14/tuxclocker || tuxclockerAUR
  • CoreCtrl — A GUI overclocking tool with a WattMan-like UI that supports per-application profiles.
https://gitlab.com/corectrl/corectrl || corectrl
  • LACT — A GTK tool to view information and control your AMD GPU.
https://github.com/ilya-zlobintsev/LACT || lactAUR

Startup on boot

One way is to use systemd units, if you want your settings to apply automatically upon boot, consider looking at this Reddit thread to configure and apply your settings on boot.

Another way is to use udev rules for some of the values, for example, to set a low performance level to save energy:

/etc/udev/rules.d/30-amdgpu-low-power.rules 
SUBSYSTEM=="pci", DRIVER=="amdgpu", ATTR{power_dpm_force_performance_level}="low"

Performance levels

AMDGPU offers several performance levels, the file power_dpm_force_performance_level is used for this, it is possible to select between these levels:

  • auto: dynamically select the optimal power profile for current conditions in the driver.
  • low: clocks are forced to the lowest power state.
  • high: clocks are forced to the highest power state.
  • manual: user can manually adjust which power states are enabled for each clock domain (used for setting #Power profiles)
  • profile_standard, profile_min_sclk, profile_min_mclk, profile_peak: clock and power gating are disabled and the clocks are set for different profiling cases. This mode is recommended for profiling specific work loads

To set the AMDGPU device to use a low performance level, the following command can be executed:

# echo "low" > /sys/class/drm/card0/device/power_dpm_force_performance_level
Note: Performance levels changes should be reapplied at every boot, see #Startup on boot to automate this.

Power profiles

AMDGPU offers several optimizations via power profiles, one of the most commonly used is the compute mode for OpenCL intensive applications. Available power profiles can be listed with:

cat /sys/class/drm/card0/device/pp_power_profile_mode
NUM        MODE_NAME     SCLK_UP_HYST   SCLK_DOWN_HYST SCLK_ACTIVE_LEVEL     MCLK_UP_HYST   MCLK_DOWN_HYST MCLK_ACTIVE_LEVEL
  0   BOOTUP_DEFAULT:        -                -                -                -                -                -
  1   3D_FULL_SCREEN:        0              100               30                0              100               10
  2     POWER_SAVING:       10                0               30                -                -                -
  3            VIDEO:        -                -                -               10               16               31
  4               VR:        0               11               50                0              100               10
  5        COMPUTE *:        0                5               30               10               60               25
  6           CUSTOM:        -                -                -                -                -                -
Note: card0 identifies a specific GPU in your machine, in case of multiple GPUs be sure to address the right one.

To use a specific power profile you should first enable manual control over them with:

# echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level

Then to select a power profile by writing the NUM field associated with it, e.g. to enable COMPUTE run:

# echo "5" > /sys/class/drm/card0/device/pp_power_profile_mode
Note: Power profile changes should be reapplied at every boot, see #Startup on boot to automate this.

Enable GPU display scaling

This article or section is a candidate for merging with xrandr.

Notes: Not specific to AMDGPU. (Discuss in Talk:AMDGPU#Moving "Enable GPU display scaling" to xrandr)

To avoid the usage of the scaler which is built in the display, and use the GPU own scaler instead, when not using the native resolution of the monitor, execute:

$ xrandr --output output --set "scaling mode" scaling_mode

Possible values for "scaling mode" are: None, Full, Center, Full aspect.

  • To show the available outputs and settings, execute:
$ xrandr --prop
  • To set scaling mode = Full aspect for just every available output, execute:
$ for output in $(xrandr --prop | grep -E -o -i "^[A-Z\-]+-[0-9]+"); do xrandr --output "$output" --set "scaling mode" "Full aspect"; done

Troubleshooting

Module parameters

The amdgpu module stashes several config parameters (modinfo amdgpu | grep mask) in masks that are only documented in the kernel sources.

Xorg or applications will not start

  • "(EE) AMDGPU(0): [DRI2] DRI2SwapBuffers: drawable has no back or front?" error after opening glxgears, can open Xorg server but OpenGL applications crash.
  • "(EE) AMDGPU(0): Given depth (32) is not supported by amdgpu driver" error, Xorg will not start.

Setting the screen's depth under Xorg to 16 or 32 will cause problems/crash. To avoid that, you should use a standard screen depth of 24 by adding this to your "screen" section:

/etc/X11/xorg.conf.d/10-screen.conf
Section "Screen"
       Identifier     "Screen"
       DefaultDepth    24
       SubSection      "Display"
               Depth   24
       EndSubSection
EndSection

Screen artifacts and frequency problem

Dynamic power management may cause screen artifacts to appear when displaying to monitors at higher frequencies (anything above 60Hz) due to issues in the way GPU clock speeds are managed[1][2].

A workaround [3] is saving high or low in /sys/class/drm/card0/device/power_dpm_force_performance_level.

To make it persistent, you may create a udev rule:

/etc/udev/rules.d/30-amdgpu-pm.rules
KERNEL=="card0", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="high"

To determine the KERNEL name execute:

$ find /sys/class/drm/ -regextype awk -regex '.+/card[0-9]+' -printf '%f\n'

There is also a GUI solution [4] where you can manage the "power_dpm" with radeon-profile-gitAUR and radeon-profile-daemon-gitAUR.

Artifacts in Chromium

If you see artifacts in Chromium, forcing the vulkan-based backend might help. Go to chrome://flags and enable #ignore-gpu-blocklist and #enable-vulkan.

R9 390 series poor performance and/or instability

If you experience issues [5] with a AMD R9 390 series graphics card, set radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1 amdgpu.dc=1 as kernel parameters to force the use of amdgpu driver instead of radeon.

If it still does not work, disabling DPM might help, add radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1 to the kernel parameters.

Freezes with "[drm] IP block:gmc_v8_0 is hung!" kernel error

If you experience freezes and kernel crashes during a GPU intensive task with the kernel error " [drm] IP block:gmc_v8_0 is hung!" [6], a workaround is to set amdgpu.vm_update_mode=3 as kernel parameters to force the GPUVM page tables update to be done using the CPU. Downsides are listed here [7].

Screen flickering white/gray

When you change resolution or connect to an external monitor, if the screen flickers or stays white, add amdgpu.sg_display=0 as a kernel parameter.

System freeze or crash when gaming on Vega cards

Dynamic power management may cause a complete system freeze whilst gaming due to issues in the way GPU clock speeds are managed. [8] A workaround is to disable dynamic power management, see ATI#Dynamic power management for details.

WebRenderer (Firefox) corruption

Artifacts and other anomalies may present themselves (e.g. inability to select extension options) when WebRenderer is force enabled by the user. Workaround is to fall back to OpenGL compositing.

Double-speed or "chipmunk" audio, or no audio when a 4K@60Hz device is connected

This is sometimes caused by a communication issue between an AMDGPU device and a 4K display connected over HDMI. A possible workaround is to enable HDR or "Ultra HD Deep Color" via the display's built-in settings. On many Android based TVs, this means setting this to "Standard" instead of "Optimal".

Issues with power management / dynamic re-activation of a discrete amdgpu graphics card

If you encounter issues where the kernel driver is loaded, but the discrete graphics card still is not available for games or becomes disabled during use (similar to [9]), you can workaround the issue by setting the kernel parameter amdgpu.runpm=0, which prevents the dGPU from being powered down dynamically at runtime.

kfd: amdgpu: TOPAZ not supported in kfd

In the system journal or the kernel message keyring a critical level error message

kfd: amdgpu: TOPAZ not supported in kfd

may appear. If you are not planning to use Radeon Open Compute, this can be safely ignored. It is not supported in TOPAZ, as they are old GPUs. [10] [11]

High idle power draw due to MCLK locked at MAX (1000MHz), or MIN (96MHz) causing low game performance (on 6.4 kernel)

On high resolutions and refresh rates, the MCLK (vram / memory clock) may be locked at the highest clock rate (1000MHz) [12] [13] causing higher GPU idle power draw. On Linux kernel 6.4.x, MCLK clocks at the lowest (96MHz), causing low performance in games [14] [15].

This is likely due to a monitor not using Coordinated Video Timings (CVT) with a low V-Blank value for the affected resolutions and refresh rates, see this gist for a workaround.

Failure to suspend to RAM

The amdgpu kernel module tries to buffer VRAM in RAM when the system enters S3 to prevent memory loss through VRAM decay which is not sufficiently refreshed.

If you are using a lot of VRAM and are short on free RAM this can fail despite sufficient SWAP memory would be available, because the IO subsystem might have been suspended before.

You will see something like:

kernel: systemd-sleep: page allocation failure: order:0, mode:0x100c02(GFP_NOIO|__GFP_HIGHMEM|__GFP_HARDWALL), nodemask=(null),cpuset=/,mems_allowed=0
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x47/0x60
kernel:  warn_alloc+0x165/0x1e0
kernel:  __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
kernel:  __alloc_pages+0x32d/0x350
kernel:  ttm_pool_alloc+0x19f/0x600 [ttm 0bd92a9d9dccc3a4f19554535860aaeda76eb4f4]

As a workaround, a userspace service can ensure to allocate enough RAM for the VRAM to be buffered by swapping out enough RAM before the system is suspended.

Failure to shut down and to suspend

This article or section needs expansion.

Reason: Missing kernel info and bug reports. (Discuss in Talk:AMDGPU)

hid_sensor_*_3d group of kernel modules can cause system lockups on bootup, shutdown, and suspend. Process list will show multiple instances of udev-worker which then fail to freeze upon system sleep.

You will see something like:

kernel: PM: suspend entry (deep)
kernel: Filesystems sync: 0.002 seconds
kernel: Freezing user space processes
kernel: Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: task:(udev-worker)   state:D stack:0     pid:479   tgid:479   ppid:422    flags:0x00004006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3db/0x1520
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
kernel:  ? __wake_up_common+0x78/0xa0
kernel:  ? srso_alias_return_thunk+0x5/0xfbef5

To work around this problem, blacklist the problematic modules by creating e.g. /etc/modprobe.d/blacklist-hid_sensors.conf

blacklist hid_sensor_accel_3d
blacklist hid_sensor_gyro_3d
blacklist hid_sensor_magn_3d

See also