External GPU

From ArchWiki

On computers equipped with Thunderbolt 3+ or USB4, it is possible to attach a desktop-grade external graphics card (eGPU) using a GPU enclosure. eGPU.io is a good resource with a buyer's guide and a community forum. While some manual configuration (shown below) is needed for most modes of operation, Linux support for eGPUs is generally good.

Note: For USB4 laptops, the data rate is specified to be at least 20 Gbit/s, with higher rates (40, 80, or more) being optional. When using a Thunderbolt enclosure, it is a good idea to ensure that the laptop's USB4 implementation supports the same data rate.

Installation

Thunderbolt

The eGPU enclosure's Thunderbolt device may need to be authorized after plugging in (depending on your BIOS/UEFI firmware configuration). Follow Thunderbolt#User device authorization. If successful, the external graphics card should show up in lspci:

$ lspci -d ::03xx
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)             # internal GPU
1a:10.3 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050] (rev a1)    # external GPU

Depending on your computer, its firmware and enclosure firmware, Thunderbolt will limit host <-> eGPU bandwidth to some extent due to the number of PCIe lanes and OPI Mode:

# dmesg | grep PCIe
[19888.928225] pci 0000:1a:10.3: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:05:01.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
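
The negotiated link can also be read from sysfs instead of the kernel log. The following is a minimal sketch, assuming the kernel exposes the usual current_link_speed, current_link_width, max_link_speed and max_link_width attributes for the device. The function name pcie_link_info and its optional second argument (an alternative directory, only useful for testing) are illustrative and not part of any tool.

```shell
# Print the negotiated and the maximum PCIe link of one device.
# $1: full PCI address including the domain, e.g. 0000:1a:10.3
# $2: directory holding the device entries (hypothetical override for testing;
#     defaults to /sys/bus/pci/devices)
pcie_link_info() {
    dev="${2:-/sys/bus/pci/devices}/$1"
    if [ ! -r "$dev/current_link_speed" ]; then
        echo "no PCIe link information for $1" >&2
        return 1
    fi
    printf 'current: %s x%s\n' "$(cat "$dev/current_link_speed")" "$(cat "$dev/current_link_width")"
    printf 'maximum: %s x%s\n' "$(cat "$dev/max_link_speed")" "$(cat "$dev/max_link_width")"
}
```

For the example above, this would be called as pcie_link_info 0000:1a:10.3. The limiting link may belong to an upstream bridge rather than the GPU itself (0000:05:01.0 in the dmesg output above), so it can be worth querying that address as well.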

Drivers

A driver compatible with your GPU model should be installed.

If installed successfully, lspci -k should show that a driver has been associated with your card:

$ lspci -k -d ::03xx
1a:10.3 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050] (rev a1)
        Subsystem: NVIDIA Corporation GP107 [GeForce GTX 1050]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

AMDGPU

Note that the AMDGPU driver (with either Thunderbolt or USB4) might in some cases set the wrong pcie_gen_cap option and fall back to PCIe gen 1.1 speed, possibly causing serious performance issues. In this case, the proper value can be set as a module option (see Kernel module#Using modprobe.d) or passed as a kernel parameter:

/etc/modprobe.d/amd-egpu-pcie-speed.conf
options amdgpu pcie_gen_cap=0x40000

This will set PCIe gen 3 speed. A full list of options can be found in amd_pcie.h.
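
When passed as a kernel parameter, the same option takes the usual module.parameter form, i.e. amdgpu.pcie_gen_cap=0x40000. To check which value the loaded module is actually using, something like the following sketch may help. It assumes that the parameter is exported read-only under /sys/module/amdgpu/parameters, as module parameters commonly are. The function name and its optional argument are hypothetical.

```shell
# Print the pcie_gen_cap value used by the loaded amdgpu module, in hexadecimal.
# $1: parameter file to read (hypothetical override for testing;
#     defaults to /sys/module/amdgpu/parameters/pcie_gen_cap)
amdgpu_pcie_gen_cap() {
    # sysfs reports the value in decimal; convert it for comparison with the option above.
    printf '0x%x\n' "$(cat "${1:-/sys/module/amdgpu/parameters/pcie_gen_cap}")"
}
```

With the option above applied, amdgpu_pcie_gen_cap is expected to print 0x40000.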

NVIDIA

For NVIDIA eGPUs, on some systems you may need to load the thunderbolt kernel module early to ensure that it is loaded before nvidia_drm.
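
One way to do this, assuming the initramfs is generated by mkinitcpio, is to list the module in the MODULES array. The excerpt below is only a sketch: an existing configuration will usually contain further entries, which should be kept.

```shell
# /etc/mkinitcpio.conf (excerpt; assumes mkinitcpio is the initramfs generator)
# Add thunderbolt to the MODULES array, keeping any modules already listed there.
MODULES=(thunderbolt)
```

After editing the file, the initramfs needs to be regenerated, for example with mkinitcpio -P.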

Compute-only workloads

Right after completing installation steps, compute-only workloads like GPGPU#CUDA that do not need to display anything should work without any extra configuration. The nvidia-smi utility (provided by the nvidia-utils package) should work with the proprietary NVIDIA driver. Proprietary NVENC/NVDEC should work (without OpenGL interop).

This use case should also support full hotplug. Hot-unplug should also be possible (probably depending on the drivers used). On NVIDIA, an active nvidia-persistenced is expected to prevent clean hot-unplug.

Xorg

Multiple setups combining internal (iGPU) and external (eGPU) cards are possible, each with its own advantages and disadvantages.

Xorg rendered on eGPU, PRIME display offload to iGPU

  • Most programs that make use of GPU run out-of-the-box on eGPU: glxinfo/glxgears, eglinfo/eglgears_x11, NVENC/NVDEC (including OpenGL interop).
  • Xorg only starts with the eGPU plugged in.
  • Monitors attached to eGPU work out-of-the-box, PRIME display offload can be used for monitors attached to iGPU (i.e. internal laptop screen).

Main article is PRIME#Reverse PRIME. Also documented in NVIDIA driver docs Chapter 34. Offloading Graphics Display with RandR 1.4.

Use an Xorg configuration snippet like this one:

/etc/X11/xorg.conf.d/80-egpu-primary-igpu-offload.conf
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "PCI:26:16:3"                 # Edit according to lspci, translate from hex to decimal.
    Option     "AllowExternalGpus" "True"    # Required for proprietary NVIDIA driver.
EndSection

Section "Module"
    # Load modesetting module for the iGPU, which should show up in XrandR 1.4 as a provider.
    Load "modesetting"
EndSection
Note: Xorg uses decimal bus IDs, while most other tools use hexadecimal. Here, 1a:10.3 was converted to 26:16:3 for the xorg.conf snippet.
Tip: With modern Xorg, there is no need to specify ServerLayout and Screen sections, as these are inferred automatically. The first Device defined is considered primary.
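
The hexadecimal-to-decimal conversion of the bus ID can be scripted. The following is a minimal sketch. The function name pci_to_xorg_busid is made up for illustration, and the input is assumed to be an address as printed by lspci, with or without a leading domain.

```shell
# Convert an lspci-style address (hexadecimal "bus:device.function", optionally
# prefixed by a domain such as "0000:") into the decimal form used by Xorg's BusID.
pci_to_xorg_busid() {
    addr=$1
    case $addr in
        *:*:*) addr=${addr#*:} ;;   # drop the leading domain if present
    esac
    bus=${addr%%:*}     # e.g. "1a"
    rest=${addr#*:}     # e.g. "10.3"
    dev=${rest%%.*}     # e.g. "10"
    fn=${rest#*.}       # e.g. "3"
    # printf interprets the 0x-prefixed arguments as hexadecimal numbers.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For the card used in this article, pci_to_xorg_busid 1a:10.3 prints PCI:26:16:3.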

To validate this setup, use xrandr --listproviders, which should display:

Providers: number : 2
Provider 0: id: 0x1b8 cap: 0x1, Source Output crtcs: 4 outputs: 4 associated providers: 0 name:NVIDIA-0
Provider 1: id: 0x1f3 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting

To output to the internal laptop screen and/or other monitors attached to the iGPU, RandR 1.4 PRIME display offload can be used, with the names taken from the xrandr --listproviders output above:

xrandr --setprovideroutputsource modesetting NVIDIA-0 && xrandr --auto
Note: The xrandr --auto call is optional and may be replaced by any RandR-based display configuration tool. Its presence prevents an all-screens-black situation.

You may want to run this command before the display manager shows the login prompt or before the desktop environment starts, see xrandr#Configuration and xinit.
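
When the command is run automatically, it may be worth guarding it so that nothing happens if the eGPU is not present. The following is a sketch under the assumption that the providers are named NVIDIA-0 and modesetting, as in the listing above. The function names has_provider and egpu_display_offload are illustrative.

```shell
# Succeed if the provider listing read from stdin contains a provider with the given name.
has_provider() {
    grep -q "name:$1\$"
}

# Let outputs of the iGPU display what the eGPU renders, but only if both providers exist.
# The provider names are assumptions taken from the example listing.
egpu_display_offload() {
    providers=$(xrandr --listproviders) || return 1
    if printf '%s\n' "$providers" | has_provider NVIDIA-0 &&
       printf '%s\n' "$providers" | has_provider modesetting; then
        xrandr --setprovideroutputsource modesetting NVIDIA-0 && xrandr --auto
    else
        echo "eGPU or iGPU provider not found, leaving outputs untouched" >&2
        return 1
    fi
}
```

The egpu_display_offload function could then be called from a display manager setup script or from ~/.xinitrc before the desktop environment is started.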

Vulkan may enumerate GPUs independently of Xorg, so in order to run for example vkcube in this setup, one may need to pass --gpu_number 1 option. Alternatively, a layer to reorder GPUs during enumeration can be activated with the same effect: __NV_PRIME_RENDER_OFFLOAD=1 vkcube or equivalently prime-run vkcube.

Tip: If using optimus-manager on a laptop, you can render on eGPU by adding the BusId of the eGPU in the appropriate file for your mode and graphics card in /etc/optimus-manager/xorg/.

Xorg rendered on iGPU, PRIME render offload to eGPU

  • Programs are rendered on iGPU by default, but PRIME render offload can be used to render them on eGPU.
  • Xorg starts even with eGPU disconnected, but render/display offload will not work until it is restarted.
  • Monitors attached to iGPU (i.e. internal laptop screen) work out-of-the-box, PRIME display offload can be used for monitors attached to eGPU.

Main article is PRIME#PRIME GPU offloading. Also documented in NVIDIA driver docs Chapter 35. PRIME Render Offload.

With many discrete GPU drivers, this mode should be the default without any manual Xorg configuration. If that does not work, or if you use proprietary NVIDIA drivers, use the following:

/etc/X11/xorg.conf.d/80-igpu-primary-egpu-offload.conf
Section "Device"
    Identifier "Device0"
    Driver     "modesetting"
EndSection

Section "Device"
    Identifier "Device1"
    Driver     "nvidia"
    BusID      "PCI:26:16:3"                 # Edit according to lspci, translate from hex to decimal.
    Option     "AllowExternalGpus" "True"    # Required for proprietary NVIDIA driver.
EndSection

To validate this setup, use xrandr --listproviders, which should display:

$ xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x47 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting
Provider 1: id: 0x24a cap: 0x2, Sink Output crtcs: 4 outputs: 4 associated providers: 0 name:NVIDIA-G0

To render some_program on the eGPU, PRIME render offload can be used:

  • for proprietary NVIDIA drivers:
    $ __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia some_program
  • for proprietary NVIDIA drivers (convenience wrapper):
    $ prime-run some_program
  • for open-source drivers:
    $ DRI_PRIME=1 some_program

To output to monitors connected to the eGPU, RandR 1.4 PRIME display offload can again be used:

$ xrandr --setprovideroutputsource NVIDIA-G0 modesetting && xrandr --auto
Tip: The order of the providers is different, and the NVIDIA provider has a slightly different name this time.

NVIDIA drivers 460.27.04+ implement an optimization for a special case of combined render and display offloads:

Added support for “Reverse PRIME Bypass”, an optimization that bypasses the bandwidth overhead of PRIME Render Offload and PRIME Display Offload in conditions where a render offload application is fullscreen, unredirected, and visible only on a given NVIDIA-driven PRIME Display Offload output. Use of the optimization is reported in the X log when verbose logging is enabled in the X server.

Separate Xorg instance for eGPU

Main article is nvidia-xrun#External GPU setup.

Known issues with eGPUs on Xorg

  • Hotplug is not supported with most discrete GPU Xorg drivers: the eGPU needs to be plugged in when Xorg starts. Logging out and in again should suffice to restart Xorg.
  • Hot-unplug is not supported at all: doing so leads to system instability or outright freezes (as acknowledged in the NVIDIA documentation).

Wayland

Wayland support for eGPUs (or multiple GPUs in general) is much less tested, but should work with even less manual configuration.

Note that the Wayland compositor needs to support GPU hotplug explicitly, but most already have some level of support.

For open-source drivers, DRI offloading works fine:

$ DRI_PRIME=1 some_program

Some projects, such as all-ways-egpu, are trying to provide more efficient methods for GPU selection under Wayland.

Hotplugging NVIDIA eGPU

It is possible to hotplug an eGPU when using Wayland and to use PRIME with it. NVIDIA already has a solid PRIME implementation for dGPUs, and it works the same way for eGPUs.

First, make sure that no program is using the NVIDIA modules. EGL programs tend to use 1 MB of dGPU memory each, even if they run on the iGPU, which can be seen in nvidia-smi. To avoid this, set __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json as an environment variable. The best place for that is /etc/environment.d/50_mesa.conf.

Then unload the NVIDIA modules:

# rmmod nvidia_uvm
# rmmod nvidia_drm
# rmmod nvidia_modeset
# rmmod nvidia

Once the NVIDIA modules are no longer loaded, you can connect the external GPU. When the GPU is initialized, load the NVIDIA modules again with modprobe nvidia-drm or modprobe nvidia-current-drm. The naming depends on the source of the modules: the drivers from the NVIDIA website or those from a package manager. In some cases (for example, with the GIGABYTE AORUS GAMING BOX) the eGPU does not work with the proprietary modules, so you might need to load the open-source ones: modprobe nvidia-current-open-drm.
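
Whether the modules are really gone before plugging in, or loaded again afterwards, can be confirmed from /proc/modules. The following is a minimal sketch. The function name loaded_nvidia_modules and its optional argument are hypothetical.

```shell
# List the NVIDIA kernel modules that are currently loaded, one per line.
# $1: module list to read (hypothetical override for testing; defaults to /proc/modules)
loaded_nvidia_modules() {
    # The first field of each line is the module name; match nvidia and nvidia_*.
    awk '$1 ~ /^nvidia(_|$)/ { print $1 }' "${1:-/proc/modules}"
}
```

Empty output from loaded_nvidia_modules means that no NVIDIA module is loaded and the eGPU can be connected.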

Once the modules have loaded successfully, PRIME will work. However, since __EGL_VENDOR_LIBRARY_FILENAMES was set to use Mesa, it must be overridden with __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json when starting a program. The full string looks like this:

__GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json %command%
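
Outside of launchers that substitute %command% (it is assumed here to be a launcher placeholder for the program to run), the same environment can be applied with a small wrapper. This is a sketch, and the function name egpu_run is made up for illustration.

```shell
# Run a command with the environment variables from the line above,
# so that it renders on the NVIDIA eGPU.
egpu_run() {
    env \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __NV_PRIME_RENDER_OFFLOAD=1 \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
        "$@"
}
```

For example, egpu_run some_program starts some_program with these variables set, without changing the environment of the calling shell.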

GNOME users might need to patch switcheroo-control to include __EGL_VENDOR_LIBRARY_FILENAMES in its list of environment variables. This allows programs to be run on the eGPU by right-clicking and selecting "Launch using Dedicated Graphics Card". However, this is beyond the scope of this article.

Hotplugging NVIDIA eGPU and temporarily disabling NVIDIA dGPU

If you have an iGPU and an NVIDIA dGPU and want to connect an NVIDIA eGPU, you will encounter a conflict where graphics are rendered only on the dGPU, no matter what you do. To solve this, temporarily disable the dGPU so that the NVIDIA driver does not notice it. The best way to do that is to override its driver.

First, unload the NVIDIA driver:

# rmmod nvidia_uvm
# rmmod nvidia_drm
# rmmod nvidia_modeset
# rmmod nvidia

Then, override the dGPU driver with the driverctl utility (available from the AUR). In this example, 0000:01:00.0 is the address of the dGPU. It can be found with lspci (lspci -D prints the address including the domain).

# driverctl --nosave set-override 0000:01:00.0 vfio-pci

It is important to use the --nosave parameter to prevent driverctl from applying the override on boot. If something goes wrong, a simple reboot then restores the original state.

Once the dGPU is disabled, load the kernel modules with modprobe nvidia-drm and then check whether nvidia-smi shows 1 or 2 GPUs.

Bringing the dGPU back is unintuitive. First, unload the NVIDIA modules and unplug the eGPU, then run this series of commands:

# modprobe nvidia-current
# driverctl --nosave unset-override 0000:01:00.0
# modprobe nvidia-current
# driverctl --nosave unset-override 0000:01:00.0
# modprobe nvidia-current-modeset
# modprobe nvidia-current-drm

Strangely, the first two commands need to be run twice, otherwise the dGPU will not come back. A command will report an error once, but this is not critical.