GPGPU stands for General-purpose computing on graphics processing units.
OpenCL (Open Computing Language) is an open, royalty-free parallel programming specification developed by the Khronos Group, a non-profit consortium.
The OpenCL specification describes a programming language, a general environment that is required to be present, and a C API to enable programmers to call into this environment.
To execute programs that use OpenCL, a compatible hardware runtime needs to be installed.
- or : OpenCL support with clover and rusticl for mesa drivers
- : Part of AMD's ROCm GPU compute stack, officially supporting GFX8 and later cards (Fiji, Polaris, Vega), with unofficial and partial support for Navi10-based cards. To support cards older than Vega, you need to set the runtime variable ROC_ENABLE_PRE_VEGA=1. This is similar, but not quite equivalent, to specifying opencl=rocr in Ubuntu's amdgpu-install, because this package's ROCm version differs from the Ubuntu installer's version.
- AUR: Legacy Orca OpenCL repackaged from AMD's Ubuntu releases. Equivalent to opencl=legacy in Ubuntu's amdgpu-install.
- AUR, AUR: ROCr and Orca OpenCL repackaged from AMD's Ubuntu releases. Equivalent to opencl=rocr,legacy in Ubuntu's amdgpu-install.
- AUR: AMD CPU runtime
- : official NVIDIA runtime
- : a.k.a. the Neo OpenCL runtime, the open-source implementation for Intel HD Graphics GPU on Gen8 (Broadwell) and beyond.
- or : OpenCL support with clover and rusticl for mesa drivers
- AUR: the open-source implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Although deprecated by Intel in favour of the NEO OpenCL driver, it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
- AUR: the proprietary implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Although deprecated by Intel in favour of the NEO OpenCL driver, it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
- AUR: the implementation for Intel Core and Xeon processors. It also supports non-Intel CPUs.
- : LLVM-based OpenCL implementation (hardware independent)
A compiler and a translator are available that enable OpenCL applications to run on top of a Vulkan runtime:
- AUR: Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders.
- AUR: clvk is a prototype implementation of OpenCL 3.0 on top of Vulkan using clspv as the compiler.
- xrt AUR: Xilinx Run Time for FPGA
- fpga-runtime-for-opencl: FPGA Runtime
To execute 32-bit programs that use OpenCL, a compatible 32-bit hardware runtime needs to be installed.
- or : OpenCL support for AMD/ATI Radeon mesa drivers (32-bit)
- : OpenCL implementation for NVIDIA (32-bit)
ICD loader (libOpenCL.so)
The OpenCL ICD loader is supposed to be a platform-agnostic library that provides the means to load device-specific drivers through the OpenCL API. Most OpenCL vendors provide their own implementation of an OpenCL ICD loader, and these should all work with the other vendors' OpenCL implementations. Unfortunately, most vendors do not provide completely up-to-date ICD loaders, and therefore Arch Linux has decided to provide this library from a separate project () which currently provides a functioning implementation of the current OpenCL API.
The other ICD loader libraries are installed as part of each vendor's SDK. If you want to ensure that the ICD loader shipped by Arch Linux is used, you can create a file in /etc/ld.so.conf.d containing the single line /usr/lib, which adds /usr/lib to the dynamic program loader's search directories.
This is necessary because all the SDKs add their runtime's lib directories to the search path through their own files in /etc/ld.so.conf.d.
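After changing files in /etc/ld.so.conf.d, the linker cache has to be rebuilt by running ldconfig as root. To see which copies of libOpenCL.so the cache then knows about, the output of ldconfig -p can be filtered. The helper below is a sketch: the name opencl_loaders is hypothetical, and the parsing assumes glibc's "name (flags) => path" line format.

```sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the
# path of every libOpenCL.so entry in the dynamic linker cache.
opencl_loaders() {
  # Lines look like:
  #   libOpenCL.so.1 (libc6,x86-64) => /usr/lib/libOpenCL.so.1
  # so the resolved path is the last whitespace-separated field.
  awk '/libOpenCL\.so/ && /=>/ { print $NF }'
}
```

Usage:

$ ldconfig -p | opencl_loaders

If a vendor SDK directory is listed next to /usr/lib, that SDK's copy of the loader may be picked up instead of the system one.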
The available packages containing various OpenCL ICDs are:
- : recommended, most up-to-date
- AUR by Intel. Provides OpenCL 2.0, deprecated in favour of .
For OpenCL development, the bare minimum additional packages required are:
- : OpenCL ICD loader implementation, up to date with the latest OpenCL specification.
- : OpenCL C/C++ API headers.
The vendors' SDKs provide a multitude of tools and support libraries:
- Intel OpenCL SDK (old version, new OpenCL SDKs are included in the INDE and Intel Media Server Studio) AUR:
- AUR: This package is installed as /opt/AMDAPP and, apart from SDK files, it also contains a number of code samples (/opt/AMDAPP/SDK/samples/). It also provides the clinfo utility, which lists the OpenCL platforms and devices present in the system and displays detailed information about them. As the SDK itself contains a CPU OpenCL driver, no extra driver is needed to execute OpenCL on CPU devices (regardless of the CPU vendor).
- : Nvidia's GPU SDK which includes support for OpenCL 3.0.
To see which OpenCL implementations are currently active on your system, use the following command:
$ ls /etc/OpenCL/vendors
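Each file in /etc/OpenCL/vendors is a small text file naming one vendor library that the ICD loader will load, so the file names alone do not show which library each entry points to. The following sketch prints both; list_icds is a hypothetical helper, and it only reads the first line of each file.

```sh
# Hypothetical helper: print each ICD file and the vendor library it names.
# An optional argument overrides the default vendors directory.
list_icds() {
  local dir="${1:-/etc/OpenCL/vendors}" icd
  for icd in "$dir"/*.icd; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$icd" ] || continue
    printf '%s -> %s\n' "${icd##*/}" "$(head -n 1 "$icd")"
  done
}
```

Usage:

$ list_icds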
To find out all possible (known) properties of the OpenCL platform and devices available on the system, install .
You can specify which implementations your application should see using ocl-icd-choose AUR. For example:
$ ocl-icd-choose amdocl64.icd:mesa.icd davinci-resolve-checker
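A similar effect can be approximated without the AUR helper. This sketch assumes that the installed ICD loader treats an OCL_ICD_VENDORS variable pointing at a directory as a replacement for /etc/OpenCL/vendors (check your loader's documentation before relying on it). The function name ocl_with_icds is hypothetical, and this is not necessarily how ocl-icd-choose works internally: it links the selected .icd files into a temporary directory and runs the command with the loader pointed there.

```sh
# Hypothetical helper: run a command so that the ICD loader only sees the
# given colon-separated .icd files from /etc/OpenCL/vendors.
# Assumes the loader accepts OCL_ICD_VENDORS as a replacement vendors directory.
ocl_with_icds() {
  local icds="$1" tmpdir icd status old_ifs
  shift
  tmpdir=$(mktemp -d) || return 1
  # Split the list on ':' and symlink each selected .icd file into tmpdir.
  old_ifs=$IFS
  IFS=:
  for icd in $icds; do
    ln -s "/etc/OpenCL/vendors/$icd" "$tmpdir/$icd"
  done
  IFS=$old_ifs
  # Run the command with the loader pointed at the temporary directory,
  # then clean up and pass the command's exit status through.
  OCL_ICD_VENDORS="$tmpdir" "$@"
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

If the assumption above holds, the following should list only the platforms provided by those two ICD files:

$ ocl_with_icds amdocl64.icd:mesa.icd clinfo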
Rusticl is a new OpenCL implementation written in Rust provided by . It can be enabled by using the environment variable
driver is a Gallium driver, such as
Optionally, if OpenCL applications still do not detect Rusticl, use the following environment variable:
- D: cl4d or DCompute
- Java: Aparapi or JOCL (a part of JogAmp)
- Mono/.NET: Open Toolkit
- Go: OpenCL bindings for Go
- Racket: Racket has a native interface on PLaneT that can be installed via raco.
- Rust: ocl
- Julia: OpenCL.jl
According to Wikipedia:SYCL:
- SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17.
- SYCL is a royalty-free, cross-platform abstraction layer that builds on the underlying concepts, portability and efficiency inspired by OpenCL that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL enables single-source development where C++ template functions can contain both host and device code to construct complex algorithms that use hardware accelerators, and then re-use them throughout their source code on different types of data.
- While the SYCL standard started as the higher-level programming model sub-group of the OpenCL working group and was originally developed for use with OpenCL and SPIR, SYCL is a Khronos Group workgroup independent from the OpenCL working group since September 20, 2019 and starting with SYCL 2020, SYCL has been generalized as a more general heterogeneous framework able to target other systems. This is now possible with the concept of a generic backend to target any acceleration API while enabling full interoperability with the target API, like using existing native libraries to reach the maximum performance along with simplifying the programming effort. For example, the Open SYCL implementation targets ROCm and CUDA via AMD's cross-vendor HIP.
- AUR: Codeplay's proprietary implementation of SYCL 1.2.1. Can target SPIR, SPIR-V and, experimentally, PTX (NVIDIA) as device targets (end of support on 1 September 2023, after which it will be merged into Intel's LLVM implementation; source).
- AUR: Open source implementation mainly driven by Xilinx.
- AUR and AUR: Free implementation built on AMD's HIP instead of OpenCL. It can run on AMD and NVIDIA GPUs.
- : Intel's Data Parallel C++: the oneAPI Implementation of SYCL.
Checking for SPIR support
Most SYCL implementations are able to compile the accelerator code to SPIR or SPIR-V. Both are intermediate languages designed by Khronos that can be consumed by an OpenCL driver. To check whether SPIR or SPIR-V is supported, the following command can be used:
$ clinfo | grep -i spir
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
  IL version SPIR-V_1.0
  SPIR versions 1.2
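Grepping is fine for a quick look; if you want a yes/no answer, for example in a setup script, the same clinfo output can be reduced to two flags. This is a heuristic sketch with a hypothetical function name: it treats the cl_khr_spir extension as SPIR support and any SPIR-V IL version string as SPIR-V support. Because it reads the output for all platforms at once, "yes" only means that at least one device reports it.

```sh
# Hypothetical heuristic: read `clinfo` output on stdin and report whether
# it mentions the cl_khr_spir extension and a SPIR-V IL version.
check_spir() {
  local out
  out=$(cat)
  case $out in
    *cl_khr_spir*) echo "SPIR: yes" ;;
    *) echo "SPIR: no" ;;
  esac
  case $out in
    *SPIR-V*) echo "SPIR-V: yes" ;;
    *) echo "SPIR-V: no" ;;
  esac
}
```

Usage:

$ clinfo | check_spir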
ComputeCpp additionally ships with a tool that summarizes the relevant system information:
Device 0:
  Device is supported : UNTESTED - Untested OS
  CL_DEVICE_NAME : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  CL_DEVICE_VENDOR : Intel(R) Corporation
  CL_DRIVER_VERSION : 18.1.0.0920
  CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU
Drivers known to at least partially support SPIR or SPIR-V include , AUR, and AUR.
SYCL requires a working C++11 environment to be set up. There are a few open source libraries available:
- ComputeCpp SDK: Collection of code examples, integration for ComputeCpp
- SYCL-DNN: Neural network performance primitives
- SYCL-BLAS: Linear algebra performance primitives
- VisionCpp: Computer Vision library
- SYCL Parallel STL: GPU implementation of the C++17 parallel algorithms
- Proprietary NVIDIA kernel module
- CUDA "driver" and "runtime" libraries
- Additional libraries: CUBLAS, CUFFT, CUSPARSE, etc.
- CUDA toolkit, including the
- CUDA SDK, which contains many code samples and examples of CUDA and OpenCL programs
The kernel module and CUDA "driver" library are shipped in
cuda-gdb needs AUR to be installed, see FS#46598.
CUDA is installed in /opt/cuda, and the script /etc/profile.d/cuda.sh sets the relevant environment variables so that all build systems that support CUDA can find it.
To find out whether the installation was successful and whether CUDA is up and running, you can compile the CUDA samples. One way to check the installation is to run the
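Before building the samples, a quicker sanity check is to confirm that the toolkit compiler and the driver utilities respond. nvcc --version and nvidia-smi -L are standard commands from the CUDA toolkit and the NVIDIA driver respectively; the wrapper cuda_check is a hypothetical convenience function.

```sh
# Hypothetical sanity check for a CUDA setup.
cuda_check() {
  # nvcc comes from the CUDA toolkit; if it is missing, PATH is probably not
  # set up yet (e.g. /etc/profile.d/cuda.sh has not been sourced).
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found in PATH" >&2
    return 1
  fi
  nvcc --version
  # nvidia-smi ships with the NVIDIA driver; -L lists the detected GPUs.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
  else
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
  fi
}
```

Usage:

$ cuda_check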
- Fortran: PGI CUDA Fortran Compiler
- Haskell: The accelerate package lists available CUDA backends
- Java: JCuda
- Mathematica: CUDAlink
- Mono/.NET: CUDAfy.NET, managedCuda
- Perl: KappaCUDA, CUDA-Minimal
- Ruby: rbcuda
- Rust: cuda-sys (bindings) or RustaCUDA (high-level wrapper)
Switch between different versions
Go to CUDA Toolkit Archive, select your CUDA version, and get the installation script by clicking on Linux > x86_64 > Ubuntu > (highest number) > runfile (local).
It will give you instructions to retrieve the script (a wget command) and run it. Follow them.
Your CUDA version will be installed to /usr/local/cuda-VERSION. Create a symlink to it at /usr/local/cuda:
# ln -s /usr/local/cuda-VERSION /usr/local/cuda
Now add the CUDA directories to your paths, for example in your shell's startup file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
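To change the CUDA version, install another version following the steps above, and re-link /usr/local/cuda to the new version. The -n flag makes ln replace the existing symlink instead of creating a new link inside the directory it points to:
# ln -sfn /usr/local/cuda-ANOTHER-VERSION /usr/local/cuda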
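If you switch often, the re-linking step can be wrapped in a function that refuses to point the link at a version that is not installed. This is a sketch with a hypothetical name, cuda_switch, assuming the /usr/local/cuda-VERSION layout described above; the optional second argument lets you use a prefix other than /usr/local.

```sh
# Hypothetical helper: point <root>/cuda at <root>/cuda-<version>.
cuda_switch() {
  local version="$1" root="${2:-/usr/local}"
  local target="$root/cuda-$version"
  if [ ! -d "$target" ]; then
    echo "no CUDA installation at $target" >&2
    return 1
  fi
  # -f replaces an existing link; -n treats an existing symlink to a
  # directory as a plain file, so the link itself is replaced instead of
  # a new link being created inside the old version's directory.
  ln -sfn "$target" "$root/cuda"
}
```

For example, in a root shell:

# cuda_switch ANOTHER-VERSION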
ROCm (Radeon Open Compute) is AMD's open-source parallel computing architecture and framework. Although it requires an AMD GPU, some ROCm tools are hardware-agnostic. See the ROCm for Arch Linux repository for more information.
- : Develop applications using HIP and libraries for AMD platforms.
- : Develop OpenCL-based applications for AMD platforms.
The Heterogeneous Interface for Portability (HIP) is AMD's dedicated GPU programming environment for designing high performance kernels on GPU hardware. HIP is a C++ runtime API and programming language that allows developers to create portable applications on different platforms.
- : The base runtime, packages to run HIP applications on the AMD platform.
- : The Heterogeneous Interface for AMDGPUs in ROCm. Supports GPUs from the Polaris architecture (RX 500 series) up to AMD's RDNA 2 architecture (RX 6000 series).
- : AMD's open source deep learning library with HIP backend.
- AUR: The Heterogeneous Interface for NVIDIA GPUs in ROCm.
The AUR package provides AOMP, an open-source Clang/LLVM-based compiler with added support for the OpenMP API on AMD GPUs.
The package is the part of the ROCm framework providing an OpenCL runtime.
OpenCL image support
Recent ROCm versions include OpenCL image support, which is used by GPGPU-accelerated software such as Darktable. ROCm and the open-source AMDGPU graphics driver are all that is required; AMDGPU PRO is not needed.
$ /opt/rocm/bin/clinfo | grep -i "image support"
Image support Yes
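The grep above only looks at image support. The list of GPGPU-accelerated software also mentions an OpenCL feature that needs at least 1 GB of GPU memory as well as image support, so a rough sketch can test both conditions per device. The function name is hypothetical, the default threshold of 1073741824 bytes (1 GiB) is an assumption standing in for "1 GB", and the parsing assumes clinfo's human-readable labels Device Name, Global memory size and Image support, which may differ between clinfo versions.

```sh
# Hypothetical helper: read `clinfo` output on stdin and succeed if at least
# one device has image support and at least MIN bytes of global memory.
opencl_meets_requirements() {
  awk -v min="${1:-1073741824}" '
    # Record a pass if the device read so far meets both conditions.
    function check() { if (img && mem + 0 >= min + 0) ok = 1 }
    # A "Device Name" line starts a new device: settle the previous one first.
    /Device Name/        { check(); mem = 0; img = 0 }
    # e.g. "Global memory size   8589934592 (8GiB)": the byte count is field 4.
    /Global memory size/ { mem = $4 }
    # e.g. "Image support   Yes"
    /Image support/      { img = ($3 == "Yes") }
    END { check(); exit (ok ? 0 : 1) }
  '
}
```

Usage:

$ /opt/rocm/bin/clinfo | opencl_meets_requirements && echo "suitable device found"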
List of GPGPU accelerated software
- Blender – CUDA support for Nvidia GPUs and HIP support for AMD GPUs. More information here.
- FFmpeg – more information here.
- GIMP – experimental – more information here.
- LibreOffice Calc – more information here.
- mpv - See mpv#Hardware video acceleration.
- – Find all possible (known) properties of the OpenCL platform and devices available on the system.
- AUR – a GPU memtest. Despite its name, it supports both CUDA and OpenCL.
- – OpenCL feature requires at least 1 GB RAM on GPU and Image support (check output of clinfo command).
- DaVinci Resolve - a non-linear video editor. Can use both OpenCL and CUDA.
- AUR - Used for searching the neural network (supports tensorflow, OpenCL, CUDA, and openblas)
- - PyTorch with CUDA backend
- - Port of TensorFlow to CUDA
- AUR - Port of TensorFlow to SYCL
- - High Perf CryptoNote CPU and GPU (OpenCL, CUDA) miner