Stress testing

From ArchWiki

Stress testing is the process of running various workloads on a computer to assess its stability. It is often used to check the stability of overclocked or undervolted hardware and to monitor the thermal behavior of the system (e.g. maximum temperatures, throttling, noise levels). Several programs are available to stress test various parts of the system, such as CPU, GPU, RAM, and storage, using different types of workloads.

Stress testing tasks

The table below lists some stress testing software, grouped by the kind of test and the overall intensity of the workload. It is important to stress test using mixed loads to verify stability under many use cases.

Warning: Before proceeding, it is highly recommended that users have some means to monitor the system temperatures. See Sensors.
Work load | Tested hardware (1) | Task | Description
Light (2) | CPU, storage | Updating patches | Custom script: refreshing hundreds of kernel patches in the OpenWRT project. See #Updating patches for OpenWRT.
Light (2) | CPU, storage | Writing a disk image | See #Writing to an image file.
Light (2) | RAM | Memory stressing | See #MemTest86+.
Realistic (3) | CPU, RAM, storage | Compilation | Parallel compilation is a good way to stress test the CPU. See #GCC.
Realistic (3) | CPU, RAM | Video encoding | ffmpeg, x264, handbrake-cli, etc. can be used to encode video. See #Video encoding.
Realistic (3) | CPU, RAM | Cryptocurrency mining | xmrig: xmrig --stress uses different cryptocurrency mining algorithms (based on CPU model) to generate the highest possible load. A good way to test stability and temperatures.
Realistic (3) | GPU | 3D rendering | unigine-heaven (AUR) is a GPU benchmark that runs in a loop. It is a decent stress test for GPUs. See Benchmarking#Graphics.
Synthetic (4) | CPU, RAM, storage | Synthetic stressing | stress is a simple CPU, memory, I/O, and disk workload generator implemented in C. See #stress.
Synthetic (4) | CPU, RAM | Prime numbers calculation | mprime-bin (AUR) factors large numbers and is an excellent way to stress CPU and memory. See #MPrime.
Synthetic (4) | CPU | Algebra calculation | linpack (AUR): Linpack makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations and is an excellent way to stress CPUs for stability. See #Linpack.
Synthetic (4) | CPU | Pi decimals calculation | systester (AUR): Systester is a multithreaded piece of software capable of deriving values of pi out to 128,000,000 decimal places. It has a built-in check for system stability. See #Systester.
Synthetic (4) | RAM | Memory stressing | stressapptest (AUR) is a memory interface test.
  • 1 The main target of the test, virtually all testing will also involve the CPU and RAM to some extent.
  • 2 Light tests do not push the components very hard (in terms of power/heat limits). These tests are still useful to test how the hardware behaves in lower power levels (P states), in particular for undervolted systems.
  • 3 Realistic tests are based on real world workloads.
  • 4 Synthetic tests are explicitly designed to torture the hardware as much as possible and may not be representative of real-world workloads.
Tip: To ensure the stability of a system, it is recommended to run such tests for a long period of time, from a few hours to a few days, and in different temperature conditions. For example, if the room temperature varies significantly between winter and summer, test under both conditions.
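
For long runs, it can help to log temperatures alongside the test. The following is a minimal sketch, not a replacement for the tools described in Sensors. It assumes the kernel exposes thermal zones under /sys/class/thermal, with temperatures in millidegrees Celsius; print_temps is a hypothetical helper name.

```shell
# Print "<zone type> <temperature in degrees Celsius>" for every thermal zone
# found under the given directory (assumed default: /sys/class/thermal).
print_temps() {
  base=${1:-/sys/class/thermal}
  for zone in "$base"/thermal_zone*; do
    # Skip zones without a readable temperature (or an unmatched glob).
    [ -r "$zone/temp" ] || continue
    read -r millideg < "$zone/temp"
    kind=unknown
    [ -r "$zone/type" ] && read -r kind < "$zone/type"
    # The kernel reports millidegrees; print one decimal place.
    printf '%s %d.%d\n' "$kind" "$((millideg / 1000))" "$((millideg % 1000 / 100))"
  done
}
```

To sample every five seconds while a test runs in another terminal, a loop such as while sleep 5; do print_temps; done can be used.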

Updating patches for OpenWRT

A good stability test under a low workload is to run through updating the patch sets in the OpenWRT project. Follow these steps.

Note: Several dependencies, including git and curl, are needed; most others should be provided by the dependencies of the base-devel meta package. To be safe, users can install the meta package openwrt-devel (AUR).
git clone --depth 1 https://github.com/openwrt/openwrt.git
cd openwrt
mkdir -p staging_dir/host/bin
cp /usr/bin/sed ./staging_dir/host/bin
curl -Os https://raw.githubusercontent.com/KanjiMonster/maintainer-tools/master/update_kernel.sh
chmod +x update_kernel.sh
./update_kernel.sh -v -u 5.15

stress

stress performs a loop that calculates the square root of a random number in order to stress the CPU. It can run several workers simultaneously, for example to load all the cores of a CPU. It can also generate memory, I/O or disk workloads depending on the parameters passed. The FAQ provides examples and explanations.

To spawn 4 workers working on calculating a square root, use the command:

$ stress --cpu 4
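
To load every logical CPU for a fixed time rather than a hard-coded worker count, the command line can be built from nproc. This is a small sketch; stress_cmdline is a hypothetical helper that only prints the command, and the 10 minute default is an arbitrary assumption. The --timeout option accepts a time suffix such as s, m or h.

```shell
# Print a stress command line that loads the CPU for a fixed duration.
# Arguments: duration (assumed default: 10m), CPU workers (default: nproc).
stress_cmdline() {
  duration=${1:-10m}
  workers=${2:-$(nproc)}
  echo "stress --cpu $workers --timeout $duration"
}
```

For example, sh -c "$(stress_cmdline 1h)" runs one worker per logical CPU for one hour.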

stress++

stress++ (AUR) is a lightweight stress-testing program written by CodeLog that loads the processor by computing the Ackermann function.

MPrime

MPrime (also known as Prime95 in its Windows and macOS implementations) is widely recognised as a de facto measure of system stability. In torture test mode, MPrime performs a series of very CPU-intensive calculations and compares the values it gets to known good values.

The Linux implementation is called mprime (AUR).

To run mprime, simply open a shell and type "mprime":

$ mprime
Note: If using CPU frequency scaling, sometimes users need to manually set the processor to run with its highest multiplier because mprime uses a nice value that does not always trip the step-up in multiplier.

When the software loads, simply answer 'N' to the first question to begin the torture testing:

Main Menu

1.  Test/Primenet
2.  Test/Worker threads
3.  Test/Status
4.  Test/Continue
5.  Test/Exit
6.  Advanced/Test
7.  Advanced/Time
8.  Advanced/P-1
9.  Advanced/ECM
10.  Advanced/Manual Communication
11.  Advanced/Unreserve Exponent
12.  Advanced/Quit Gimps
13.  Options/CPU
14.  Options/Preferences
15.  Options/Torture Test
16.  Options/Benchmark
17.  Help/About
18.  Help/About PrimeNet Server

There are several options for the torture test (menu option 15).

  • Small FFTs (option 1) to stress the CPU
  • In-place large FFTs (option 2) to test the CPU and memory controller
  • Blend (option 3) is the default and constitutes a hybrid mode which stresses the CPU and RAM.

Any errors are reported both to stdout and to ~/results.txt for later review. Many do not consider a system 'stable' unless it can run the Large FFTs for a 24-hour period.

Example ~/results.txt; note that the two runs from 26 June indicate a hardware failure, in this case due to insufficient vcore to the CPU:

[Sun Jun 26 20:10:35 2011]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
[Sat Aug 20 10:50:45 2011]
Self-test 480K passed!
Self-test 480K passed!
[Sat Aug 20 11:06:02 2011]
Self-test 128K passed!
Self-test 128K passed!
[Sat Aug 20 11:22:10 2011]
Self-test 560K passed!
Self-test 560K passed!
...
Note: Users suspecting bad memory or memory controllers should try the blend test first as the small FFT test uses very little memory.
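
After a long unattended run, the results file can be checked for failures without reading it in full. This is a sketch that assumes errors are logged with the same wording as in the example above ("FATAL ERROR" and "Hardware failure detected"); other mprime versions may word them differently. count_mprime_errors is a hypothetical helper name.

```shell
# Print the number of lines in an mprime results file that report an error.
# The file defaults to ~/results.txt; the patterns are taken from the example
# log above and are an assumption for other mprime versions.
count_mprime_errors() {
  grep -c -E 'FATAL ERROR|Hardware failure detected' "${1:-$HOME/results.txt}"
}
```

A count of 0 means no line matched; grep then also returns a non-zero exit status, which a calling script can test.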

Linpack

linpack (AUR) makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations. It is an excellent way to stress CPUs for stability (only Intel CPUs are supported). After installation, users should copy /usr/share/linpack/linpack.conf to ~/.config/linpack.conf and adjust it according to the amount of memory on the system.

Systester

Systester (AUR) (aka SuperPi for Windows) is available in both CLI and GUI versions. It tests system stability by calculating up to 128 million digits of pi and includes error checking. Two calculation algorithms can be selected: Quadratic Convergence of Borwein and Gauss-Legendre. The latter is the same method that the popular SuperPi for Windows uses.

A CLI example using 8 threads is given:

$ systester-cli -gausslg 64M -threads 8

MemTest86+

Use MemTest86 (proprietary) or Memtest86+ (GPL) to test your memory (RAM).

Tip:
  • A reliable source for the version history is the History of MemTest86 section on memtest86.com, in particular the section "2002 - 2004" and following. Note that the proprietary MemTest86 versions 5 through 7 claim to support both BIOS and UEFI, but they simply bundle the old and new versions.
  • Allowing tests to run for at least 10 cycles without errors is usually sufficient.

Writing to an image file

A good stability test under a low workload is using dd to fill a freshly formatted image. The target can be a physical disk or a loop-mounted image. The script below uses a loop-mounted image and cycles through each core one by one. Adjust the variables at the top of the script to match your system. By default, the script runs the command just once per core. It can easily be customised to run on known-weak cores, rather than scanning all cores 0 through n, by altering the for loop. Run the script as root.

format-test.sh
#!/bin/bash

# define the path to store the image, recommended to be a tmpfs mounted location to avoid reads/writes to the disk
img=/scratch/image.img

# define the mount point
mnt=/mnt/loop

# size argument to pass to truncate, make sure you select something less than the free memory on the system
# see truncate --help for available options
size=40G

# defaults to 1 less than the number of virtual cores, manually redefine if desired
max=$(($(nproc) - 1))

if [[ ! -f $img ]]; then
  truncate -s "$size" "$img"
  mkfs.ext4 "$img"
fi

# mount the image even if it was left over from a previous run
[[ -d $mnt ]] || mkdir -p "$mnt"
if ! mountpoint -q "$mnt"; then
  mount -o loop "$img" "$mnt" || exit 1
fi

for ((i = 0; i <= max; i++)); do
  echo "using core $i of $max"
  # dd writes until the image is full, so a "No space left on device" error is expected
  taskset -c "$i" time dd if=/dev/zero of="$mnt/zerofill" status=progress
done

umount $mnt
rm $img

GCC

Parallel compilation using GCC (or other compilers) will generate a heavy load on the CPU and memory. To avoid I/O bottlenecking, compile on an SSD or in a tmpfs.

A good example is compiling the kernel. See Kernel/Arch build system for detailed instructions; at the step described in Kernel/Arch build system#Compiling, run makepkg -sf MAKEFLAGS="-j$(nproc)".
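
A single successful build says little about long-term stability, so the build can be repeated and stopped at the first failure. The helper below is a generic sketch and is not part of makepkg; repeat_until_failure is a hypothetical name, and treating any non-zero exit status as a possible sign of instability is an assumption, since a build can also fail for unrelated reasons.

```shell
# Run a command up to N times and stop at the first non-zero exit status.
# Usage: repeat_until_failure N command [args...]
repeat_until_failure() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    echo "run $i of $runs"
    "$@" || { echo "run $i failed" >&2; return 1; }
    i=$((i + 1))
  done
}
```

For example, repeat_until_failure 10 makepkg -sf MAKEFLAGS="-j$(nproc)" rebuilds the package ten times in a row.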

Video encoding

Most video encoders are highly parallel and are designed to use most of a CPU's capabilities. The example below will encode noise using x265, and discard the result. This will heavily load the CPU.

ffmpeg -y -f rawvideo -video_size 1920x1080 -pixel_format yuv420p -framerate 60 -i /dev/urandom -c:v libx265 -preset placebo -f matroska /dev/null

Discovering errors

Some stressing applications like #MPrime or #Linpack have built-in consistency checks to discover errors due to non-matching results. A more general and simple method for detecting hardware instabilities is provided by the kernel itself, which logs machine check exceptions (MCE). To use it, filter the kernel journal after a crash like so:

# journalctl -k --grep=mce

Multicore chips can also give info as to which physical/logical core gave the error. This can be important if users are optimizing settings on a per-core basis.

The kernel can throw these errors while the stressing application is running, before it ends the calculation and reports the error, thus providing a very sensitive method to assess stability. Consider the following from a Ryzen 5900X:

mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 21: Machine Check: 0 Bank 5: baa0000000030150
mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000002 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1625265814 SOCKET 0 APIC 4 microcode a201016

This chip has 12 physical cores. In this case, CPU 21 can be traced back to physical core 10. Use lstopo from hwloc to print the hardware topology.

Core 0 = CPU 0 + CPU 1
Core 1 = CPU 2 + CPU 3
Core 2 = CPU 4 + CPU 5
Core 3 = CPU 6 + CPU 7
Core 4 = CPU 8 + CPU 9
Core 5 = CPU 10 + CPU 11
Core 6 = CPU 12 + CPU 13
Core 7 = CPU 14 + CPU 15
Core 8 = CPU 16 + CPU 17
Core 9 = CPU 18 + CPU 19
Core 10 = CPU 20 + CPU 21
Core 11 = CPU 22 + CPU 23
Note: Numbering can vary based on different CPU model and vendor, but in general both Cores and CPUs start at 0, not 1.
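
The lookup from a logged CPU number to its physical core can also be scripted. This is a sketch under two assumptions: the kernel messages have the same format as the example above, and lscpu -p=CPU,CORE from util-linux is used as an alternative to lstopo for the mapping. mce_cpus and cpu_to_core are hypothetical helper names.

```shell
# Read kernel log lines on stdin and print the unique logical CPU numbers
# named in "[Hardware Error]: CPU N: Machine Check" lines.
mce_cpus() {
  sed -n 's/.*\[Hardware Error]: CPU \([0-9][0-9]*\): Machine Check.*/\1/p' | sort -un
}

# Read "cpu,core" pairs on stdin (the format printed by lscpu -p=CPU,CORE)
# and print the core that the given logical CPU belongs to.
cpu_to_core() {
  awk -F, -v cpu="$1" '!/^#/ && $1 == cpu { print $2; found = 1 } END { exit !found }'
}
```

For example, journalctl -k --grep=mce | mce_cpus lists the affected logical CPUs, and lscpu -p=CPU,CORE | cpu_to_core 21 prints the matching core.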