https://wiki.archlinux.org/api.php?action=feedcontributions&user=Wribbe&feedformat=atomArchWiki - User contributions [en]2024-03-28T14:02:50ZUser contributionsMediaWiki 1.41.0https://wiki.archlinux.org/index.php?title=PCI_passthrough_via_OVMF&diff=548481PCI passthrough via OVMF2018-10-19T18:59:17Z<p>Wribbe: /* Configuring libvirt */ Added quemu-block-iscsi as dependency, libvirtd crashed withouth it.</p>
<hr />
<div>[[Category:Virtualization]]<br />
[[ja:OVMF による PCI パススルー]]<br />
[[zh-hans:PCI passthrough via OVMF]]<br />
The Open Virtual Machine Firmware ([https://github.com/tianocore/tianocore.github.io/wiki/OVMF OVMF]) is a project to enable UEFI support for virtual machines. Starting with Linux 3.9 and recent versions of [[QEMU]], it is now possible to passthrough a graphics card, offering the VM native graphics performance which is useful for graphic-intensive tasks.<br />
<br />
Provided you have a desktop computer with a spare GPU you can dedicate to the host (be it an integrated GPU or an old OEM card, the brands do not even need to match) and that your hardware supports it (see [[#Prerequisites]]), it is possible to have a VM of any OS with its own dedicated GPU and near-native performance. For more information on techniques see the background [https://www.linux-kvm.org/images/b/b3/01x09b-VFIOandYou-small.pdf presentation (pdf)]. <br />
<br />
== Prerequisites ==<br />
<br />
A VGA Passthrough relies on a number of technologies that are not ubiquitous as of today and might not be available on your hardware. You will not be able to do this on your machine unless the following requirements are met :<br />
<br />
* Your CPU must support hardware virtualization (for kvm) and IOMMU (for the passthrough itself)<br />
** [https://ark.intel.com/Search/FeatureFilter?productType=processors&VTD=true List of compatible Intel CPUs (Intel VT-x and Intel VT-d)]<br />
** All AMD CPUs from the Bulldozer generation and up (including Zen) should be compatible.<br />
*** CPUs from the K10 generation (2007) do not have an IOMMU, so you '''need''' to have a motherboard with a [https://support.amd.com/TechDocs/43403.pdf#page=18 890FX] or [https://support.amd.com/TechDocs/48691.pdf#page=21 990FX] chipset to make it work, as those have their own IOMMU.<br />
* Your motherboard must also support IOMMU<br />
** Both the chipset and the BIOS must support it. It is not always easy to tell at a glance whether or not this is the case, but there is a fairly comprehensive list on the matter on the [https://wiki.xen.org/wiki/VTd_HowTo Xen wiki] as well as [[Wikipedia:List of IOMMU-supporting hardware]].<br />
* Your guest GPU ROM must support UEFI.<br />
** If you can find [https://www.techpowerup.com/vgabios/ any ROM in this list] that applies to your specific GPU and is said to support UEFI, you are generally in the clear. All GPUs from 2012 and later should support this, as Microsoft made UEFI a requirement for devices to be marketed as compatible with Windows 8.<br />
<br />
You will probably want to have a spare monitor or one with multiple input ports connected to different GPUs (the passthrough GPU will not display anything if there is no screen plugged in and using a VNC or Spice connection will not help your performance), as well as a mouse and a keyboard you can pass to your VM. If anything goes wrong, you will at least have a way to control your host machine this way.<br />
<br />
== Setting up IOMMU ==<br />
<br />
{{Note|<br />
* IOMMU is a generic name for Intel VT-d and AMD-Vi.<br />
* VT-d stands for ''Intel Virtualization Technology for Directed I/O'' and should not be confused with VT-x ''Intel Virtualization Technology''. VT-x allows one hardware platform to function as multiple “virtual” platforms while VT-d improves security and reliability of the systems and also improves performance of I/O devices in virtualized environments.<br />
}}<br />
<br />
Using IOMMU opens to features like PCI passthrough and memory protection from faulty or malicious devices, see [[Wikipedia:Input-output memory management unit#Advantages]] and [https://www.quora.com/Memory-Management-computer-programming/Could-you-explain-IOMMU-in-plain-English Memory Management (computer programming): Could you explain IOMMU in plain English?].<br />
<br />
=== Enabling IOMMU ===<br />
<br />
Ensure that AMD-Vi/Intel VT-d is supported by the CPU and enabled in the BIOS settings. Both normally show up alongside other CPU features (meaning they could be in an overclocking-related menu) either with their actual names ("VT-d" or "AMD-Vi") or in more ambiguous terms such as "Virtualization technology", which may or may not be explained in the manual.<br />
<br />
Enable IOMMU support by setting the correct [[kernel parameter]] depending on the type of CPU in use:<br />
<br />
* For Intel CPUs (VT-d) set {{ic|1=intel_iommu=on}} <br />
* For AMD CPUs (AMD-Vi) set {{ic|1=amd_iommu=on}}<br />
<br />
You should also append the {{ic|1=iommu=pt}} parameter. This will prevent Linux from touching devices which cannot be passed through.<br />
<br />
After rebooting, check dmesg to confirm that IOMMU has been correctly enabled:<br />
<br />
{{hc|dmesg {{!}} grep -e DMAR -e IOMMU|<br />
[ 0.000000] ACPI: DMAR 0x00000000BDCB1CB0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)<br />
[ 0.000000] Intel-IOMMU: enabled<br />
[ 0.028879] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a<br />
[ 0.028883] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da<br />
[ 0.028950] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1<br />
[ 0.536212] DMAR: No ATSR found<br />
[ 0.536229] IOMMU 0 0xfed90000: using Queued invalidation<br />
[ 0.536230] IOMMU 1 0xfed91000: using Queued invalidation<br />
[ 0.536231] IOMMU: Setting RMRR:<br />
[ 0.536241] IOMMU: Setting identity map for device 0000:00:02.0 [0xbf000000 - 0xcf1fffff]<br />
[ 0.537490] IOMMU: Setting identity map for device 0000:00:14.0 [0xbdea8000 - 0xbdeb6fff]<br />
[ 0.537512] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbdea8000 - 0xbdeb6fff]<br />
[ 0.537530] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbdea8000 - 0xbdeb6fff]<br />
[ 0.537543] IOMMU: Prepare 0-16MiB unity mapping for LPC<br />
[ 0.537549] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]<br />
[ 2.182790] [drm] DMAR active, disabling use of stolen memory<br />
}}<br />
<br />
=== Ensuring that the groups are valid ===<br />
<br />
The following script should allow you to see how your various PCI devices are mapped to IOMMU groups. If it does not return anything, you either have not enabled IOMMU support properly or your hardware does not support it.<br />
<br />
#!/bin/bash<br />
shopt -s nullglob<br />
for d in /sys/kernel/iommu_groups/*/devices/*; do <br />
n=${d#*/iommu_groups/*}; n=${n%%/*}<br />
printf 'IOMMU Group %s ' "$n"<br />
lspci -nns "${d##*/}"<br />
done;<br />
<br />
Example output:<br />
<br />
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation 2nd Generation Core Processor Family DRAM Controller [8086:0104] (rev 09)<br />
IOMMU Group 1 00:16.0 Communication controller [0780]: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 [8086:1c3a] (rev 04)<br />
IOMMU Group 2 00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)<br />
IOMMU Group 3 00:1a.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 [8086:1c2d] (rev <br />
...<br />
<br />
An IOMMU group is the smallest set of physical devices that can be passed to a virtual machine. For instance, in the example above, both the GPU in 06:00.0 and its audio controller in 6:00.1 belong to IOMMU group 13 and can only be passed together. The frontal USB controller, however, has its own group (group 2) which is separate from both the USB expansion controller (group 10) and the rear USB controller (group 4), meaning that [[#USB controller|any of them could be passed to a VM without affecting the others]].<br />
<br />
=== Gotchas ===<br />
<br />
==== Plugging your guest GPU in an unisolated CPU-based PCIe slot ====<br />
<br />
Not all PCI-E slots are the same. Most motherboards have PCIe slots provided by both the CPU and the PCH. Depending on your CPU, it is possible that your processor-based PCIe slot does not support isolation properly, in which case the PCI slot itself will appear to be grouped with the device that is connected to it.<br />
<br />
IOMMU Group 1 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)<br />
IOMMU Group 1 01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)<br />
IOMMU Group 1 01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)<br />
<br />
This is fine so long as only your guest GPU is included in here, such as above. Depending on what is plugged in to your other PCIe slots and whether they are allocated to your CPU or your PCH, you may find yourself with additional devices within the same group, which would force you to pass those as well. If you are ok with passing everything that is in there to your VM, you are free to continue. Otherwise, you will either need to try and plug your GPU in your other PCIe slots (if you have any) and see if those provide isolation from the rest or to install the ACS override patch, which comes with its own drawbacks. See [[#Bypassing the IOMMU groups (ACS override patch)]] for more information.<br />
<br />
{{Note|If they are grouped with other devices in this manner, pci root ports and bridges should neither be bound to vfio at boot, nor be added to the VM.}}<br />
<br />
== Isolating the GPU ==<br />
<br />
In order to assign a device to a virtual machine, this device and all those sharing the same IOMMU group must have their driver replaced by a stub driver or a VFIO driver in order to prevent the host machine from interacting with them. In the case of most devices, this can be done on the fly right before the VM starts.<br />
<br />
However, due to their size and complexity, GPU drivers do not tend to support dynamic rebinding very well, so you cannot just have some GPU you use on the host be transparently passed to a VM without having both drivers conflict with each other. Because of this, it is generally advised to bind those placeholder drivers manually before starting the VM, in order to stop other drivers from attempting to claim it.<br />
<br />
The following section details how to configure a GPU so those placeholder drivers are bound early during the boot process, which makes said device inactive until a VM claims it or the driver is switched back. This is the preferred method, considering it has less caveats than switching drivers once the system is fully online.<br />
<br />
{{Warning|Once you reboot after this procedure, whatever GPU you have configured will no longer be usable on the host until you reverse the manipulation. Make sure the GPU you intend to use on the host is properly configured before doing this - your motherboard should be set to display using the host GPU.}}<br />
<br />
Starting with Linux 4.1, the kernel includes vfio-pci. This is a VFIO driver, meaning it fulfills the same role as pci-stub did, but it can also control devices to an extent, such as by switching them into their D3 state when they are not in use.<br />
<br />
{{hc|$ modinfo vfio-pci|<br />
filename: /lib/modules/4.4.5-1-ARCH/kernel/drivers/vfio/pci/vfio-pci.ko.gz<br />
description: VFIO PCI - User Level meta-driver<br />
author: Alex Williamson <alex.williamson@redhat.com><br />
...<br />
}}<br />
<br />
Vfio-pci normally targets PCI devices by ID, meaning you only need to specify the IDs of the devices you intend to passthrough. For the following IOMMU group, you would want to bind vfio-pci with {{ic|10de:13c2}} and {{ic|10de:0fbb}}, which will be used as example values for the rest of this section.<br />
<br />
IOMMU Group 13 06:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)<br />
IOMMU Group 13 06:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)}}<br />
<br />
{{Note|You cannot specify which device to isolate using vendor-device ID pairs if the host GPU and the guest GPU share the same pair (i.e : if both are the same model). If this is your case, read [[#Using identical guest and host GPUs]] instead.}}<br />
<br />
You can then add those vendor-device ID pairs to the default parameters passed to vfio-pci whenever it is inserted into the kernel.<br />
<br />
{{hc|/etc/modprobe.d/vfio.conf|2=<br />
options vfio-pci ids=10de:13c2,10de:0fbb<br />
}}<br />
<br />
{{Note|If, as noted in [[#Plugging your guest GPU in an unisolated CPU-based PCIe slot]], your pci root port is part of your IOMMU group, you '''must not''' pass its ID to {{ic|vfio-pci}}, as it needs to remain attached to the host to function properly. Any other device within that group, however, should be left for {{ic|vfio-pci}} to bind with.}}<br />
<br />
This, however, does not guarantee that vfio-pci will be loaded before other graphics drivers. To ensure that, we need to statically bind it in the kernel image alongside with its dependencies. That means adding, in this order, {{ic|vfio_pci}}, {{ic|vfio}}, {{ic|vfio_iommu_type1}}, and {{ic|vfio_virqfd}} to [[mkinitcpio]]:<br />
<br />
{{hc|/etc/mkinitcpio.conf|2=<br />
MODULES=(... vfio_pci vfio vfio_iommu_type1 vfio_virqfd ...)<br />
}}<br />
<br />
{{Note|If you also have another driver loaded this way for [[Kernel mode setting#Early KMS start|early modesetting]] (such as {{ic|nouveau}}, {{ic|radeon}}, {{ic|amdgpu}}, {{ic|i915}}, etc.), all of the aforementioned VFIO modules must precede it.}}<br />
<br />
Also, ensure that the modconf hook is included in the HOOKS list of {{ic|mkinitcpio.conf}}:<br />
<br />
{{hc|/etc/mkinitcpio.conf|2=<br />
HOOKS=(... modconf ...)<br />
}}<br />
<br />
Since new modules have been added to the initramfs configuration, you must [[regenerate the initramfs]]. Should you change the IDs of the devices in {{ic|/etc/modprobe.d/vfio.conf}}, you will also have to regenerate it, as those parameters must be specified in the initramfs to be known during the early boot stages.<br />
<br />
Reboot and verify that vfio-pci has loaded properly and that it is now bound to the right devices.<br />
<br />
{{hc|$ dmesg {{!}} grep -i vfio|<br />
[ 0.329224] VFIO - User Level meta-driver version: 0.3<br />
[ 0.341372] vfio_pci: add [10de:13c2[ffff:ffff]] class 0x000000/00000000<br />
[ 0.354704] vfio_pci: add [10de:0fbb[ffff:ffff]] class 0x000000/00000000<br />
[ 2.061326] vfio-pci 0000:06:00.0: enabling device (0100 -> 0103)<br />
}}<br />
<br />
It is not necessary for all devices (or even expected device) from {{ic|vfio.conf}} to be in dmesg output. Sometimes a device does not appear in output at boot but actually is able to be visible and operatable in guest VM.<br />
<br />
{{hc|$ lspci -nnk -d 10de:13c2|<br />
06:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)<br />
Kernel driver in use: vfio-pci<br />
Kernel modules: nouveau nvidia<br />
}}<br />
<br />
{{hc|$ lspci -nnk -d 10de:0fbb|<br />
06:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)<br />
Kernel driver in use: vfio-pci<br />
Kernel modules: snd_hda_intel<br />
}}<br />
<br />
== Setting up an OVMF-based guest VM ==<br />
<br />
OVMF is an open-source UEFI firmware for QEMU virtual machines. While it is possible to use SeaBIOS to get similar results to an actual PCI passthough, the setup process is different and it is generally preferable to use the EFI method if your hardware supports it.<br />
<br />
=== Configuring libvirt ===<br />
<br />
[[Libvirt]] is a wrapper for a number of virtualization utilities that greatly simplifies the configuration and deployment process of virtual machines. In the case of KVM and QEMU, the frontend it provides allows us to avoid dealing with the permissions for QEMU and make it easier to add and remove various devices on a live VM. Its status as a wrapper, however, means that it might not always support all of the latest qemu features, which could end up requiring the use of a wrapper script to provide some extra arguments to QEMU.<br />
<br />
After installing {{Pkg|qemu}}, {{Pkg|qemu-block-iscsi}}, {{Pkg|libvirt}}, {{Pkg|ovmf}}, and {{Pkg|virt-manager}}, add the path to your OVMF firmware image and runtime variables template to your libvirt config so {{ic|virt-install}} or {{ic|virt-manager}} can find those later on.<br />
<br />
{{hc|/etc/libvirt/qemu.conf|2=<br />
nvram = [<br />
"/usr/share/ovmf/x64/OVMF_CODE.fd:/usr/share/ovmf/x64/OVMF_VARS.fd"<br />
]<br />
}}<br />
<br />
You can now [[enable]] and [[start]] {{ic|libvirtd.service}} and its logging component {{ic|virtlogd.socket}}.<br />
<br />
=== Setting up the guest OS ===<br />
<br />
The process of setting up a VM using {{ic|virt-manager}} is mostly self-explanatory, as most of the process comes with fairly comprehensive on-screen instructions.<br />
<br />
If using {{ic|virt-manager}}, you have to add your user to the libvirt [[group]] to ensure authentication.<br />
<br />
However, you should pay special attention to the following steps :<br />
<br />
* When the VM creation wizard asks you to name your VM (final step before clicking "Finish"), check the "Customize before install" checkbox.<br />
* In the "Overview" section, [https://i.imgur.com/73r2ctM.png set your firmware to "UEFI"]. If the option is grayed out, make sure that you have correctly specified the location of your firmware in {{ic|/etc/libvirt/qemu.conf}} and restart {{ic|libvirtd.service}}.<br />
* In the "CPUs" section, change your CPU model to "host-passthrough". If it is not in the list, you will have to type it by hand. This will ensure that your CPU is detected properly, since it causes libvirt to expose your CPU capabilities exactly as they are instead of only those it recognizes (which is the preferred default behavior to make CPU behavior easier to reproduce). Without it, some applications may complain about your CPU being of an unknown model.<br />
* If you want to minimize IO overhead, go into "Add Hardware" and add a Controller for SCSI drives of the "VirtIO SCSI" model. You can then change the default IDE disk for a SCSI disk, which will bind to said controller.<br />
** Windows VMs will not recognize those drives by default, so you need to download the ISO containing the drivers from [https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/latest-virtio/ here] and add an IDE (or SATA for Windows 8.1 and newer) CD-ROM storage device linking to said ISO, otherwise you will not be able to get Windows to recognize it during the installation process. When prompted to select a disk to install windows on, load the drivers contained on the CD-ROM under ''vioscsi''.<br />
<br />
The rest of the installation process will take place as normal using a standard QXL video adapter running in a window. At this point, there is no need to install additional drivers for the rest of the virtual devices, since most of them will be removed later on. Once the guest OS is done installing, simply turn off the virtual machine. It is possible you will be dropped into the UEFI menu instead of starting the installation upon powering your VM for the first time. Sometimes the correct ISO file was not automatically detected and you will need to manually specify the drive to boot. By typing exit and navigating to "boot manager" you will enter a menu that allows you to choose between devices.<br />
<br />
=== Attaching the PCI devices ===<br />
<br />
With the installation done, it is now possible to edit the hardware details in libvirt and remove virtual integration devices, such as the spice channel and virtual display, the QXL video adapter, the emulated mouse and keyboard and the USB tablet device. Since that leaves you with no input devices, you may want to bind a few USB host devices to your VM as well, but remember to '''leave at least one mouse and/or keyboard assigned to your host''' in case something goes wrong with the guest. At this point, it also becomes possible to attach the PCI device that was isolated earlier; simply click on "Add Hardware" and select the PCI Host Devices you want to passthrough. If everything went well, the screen plugged into your GPU should show the OVMF splash screen and your VM should start up normally. From there, you can setup the drivers for the rest of your VM.<br />
<br />
=== Gotchas ===<br />
<br />
==== Using a non-EFI image on an OVMF-based VM ====<br />
<br />
The OVMF firmware does not support booting off non-EFI mediums. If the installation process drops you in a UEFI shell right after booting, you may have an invalid EFI boot media. Try using an alternate linux/windows image to determine if you have an invalid media.<br />
<br />
== Performance tuning ==<br />
<br />
Most use cases for PCI passthroughs relate to performance-intensive domains such as video games and GPU-accelerated tasks. While a PCI passthrough on its own is a step towards reaching native performance, there are still a few ajustments on the host and guest to get the most out of your VM.<br />
<br />
=== CPU pinning ===<br />
<br />
The default behavior for KVM guests is to run operations coming from the guest as a number of threads representing virtual processors. Those threads are managed by the Linux scheduler like any other thread and are dispatched to any available CPU cores based on niceness and priority queues. As such, the local CPU cache benefits (L1/L2) are lost each time the host scheduler reschedules the virtual CPU thread on a different physical CPU. This can noticeably harm performance on the guest. CPU pinning aims to resolve this by limiting which physical CPUs the virtual CPUs are allowed to run on. The ideal setup is a one to one mapping such that the virtual CPU cores match physical CPU cores while taking hyperthreading/SMT into account.<br />
<br />
{{Note|For certain users enabling CPU pinning may introduce stuttering and short hangs, especially with the MuQSS scheduler (present in linux-ck and linux-zen kernels). You might want to try disabling pinning first if you experience similar issues, which effectively trades maximum performance for responsiveness at all times.}}<br />
<br />
==== CPU topology ====<br />
<br />
Most modern CPU's support hardware multitasking, also known as hyper-threading on Intel CPU's or SMT on AMD CPU's. Hyper-threading/SMT is simply a very efficient way of running two threads on one CPU core at any given time. You will want to take into consideration that the CPU pinning you choose will greatly depend on what you do with your host while your VM is running.<br />
<br />
To find the topology for your CPU run {{ic|1=lscpu -e}}:<br />
<br />
{{Note|Pay special attention to the 4th column '''"CORE"''' as this shows the association of the Physical/Logical CPU cores}}<br />
<br />
{{ic|lscpu -e}} on a 6c/12t Ryzen 5 1600:<br />
<br />
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ<br />
0 0 0 0 0:0:0:0 yes 3800.0000 1550.0000<br />
1 0 0 0 0:0:0:0 yes 3800.0000 1550.0000<br />
2 0 0 1 1:1:1:0 yes 3800.0000 1550.0000<br />
3 0 0 1 1:1:1:0 yes 3800.0000 1550.0000<br />
4 0 0 2 2:2:2:0 yes 3800.0000 1550.0000<br />
5 0 0 2 2:2:2:0 yes 3800.0000 1550.0000<br />
6 0 0 3 3:3:3:1 yes 3800.0000 1550.0000<br />
7 0 0 3 3:3:3:1 yes 3800.0000 1550.0000<br />
8 0 0 4 4:4:4:1 yes 3800.0000 1550.0000<br />
9 0 0 4 4:4:4:1 yes 3800.0000 1550.0000<br />
10 0 0 5 5:5:5:1 yes 3800.0000 1550.0000<br />
11 0 0 5 5:5:5:1 yes 3800.0000 1550.0000<br />
<br />
{{ic|lscpu -e}} on a 6c/12t Intel 8700k:<br />
<br />
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ<br />
0 0 0 0 0:0:0:0 yes 4600.0000 800.0000<br />
1 0 0 1 1:1:1:0 yes 4600.0000 800.0000<br />
2 0 0 2 2:2:2:0 yes 4600.0000 800.0000<br />
3 0 0 3 3:3:3:0 yes 4600.0000 800.0000<br />
4 0 0 4 4:4:4:0 yes 4600.0000 800.0000<br />
5 0 0 5 5:5:5:0 yes 4600.0000 800.0000<br />
6 0 0 0 0:0:0:0 yes 4600.0000 800.0000<br />
7 0 0 1 1:1:1:0 yes 4600.0000 800.0000<br />
8 0 0 2 2:2:2:0 yes 4600.0000 800.0000<br />
9 0 0 3 3:3:3:0 yes 4600.0000 800.0000<br />
10 0 0 4 4:4:4:0 yes 4600.0000 800.0000<br />
11 0 0 5 5:5:5:0 yes 4600.0000 800.0000<br />
<br />
As we see above, with AMD '''Core 0''' is sequential with '''CPU 0 & 1''', whereas Intel places '''Core 0''' on '''CPU 0 & 6'''.<br />
<br />
If you don't need all cores for the guest, it would then be preferable to leave at the very least one core for the host. Choosing which cores one to use for the host or guest should be based on the specific hardware characteristics of your CPU, however '''Core 0''' is a good choice for the host in most cases. If any cores are reserved for the host, it is recommended to pin the emulator and iothreads, if used, to the host cores rather than the VCPUs. This may improve performance and reduce latency for the guest since those threads will not pollute the cache or contend for scheduling with the guest VCPU threads. If all cores are passed to the guest, there is no need or benefit to pinning the emulator or iothreads.<br />
<br />
<br />
==== XML examples ====<br />
<br />
{{Note|Do not use the '''iothread''' lines from the XML examples shown below if you have not added an '''iothread''' to your disk controller. '''iothread''''s only work on '''virtio-scsi''' or '''virtio-blk''' devices.}}<br />
<br />
===== 4c/1t CPU w/o Hyperthreading Example =====<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<vcpu placement='static'>4</vcpu><br />
<cputune><br />
<vcpupin vcpu='0' cpuset='0'/><br />
<vcpupin vcpu='1' cpuset='1'/><br />
<vcpupin vcpu='2' cpuset='2'/><br />
<vcpupin vcpu='3' cpuset='3'/><br />
</cputune><br />
...<br />
</nowiki>}}<br />
<br />
===== 4c/2t Intel CPU pinning example =====<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<vcpu placement='static'>8</vcpu><br />
<iothreads>1</iothreads><br />
<cputune><br />
<vcpupin vcpu='0' cpuset='2'/><br />
<vcpupin vcpu='1' cpuset='8'/><br />
<vcpupin vcpu='2' cpuset='3'/><br />
<vcpupin vcpu='3' cpuset='9'/><br />
<vcpupin vcpu='4' cpuset='4'/><br />
<vcpupin vcpu='5' cpuset='10'/><br />
<vcpupin vcpu='6' cpuset='5'/><br />
<vcpupin vcpu='7' cpuset='11'/><br />
<emulatorpin cpuset='0,6'/><br />
<iothreadpin iothread='1' cpuset='0,6'/><br />
</cputune><br />
...<br />
<topology sockets='1' cores='4' threads='2'/><br />
...<br />
</nowiki>}}<br />
<br />
===== 4c/2t AMD CPU example =====<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<vcpu placement='static'>8</vcpu><br />
<iothreads>1</iothreads><br />
<cputune><br />
<vcpupin vcpu='0' cpuset='2'/><br />
<vcpupin vcpu='1' cpuset='3'/><br />
<vcpupin vcpu='2' cpuset='4'/><br />
<vcpupin vcpu='3' cpuset='5'/><br />
<vcpupin vcpu='4' cpuset='6'/><br />
<vcpupin vcpu='5' cpuset='7'/><br />
<vcpupin vcpu='6' cpuset='8'/><br />
<vcpupin vcpu='7' cpuset='9'/><br />
<emulatorpin cpuset='0-1'/><br />
<iothreadpin iothread='1' cpuset='0-1'/><br />
</cputune><br />
...<br />
<topology sockets='1' cores='4' threads='2'/><br />
...<br />
</nowiki>}}<br />
<br />
{{Note|If further CPU isolation is needed, consider using the '''isolcpus''' kernel command-line parameter on the unused physical/logical cores.}}<br />
<br />
If you do not intend to be doing any computation-heavy work on the host (or even anything at all) at the same time as you would on the VM, you may want to pin your VM threads across all of your cores, so that the VM can fully take advantage of the spare CPU time the host has available. Be aware that pinning all physical and logical cores of your CPU could induce latency in the guest VM.<br />
<br />
=== Huge memory pages ===<br />
<br />
When dealing with applications that require large amounts of memory, memory latency can become a problem since the more memory pages are being used, the more likely it is that this application will attempt to access information across multiple memory "pages", which is the base unit for memory allocation. Resolving the actual address of the memory page takes multiple steps, and so CPUs normally cache information on recently used memory pages to make subsequent uses on the same pages faster. Applications using large amounts of memory run into a problem where, for instance, a virtual machine uses 4 GiB of memory divided into 4 KiB pages (which is the default size for normal pages) for a total of 1.04 million pages, meaning that such cache misses can become extremely frequent and greatly increase memory latency. Huge pages exist to mitigate this issue by giving larger individual pages to those applications, increasing the odds that multiple operations will target the same page in succession.<br />
<br />
==== Transparent huge pages ====<br />
<br />
QEMU will use 2MiB sized transparent huge pages automatically without any explicit configuration in QEMU or Libvirt, subject to some important caveats. When using VFIO the pages are locked in at boot time and transparent huge pages are allocated up front when the VM first boots. If the kernel memory is highly fragmented, or the VM is using a majority of the remaining free memory, it is likely that the kernel will not have enough 2MiB pages to fully satisfy the allocation. In such a case, it silently fails by using a mix of 2MiB and 4KiB pages. Since the pages are locked in VFIO mode, the kernel will not be able to convert those 4KiB pages to huge after the VM starts either. The number of available 2MiB huge pages available to THP is the same as via the [[#Dynamic huge pages]] mechanism described in the following sections.<br />
<br />
To check how much memory THP is using globally:<br />
<br />
$ grep AnonHugePages /proc/meminfo<br />
AnonHugePages: 8091648 kB<br />
<br />
To check a specific QEMU instance. QEMU's PID must be substituted in the grep command:<br />
<br />
$ grep -P 'AnonHugePages:\s+(?!0)\d+' /proc/[PID]/smaps<br />
AnonHugePages: 8087552 kB<br />
<br />
In this example, the VM was allocated 8388608KiB of memory, but only 8087552KiB was available via THP. The remaining 301056KiB are allocated as 4KiB pages. Aside from manually checking, there is no indication when partial allocations occur. As such, THP's effectiveness is very much dependent on the host system's memory fragmentation at the time of VM startup. If this trade off is unacceptable or strict guarantees are required, [[#Static huge pages]] is recommended.<br />
<br />
Arch kernels have THP compiled in and enabled by default with {{ic|1=/sys/kernel/mm/transparent_hugepage/enabled}} set to {{ic|1=madvise}} mode.<br />
<br />
==== Static huge pages ====<br />
<br />
While transparent huge pages should work in the vast majority of cases, they can also be allocated statically during boot. This should only be needed to make use 1 GiB hugepages on machines that support it, since transparent huge pages normally only go up to 2 MiB.<br />
<br />
{{Warning|Static huge pages lock down the allocated amount of memory, making it unavailable for applications that are not configured to use them. Allocating 4 GiBs worth of huge pages on a machine with 8 GiB of memory will only leave you with 4 GiB of available memory on the host '''even when the VM is not running'''.}}<br />
<br />
To allocate huge pages at boot, one must simply specify the desired amount on their kernel command line with {{ic|1=hugepages=''x''}}. For instance, reserving 1024 pages with {{ic|1=hugepages=1024}} and the default size of 2048 KiB per huge page creates 2 GiB worth of memory for the virtual machine to use.<br />
<br />
If supported by CPU page size could be set manually. 1 GiB huge page support could be verified by {{ic|grep pdpe1gb /proc/cpuinfo}}. Setting 1 GiB huge page size via kernel parameters : {{ic|1=default_hugepagesz=1G hugepagesz=1G hugepages=X}}.<br />
<br />
Also, since static huge pages can only be used by applications that specifically request it, you must add this section in your libvirt domain configuration to allow kvm to benefit from them :<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<memoryBacking><br />
<hugepages/><br />
</memoryBacking><br />
...<br />
</nowiki>}}<br />
<br />
==== Dynamic huge pages ====<br />
<br />
{{Accuracy|Need futher testing if this variant as effective as static one}}<br />
<br />
Hugepages could be allocated manually via {{ic|vm.nr_overcommit_hugepages}} [[sysctl]] parameter.<br />
<br />
{{hc|/etc/sysctl.d/10-kvm.conf|2=<br />
vm.nr_hugepages = 0<br />
vm.nr_overcommit_hugepages = ''num''<br />
}}<br />
<br />
Where {{ic|''num''}} - is the number of huge pages, which default size if 2 MiB.<br />
Pages will be automatically allocated, and freed after VM stops.<br />
<br />
More manual way:<br />
<br />
# echo ''num'' > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages<br />
# echo ''num'' > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages<br />
<br />
For 2 MiB and 1 GiB page size respectively.<br />
And they should be manually freed in the same way.<br />
<br />
It is hardly recommended to drop caches, compact memory and wait couple of seconds before starting VM, as there could be not enough free contiguous memory for required huge pages blocks. Especially after some uptime of the host system.<br />
<br />
# echo 3 > /proc/sys/vm/drop_caches<br />
# echo 1 > /proc/sys/vm/compact_memory<br />
<br />
Theoretically, 1 GiB pages works as 2 MiB. But practically - no guaranteed way was found to get contiguous 1 GiB memory blocks. Each consequent request of 1 GiB blocks lead to lesser and lesser dynamically allocated count.<br />
<br />
=== CPU frequency governor ===<br />
<br />
Depending on the way your [[CPU frequency scaling|CPU governor]] is configured, the VM threads may not hit the CPU load thresholds for the frequency to ramp up. Indeed, KVM cannot actually change the CPU frequency on its own, which can be a problem if it does not scale up with vCPU usage as it would result in underwhelming performance. An easy way to see if it behaves correctly is to check if the frequency reported by {{ic|watch lscpu}} goes up when running a CPU-intensive task on the guest. If you are indeed experiencing stutter and the frequency does not go up to reach its reported maximum, it may be due to [https://lime-technology.com/forum/index.php?topic=46664.msg447678#msg447678 cpu scaling being controlled by the host OS]. In this case, try setting all cores to maximum frequency to see if this improves performance. Note that if you are using a modern intel chip with the default pstate driver, cpupower commands will be [[CPU frequency scaling#CPU frequency driver|ineffective]], so monitor {{ic|/proc/cpuinfo}} to make sure your cpu is actually at max frequency.<br />
<br />
=== CPU pinning with isolcpus ===<br />
<br />
Alternatively, make sure that you have isolated CPUs properly. In this example, let us assume you are using CPUs 4-7.<br />
Use the kernel parameters {{ic|isolcpus nohz_full rcu_nocbs}} to completely isolate the CPUs from the kernel. <br />
<br />
{{hc|/etc/defaults/grub|2=<br />
...<br />
GRUB_CMDLINE_LINUX="..your other params.. isolcpus=4-7 nohz_full=4-7 rcu_nocbs=4-7"<br />
...<br />
</nowiki>}}<br />
<br />
Then, run {{ic|qemu-system-x86_64}} with taskset and chrt:<br />
<br />
# chrt -r 1 taskset -c 4-7 qemu-system-x86_64 ...<br />
<br />
The {{ic|chrt}} command will ensure that the task scheduler will round-robin distribute work (otherwise it will all stay on the first cpu). For {{ic|taskset}}, the CPU numbers can be comma- and/or dash-separated, like "0,1,2,3" or "0-4" or "1,7-8,10" etc.<br />
<br />
See [https://www.reddit.com/r/VFIO/comments/6vgtpx/high_dpc_latency_and_audio_stuttering_on_windows/dm0sfto/ this reddit comment] for more info.<br />
<br />
=== Improving performance on AMD CPUs ===<br />
<br />
Previously, Nested Page Tables (NPT) had to be disabled on AMD systems running KVM to improve GPU performance because of a [https://sourceforge.net/p/kvm/bugs/230/ very old bug], but the trade off was decreased CPU performance, including stuttering.<br />
<br />
There is a [https://patchwork.kernel.org/patch/10027525/ kernel patch] that resolves this issue, which was accepted into kernel 4.14-stable and 4.9-stable. If you are running the official {{Pkg|linux}} or {{Pkg|linux-lts}} kernel the patch has already been applied (make sure you are on the latest). If you are running another kernel you might need to manually patch yourself.<br />
<br />
{{Note|Several Ryzen users (see [https://www.reddit.com/r/VFIO/comments/78i3jx/possible_fix_for_the_npt_issue_discussed_on_iommu/ this Reddit thread]) have tested the patch, and can confirm that it works, bringing GPU passthrough performance up to near native quality.}}<br />
<br />
Starting with QEMU 3.1 the TOPOEXT cpuid flag is disabled by default. In order to use hyperthreading(SMT) on AMD CPU's you need to manually enable it:<br />
<br />
<cpu mode='host-passthrough' check='none'><br />
<topology sockets='1' cores='4' threads='2'/><br />
<feature policy='require' name='topoext'/><br />
</cpu><br />
<br />
commit: https://git.qemu.org/?p=qemu.git;a=commit;h=7210a02c58572b2686a3a8d610c6628f87864aed<br />
<br />
=== Further tuning ===<br />
<br />
More specialized VM tuning tips are available at [https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html Red Hat's Virtualization Tuning and Optimization Guide]. This guide may be especially helpful if you are experiencing:<br />
<br />
* Moderate CPU load on the host during downloads/uploads from within the guest: See ''Bridge Zero Copy Transmit'' for a potential fix.<br />
* Guests capping out at certain network speeds during downloads/uploads despite virtio-net being used: See ''Multi-queue virtio-net'' for a potential fix.<br />
* Guests "stuttering" under high I/O, despite the same workload not affecting hosts to the same degree: See ''Multi-queue virtio-scsi'' for a potential fix.<br />
<br />
== Special procedures ==<br />
<br />
Certain setups require specific configuration tweaks in order to work properly. If you are having problems getting your host or your VM to work properly, see if your system matches one of the cases below and try adjusting your configuration accordingly.<br />
<br />
=== Using identical guest and host GPUs ===<br />
<br />
{{Expansion|A number of users have been having issues with this, it should probably be adressed by the article.|Talk:PCI passthrough via OVMF#Additionnal sections}}<br />
<br />
Due to how vfio-pci uses your vendor and device id pair to identify which device they need to bind to at boot, if you have two GPUs sharing such an ID pair you will not be able to get your passthough driver to bind with just one of them. This sort of setup makes it necessary to use a script, so that whichever driver you are using is instead assigned by pci bus address using the {{ic|driver_override}} mechanism.<br />
<br />
==== Script variants ====<br />
<br />
===== Passthrough all GPUs but the boot GPU =====<br />
<br />
Here, we will make a script to bind vfio-pci to all GPUs but the boot gpu. Create the script {{ic|/usr/bin/vfio-pci-override.sh}}:<br />
<br />
{{bc|<nowiki><br />
#!/bin/sh<br />
<br />
for i in /sys/devices/pci*/*/boot_vga; do<br />
if [ $(cat "$i") -eq 0 ]; then<br />
GPU="${i%/boot_vga}"<br />
AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"<br />
echo "vfio-pci" > "$GPU/driver_override"<br />
if [ -d "$AUDIO" ]; then<br />
echo "vfio-pci" > "$AUDIO/driver_override"<br />
fi<br />
fi<br />
done<br />
<br />
modprobe -i vfio-pci<br />
</nowiki>}}<br />
<br />
===== Passthrough selected GPU =====<br />
<br />
In this case we manually specify the GPU to bind.<br />
<br />
{{bc|<nowiki><br />
#!/bin/sh<br />
<br />
GROUP="0000:00:03.0"<br />
DEVS="0000:03:00.0 0000:03:00.1 ."<br />
<br />
if [ ! -z "$(ls -A /sys/class/iommu)" ]; then<br />
for DEV in $DEVS; do<br />
echo "vfio-pci" > /sys/bus/pci/devices/$GROUP/$DEV/driver_override<br />
done<br />
fi<br />
<br />
modprobe -i vfio-pci<br />
</nowiki>}}<br />
<br />
==== Script installation ====<br />
<br />
Create {{ic|/etc/modprobe.d/vfio.conf}} with the following:<br />
<br />
install vfio-pci /usr/bin/vfio-pci-override.sh<br />
<br />
Edit {{ic|/etc/mkinitcpio.conf}}<br />
<br />
Remove any video drivers from MODULES, and add {{ic|vfio-pci}}, and {{ic|vfio_iommu_type1}}<br />
<br />
MODULES=(ext4 vfat vfio-pci vfio_iommu_type1)<br />
<br />
Add {{ic|/etc/modprobe.d/vfio.conf}} and {{ic|/usr/bin/vfio-pci-override.sh}} to FILES:<br />
<br />
FILES=(/etc/modprobe.d/vfio.conf /usr/bin/vfio-pci-override.sh)<br />
<br />
[[Regenerate the initramfs]] and reboot:<br />
<br />
=== Passing the boot GPU to the guest ===<br />
<br />
{{Expansion|This is related to VBIOS issues and should be moved into a separate section regarding VBIOS compatibility.|section=UEFI (OVMF) Compatibility in VBIOS}}<br />
<br />
The GPU marked as {{ic|boot_vga}} is a special case when it comes to doing PCI passthroughs, since the BIOS needs to use it in order to display things like boot messages or the BIOS configuration menu. To do that, it makes [https://www.redhat.com/archives/vfio-users/2016-May/msg00224.html a copy of the VGA boot ROM which can then be freely modified]. This modified copy is the version the system gets to see, which the passthrough driver may reject as invalid. As such, it is generally recommended to change the boot GPU in the BIOS configuration so the host GPU is used instead or, if that is not possible, to swap the host and guest cards in the machine itself.<br />
<br />
=== Using Looking Glass to stream guest screen to the host ===<br />
<br />
It is possible to make VM share the monitor, and optionally a keyboard and a mouse with a help of [https://looking-glass.hostfission.com/ Looking Glass].<br />
<br />
==== Adding IVSHMEM Device to VM ====<br />
<br />
Looking glass works by creating a shared memory buffer between a host and a guest. This is a lot faster than streaming frames via localhost, but requires additional setup. <br />
<br />
With your VM turned off open the machine configuration<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<devices><br />
...<br />
<shmem name='looking-glass'><br />
<model type='ivshmem-plain'/><br />
<size unit='M'>32</size><br />
</shmem><br />
</devices><br />
...<br />
</nowiki>}}<br />
<br />
You should replace 32 with your own calculated value based on what resolution you are going to pass through. It can be calculated like this:<br />
<br />
width x height x 4 x 2 = total bytes<br />
total bytes / 1024 / 1024 = total mebibytes + 2<br />
<br />
For example, in case of 1920x1080<br />
<br />
1920 x 1080 x 4 x 2 = 16,588,800 bytes<br />
16,588,800 / 1024 / 1024 = 15.82 MiB + 2 = 17.82<br />
<br />
The result must be '''rounded up''' to the nearest power of two, and since 17.82 is bigger than 16 we should choose 32.<br />
<br />
Next create a script to create a shared memory file.<br />
<br />
{{hc|/usr/local/bin/looking-glass-init.sh|2=<br />
#!/bin/sh<br />
<br />
touch /dev/shm/looking-glass<br />
chown '''user''':kvm /dev/shm/looking-glass<br />
chmod 660 /dev/shm/looking-glass<br />
}}<br />
<br />
Replace user with your username.<br />
<br />
Remember to make the script [[executable]].<br />
<br />
Create a [[systemd]] unit to execute this script during boot<br />
<br />
{{hc|/etc/systemd/system/looking-glass-init.service|2=<br />
[Unit]<br />
Description=Create shared memory for looking glass<br />
<br />
[Service]<br />
Type=oneshot<br />
ExecStart=/usr/local/bin/looking-glass-init.sh<br />
<br />
[Install]<br />
WantedBy=multi-user.target<br />
}}<br />
<br />
[[Start]] and [[enable]] {{ic|looking-glass-init.service}}<br />
<br />
==== Installing the IVSHMEM Host to Windows guest ====<br />
<br />
Currently Windows would not notify users about a new IVSHMEM device, it would silently install a dummy driver. To actually enable the device you have to go into device manager and update the driver for the device under the "System Devices" node for '''"PCI standard RAM Controller"'''. Download a signed driver [https://github.com/virtio-win/kvm-guest-drivers-windows/issues/217 from issue 217], as it is not yet available elsewhere.<br />
<br />
Once the driver is installed you must download the latest [https://looking-glass.hostfission.com/downloads looking-glass-host] from the looking glass website and start it on your guest. In order to run it you would also need to install Microsoft Visual C++ Redistributable from [https://www.visualstudio.com/downloads/ Microsoft]<br />
It is also possible to make it start automatically on VM boot by editing the {{ic|HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run}} registry and adding a path to the downloaded executable.<br />
<br />
==== Getting a client ====<br />
<br />
Looking glass client can be installed from AUR using {{AUR|looking-glass}} or {{AUR|looking-glass-git}} packages.<br />
<br />
You can start it once the VM is set up and running<br />
<br />
$ looking-glass-client<br />
<br />
If you do not want to use Spice to control the guest mouse and keyboard you can disable the Spice server.<br />
<br />
$ looking-glass-client -s<br />
<br />
Additionally you may want to start Looking Glass Client as full screen, otherwise the image may be scaled down resulting in poor image fidelity.<br />
<br />
$ looking-glass-client -F<br />
<br />
Launch with the {{ic|--help}} option for further information.<br />
<br />
=== Swap peripherals to and from the Host ===<br />
<br />
Looking Glass includes a Spice client in order to control mouse movement on the Windows guest. However this may have too much latency for certain applications, such as gaming. An alternative method is passing through specific USB devices for minimal latency. This allows for switching the devices between host and guest.<br />
<br />
First create a .xml file for the device(s) you wish to pass-through, which libvirt will use to identify the device.<br />
<br />
{{hc|/home/$USER/.VFIOinput/input_1.xml|2=<br />
<hostdev mode='subsystem' type='usb' managed='no'><br />
<source><br />
<vendor id='0x[Before Colon]'/><br />
<product id='0x[After Colon]'/><br />
</source><br />
</hostdev><br />
}}<br />
<br />
Replace [Before/After Colon] with the contents of the 'lsusb' command, specific to the device you want to pass-through.<br />
<br />
For instance my mouse is {{ic|Bus 005 Device 002: ID 1532:0037 Razer USA, Ltd}} so I would replace {{ic|vendor id}} with 1532, and {{ic|product id}} with 1037.<br />
<br />
Repeat this process for any additional USB devices you want to pass-through. If your mouse / keyboard has multiple entries in {{ic|lsusb}}, perhaps if it is wireless, then create additional xml files for each.<br />
<br />
{{Note|Do not forget to change the path & name of the script(s) above and below to match your user and specific system.}}<br />
<br />
Next a bash script file is needed to tell libvirt what to attach/detach the USB devices to the guest.<br />
<br />
{{hc|/home/$USER/.VFIOinput/input_attach.sh|2=<br />
#!/bin/bash<br />
<br />
virsh attach-device [VM-Name] [USBdevice]<br />
}}<br />
<br />
Replace [VM-Name] with the name of your virtual machine, which can be seen under virt-manager. Additionally replace [USBdevice] with the '''full''' path to the .xml file for the device you wish to pass-through. Add additional lines for more than 1 device. For example here is my script:<br />
<br />
{{hc|/home/$USER/.VFIOinput/input_attach.sh|2=<br />
#!/bin/bash<br />
<br />
virsh attach-device win10 /home/$USER/.VFIOinput/input_mouse.xml<br />
virsh attach-device win10 /home/$USER/.VFIOinput/input_keyboard.xml<br />
}}<br />
<br />
Next duplicate the script file and replace {{ic|attach-device}} with {{ic|detach-device}}. Ensure both scripts are executable with {{ic|chmod +x $script.sh}}<br />
<br />
This 2 script files can now be executed to attach or detach your USB devices from the host to the guest VM. It is important to note that they may need to be executed as root. To run the script from the Windows VM, one possibility is using [[PuTTY]] to [[SSH]] into the host, and execute the script. On Windows PuTTY comes with plink.exe which can execute singular commands over SSH before then logging out, instead of opening a SSH terminal, all in the background.<br />
<br />
{{hc|detach_devices.bat|2=<br />
"C:\Program Files\PuTTY\plink.exe" root@$HOST_IP -pw $ROOTPASSWORD /home/$USER/.VFIOinput/input_detach.sh<br />
}}<br />
<br />
Replace {{ic|$HOST_IP}} with the Host [[Network configuration#IP addresses|IP Address]] and $ROOTPASSWORD with the root password.<br />
<br />
{{warning|This method is insecure if somebody has access to your VM, since they could open the file and read your password. It is advisable to use [[SSH keys]] instead!}}<br />
<br />
You may also want to execute the script files using key binds. On Windows one option is [https://autohotkey.com/ Autohotkey], and on the Host [[Xbindkeys]]. Because of the need to run the scripts as root, you may also need to use [[Polkit]] which can be used to authenticate specific executables as able to run as root without needing a password.<br />
<br />
=== Bypassing the IOMMU groups (ACS override patch) ===<br />
<br />
If you find your PCI devices grouped among others that you do not wish to pass through, you may be able to seperate them using Alex Williamson's ACS override patch. Make sure you understand [https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html the potential risk] of doing so. <br />
<br />
You will need a kernel with the patch applied. The easiest method to acquiring this is through the {{AUR|linux-vfio}} package. <br />
<br />
In addition, the ACS override patch needs to be enabled with kernel command line options. The patch file adds the following documentation:<br />
<br />
pcie_acs_override =<br />
[PCIE] Override missing PCIe ACS support for:<br />
downstream<br />
All downstream ports - full ACS capabilties<br />
multifunction<br />
All multifunction devices - multifunction ACS subset<br />
id:nnnn:nnnn<br />
Specfic device - full ACS capabilities<br />
Specified as vid:did (vendor/device ID) in hex<br />
<br />
The option {{ic|1=pcie_acs_override=downstream}} is typically sufficient.<br />
<br />
After installation and configuration, reconfigure your [[Kernel parameters|bootloader kernel parameters]] to load the new kernel with the {{ic|1=pcie_acs_override=}} option enabled.<br />
<br />
== Plain QEMU without libvirt ==<br />
<br />
Instead of setting up a virtual machine with the help of libvirt, plain QEMU commands with custom parameters can be used for running the VM intended to be used with PCI passthrough. This is desirable for some use cases like scripted setups, where the flexibility for usage with other scripts is needed.<br />
<br />
To achieve this after [[#Setting up IOMMU]] and [[#Isolating the GPU]], follow the [[QEMU]] article to setup the virtualized environment, [[QEMU#Enabling KVM|enable KVM]] on it and use the flag {{ic|1=-device vfio-pci,host=07:00.0}} replacing the identifier (07:00.0) with your actual device's ID that you used for the GPU isolation earlier.<br />
<br />
For utilizing the OVMF firmware, make sure the {{Pkg|ovmf}} package is installed, copy the UEFI variables from {{ic|/usr/share/ovmf/x64/OVMF_VARS.fd}} to temporary location like {{ic|/tmp/MY_VARS.fd}} and finally specify the OVMF paths by appending the following parameters to the QEMU command (order matters):<br />
<br />
* {{ic|1=-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd}} for the actual OVMF firmware binary, note the readonly option<br />
* {{ic|1=-drive if=pflash,format=raw,file=/tmp/MY_VARS.fd}} for the variables<br />
<br />
{{Note|QEMU's default SeaBIOS can be used instead of OVMF, but it is not recommended as it can cause issues with passthrough setups.}}<br />
<br />
It is recommended to study the QEMU article for ways to enhance the performance by using the [[QEMU#Installing virtio drivers|virtio drivers]] and other further configurations for the setup.<br />
<br />
You also might have to use the {{ic|1=-cpu host,kvm=off}} parameter to forward the host's CPU model info to the VM and fool the virtualization detection used by Nvidia's and possibly other manufacturers' device drivers trying to block the full hardware usage inside a virtualized system.<br />
<br />
== Passing though other devices ==<br />
<br />
=== USB controller ===<br />
<br />
If your motherboard has multiple USB controllers mapped to multiple groups, it is possible to pass those instead of USB devices. Passing an actual controller over an individual USB device provides the following advantages : <br />
<br />
* If a device disconnects or changes ID over the course of an given operation (such as a phone undergoing an update), the VM will not suddenly stop seeing it.<br />
* Any USB port managed by this controller is directly handled by the VM and can have its devices unplugged, replugged and changed without having to notify the hypervisor.<br />
* Libvirt will not complain if one of the USB devices you usually pass to the guest is missing when starting the VM.<br />
<br />
Unlike with GPUs, drivers for most USB controllers do not require any specific configuration to work on a VM and control can normally be passed back and forth between the host and guest systems with no side effects.<br />
<br />
{{Warning|Make sure your USB controller supports resetting: [[#Passing through a device that does not support resetting]]}}<br />
<br />
You can find out which PCI devices correspond to which controller and how various ports and devices are assigned to each one of them using this command :<br />
<br />
{{hc|$ <nowiki>for usb_ctrl in $(find /sys/bus/usb/devices/usb* -maxdepth 0 -type l); do pci_path="$(dirname "$(realpath "${usb_ctrl}")")"; echo "Bus $(cat "${usb_ctrl}/busnum") --> $(basename $pci_path) (IOMMU group $(basename $(realpath $pci_path/iommu_group)))"; lsusb -s "$(cat "${usb_ctrl}/busnum"):"; echo; done</nowiki>|<br />
Bus 1 --> 0000:00:1a.0 (IOMMU group 4)<br />
Bus 001 Device 004: ID 04f2:b217 Chicony Electronics Co., Ltd Lenovo Integrated Camera (0.3MP)<br />
Bus 001 Device 007: ID 0a5c:21e6 Broadcom Corp. BCM20702 Bluetooth 4.0 [ThinkPad]<br />
Bus 001 Device 008: ID 0781:5530 SanDisk Corp. Cruzer<br />
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub<br />
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub<br />
<br />
Bus 2 --> 0000:00:1d.0 (IOMMU group 9)<br />
Bus 002 Device 006: ID 0451:e012 Texas Instruments, Inc. TI-Nspire Calculator<br />
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub<br />
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub<br />
}}<br />
<br />
This laptop has 3 USB ports managed by 2 USB controllers, each with their own IOMMU group. In this example, Bus 001 manages a single USB port (with a SanDisk USB pendrive plugged into it so it appears on the list), but also a number of internal devices, such as the internal webcam and the bluetooth card. Bus 002, on the other hand, does not apprear to manage anything except for the calculator that is plugged into it. The third port is empty, which is why it does not show up on the list, but is actually managed by Bus 002.<br />
<br />
Once you have identified which controller manages which ports by plugging various devices into them and decided which one you want to passthrough, simply add it to the list of PCI host devices controlled by the VM in your guest configuration. No other configuration should be needed.<br />
<br />
{{Note|If your USB controller does not support resetting, is not in an isolated group, or is otherwise unable to be passed through then it may still be possible to accomplish similar results through [[udev]] rules. See [https://github.com/olavmrk/usb-libvirt-hotplug] which allows any device connected to specified USB ports to be automatically attached to a virtual machine.}}<br />
<br />
=== Passing VM audio to host via PulseAudio ===<br />
<br />
It is possible to route the virtual machine's audio to the host as an application using libvirt. This has the advantage of multiple audio streams being routable to one host output, and working with audio output devices that do not support passthrough. [[PulseAudio]] is required for this to work.<br />
<br />
First, remove the comment from the {{ic|1=#user = ""}} line. Then add your username in the quotations. This tells QEMU which user's pulseaudio stream to route through.<br />
<br />
{{hc|/etc/libvirt/qemu.conf|2=<br />
user = "example"<br />
}}<br />
<br />
Next, modify the libvirt configuration<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
<domain type<nowiki>=</nowiki>'kvm'><br />
}}<br />
<br />
to<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
<domain type<nowiki>=</nowiki>'kvm' xmlns:qemu<nowiki>='http://libvirt.org/schemas/domain/qemu/1.0'</nowiki>><br />
}}<br />
<br />
Then set the QEMU PulseAudio environment variables at the bottom of the libvirt xml file<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
</devices><br />
</domain><br />
}}<br />
<br />
to<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
</devices><br />
<qemu:commandline><br />
<qemu:env name<nowiki>=</nowiki>'QEMU_AUDIO_DRV' value<nowiki>=</nowiki>'pa'/><br />
<qemu:env name<nowiki>=</nowiki>'QEMU_PA_SERVER' value<nowiki>=</nowiki>'/run/user/1000/pulse/native'/><br />
</qemu:commandline><br />
</domain><br />
}}<br />
<br />
Change 1000 under the user directory to your user uid (which can be found by running the {{ic|id}} command. Remember to save the file and exit it without ending the process before continuing, otherwise the changes will not register. If you get the message {{ic|<nowiki>Domain [vmname] XML configuration edited.</nowiki>}} after exiting, it means that your changes have been applied.<br />
<br />
[[Restart]] {{ic|libvirtd.service}} and [[systemd/User|user service]] {{ic|pulseaudio.service}}.<br />
<br />
Virtual Machine audio will now be routed through the host as an application. The application {{Pkg|pavucontrol}} can be used to control the output device. Be aware that on Windows guests, this can cause audio crackling without [[#Slowed down audio pumped through HDMI on the video card|using Message-Signaled Interrupts.]]<br />
<br />
==== QEMU 3.0 audio changes ====<br />
<br />
As of QEMU 3.0 part of the audio patches have been merged ([https://www.reddit.com/r/VFIO/comments/97iuov/qemu_30_released/e49wmyd/ reddit link]). The {{AUR|qemu-patched}} package currently includes some additional audio patches, as some of the patches have not been officially up-streamed yet.<br />
<br />
You will need to change the chipset accordingly to how your VM is set up, i.e. {{ic|pc-q35-3.0}} or {{ic|pc-i440fx-3.0}} (after installing qemu 3.0) to use the new code paths:<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
<domain type<nowiki>=</nowiki>'kvm'><br />
...<br />
<os><br />
<type arch<nowiki>=</nowiki>'x86_64' machine<nowiki>=</nowiki>'pc-q35-3.0'>hvm</type><br />
...<br />
</os><br />
}}<br />
<br />
{{hc|$ virsh edit [vmname]|<br />
<domain type<nowiki>=</nowiki>'kvm'><br />
...<br />
<os><br />
<type arch<nowiki>=</nowiki>'x86_64' machine<nowiki>=</nowiki>'pc-i440fx-3.0'>hvm</type><br />
...<br />
</os><br />
}}<br />
<br />
{{Note|<br />
* To speed up compilation time with {{AUR|qemu-patched}} use {{ic|1=--target-list=x86_64-softmmu}} to compile qemu with only x86_64 guest support.<br />
* Since Qemu 3.0 the XML arguments {{ic|qemu:env}} above are ''not'' needed if you run PulseAudio as your user and you have {{ic|1= nographics_allow_host_audio = 1}} enabled in {{ic|1=/etc/libvirt/qemu.conf}}. If you use a different user with QEMU/Libvirt, you will need to keep the {{ic|QEMU_PA_SERVER}} variable otherwise permission errors will occur.<br />
}}<br />
<br />
=== Physical disk/partition ===<br />
<br />
A whole disk or a partition may be used as a whole for improved I/O performance by adding an entry to the XML<br />
<br />
Virtio-BLK Example:<br />
{{hc|$ virsh edit [vmname]| <nowiki><br />
<devices><br />
...<br />
<disk type='block' device='disk'><br />
<driver name='qemu' type='raw' cache='none' io='native'/><br />
<source dev='/dev/disk/by-id/xxxxxxxx'/><br />
<target dev='vda' bus='virtio'/><br />
</disk><br />
...<br />
</devices><br />
</nowiki><br />
}}<br />
<br />
Virtio-SCSI Example:<br />
{{hc|$ virsh edit [vmname]| <nowiki><br />
<devices><br />
...<br />
<disk type='block' device='disk'><br />
<driver name='qemu' type='raw' cache='none' io='native'/><br />
<source dev='/dev/disk/by-id/xxxxxxxx'/><br />
<target dev='sda' bus='scsi'/><br />
</disk><br />
...<br />
</devices><br />
</nowiki><br />
}}<br />
<br />
To find out which disk/partition is associated with the one you would like to pass:<br />
<br />
{{hc|<br />
$ ls -l /dev/disk/by-id|<br />
ata-ST1000LM002-9VQ14L_Z0501SZ9 -> ../../sdd<br />
ata-ST1000LM002-9VQ14L_Z0501SZ9-part1 -> ../../sdd1<br />
}}<br />
<br />
You can also add the disk with Virt-Manager's '''Add Hardware''' menu and then type the disk you want in the '''Select or create custom storage''' box, e.g. '''/dev/disk/by-id/ata-ST1000LM002-9VQ14L_Z0501SZ9'''<br />
<br />
Depending on which bus you use, the above step will require either the VIOSTOR(bus=virtio) or VIOSCSI(bus=scsi) driver on Windows guests, refer to [[#Setting up the guest OS]] for the driver ISO.<br />
<br />
=== Gotchas ===<br />
<br />
==== Passing through a device that does not support resetting ====<br />
<br />
When the VM shuts down, all devices used by the guest are deinitialized by its OS in preparation for shutdown. In this state, those devices are no longer functional and must then be power-cycled before they can resume normal operation. Linux can handle this power-cycling on its own, but when a device has no known reset methods, it remains in this disabled state and becomes unavailable. Since Libvirt and Qemu both expect all host PCI devices to be ready to reattach to the host before completely stopping the VM, when encountering a device that will not reset, they will hang in a "Shutting down" state where they will not be able to be restarted until the host system has been rebooted. It is therefore reccomanded to only pass through PCI devices which the kernel is able to reset, as evidenced by the presence of a {{ic|reset}} file in the PCI device sysfs node, such as {{ic|/sys/bus/pci/devices/0000:00:1a.0/reset}}.<br />
<br />
The following bash command shows which devices can and cannot be reset.<br />
<br />
{{hc|<nowiki>for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d);do echo "IOMMU group $(basename "$iommu_group")"; for device in $(\ls -1 "$iommu_group"/devices/); do if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi; echo -n $'\t';lspci -nns "$device"; done; done</nowiki>|<br />
IOMMU group 0<br />
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller [8086:0158] (rev 09)<br />
IOMMU group 1<br />
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)<br />
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 720] [10de:1288] (rev a1)<br />
01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)<br />
IOMMU group 2<br />
00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04)<br />
IOMMU group 4<br />
[RESET] 00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)<br />
IOMMU group 5<br />
[RESET] 00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)<br />
IOMMU group 10<br />
[RESET] 00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)<br />
IOMMU group 13<br />
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)<br />
06:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)<br />
}}<br />
<br />
This signals that the xHCI USB controller in 00:14.0 cannot be reset and will therefore stop the VM from shutting down properly, while the integrated sound card in 00:1b.0 and the other two controllers in 00:1a.0 and 00:1d.0 do not share this problem and can be passed without issue.<br />
<br />
== Complete setups and examples ==<br />
<br />
For many reasons users may seek to see [[PCI_passthrough_via_OVMF/Examples|complete passthrough setup examples]].<br />
<br />
These examples offer a supplement to existing hardware compatibility lists. Additionally, if you have trouble configuring a certain mechanism in your setup, you might find these examples very valuable. Users there have described their setups in detail, and some have provided examples of their configuration files as well. <br />
<br />
We encourage those who successfully build their system from this resource to help improve it by contributing their builds. Due to the many different hardware manufacturers involved, the seemingly significant lack of sufficient documentation, as well as other issues due to the nature of this process, community contributions are necessary.<br />
<br />
== Troubleshooting ==<br />
<br />
If your issue is not mentioned below, you may want to browse [[QEMU#Troubleshooting]].<br />
<br />
=== "Error 43: Driver failed to load" on Nvidia GPUs passed to Windows VMs ===<br />
<br />
{{Note|<br />
* This may also fix SYSTEM_THREAD_EXCEPTION_NOT_HANDLED boot crashes related to Nvidia drivers.<br />
* This may also fix problems under linux guests.<br />
}}<br />
<br />
Since version 337.88, Nvidia drivers on Windows check if an hypervisor is running and fail if it detects one, which results in an Error 43 in the Windows device manager. Starting with QEMU 2.5.0 and libvirt 1.3.3, the vendor_id for the hypervisor can be spoofed, which is enough to fool the Nvidia drivers into loading anyway. All one must do is add {{ic|<nowiki>hv_vendor_id=whatever</nowiki>}} to the cpu parameters in their QEMU command line, or by adding the following line to their libvirt domain configuration. It may help for the ID to be set to a 12-character alphanumeric (e.g. '1234567890ab') as opposed to longer or shorter strings.<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<features><br />
<hyperv><br />
...<br />
<vendor_id state='on' value='whatever'/><br />
...<br />
</hyperv><br />
...<br />
<kvm><br />
<hidden state='on'/><br />
</kvm><br />
</features><br />
...<br />
</nowiki>}}<br />
<br />
Users with older versions of QEMU and/or libvirt will instead have to disable a few hypervisor extensions, which can degrade performance substantially. If this is what you want to do, do the following replacement in your libvirt domain config file.<br />
<br />
{{hc|$ virsh edit [vmname]|<nowiki><br />
...<br />
<features><br />
<hyperv><br />
<relaxed state='on'/><br />
<vapic state='on'/><br />
<spinlocks state='on' retries='8191'/><br />
</hyperv><br />
...<br />
</features><br />
...<br />
<clock offset='localtime'><br />
<timer name='hypervclock' present='yes'/><br />
</clock><br />
...<br />
</nowiki>}}<br />
<br />
{{bc|<nowiki><br />
...<br />
<clock offset='localtime'><br />
<timer name='hypervclock' present='no'/><br />
</clock><br />
...<br />
<features><br />
<kvm><br />
<hidden state='on'/><br />
</kvm><br />
...<br />
<hyperv><br />
<relaxed state='off'/><br />
<vapic state='off'/><br />
<spinlocks state='off'/><br />
</hyperv><br />
...<br />
</features><br />
...<br />
</nowiki>}}<br />
<br />
=== "BAR 3: cannot reserve [mem]" error in dmesg after starting VM ===<br />
<br />
{{Expansion|This error is actually related to the boot_vgs issue and should be merged together with everything else concerning GPU ROMs.|section=UEFI (OVMF) Compatibility in VBIOS}}<br />
<br />
With respect to [https://www.linuxquestions.org/questions/linux-kernel-70/kernel-fails-to-assign-memory-to-pcie-device-4175487043/ this article]:<br />
<br />
If you still have code 43 check dmesg for memory reservation errors after starting up VM, if you have similar it could be the case:<br />
<br />
vfio-pci 0000:09:00.0: BAR 3: cannot reserve [mem 0xf0000000-0xf1ffffff 64bit pref]<br />
<br />
Find out a PCI Bridge your graphic card is connected to. This will give actual hierarchy of devices:<br />
<br />
$ lspci -t<br />
<br />
Before starting VM run following lines replacing IDs with actual from previous output.<br />
<br />
# echo 1 > /sys/bus/pci/devices/0000\:00\:03.1/remove<br />
# echo 1 > /sys/bus/pci/rescan<br />
<br />
{{Note|Probably setting [[kernel parameter]] {{ic|1=video=efifb:off}} is required as well. [https://pve.proxmox.com/wiki/Pci_passthrough#BAR_3:_can.27t_reserve_.5Bmem.5D_error Source]}}<br />
<br />
=== UEFI (OVMF) compatibility in VBIOS ===<br />
<br />
{{Remove|Flashing you guest GPU for the purpose of a GPU passthrough is '''never''' good advice. A full section should be dedicated to VBIOS compatibility.|section= UEFI (OVMF) Compatibility in VBIOS}}<br />
<br />
With respect to [https://pve.proxmox.com/wiki/Pci_passthrough#How_to_known_if_card_is_UEFI_.28ovmf.29_compatible this article]:<br />
<br />
Error 43 can be caused by the GPU's VBIOS without UEFI support. To check whenever your VBIOS supports it, you will have to use {{ic|rom-parser}}:<br />
<br />
$ git clone https://github.com/awilliam/rom-parser<br />
$ cd rom-parser && make<br />
<br />
Dump the GPU VBIOS:<br />
<br />
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/rom<br />
# cat /sys/bus/pci/devices/0000:01:00.0/rom > /tmp/image.rom<br />
# echo 0 > /sys/bus/pci/devices/0000:01:00.0/rom<br />
<br />
And test it for compatibility:<br />
<br />
{{hc|$ ./rom-parser /tmp/image.rom|<br />
Valid ROM signature found @600h, PCIR offset 190h<br />
PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 1184, class: 030000<br />
PCIR: revision 0, vendor revision: 1<br />
Valid ROM signature found @fa00h, PCIR offset 1ch<br />
PCIR: type 3 (EFI), vendor: 10de, device: 1184, class: 030000<br />
PCIR: revision 3, vendor revision: 0<br />
EFI: Signature Valid, Subsystem: Boot, Machine: X64<br />
Last image<br />
}}<br />
<br />
To be UEFI compatible, you need a "type 3 (EFI)" in the result. If it is not there, try updating your GPU VBIOS. GPU manufacturers often share VBIOS upgrades on their support pages. A large database of known compatible and working VBIOSes (along with their UEFI compatibility status!) is available on [https://www.techpowerup.com/vgabios/ TechPowerUp].<br />
<br />
Updated VBIOS can be used in the VM without flashing. To load it in QEMU:<br />
<br />
-device vfio-pci,host=07:00.0,......,romfile=/path/to/your/gpu/bios.bin \<br />
<br />
And in libvirt:<br />
<br />
{{bc|1=<br />
<hostdev><br />
...<br />
<rom file='/path/to/your/gpu/bios.bin'/><br />
...<br />
</hostdev><br />
}}<br />
<br />
One should compare VBIOS versions between host and guest systems using [https://www.techpowerup.com/download/nvidia-nvflash/ nvflash] (Linux versions under ''Show more versions'') or <br />
[https://www.techpowerup.com/download/techpowerup-gpu-z/ GPU-Z] (in Windows guest). To check the currently loaded VBIOS:<br />
<br />
{{hc|$ ./nvflash --version|<br />
...<br />
Version : 80.04.XX.00.97<br />
...<br />
UEFI Support : No<br />
UEFI Version : N/A<br />
UEFI Variant Id : N/A ( Unknown )<br />
UEFI Signer(s) : Unsigned<br />
...<br />
}}<br />
<br />
And to check a given VBIOS file:<br />
<br />
{{hc|$ ./nvflash --version NV299MH.rom|<br />
...<br />
Version : 80.04.XX.00.95<br />
...<br />
UEFI Support : Yes<br />
UEFI Version : 0x10022 (Jul 2 2013 @ 16377903 )<br />
UEFI Variant Id : 0x0000000000000004 ( GK1xx )<br />
UEFI Signer(s) : Microsoft Corporation UEFI CA 2011<br />
...<br />
}}<br />
<br />
If the external ROM did not work as it should in the guest, you will have to flash the newer VBIOS image to the GPU. In some cases it is possible to create your own VBIOS image with UEFI support using [https://www.win-raid.com/t892f16-AMD-and-Nvidia-GOP-update-No-requests-DIY.html GOPUpd] tool, however this is risky and may result in GPU brick.<br />
<br />
{{Warning|Failure during flashing may "brick" your GPU - recovery may be possible, but rarely easy and often requires additional hardware. '''DO NOT''' flash VBIOS images for other GPU models (different boards may use different VBIOSes, clocks, fan configuration). If it breaks, you get to keep all the pieces.}}<br />
<br />
In order to avoid the irreparable damage to your graphics adapter it is necessary to unload the NVIDIA kernel driver first:<br />
<br />
# modprobe -r nvidia_modeset nvidia <br />
<br />
Flashing the VBIOS can be done with:<br />
<br />
# ./nvflash romfile.bin<br />
<br />
{{Warning|'''DO NOT''' interrupt the flashing process, even if it looks like it is stuck. Flashing should take about a minute on most GPUs, but may take longer.}}<br />
<br />
=== Slowed down audio pumped through HDMI on the video card ===<br />
<br />
For some users VM's audio slows down/starts stuttering/becomes demonic after a while when it is pumped through HDMI on the video card. This usually also slows down graphics.<br />
A possible solution consists of enabling MSI (Message Signaled-Based Interrupts) instead of the default (Line-Based Interrupts).<br />
<br />
In order to check whether MSI is supported or enabled, run the following command as root:<br />
<br />
# lspci -vs $device | grep 'MSI:'<br />
<br />
where `$device` is the card's address (e.g. `01:00.0`).<br />
<br />
The output should be similar to:<br />
<br />
Capabilities: [60] MSI: Enable'''-''' Count=1/1 Maskable- 64bit+<br />
<br />
A {{ic|-}} after {{ic|Enable}} means MSI is supported, but not used by the VM, while a {{ic|+}} says that the VM is using it.<br />
<br />
The procedure to enable it is quite complex, instructions and an overview of the setting can be found [https://forums.guru3d.com/showthread.php?t=378044 here].<br />
<br />
Other hints can be found on the [https://lime-technology.com/wiki/index.php/UnRAID_6/VM_Guest_Support#Enable_MSI_for_Interrupts_to_Fix_HDMI_Audio_Support lime-technology's wiki], or on this article on [https://vfio.blogspot.it/2014/09/vfio-interrupts-and-how-to-coax-windows.html VFIO tips and tricks].<br />
<br />
A UI tool called [https://github.com/CHEF-KOCH/MSI-utility MSI Utility (FOSS Version 2)] works with Windows 10 64-bit and simplifies the process.<br />
<br />
In order to fix the issues enabling MSI on the 0 function of a nVidia card ({{ic|01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1) (prog-if 00 [VGA controller])}}) was not enough; it will also be required to enable it on the other function ({{ic|01:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)}}) to fix the issue.<br />
<br />
=== No HDMI audio output on host when intel_iommu is enabled ===<br />
<br />
If after enabling {{ic|intel_iommu}} the HDMI output device of Intel GPU becomes unusable on the host then setting the option {{ic|igfx_off}} (i.e. {{ic|1=intel_iommu=on,igfx_off}}) might bring the audio back, please read {{ic|Graphics Problems?}} in [https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt Intel-IOMMU.txt] for details about setting {{ic|igfx_off}}.<br />
<br />
=== X does not start after enabling vfio_pci ===<br />
<br />
This is related to the host GPU being detected as a secondary GPU, which causes X to fail/crash when it tries to load a driver for the guest GPU. To circumvent this, a Xorg configuration file specifying the BusID for the host GPU is required. The correct BusID can be acquired from lspci or the Xorg log. [https://www.redhat.com/archives/vfio-users/2016-August/msg00025.html Source →]<br />
<br />
{{hc|/etc/X11/xorg.conf.d/10-intel.conf|<nowiki><br />
Section "Device"<br />
Identifier "Intel GPU"<br />
Driver "modesetting"<br />
BusID "PCI:0:2:0"<br />
EndSection<br />
</nowiki>}}<br />
<br />
=== Chromium ignores integrated graphics for acceleration ===<br />
<br />
Chromium and friends will try to detect as many GPUs as they can in the system and pick which one is preferred (usually discrete NVIDIA/AMD graphics). It tries to pick a GPU by looking at PCI devices, not OpenGL renderers available in the system - the result is that Chromium may ignore the integrated GPU available for rendering and try to use the dedicated GPU bound to the {{ic|vfio-pci}} driver, and unusable on the host system, regardless of whenever a guest VM is running or not. This results in software rendering being used (leading to higher CPU load, which may also result in choppy video playback, scrolling and general un-smoothness).<br />
<br />
This can be fixed by [[Chromium/Tips and tricks#Forcing specific GPU|explicitly telling Chromium which GPU you want to use]].<br />
<br />
=== VM only uses one core ===<br />
<br />
For some users, even if IOMMU is enabled and the core count is set to more than 1, the VM still only uses one CPU core and thread. To solve this enable "Manually set CPU topology" in {{ic|virt-manager}} and set it to the desirable amount of CPU sockets, cores and threads. Keep in mind that "Threads" refers to the thread count per CPU, not the total count.<br />
<br />
=== Passthrough seems to work but no output is displayed ===<br />
<br />
Make sure if you are using virt-manager that UEFI firmware is selected for your virtual machine. Also, make sure you have passed the correct device to the VM.<br />
<br />
=== virt-manager has permission issues ===<br />
<br />
If you are getting a permission error with virt-manager add the following to your {{ic|/etc/libvirt/qemu.conf}}:<br />
<br />
group="kvm"<br />
user="''user''"<br />
<br />
If that does not work make sure your user account is added to the kvm and libvirt [[group]]s.<br />
<br />
=== Host lockup after VM shutdown ===<br />
<br />
This issue seems to primarily affect users running a Windows 10 guest and usually after the VM has been run for a prolonged period of time: the host will experience multiple CPU core lockups (see [https://bbs.archlinux.org/viewtopic.php?id=206050&p=2]). To fix this try enabling Message Signal Interrupts on the GPU passed through to the guest. A good guide for how to do this can be found in [https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts.378044/].<br />
<br />
=== Host lockup if guest is left running during sleep ===<br />
<br />
VFIO-enabled virtual machines tend to become unstable if left running through a sleep/wakeup cycle and have been known to cause the host machine to lockup when an attempt is then made to shut them down. In order to avoid this, one can simply prevent the host from going into sleep while the guest is running using the following libvirt hook script and systemd unit. The hook file needs executable permissions to work.<br />
<br />
{{hc|/etc/libvirt/hooks/qemu|<nowiki><br />
#!/bin/bash<br />
<br />
OBJECT="$1"<br />
OPERATION="$2"<br />
SUBOPERATION="$3"<br />
EXTRA_ARG="$4"<br />
<br />
case "$OPERATION" in<br />
"prepare")<br />
systemctl start libvirt-nosleep@"$OBJECT"<br />
;;<br />
"release")<br />
systemctl stop libvirt-nosleep@"$OBJECT"<br />
;;<br />
esac<br />
</nowiki>}}<br />
<br />
{{hc|/etc/systemd/system/libvirt-nosleep@.service|<nowiki><br />
[Unit]<br />
Description=Preventing sleep while libvirt domain "%i" is running<br />
<br />
[Service]<br />
Type=simple<br />
ExecStart=/usr/bin/systemd-inhibit --what=sleep --why="Libvirt domain \"%i\" is running" --who=%U --mode=block sleep infinity<br />
</nowiki>}}<br />
<br />
=== Cannot boot after upgrading ovmf ===<br />
If you cannot boot after upgrading from ovmf-1:r23112.018432f0ce-1 then you need to remove the old *VARS.fd file in {{ic|/var/lib/libvirt/qemu/nvram}}:<br />
<br />
mv /var/lib/libvirt/qemu/nvram/vmname_VARS.fd /var/lib/libvirt/qemu/nvram/vmname_VARS.fd.old<br />
<br />
See {{Bug|57825}} for further details.<br />
<br />
== See also ==<br />
<br />
* [https://bbs.archlinux.org/viewtopic.php?id=162768 Discussion on Arch Linux forums] | [https://archive.is/kZYMt Archived link]<br />
* [https://docs.google.com/spreadsheet/ccc?key=0Aryg5nO-kBebdFozaW9tUWdVd2VHM0lvck95TUlpMlE User contributed hardware compatibility list]<br />
* [https://pastebin.com/rcnUZCv7 Example script] from https://www.youtube.com/watch?v=37D2bRsthfI<br />
* [https://vfio.blogspot.com/ Complete tutorial for PCI passthrough]<br />
* [https://www.redhat.com/archives/vfio-users/ VFIO users mailing list]<br />
* [ircs://chat.freenode.net/vfio-users #vfio-users on freenode]<br />
* [https://www.youtube.com/watch?v=aLeWg11ZBn0 YouTube: Level1Linux - GPU Passthrough for Virtualization with Ryzen]<br />
* [https://www.reddit.com/r/VFIO /r/VFIO: A subreddit focused on vfio]</div>Wribbehttps://wiki.archlinux.org/index.php?title=RAID&diff=481568RAID2017-07-09T14:32:47Z<p>Wribbe: /* Prepare the Devices */</p>
<hr />
<div>[[Category:File systems]]<br />
[[es:RAID]]<br />
[[it:RAID]]<br />
[[ja:RAID]]<br />
[[zh-hans:RAID]]<br />
{{Related articles start}}<br />
{{Related|Software RAID and LVM}}<br />
{{Related|Installing with Fake RAID}}<br />
{{Related|Convert a single drive system to RAID}}<br />
{{Related|ZFS}}<br />
{{Related|Experimenting with ZFS}}<br />
{{Related|Swap#Striping}}<br />
{{Related|Btrfs#RAID}}<br />
{{Related articles end}}<br />
{{Style|Non-standard headers, other [[Help:Style]] issues}}<br />
This article explains what RAID is and how to create/manage a software RAID array using mdadm.<br />
<br />
== Introduction ==<br />
<br />
See also [[Wikipedia:RAID]].<br />
<br />
Redundant Array of Independent Disks (RAID) is a storage technology that combines multiple disk drive components (typically disk drives or partitions thereof) into a logical unit. Depending on the RAID implementation, this logical unit can be a file system or an additional transparent layer that can hold several partitions. Data is distributed across the drives in one of several ways called "RAID levels", depending on the level of redundancy and performance required. The RAID level chosen can thus prevent data loss in the event of a hard disk failure, increase performance or be a combination of both.<br />
<br />
Despite redundancy implied by most RAID levels, RAID does not guarantee that data is safe. A RAID will not protect data if there is a fire, the computer is stolen or multiple hard drives fail at once. Furthermore, installing a system with RAID is a complex process that may destroy data. {{Warning|Therefore, be sure [[Backup programs|to back up]] all data before proceeding.}}<br />
<br />
{{Note|Users considering a RAID array for data storage/redundancy should also consider RAIDZ which is implemented via [[ZFS]], a more modern and powerful alternative to software RAID.}}<br />
<br />
===Standard RAID levels===<br />
There are many different [[Wikipedia:Standard RAID levels|levels of RAID]], please find hereafter the most commonly used ones.<br />
<br />
; [[Wikipedia:Standard RAID levels#RAID 0|RAID 0]]<br />
: Uses striping to combine disks. Even though it ''does not provide redundancy'', it is still considered RAID. It does, however, ''provide a big speed benefit''. If the speed increase is worth the possibility of data loss (for [[swap]] partition for example), choose this RAID level. On a server, RAID 1 and RAID 5 arrays are more appropriate. The size of a RAID 0 array block device is the size of the smallest component partition times the number of component partitions.<br />
; [[Wikipedia:Standard RAID levels#RAID 1|RAID 1]]<br />
: The most straightforward RAID level: straight mirroring. As with other RAID levels, it only makes sense if the partitions are on different physical disk drives. If one of those drives fails, the block device provided by the RAID array will continue to function as normal. The example will be using RAID 1 for everything except [[swap]] and temporary data. Please note that with a software implementation, the RAID 1 level is the only option for the boot partition, because bootloaders reading the boot partition do not understand RAID, but a RAID 1 component partition can be read as a normal partition. The size of a RAID 1 array block device is the size of the smallest component partition.<br />
; [[Wikipedia:Standard RAID levels#RAID 5|RAID 5]]<br />
: Requires 3 or more physical drives, and provides the redundancy of RAID 1 combined with the speed and size benefits of RAID 0. RAID 5 uses striping, like RAID 0, but also stores parity blocks ''distributed across each member disk''. In the event of a failed disk, these parity blocks are used to reconstruct the data on a replacement disk. RAID 5 can withstand the loss of one member disk.<br />
: {{Note|RAID 5 is a common choice due to its combination of speed and data redundancy. The caveat is that if one drive were to fail and another drive failed before that drive was replaced, all data will be lost. Furthermore, with modern disk sizes and expected unrecoverable read error rates on consumer disks, the rebuild of a 4TiB array is '''expected''' (i.e. higher than 50% chance) to have at least one URE. Because of this, RAID 5 is no longer advised by the storage industry.}}<br />
; [[Wikipedia:Standard RAID levels#RAID 6|RAID 6]]<br />
: Requires 4 or more physical drives, and provides the benefits of RAID 5 but with security against two drive failures. RAID 6 also uses striping, like RAID 5, but stores two distinct parity blocks ''distributed across each member disk''. In the event of a failed disk, these parity blocks are used to reconstruct the data on a replacement disk. RAID 6 can withstand the loss of two member disks. The robustness against unrecoverable read errors is somewhat better, because the array still has parity blocks when rebuilding from a single failed drive. However, given the overhead, RAID 6 is costly and in most settings RAID 10 in far2 layout (see below) provides better speed benefits and robustness, and is therefore preferred.<br />
<br />
===Nested RAID levels===<br />
; [[Wikipedia:Nested RAID levels#RAID 1 + 0|RAID 1+0]]<br />
: RAID1+0 is a nested RAID that combines two of the standard levels of RAID to gain performance and additional redundancy. It is commonly referred to as ''RAID10'', however, Linux MD RAID10 is slightly different from simple RAID layering, see below.<br />
<br />
; [[Wikipedia:Non-standard_RAID_levels#Linux_MD_RAID_10|RAID 10]]<br />
: RAID10 under Linux is built on the concepts of RAID1+0, however, it implements this as a single layer, with multiple possible layouts.<br />
<br />
: The ''near X'' layout on Y disks repeats each chunk X times on Y/2 stripes, but does not need X to divide Y evenly. The chunks are placed on almost the same location on each disk they're mirrored on, hence the name. It can work with any number of disks, starting at 2. Near 2 on 2 disks is equivalent to RAID1, near 2 on 4 disks to RAID1+0.<br />
<br />
: The ''far X'' layout on Y disks is designed to offer striped read performance on a mirrored array. It accomplishes this by dividing each disk in two sections, say front and back, and what is written to disk 1 front is mirrored in disk 2 back, and vice versa. This has the effect of being able to stripe sequential reads, which is where RAID0 and RAID5 get their performance from. The drawback is that sequential writing has a very slight performance penalty because of the distance the disk needs to seek to the other section of the disk to store the mirror. RAID10 in far 2 layout is, however, preferable to layered RAID1+0 '''and''' RAID5 whenever read speeds are of concern and availability / redundancy is crucial. However, it is still not a substitute for backups. See the wikipedia page for more information.<br />
<br />
=== RAID level comparison ===<br />
{| class="wikitable"<br />
! RAID level!!Data redundancy!!Physical drive utilization!!Read performance!!Write performance!!Min drives<br />
|-<br />
| '''0'''||{{No}}||100%||nX<br />
<br />
'''Best'''<br />
||nX<br />
<br />
'''Best'''<br />
||2<br />
|-<br />
| '''1'''||{{Yes}}||50%||Up to nX if multiple processes are reading, otherwise 1X<br />
||1X||2<br />
|-<br />
| '''5'''||{{Yes}}||67% - 94%||(n−1)X<br />
<br />
'''Superior'''<br />
||(n−1)X<br />
<br />
'''Superior'''<br />
||3<br />
|-<br />
| '''6'''||{{Yes}}||50% - 88%||(n−2)X||(n−2)X||4<br />
|-<br />
| '''10,far2'''||{{Yes}}||50%||nX<br />
'''Best;''' on par with RAID0 but redundant<br />
||(n/2)X||2<br />
|-<br />
| '''10,near2'''||{{Yes}}||50%||Up to nX if multiple processes are reading, otherwise 1X||(n/2)X||2<br />
|}<br />
<nowiki>*</nowiki> Where ''n'' is standing for the number of dedicated disks.<br />
<br />
== Implementation ==<br />
The RAID devices can be managed in different ways:<br />
<br />
; Software RAID<br />
: This is the easiest implementation as it does not rely on obscure proprietary firmware and software to be used. The array is managed by the operating system either by:<br />
:* by an abstraction layer (e.g. [[#Installation|mdadm]]); {{Note|This is the method we will use later in this guide.}}<br />
:* by a logical volume manager (e.g. [[LVM]]);<br />
:* by a component of a file system (e.g. [[ZFS]], [[Btrfs]]).<br />
<br />
; Hardware RAID<br />
: The array is directly managed by a dedicated hardware card installed in the PC to which the disks are directly connected. The RAID logic runs on an on-board processor independently of [[Wikipedia:Central processing unit|the host processor (CPU)]]. Although this solution is independent of any operating system, the latter requires a driver in order to function properly with the hardware RAID controller. The RAID array can either be configured via an option rom interface or, depending on the manufacturer, with a dedicated application when the OS has been installed. The configuration is transparent for the Linux kernel: it doesn't see the disks separately.<br />
<br />
; [[Fakeraid|FakeRAID]]<br />
: This type of RAID is properly called BIOS or Onboard RAID, but is falsely advertised as hardware RAID. The array is managed by pseudo-RAID controllers where the RAID logic is implemented in an option rom or in the firmware itself [http://www.win-raid.com/t19f13-Intel-EFI-RAID-quot-SataDriver-quot-BIOS-Modules.html with a EFI SataDriver] (in case of [[UEFI]]), but are not full hardware RAID controllers with ''all'' RAID features implemented. Therefore, this type of RAID is sometimes called FakeRAID. {{Pkg|dmraid}} from the [[official repositories]], will be used to deal with these controllers. Here are some examples of FakeRAID controllers: [[Wikipedia:Intel Rapid Storage Technology|Intel Rapid Storage]], JMicron JMB36x RAID ROM, AMD RAID, ASMedia 106x, and NVIDIA MediaShield.<br />
<br />
===Which type of RAID do I have?===<br />
<br />
Since software RAID is implemented by the user, the type of RAID is easily known to the user.<br />
<br />
However, discerning between FakeRAID and true hardware RAID can be more difficult. As stated, manufacturers often incorrectly distinguish these two types of RAID and false advertising is always possible. The best solution in this instance is to run the {{ic|lspci}} command and looking through the output to find the RAID controller. Then do a search to see what information can be located about the RAID controller. Hardware RAID controllers appear in this list, but FakeRAID implementations do not. Also, true hardware RAID controller are often rather expensive (~$400+), so if someone customized the system, then it is very likely that choosing a hardware RAID setup made a very noticeable change in the computer's price.<br />
<br />
== Installation ==<br />
<br />
Install {{Pkg|mdadm}} from the [[official repositories]]. ''mdadm'' is used for administering pure software RAID using plain block devices: the underlying hardware does not provide any RAID logic, just a supply of disks. ''mdadm'' will work with any collection of block devices. Even if unusual. For example, one can thus make a RAID array from a collection of thumb drives.<br />
<br />
===Prepare the Devices===<br />
<br />
{{Warning|These steps erase everything on a device, so type carefully!}}<br />
<br />
If the device is being reused or re-purposed from an existing array, erase any old RAID configuration information:<br />
# mdadm --misc --zero-superblock /dev/<drive><br />
<br />
or if a particular partition on a drive is to be deleted:<br />
# mdadm --misc --zero-superblock /dev/<partition><br />
<br />
{{Note|<br />
* Zapping a partition's superblock should not affect the other partitions on the disk.<br />
* Due to the nature of RAID functionality it is very difficult to [[Securely wipe disk]]s fully on a running array. Consider whether it is useful to do so before creating it.}}<br />
<br />
===Create the Partition Table (GPT)===<br />
<br />
It is highly recommended to pre-partition the disks to be used in the array. Since most RAID users are selecting HDDs >2 TB, GPT partition tables are required and recommended. Disks are easily partitioned using {{Pkg|gptfdisk}}.<br />
<br />
* After created, the partition type should be assigned hex code FD00.<br />
* If a larger disk array is employed, consider assigning [[Persistent_block_device_naming#by-label|disk labels]] or [[Persistent_block_device_naming#by-partlabel|partition labels]] to make it easier to identify an individual disk later. <br />
* Creating partitions that are of the same size on each of the devices is preferred.<br />
* A good tip is to leave approx 100 MB at the end of the device when partitioning. See below for rationale.<br />
<br />
====Partitions Types for MBR====<br />
<br />
For those creating partitions on HDDs with a MBR partition table, the partition types available for use are:<br />
<br />
* 0xDA (for non-fs data -- current recommendation by [https://raid.wiki.kernel.org/index.php/Partition_Types kernel.org])<br />
* 0xFD (for raid autodetect arrays -- was useful before booting an initrd to load kernel modules)<br />
<br />
{{Note|It is also possible to create a RAID directly on the raw disks (without partitions), but not recommended because it can cause problems when swapping a failed disk.}}<br />
<br />
When replacing a failed disk of a RAID, the new disk has to be exactly the same size as the failed disk or bigger — otherwise the array recreation process will not work. Even hard drives of the same manufacturer and model can have small size differences. By leaving a little space at the end of the disk unallocated one can compensate for the size differences between drives, which makes choosing a replacement drive model easier. Therefore, '''it is good practice to leave about 100 MB of unallocated space at the end of the disk.'''<br />
<br />
=== Build the Array ===<br />
<br />
{{Warning|Kernel versions 4.2.x and 4.3.x currently have an active bug that prevents them from assembling a RAID10 array; users needing this layout are encouraged to use kernel version 4.1.x series provided by {{pkg|linux-lts}} until this bug is fixed. References are provided in the [[#See also]] section.}}<br />
<br />
Use {{ic|mdadm}} to build the array. Several examples are given below.<br />
<br />
{{Warning|Do not simply copy/paste the examples below; use your brain and substitute the correct options/drive letters!}}<br />
<br />
{{Note|If this is a RAID1 array which is intended to boot from [[Syslinux]] a limitation in syslinux v4.07 requires the metadata value to be 1.0 rather than the default of 1.2.}}<br />
<br />
The following example shows building a 2-device RAID1 array:<br />
<br />
# mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/sdb1 /dev/sdc1<br />
<br />
The following example shows building a RAID5 array with 4 active devices and 1 spare device:<br />
<br />
# mdadm --create --verbose --level=5 --metadata=1.2 --chunk=256 --raid-devices=4 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 --spare-devices=1 /dev/sdf1<br />
<br />
:{{Tip|{{ic|--chunk}} was used in the previous example to change the chunk size from the default value. See [http://www.zdnet.com/article/chunks-the-hidden-key-to-raid-performance/ Chunks: the hidden key to RAID performance] for more on chunk size optimisation.}}<br />
<br />
The following example shows building a RAID10,far2 array with 2 devices:<br />
<br />
# mdadm --create --verbose --level=10 --metadata=1.2 --chunk=512 --raid-devices=2 --layout=f2 /dev/md0 /dev/sdb1 /dev/sdc1<br />
<br />
{{Tip| The {{ic|--homehost}} and {{ic|--name}} options can be used for a custom raid device name. They are delimited by a colon in the resulting name.}}<br />
<br />
The array is created under the virtual device {{ic|/dev/mdX}}, assembled and ready to use (in degraded mode). One can directly start using it while mdadm resyncs the array in the background. It can take a long time to restore parity. Check the progress with:<br />
<br />
$ cat /proc/mdstat<br />
<br />
=== Update configuration file ===<br />
<br />
By default, most of {{ic|mdadm.conf}} is commented out, and it contains just the following:<br />
<br />
{{hc|/etc/mdadm.conf|DEVICE partitions}}<br />
<br />
This directive tells mdadm to examine the devices referenced by {{ic|/proc/partitions}} and assemble as many arrays as possible. This is fine if you really do want to start all available arrays and are confident that no unexpected superblocks will be found (such as after installing a new storage device). A more precise approach is as follows:<br />
# echo 'DEVICE partitions' > /etc/mdadm.conf<br />
# mdadm --detail --scan >> /etc/mdadm.conf<br />
<br />
This results in something like the following:<br />
{{hc|/etc/mdadm.conf|2=<br />
DEVICE partitions<br />
ARRAY /dev/md/0 metadata=1.2 name=pine:0 UUID=27664f0d:111e493d:4d810213:9f291abe<br />
}}<br />
<br />
This also causes mdadm to examine the devices referenced by {{ic|/proc/partitions}}. However, only devices that have superblocks with a UUID of {{ic|27664…}} are assembled in to active arrays.<br />
<br />
See {{ic|mdadm.conf(5)}} for more information.<br />
<br />
===Assemble the Array===<br />
<br />
Once the configuration file has been updated the array can be assembled using mdadm:<br />
<br />
# mdadm --assemble --scan<br />
<br />
===Format the RAID Filesystem===<br />
<br />
The array can now be formatted like any other disk, just keep in mind that:<br />
* Due to the large volume size not all filesystems are suited (see: [[Wikipedia:Comparison of file systems#Limits|File system limits]]).<br />
* The filesystem should support growing and shrinking while online (see: [[Wikipedia:Comparison of file systems#Features|File system features]]).<br />
* One should calculate the correct stride and stripe-width for optimal performance.<br />
<br />
==== Calculating the Stride and Stripe Width ====<br />
<br />
Two parameters are required to optimise the filesystem structure to fit optimally within the underlying RAID structure: the ''stride'' and ''stripe width''. These are derived from the RAID ''chunk size'', the filesystem ''block size'', and the ''number of "data disks"''.<br />
<br />
The chunk size is a property of the RAID array, decided at the time of its creation. {{ic|mdadm}}'s current default is 512 KiB. It can be found with {{ic|mdadm}}:<br />
<br />
# mdadm --detail /dev/mdX | grep 'Chunk Size'<br />
<br />
The block size is a property of the filesystem, decided at ''its'' creation. The default for many filesystems, including ext4, is 4 KiB. See {{ic|/etc/mke2fs.conf}} for details on ext4.<br />
<br />
The number of "data disks" is the minimum number of devices in the array required to completely rebuild it without data loss. For example, this is N for a raid0 array of N devices and N-1 for raid5.<br />
<br />
Once you have these three quantities, the stride and the stripe width can be calculated using the following formulas:<br />
<br />
stride = chunk size / block size<br />
stripe width = number of data disks * stride<br />
<br />
<br />
===== Example 1. RAID0 =====<br />
<br />
Example formatting to ext4 with the correct stripe width and stride:<br />
* Hypothetical RAID0 array is composed of 2 physical disks.<br />
* Chunk size is 64 KiB.<br />
* Block size is 4 KiB.<br />
<br />
stride = chunk size / block size.<br />
In this example, the math is 64/4 so the stride = 16.<br />
<br />
stripe width = # of physical '''data''' disks * stride.<br />
In this example, the math is 2*16 so the stripe width = 32.<br />
<br />
# mkfs.ext4 -v -L myarray -m 0.5 -b 4096 -E stride=16,stripe-width=32 /dev/md0<br />
<br />
===== Example 2. RAID5 =====<br />
<br />
Example formatting to ext4 with the correct stripe width and stride:<br />
* Hypothetical RAID5 array is composed of 4 physical disks; 3 data discs and 1 parity disc.<br />
* Chunk size is 512 KiB.<br />
* Block size is 4 KiB.<br />
<br />
stride = chunk size / block size.<br />
In this example, the math is 512/4 so the stride = 128.<br />
<br />
stripe width = # of physical '''data''' disks * stride.<br />
In this example, the math is 3*128 so the stripe width = 384.<br />
<br />
# mkfs.ext4 -v -L myarray -m 0.01 -b 4096 -E stride=128,stripe-width=384 /dev/md0<br />
<br />
For more on stride and stripe width, see: [http://wiki.centos.org/HowTos/Disk_Optimization RAID Math].<br />
<br />
===== Example 3. RAID10,far2 =====<br />
<br />
Example formatting to ext4 with the correct stripe width and stride:<br />
* Hypothetical RAID10 array is composed of 2 physical disks. Because of the properties of RAID10 in far2 layout, both count as data disks.<br />
* Chunk size is 512 KiB.<br />
# mdadm --detail /dev/md0 | grep 'Chunk Size'<br />
Chunk Size : 512K<br />
* Block size is 4 KiB.<br />
<br />
stride = chunk size / block size.<br />
In this example, the math is 512/4 so the stride = 128.<br />
<br />
stripe width = # of physical '''data''' disks * stride.<br />
In this example, the math is 2*128 so the stripe width = 256.<br />
<br />
# mkfs.ext4 -v -L myarray -m 0.01 -b 4096 -E stride=128,stripe-width=256 /dev/md0<br />
<br />
== Mounting from a Live CD ==<br />
<br />
Users wanting to mount the RAID partition from a Live CD, use:<br />
<br />
# mdadm --assemble /dev/<disk1> /dev/<disk2> /dev/<disk3> /dev/<disk4><br />
<br />
If your RAID 1 that's missing a disk array was wrongly auto-detected as RAID 1 (as per {{ic|mdadm --detail /dev/md<number>}}) and reported as inactive (as per {{ic|cat /proc/mdstat}}), stop the array first:<br />
<br />
# mdadm --stop /dev/md<number><br />
<br />
== Installing Arch Linux on RAID ==<br />
{{Note|The following section is applicable only if the root filesystem resides on the array. Users may skip this section if the array holds a data partition(s).}}<br />
You should create the RAID array between the [[Partitioning]] and [[File systems#Format a device|formatting]]{{Broken section link}} steps of the Installation Procedure. Instead of directly formatting a partition to be your root file system, it will be created on a RAID array.<br />
Follow the section [[#Setup]]{{Broken section link}} to create the RAID array. Then continue with the installation procedure until the pacstrap step is completed.<br />
When using [[Unified Extensible Firmware Interface|UEFI boot]], also read [[EFI System Partition#ESP on RAID|ESP on RAID]].<br />
<br />
=== Update configuration file ===<br />
{{Note|This should be done outside of the chroot, hence the prefix /mnt to the filepath.}}<br />
After the base system is installed the default configuration file, {{ic|mdadm.conf}}, must be updated like so:<br />
# mdadm --detail --scan >> /mnt/etc/mdadm.conf<br />
<br />
Always check the mdadm.conf configuration file using a text editor after running this command to ensure that its contents look reasonable.<br />
{{Note|To prevent failure of '''mdmonitor''' at boot (enabled by default), you will need to uncomment '''MAILADDR''' and provide an e-mail address and/or application to handle notification of problems with your array at the bottom of mdadm.conf.}}<br />
Continue with the installation procedure until you reach the step “Create initial ramdisk environment”, then follow the next section.<br />
<br />
=== Add mdadm hook to mkinitcpio.conf ===<br />
<br />
{{Accuracy|1=[[Mkinitcpio#Common hooks]] also suggests adding {{ic|mdmon}} to the {{ic|BINARIES}} section. See also [https://bbs.archlinux.org/viewtopic.php?id=148947] and [http://unix.stackexchange.com/questions/57440/cant-remount-local-file-systems-for-read-write-raid1].}}<br />
<br />
{{Note|This should be done whilst chrooted.}}<br />
<br />
Add {{ic|mdadm_udev}} to the [[Mkinitcpio#HOOKS|HOOKS]] section of the {{ic|mkinitcpio.conf}} to add support for mdadm directly into the init image:<br />
<br />
HOOKS="base udev autodetect block '''mdadm_udev''' filesystems usbinput fsck"<br />
<br />
Then [[Mkinitcpio#Image creation and activation|Regenerate the initramfs image]].<br />
<br />
=== Configure the boot loader ===<br />
<br />
{{Accuracy|1=[[Mkinitcpio#Using_RAID]] says kernel parameters are no longer needed. Also can refer to the RAID device via {{ic|/dev/md/label\:0}}}}<br />
<br />
Add an {{ic|md}} [[kernel parameter]] for each RAID array; also point the {{ic|root}} parameter to the correct mapped device. The following example accommodates three RAID 1 arrays and sets the second one as root:<br />
<br />
root=/dev/md1 md=0,/dev/sda2,/dev/sdb2 md=1,/dev/sda3,/dev/sdb3 md=2,/dev/sda4,/dev/sdb4<br />
<br />
If booting from a software raid partition fails using the kernel device node method above, an alternative and more reliable way is to use [[Persistent block device naming]], for example:<br />
<br />
root=LABEL=Root_Label<br />
<br />
See also [[GRUB#RAID]].<br />
<br />
== RAID Maintenance ==<br />
=== Scrubbing ===<br />
It is good practice to regularly run data [[wikipedia:Data_scrubbing|scrubbing]] to check for and fix errors. Depending on the size/configuration of the array, a scrub may take multiple hours to complete.<br />
<br />
To initiate a data scrub:<br />
# echo check > /sys/block/md0/md/sync_action<br />
<br />
The check operation scans the drives for bad sectors and automatically repairs them. If it finds good sectors that contain bad data (the data in a sector does not agree with what the data from another disk indicates that it should be, for example the parity block + the other data blocks would cause us to think that this data block is incorrect), then no action is taken, but the event is logged (see below). This "do nothing" allows admins to inspect the data in the sector and the data that would be produced by rebuilding the sectors from redundant information and pick the correct data to keep.<br />
<br />
As with many tasks/items relating to mdadm, the status of the scrub can be queried by reading {{ic|/proc/mdstat}}.<br />
<br />
Example:<br />
{{hc|$ cat /proc/mdstat|<nowiki><br />
Personalities : [raid6] [raid5] [raid4] [raid1] <br />
md0 : active raid1 sdb1[0] sdc1[1]<br />
3906778112 blocks super 1.2 [2/2] [UU]<br />
[>....................] check = 4.0% (158288320/3906778112) finish=386.5min speed=161604K/sec<br />
bitmap: 0/30 pages [0KB], 65536KB chunk<br />
</nowiki>}}<br />
<br />
To stop a currently running data scrub safely:<br />
# echo idle > /sys/block/md0/md/sync_action<br />
<br />
{{Note|If the system is rebooted after a partial scrub has been suspended, the scrub will start over.}}<br />
<br />
When the scrub is complete, admins may check how many blocks (if any) have been flagged as bad:<br />
# cat /sys/block/md0/md/mismatch_cnt<br />
<br />
==== General Notes on Scrubbing ====<br />
{{Note|Users may alternatively echo '''repair''' to /sys/block/md0/md/sync_action but this is ill-advised since if a mismatch in the data is encountered, it would be automatically updated to be consistent. The danger is that we really don't know whether it's the parity or the data block that's correct (or which data block in case of RAID1). It's luck-of-the-draw whether or not the operation gets the right data instead of the bad data.}}<br />
<br />
It is a good idea to set up a cron job as root to schedule a periodic scrub. See {{AUR|raid-check}} which can assist with this. To perform a periodic scrub using systemd timers instead of cron, See {{AUR|raid-check-systemd}} which contains the same script along with associated systemd timer unit files. (note: for typical platter drives, scrubbing can take approximately '''six seconds per gigabyte''' [that's one hour forty-five minutes per terabyte] so plan the start of your cron job or timer appropriately)<br />
<br />
==== RAID1 and RAID10 Notes on Scrubbing ====<br />
Due to the fact that RAID1 and RAID10 writes in the kernel are unbuffered, an array can have non-0 mismatch counts even when the array is healthy. These non-0 counts will only exist in transient data areas where they don't pose a problem. However, we can't tell the difference between a non-0 count that is just in transient data or a non-0 count that signifies a real problem. This fact is a source of false positives for RAID1 and RAID10 arrays. It is however still recommended to scrub regularly in order to catch and correct any bad sectors that might be present in the devices.<br />
<br />
===Removing Devices from an Array===<br />
One can remove a device from the array after marking it as faulty:<br />
# mdadm --fail /dev/md0 /dev/sdxx<br />
<br />
Now remove it from the array:<br />
# mdadm --remove /dev/md0 /dev/sdxx<br />
<br />
Remove device permanently (for example, to use it individually from now on):<br />
Issue the two commands described above then:<br />
<br />
# mdadm --zero-superblock /dev/sdxx<br />
<br />
{{Warning | '''DO NOT''' issue this command on linear or RAID0 arrays or data '''LOSS''' will occur!}}<br />
{{Warning | Reusing the removed disk without zeroing the superblock '''WILL CAUSE LOSS OF ALL DATA''' on the next boot. (After mdadm will try to use it as the part of the raid array).}}<br />
<br />
Stop using an array:<br />
# Umount target array<br />
# Stop the array with: {{ic|mdadm --stop /dev/md0}}<br />
# Repeat the three command described in the beginning of this section on each device.<br />
# Remove the corresponding line from /etc/mdadm.conf<br />
<br />
=== Adding a New Device to an Array ===<br />
Adding new devices with mdadm can be done on a running system with the devices mounted.<br />
Partition the new device using the same layout as one of those already in the arrays as discussed above.<br />
<br />
Assemble the RAID array if it is not already assembled:<br />
# mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1<br />
<br />
Add the new device the array:<br />
# mdadm --add /dev/md0 /dev/sdc1<br />
<br />
This should not take long for mdadm to do. Again, check the progress with:<br />
# cat /proc/mdstat<br />
<br />
Check that the device has been added with the command:<br />
# mdadm --misc --detail /dev/md0<br />
<br />
{{Note|For RAID0 arrays you may get the following error message:<br />
mdadm: add new device failed for /dev/sdc1 as 2: Invalid argument<br />
This is because the above commands will add the new disk as a "spare" but RAID0 doesn't have spares. If you want to add a device to a RAID0 array, you need to "grow" and "add" in the same command. This is demonstrated below:<br />
# mdadm --grow /dev/md0 --raid-devices<nowiki>=</nowiki>3 --add /dev/sdc1<br />
}}<br />
<br />
=== Increasing Size of a RAID Volume ===<br />
<br />
If larger disks are installed in a RAID array or partition size has been increased, it may be desirable to increase the size of the RAID volume to fill the larger available space. This process may be begun by first following the above sections pertaining to replacing disks. Once the RAID volume has been rebuilt onto the larger disks it must be "grown" to fill the space.<br />
# mdadm --grow /dev/md0 --size=max<br />
Next, partitions present on the RAID volume {{ic|/dev/md0}} may need to be resized. See [[Partitioning]] for details. Finally, the filesystem on the above mentioned partition will need to be resized. If partitioning was performed with {{ic|gparted}} this will be done automatically. If other tools were used, unmount and then resize the filesystem manually.<br />
{{bc|<br />
# umount /storage<br />
# fsck.ext4 -f /dev/md0p1<br />
# resize2fs /dev/md0p1<br />
}}<br />
<br />
=== Change sync speed limits ===<br />
<br />
Syncing can take a while. If the machine is not needed for other tasks the speed limit can be increased.<br />
<br />
{{hc|# cat /proc/mdstat|<nowiki><br />
Personalities : [raid1] <br />
md0 : active raid1 sda3[2] sdb3[1]<br />
155042219 blocks super 1.2 [2/1] [_U]<br />
[>....................] recovery = 0.0% (77696/155042219) finish=265.8min speed=9712K/sec<br />
<br />
unused devices: <none><br />
</nowiki>}}<br />
<br />
Check the current speed limit.<br />
<br />
{{hc|# cat /proc/sys/dev/raid/speed_limit_min|<br />
1000<br />
}}<br />
{{hc|# cat /proc/sys/dev/raid/speed_limit_max|<br />
200000<br />
}}<br />
<br />
Increase the limits.<br />
<br />
# echo 400000 >/proc/sys/dev/raid/speed_limit_min<br />
# echo 400000 >/proc/sys/dev/raid/speed_limit_max<br />
<br />
Then check out the syncing speed and estimated finish time.<br />
<br />
{{hc|# cat /proc/mdstat|<nowiki><br />
Personalities : [raid1] <br />
md0 : active raid1 sda3[2] sdb3[1]<br />
155042219 blocks super 1.2 [2/1] [_U]<br />
[>....................] recovery = 1.3% (2136640/155042219) finish=158.2min speed=16102K/sec<br />
<br />
unused devices: <none><br />
</nowiki>}}<br />
<br />
<br />
See also [[sysctl#MDADM]].<br />
<br />
== Monitoring ==<br />
A simple one-liner that prints out the status of the RAID devices:<br />
{{hc|awk '/^md/ {printf "%s: ", $1}; /blocks/ {print $NF}' </proc/mdstat<br />
|md1: [UU]<br />
md0: [UU]<br />
}}<br />
<br />
===Watch mdstat===<br />
{{bc|watch -t 'cat /proc/mdstat'}}<br />
Or preferably using {{pkg|tmux}}<br />
{{bc|tmux split-window -l 12 "watch -t 'cat /proc/mdstat'"}}<br />
<br />
===Track IO with iotop===<br />
The {{pkg|iotop}} package displays the input/output stats for processes. Use this command to view the IO for raid threads.<br />
<br />
{{bc|<nowiki>iotop -a -p $(sed 's, , -p ,g' <<<`pgrep "_raid|_resync|jbd2"`)</nowiki>}}<br />
<br />
===Track IO with iostat===<br />
<br />
The ''iostat'' utility from {{Pkg|sysstat}} package displays the input/output statistics for devices and partitions.<br />
<br />
iostat -dmy 1 /dev/md0<br />
iostat -dmy 1 # all<br />
<br />
===Mailing on events===<br />
A smtp mail server (sendmail) or at least an email forwarder (ssmtp/msmtp) is required to accomplish this. Perhaps the most simplistic solution is to use {{AUR|dma}} which is very tiny (installs to 0.08 MiB) and requires no setup.<br />
<br />
Edit {{ic|/etc/mdadm.conf}} defining the email address to which notifications will be received. <br />
{{Note|If using dma as mentioned above, users may simply mail directly to the username on the localhost rather than to an external email address.}}<br />
<br />
To test the configuration:<br />
# mdadm --monitor --scan --oneshot --test<br />
<br />
[[mdadm]] includes a systemd service (mdmonitor.service) to perform the monitoring task, so at this point, you have nothing left to do. If you do not set a mail address in {{ic|/etc/mdadm.conf}}, that service will fail. If you do not want to receive mail on mdadm events, the failure can be ignored; if you don't want notifications and are sensitive about failure messages, you can mask the unit.<br />
<br />
==== Alternative method ====<br />
<br />
To avoid the installation of a smtp mail server or an email forwarder you can use the [[S-nail]] tool (don't forget to setup) already on your system.<br />
<br />
Create a file named {{ic|/etc/mdadm_warning.sh}} with:<br />
<br />
#!/bin/bash<br />
event=$1<br />
device=$2<br />
<br />
echo " " | /usr/bin/mailx -s "$event on $device" '''destination@email.com'''<br />
<br />
And give it execution permissions {{ic|chmod +x /etc/mdadm_warning.sh}}<br />
<br />
Then add this to the mdadm.conf<br />
<br />
PROGRAM /etc/mdadm_warning.sh<br />
<br />
To test and enable use the same as in the previous method.<br />
<br />
==Troubleshooting==<br />
If you are getting error when you reboot about "invalid raid superblock magic" and you have additional hard drives other than the ones you installed to, check that your hard drive order is correct. During installation, your RAID devices may be hdd, hde and hdf, but during boot they may be hda, hdb and hdc. Adjust your kernel line accordingly. This is what happened to me anyway.<br />
<br />
===Error: "kernel: ataX.00: revalidation failed"===<br />
If you suddenly (after reboot, changed BIOS settings) experience Error messages like:<br />
<br />
Feb 9 08:15:46 hostserver kernel: ata8.00: revalidation failed (errno=-5)<br />
<br />
Is doesn't necessarily mean that a drive is broken. You often find panic links on the web which go for the worst. In a word, No Panic. Maybe you just changed APIC or ACPI settings within your BIOS or Kernel parameters somehow. Change them back and you should be fine. Ordinarily, turning ACPI and/orACPI off should help.<br />
<br />
===Start arrays read-only===<br />
When an md array is started, the superblock will be written, and resync may begin. To start read-only set the kernel module {{ic|md_mod}} parameter {{ic|start_ro}}. When this is set, new arrays get an 'auto-ro' mode, which disables all internal io (superblock updates, resync, recovery) and is automatically switched to 'rw' when the first write request arrives.<br />
<br />
{{Note|The array can be set to true 'ro' mode using {{ic|mdadm --readonly}} before the first write request, or resync can be started without a write using {{ic|mdadm --readwrite}}.}}<br />
<br />
To set the parameter at boot, add {{ic|<nowiki>md_mod.start_ro=1</nowiki>}} to your kernel line.<br />
<br />
Or set it at module load time from {{ic|/etc/modprobe.d/}} file or from directly from {{ic|/sys/}}.<br />
{{bc|echo 1 > /sys/module/md_mod/parameters/start_ro}}<br />
<br />
===Recovering from a broken or missing drive in the raid===<br />
You might get the above mentioned error also when one of the drives breaks for whatever reason. In that case you will have to force the raid to still turn on even with one disk short. Type this (change where needed):<br />
# mdadm --manage /dev/md0 --run<br />
<br />
Now you should be able to mount it again with something like this (if you had it in fstab):<br />
# mount /dev/md0<br />
<br />
Now the raid should be working again and available to use, however with one disk short! So, to add that one disc partition it the way like described above in [[#Prepare the Devices|Prepare the device]]. Once that is done you can add the new disk to the raid by doing:<br />
# mdadm --manage --add /dev/md0 /dev/sdd1<br />
<br />
If you type:<br />
# cat /proc/mdstat<br />
you probably see that the raid is now active and rebuilding.<br />
<br />
You also might want to update your configuration (see: [[#Update configuration file]]).<br />
<br />
== Benchmarking ==<br />
There are several tools for benchmarking a RAID. The most notable improvement is the speed increase when multiple threads are reading from the same RAID volume.<br />
<br />
{{AUR|tiobench}} specifically benchmarks these performance improvements by measuring fully-threaded I/O on the disk.<br />
<br />
{{Pkg|bonnie++}} tests database type access to one or more files, and creation, reading, and deleting of small files which can simulate the usage of programs such as Squid, INN, or Maildir format e-mail. The enclosed [http://www.coker.com.au/bonnie++/zcav/ ZCAV] program tests the performance of different zones of a hard drive without writing any data to the disk.<br />
<br />
{{ic|hdparm}} should '''NOT''' be used to benchmark a RAID, because it provides very inconsistent results.<br />
<br />
== See also ==<br />
* [http://www.gentoo.org/doc/en/articles/software-raid-p1.xml Software RAID in the new Linux 2.4 kernel, Part 1] and [http://www.gentoo.org/doc/en/articles/software-raid-p2.xml Part 2] in the Gentoo Linux Docs<br />
* [http://raid.wiki.kernel.org/index.php/Linux_Raid Linux RAID wiki entry] on The Linux Kernel Archives<br />
* [https://raid.wiki.kernel.org/index.php/Write-intent_bitmap How Bitmaps Work]<br />
* [http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-raid.html Chapter 15: Redundant Array of Independent Disks (RAID)] of Red Hat Enterprise Linux 6 Documentation<br />
* [http://tldp.org/FAQ/Linux-RAID-FAQ/x37.html Linux-RAID FAQ] on the Linux Documentation Project<br />
* [http://support.dell.com/support/topics/global.aspx/support/entvideos/raid?c=us&l=en&s=gen Dell.com Raid Tutorial] - Interactive Walkthrough of Raid<br />
* [http://www.miracleas.com/BAARF/ BAARF] including ''[http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt Why should I not use RAID 5?]'' by Art S. Kagel<br />
* [http://www.linux-mag.com/id/7924/ Introduction to RAID], [http://www.linux-mag.com/id/7931/ Nested-RAID: RAID-5 and RAID-6 Based Configurations], [http://www.linux-mag.com/id/7928/ Intro to Nested-RAID: RAID-01 and RAID-10], and [http://www.linux-mag.com/id/7932/ Nested-RAID: The Triple Lindy] in Linux Magazine<br />
* [http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html HowTo: Speed Up Linux Software Raid Building And Re-syncing]<br />
* [http://fomori.org/blog/?p=94 RAID5-Server to hold all your data]<br />
<br />
'''Active bugs'''<br />
* [http://marc.info/?l=linux-raid&m=144960710718870&w=2 linux-raid ML #1]<br />
* [http://marc.info/?l=linux-raid&m=144830809503689&w=2 linux-raid ML #2]<br />
<br />
'''mdadm'''<br />
* [http://anonscm.debian.org/gitweb/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD Debian mdadm FAQ]<br />
* [http://www.kernel.org/pub/linux/utils/raid/mdadm/ mdadm source code]<br />
* [http://www.linux-mag.com/id/7939/ Software RAID on Linux with mdadm] in Linux Magazine<br />
<br />
'''Forum threads'''<br />
* [http://forums.overclockers.com.au/showthread.php?t=865333 Raid Performance Improvements with bitmaps]<br />
* [https://bbs.archlinux.org/viewtopic.php?id=125445 GRUB and GRUB2]<br />
* [https://bbs.archlinux.org/viewtopic.php?id=123698 Can't install grub2 on software RAID]<br />
* [http://forums.gentoo.org/viewtopic-t-888624-start-0.html Use RAID metadata 1.2 in boot and root partition]<br />
<br />
'''RAID with encryption'''<br />
* [http://www.shimari.com/dm-crypt-on-raid/ Linux/Fedora: Encrypt /home and swap over RAID with dm-crypt] by Justin Wells</div>Wribbe