Difference between revisions of "Talk:PCI passthrough via OVMF"

From ArchWiki
m (Passthrough via OVMF without libvirt: forgot to sign)
m (Passthrough via OVMF without libvirt: copy the vars)

Revision as of 16:08, 8 May 2017

Supported Hardware Table

I think it would be a good idea to add a table of working / not working hardware like this:

Motherboard | CPU | Host GPU | Client GPU | uname -r | ACS Override needed | Notes
ASUS Z170-A | Intel i7-6700K | Integrated | ASUS Strix GTX 980 Ti | 4.3.3-3-ARCH | no | No problems

--Phiresky (talk) 12:28, 29 January 2016 (UTC)

Well, there's: https://docs.google.com/spreadsheets/d/1LnGpTrXalwGVNy0PWJDURhyxa3sgqkGXmvNCIvIMenk/edit#gid=2 and it has (if I read it properly) 208 unique motherboards. I'm not sure if Wiki can handle such a long list. Annisar (talk)
Huh, looks like I missed that. Yeah, that list is probably too long. Kind of unclean / hard to read and can't edit it though. Phiresky (talk) 13:20, 29 January 2016 (UTC)

Page rewrite

I've fiddled a lot with vfio in the last few months and I've been thinking about restructuring this page based on the information I've gathered over time. Considering large chunks of the page date all the way back to when it was first written, and that the structure of the page isn't as researched as what you'd find on the rest of the wiki, I think a restructuring could greatly improve the overall understandability of the page. Here's the overall structure I had in mind:

  • Prerequisites
  • Setting up IOMMU
    • Enabling IOMMU
    • Ensuring that the groups are valid
    • Common mistakes/Gotchas
  • Isolating the GPU
    • Using vfio-pci (Recommended, Linux 4.1+)
    • Using pci-stub (Legacy)
    • Common mistakes/Gotchas
  • Setting up QEMU-kvm
    • Using libvirt
    • Using the qemu command line
  • Troubleshooting
    • Error 43 in Windows on Nvidia cards
    • "Invalid ROM content" in dmesg when trying to claim the guest GPU
    • ...
  • Extras
    • ACS override patch
    • Improving performance on the VM

I've already written a good chunk of what's left, but I'd like some feedback on the proposed structure and what's already there before I proceed. Victoire (talk) 20:55, 8 April 2016 (UTC)

Due to a lack of technical background, I can't really comment on completeness of your draft TOC compared to what the article covers, but it reads very structured. I had a look over the last edits you already committed to the article. Well done, I only want to give a couple of hints:
  1. Please take care when moving sections and editing, best split the move into its own commit. It is very difficult for someone else (or yourself later) to follow-up on what has been done (see ArchWiki:Contributing#Do not make complex edits at once).
  2. Point 1 is particularly important if you decide content is outdated and remove it during the restructuring. Usually we put Template:Deletion to the part first. Maybe you can do that as well. However, during a restructure the content might be in the way. You can also use a temporary section e.g. "Obsoleted sections" to move the sections to for the time being. If you delete parts, please do so only per subsection (edit button next to the header). This way it is easier to figure via history what went where.
  3. For the language, simple one: We avoid contractions (Help:Style#Language register)
  4. Looking at the existing article, there are some sections (e.g. PCI passthrough via OVMF#Complete example for QEMU .28CLI-based.29 without libvirtd) with very very long code blocks. Too long for good reading. Two alternatives for those (if you need them in the anticipated structure): Either move the long example code blocks to the end of the article (e.g. the Extras section) and crosslink them from above, quoting only the required excerpts. Or quote as well, but move the actual full code to a non-ad-spoiled gist (e.g. gist.github.com). If you move them outside the wiki, please do the removal/replacement with the gist link in one edit (same reason as for deletions, thanks).
  5. In the other talk items there are a few suggestions, e.g. #Article_Specificity and #Supported_Hardware_Table that might be useful to consider going forward.
That said, I hope your push to restructure and refresh the article gets input and help from other contributors to this article. --Indigo (talk) 18:49, 18 April 2016 (UTC)
Thanks for all of the contributions, please remember to keep the scope of the page inside its original design: to achieve PCI passthrough via OVMF. I would like to have the long blocks of code removed/relocated, especially those that do not necessarily explain the actual process of getting passthrough to work. Thanks!
—This unsigned comment is by Naruni (talk) 20:42, 24 April 2016‎. Please sign your posts with ~~~~!
I think it may have lost some focus because OVMF was not explicitly pointed to in the intro. As suggested in #Article_Specificity, it may be useful to split content unrelated to OVMF into a separate article, e.g. PCI passthrough, and this article may stay single or become a subpage of it (e.g. PCI passthrough/OVMF), whatever is best to keep focus, while not duplicating instructions or losing contributions which are not directly OVMF related.
If you look at the above structure Victoire has worked out, does it contain sections you would consider out-of-scope for OVMF?
--Indigo (talk) 13:04, 26 April 2016 (UTC)
So I have reached the point where most of the structure is now in place and most of the work left is either related to addressing specific issues people are likely to encounter or rephrasing some parts to make them more readable, such as the part on setting up the guest OS, which could use some fleshing out.
I have also taken the liberty of adding a Performance Tuning section, which may or may not be out of place considering the scope of the article. I would appreciate some feedback on whether or not this belongs here.
Also, both for the sake of readability and because I couldn't test the instructions myself, I removed most of the QEMU-related (without libvirt) instructions. While it seemed like a good decision at the time, I would like to get some feedback on this, as it is still a removal of potentially useful instructions (although some of them did seem dubious).
Victoire (talk) 14:07, 9 May 2016 (UTC)
I was hoping others with experience on the topic would chime in with feedback; perhaps that will still happen. Arguably the whole article is about performance tuning, so I would consider the section you added in scope. It is an interesting read even for someone like me, who has not used any of this. For the QEMU point I'm unable to give feedback. --Indigo (talk) 08:02, 30 June 2016 (UTC)
I agree that some of the QEMU scripts removed from the article were too detailed and still not explained well enough, but originally when I was setting up my passthrough system with the help of this article, I preferred the scripted way instead of libvirt and still do after a long good experience with the method. At the time the scripts in the article pointed me in the right direction. I'd really like to contribute to the article with smaller and simpler script examples with decent explanations for using every core parameter needed for the VM started with QEMU commands, so anyone with an opinion about this please comment before I start working on it. The scripted way is great for many use cases and I really think it needs to be addressed better in this article. Nd43 (talk) 07:28, 20 April 2017 (UTC)

Additional sections

In case I forget, I made a list of things I wanted to add to this article some time ago, I just haven't found the time to write those parts yet.

  • Performance tuning
    • Kernel config (Voluntary preemption, 1000 Hz timer frequency, dynticks, halt_poll_ns, etc.)
    • Advanced CPUs features
      • 1GB hugepages
      • NUMA node assignment
      • Hardware virtual interrupt management (AVIC/APICv)
  • Special Procedures
    • Using identical guest and host GPUs
    • Passing the boot GPU (see here)
    • Bypassing the IOMMU groups (ACS override patch)
  • Additional devices
    • GPU soundcard (MSI interrupts)

As of now, I don't have the sort of hardware that would allow me to test these special cases, so it's a bit hard for me to justify writing those sections, but it might be interesting to add those someday.

Victoire (talk) 02:16, 5 August 2016 (UTC)

EDIT1 : Victoire (talk) 03:10, 9 August 2016 (UTC)

hotplug gpu

Found something interesting for hotplugging the GPU without restarting X:

http://arseniyshestakov.com/2016/03/31/how-to-pass-gpu-to-vm-and-back-without-x-restart/ repo: https://gist.github.com/ArseniyShestakov/dc152d080c65ebaa6781

It removes the card from the graphics driver module (here radeon) and adds it to the vfio module.

I'm currently testing this. —This unsigned comment is by Xtf (talk) 22:25, 12 August 2016‎. Please sign your posts with ~~~~!
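
For reference, the general mechanism described above (detaching a device from its graphics driver and handing it to vfio-pci) can be sketched with the standard sysfs PCI interfaces. This is only a minimal illustration under stated assumptions, not the script from the linked post: the function name `rebind_to_vfio` and its optional second argument (a stand-in root directory so the logic can be tried against a fake tree) are made up for this example. On a real host it would need root, the vfio-pci module loaded, and nothing actively using the card.

```shell
#!/bin/sh
# Sketch: hand a PCI device from its current driver to vfio-pci via sysfs.
# The second argument is a hypothetical override used only for testing;
# on a real host the PCI bus directory is /sys/bus/pci.
rebind_to_vfio() {
    addr="$1"                      # full PCI address, e.g. 0000:05:00.0
    bus="${2:-/sys/bus/pci}"
    dev="$bus/devices/$addr"

    [ -d "$dev" ] || { echo "no such device: $addr" >&2; return 1; }

    # Detach the device from whatever driver currently owns it, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Restrict driver matching for this device to vfio-pci, then re-probe it.
    echo "vfio-pci" > "$dev/driver_override"
    echo "$addr" > "$bus/drivers_probe"
}
```

A call such as `rebind_to_vfio 0000:05:00.0` would be the host-side step before starting the VM; giving the card back to the host driver would presumably be the same sequence with `driver_override` cleared, which is not shown here.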

Necessity of ovmf-git

Right now, the article recommends using a package from the AUR, ovmf-git. The only reason for that is that this version comes with the EFI vars template files while the ovmf package from the extra repository doesn't. I'm not entirely sure how complex those templates are and if there is a way to install OVMF properly without them, though, and it would definitely be best if we could do without the use of an AUR package. Victoire (talk) 17:40, 22 November 2016 (UTC)

I agree, but has the current maintainer of ovmf been made aware of this issue yet?
--Lineks (talk) 17:33, 9 April 2017 (UTC)

Inaccurate hugepage advice

The static hugepage section says: "On a VM with a PCI passthrough, however, it is not possible to benefit from transparent huge pages, as IOMMU requires that the guest's memory be allocated and pinned as soon as the VM starts. It is therefore required to allocate huge pages statically in order to benefit from them."

This may have been true previously but in my experience (currently verified on an Ubuntu Trusty box with kernel 4.4) this is no longer accurate:

gpu-hypervisor:~$ uname -a
Linux rcgpudc1r54-07 4.4.0-59-generic #80~14.04.1-Ubuntu SMP Fri Jan 6 18:02:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
gpu-hypervisor:~$ ps auxww | grep qemu | grep -v grep
libvirt+ 105915  297 95.3 325650548 251763980 ? SLl  Apr10 9725:30 /usr/bin/qemu-system-x86_64 -name instance-00000167 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu host -m 245760 -realtime mlock=off -smp 24,sockets=2,cores=12,threads=1 -object memory-backend-ram,id=ram-node0,size=128849018880,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-11,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=128849018880,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=12-23,memdev=ram-node1 -uuid 87681aae-2bc7-4b2e-b17b-f407cf23701e -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=12.0.4,serial=4c4c4544-0059-4710-8036-c3c04f483832,uuid=87681aae-2bc7-4b2e-b17b-f407cf23701e,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-00000167/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/disk,format=raw,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/disk.eph0,format=raw,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:5e:21:1c,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=05:00.0,id=hostdev0,bus=pci.0,addr=0x6 -device vfio-pci,host=06:00.0,id=hostdev1,bus=pci.0,addr=0x7 -device vfio-pci,host=85:00.0,id=hostdev2,bus=pci.0,addr=0x8 -device vfio-pci,host=86:01.1,id=hostdev3,bus=pci.0,addr=0x9 -device vfio-pci,host=84:00.0,id=hostdev4,bus=pci.0,addr=0xa -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb -msg timestamp=on
gpu-hypervisor:~$ grep AnonHuge /proc/meminfo
AnonHugePages:  246108160 kB

I don't have an Arch system to verify this on, but I don't expect it to be distro specific. Any objection to removing this?

Have you tested memory performance inside the VM? As in, statically allocated hugepages vs transparent, with VFIO devices bound and actively used inside the VM? DragoonAethis (talk) 11:18, 13 April 2017 (UTC)
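
To make the check above easier to repeat on other hosts, here is a small sketch that reads the same `AnonHugePages` counter and converts it to GiB. The function name `anon_huge_gib` and its optional file argument are assumptions added for illustration (the argument only exists so the parsing can be tried on a saved copy of `/proc/meminfo`). It says nothing about memory performance inside the guest, which is the open question raised in the reply.

```shell
#!/bin/sh
# Sketch: report how much anonymous memory is currently backed by
# transparent huge pages, in whole GiB (rounded down).
# The optional argument is a hypothetical parameter for testing; by default
# the live /proc/meminfo is read.
anon_huge_gib() {
    meminfo="${1:-/proc/meminfo}"
    # AnonHugePages is reported in kB; 1 GiB = 1048576 kB.
    awk '/^AnonHugePages:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}
```

With the value quoted in the thread (246108160 kB) this prints 234, which is close to the 240 GiB (`-m 245760`) given to the guest. The THP mode in effect on the host can be read from `/sys/kernel/mm/transparent_hugepage/enabled`.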

Using identical guest and host GPUs - did not work for me.

I had to create the script vfio-pci-override.sh in /bin, not /sbin, or the file would not be found (and update the modprobe and mkinitcpio configuration accordingly).

Furthermore, I had to change the /sys/devices path to search one directory deeper. The script /bin/vfio-pci-override.sh then looks like this:

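#!/bin/sh

# Bind every non-boot GPU (boot_vga = 0) and its audio function to vfio-pci.
# The glob searches one directory level deeper than the version in the article.
for i in /sys/devices/pci*/*/*/boot_vga; do
        if [ "$(cat "$i")" -eq 0 ]; then
                GPU="${i%/boot_vga}"
                # The audio function has the same address with function 1 instead of 0.
                AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
                echo "vfio-pci" > "$GPU/driver_override"
                if [ -d "$AUDIO" ]; then
                        echo "vfio-pci" > "$AUDIO/driver_override"
                fi
        fi
done

modprobe -i vfio-pci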


This situation is the only one that worked for me. Eggz (talk) 19:12, 14 April 2017 (UTC)
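
Since the required search depth apparently differs between machines, a read-only dry run may help to see which devices such an override script would claim before putting it into the initramfs. The sketch below is an assumption-laden illustration, not part of the article: `list_override_targets` and its optional root argument are hypothetical names, and it checks both nesting depths discussed in this thread without writing to `driver_override`.

```shell
#!/bin/sh
# Sketch: print the devices an override script of this kind would bind to
# vfio-pci, without changing anything.
# The optional argument is a hypothetical stand-in for /sys/devices so the
# logic can be tried against a fake directory tree.
list_override_targets() {
    root="${1:-/sys/devices}"
    # Look at both depths: pci*/*/boot_vga and pci*/*/*/boot_vga.
    for f in "$root"/pci*/*/boot_vga "$root"/pci*/*/*/boot_vga; do
        [ -e "$f" ] || continue            # skip patterns that matched nothing
        if [ "$(cat "$f")" -eq 0 ]; then   # 0 means: not the boot GPU
            gpu="${f%/boot_vga}"
            audio="$(echo "$gpu" | sed -e 's/0$/1/')"
            echo "$gpu"
            if [ -d "$audio" ]; then
                echo "$audio"
            fi
        fi
    done
    return 0
}
```

Running `list_override_targets` with no argument on the host lists the candidate GPU and audio paths; if the guest GPU does not show up, the glob depth is probably still wrong for that board.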

Passthrough via OVMF without libvirt

I don't have a graphics card with UEFI firmware to test this, but it should work (with the ovmf package and not ovmf-git). Please test it and add it to the page if it does work.

qemu-system-x86_64 \
    -enable-kvm \
    -m 8192 \
    -M q35 \
    -cpu host \
    -smp 4,sockets=1,cores=4,threads=1 \
    -vga none \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/ovmf_code_x64.bin \
    -drive if=pflash,format=raw,file=/tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin \
    -device vfio-pci,host=05:00.0,addr=09.0,multifunction=on,x-vga=on \
    -device vfio-pci,host=05:00.1,addr=09.1 \
    -drive file=diskimg.qcow2,format=qcow2

Dhead (talk) 15:58, 8 May 2017 (UTC)

And you should copy the vars first:
mkdir -p /tmp/ovmf/qemu-vm-01
cp /usr/share/ovmf/ovmf_vars_x64.bin /tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin
Dhead (talk) 16:08, 8 May 2017 (UTC)
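
If the command ends up in a start script, the copy step could be made repeatable so that a vars file written by an earlier boot is not overwritten. The following is a minimal sketch under that assumption; `prepare_ovmf_vars` is a hypothetical helper name, and the paths in the comments are the ones from the qemu command above (untested with real firmware, like the command itself).

```shell
#!/bin/sh
# Sketch: create the per-VM copy of the OVMF variable store only if it does
# not exist yet, so UEFI settings saved by a previous run are kept.
prepare_ovmf_vars() {
    template="$1"   # e.g. /usr/share/ovmf/ovmf_vars_x64.bin
    target="$2"     # e.g. /tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin

    [ -r "$template" ] || { echo "template not readable: $template" >&2; return 1; }
    mkdir -p "$(dirname "$target")" || return 1
    if [ ! -e "$target" ]; then
        cp "$template" "$target"
    fi
}
```

A start script would call `prepare_ovmf_vars /usr/share/ovmf/ovmf_vars_x64.bin /tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin` before invoking qemu-system-x86_64. Note that anything under /tmp does not usually survive a reboot, so a different target directory may be preferable if the UEFI settings should persist.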