Talk:PCI passthrough via OVMF

From ArchWiki
Revision as of 14:06, 11 July 2017 by Dhead (talk | contribs) (→‎Moving the full tutorial elsewhere?: moved, no, copied and expanded elsewhere, yes)

Supported Hardware Table

I think it would be a good idea to add a table of working / not working hardware like this:

Motherboard | CPU | Host GPU | Client GPU | uname -r | ACS override needed | Notes
ASUS Z170-A | Intel i7-6700K | Integrated | ASUS Strix GTX 980 Ti | 4.3.3-3-ARCH | no | No problems

--Phiresky (talk) 12:28, 29 January 2016 (UTC)

Well, there's: and it has (if I read it properly) 208 unique motherboards. I'm not sure the wiki can handle such a long list. Annisar (talk)
Huh, looks like I missed that. Yeah, that list is probably too long. It is kind of unclean and hard to read, and we can't edit it, though. Phiresky (talk) 13:20, 29 January 2016 (UTC)

Page rewrite

I've fiddled a lot with VFIO in the last few months and I've been thinking about restructuring this page based on the information I've gathered over time. Considering that large chunks of the page date all the way back to when it was first written, and that the structure of the page isn't as well researched as what you'd find on the rest of the wiki, I think restructuring could greatly improve the overall understandability of the page. Here's the overall structure I had in mind:

  • Setting up IOMMU
    • Enabling IOMMU
    • Ensuring that the groups are valid
    • Common mistakes/Gotchas
  • Isolating the GPU
    • Using vfio-pci (Recommended, Linux 4.1+)
    • Using pci-stub (Legacy)
    • Common mistakes/Gotchas
  • Setting up QEMU-kvm
    • Using libvirt
    • Using the qemu command line
  • Error 43 in Windows on Nvidia cards
  • "Invalid ROM content" in dmesg when trying to claim the guest GPU
  • ACS override patch
  • Improving performance on the VM

I've already written a good chunk of what's left, but I'd like some feedback on the proposed structure and what's already there before I proceed. Victoire (talk) 20:55, 8 April 2016 (UTC)
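For the "Ensuring that the groups are valid" step in the outline, a minimal sketch of how the groups can be listed, assuming the standard sysfs layout /sys/kernel/iommu_groups/<group>/devices/<address>. IOMMU_ROOT and list_iommu_groups are hypothetical names added only so the function can be pointed at a test directory. It prints bare PCI addresses; `lspci -nns <address>` can then identify each device.

```shell
#!/bin/sh
# Sketch: print every PCI device together with its IOMMU group number,
# to check that the GPU (and its audio function) do not share a group
# with devices the host still needs.
# IOMMU_ROOT is a hypothetical override for testing; defaults to sysfs.
list_iommu_groups() {
    root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing: IOMMU likely off
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        echo "group ${group##*/}: ${dev##*/}"
    done
}
```

If the GPU turns out to share a group with host-critical devices, that is the case the "ACS override patch" item in the outline is meant to cover.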

Due to a lack of technical background, I can't really comment on completeness of your draft TOC compared to what the article covers, but it reads very structured. I had a look over the last edits you already committed to the article. Well done, I only want to give a couple of hints:
  1. Please take care when moving sections and editing; it is best to split the move into its own commit. It is very difficult for someone else (or yourself later) to follow up on what has been done (see ArchWiki:Contributing#Do not make complex edits at once).
  2. Point 1 is particularly important if you decide content is outdated and remove it during the restructuring. Usually we add Template:Deletion to the part first. Maybe you can do that as well. However, during a restructure the content might be in the way. You can also use a temporary section, e.g. "Obsoleted sections", to move the sections to for the time being. If you delete parts, please do so only per subsection (edit button next to the header). This way it is easier to figure out from the history what went where.
  3. For the language, simple one: We avoid contractions (Help:Style#Language register)
  4. Looking at the existing article, there are some sections (e.g. PCI passthrough via OVMF#Complete example for QEMU (CLI-based) without libvirtd) with very, very long code blocks, too long for good reading. There are two alternatives for those (if you need them in the anticipated structure): either move the long example code blocks to the end of the article (e.g. the Extras section) and crosslink them from above, quoting only the required excerpts, or quote as well but move the actual full code to a non-ad-spoiled gist (e.g. ...). If you move them outside the wiki, please do the removal/replacement with the gist link in one edit (same reason as for deletions, thanks).
  5. In the other talk items there are a few suggestions, e.g. #Article_Specificity and #Supported_Hardware_Table that might be useful to consider going forward.
That said, I hope your push to restructure and refresh the article gets input and help from other contributors to this article. --Indigo (talk) 18:49, 18 April 2016 (UTC)
Thanks for all of the contributions, please remember to keep the scope of the page within its original design: to achieve PCI passthrough via OVMF. I would like to have the long blocks of code removed/relocated, especially those that do not necessarily explain the actual process of getting passthrough to work. Thanks!
—This unsigned comment is by Naruni (talk) 20:42, 24 April 2016‎. Please sign your posts with ~~~~!
I think it may have lost some focus because OVMF was not explicitly pointed to in the intro. As suggested in #Article_Specificity, it may be useful to split content unrelated to OVMF into a separate article, e.g. PCI passthrough, and this article may stay standalone or become a subpage of it (e.g. PCI passthrough/OVMF), whatever is best to keep focus, while not duplicating instructions or losing contributions which are not directly OVMF related.
If you look at the above structure Victoire has worked out, does it contain sections you would consider out-of-scope for OVMF?
--Indigo (talk) 13:04, 26 April 2016 (UTC)
So I have reached the point where most of the structure is now in place, and most of the remaining work is either addressing specific issues people are likely to encounter or rephrasing some parts to make them more readable, such as the part on setting up the guest OS, which could use some fleshing out.
I have also taken the liberty of adding a Performance Tuning section, which may or may not be out of place considering the scope of the article. I would appreciate some feedback on whether or not it belongs here.
Also, both for the sake of readability and because I couldn't test the instructions myself, I removed most of the QEMU-related (without libvirt) instructions. While it seemed like a good decision at the time, I would like some feedback on this, as it is still a removal of potentially useful instructions (although some of them did seem dubious).
Victoire (talk) 14:07, 9 May 2016 (UTC)
I was hoping others with topic experience would chime in with feedback; perhaps that will still happen. Arguably the whole article is about performance tuning, so I would consider the section you added in scope. It is an interesting read even for someone like me, who has not used any of this. On the QEMU point I'm unable to give feedback. --Indigo (talk) 08:02, 30 June 2016 (UTC)
I agree that some of the QEMU scripts removed from the article were too detailed and still not explained well enough, but originally when I was setting up my passthrough system with the help of this article, I preferred the scripted way instead of libvirt and still do after a long good experience with the method. At the time the scripts in the article pointed me to the right direction. I'd really like to contribute to the article with smaller and simpler script examples with decent explanations for using every core parameter needed for the VM started with QEMU commands, so anyone with an opinion about this please comment before I start working on it. The scripted way is great for many use cases and I really think it needs to be addressed better on this article. Nd43 (talk) 07:28, 20 April 2017 (UTC)

Additional sections

In case I forget, I made a list of things I wanted to add to this article some time ago, I just haven't found the time to write those parts yet.

  • Performance tuning
    • Kernel config (Voluntary preemption, 1000 Hz timer frequency, dynticks, halt_poll_ns, etc.)
    • Advanced CPUs features
      • 1GB hugepages
      • NUMA node assignment
      • Hardware virtual interrupt management (AVIC/APICv)
  • Special Procedures
    • Using identical guest and host GPUs
    • Passing the boot GPU (see here)
    • Bypassing the IOMMU groups (ACS override patch)
  • Additional devices
    • GPU soundcard (MSI interrupts)

As of now, I don't have the sort of hardware that would allow me to test these special cases, so it's a bit hard for me to justify writing those sections, but it might be interesting to add those someday.

Victoire (talk) 02:16, 5 August 2016 (UTC)

EDIT 1: Victoire (talk) 03:10, 9 August 2016 (UTC)
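Several of the kernel config items in the list above can be checked against a given kernel before those sections are written. A minimal sketch, assuming the Kconfig symbols CONFIG_PREEMPT_VOLUNTARY, CONFIG_HZ_1000 and CONFIG_NO_HZ_FULL are what "voluntary preemption", "1000 Hz timer frequency" and "dynticks" refer to (dynticks could also mean CONFIG_NO_HZ_IDLE); check_kconfig is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report how selected options are set in a kernel .config file.
# Prints the matching line, or a note if the option does not appear at all.
check_kconfig() {
    config="$1"; shift
    for opt in "$@"; do
        # Match "CONFIG_X=..." or the "# CONFIG_X is not set" form.
        line="$(grep -E "^${opt}=|^# ${opt} is not set" "$config")"
        if [ -n "$line" ]; then
            echo "$line"
        else
            echo "# $opt not found"
        fi
    done
}
```

On kernels built with CONFIG_IKCONFIG_PROC the running config is exposed as /proc/config.gz, so something like `zcat /proc/config.gz > /tmp/kconfig && check_kconfig /tmp/kconfig CONFIG_PREEMPT_VOLUNTARY CONFIG_HZ_1000 CONFIG_NO_HZ_FULL` would work there. halt_poll_ns is not a build option but a kvm module parameter, readable at /sys/module/kvm/parameters/halt_poll_ns.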

Hotplug GPU

Found something interesting for hotplugging the GPU without restarting X: repo:

It removes the card from the graphics driver module (here radeon) and adds it to the vfio module.

I'm currently testing this. —This unsigned comment is by Xtf (talk) 22:25, 12 August 2016‎. Please sign your posts with ~~~~!
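The linked repository is not reproduced here, but the general sysfs mechanism for moving a device from its current driver to vfio-pci can be sketched as follows. This is an assumption about how such a tool works, not its actual code: it relies on the kernel's driver_override attribute, assumes vfio-pci is already loaded and that the host no longer uses the card, and SYSFS_ROOT and rebind_to_vfio are hypothetical names (SYSFS_ROOT exists only for testing).

```shell
#!/bin/sh
# Sketch: hand one PCI function from its current driver (e.g. radeon)
# to vfio-pci via sysfs. Must run as root against the real /sys.
# SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.
rebind_to_vfio() {
    dev="$1"                                   # full address, e.g. 0000:01:00.0
    root="${SYSFS_ROOT:-/sys}"
    devdir="$root/bus/pci/devices/$dev"
    [ -d "$devdir" ] || { echo "no such device: $dev" >&2; return 1; }
    # Only vfio-pci may claim this device from now on.
    echo vfio-pci > "$devdir/driver_override"
    # Detach it from whatever driver currently owns it, if any.
    if [ -e "$devdir/driver" ]; then
        echo "$dev" > "$devdir/driver/unbind"
    fi
    # Ask the PCI core to re-probe the device, which now matches vfio-pci.
    echo "$dev" > "$root/bus/pci/drivers_probe"
}
```

Usage would be `rebind_to_vfio 0000:01:00.0` followed by the same call for the card's audio function (e.g. 0000:01:00.1); these addresses are placeholders.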

Necessity of ovmf-git

Right now, the article recommends using a package from the AUR, ovmf-gitAUR. The only reason for that is because this version comes with the EFI var template files while the ovmf package from the Extra repos doesn't. I'm not entirely sure how complex those templates are and if there is a way to install OVMF properly without them, though, and it would definitely be best if we could do without the use of an AUR package. Victoire (talk) 17:40, 22 November 2016 (UTC)

I agree, but has the current maintainer of ovmf been made aware of this issue yet?
--Lineks (talk) 17:33, 9 April 2017 (UTC)
I think this was fixed in the latest update in March. See
Bug report closed as implemented:
I'll test and confirm.
--WeissJT (talk) 23:15, 8 May 2017 (UTC)
Confirmed, ovmf works fine. ovmf-gitAUR is not needed anymore.
--WeissJT (talk) 23:17, 12 May 2017 (UTC)
Thanks for the confirmation, made the edit on the plain QEMU section too. Nd43 (talk) 07:30, 13 May 2017 (UTC)

Inaccurate hugepage advice

The static hugepage section says: "On a VM with a PCI passthrough, however, it is not possible to benefit from transparent huge pages, as IOMMU requires that the guest's memory be allocated and pinned as soon as the VM starts. It is therefore required to allocate huge pages statically in order to benefit from them."

This may have been true previously but in my experience (currently verified on an Ubuntu Trusty box with kernel 4.4) this is no longer accurate:

gpu-hypervisor:~$ uname -a
Linux rcgpudc1r54-07 4.4.0-59-generic #80~14.04.1-Ubuntu SMP Fri Jan 6 18:02:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
gpu-hypervisor:~$ ps auxww | grep qemu | grep -v grep
libvirt+ 105915  297 95.3 325650548 251763980 ? SLl  Apr10 9725:30 /usr/bin/qemu-system-x86_64 -name instance-00000167 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu host -m 245760 -realtime mlock=off -smp 24,sockets=2,cores=12,threads=1 -object memory-backend-ram,id=ram-node0,size=128849018880,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-11,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=128849018880,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=12-23,memdev=ram-node1 -uuid 87681aae-2bc7-4b2e-b17b-f407cf23701e -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=12.0.4,serial=4c4c4544-0059-4710-8036-c3c04f483832,uuid=87681aae-2bc7-4b2e-b17b-f407cf23701e,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-00000167/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/disk,format=raw,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/disk.eph0,format=raw,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:5e:21:1c,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/87681aae-2bc7-4b2e-b17b-f407cf23701e/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=05:00.0,id=hostdev0,bus=pci.0,addr=0x6 -device vfio-pci,host=06:00.0,id=hostdev1,bus=pci.0,addr=0x7 -device vfio-pci,host=85:00.0,id=hostdev2,bus=pci.0,addr=0x8 -device vfio-pci,host=86:01.1,id=hostdev3,bus=pci.0,addr=0x9 -device vfio-pci,host=84:00.0,id=hostdev4,bus=pci.0,addr=0xa -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb -msg timestamp=on
gpu-hypervisor:~$ grep AnonHuge /proc/meminfo
AnonHugePages:  246108160 kB

I don't have an Arch system to verify this on, but I don't expect it to be distro specific. Any objection to removing this?

Have you tested memory performance inside the VM? As in, statically allocated hugepages vs transparent, with VFIO devices bound and actively used inside the VM? DragoonAethis (talk) 11:18, 13 April 2017 (UTC)
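To compare the two setups, it helps to first confirm which kind of huge pages is actually backing guest memory. A minimal sketch that reads the host-wide counters from /proc/meminfo: AnonHugePages for transparent huge pages, and HugePages_Total, HugePages_Free and Hugepagesize for static ones. hugepage_summary is a hypothetical name; these are system-wide totals, not per-VM figures (per-process THP usage appears as AnonHugePages in /proc/<pid>/smaps), and the sketch says nothing about performance, which is the question raised above.

```shell
#!/bin/sh
# Sketch: summarise hugepage usage from a meminfo-format file.
# Defaults to /proc/meminfo; pass another path to inspect a saved copy.
hugepage_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^AnonHugePages:/   { thp = $2 }     # THP-backed anonymous memory, kB
        /^HugePages_Total:/ { total = $2 }   # static huge pages reserved
        /^HugePages_Free:/  { free = $2 }    # static huge pages not in use
        /^Hugepagesize:/    { size = $2 }    # default static page size, kB
        END { printf "thp_kB=%d static_used_kB=%d\n", thp, (total - free) * size }
    ' "$meminfo"
}
```

With the AnonHugePages value from the post above (246108160 kB) and no static pages reserved, it would print `thp_kB=246108160 static_used_kB=0`.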

Using identical guest and host GPUs - did not work for me.

I had to create the script in bin, not sbin, or it would not find the file (and update the modprobe and mkinitcpio configuration accordingly).

Furthermore, I had to change the /sys/devices path to search one directory deeper. The script /bin/ then looks like this:

  #!/bin/sh
  for i in /sys/devices/pci*/*/*/boot_vga; do
          if [ "$(cat "$i")" -eq 0 ]; then
                  # GPU is the PCI device directory that contains boot_vga
                  GPU="${i%/boot_vga}"
                  AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
                  echo "vfio-pci" > "$GPU/driver_override"
                  if [ -d "$AUDIO" ]; then
                          echo "vfio-pci" > "$AUDIO/driver_override"
                  fi
          fi
  done
  modprobe -i vfio-pci

This situation is the only one that worked for me. Eggz (talk) 19:12, 14 April 2017 (UTC)
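Before writing to driver_override, it can be useful to confirm that the deeper glob actually matches the intended card. A dry-run sketch reusing the same loop logic as the script above, which only prints what would be touched; SYSFS_ROOT and list_secondary_gpus are hypothetical names, and the .0 to .1 substitution assumes, as the original sed does, that the audio controller is function 1 of the same device.

```shell
#!/bin/sh
# Dry-run sketch: list non-boot GPUs (boot_vga == 0) and their presumed
# audio functions without writing to driver_override.
# SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.
list_secondary_gpus() {
    root="${SYSFS_ROOT:-/sys}"
    for i in "$root"/devices/pci*/*/*/boot_vga; do
        [ -e "$i" ] || continue                 # glob matched nothing
        if [ "$(cat "$i")" -eq 0 ]; then
            GPU="${i%/boot_vga}"
            AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
            echo "GPU: $GPU"
            if [ -d "$AUDIO" ]; then
                echo "audio: $AUDIO"
            fi
        fi
    done
}
```

If nothing is printed on real hardware, the glob depth probably differs on that machine, which is the same adjustment described above.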

Passthrough via OVMF without libvirt

I don't have a graphics card with UEFI firmware to test this, but it should work (with the ovmf package and not ovmf-git); please test it and add it to the page if it does work.

qemu-system-x86_64 \
    -enable-kvm \
    -m 8192 \
    -M q35 \
    -cpu host \
    -smp 4,sockets=1,cores=4,threads=1 \
    -vga none \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/ovmf_code_x64.bin \
    -drive if=pflash,format=raw,file=/tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin \
    -device vfio-pci,host=05:00.0,addr=09.0,multifunction=on,x-vga=on \
    -device vfio-pci,host=05:00.1,addr=09.1 \
    -drive file=diskimg.qcow2,format=qcow2

Dhead (talk) 15:58, 8 May 2017 (UTC)

And you should copy the vars file first (to the same path the command above uses):
mkdir -p /tmp/ovmf/qemu-vm-01
cp /usr/share/ovmf/ovmf_vars_x64.bin /tmp/ovmf/qemu-vm-01/ovmf_vars_x64.bin
Dhead (talk) 16:08, 8 May 2017 (UTC)
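Since the vars file is writable and holds the guest's UEFI settings, each VM needs its own copy, and re-copying the template on every start would reset those settings. A minimal sketch under that assumption; ensure_ovmf_vars is a hypothetical helper, the paths in the comments mirror the qemu command above, and a directory outside /tmp would survive reboots.

```shell
#!/bin/sh
# Sketch: make sure a VM has its own writable OVMF vars file, copying the
# template only if the copy does not exist yet, then print its path.
ensure_ovmf_vars() {
    template="$1"                    # e.g. /usr/share/ovmf/ovmf_vars_x64.bin
    vm_dir="$2"                      # e.g. /tmp/ovmf/qemu-vm-01
    vars="$vm_dir/${template##*/}"   # same file name inside the VM directory
    mkdir -p "$vm_dir" || return 1
    if [ ! -e "$vars" ]; then
        cp "$template" "$vars" || return 1
    fi
    echo "$vars"
}
```

The printed path can then be used as the `file=` value of the second `-drive if=pflash` switch.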
Oh, indeed the official ovmf package seems to have finally been updated to include the variable files; at least I was told they were missing from it before. I'll test this out and do a writeup about it, as I'm actually writing a draft for adding the scripted method back to the article, check here: User:Nd43/Scripted QEMU setup without libvirt. It's still incomplete but it's already getting pretty big and detailed; any comments are welcome. I'm also wondering whether it should be a subpage or not when I finally start merging it back into this article. Nd43 (talk) 08:19, 9 May 2017 (UTC)
Nice guide but I don't think the ArchWiki is the right place for it, it's too specific and too detailed to be incorporated into this article and adding it as a subpage will end with duplicated content which will never be in sync.
The starting point of this article is that the user (an Arch user, not a copy-paste monkey) has already gone through the QEMU article, so everything not directly related to PCI passthrough, like a systemd service, audio, or putting it all in a script, should not be added to this article. Instead, we should just add the very basic command switches related to using OVMF.

I might be digressing a little, but I also think this page should be torn down and reworked. All the performance tuning should be in its own subpage of the main QEMU article. Using OVMF instead of the default SeaBIOS should also be in a subpage (of QEMU). That would leave just the PCI passthrough material here. Dhead (talk)
How about mentioning just the passthrough switches for QEMU here? The -device vfio-pci,host=... ones, even OVMF isn't *required* for the passthrough to work (although it is easier to work with). Other than that, passthrough can be thrown in for just about any QEMU VM, and it's up to the guest OS to handle these devices somehow. Other QEMU args are specific to that VM. DragoonAethis (talk) 10:09, 9 May 2017 (UTC)
I agree, just adding the related QEMU command switches should be more than enough. P.S. OVMF might be preferred for VGA PCI passthrough, as at least on my system SeaBIOS seems to have some issues with USB input devices on boot/in the bootloader (not an issue once the OS/kernel is running). Dhead (talk) 12:05, 9 May 2017 (UTC)
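Following the suggestion to document only the passthrough switches, a minimal sketch that turns a list of host PCI addresses into `-device vfio-pci,host=...` arguments. vfio_args is a hypothetical helper; it deliberately leaves out the `addr=`, `multifunction=on` and `x-vga=on` options from the earlier example, since those depend on the setup.

```shell
#!/bin/sh
# Sketch: print one "-device vfio-pci,host=<addr>" line per host address,
# to be appended to an otherwise ready QEMU command line.
vfio_args() {
    for addr in "$@"; do
        printf '%s\n' "-device vfio-pci,host=$addr"
    done
}
```

For example, `qemu-system-x86_64 ... $(vfio_args 05:00.0 05:00.1)`, where the substitution is left unquoted on purpose so each line splits into `-device` and its value.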
You're probably right, but I still think many details of my writeup should be easily available for Arch users looking just to deploy a QEMU VM with PCI passthrough. I also agree that the whole article needs a rewrite on some parts but I really don't want to see the actually relevant information get removed temporarily or buried behind too many internal links to other articles. I've already seen how the article has suffered during the few years and currently its structure is partly very unclear. Does anyone have additional suggestions for the information to be added or should I just start editing with small additions to see how they're received? Nd43 (talk) 18:44, 9 May 2017 (UTC)
Passing through the GPUs to a ready VM is just about adding those two switches and making sure you have some "extras" in if you're using a specific setup (GeForces requiring the spoofed vendor ID, etc). It might make more sense to let people configure their VM first (= installing OS under UEFI, installing VirtIO drivers, configuring their base environment etc) and THEN explain how to pass through the GPU to that ready VM - this might fit pretty well in this article.
Most of these copy-n-paste guides assume a very specific setup (memory, CPU cores, disks, etc), and there is value in those, since you can peek at complete solutions that are known to work under specific hardware combinations. However I'd rather create a separate VFIO "examples" page, where a brief description of that user's hardware, software (kernel + cmdline, additional VM configuration steps, etc) and the QEMU script/libvirt domain file would be posted. Exact specifics (additional scripts, configuration files for Synergy, PulseAudio and such) would have to be posted on GitHub/GitLab/somewhere else as to not clutter the wiki beyond reason (older examples would have to be removed or updated over time, since QEMU/libvirt break things once in a while). Does that sound fine? DragoonAethis (talk) 21:55, 10 May 2017 (UTC)
I like your thinking. I made an initial revision edit at PCI passthrough via OVMF#Plain QEMU without libvirt describing the bare minimum for achieving a practical QEMU setup with links to other relevant articles. I'm probably going to continue editing the section later as I want to pay attention to some details and terminology a bit more, but for now I think it will do for provoking people to make additions/changes to reach better guidance on the topic. So, comments and edits highly welcomed. I also removed the old links for the time being, even though I really liked Melvin's script examples, they indeed belong to the "See also" section or somewhere else totally. Feel free to relocate the links to a logical order. Nd43 (talk) 15:32, 13 May 2017 (UTC)
I've created the PCI_passthrough_via_OVMF/Examples page, feel free to contribute your working setups there. There's a template etc available to make adding new entries easier and more consistent. DragoonAethis (talk) 14:37, 19 May 2017 (UTC)

Moving the full tutorial elsewhere?

A number of people on this talk page have already mentioned how having a complete tutorial like this on the Arch Wiki makes some parts of the article somewhat redundant with other pages on the wiki, and that it would probably be better if the article were split into multiple parts and spread across a number of pages. At the same time, having a full tutorial like this here means it can't really cover other distros without breaking the contribution guidelines, which limits the article's reach somewhat.

However, I know a lot of people (myself included) have been convinced to try Arch after seeing a number of quality articles on the official wiki, and from what I've seen elsewhere, this very article has played that role for some people. I've seen people on /r/vfio and elsewhere use this article as the primary reference they point to for anyone who wants to set up their machine this way. That's roughly what I was aiming for when I started reworking this article to adapt AW's blog into something less intimidating to read, and it's great to see it has evolved into what it is now thanks to all the contributions since then.

I'd like to know, according to more experienced Arch Wiki contributors than me, whether or not this article actually belongs here, or if the current page shouldn't be moved to the VFIO subreddit and the Arch page torn down and its content split on other QEMU-related pages.

Victoire (talk) 14:56, 10 July 2017 (UTC)

IMO this article is fine here - it's widely linked, kept mostly up-to-date, clearly explained and doesn't require two days of research to even attempt this setup. I would agree making an exception and allowing other distro-specific information to be posted here would be great, but moving the entire post to /r/vfio, where the wiki pages don't work on mobile and limit contributors to the subreddit's moderators or manually added people will limit the article's reach even more and cause it to go out-of-date in a few months. (If the wiki is open to all registered redditors, most more-popular wiki pages *will* be defaced.) DragoonAethis (talk) 12:24, 11 July 2017 (UTC)
Victoire, this is the Arch Wiki, users of other distros will be better served with full tutorials elsewhere, /r/VFIO has a wiki.
Regarding your choice of words, the page shouldn't be moved anywhere, copied somewhere else, yes, there you could just have a simple full tutorial from start to finish to set a host and guest QEMU machines with GPU passthrough, but the contents of this page should stay here so Arch Wiki users and maintainers could change it as needed, which I gather it's what you meant by "moved". In fact, from the little I've been seeing in /r/VFIO there are nice examples and tutorials that are buried and not being mentioned and linked in the /r/VFIO wiki.
Regarding the needed updates to this page, for a start, I would say the Performance tuning section needs to go into Tips and tricks on the QEMU page (which deserves its own subpage), and the article name should be changed to PCI passthrough so it reflects the actual content of the page, which isn't UEFI-specific.
Also, the complete examples page seems superfluous. Why not have a wiki page in /r/VFIO with known compatible motherboards, or motherboard+GPU setups, as all the rest of the devices (recent CPU, RAM, storage) shouldn't matter? I wouldn't be surprised if the moderators removed it. Dhead (talk) 14:05, 11 July 2017 (UTC)