[Bug 817210] New: openSUSE 12.3 Domain 0 doesn't boot with i915 graphics controller under Xen with VT-d enabled
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c0
Summary: openSUSE 12.3 Domain 0 doesn't boot with i915 graphics
controller under Xen with VT-d enabled
Classification: openSUSE
Product: openSUSE 12.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Xen
AssignedTo: jdouglas@suse.com
ReportedBy: gadm@avalon-island.ru
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:20.0) Gecko/20100101
Firefox/20.0
The most commonly used configuration doesn't work. Suppose I want to install
Windows under Xen using virt-manager and experiment with some unusual hardware
under it.
For this I need:
1) openSUSE with Xen hypervisor;
2) VT-d-capable hardware to make PCI card available under HVM guest domain;
3) X Window System to start virt-manager and use Windows graphical console.
This is a regression against openSUSE 12.2: Linux kernel v3.3 with Xen v4.2 works
well, but kernel 3.7 does not. It locks up hard during boot with a blank screen.
As a workaround, you can either:
1) turn off VT-d (either in the BIOS or with the "iommu=0" parameter for xen.gz in
grub.conf);
or
2) turn off kernel modesetting (with the "nomodeset" parameter for the Linux
kernel).
Both workarounds are unacceptable: with the first I lose
PCI passthrough, and with the second I lose the X Window System.
So, how can I help debug this problem?
Reproducible: Always
Steps to Reproduce:
1. Install openSUSE on a host with an 82Q35 video controller (which uses the i915 kernel
module);
2. Install the Xen hypervisor and turn on the IOMMU;
3. Boot.
Actual Results:
Hard lockup with blank screen.
Expected Results:
Working system, though. :)
Hardware list:
00:00.0 Host bridge: Intel Corporation 82Q35 Express DRAM Controller (rev 02)
Subsystem: Fujitsu Technology Solutions Device 10fc
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c
Charles Arnold
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c1
--- Comment #1 from Андрей Кольчугин
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c2
Jan Beulich
From SLE11 SP3 we know that the i915 DRM code on top of an IOMMU currently has issues. So we're awaiting a fix there, which hopefully will help here too then.
In the meantime, rather than posting lspci output that is for the most part useless here, you could attach a full hypervisor and kernel log, which would likely allow us to verify whether the situation is the same as observed on SLE11 SP3. You would need to have "iommu=debug loglvl=all" in place on the Xen command line for this to be useful. And to emphasize this again: please attach larger pieces of information rather than inlining them! This bug, with only a single comment so far, is already close to unmanageable because of the amount of inline information.
--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c3
--- Comment #3 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c4
--- Comment #4 from Андрей Кольчугин
> From SLE11 SP3 we know that the i915 DRM code on top of an IOMMU currently has issues. So we're awaiting a fix there, which hopefully will help here too then.
Analysis of the 'dmesg' output when running the Linux kernel without Xen but with the IOMMU enabled shows the following diagnostics:
[ 0.243871] dmar: DRHD: handling fault status reg 3
[ 0.243878] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr bf800000
[ 0.243878] DMAR:[fault reason 01] Present bit in root entry is clear
[ 0.243926] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
Device 00:02.0 is, obviously, the graphics controller. Although Linux with VT-d doesn't lock up hard, the screen is completely unreadable (color bars).
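For triaging longer logs, the fault lines above can be pulled out mechanically. What follows is a minimal Python sketch, assuming the log format is exactly as quoted in this comment. The function name `parse_dmar_faults` and the regular expressions are illustrative assumptions, not part of any kernel or Xen tooling.

```python
import re

# Patterns modelled on the dmesg lines quoted above (an assumption: other
# kernel versions may format these messages differently).
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_dmar_faults(log_text):
    """Return a list of dicts, one per DMAR fault request line in log_text."""
    faults = []
    for line in log_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            faults.append({
                "op": req.group("op"),
                "device": req.group("dev"),
                "addr": int(req.group("addr"), 16),
                "reason": None,
            })
            continue
        reason = REASON_RE.search(line)
        # Attach the reason to the most recent fault that has none yet.
        if reason and faults and faults[-1]["reason"] is None:
            faults[-1]["reason"] = (
                int(reason.group("code")),
                reason.group("text").strip(),
            )
    return faults


SAMPLE = """\
[ 0.243871] dmar: DRHD: handling fault status reg 3
[ 0.243878] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr bf800000
[ 0.243878] DMAR:[fault reason 01] Present bit in root entry is clear
"""

for fault in parse_dmar_faults(SAMPLE):
    print(fault["device"], hex(fault["addr"]), fault["reason"])
```

Grouping faults by device and address this way makes it easier to compare a native-kernel log with a Xen hypervisor log.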
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c5
--- Comment #5 from Андрей Кольчугин
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c6
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c7
--- Comment #7 from Андрей Кольчугин
> Also, btw, I can't see why with "nomodeset" you would have to live without X - you may have to reconfigure X for that purpose (in the worst case using the fb driver), but that doesn't mean X is entirely unavailable.
Unfortunately, X is entirely unavailable, because the framebuffer itself is -- under _any_ working IOMMU (either under a bare Linux kernel or under tboot/Linux) vesafb doesn't recognise any devices.
Yes, one can use, for example, 'modprobe vga16fb' (it does work) -- but one can hardly say 'that doesn't mean X is entirely unavailable' when X runs at 640x480x16 :)
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c8
--- Comment #8 from Jan Beulich
> Unfortunately, X is entirely unavailable, because the framebuffer itself is -- under _any_ working IOMMU (either under a bare Linux kernel or under tboot/Linux) vesafb doesn't recognise any devices.
I can't confirm this - I'm seeing vesafb working quite fine irrespective of the presence/use of an IOMMU. Are you perhaps not passing a suitable "vga=" option to the kernel (albeit iirc it's the boot loader honoring it), or a "mode=" one to the hypervisor?
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c9
--- Comment #9 from Андрей Кольчугин
>> Unfortunately, X is entirely unavailable, because the framebuffer itself is -- under _any_ working IOMMU (either under a bare Linux kernel or under tboot/Linux) vesafb doesn't recognise any devices.
> I can't confirm this - I'm seeing vesafb working quite fine irrespective of the presence/use of an IOMMU. Are you perhaps not passing a suitable "vga=" option to the kernel (albeit iirc it's the boot loader honoring it), or a "mode=" one to the hypervisor?
Hmmm... Yes, I thought the boot loader was honoring it.
1) No, it doesn't in the multi-boot case;
2) "mode=" is for the Linux kernel and "vga=" is for the hypervisor;
3) Xen v4.2.1 doesn't like values such as "vga=031b"/"vga=0x31b", but "vga=gfx-1280x1024x32" works fine;
4) BOTH parameters must be present -- for the hypervisor and for the kernel.
And it works! The framebuffer is now recognised by both the Linux kernel and the X Window System, although i915 KMS still needs to be fixed. ;)
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c10
Андрей Кольчугин
> However, the hypervisor log shows that there's a second issue in the Xen case, related to IRQ setup - whether that's a follow-up issue resulting from the earlier IOMMU faults I can't really tell without at least seeing the _full_ log (hypervisor+kernel, both at maximum log level and additionally with APIC logging enabled ["apic_verbosity=debug" on the hypervisor side and "apic=debug" on the kernel side]).
Null problemo, sir! :)
Capture of SoL console output.
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c11
--- Comment #11 from Андрей Кольчугин
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c12
--- Comment #12 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c13
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c14
Андрей Кольчугин
> The fault addresses (0xbf800000) point into hidden RAM (according to my guessing from the E820 map), and hence there being accesses to such memory invisible to Xen implies incomplete IOMMU related tables being provided by the firmware. Hence I don't think it is a Xen bug that "iommu=dom0-strict" doesn't work on that system. To hopefully clarify this, telling us what device is 0000:00:02.0 and attaching the contents of /proc/iomem when running a native kernel will be necessary.
I found my old hard drive with openSUSE v12.2 installed, attached it to the system in question, and have just booted the native Linux kernel with intel_iommu=on. As I mentioned before, Linux kernel v3.1 boots flawlessly with KMS/X11 even with the IOMMU enabled, although it locks up hard immediately when I start, for example, the (in)famous 'glxgears'. Either way, this is a regression in openSUSE v12.3: the newer kernel locks up during boot.
It seems to me that your guesses are more than plausible; the kernel complains about DMA write errors:
===
DMAR:[DMA Write] Request device [00:02.0] fault addr bffff000
DMAR:[fault reason 05] PTE Write access is not set
DRHD: handling fault status reg 3
===
But the memory region the DMA remapping engine complains about is completely ABSENT from /proc/iomem! What can be wrong with it?
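One way to check the claim above is to look up the fault addresses in a /proc/iomem listing. This is a minimal Python sketch. The helper names `parse_iomem` and `regions_containing` are hypothetical, and the sample listing is illustrative data only, not the contents of /proc/iomem from the affected machine. Note that reading real addresses from /proc/iomem generally requires root, since unprivileged reads show zeroed ranges.

```python
import re

# Each /proc/iomem line looks like "  start-end : name"; indentation marks nesting.
IOMEM_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def parse_iomem(text):
    """Parse /proc/iomem-style text into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = IOMEM_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def regions_containing(regions, addr):
    """Return the names of all regions whose inclusive range covers addr."""
    return [name for start, end, name in regions if start <= addr <= end]


# Hypothetical listing, for illustration only.
SAMPLE_IOMEM = """\
00000000-0009ffff : System RAM
00100000-bf5fffff : System RAM
  01000000-015fffff : Kernel code
c0000000-dfffffff : PCI Bus 0000:00
"""

regions = parse_iomem(SAMPLE_IOMEM)
for addr in (0xbf800000, 0xbffff000):
    hits = regions_containing(regions, addr)
    print(hex(addr), hits if hits else "not covered by any listed region")
```

To run it against a live system, replace `SAMPLE_IOMEM` with the text read from /proc/iomem (as root). An address that falls into a gap between listed regions matches the "hidden RAM" hypothesis quoted above.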
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c15
--- Comment #15 from Андрей Кольчугин
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c16
--- Comment #16 from Андрей Кольчугин
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c17
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c18
Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c19
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=817210
https://bugzilla.novell.com/show_bug.cgi?id=817210#c20
--- Comment #20 from Swamp Workflow Management
http://bugzilla.novell.com/show_bug.cgi?id=817210
Swamp Workflow Management