Mailinglist Archive: opensuse-bugs (4724 mails)

[Bug 576681] Installation hangs at Loading basic drivers
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Tue, 6 Apr 2010 19:14:55 +0000
  • Message-id: <20100406191455.79A86CC7D0@xxxxxxxxxxxxxxxxxxxxxx>
http://bugzilla.novell.com/show_bug.cgi?id=576681

http://bugzilla.novell.com/show_bug.cgi?id=576681#c67


--- Comment #67 from Jiri Bohac <jbohac@xxxxxxxxxx> 2010-04-06 19:14:51 UTC ---
I can now reproduce the problem as well. After more debugging I see that the
machine is stuck in an endless loop of page faults.

The page fault is triggered by the memset at fec00000. The page fault handler
considers it "spurious" (caused by a stale TLB entry), so the kernel does
nothing, the STOS instruction of the memset is restarted, and the page fault
triggers again.

The error code for the page fault is 3, that is, a protection fault during a
write operation.
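
As an illustration of how the value 3 decodes, here is a minimal user-space
sketch of the x86 page-fault error code bits. The bit layout is the
architectural one; the helper name describe_pf_error is hypothetical and is not
taken from the kernel.

```c
#include <stdio.h>

/* x86 page-fault error code bits, as pushed by the CPU on a #PF. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access */
    PF_USER  = 1 << 2, /* 0: kernel mode,      1: user mode */
    PF_RSVD  = 1 << 3, /* reserved bit set in a paging entry */
    PF_INSTR = 1 << 4  /* fault was an instruction fetch */
};

/* Hypothetical helper: render an error code as text into buf. */
int describe_pf_error(unsigned long code, char *buf, size_t len)
{
    return snprintf(buf, len, "%s fault, %s access, %s mode%s%s",
                    (code & PF_PROT)  ? "protection" : "not-present",
                    (code & PF_WRITE) ? "write" : "read",
                    (code & PF_USER)  ? "user" : "kernel",
                    (code & PF_RSVD)  ? ", reserved bit set" : "",
                    (code & PF_INSTR) ? ", instruction fetch" : "");
}
```

With code 3 the helper yields "protection fault, write access, kernel mode":
the page was present, and the faulting access was a kernel-mode write.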

Looking at the PMD entry and PTE of the fec00000 page, the page is set to be
writeable, so I don't understand why this happens. The i386 specification says
that the TLB should be flushed automatically after a PF trap, and that is why
the PF handler does nothing if it believes the PF was "spurious".
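
The decision described above can be sketched roughly as follows. This is a
simplified user-space model, not the kernel's actual code: the function names
are hypothetical, only two paging levels are checked, and the flag values
assume the usual x86 layout (present = bit 0, writable = bit 1, NX = bit 63).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 paging-entry flags. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_NX      (1ULL << 63)

/* Page-fault error code bits. */
#define PF_PROT  0x01
#define PF_WRITE 0x02
#define PF_INSTR 0x10

/* Does this paging entry permit the access that faulted? */
static bool entry_allows(unsigned long error_code, uint64_t entry)
{
    if (!(entry & PTE_PRESENT))
        return false;
    if ((error_code & PF_WRITE) && !(entry & PTE_RW))
        return false;
    if ((error_code & PF_INSTR) && (entry & PTE_NX))
        return false;
    return true;
}

/*
 * Hypothetical check: a protection fault is treated as spurious when the
 * page tables already allow the access, i.e. the fault is assumed to come
 * from a stale TLB entry and the handler simply returns.
 */
bool looks_spurious(unsigned long error_code, uint64_t pmd, uint64_t pte)
{
    if (!(error_code & PF_PROT))
        return false; /* not-present faults are handled elsewhere */
    return entry_allows(error_code, pmd) && entry_allows(error_code, pte);
}
```

With error code 3 and both entries present and writable, this check returns
true on every fault. If the access still fails for some other reason, the
handler keeps returning without changing anything and the instruction faults
again, which matches the loop observed here.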

So, this could be either a VB bug (because it is VB that emulates the paging,
traps, etc. in the guest), or there is some other reason why a page protection
fault can happen besides the permission bits in the PTE/PMD entry.

(In reply to comment #65)
> [ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
> ...
> [ 44.781407] * pcpu debug:going to memset: chunk=e8e71140, cpu=0, off=8832,
> size=64, addr=fec00000

Yes, I also thought this was the reason at first, but I think the IOAPIC
address refers to a physical address, while the allocated memory that memset
faults on is at virtual address fec00000, right?
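
For reference, a virtual address is only a set of indices into the page
tables; the physical address it maps to is whatever the final entry holds. A
minimal sketch of the split, assuming classic 32-bit non-PAE paging with 4 KiB
pages (with PAE the index widths differ); the struct and function names are
hypothetical:

```c
#include <stdint.h>

/* Indices encoded in a 32-bit virtual address (non-PAE, 4 KiB pages). */
struct va_split {
    unsigned pde_index; /* bits 31..22: page directory entry */
    unsigned pte_index; /* bits 21..12: page table entry */
    unsigned offset;    /* bits 11..0:  byte offset within the page */
};

struct va_split split_va(uint32_t va)
{
    struct va_split s;
    s.pde_index = va >> 22;
    s.pte_index = (va >> 12) & 0x3ff;
    s.offset    = va & 0xfff;
    return s;
}
```

For virtual address 0xfec00000 this gives directory index 1019, table index 0
and offset 0. Whether that entry points at the IOAPIC's physical fec00000 or
at ordinary RAM depends entirely on the contents of the page tables.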

> 2.6.34-rcX has a random memory corruption bug which is showing up as
> various boot failures. Yinghai has a patch.
>
> http://thread.gmane.org/gmane.linux.kernel/963616/focus=964914

This looks pretty deterministic: it fails at exactly the same place for
several people.


> -rc3 has the fix which got committed to suse kernel repo a couple of days
> ago.
> It should soon appear on Factory.

Also, this bug is probably going to stop appearing with the new kernel in
Factory, because I recently switched IPv6 to be compiled-in.
Most likely, this bug is not related to IPv6 at all and it is just a
coincidence that the order in which the install CD image loads kernel modules
makes IPv6 be the first one to need a new allocation of pcpu data and trigger
this bug. With IPv6 compiled in, this order is going to change and the bug will
either be triggered by something else or will not show at all.

But even if this bug disappears, I think it is worth finding out what the cause
was, before it causes other headaches in a different situation.

More debugging soon; I currently have some more urgent bugs to deal with.

