[opensuse] Process Hangs / D-wait / Novell Bug 336669
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
This is a very uncool situation. I encounter these processes hanging in D waits quite a lot and it is very disruptive. Whatever the problem is, and it seems to be one of those bugs exacerbated by multi-core systems that we've known were going to haunt us as multi-core CPUs proliferate, this one, apparently in the Reiser filesystem code, is particularly pernicious. Had I known about this, I'd never have gone with Reiser for my file systems when I installed 10.2 (now 10.3).
How close is Novell to releasing a fix? Alternatively, is there any good way to convert file systems? I've never had to do it before, and the only way I can think of is to dump, mkfs and restore, and I don't know what sort of pitfalls that presents.
Peeved and alarmed,
Randall Schulz
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
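For readers who want to check whether they are seeing the same symptom, the sketch below lists processes currently in uninterruptible sleep (state D) by reading the state field of /proc/<pid>/stat on Linux. It is a minimal illustration, not a diagnostic for this particular bug: a process can sit in D state briefly during ordinary I/O, so only entries that stay in the list across repeated runs are suspicious. The function names parse_stat and d_state_processes are made up for this example.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' to find the end of it.
    head, _, tail = text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def d_state_processes(proc="/proc"):
    """Return a list of (pid, comm) for processes whose state is 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            path = os.path.join(proc, entry, "stat")
            with open(path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Running it a few times several seconds apart and comparing the output is one way to separate transient I/O waits from processes that are genuinely stuck.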
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
How come comments #19 through #22 don't appear? Comments #1 through #18 are there, then it skips to #23 and #24.
Randall Schulz
On Thursday 01 November 2007 15:46:35 Randall R Schulz wrote:
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
How come comments #19 through #22 don't appear? Comments #1 through #18 are there, then it skips to #23 and #24.
Because they're tagged as "internal". Sometimes comments are made that are meant only for an internal audience, for example comments about customers or internal processes.
Anders
--
Madness takes its toll
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
...
<https://bugzilla.novell.com/show_bug.cgi?id=336669#c32>
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
[PATCH] reiserfs: bad unlock in reiserfs_xattr_get
...
This is what I was looking for. I've found a bug where if getxattr() is called with either a NULL buffer (common) or a too small buffer (not common), it incorrectly unlocks the mutex, despite it not having been locked.
I've checked in a patch to the CVS tree.
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Hurrah!
How long might it take to get a new kernel release out???
Waiting with bated breath...
Randall Schulz
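The bug described in that comment is a lock imbalance: an early-exit path releases a mutex that the path never acquired. The toy model below reproduces that shape in user space so the pattern is easy to see. It is not the reiserfs code or the actual patch; the class ToyXattrStore, its methods, and the attribute name are all hypothetical, and threading.Lock merely stands in for the kernel mutex. Python happens to raise RuntimeError when an unheld lock is released, which makes the imbalance visible.

```python
import errno
import threading


class ToyXattrStore:
    """Hypothetical user-space model of the locking pattern; not kernel code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {"user.comment": b"hello"}

    def get_buggy(self, name, buf_size):
        """buf_size=None models a NULL buffer (a size probe)."""
        value = self._attrs[name]
        try:
            if buf_size is None:
                return len(value)  # early exit, lock never taken
            if buf_size < len(value):
                raise OSError(errno.ERANGE, "buffer too small")  # same
            self._lock.acquire()
            return value
        finally:
            # BUG: this cleanup also runs for both early exits above,
            # releasing a lock that was never acquired.
            self._lock.release()

    def get_fixed(self, name, buf_size):
        value = self._attrs[name]
        if buf_size is None:
            return len(value)
        if buf_size < len(value):
            raise OSError(errno.ERANGE, "buffer too small")
        # Acquire and release are paired on the one path that needs them.
        with self._lock:
            return value


if __name__ == "__main__":
    store = ToyXattrStore()
    print(store.get_fixed("user.comment", None))
    try:
        store.get_buggy("user.comment", None)
    except RuntimeError as exc:
        print("buggy path:", exc)
```

The fixed variant simply keeps the early exits outside the locked region, so the release can only happen after a matching acquire.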
On Friday 02 November 2007 09:54:56 am Randall R Schulz wrote:
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
...
<https://bugzilla.novell.com/show_bug.cgi?id=336669#c32>
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==- [PATCH] reiserfs: bad unlock in reiserfs_xattr_get
...
This is what I was looking for. I've found a bug where if getxattr() is called with either a NULL buffer (common) or a too small buffer (not common), it incorrectly unlocks the mutex, despite it not having been locked.
I've checked in a patch to the CVS tree. -==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Hurrah!
How long might it take to get a new kernel release out???
Waiting with bated breath...
Randall Schulz
Don't see the same issue in 2.6.23 :o) ..
Have you tried applying the patch?
Ben
On Friday 02 November 2007 10:24:55 am Randall R Schulz wrote:
On Friday 02 November 2007 10:19, Ben Kevan wrote:
...
Have you tried applying the patch?
That's not my thing. I'll wait for a release.
Ben
RRS
OK, let me rephrase. Has anybody tested and wrote off the fix in a real time environment? Someone who was actually seeing the reported issues? I'll build an RPM from the modified source on my machine, but I don't have the issue myself since I am running ext3.
Ben
On Friday 02 November 2007 10:33, Ben Kevan wrote:
...
Has anybody tested and wrote off the fix in a real-time environment?
What does "real-time" have to do with it?
...
Ben
Randall Schulz
On Friday 02 November 2007 11:09:53 am Randall R Schulz wrote:
On Friday 02 November 2007 10:33, Ben Kevan wrote:
...
Has anybody tested and wrote off the fix in a real-time environment?
What does "real-time" have to do with it?
...
Ben
Randall Schulz
I am guessing quite a bit. When I was working on fixing some issues with the ipw3945 drivers, I had to make sure (a) that it didn't break anything else and (b) that it was fixed on more than just one machine, to verify it wasn't something else. Crunching numbers at times is great, but testing in the real world sometimes really does it justice, maybe that's just my thoughts of QA.
Ben
The Friday 2007-11-02 at 11:35 -0700, Ben Kevan wrote:
Has anybody tested and wrote off the fix in a real-time environment?
What does "real-time" have to do with it?
...
Crunching numbers at times is great, but testing in the real world sometimes really does it justice, maybe that's just my thoughts of QA.
You are confusing real time with real world. They are very, very different issues.
--
Cheers,
Carlos E. R.
Ben Kevan wrote:
On Friday 02 November 2007 09:54:56 am Randall R Schulz wrote:
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
...
<https://bugzilla.novell.com/show_bug.cgi?id=336669#c32>
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==- [PATCH] reiserfs: bad unlock in reiserfs_xattr_get
...
This is what I was looking for. I've found a bug where if getxattr() is called with either a NULL buffer (common) or a too small buffer (not common), it incorrectly unlocks the mutex, despite it not having been locked.
I've checked in a patch to the CVS tree. -==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Hurrah!
How long might it take to get a new kernel release out???
Waiting with bated breath...
Randall Schulz
Don't see the same issue in 2.6.23 :o) ..
Have you tried applying the patch?
It wouldn't be in vanilla 2.6.23, but it would be in the CVS HEAD kernel based on 2.6.23.
Today's KOTD kernel[1,2] should have the fix included. This is the tree that will eventually be the update kernel.
-Jeff
[1] ftp://ftp.suse.com/pub/projects/kernel/kotd/SL103_BRANCH
--
Jeff Mahoney
SUSE Labs
On Saturday 03 November 2007 07:15:25 am you wrote:
It wouldn't be in vanilla 2.6.23, but it would be in the CVS HEAD kernel based on 2.6.23.
Today's KOTD kernel[1,2] should have the fix included. This is the tree that will eventually be the update kernel.
-Jeff
[1] ftp://ftp.suse.com/pub/projects/kernel/kotd/SL103_BRANCH
Looks good, Jeff, and yeah, I saw that it wasn't in vanilla 10.3 but never did check out the CVS HEAD. I've also installed 2.6.24-rc1 patched from 2.6.23 and, well, that's clean as expected.
When do you think the kernel will get signed off for update? Also, what other fixes may be incorporated with the next release?
Ben
Ben Kevan wrote:
On Saturday 03 November 2007 07:15:25 am you wrote:
It wouldn't be in vanilla 2.6.23, but it would be in the CVS HEAD kernel based on 2.6.23.
Today's KOTD kernel[1,2] should have the fix included. This is the tree that will eventually be the update kernel.
-Jeff
[1] ftp://ftp.suse.com/pub/projects/kernel/kotd/SL103_BRANCH
Looks good Jeff and yeah I saw that it wasn't in Vanilla 10.3 but never did check out the CVS head. I've also installed 2.6.24RC1 patched from 2.6.23 and well that's clean as expected.
When do you think the kernel will get signed off for update? Also, what other fixes may be incorporated with the next release?
It's in the hands of our maintenance team now, but it should be soon.
The highlights for the fixes for this release are:
* Update to 2.6.22.10
* Fix kernel hang during OCFS2 cluster initialization
* Fix error during OCFS2 disk heartbeat writing
* Fix Oops during natsemi module unload
* Fix machine rebooting instead of power off after shutdown
* Fix for first lid closing not triggering suspend
* Fix for spurious -ENOSPC on reiserfs
* Fix for misplaced unlock in reiserfs_xattr_get() causing process hangs
* Fix for oops in alsa hdsp
* Fix for integer underflow with runt rx frames in 802.11
* Fix an Oops in NFS encode_lookup()
* Improve hda-intel codec probing robustness
* Add suspend/resume support for aic7xxx
* Make scanning of FAT table faster on mount
* Update e1000e to version in 2.6.24-rc1
* Stop zc0301 from claiming Logitech Quickcam
* Load ACPI Bay driver when needed automatically
* Dell CERC support for megaraid_mbox
* Misc Xen fixes
-Jeff
--
Jeff Mahoney
SUSE Labs
On Monday 05 November 2007 08:13, Jeff Mahoney wrote:
...
When do you think the kernel will get signed off for update? Also, what other fixes may be incorporated with the next release?
It's in the hands of our maintenance team now, but it should be soon.
Great. I experience the Reiser hang too much for me to be able to trust the system. I especially don't want to reactivate my Subversion server or do package management until this is resolved.
The highlights for the fixes for this release are:
* Update to 2.6.22.10
* Fix kernel hang during OCFS2 cluster initialization
* Fix error during OCFS2 disk heartbeat writing
* Fix Oops during natsemi module unload
* Fix machine rebooting instead of power off after shutdown
* Fix for first lid closing not triggering suspend
* Fix for spurious -ENOSPC on reiserfs
* Fix for misplaced unlock in reiserfs_xattr_get() causing process hangs
* Fix for oops in alsa hdsp
* Fix for integer underflow with runt rx frames in 802.11
* Fix an Oops in NFS encode_lookup()
* Improve hda-intel codec probing robustness
* Add suspend/resume support for aic7xxx
* Make scanning of FAT table faster on mount
* Update e1000e to version in 2.6.24-rc1
* Stop zc0301 from claiming Logitech Quickcam
* Load ACPI Bay driver when needed automatically
* Dell CERC support for megaraid_mbox
* Misc Xen fixes
Spiffy!
Jeff Mahoney SUSE Labs
Thanks, folks.
Randall Schulz
Jeff, when do you think the team will do it? Approx. what date? Thanks a lot.
gabriel.schwartz@gmail.com wrote:
Jeff, when do you think to do it the team ? Aprox. what date ?
Sorry, I missed this. The openSUSE list is high enough traffic that I can't follow every thread unless I'm CC'd. I just happened to look this up again.
The update has been approved and released today. It should sync out to the mirrors within the next few hours.
-Jeff
thanks a lot.
--
Jeff Mahoney
SUSE Labs
The Monday 2007-11-05 at 11:13 -0500, Jeff Mahoney wrote:
* Fix for first lid closing not triggering suspend
My machine refuses to suspend on events. Can that be related? (reported on another thread)
--
Cheers,
Carlos E. R.
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
...
How close is Novell to releasing a fix?
<https://bugzilla.novell.com/show_bug.cgi?id=336669#c40>
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
--- Comment #40 from Marcus Meissner 2007-11-08 09:45:53 MST ---
I have just approved the kernel update fixing this and it will be syncing to our mirrors within the next hours.
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Hurrah!
Randall Schulz
On Thursday 08 November 2007 09:02, Randall R Schulz wrote:
On Wednesday 31 October 2007 09:08, Randall R Schulz wrote:
Hi,
<https://bugzilla.novell.com/show_bug.cgi?id=336669>
...
How close is Novell to releasing a fix?
<https://bugzilla.novell.com/show_bug.cgi?id=336669#c40>
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==- --- Comment #40 from Marcus Meissner 2007-11-08 09:45:53 MST --- I have just approved the kernel update fixing this and it will be syncing to our mirrors within the next hours. -==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
It's there. And none too soon. I had a pile of D-wait-state processes just trying to reboot after installing the new kernel.
For some reason, these processes don't cause the system to hang when attempting to reboot, but neither does the file system get cleanly unmounted, it seems. I had many, many transactions to replay upon rebooting.
Thanks, folks!
Randall Schulz
On Thursday 08 November 2007 20:00:49 Randall R Schulz wrote:
It's there. And none too soon. I had a pile of D-wait-state processes just trying to reboot after installing the new kernel.
FWIW, I've been thrashing the update kernel for the past couple of days by repeatedly building KDE 4 on it, and I haven't seen any more dead processes.
Will
--
Will Stephenson Desktop Engineer Interfaces and Applications
participants (7)
- Anders Johansson
- Ben Kevan
- Carlos E. R.
- gabriel.schwartz@gmail.com
- Jeff Mahoney
- Randall R Schulz
- Will Stephenson