https://bugzilla.novell.com/show_bug.cgi?id=485712

Summary: Applications become unresponsive during a kernel build
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: ghaskins@novell.com
QAContact: qa@suse.de
Found By: Development

I've been noticing, at least since openSUSE 11.0, that certain desktop
applications become unresponsive while I am performing heavy computations,
such as a kernel build. The affected applications (firefox and pidgin are two
commonly affected apps) stop responding to input, do not refresh, and often
grey out (a gnome/compiz? feature to show unresponsive apps). The applications
sometimes resume normal behavior mid-build, but usually not until about 5-10
seconds after the kernel build has completed. Note that, while noticeably
slower due to the build, most apps continue to work during this period.

At first I thought this might be a CFS bug where applications were getting
inserted into the wrong position of the rb-tree. Today, I finally grew annoyed
enough at the problem to try to confirm this theory, so I executed a sysrq-w
to inspect the errant sum-exec-time or other such relevant scheduler
parameters. To my surprise, the "hung" apps were not even on the runqueue.
Instead, they were showing up in the "D" (UNINTERRUPTIBLE) state.
Regardless of the app, the pattern of the call trace was consistent:

  pidgin        D 0000000000000001     0  7126   6820
   ffff88012acb7d48 0000000000000082 ffff88012acb7d58 ffff88012acb7cd8
   ffffffff80a27000 ffffffff80a31600 ffffffff80a2e3f0 ffffffff80a31600
   ffffffff80a27000 ffffffff80a31600 ffffffff80a31600 ffffffff80a31600
  Call Trace:
   [<ffffffffa00bbb38>] log_wait_commit+0x12b/0x17f [jbd]
   [<ffffffffa00b6bae>] journal_stop+0x222/0x24c [jbd]
   [<ffffffff802ce449>] __sync_single_inode+0x96/0x242
   [<ffffffff802ce724>] __writeback_single_inode+0x12f/0x13a
   [<ffffffff802ce753>] sync_inode+0x24/0x30
   [<ffffffffa00cd67a>] ext3_sync_file+0xa6/0xe0 [ext3]
   [<ffffffff802d176b>] do_fsync+0x52/0x87
   [<ffffffff802d17c4>] __do_fsync+0x24/0x36
   [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
   [<00007f4ddd343010>] 0x7f4ddd343010

I haven't decoded the line numbers for this trace yet, but it appears that
applications that execute fsync() during the kernel build become blocked for
an exorbitant amount of time (usually until the build finishes), presumably on
a mutex (thus the UNINTERRUPTIBLE state), though this is not yet confirmed.

Here are some details about the system in question: it is a 2x2 (4 core)
x86_64 Intel Woodcrest system with the /home directory (which is also where
the build takes place) on an md1-raid formatted with ext3. I run the following
command to reproduce:

  time make -j 32 CC=distcc 2>&1 | tee /tmp/buildlog

So there are potentially a large number of make threads, but most cc jobs are
dispatched to the network via distcc, leaving the localhost to process only
ld-type workloads.

I will attach the full sysrq-w output, but the summary is that there is a ton
of io-schedule related activity that is currently blocked. This may or may not
be the smoking gun.

I assume that this problem is probably not common, since it behaves the same
in 2.6.25 (os11.0) as it does in 2.6.27 (os11.1).
It probably has something to do with my particular setup with the raid+ext3.
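
To put numbers on the fsync() stalls described above, a small probe along the following lines could be run on the affected filesystem while the build is in progress. This is only a sketch under assumptions: the helper names (time_fsync, probe), the 4 KiB payload, and the one-second sampling interval are illustrative choices and are not taken from the report or from any attached tooling. It times a single fsync() on a scratch file, which is the same system call that appears at the bottom of the call trace.

```python
import os
import sys
import tempfile
import time


def time_fsync(directory, payload=b"x" * 4096):
    """Write payload to a scratch file in directory; return seconds spent in fsync()."""
    # Hypothetical helper: the scratch file name and payload size are arbitrary.
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # blocks until the filesystem reports the data as committed
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


def probe(directory, samples=5, interval=1.0):
    """Yield one fsync() latency (in seconds) per sample, pausing between samples."""
    for i in range(samples):
        yield time_fsync(directory)
        if i + 1 < samples:
            time.sleep(interval)


if __name__ == "__main__":
    # Directory to test; defaults to the current directory if none is given.
    target = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    for latency in probe(target):
        print("fsync latency: %.3f s" % latency)
```

Running it once against the filesystem under test while the machine is idle and once while the make command above is running would show whether fsync() latency on a tiny file grows from milliseconds to the multi-second stalls seen in the desktop applications.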