[Bug 784219] New: kernel Call Traces from io_schedule
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c0 Summary: kernel Call Traces from io_schedule Classification: openSUSE Product: openSUSE 12.2 Version: Final Platform: Other OS/Version: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel AssignedTo: meissner@suse.com ReportedBy: rommel@suse.com QAContact: qa-bugs@suse.de CC: jeffm@suse.com Found By: --- Blocker: --- When testing the prepared maintenance update Products: SUSE Linux 12.2 (debug, multiarch) Category: security openSUSE Maintenance ID: 970 openSUSE Review ID: 137259 Repository URL: http://download.opensuse.org/repositories/openSUSE:/Maintenance:/970/openSUS... Packager: jeff_mahoney Bugs: 762693 765230 771392 772566 772831 772923 773406 774523 774859 776925 778630 779432 780624 781134 with newburn I got some kernel call traces like Oct 9 00:45:03 boxer kernel: INFO: task rs:main Q:Reg:24427 blocked for more than 480 seconds. Oct 9 00:45:03 boxer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 9 00:45:03 boxer kernel: ffff880120721a08 0000000000000082 ffff88011ff06100 ffff880120721fd8 Oct 9 00:45:03 boxer kernel: ffff880120721fd8 ffff880120721fd8 ffff88011aef2600 ffff88011ff06100 Oct 9 00:45:03 boxer kernel: ffffffff81009a05 ffff88011ff06100 ffff880127c53568 0000000000000002 Oct 9 00:45:03 boxer kernel: Call Trace: Oct 9 00:45:03 boxer kernel: [<ffffffff8158c4ba>] io_schedule+0x8a/0xd0 Oct 9 00:45:03 boxer kernel: [<ffffffff810fe929>] sleep_on_page+0x9/0x10 Oct 9 00:45:03 boxer kernel: [<ffffffff8158abcf>] __wait_on_bit+0x4f/0x80 Oct 9 00:45:03 boxer kernel: [<ffffffff810fea40>] wait_on_page_bit+0x70/0x80 Oct 9 00:45:03 boxer kernel: [<ffffffff810ff9da>] grab_cache_page_write_begin+0xaa/0x100 Oct 9 00:45:03 boxer kernel: [<ffffffff811f7451>] ext4_da_write_begin+0xa1/0x230 Oct 9 00:45:03 boxer kernel: [<ffffffff810ff5d1>] generic_file_buffered_write+0x121/0x2d0 Oct 9 00:45:03 boxer kernel: [<ffffffff8110052a>] __generic_file_aio_write+0x21a/0x410 Oct 9 00:45:03 boxer kernel: [<ffffffff81100790>] generic_file_aio_write+0x70/0xf0 Oct 9 00:45:03 boxer kernel: [<ffffffff811f0b67>] ext4_file_write+0xb7/0x260 Oct 9 00:45:03 boxer kernel: [<ffffffff8115cffe>] do_sync_write+0xbe/0x100 Oct 9 00:45:03 boxer kernel: [<ffffffff8115d8c7>] vfs_write+0xa7/0x180 Oct 9 00:45:03 boxer kernel: [<ffffffff8115dbf7>] sys_write+0x47/0x90 Oct 9 00:45:03 boxer kernel: [<ffffffff8159437d>] system_call_fastpath+0x1a/0x1f Oct 9 00:45:03 boxer kernel: [<00007fdd3fe8228d>] 0x7fdd3fe8228c and Oct 9 00:37:03 boxer kernel: INFO: task memtst:31391 blocked for more than 480 seconds. Oct 9 00:37:03 boxer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 9 00:37:03 boxer kernel: ffff8800a076f718 0000000000000082 ffff8800a49ae440 ffff8800a076ffd8 Oct 9 00:37:03 boxer kernel: ffff8800a076ffd8 ffff8800a076ffd8 ffffffff81a14020 ffff8800a49ae440 Oct 9 00:37:03 boxer kernel: ffffffff81009a05 ffff8800a49ae440 ffff880127c13568 0000000000000002 Oct 9 00:37:03 boxer kernel: Call Trace: Oct 9 00:37:03 boxer kernel: [<ffffffff8158c4ba>] io_schedule+0x8a/0xd0 Oct 9 00:37:03 boxer kernel: [<ffffffff810fe929>] sleep_on_page+0x9/0x10 Oct 9 00:37:03 boxer kernel: [<ffffffff8158abcf>] __wait_on_bit+0x4f/0x80 Oct 9 00:37:03 boxer kernel: [<ffffffff810fea40>] wait_on_page_bit+0x70/0x80 Oct 9 00:37:03 boxer kernel: [<ffffffff81111c1b>] shrink_page_list+0x29b/0x940 Oct 9 00:37:03 boxer kernel: [<ffffffff81112774>] shrink_inactive_list+0x1f4/0x4f0 Oct 9 00:37:03 boxer kernel: [<ffffffff811131d2>] shrink_mem_cgroup_zone+0x3b2/0x520 Oct 9 00:37:03 boxer kernel: [<ffffffff811133b0>] shrink_zone+0x70/0x90 Oct 9 00:37:03 boxer kernel: [<ffffffff81113733>] do_try_to_free_pages+0x103/0x560 Oct 9 00:37:03 boxer kernel: [<ffffffff81113e91>] try_to_free_pages+0x91/0x180 Oct 9 00:37:03 boxer kernel: [<ffffffff81108108>] __alloc_pages_nodemask+0x578/0xa00 Oct 9 00:37:03 boxer kernel: [<ffffffff81143468>] alloc_pages_vma+0xb8/0x1a0 Oct 9 00:37:03 boxer kernel: [<ffffffff81151314>] do_huge_pmd_anonymous_page+0x154/0x380 Oct 9 00:37:03 boxer kernel: [<ffffffff81590839>] do_page_fault+0x149/0x490 Oct 9 00:37:03 boxer kernel: [<ffffffff8158dc75>] page_fault+0x25/0x30 Oct 9 00:37:03 boxer kernel: [<0000000000401685>] 0x401684 How to reproduce: install qa_test_newburn from IBS QA:Head and run /usr/lib/ctcs2/tools/test_newburn-run (in this case I lowered the execution time in that script from 24h to 8h) At this point I don't know if this is a regression when compared to the GM kernel. Please advise how to read/rate these messages. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c1 --- Comment #1 from Heiko Rommel <rommel@suse.com> 2012-10-09 13:27:14 UTC --- Created an attachment (id=508723) --> (http://bugzilla.novell.com/attachment.cgi?id=508723) /var/log/warn note that this happens both on i386 and x86_64 -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c2 Marcus Meissner <meissner@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |meissner@suse.com AssignedTo|meissner@suse.com |jeffm@suse.com --- Comment #2 from Marcus Meissner <meissner@suse.com> 2012-10-09 13:32:42 UTC --- jeff, are you the ext4 guy? -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c3 --- Comment #3 from Marcus Meissner <meissner@suse.com> 2012-10-09 13:33:32 UTC --- parent Oct 8 17:18:29 boxer kernel: [ 4886.512282] oom01 invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0 Oct 8 17:18:29 boxer kernel: [ 4886.512290] Pid: 18470, comm: oom01 Tainted: P WC O 3.4.11-2.16-desktop #1 Oct 8 17:18:29 boxer kernel: [ 4886.512291] Call Trace: We seem to be very close to out of memory, so this might be a drain pages issue. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c Jeff Mahoney <jeffm@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|jeffm@suse.com |jack@suse.com -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c4 Jan Kara <jack@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #4 from Jan Kara <jack@suse.com> 2012-10-09 16:37:18 UTC --- In both traces, we are waiting for IO on a page to finish and the kernel complains it takes too long (more than 8 minutes!). If the machine is desperately low on memory and is thus swapping and / or evicting page cache pages, I can imagine it takes a long time since at those moments we do pretty much random IO but it's still somewhat suspicious. I'll try to reproduce the issue and will see whether I can find something suspicious. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c5 --- Comment #5 from Heiko Rommel <rommel@suse.com> 2012-10-11 09:51:07 UTC --- With the GM kernel (3.4.11-2.10) this looks slightly different: Oct 10 18:54:32 boxer kernel: INFO: task rs:main Q:Reg:24843 blocked for more than 480 seconds. Oct 10 18:54:32 boxer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 10 18:54:32 boxer kernel: ffff8801209c3a08 0000000000000082 ffff88011eebe2c0 ffff8801209c3fd8 Oct 10 18:54:32 boxer kernel: ffff8801209c3fd8 ffff8801209c3fd8 ffff880122306480 ffff88011eebe2c0 Oct 10 18:54:32 boxer kernel: ffffffff81009a05 ffff88011eebe2c0 ffff880127c53528 0000000000000002 Oct 10 18:54:32 boxer kernel: Call Trace: Oct 10 18:54:32 boxer kernel: [<ffffffff8158a8ea>] io_schedule+0x8a/0xd0 Oct 10 18:54:32 boxer kernel: [<ffffffff810fe889>] sleep_on_page+0x9/0x10 Oct 10 18:54:32 boxer kernel: [<ffffffff81588fff>] __wait_on_bit+0x4f/0x80 Oct 10 18:54:32 boxer kernel: [<ffffffff810fe9a0>] wait_on_page_bit+0x70/0x80 Oct 10 18:54:32 boxer kernel: [<ffffffff810ff93a>] grab_cache_page_write_begin+0xaa/0x100 Oct 10 18:54:32 boxer kernel: [<ffffffff811f71d1>] ext4_da_write_begin+0xa1/0x230 Oct 10 18:54:32 boxer kernel: [<ffffffff810ff531>] generic_file_buffered_write+0x121/0x2d0 Oct 10 18:54:32 boxer kernel: [<ffffffff8110048a>] __generic_file_aio_write+0x21a/0x410 Oct 10 18:54:32 boxer kernel: [<ffffffff811006f0>] generic_file_aio_write+0x70/0xf0 Oct 10 18:54:32 boxer kernel: [<ffffffff811f0927>] ext4_file_write+0xb7/0x260 Oct 10 18:54:32 boxer kernel: [<ffffffff8115ce9e>] do_sync_write+0xbe/0x100 Oct 10 18:54:32 boxer kernel: [<ffffffff8115d767>] vfs_write+0xa7/0x180 Oct 10 18:54:32 boxer kernel: [<ffffffff8115da97>] sys_write+0x47/0x90 Oct 10 18:54:32 boxer kernel: [<ffffffff815927bd>] system_call_fastpath+0x1a/0x1f Oct 10 18:54:32 boxer kernel: [<00007fdb83b3c28d>] 0x7fdb83b3c28c and Oct 10 18:54:32 boxer kernel: INFO: task screen:11933 blocked for more than 480 seconds. Oct 10 18:54:32 boxer kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 10 18:54:32 boxer kernel: ffff88011ed7da28 0000000000000086 ffff88011e2aa680 ffff88011ed7dfd8 Oct 10 18:54:32 boxer kernel: ffff88011ed7dfd8 ffff88011ed7dfd8 ffff880122306480 ffff88011e2aa680 Oct 10 18:54:32 boxer kernel: 0000000000000246 ffff8800253295f8 0000000000000000 ffff8801179fe2a0 Oct 10 18:54:32 boxer kernel: Call Trace: Oct 10 18:54:32 boxer kernel: [<ffffffff81242175>] do_get_write_access+0x2f5/0x5c0 Oct 10 18:54:32 boxer kernel: [<ffffffff81242629>] jbd2_journal_get_write_access+0x29/0x50 Oct 10 18:54:32 boxer kernel: [<ffffffff81223388>] __ext4_journal_get_write_access+0x38/0x90 Oct 10 18:54:32 boxer kernel: [<ffffffff811f8988>] ext4_reserve_inode_write+0x78/0xa0 Oct 10 18:54:32 boxer kernel: [<ffffffff811f89f8>] ext4_mark_inode_dirty+0x48/0x220 Oct 10 18:54:32 boxer kernel: [<ffffffff811fb175>] ext4_dirty_inode+0x35/0x70 Oct 10 18:54:32 boxer kernel: [<ffffffff81184ead>] __mark_inode_dirty+0x3d/0x280 Oct 10 18:54:32 boxer kernel: [<ffffffff81176831>] file_update_time+0xf1/0x190 Oct 10 18:54:32 boxer kernel: [<ffffffff81100458>] __generic_file_aio_write+0x1e8/0x410 Oct 10 18:54:32 boxer kernel: [<ffffffff811006f0>] generic_file_aio_write+0x70/0xf0 Oct 10 18:54:32 boxer kernel: [<ffffffff811f0927>] ext4_file_write+0xb7/0x260 Oct 10 18:54:32 boxer kernel: [<ffffffff8115ce9e>] do_sync_write+0xbe/0x100 Oct 10 18:54:32 boxer kernel: [<ffffffff8115d767>] vfs_write+0xa7/0x180 Oct 10 18:54:32 boxer kernel: [<ffffffff8115da97>] sys_write+0x47/0x90 Oct 10 18:54:32 boxer kernel: [<ffffffff815927bd>] system_call_fastpath+0x1a/0x1f Oct 10 18:54:32 boxer kernel: [<00007fb8f8f1c220>] 0x7fb8f8f1c21f -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c6 --- Comment #6 from Heiko Rommel <rommel@suse.com> 2012-10-11 09:51:59 UTC --- Created an attachment (id=509134) --> (http://bugzilla.novell.com/attachment.cgi?id=509134) /var/log/warn from run with kernel 3.4.11-2.10 -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c7 Jan Kara <jack@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P5 - None |P4 - Low Status|ASSIGNED |NEEDINFO InfoProvider| |rommel@suse.com --- Comment #7 from Jan Kara <jack@suse.com> 2012-10-18 17:49:19 UTC --- Thanks for the messages. Effectively, this is the same. We are hanging waiting for IO. Since this is not a regression and as I've checked newburn, it really pushes the machine over the edge, I'm not worried too much. I'm still curious how come IO takes so long to complete. Umm, I have one idea: Can you switch IO scheduler for the underlying device to deadline? You can do that by "echo deadline >/sys/block/<dev>/queue/scheduler". Can you still trigger those messages? Because I remember CFQ (default IO scheduler) has some issues that lots of sync IO can effectively starve async IO and that would lead to similar issues. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c8 Heiko Rommel <rommel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED InfoProvider|rommel@suse.com | --- Comment #8 from Heiko Rommel <rommel@suse.com> 2012-10-23 13:44:58 UTC --- When using "echo deadline >/sys/block/<dev>/queue/scheduler" I can NOT trigger those messages. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c9 --- Comment #9 from Jan Kara <jack@suse.com> 2012-11-08 22:55:56 UTC --- Thanks for the test and sorry, I forgot to update things here. So this really seems to be known issue with CFQ which can starve async IO in presence of synchronous IO. This is more or less feature of this IO scheduler. I'll keep this bug open for a while in a hope that I might find time to look more into CFQ and in how it could be tweaked to improve the situation. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=784219 https://bugzilla.novell.com/show_bug.cgi?id=784219#c10 Jeff Mahoney <jeffm@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |WONTFIX --- Comment #10 from Jeff Mahoney <jeffm@suse.com> 2014-08-08 16:28:56 EDT --- This report is against openSUSE 12.2 which is no longer under maintenance. If you are able to reproduce it with openSUSE 13.1 or openSUSE Factory, please re-open and reset the the "Product" field to the appropriate release. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com