[Bug 566288] New: Kernel oops triggered by switching tabs in flash based page
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c0
Summary: Kernel oops triggered by switching tabs in flash based page
Classification: openSUSE
Product: openSUSE 11.2
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: rschweikert@novell.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

A kernel oops was triggered when switching a tab on the flash based web page http://www.bundesliga.de/de/

Kernel info:
Linux triumph 2.6.31.5-0.1-default #1 SMP 2009-10-26 15:49:03 +0100 x86_64 x86_64 x86_64 GNU/Linux

Message to terminal:
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.245688] Oops: 0002 [#1] SMP
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.245696] last sysfs file: /sys/devices/virtual/net/br0/statistics/collisions
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.245960] Stack:
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.245999] Call Trace:
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.246108] Code: 8d 40 ff ff ff e8 e3 fc ff ff 48 8b 8d 40 ff ff ff 48 85 c0 48 89 cb 74 31 48 8b 10 f7 c2 00 00 02 00 0f 85 47 01 00 00 48 89 c2 <f0> ff 42 08 f0 ff 40 0c f6 40 18 01 48 8d 55 c0 74 07 48 8b 95
Message from syslogd@triumph at Dec 20 07:12:28 ...
kernel:[81850.246184] CR2: 0000000000000008

Information from dmesg:
[10596.490606] BUG: Bad page state in process firefox pfn:171e93
[10596.490617] page:ffffea00050eb028 flags:0040000000000000 count:898597376 mapcount:0 mapping:(null) index:160be9
[10596.490624] Pid: 3958, comm: firefox Tainted: P 2.6.31.5-0.1-default #1
[10596.490630] Call Trace:
[10596.490646] [<ffffffff81011749>] try_stack_unwind+0x189/0x1b0
[10596.490654] [<ffffffff8101013d>] dump_trace+0x9d/0x330
[10596.490662] [<ffffffff81011254>] show_trace_log_lvl+0x64/0x90
[10596.490669] [<ffffffff810112a3>] show_trace+0x23/0x40
[10596.490676] [<ffffffff81554378>] dump_stack+0x81/0x9e
[10596.490684] [<ffffffff8110f769>] bad_page+0xf9/0x160
[10596.490691] [<ffffffff8110ffcb>] prep_new_page+0x3b/0x190
[10596.490698] [<ffffffff811107cf>] get_page_from_freelist+0x35f/0x6e0
[10596.490706] [<ffffffff811111b5>] __alloc_pages_nodemask+0xe5/0x160
[10596.490713] [<ffffffff811474c0>] alloc_page_vma+0x80/0x120
[10596.490722] [<ffffffff81129ae7>] do_anonymous_page+0x57/0x250
[10596.490729] [<ffffffff8112e34f>] handle_mm_fault+0x38f/0x450
[10596.490737] [<ffffffff8155b333>] do_page_fault+0x193/0x3b0
[10596.490744] [<ffffffff81558455>] page_fault+0x25/0x30
[10596.490768] [<00007f69915607e7>] 0x7f69915607e7
[81850.245646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[81850.245663] IP: [<ffffffff8112a2ee>] copy_pte_range+0x29e/0x580
[81850.245678] PGD 17fd75067 PUD 17fd6f067 PMD 0
[81850.245688] Oops: 0002 [#1] SMP
[81850.245696] last sysfs file: /sys/devices/virtual/net/br0/statistics/collisions
[81850.245706] CPU 0
[81850.245713] Modules linked in: nfs fscache deflate zlib_deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc cryptd crypto_wq aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic md5 hmac cryptomgr aead pcompress nfsd crypto_null crypto_blkcipher lockd crypto_hash nfs_acl crypto_algapi auth_rpcgss snd_pcm_oss snd_mixer_oss sunrpc af_key snd_seq snd_seq_device edd bridge stp llc fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec kvm_intel ohci1394 snd_hwdep snd_pcm snd_timer snd i2c_nforce2 ieee1394 pcspkr forcedeth snd_page_alloc floppy kvm sr_mod cdrom button sg nvidia(P) xfs exportfs fan processor ide_pci_generic amd74xx ide_core ata_generic sym53c8xx aic7xxx scsi_transport_spi pata_jmicron pata_amd thermal thermal_sys sata_nv [last unloaded: preloadtrace]
[81850.245838] Pid: 3958, comm: firefox Tainted: P B 2.6.31.5-0.1-default #1 132-CK-NF79
[81850.245847] RIP: 0010:[<ffffffff8112a2ee>] [<ffffffff8112a2ee>] copy_pte_range+0x29e/0x580
[81850.245859] RSP: 0018:ffff88017fc3fb70 EFLAGS: 00010206
[81850.245867] RAX: ffffea0005055008 RBX: 800000016f3b7045 RCX: 800000016f3b7045
[81850.245876] RDX: 0000000000000000 RSI: 00007f695d7bd000 RDI: 800000016f3b7045
[81850.245884] RBP: ffff88017fc3fc40 R08: ffff88021d0641e8 R09: 000000000007e13f
[81850.245892] R10: 0000000000000001 R11: ffff88018394dd28 R12: ffff88017fd1cde8
[81850.245900] R13: 00007f695d7bd000 R14: 0000000000000000 R15: ffff8801a3b04de8
[81850.245909] FS: 00007f699396a710(0000) GS:ffff880008e4a000(0000) knlGS:0000000000000000
[81850.245918] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[81850.245926] CR2: 0000000000000008 CR3: 000000017fd72000 CR4: 00000000000026f0
[81850.245934] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[81850.245943] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[81850.245951] Process firefox (pid: 3958, threadinfo ffff88017fc3e000, task ffff88017fc62240)
[81850.245960] Stack:
[81850.245965] ffff88017fc3fc60 00000000000084d0 800000016f3b7045 ffffffff818d66c0
[81850.245973] <0> ffff88023c45c7c0 ffff88023b4df840 ffff88023b4df858 ffff88023b4df850
[81850.245984] <0> ffff88017fc3fc04 ffff88017fedd758 ffff8801c662a758 ffff88023c45c740
[81850.245999] Call Trace:
[81850.246013] [<ffffffff8112c5b0>] copy_page_range+0x2d0/0x4f0
[81850.246026] [<ffffffff8106b2fa>] dup_mmap+0x20a/0x350
[81850.246038] [<ffffffff8106baf8>] dup_mm+0xd8/0x140
[81850.246050] [<ffffffff8106c3f2>] copy_process+0x832/0xff0
[81850.246061] [<ffffffff8106cdef>] do_fork+0x8f/0x430
[81850.246074] [<ffffffff8100a606>] sys_clone+0x36/0x60
[81850.246086] [<ffffffff8100c9a3>] stub_clone+0x13/0x20
[81850.246099] [<00007f6992d93946>] 0x7f6992d93946
[81850.246108] Code: 8d 40 ff ff ff e8 e3 fc ff ff 48 8b 8d 40 ff ff ff 48 85 c0 48 89 cb 74 31 48 8b 10 f7 c2 00 00 02 00 0f 85 47 01 00 00 48 89 c2 <f0> ff 42 08 f0 ff 40 0c f6 40 18 01 48 8d 55 c0 74 07 48 8b 95
[81850.246167] RIP [<ffffffff8112a2ee>] copy_pte_range+0x29e/0x580
[81850.246178] RSP <ffff88017fc3fb70>
[81850.246184] CR2: 0000000000000008
[81850.246192] ---[ end trace 2e845d812a927e30 ]---

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
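For readers unfamiliar with the "Oops: 0002" line: on x86 the number is the page-fault error code, a small bit field. The sketch below decodes it in Python. The table reflects the architectural bit meanings; the names PF_BITS and decode_oops_code are made up for this illustration and are not part of any kernel tool.

```python
# Decode the x86 page-fault error code printed in an "Oops: XXXX" line.
# Each entry: (bit mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return a list of human-readable facts for a hex error code string."""
    code = int(code_hex, 16)
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


# Error code taken from the oops in this report.
decoded = decode_oops_code("0002")
print(decoded)
```

For this report the result reads as a kernel-mode write to a page that is not present, which fits the "NULL pointer dereference at 0000000000000008" message.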
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c2
Greg Kroah-Hartman
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c3
--- Comment #3 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c4
--- Comment #4 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c5
Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c6
--- Comment #6 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c7
Roman Drahtmueller
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c8
--- Comment #8 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c9
--- Comment #9 from Roman Drahtmueller
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c10
--- Comment #10 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c11
--- Comment #11 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c12
--- Comment #12 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c13
--- Comment #13 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c14
Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c15
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c16
--- Comment #16 from Robert Schweikert
Can you try to reproduce this with the -debug kernel? It has a bunch of memory corruption checks that may help us track it down.
I'm not seeing any changes in forcedeth or bridging that would address this.
I don't really have time right now to deal with a flaky system, sorry. I'll try to switch the machine to 11.2 again when things calm down a bit and will then try the debug kernel.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c17
Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c18
--- Comment #18 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c19
--- Comment #19 from Jan Kara
Here is a trace from an oops that occurred with the debug kernel. The trace occurred while heavy I/O was taking place.
Apr 17 01:18:35 triumph kernel: [629085.891576] page:ffffea0001e53000 flags:0020000000000000 count:-775699968 mapcount:-43821 mapping:(null) index:46
Apr 17 01:18:35 triumph kernel: [629085.891594] Pid: 1122, comm: tar Tainted: P 2.6.31.12-0.2-default #1
The struct page looks corrupted.
The kernel is not a -debug one as you claim (maybe you've just installed the debug package for the -default kernel and not the -debug kernel flavor?). Moreover, you have some proprietary modules loaded, so this dump isn't particularly interesting...
Apr 17 01:20:01 triumph kernel: [629171.986844] BUG: Bad page state in process tar pfn:91492
Apr 17 01:20:01 triumph kernel: [629171.986865] page:ffffea0001fc7ff0 flags:0020000000000000 count:0 mapcount:0 mapping:000054d2d1c3c200 index:49f
Again, this looks like a corrupted struct page. Funnily enough, if you look at how the previous struct page was corrupted in hex, you'll see that the pattern 0x54D3D1C3C200 was written to a part of the struct. Up to one bit, it is the same pattern that corrupted the mapping value in this struct page. So definitely something is scribbling over your memory => not too interesting unless you can reproduce it without a tainted kernel.
Here is some more evidence that XFS is in play. When I had trouble with the machine I dumped (dd) the partition containing all data onto an external drive. Now when I try to restore the data in some directories I get read errors (not unexpected). When those read errors occur I get the following trace in the kernel log.
Apr 19 12:05:14 triumph kernel: [ 4132.476952] XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4653 of file /usr/src/packages/BUILD/kernel-desktop-2.6.31.12/linux-2.6.31/fs/xfs/xfs_bmap.c.
XFS driver complains that the filesystem is corrupted (extent tree block pointers are bogus). Given that in the tainted kernel something is scribbling over your memory, it could be caused by that.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c20
Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c21
--- Comment #21 from Robert Schweikert
(In reply to comment #17)
Here is a trace from an oops that occurred with the debug kernel. The trace occurred while heavy I/O was taking place.
Apr 17 01:18:35 triumph kernel: [629085.891576] page:ffffea0001e53000 flags:0020000000000000 count:-775699968 mapcount:-43821 mapping:(null) index:46
Apr 17 01:18:35 triumph kernel: [629085.891594] Pid: 1122, comm: tar Tainted: P 2.6.31.12-0.2-default #1
The struct page looks corrupted.
The kernel is not a -debug one as you claim (maybe you've just installed the debug package for the -default kernel and not the -debug kernel flavor?). Moreover, you have some proprietary modules loaded, so this dump isn't particularly interesting...
OK, missed that. What's interesting is that the dump occurred while I was gone, and I was certain I had booted the debug kernel before I left. Oh well, missed opportunity. Sorry about the incorrect claim.
Apr 17 01:20:01 triumph kernel: [629171.986844] BUG: Bad page state in process tar pfn:91492
Apr 17 01:20:01 triumph kernel: [629171.986865] page:ffffea0001fc7ff0 flags:0020000000000000 count:0 mapcount:0 mapping:000054d2d1c3c200 index:49f
Again, this looks like a corrupted struct page. Funnily enough, if you look at how the previous struct page was corrupted in hex, you'll see that the pattern 0x54D3D1C3C200 was written to a part of the struct. Up to one bit, it is the same pattern that corrupted the mapping value in this struct page. So definitely something is scribbling over your memory => not too interesting unless you can reproduce it without a tainted kernel.
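The "same pattern up to one bit" observation can be checked with a few lines of Python. This is only a sketch under two assumptions that are not stated in the thread: that count and mapcount are adjacent 32-bit fields with count at the lower address, and that the values printed in the log can be treated as the raw field contents. The helper name u32 is made up for the illustration.

```python
def u32(value):
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


# Values printed for the first corrupted struct page (Apr 17 01:18:35).
count = -775699968
mapcount = -43821

# Value printed for the second corrupted struct page (Apr 17 01:20:01).
mapping = 0x000054D2D1C3C200

# Assumption: on little-endian x86-64 the two counters form one 64-bit word
# with count in the low half and mapcount in the high half.
packed = (u32(mapcount) << 32) | u32(count)
pattern = packed & 0xFFFFFFFFFFFF  # low 48 bits, the part quoted in the comment

differing_bits = bin(pattern ^ mapping).count("1")

print(hex(packed))
print(hex(pattern))
print(differing_bits)
```

Under those assumptions the packed word is 0xffff54d3d1c3c200, its low 48 bits are 0x54d3d1c3c200, and the XOR with the mapping value has exactly one bit set.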
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c22
--- Comment #22 from Robert Schweikert
(In reply to comment #18)
Here is some more evidence that XFS is in play. When I had trouble with the machine I dumped (dd) the partition containing all data onto an external drive. Now when I try to restore the data in some directories I get read errors (not unexpected). When those read errors occur I get the following trace in the kernel log.
Apr 19 12:05:14 triumph kernel: [ 4132.476952] XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4653 of file /usr/src/packages/BUILD/kernel-desktop-2.6.31.12/linux-2.6.31/fs/xfs/xfs_bmap.c.
XFS driver complains that the filesystem is corrupted (extent tree block pointers are bogus). Given that in the tainted kernel something is scribbling over your memory, it could be caused by that.
So could you just install openSUSE 11.2 anew on the problematic desktop and avoid installing any proprietary video card driver - maybe installing a system without X Windows would be the best and see whether the corruption still happens?
Installed openSUSE 11.2 again; this time I formatted the disk with ext4. We'll see how that works. Will report back.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c23
--- Comment #23 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c24
Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c25
Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c26
--- Comment #26 from Robert Schweikert
Thanks for testing Robert. It's again almost the same corruption pattern as in the message from Apr 17 01:18:35. BTW: The "Disabling lock debugging due to kernel taint" message is OK. BUG_ON also taints the kernel and so causes lock debugging to be turned off. Just for reference the beginning of the corrupted struct page as raw bytes is: 000000000000040000C2C3D1D354FFFF.
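The raw bytes quoted above can be unpacked to see why they count as "almost the same" corruption as the Apr 17 01:18:35 message. The sketch assumes a layout that the thread does not spell out: an 8-byte flags word followed by two signed 32-bit counters, all little-endian. The field names follow the labels printed in the kernel log.

```python
import struct

# Beginning of the corrupted struct page, as quoted in the comment.
raw = bytes.fromhex("000000000000040000C2C3D1D354FFFF")

# Assumed layout: u64 flags, s32 count, s32 mapcount (little-endian).
flags, count, mapcount = struct.unpack("<Qii", raw)

print(f"flags:{flags:016x} count:{count} mapcount:{mapcount}")
```

With that layout the counters decode to count:-775699968 and mapcount:-43821, the same values the earlier trace printed, while the flags word differs.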
I agree with your analysis that this is either a bug in XFS or in the sym53c8xx driver. Two things would be worth trying to narrow this down a bit more: a) Try running with a 2.6.32 kernel (e.g. from SLE11 SP1) - you can get one from ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11-SP1/x86_64/. If the problem was already fixed we could try going through changelogs. b) Try formatting the SATA drive with XFS and run from it. That would narrow down whether the problem is in XFS or in the storage driver.
And one more question: Which older kernel worked fine for you (if there was some)?
The 11.1 kernel worked fine. I updated the system on a regular basis, thus I was running the latest patched kernel for openSUSE 11.1. Will try the SLES11-SP1 kernel next and report back.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c27
Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c28
Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c29
Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c30
--- Comment #30 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c31
--- Comment #31 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c32
--- Comment #32 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c35
--- Comment #35 from Roman Drahtmueller
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c36
--- Comment #36 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c37
--- Comment #37 from Robert Schweikert
uname -a
Linux triumph 2.6.32.12-0.4-debug #1 SMP 2010-05-06 15:28:06 +0200 x86_64 x86_64 x86_64 GNU/Linux
May 7 15:27:18 triumph kernel: [10680.933107] umount D ffff88000aa59cb8 0 30197 18070 0x00000000
May 7 15:27:18 triumph kernel: [10680.933114] ffff88012a7ebd48 0000000000000086 ffff88012a7ebce8 ffff88012a6a8040
May 7 15:27:18 triumph kernel: [10680.933121] 0000000000013c80 ffff88012a7ebfd8 0000000000013c80 ffff88012a7ebfd8
May 7 15:27:18 triumph kernel: [10680.933128] 0000000000013c80 0000000000013c80 0000000000013c80 0000000000013c80
May 7 15:27:18 triumph kernel: [10680.933134] Call Trace:
May 7 15:27:18 triumph kernel: [10680.933154] [<ffffffff8117fade>] bdi_sched_wait+0xe/0x20
May 7 15:27:18 triumph kernel: [10680.933162] [<ffffffff81430202>] __wait_on_bit+0x62/0x90
May 7 15:27:18 triumph kernel: [10680.933168] [<ffffffff814302a9>] out_of_line_wait_on_bit+0x79/0x90
May 7 15:27:18 triumph kernel: [10680.933174] [<ffffffff8117fc71>] sync_inodes_sb+0x81/0xa0
May 7 15:27:18 triumph kernel: [10680.933179] [<ffffffff81185252>] __sync_filesystem+0x82/0x90
May 7 15:27:18 triumph kernel: [10680.933185] [<ffffffff8118545b>] sync_filesystem+0x4b/0x70
May 7 15:27:18 triumph kernel: [10680.933190] [<ffffffff8115aef7>] generic_shutdown_super+0x27/0xe0
May 7 15:27:18 triumph kernel: [10680.933196] [<ffffffff8115afe1>] kill_block_super+0x31/0x50
May 7 15:27:18 triumph kernel: [10680.933201] [<ffffffff8115b8b5>] deactivate_super+0x85/0xa0
May 7 15:27:18 triumph kernel: [10680.933207] [<ffffffff8117848a>] mntput_no_expire+0xca/0x110
May 7 15:27:18 triumph kernel: [10680.933213] [<ffffffff81178b68>] sys_umount+0x58/0xc0
May 7 15:27:18 triumph kernel: [10680.933219] [<ffffffff810032ab>] system_call_fastpath+0x16/0x1b
May 7 15:27:18 triumph kernel: [10680.933240] [<00007f3de53d2ed7>] 0x7f3de53d2ed7
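When comparing several traces like this one, it can help to reduce each to its list of kernel symbols. The following is a minimal sketch, not an existing tool: the regular expression and the name parse_frames are assumptions made for this illustration, and the sample lines are copied from the trace above.

```python
import re

# Matches kernel frames such as "[<ffffffff8117fade>] bdi_sched_wait+0xe/0x20".
# The trailing user-space frame (a bare address without symbol+offset) is skipped.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(log_text):
    """Return (symbol, offset, function_size) tuples in call-trace order."""
    return [
        (symbol, int(offset, 16), int(size, 16))
        for _address, symbol, offset, size in FRAME_RE.findall(log_text)
    ]


sample = """\
May 7 15:27:18 triumph kernel: [10680.933154] [<ffffffff8117fade>] bdi_sched_wait+0xe/0x20
May 7 15:27:18 triumph kernel: [10680.933174] [<ffffffff8117fc71>] sync_inodes_sb+0x81/0xa0
May 7 15:27:18 triumph kernel: [10680.933213] [<ffffffff81178b68>] sys_umount+0x58/0xc0
May 7 15:27:18 triumph kernel: [10680.933240] [<00007f3de53d2ed7>] 0x7f3de53d2ed7
"""

frames = parse_frames(sample)
for symbol, offset, size in frames:
    print(f"{symbol} (+{offset:#x} of {size:#x})")
```

Read bottom to top, the frames show umount waiting in sync_inodes_sb for writeback to finish, i.e. a blocked task rather than a crash.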
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c38
--- Comment #38 from Jan Kara
/proc/sys/kernel/hung_task_timeout_secs" to disable the above warning to not disturb you.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c39
--- Comment #39 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c40
--- Comment #40 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c41
--- Comment #41 from Robert Schweikert
No traceback, but the system hung overnight. At some point after my backup process started the system dropped into never never land. Here is a snippet from the syslog:
May 15 00:45:33 triumph rsyslogd: -- MARK --
May 15 01:05:33 triumph rsyslogd: -- MARK --
May 15 01:12:01 triumph /usr/sbin/cron[10659]: (root) CMD (/usr/local/bin/simBackup --verbose)
May 15 06:07:40 triumph kernel: imklog 4.4.1, log source = /proc/kmsg started.
May 15 06:07:40 triumph rsyslogd: [origin software="rsyslogd" swVersion="4.4.1" x-pid="1917" x-info="http://www.rsyslog.com"] (re)start
At 1:12 my backup script started, and at some point after that the system hung. The entries at 6:07 are from the reboot.
I take it that instead of a corruption and traceback the protection of memory in this kernel now triggers a hang.
What else can I do?
And some additional information that might be useful. The backup is just a tar command, which did get issued according to my log file. Judging by the size of the tarball created and by previous backups, the hang occurred while the tar command was running.
No traceback, but the system hung over night. .. I take it that instead of a corruption and traceback the protection of memory in this kernel now triggers a hang.
What else can I do?
That would be a good sign (that we manage to catch the corruption earlier). Any chance of setting up a serial console or having a look at VGA console so
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c42
--- Comment #42 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c43
--- Comment #43 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c44
--- Comment #44 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c45
--- Comment #45 from Robert Schweikert
Are you able to switch terminals on the screen?
No, I cannot. The machine is in neverland when it hangs, i.e. no keyboard, or mouse action possible. I cannot ping the machine either.
The easiest is to check whether there's not something on the screen of the machine (or actually on syslog terminal SUSE has - tty10). If you're unable to switch terminals while the machine is hung, you can try switching to console 10 before trying to reproduce the hang.
Other alternatives are using USB<->serial convertors or netconsole (see Documentation/networking/netconsole.txt in the kernel sources).
BTW: This is very likely a different problem so it would make sense to create a separate bug for it. But we can wait a bit until you find out something about why the machine has hung.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c46
--- Comment #46 from Robert Schweikert
(In reply to comment #44)
Are you able to switch terminals on the screen?
No, I cannot. The machine is in neverland when it hangs, i.e. no keyboard, or mouse action possible. I cannot ping the machine either.
The easiest is to check whether there's not something on the screen of the machine (or actually on syslog terminal SUSE has - tty10). If you're unable to switch terminals while the machine is hung, you can try switching to console 10 before trying to reproduce the hang.
Other alternatives are using USB<->serial convertors or netconsole (see Documentation/networking/netconsole.txt in the kernel sources).
Still running the 2.6.32.12-0.4-debug kernel you provided, with the extra package installed as well to gain ext4 support. However, there's no netconsole module for this kernel. Is it built in, or not built at all?
BTW: This is very likely a different problem so it would make sense to create a separate bug for it. But we can wait a bit until you find out something about why the machine has hung.
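For the netconsole route mentioned above, the receiving machine only needs something that listens for UDP datagrams; netconsole's documented default target port is 6666. The Python sketch below shows one possible receiver. The function names are made up for this illustration, and the demo sends itself one datagram over loopback on an OS-chosen port so that it can run anywhere without a second machine.

```python
import socket


def open_listener(host="0.0.0.0", port=6666, timeout=5.0):
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # avoid blocking forever if nothing arrives
    sock.bind((host, port))
    return sock


def receive_datagrams(sock, max_messages):
    """Collect up to max_messages datagrams as decoded text."""
    messages = []
    for _ in range(max_messages):
        data, _sender = sock.recvfrom(4096)
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


# Loopback demo: port 0 lets the OS pick a free port for the test.
listener = open_listener("127.0.0.1", 0)
demo_port = listener.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"[81850.245688] Oops: 0002 [#1] SMP\n", ("127.0.0.1", demo_port))

received = receive_datagrams(listener, 1)
sender.close()
listener.close()

print(received[0], end="")
```

On a real log host one would call open_listener() with its defaults and keep reading in a loop, writing each message to a file so the last lines before a hang survive.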
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c47
--- Comment #47 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c48
--- Comment #48 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c49
--- Comment #49 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c50
--- Comment #50 from Robert Schweikert
ping -c1 192.168.1.5
PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=0.669 ms
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c51
--- Comment #51 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c52
--- Comment #52 from Robert Schweikert
That is strange. For me your command line works just fine. Can you strace the failing modprobe and see with what arguments is init_module function called?
mmap(NULL, 24832, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7fa9bdc9b000
init_module(0x7fa9bdc9b000, 24832, "netconsole=@/,@192.168.1.5/") = -1 EDESTADDRREQ (Destination address required)
Could this be related to my network being set up as a bridge, i.e. I have br0 bound to eth1?
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c53
--- Comment #53 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c54
--- Comment #54 from Robert Schweikert
Ah, you have a bridge, that explains it. If you do not want to use eth0 as the interface you send UDP packets from, you have to state it explicitly in the netconsole argument. I.e., in your case you should have something like netconsole=@/br0,@192.168.1.5/
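The argument follows the format given in the kernel's netconsole documentation, [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], where every field except the target IP may be left empty. The Python sketch below builds such a string. The function netconsole_param is a hypothetical helper written for this illustration; it is not part of modprobe or the kernel.

```python
def netconsole_param(target_ip, dev="", src_port="", src_ip="",
                     target_port="", target_mac=""):
    """Build a netconsole= module argument; empty fields use kernel defaults."""
    if not target_ip:
        raise ValueError("the target IP address is the only mandatory field")
    return (f"netconsole={src_port}@{src_ip}/{dev},"
            f"{target_port}@{target_ip}/{target_mac}")


# The argument that failed in the strace output (no device named).
without_dev = netconsole_param("192.168.1.5")
# The argument suggested in the comment above, naming the bridge explicitly.
with_bridge = netconsole_param("192.168.1.5", dev="br0")

print(without_dev)
print(with_bridge)
```

The second string is what would be appended to the modprobe netconsole command line on the machine being debugged.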
Duh, I looked at the docs for netconsole a few times and it still didn't sink in.
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c55
--- Comment #55 from Robert Schweikert
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c56
--- Comment #56 from Jan Kara
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c57
--- Comment #57 from Robert Schweikert
You should get more information about the error in the kernel log... In this case I'd guess that br0 doesn't support some functionality netconsole needs so maybe try using eth1?
Using eth1 resulted in the "Destination address required" error seen previously.
May 26 11:07:37 triumph kernel: [271946.997102] netconsole: local port 6665
May 26 11:07:37 triumph kernel: [271946.997106] netconsole: local IP 0.0.0.0
May 26 11:07:37 triumph kernel: [271946.997108] netconsole: interface br0
May 26 11:07:37 triumph kernel: [271946.997111] netconsole: remote port 6666
May 26 11:07:37 triumph kernel: [271946.997113] netconsole: remote IP 192.168.1.5
May 26 11:07:37 triumph kernel: [271946.997116] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
May 26 11:07:37 triumph kernel: [271946.997121] netconsole: br0 doesn't support polling, aborting.
May 26 11:07:37 triumph kernel: [271946.997143] netconsole: cleaning up
http://bugzilla.novell.com/show_bug.cgi?id=566288
http://bugzilla.novell.com/show_bug.cgi?id=566288#c58
--- Comment #58 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=566288
https://bugzilla.novell.com/show_bug.cgi?id=566288#c59
Robert Schweikert