[Bug 807850] New: kernel BUG + full hang on drop_caches
https://bugzilla.novell.com/show_bug.cgi?id=807850
https://bugzilla.novell.com/show_bug.cgi?id=807850#c0

Summary: kernel BUG + full hang on drop_caches
Classification: openSUSE
Product: openSUSE 12.3
Version: RC 2
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: munderl@tnt.uni-hannover.de
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0

echo 1 > /proc/sys/vm/drop_caches leads to immediate kernel BUG. Kernel hangs in an endless loop afterwards. Happens with default and desktop, on 3.7.10 and 3.7.9.

[ 81.876598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 81.877186] IP: [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
[ 81.877803] PGD 252bc1067 PUD 253d11067 PMD 0
[ 81.878391] Oops: 0000 [#1] SMP
[ 81.878965] Modules linked in: fuse af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit bnep bluetooth ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_hda_codec_hdmi mperf coretemp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep kvm_intel snd_pcm arc4 snd_seq snd_timer snd_seq_device kvm iwldvm mac80211 snd uvcvideo crc32c_intel videobuf2_core videodev ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw videobuf2_vmalloc aes_x86_64 iTCO_wdt xts tpm_infineon mei r8169 videobuf2_memops iTCO_vendor_support sr_mod lpc_ich iwlwifi gf128mul sony_laptop rts_pstor(C) cdrom i2c_i801 tpm_tis tpm tpm_bios battery mfd_core soundcore snd_page_alloc cfg80211 rfkill ac sg microcode pcspkr autofs4 xhci_hcd ehci_hcd usbcore usb_common radeon i915 video ttm drm_kms_helper drm i2c_algo_bit thermal button processor thermal_sys scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh
[ 81.883248] CPU 0
[ 81.883256] Pid: 1452, comm: bash Tainted: G C 3.7.10-1.1-default #1 Sony Corporation VPCSA4W9E/VAIO
[ 81.884813] RIP: 0010:[<ffffffff81190be4>] [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
[ 81.885602] RSP: 0018:ffff880252bc9e18 EFLAGS: 00010246
[ 81.886391] RAX: 0000000000000000 RBX: ffff88024ecb7db0 RCX: 0000000000000002
[ 81.887216] RDX: 0000000000000007 RSI: ffff88024f63a670 RDI: ffff88024ecb7e38
[ 81.888010] RBP: ffff88024ecb7e38 R08: dead000000200200 R09: 0000000000000000
[ 81.888809] R10: 0000000000000001 R11: 0000000000000210 R12: ffff880254d588a0
[ 81.889616] R13: ffff88024fcb25e8 R14: ffffffff81190b70 R15: ffffffffffffffea
[ 81.890428] FS: 00007fad2b9ed700(0000) GS:ffff88025fa00000(0000) knlGS:0000000000000000
[ 81.891251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 81.892083] CR2: 0000000000000058 CR3: 0000000252ad2000 CR4: 00000000000407f0
[ 81.892969] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 81.893863] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 81.894761] Process bash (pid: 1452, threadinfo ffff880252bc8000, task ffff880253d321c0)
[ 81.895684] Stack:
[ 81.896594] 0000000000000001 ffff880254d58800 ffff880254e94800 ffff880254d58868
[ 81.897529] 0000000000000000 ffffffff8116a499 0000000000000000 0000000000000001
[ 81.898470] ffffffff81a228a0 ffff880252bc9f50 0000000000000002 ffffffff81190cce
[ 81.899421] Call Trace:
[ 81.900365] [<ffffffff8116a499>] iterate_supers+0xd9/0xe0
[ 81.901320] [<ffffffff81190cce>] drop_caches_sysctl_handler+0x7e/0x90
[ 81.902285] [<ffffffff811d0e26>] proc_sys_call_handler.isra.10+0xc6/0xe0
[ 81.903260] [<ffffffff81166fd7>] vfs_write+0xa7/0x180
[ 81.904239] [<ffffffff81167321>] sys_write+0x51/0xa0
[ 81.905216] [<ffffffff8154f2ed>] system_call_fastpath+0x1a/0x1f
[ 81.906204] [<00007fad2ae959c0>] 0x7fad2ae959bf
[ 81.907194] Code: 01 00 00 49 39 c4 48 8d 98 00 ff ff ff 74 68 48 8d ab 88 00 00 00 48 89 ef e8 49 69 3b 00 f6 83 a0 00 00 00 38 75 d0 48 8b 43 30 <48> 83 78 58 00 74 c5 48 89 df e8 dd ef fe ff 66 83 45 00 01 66
[ 81.909430] RIP [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
[ 81.910532] RSP <ffff880252bc9e18>
[ 81.911640] CR2: 0000000000000058

Reproducible: Always

Steps to Reproduce:
1. echo 1 > /proc/sys/vm/drop_caches

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
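A note on reading the oops above. In the Code: line, the bytes 48 8b 43 30 decode to mov 0x30(%rbx),%rax, and the marked faulting instruction <48> 83 78 58 00 decodes to cmpq $0x0,0x58(%rax). RAX is 0, so the CPU touches address 0 + 0x58, which is exactly the CR2 value. The sketch below only reproduces that arithmetic in user space. The struct is a hypothetical stand-in: only the 0x58 offset comes from the oops, and the field name nrpages is an assumption based on how drop_pagecache_sb is usually read, not something the report states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure reached through the pointer
 * loaded into RAX. Only the 0x58 offset is taken from the oops; the
 * field name is an assumption. */
struct model_mapping {
    unsigned char pad[0x58];   /* everything in front of the field */
    unsigned long nrpages;     /* assumed name of the field at 0x58 */
};

/* Address the CPU touches when it reads the field through `base`. */
static uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct model_mapping, nrpages);
}
```

Calling fault_address(0) gives 0x58, matching "NULL pointer dereference at 0000000000000058": the pointer itself was NULL and the kernel tried to read a member 0x58 bytes into it.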
https://bugzilla.novell.com/show_bug.cgi?id=807850#c1
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c2
Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c3
Michal Hocko
I blacklisted the rts_pstor module and the kernel is not tainted anymore. However, the problem persists.
OK, this was just a long shot, but definitely good to know. [...]
I set up kdump but it doesn't fire on a kernel crash, not even on a manually triggered one (echo c > /proc/sysrq-trigger). I don't know why; it should be set up correctly (rckdump status says so). Disabling KMS didn't help. Also, the Oops is not in the logs anymore; maybe it got lost somewhere in the kdump?
This is strange. I assume your crashkernel kernel boot parameter is set up properly and the logs say that the kdump kernel is loaded successfully (you can also check that by reading /sys/kernel/kexec_crash_loaded, which should return 1). Let's CC Jack as this sounds ext4 related. Does this ring any bells?
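The check mentioned above can also be done from a small program. This is a minimal sketch, assuming only what the comment says: that /sys/kernel/kexec_crash_loaded reads as 1 when a crash kernel is loaded. The helper name crash_kernel_loaded is made up for illustration, and the path is passed in as a parameter so the function can be tried against any file.

```c
#include <stdio.h>

/* Returns 1 if the file at `path` starts with the character '1',
 * 0 if it starts with anything else, and -1 if it cannot be read
 * or is empty. */
static int crash_kernel_loaded(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    if (f == NULL)
        return -1;
    c = fgetc(f);
    fclose(f);
    if (c == EOF)
        return -1;
    return c == '1';
}
```

On the affected machine one would call crash_kernel_loaded("/sys/kernel/kexec_crash_loaded"). A result of 1 only says that a crash kernel image is loaded, not that the dump will actually succeed.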
https://bugzilla.novell.com/show_bug.cgi?id=807850#c4
--- Comment #4 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c5
--- Comment #5 from Michal Hocko
I set crashkernel=256M-:128M and the logs say the kdump kernel is loaded. cat /sys/kernel/kexec_crash_loaded returns 1. Still no kdump :|
Please report this as a separate issue.
https://bugzilla.novell.com/show_bug.cgi?id=807850#c6
--- Comment #6 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c7
--- Comment #7 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c8
--- Comment #8 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c9
--- Comment #9 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c10
--- Comment #10 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c11
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c12
--- Comment #12 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c13
--- Comment #13 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c14
--- Comment #14 from Michal Hocko
I applied the patch (after fixing the erroneous "." into ",").
dang, I forgot to refresh the patch... This should have all the parts, so only the compile fix is missing.
https://bugzilla.novell.com/show_bug.cgi?id=807850#c15
--- Comment #15 from Michal Hocko
I applied the patch (after fixing the erroneous "." into ",").
The problem is that the BUG report no longer goes into the logs. However, there is a message from the Intel drm driver:
(the drm_open and the drm_stub call are sometimes [<ffffffffa0047e28>] drm_open+0x6a8/0x6e0 [drm] and [<ffffffffa0048565>] drm_stub_open+0xe5/0x170 [drm])
Do you have the full log?
WARNING: at drivers/gpu/drm/drm_fops.c:165 drm_open+0x6a8/0x6e0 [drm]()
Hardware name: VPCSA4W9E
err_undo inode:ffff88024efdd350 drm_device{devname:(null)}
Interesting, so this points us to the drm code, and err_undo tells us that we are taking the error path and dev->dev_mapping is NULL. The code a few lines above says:

	old_mapping = dev->dev_mapping;
	if (old_mapping == NULL)
		dev->dev_mapping = &inode->i_data;

so the NULL dev_mapping should be reinitialized, but curiously enough old_mapping never finds out. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdif... introduced this code (in 3.7-rc5). I will attach the patch which I think should fix the issue in the next commit.
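To make the suspicion concrete, here is a small user-space model of that open path. It is a sketch under stated assumptions, not the drm code and not the patch attached to this bug. Only the three quoted lines and the err_undo label come from the comment above. That the error path writes old_mapping back into inode->i_mapping, the fail parameter, the save_inode_mapping switch, and the stripped-down structs are all assumptions made for illustration.

```c
#include <stddef.h>

/* Stripped-down stand-ins for the kernel structures (assumed layout). */
struct address_space { unsigned long nrpages; };

struct inode {
    struct address_space *i_mapping;
    struct address_space  i_data;
};

struct drm_device { struct address_space *dev_mapping; };

/* Model of the open path. `fail` simulates an error after the mapping
 * has been set up. `save_inode_mapping` selects a hypothetical variant
 * that remembers the inode's own mapping and restores that instead. */
static int model_open(struct drm_device *dev, struct inode *inode,
                      int fail, int save_inode_mapping)
{
    struct address_space *old_mapping = dev->dev_mapping;
    struct address_space *old_imapping = inode->i_mapping;

    /* The fragment quoted in the comment above. */
    if (old_mapping == NULL)
        dev->dev_mapping = &inode->i_data;
    inode->i_mapping = dev->dev_mapping;

    if (!fail)
        return 0;

    /* err_undo (assumed): old_mapping is still NULL on a first open,
     * so writing it back leaves the inode without a mapping. */
    inode->i_mapping = save_inode_mapping ? old_imapping : old_mapping;
    dev->dev_mapping = old_mapping;
    return -1;
}
```

In this model, a failed first open with save_inode_mapping set to 0 leaves inode->i_mapping NULL, which is the kind of inode that a later walk over all inodes (as in drop_caches) would trip over. With save_inode_mapping set to 1 the inode keeps its original mapping.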
https://bugzilla.novell.com/show_bug.cgi?id=807850#c16
--- Comment #16 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c17
--- Comment #17 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c18
--- Comment #18 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c19
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c20
--- Comment #20 from Marco Munderloh
https://bugzilla.novell.com/show_bug.cgi?id=807850#c21
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=807850#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=807850#c22
--- Comment #22 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=807850#c23
--- Comment #23 from Swamp Workflow Management
http://bugzilla.novell.com/show_bug.cgi?id=807850
Swamp Workflow Management