[opensuse-kernel] PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi all,
in 4.0.0-rc I have seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:
[242060.604870] PANIC: double fault, error_code: 0x0
[242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
[242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
[242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
[242060.604885] RIP: 0010:[<ffffffff816834ad>] [<ffffffff816834ad>] page_fault+0xd/0x30
[242060.604893] RSP: 0018:00007fffa55eafb8 EFLAGS: 00010016
[242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
[242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
[242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
[242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
[242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
[242060.604904] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
[242060.604906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
[242060.604909] Stack:
[242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
[242060.604995] IP: [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
[242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
[242060.605078] Oops: 0000 [#1] PREEMPT SMP
[242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
[242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
[242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
[242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
[242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
[242060.605396] RIP: 0010:[<ffffffff81005b44>] [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
[242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
[242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
[242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
[242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
[242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
[242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
[242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
[242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
[242060.605396] Stack:
[242060.605396] 0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
[242060.605396] 0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
[242060.605396] 00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
[242060.605396] Call Trace:
[242060.605396] [<ffffffff81005c36>] show_regs+0x86/0x210
[242060.605396] [<ffffffff8104636f>] df_debug+0x1f/0x30
[242060.605396] [<ffffffff810041a4>] do_double_fault+0x84/0x100
[242060.605396] [<ffffffff81683088>] double_fault+0x28/0x30
[242060.605396] [<ffffffff816834ad>] page_fault+0xd/0x30
[242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
[242060.605396] RIP [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
[242060.605396] RSP <ffff88023bc84e88>
[242060.605396] CR2: 00007fffa55eafb8
I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Does the above state indicate hardware/memory problem, or is it time to try to really dive into that crash dump?
Thanks for any hints,
Stefan
--
Stefan Seyfried
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
On 14.03.2015 at 22:58, Stefan Seyfried wrote:
Hi all,
in 4.0.0-rc I have seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:
[...]
I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Does the above state indicate hardware/memory problem, or is it time to try to really dive into that crash dump?
Too bad, that is not an option:
susi:/var/crash/2015-03-14-22:46 # crash vmlinux-4.0.0-rc3-2.gd5c547f-desktop vmcore
crash 7.1.0 [...] This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernels compiled by different gcc versions: vmlinux-4.0.0-rc3-2.gd5c547f-desktop: (unknown) vmcore kernel: 4.8.3
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments: vmlinux-4.0.0-rc3-2.gd5c547f-desktop is not SMP -- vmcore is SMP
--
Stefan Seyfried
On Sat 2015-03-14 23:16:20, Stefan Seyfried wrote:
On 14.03.2015 at 22:58, Stefan Seyfried wrote:
Hi all,
in 4.0.0-rc I have seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:
[...]
I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Does the above state indicate hardware/memory problem, or is it time to try to really dive into that crash dump?
Too bad, that is not an option:
susi:/var/crash/2015-03-14-22:46 # crash vmlinux-4.0.0-rc3-2.gd5c547f-desktop vmcore
crash 7.1.0 [...] This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernels compiled by different gcc versions: vmlinux-4.0.0-rc3-2.gd5c547f-desktop: (unknown) vmcore kernel: 4.8.3
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments: vmlinux-4.0.0-rc3-2.gd5c547f-desktop is not SMP -- vmcore is SMP
Well, the dmesg messages are valid even if the kernel and vmcore are incompatible. But the above snippet does not help much. There were most likely one or more errors printed before it. A previous error probably triggered show_stack_log_lvl(), which then failed with the double fault. Also, the kernel was already tainted, so there was probably an error message when that happened.
In other words, could you please send the whole dmesg log that is accessible from the vmcore?
Best Regards,
Petr
Hi Petr,
On 16.03.2015 at 16:12, Petr Mladek wrote:
On Sat 2015-03-14 23:16:20, Stefan Seyfried wrote:
On 14.03.2015 at 22:58, Stefan Seyfried wrote:
Hi all,
in 4.0.0-rc I have seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:
I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Does the above state indicate hardware/memory problem, or is it time to try to really dive into that crash dump?
Too bad, that is not an option:
susi:/var/crash/2015-03-14-22:46 # crash vmlinux-4.0.0-rc3-2.gd5c547f-desktop vmcore
crash 7.1.0 [...] This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernels compiled by different gcc versions: vmlinux-4.0.0-rc3-2.gd5c547f-desktop: (unknown) vmcore kernel: 4.8.3
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments: vmlinux-4.0.0-rc3-2.gd5c547f-desktop is not SMP -- vmcore is SMP
I fixed that by applying an upstream patch so that kernel 4.0 is recognized (SR#290838 to Kernel:kdump). Unfortunately, the dump does not tell me much more :-)
Well, the dmesg messages are valid even if the kernel and vmcore are incompatible. But the above snippet does not help much. There were most likely one or more errors printed before it. A previous error probably triggered show_stack_log_lvl(), which then failed with the double fault. Also, the kernel was already tainted, so there was probably an error message when that happened.
There was a warning hours before:
[199863.599115] usb 2-1: USB disconnect, device number 11
[199863.602226] blk_update_request: I/O error, dev sdb, sector 392872
[199863.602238] Buffer I/O error on dev sdb6, logical block 981, lost async page write
[199863.656036] Buffer I/O error on dev sdb3, logical block 16385, lost sync page write
[199863.656041] JBD2: Error -5 detected when updating journal superblock for sdb3-8.
[199863.656071] Buffer I/O error on dev sdb3, logical block 1, lost sync page write
[199863.656126] ------------[ cut here ]------------
[199863.656133] WARNING: CPU: 0 PID: 23916 at ../fs/block_dev.c:57 __blkdev_put+0x1b7/0x200()
[199863.656135] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp
[199863.656189] videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_
[199863.656221] CPU: 0 PID: 23916 Comm: umount Not tainted 4.0.0-rc3-2.gd5c547f-desktop #1
[199863.656223] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
[199863.656225] 0000000000000000 ffffffff81a6733a ffffffff8167ab5d 0000000000000000
[199863.656229] ffffffff81063af1 ffff880189b708c0 ffff880189b70a38 ffff880189b709b0
[199863.656232] ffff8801c2f60000 ffff880189b708d8 ffffffff8120f957 ffff880189b708d8
[199863.656235] Call Trace:
[199863.656248] [<ffffffff8100576c>] dump_trace+0x8c/0x340
[199863.656253] [<ffffffff81005ac3>] show_stack_log_lvl+0xa3/0x190
[199863.656257] [<ffffffff81007221>] show_stack+0x21/0x50
[199863.656263] [<ffffffff8167ab5d>] dump_stack+0x47/0x67
[199863.656269] [<ffffffff81063af1>] warn_slowpath_common+0x81/0xb0
[199863.656273] [<ffffffff8120f957>] __blkdev_put+0x1b7/0x200
[199863.656280] [<ffffffff811da7e7>] deactivate_locked_super+0x47/0x80
[199863.656286] [<ffffffff811f6c8b>] cleanup_mnt+0x3b/0x80
[199863.656291] [<ffffffff8107f724>] task_work_run+0xc4/0xe0
[199863.656295] [<ffffffff81002f89>] do_notify_resume+0x69/0x90
[199863.656301] [<ffffffff8168166b>] int_signal+0x12/0x17
[199863.656311] [<00007f9ea4b03ae7>] 0x7f9ea4b03ae7
[199863.656313] ---[ end trace 8f65dffbbd0d78f0 ]---
[199863.676157] Buffer I/O error on dev sdb6, logical block 1606581, lost sync page write
[199863.676161] JBD2: Error -5 detected when updating journal superblock for sdb6-8.
[199863.676208] Buffer I/O error on dev sdb6, logical block 0, lost sync page write
I don't think it is related, but who knows.
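For reference, the "hours before" can be read off the dmesg timestamps, which count seconds since boot. The following is only a minimal sketch: the helper names are made up for illustration, and the two input lines are copied from the logs quoted in this thread.

```python
import re

# Two lines copied from the dmesg excerpts in this thread.
WARNING_LINE = "[199863.656133] WARNING: CPU: 0 PID: 23916 at ../fs/block_dev.c:57 __blkdev_put+0x1b7/0x200()"
PANIC_LINE = "[242060.604870] PANIC: double fault, error_code: 0x0"

# A dmesg line starts with "[seconds.microseconds]" (seconds since boot).
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def dmesg_seconds(line):
    """Return the seconds-since-boot timestamp at the start of a dmesg line."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        raise ValueError("no dmesg timestamp in: %r" % line)
    return float(match.group(1))


def gap_hours(earlier, later):
    """Hours elapsed between two dmesg lines (hypothetical helper)."""
    return (dmesg_seconds(later) - dmesg_seconds(earlier)) / 3600.0


if __name__ == "__main__":
    print("%.1f hours between the warning and the panic"
          % gap_hours(WARNING_LINE, PANIC_LINE))
```

With these two timestamps the gap comes out at a bit under twelve hours, which says nothing about causality but does show the warning was not an immediate precursor of the panic.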
In other words, could you please send the whole dmesg log that is accessible from the vmcore?
http://paste.opensuse.org/48196621
Thanks for having a look. I have for now gone back to 3.19.1. If it is a hardware problem, I should trigger it there, too.
Best regards,
Stefan
--
Stefan Seyfried
On Mon 2015-03-16 19:39:24, Stefan Seyfried wrote:
Hi Petr,
On 16.03.2015 at 16:12, Petr Mladek wrote:
On Sat 2015-03-14 23:16:20, Stefan Seyfried wrote:
On 14.03.2015 at 22:58, Stefan Seyfried wrote:
Hi all,
in 4.0.0-rc I have seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt:
I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Does the above state indicate hardware/memory problem, or is it time to try to really dive into that crash dump?
Too bad, that is not an option:
susi:/var/crash/2015-03-14-22:46 # crash vmlinux-4.0.0-rc3-2.gd5c547f-desktop vmcore
crash 7.1.0 [...] This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernels compiled by different gcc versions: vmlinux-4.0.0-rc3-2.gd5c547f-desktop: (unknown) vmcore kernel: 4.8.3
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments: vmlinux-4.0.0-rc3-2.gd5c547f-desktop is not SMP -- vmcore is SMP
I fixed that by applying an upstream patch so that kernel 4.0 is recognized (SR#290838 to Kernel:kdump). Unfortunately, the dump does not tell me much more :-)
Well, the dmesg messages are valid even if the kernel and vmcore are incompatible. But the above snippet does not help much. There were most likely one or more errors printed before it. A previous error probably triggered show_stack_log_lvl(), which then failed with the double fault. Also, the kernel was already tainted, so there was probably an error message when that happened.
There was a warning hours before: [...]
I don't think it is related, but who knows.
To be honest, I do not have much experience with debugging such problems yet, so I am not sure. Anyway, it seems that this warning is what tainted the kernel. Unfortunately, I am unable to see more useful information in the final PANIC. The double fault happened while showing the stack. It might help to know what triggered showing the stack, but I do not see that anywhere.
In other words, could you please send the whole dmesg log that is accessible from the vmcore?
There are also several segfaults from libvirt_driver_interface.so. IMHO, they should not break the kernel, but they are suspicious. This partial problem seems to be related to https://bugzilla.opensuse.org/show_bug.cgi?id=920551
Thanks for having a look. I have for now gone back to 3.19.1. If it is a hardware problem, I should trigger it there, too.
It is definitely worth a try.
Best Regards,
Petr
Hi Petr,
On 17.03.2015 at 11:06, Petr Mladek wrote:
On Mon 2015-03-16 19:39:24, Stefan Seyfried wrote:
There was a warning hours before: [...]
I don't think it is related, but who knows.
To be honest, I do not have much experience with debugging such problems yet, so I am not sure. Anyway, it seems that this warning is what tainted the kernel.
I think so, too, that the "W"arn taint is from this warning.
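As an aside, the taint letters can be pulled out of the "CPU: ..." header line mechanically. The sketch below is only an illustration: the function name is made up, and the table lists just a subset of the flags described in the kernel's tainted-kernels documentation, with abbreviated meanings.

```python
import re

# Subset of the taint letters from the kernel documentation (abbreviated).
# "G" is not a taint by itself; it is printed when "P" is absent.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "no proprietary module loaded",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page referenced",
    "U": "taint requested by userspace",
    "D": "kernel died recently (oops or BUG)",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
}

# Header lines copied from the two traces in this thread.
PANIC_HEADER = "CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1"
WARNING_HEADER = "CPU: 0 PID: 23916 Comm: umount Not tainted 4.0.0-rc3-2.gd5c547f-desktop #1"


def decode_taint(header):
    """Return (letter, meaning) pairs for a 'Tainted: ...' header line.

    An untainted kernel prints 'Not tainted', which yields an empty list.
    """
    match = re.search(r"Tainted:\s+((?:[A-Z]\s+)+)", header)
    if match is None:
        return []
    return [(flag, TAINT_FLAGS.get(flag, "unknown flag"))
            for flag in match.group(1).split()]


if __name__ == "__main__":
    for flag, meaning in decode_taint(PANIC_HEADER):
        print(flag, "-", meaning)
```

The umount trace was still "Not tainted" when its header was printed, while the later panic shows "W", which fits the reading that the taint comes from that warning.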
Unfortunately, I am unable to see more useful information in the final PANIC. The double fault happened while showing the stack. It might help to know what triggered showing the stack, but I do not see that anywhere.
In other words, could you please send the whole dmesg log that is accessible from the vmcore?
There are also several segfaults from libvirt_driver_interface.so. IMHO, they should not break the kernel, but they are suspicious. This partial problem seems to be related to https://bugzilla.opensuse.org/show_bug.cgi?id=920551
They are definitely related to that; again, they should not crash the kernel.
Thanks for having a look. I have for now gone back to 3.19.1. If it is a hardware problem, I should trigger it there, too.
It is definitely worth a try.
It's all I can do now anyway :-) I'll start some stress-testing inside the VM and will watch out for crashes. Thanks for looking. I have also sent this problem to the linux-kernel list, but apparently I'm the only one having those issues.
Best regards,
Stefan
--
Stefan Seyfried
On Sat, Mar 14, 2015 at 11:16:20PM +0100, Stefan Seyfried wrote:
[242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
[242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
[242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
[242060.605396] RIP: 0010:[<ffffffff81005b44>] [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
[242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
[242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
[242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
[242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
[242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
[242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
[242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
[242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
[242060.605396] Stack:
[242060.605396] 0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
[242060.605396] 0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
[242060.605396] 00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
[242060.605396] Call Trace:
[242060.605396] [<ffffffff81005c36>] show_regs+0x86/0x210
[242060.605396] [<ffffffff8104636f>] df_debug+0x1f/0x30
[242060.605396] [<ffffffff810041a4>] do_double_fault+0x84/0x100
[242060.605396] [<ffffffff81683088>] double_fault+0x28/0x30
[242060.605396] [<ffffffff816834ad>] page_fault+0xd/0x30
[242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
[242060.605396] RIP [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
[242060.605396] RSP <ffff88023bc84e88>
[242060.605396] CR2: 00007fffa55eafb8
I encountered a similar problem recently. The thing is, the x86 specification says that on a double fault, the RIP and RSP registers are undefined, i.e. you not only can't expect them to contain values corresponding to the first or second fault, you can't even expect them to have any usable values at all.
Unfortunately, the kernel double fault handler doesn't take this into account and tries to display the usual crash-related information, so it usually crashes itself when trying to show the stack content (that's the show_stack_log_lvl() crash). The result is a double fault (which by itself would be very hard to debug) followed by a crash in its handler, so analysing the outcome is extremely difficult.
Michal Kubeček
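To illustrate the point with the numbers from the first register dump in this thread: the saved CS selector says ring 0, yet the saved RSP lies in the user half of the address space, which is why dumping the stack from it faults again. The sketch below only classifies addresses; it assumes the usual 48-bit (4-level paging) x86-64 split, and the function names are made up for illustration. It is not how the kernel itself validates a stack pointer.

```python
# Canonical address ranges for 48-bit virtual addresses (assumption:
# 4-level paging, as on the machine in this thread).
USER_MAX = 0x00007FFFFFFFFFFF
KERNEL_MIN = 0xFFFF800000000000


def classify(addr):
    """Say which half of the x86-64 address space an address belongs to."""
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


def privilege_level(cs_selector):
    """The low two bits of the CS selector give the privilege level (ring)."""
    return cs_selector & 3


# Values copied from the "PANIC: double fault" register dump.
DOUBLE_FAULT = {
    "CS": 0x0010,
    "RIP": 0xFFFFFFFF816834AD,  # page_fault+0xd
    "RSP": 0x00007FFFA55EAFB8,
}

# RSP from the follow-up oops in show_stack_log_lvl(), for comparison.
OOPS_RSP = 0xFFFF88023BC84E88

if __name__ == "__main__":
    print("ring:", privilege_level(DOUBLE_FAULT["CS"]))
    print("RIP is a", classify(DOUBLE_FAULT["RIP"]), "address")
    print("RSP is a", classify(DOUBLE_FAULT["RSP"]), "address")
    print("oops RSP is a", classify(OOPS_RSP), "address")
```

The faulting address reported by the second oops (CR2: 00007fffa55eafb8) is exactly that saved RSP value, consistent with the handler crashing while reading the stack it was asked to print.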
On 17.03.15 at 11:34, wrote:
I encountered a similar problem recently. The thing is, x86 specification says that on a double fault, RIP and RSP registers are undefined, i.e. you not only can't expect them to contain values corresponding to the first or second fault but you can't even expect them to have any usable values at all.
While saved CS and RIP are indeed documented to be undefined in recent manuals, this isn't the case for RSP afaics, and this also wasn't always the case. Having a reliable stack pointer to dump from would already be a good start.
Printing CS:RIP is certainly useful too - in my experience, they're usually valid (i.e. can help in identifying the cause), and I don't think there's any other strong dependency (i.e. other than the possible printing of opcode bytes) on their values.
Jan
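As a small illustration of what a printed RIP gives you: an entry of the form "[<address>] symbol+0xoffset/0xsize" is enough to recover where the function starts and ends, which can then be matched against a disassembly of vmlinux. This is only a sketch; the regular expression and the function name are made up here and cover just the entry format seen in this thread.

```python
import re

# Matches entries like "[<ffffffff816834ad>] page_fault+0xd/0x30".
ENTRY_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_entry(text):
    """Parse one symbolized address; return None if it is not symbolized."""
    match = ENTRY_RE.search(text)
    if match is None:
        return None
    addr = int(match.group("addr"), 16)
    offset = int(match.group("offset"), 16)
    size = int(match.group("size"), 16)
    start = addr - offset  # first byte of the function
    return {
        "symbol": match.group("symbol"),
        "address": addr,
        "start": start,
        "end": start + size,  # one past the last byte of the function
    }


if __name__ == "__main__":
    # RIP line copied from the double fault dump in this thread.
    rip_line = "RIP: 0010:[<ffffffff816834ad>] [<ffffffff816834ad>] page_fault+0xd/0x30"
    entry = parse_entry(rip_line)
    print("%s spans %#x-%#x, RIP at +%#x" % (
        entry["symbol"], entry["start"], entry["end"],
        entry["address"] - entry["start"]))
```

An unsymbolized user-space entry such as "[<00007f9ea4b03ae7>] 0x7f9ea4b03ae7" carries no symbol+offset part, so the sketch returns None for it.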
participants (4)
- Jan Beulich
- Michal Kubecek
- Petr Mladek
- Stefan Seyfried