[Bug 641900] New: xen kernel crash after about 16 hours, network stoped later/sometime disk control crash too
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c0

Summary: xen kernel crash after about 16 hours, network stoped later/sometime disk control crash too
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: disk_91@hotmail.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.2.10) Gecko/20100914 SUSE/3.6.10-0.3.1 Firefox/3.6.10

Sep 25 05:50:21 saturn kernel: [58361.780047] ------------[ cut here ]------------
Sep 25 05:50:21 saturn kernel: [58361.780058] WARNING: at /usr/src/packages/BUILD/kernel-xen-2.6.34.7/linux-2.6.34/net/sched/sch_generic.c:256 dev_watchdog+0x25b/0x270()
Sep 25 05:50:21 saturn kernel: [58361.780060] Hardware name: System Product Name
Sep 25 05:50:21 saturn kernel: [58361.780062] NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out
Sep 25 05:50:21 saturn kernel: [58361.780063] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit usbbk gntdev netbk it87 blkbk blkback_pagemap hwmon_vid snd_pcm_oss blktap domctl xenbus_be coretemp evtchn snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop snd_hda_codec_realtek firewire_ohci firewire_core crc_itu_t snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ohci1394 ieee1394 8139too snd_timer usblp usbhid i2c_nforce2 hid i2c_core 8139cp forcedeth pcspkr snd soundcore snd_page_alloc sr_mod sg shpchp pci_hotplug ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log ohci_hcd ehci_hcd sd_mod usbcore dm_snapshot dm_mod xenblk cdrom xennet processor ata_generic pata_amd sata_nv libata scsi_mod thermal_sys h
Sep 25 05:50:21 saturn kernel: wmon
Sep 25 05:50:21 saturn kernel: [58361.780122] Pid: 0, comm: swapper Not tainted 2.6.34.7-0.3-xen #1
Sep 25 05:50:21 saturn kernel: [58361.780124] Call Trace:
Sep 25 05:50:21 saturn kernel: [58361.780135] [<ffffffff80009646>] dump_trace+0x76/0x1a0
Sep 25 05:50:21 saturn kernel: [58361.780139] [<ffffffff8040a79b>] dump_stack+0x69/0x6f
Sep 25 05:50:21 saturn kernel: [58361.780143] [<ffffffff80043943>] warn_slowpath_common+0x73/0xb0
Sep 25 05:50:21 saturn kernel: [58361.780146] [<ffffffff800439e0>] warn_slowpath_fmt+0x40/0x50
Sep 25 05:50:21 saturn kernel: [58361.780149] [<ffffffff8034d04b>] dev_watchdog+0x25b/0x270
Sep 25 05:50:21 saturn kernel: [58361.780155] [<ffffffff80053d34>] run_timer_softirq+0x1d4/0x3d0
Sep 25 05:50:21 saturn kernel: [58361.780159] [<ffffffff8004b8c8>] __do_softirq+0xe8/0x220
Sep 25 05:50:21 saturn kernel: [58361.780162] [<ffffffff80007efc>] call_softirq+0x1c/0x30
Sep 25 05:50:21 saturn kernel: [58361.780166] [<ffffffff80009595>] do_softirq+0xa5/0xe0
Sep 25 05:50:21 saturn kernel: [58361.780175] [<ffffffff8004bafd>] irq_exit+0x8d/0xa0
Sep 25 05:50:21 saturn kernel: [58361.780182] [<ffffffff802d27d2>] evtchn_do_upcall+0x222/0x270
Sep 25 05:50:21 saturn kernel: [58361.780188] [<ffffffff80007a4e>] do_hypervisor_callback+0x1e/0x30
Sep 25 05:50:21 saturn kernel: [58361.780207] [<ffffffff800033aa>] 0xffffffff800033aa
Sep 25 05:50:21 saturn kernel: [58361.780213] [<ffffffff80009c0c>] xen_safe_halt+0xc/0x10
Sep 25 05:50:21 saturn kernel: [58361.780216] [<ffffffff8000e763>] xen_idle+0x43/0xc0
Sep 25 05:50:21 saturn kernel: [58361.780220] [<ffffffff80005255>] cpu_idle+0x55/0xa0
Sep 25 05:50:21 saturn kernel: [58361.780223] ---[ end trace 92ba00751c8e0e8f ]---
Sep 25 05:50:21 saturn kernel: [58361.780226] eth3: Got tx_timeout. irq: 00000036
Sep 25 05:50:21 saturn kernel: [58361.780228] eth3: Ring at 80f8000
Sep 25 05:50:21 saturn kernel: [58361.780229] eth3: Dumping tx registers
Sep 25 05:50:21 saturn kernel: [58361.780235] 0: 00000036 000000df 00000003 0009000d 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780241] 20: 00000000 f0000000 00000000 00000000 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780246] 40: 0420e20e 0000a455 00002e20 00000000 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780254] 60: 00000000 00000000 00000000 0000ffff 0000ffff 0000ffff 0000ffff 00000000
[...] continue ...

saturn:/home/disk # uname -a
Linux saturn 2.6.34.7-0.3-xen #1 SMP 2010-09-20 15:27:38 +0200 x86_64 x86_64 x86_64 GNU/Linux

saturn:/home/disk # lspci
00:00.0 Host bridge: nVidia Corporation C55 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a2)
00:00.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.7 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:03.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 RAID bus controller: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
01:00.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:00.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:01.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:02.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:03.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
03:00.0 VGA compatible controller: nVidia Corporation NV44 [GeForce 6200 LE] (rev a1)
07:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
07:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
07:08.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)

Hardware: Asus P5N-D, latest BIOS version
CPU:
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
stepping : 10
cpu MHz : 2666.728

Loaded modules:
saturn:/home/disk # lsmod
Module Size Used by
ip6t_LOG 5898 7
xt_tcpudp 2859 26
xt_pkttype 1288 3
xt_physdev 1867 2
ipt_LOG 6067 17
xt_limit 2495 24
usbbk 23847 0
gntdev 8579 3
netbk 41414 0 [permanent]
blkbk 28814 0 [permanent]
it87 38738 0
blkback_pagemap 2806 1 blkbk
snd_pcm_oss 53487 0
hwmon_vid 3226 1 it87
snd_mixer_oss 18913 1 snd_pcm_oss
blktap 126702 2 [permanent]
domctl 3227 2 blkbk,blktap
xenbus_be 3706 4 usbbk,netbk,blkbk,blktap
coretemp 6523 0
snd_seq 67827 0
evtchn 38482 4
snd_seq_device 7834 1 snd_seq
edd 10176 0
nfsd 330017 9
lockd 84204 1 nfsd
nfs_acl 3107 1 nfsd
auth_rpcgss 49079 1 nfsd
sunrpc 255540 15 nfsd,lockd,nfs_acl,auth_rpcgss
exportfs 4715 1 nfsd
bridge 85911 2
stp 2331 1 bridge
llc 6103 2 bridge,stp
ip6t_REJECT 4828 3
nf_conntrack_ipv6 21550 4
ip6table_raw 1627 1
xt_NOTRACK 1192 4
ipt_REJECT 2672 3
xt_state 1618 18
iptable_raw 1686 1
iptable_filter 1946 1
ip6table_mangle 2036 0
nf_conntrack_netbios_ns 1758 0
nf_conntrack_ipv4 10379 14
nf_conntrack 87570 5 nf_conntrack_ipv6,xt_NOTRACK,xt_state,nf_conntrack_netbios_ns,nf_conntrack_ipv4
nf_defrag_ipv4 1673 1 nf_conntrack_ipv4
ip_tables 21762 2 iptable_raw,iptable_filter
ip6table_filter 1887 1
ip6_tables 23384 4 ip6t_LOG,ip6table_raw,ip6table_mangle,ip6table_filter
x_tables 25752 17 ip6t_LOG,xt_tcpudp,xt_pkttype,xt_physdev,ipt_LOG,xt_limit,ip6t_REJECT,ip6table_raw,xt_NOTRACK,ipt_REJECT,xt_state,iptable_raw,iptable_filter,ip6table_mangle,ip_tables,ip6table_filter,ip6_tables
fuse 77021 3
loop 18239 6
snd_hda_codec_realtek 324063 1
firewire_ohci 26970 0
snd_hda_intel 29229 2
firewire_core 61434 1 firewire_ohci
crc_itu_t 1747 1 firewire_core
snd_hda_codec 112811 2 snd_hda_codec_realtek,snd_hda_intel
snd_hwdep 7676 1 snd_hda_codec
snd_pcm 107771 3 snd_pcm_oss,snd_hda_intel,snd_hda_codec
snd_timer 27312 2 snd_seq,snd_pcm
snd 83454 14 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer
ohci1394 33542 0
soundcore 8757 1 snd
8139too 35962 0
i2c_nforce2 7561 0
usblp 13961 0
pcspkr 2222 0
snd_page_alloc 9473 2 snd_hda_intel,snd_pcm
forcedeth 61485 0
8139cp 25731 0
i2c_core 32104 1 i2c_nforce2
ieee1394 104214 1 ohci1394
sr_mod 16364 0
shpchp 34692 0
sg 33047 0
pci_hotplug 31949 1 shpchp
ext4 399185 2
jbd2 98208 1 ext4
crc16 1715 1 ext4
usbhid 52713 0
hid 85698 1 usbhid
dm_mirror 15871 1
dm_region_hash 13661 1 dm_mirror
dm_log 10948 3 dm_mirror,dm_region_hash
ohci_hcd 36442 0
ehci_hcd 60996 0
sd_mod 41170 2
usbcore 231747 6 usbbk,usblp,usbhid,ohci_hcd,ehci_hcd
dm_snapshot 35225 0
dm_mod 86467 17 dm_mirror,dm_log,dm_snapshot
xenblk 26098 0
cdrom 43051 2 sr_mod,xenblk
xennet 37357 0
processor 42760 0
ata_generic 3739 0
pata_amd 12922 0
sata_nv 25589 2
libata 211385 3 ata_generic,pata_amd,sata_nv
scsi_mod 191240 4 sr_mod,sg,sd_mod,libata
thermal_sys 18006 1 processor
hwmon 2712 3 it87,coretemp,thermal_sys

Other information:
Dom0 runs no specific applications.
The system runs 2 virtual machines based on openSUSE 11.1.

Hope this will help to fix this bug! Do not hesitate to contact me for more information... It is easy for me to reproduce: just wait a couple of hours for the crash to happen!

Reproducible: Always

Steps to Reproduce:
1. Start the system
2. Wait
3.

Actual Results:
The system crashes: usually the Ethernet interface fails first and the system becomes unstable. Sometimes I can bring the network back for a while with /etc/rc.d/network restart, but not always; generally, the disks disappear afterwards (I can't provide those logs, as they are lost along with the disks...)

Expected Results:
The system continues to run normally.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
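The bracketed number in each kernel line is the printk timestamp, i.e. roughly the seconds elapsed since the Dom0 kernel started. Converting it is a quick way to check the "about 16 hours" in the summary against the log. A minimal Python sketch; the helper names `uptime_of` and `format_uptime` are hypothetical and not part of any tool mentioned in this report:

```python
import re
from typing import Optional

# printk prefixes kernel messages with "[seconds.microseconds]" since boot,
# e.g. "[58361.780062]" or "[ 425.812027]" (left-padded with spaces).
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def uptime_of(line: str) -> Optional[float]:
    """Return the printk timestamp of a syslog line, or None if it has none."""
    m = STAMP.search(line)
    return float(m.group(1)) if m else None


def format_uptime(seconds: float) -> str:
    """Render seconds since boot as 'Hh MMm SS.mmms'."""
    millis = round(seconds * 1000)  # work in integer milliseconds
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours}h {minutes:02d}m {secs:02d}.{millis:03d}s"


if __name__ == "__main__":
    line = ("Sep 25 05:50:21 saturn kernel: [58361.780062] "
            "NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out")
    print(format_uptime(uptime_of(line)))
```

Applied to the first warning above, `format_uptime(58361.780047)` gives `16h 12m 41.780s`, which matches the reported delay.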
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c1
--- Comment #1 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c2
Tony Jones
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c3
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c4
--- Comment #4 from Paul Pinault
Without seeing the full kernel log we can't really judge whether the netdev watchdog kicking in was just a secondary effect. Please attach the full /var/log/messages fragment(s) of the session(s) in question.
I included everything of interest from the log. Most of the time, when it crashes, the log is empty (no messages beyond those logged while the system works correctly). Do we have a way to get more log messages that could help?
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c5
--- Comment #5 from Jan Beulich
Do we have a way to get more log messages that could help?
Without knowing what we're looking for - no.
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c6
--- Comment #6 from Paul Pinault
Without knowing what we're looking for - no.
I'm quite sure it is related to the bridge device, as network traffic is the crash trigger. Since I changed my configuration to use my two RTL Ethernet cards instead of the MCP51 one, nothing is logged to /var/log/messages, but the network still crashes. In most cases, the internal network (between the VMs and Dom0) keeps working correctly, but external communication (Dom0 or a VM communicating with an external machine) does not work. At that point I can type any command you want for analysis. In other cases, the whole server simply crashes and I have no access to anything (a reboot is needed); /var/log/messages contains no messages related to this.
I can add one more point (it may help): it seems to happen each time on the eth interface with the highest number: initially eth3, now eth1. Even when I swap the eth0 and eth1 networks (by reallocating br0 on eth0 and br1 on eth1), it is always eth1 that crashes.
I can also add that it appears more frequently when I start a second VM; in that case br0 (eth0) is shared by three systems (Dom0, VM1, VM2) instead of two (Dom0 + VM1).
Hope this can help... Let me know what I can do to help fix this.
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c7
--- Comment #7 from Jan Beulich
I'm quite sure it is related to the bridge device, as network traffic is the crash trigger.
Your newer setup is using bridging just like the older one (just on different NICs), so I can't see how you would want to distinguish the two.
Since I changed my configuration to use my two RTL Ethernet cards instead of the MCP51 one, nothing is logged to /var/log/messages, but the network still crashes. In most cases, the internal network (between the VMs and Dom0) keeps working correctly, but external communication (Dom0 or a VM communicating with an external machine) does not work. At that point I can type any command you want for analysis. In other cases, the whole server simply crashes and I have no access to anything (a reboot is needed); /var/log/messages contains no messages related to this.
Again, we'll need a full log (up to and including any messages generated during a possible full machine crash - those typically don't make it to persistent storage, so you'll have to set up a serial console, which also lets you collect kernel and hypervisor messages together).
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c8
--- Comment #8 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c9
--- Comment #9 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c10
--- Comment #10 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c11
--- Comment #11 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c12
--- Comment #12 from Paul Pinault
The log doesn't tell much, but at least it clarifies it's not the problem I was suspecting. Instead, especially the instance on Sep 16 suggests a more general interrupt handling problem, as a SATA device also suffered. Later instances with the 8139 don't, however - did you reconfigure the system in some way (e.g. was the interrupt shared originally, and now it isn't)?
I did not change anything like this; I only changed my network configuration to keep my system stable for longer. SATA was a secondary side effect: when it crashed, eth3 crashed first; I stopped and restarted it, it worked for some time, and then SATA crashed... But as you say, this does not seem to be the root cause; these are side effects of something else.
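Whether an interrupt line is shared (the question raised above) can be read from /proc/interrupts, where a shared IRQ lists several device names, comma-separated, at the end of its row. A minimal Python sketch of that check; the function name `shared_irqs` is hypothetical, and it assumes a 2.6.3x-style layout in which the interrupt-controller column is a single token (other kernels may format that column differently). The sample text is made up for illustration and is not taken from the reporter's machine:

```python
from typing import Dict, List

# Hypothetical /proc/interrupts excerpt, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  16:      1200          0   IO-APIC-fasteoi   sata_nv, eth1
  17:       300          0   IO-APIC-fasteoi   eth0
 NMI:         0          0   Non-maskable interrupts
"""


def shared_irqs(text: str) -> Dict[int, List[str]]:
    """Map each numbered IRQ that has more than one handler to its device names."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # CPU header row, or named rows such as NMI/LOC
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        # Assumption: exactly one controller-type token follows the counters.
        devices = [d.strip() for d in " ".join(tokens[i + 1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live system the same function could be fed `open("/proc/interrupts").read()` before and after a configuration change to see whether a NIC and the SATA controller share a line.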
We'll need /var/log/boot.msg for both a native and a Xen kernel boot,
OK, I'll provide this.
and we'll need access to Xen's console (if the system is still usable once this state is reached, the "xm debug-keys" and "xm dmesg" commands will do, but if it isn't, a serial console is going to be unavoidable).
When only the network has crashed, the VMs continue to work well but without external network access (the internal network with Dom0 keeps working)... until Dom0 crashes.
One other thing to try would be passing "cpuidle=0" to Xen. And of course I assume you already installed the recently released Xen update, and know the issue is not solved by this.
All systems (Dom0 and VMs) are patched with the latest updates: Dom0 runs openSUSE 11.3 and the VMs run openSUSE 11.1 and openSUSE 11.2.
cpuidle=0: OK, I will change this.
Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.
That is something I can do after the other tests... no problem.
I hope to find a serial cable this weekend so I can reproduce the problem with full log information...
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c13
--- Comment #13 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c14
--- Comment #14 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c15
--- Comment #15 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c16
--- Comment #16 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c17
--- Comment #17 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c18
--- Comment #18 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c19
--- Comment #19 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c20
--- Comment #20 from Paul Pinault
Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.
I only found ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm, but I'll try that one... expecting more trace output!
/dev/null)
Sep 30 22:55:28 saturn kernel: [ 425.812012] ------------[ cut here ]------------
Sep 30 22:55:28 saturn kernel: [ 425.812023] WARNING: at /usr/src/packages/BUILD/kernel-xen-2.6.34.7/linux-2.6.34/net/sched/sch_generic.c:256 dev_watchdog+0x25b/0x270()
Sep 30 22:55:28 saturn kernel: [ 425.812025] Hardware name: System Product Name
Sep 30 22:55:28 saturn kernel: [ 425.812027] NETDEV WATCHDOG: eth1 (8139too):
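The NIC that trips the watchdog changes between sessions (eth3 with forcedeth on Sep 25, eth1 with 8139too in the fragment above, only about seven minutes after boot), so it can help to pull every NETDEV WATCHDOG warning out of /var/log/messages and compare interface, driver and time since boot. A minimal Python sketch, assuming the message format shown in these logs; the names `find_watchdog_events` and `WatchdogEvent` are hypothetical. The queue number is optional because the fragment above is cut off after the driver name:

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed format, taken from the logs in this report:
#   "... kernel: [58361.780062] NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out"
WATCHDOG = re.compile(
    r"\[\s*(?P<stamp>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\):"
    r"(?: transmit queue (?P<queue>\d+) timed out)?"
)


class WatchdogEvent(NamedTuple):
    uptime: float         # printk timestamp, seconds since boot
    iface: str            # e.g. "eth3"
    driver: str           # e.g. "forcedeth"
    queue: Optional[int]  # None if the line was truncated


def find_watchdog_events(lines: Iterable[str]) -> List[WatchdogEvent]:
    """Collect every NETDEV WATCHDOG warning found in the given log lines."""
    events = []
    for line in lines:
        m = WATCHDOG.search(line)
        if m is None:
            continue
        queue = m.group("queue")
        events.append(WatchdogEvent(
            uptime=float(m.group("stamp")),
            iface=m.group("iface"),
            driver=m.group("driver"),
            queue=int(queue) if queue is not None else None,
        ))
    return events
```

For example, `find_watchdog_events(open("/var/log/messages"))` returns one entry per warning. It only sees what syslog managed to write before a crash, which is why a serial console was requested for the full-machine hangs.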
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c21
--- Comment #21 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c22
--- Comment #22 from Jan Beulich
Created an attachment (id=392409) --> (http://bugzilla.novell.com/attachment.cgi?id=392409) [details] First log booting Dom0 and crashing dom0
Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with your hardware?
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c23
--- Comment #23 from Jan Beulich
Created an attachment (id=392410) --> (http://bugzilla.novell.com/attachment.cgi?id=392410) [details] Second Log booting dom0 with acpi=on
As I usually boot with acpi=off, I changed this (removed the option); the Xen kernel started booting but crashed before the end of the boot... the log contains more information on the crash.
The log here is completely meaningless. You pressed arbitrary keys on the serial console (or the remote end sent them without you asking for them) - one can't even tell whether the box was hung, or how far the boot progressed. BUT: if you think you need to disable ACPI, that may be part of your problem. I have yet to understand why you need to...
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c24
--- Comment #24 from Jan Beulich
Third test: back to acpi=off, so the context is the same as in the first log, but this time I was not able to finish booting before the crash happened.
Just like for the previous one - there's no evidence that the box crashed, you just had it print huge piles of information. If you didn't ask for it yourself, you'll need to tweak your "other end" of the serial cable (also indicated by the extra blank lines inserted, which make the logs quite hard to read).
I will try to crash it differently so that I get keyboard access to type the xm dmesg command.
No need for "xm dmesg" once you have a serial cable. You get all messages there, and you issue debug keys from the serial console (after switching input to Xen).
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c25
--- Comment #25 from Jan Beulich
I only found ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm, but I'll try that one... expecting more trace output!
Sorry, I really intended to direct you to ftp://ftp.suse.com/pub/projects/kernel/kotd/master/x86_64/.
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c
Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c26
--- Comment #26 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c27
--- Comment #27 from Paul Pinault
(In reply to comment #16)
Created an attachment (id=392409) --> (http://bugzilla.novell.com/attachment.cgi?id=392409) [details] First log booting Dom0 and crashing dom0
Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with your hardware?
I don't think so; the system does not crash when I choose a non-Xen kernel. The CPU is new and has never been overclocked or anything like that. I had a problem with a previous motherboard, but I already had this problem before and I still have it...
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c28
--- Comment #28 from Paul Pinault
(In reply to comment #17)
Created an attachment (id=392410) --> (http://bugzilla.novell.com/attachment.cgi?id=392410) [details] Second Log booting dom0 with acpi=on
As I usually boot with acpi=off, I changed this (removed the option); the Xen kernel started booting but crashed before the end of the boot... the log contains more information on the crash.
The log here is completely meaningless. You pressed arbitrary keys on the serial console (or the remote end sent them without you asking for them) - one can't even tell whether the box was hung, or how far the boot progressed.
BUT: if you think you need to disable ACPI, that may be part of your problem. I have yet to understand why you need to...
In fact, with or without ACPI nothing changes. What I noticed is that ACPI with a slow serial console crashes here; I don't know why, but my remote UART is set to 9600 bps and can't be set to a higher baud rate. At that baud rate I can't boot the Xen kernel with ACPI on... I don't think this log is really relevant to the network problem; it was just in case...
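For a sense of scale on the 9600 bps limit: with the common 8N1 framing (1 start bit, 8 data bits, 1 stop bit; the actual port settings are not stated in the report, so this is an assumption), every character costs 10 bit times. A small Python sketch of that arithmetic; the helper names are hypothetical, and it does not establish that console speed is what breaks the ACPI boot:

```python
def chars_per_second(baud: int, bits_per_char: int = 10) -> float:
    """Characters per second on a serial line; 10 bit times per char for 8N1."""
    return baud / bits_per_char


def transfer_seconds(num_chars: int, baud: int, bits_per_char: int = 10) -> float:
    """Time needed to push num_chars characters through the line."""
    return num_chars / chars_per_second(baud, bits_per_char)


if __name__ == "__main__":
    log_size = 64 * 1024  # a hypothetical 64 KiB of boot messages
    for baud in (9600, 115200):
        print(f"{baud:>6} bps: {chars_per_second(baud):>7.0f} chars/s, "
              f"{transfer_seconds(log_size, baud):5.1f} s for 64 KiB")
```

Under these assumptions, 64 KiB of console output takes about 68 s at 9600 bps and about 5.7 s at 115200 bps.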
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c29
--- Comment #29 from Paul Pinault
Turning off ACPI only for Xen makes things even more suspicious. What's the deal here?
The reason was to be able to detect my sensors, but now ACPI is on and my sensors work well, so I removed acpi=off. This does not affect the crash (that was the purpose of the various tests: to rule out any impact of this setting).
Also, can you reproduce your problems on other, very different hardware?
I do not have other hardware available at the moment.
Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case.
I do not understand what you mean by this. What is the "nouveau driver"?
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c30
--- Comment #30 from Paul Pinault
Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case.
I do not understand what you mean by this. What is the "nouveau driver"?
Sorry... got it! I'll try as soon as possible.
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c31
--- Comment #31 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c32
--- Comment #32 from Paul Pinault
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c33
--- Comment #33 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c34
--- Comment #34 from Paul Pinault
Still missing the log promised in #31.
Nothing attached, as the log is the same as the previous one. There is nothing to see in it.
Also please try disabling IRQ balancing in Xen ("noirqbalance" on the Xen command line) and/or in Linux (disabling the irq balance daemon in case it is enabled).
Next time I have to reboot my kvmqemu configuration I will do the test... It has been working for at least a month now, so I'm not sure I can answer quickly.
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c35
--- Comment #35 from Jan Beulich
Nothing attached, as the log is the same as the previous one. There is nothing to see in it.
How so, if you no longer load the nouveau driver, whereas previously you did?
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c
Ihno Krumreich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c36
--- Comment #36 from Jan Beulich
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c37
Jan Beulich