[Bug 641900] New: xen kernel crash after about 16 hours, network stopped later, sometimes disk controller crash too
https://bugzilla.novell.com/show_bug.cgi?id=641900
https://bugzilla.novell.com/show_bug.cgi?id=641900#c0

       Summary: xen kernel crash after about 16 hours, network stopped later, sometimes disk controller crash too
Classification: openSUSE
       Product: openSUSE 11.3
       Version: Final
      Platform: x86-64
    OS/Version: openSUSE 11.3
        Status: NEW
      Severity: Critical
      Priority: P5 - None
     Component: Kernel
    AssignedTo: kernel-maintainers@forge.provo.novell.com
    ReportedBy: disk_91@hotmail.com
     QAContact: qa@suse.de
      Found By: ---
       Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.2.10) Gecko/20100914 SUSE/3.6.10-0.3.1 Firefox/3.6.10

Sep 25 05:50:21 saturn kernel: [58361.780047] ------------[ cut here ]------------
Sep 25 05:50:21 saturn kernel: [58361.780058] WARNING: at /usr/src/packages/BUILD/kernel-xen-2.6.34.7/linux-2.6.34/net/sched/sch_generic.c:256 dev_watchdog+0x25b/0x270()
Sep 25 05:50:21 saturn kernel: [58361.780060] Hardware name: System Product Name
Sep 25 05:50:21 saturn kernel: [58361.780062] NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out
Sep 25 05:50:21 saturn kernel: [58361.780063] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit usbbk gntdev netbk it87 blkbk blkback_pagemap hwmon_vid snd_pcm_oss blktap domctl xenbus_be coretemp evtchn snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop snd_hda_codec_realtek firewire_ohci firewire_core crc_itu_t snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ohci1394 ieee1394 8139too snd_timer usblp usbhid i2c_nforce2 hid i2c_core 8139cp forcedeth pcspkr snd soundcore snd_page_alloc sr_mod sg shpchp pci_hotplug ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log ohci_hcd ehci_hcd sd_mod usbcore dm_snapshot dm_mod xenblk cdrom xennet processor ata_generic pata_amd sata_nv libata scsi_mod thermal_sys hwmon
Sep 25 05:50:21 saturn kernel: [58361.780122] Pid: 0, comm: swapper Not tainted 2.6.34.7-0.3-xen #1
Sep 25 05:50:21 saturn kernel: [58361.780124] Call Trace:
Sep 25 05:50:21 saturn kernel: [58361.780135] [<ffffffff80009646>] dump_trace+0x76/0x1a0
Sep 25 05:50:21 saturn kernel: [58361.780139] [<ffffffff8040a79b>] dump_stack+0x69/0x6f
Sep 25 05:50:21 saturn kernel: [58361.780143] [<ffffffff80043943>] warn_slowpath_common+0x73/0xb0
Sep 25 05:50:21 saturn kernel: [58361.780146] [<ffffffff800439e0>] warn_slowpath_fmt+0x40/0x50
Sep 25 05:50:21 saturn kernel: [58361.780149] [<ffffffff8034d04b>] dev_watchdog+0x25b/0x270
Sep 25 05:50:21 saturn kernel: [58361.780155] [<ffffffff80053d34>] run_timer_softirq+0x1d4/0x3d0
Sep 25 05:50:21 saturn kernel: [58361.780159] [<ffffffff8004b8c8>] __do_softirq+0xe8/0x220
Sep 25 05:50:21 saturn kernel: [58361.780162] [<ffffffff80007efc>] call_softirq+0x1c/0x30
Sep 25 05:50:21 saturn kernel: [58361.780166] [<ffffffff80009595>] do_softirq+0xa5/0xe0
Sep 25 05:50:21 saturn kernel: [58361.780175] [<ffffffff8004bafd>] irq_exit+0x8d/0xa0
Sep 25 05:50:21 saturn kernel: [58361.780182] [<ffffffff802d27d2>] evtchn_do_upcall+0x222/0x270
Sep 25 05:50:21 saturn kernel: [58361.780188] [<ffffffff80007a4e>] do_hypervisor_callback+0x1e/0x30
Sep 25 05:50:21 saturn kernel: [58361.780207] [<ffffffff800033aa>] 0xffffffff800033aa
Sep 25 05:50:21 saturn kernel: [58361.780213] [<ffffffff80009c0c>] xen_safe_halt+0xc/0x10
Sep 25 05:50:21 saturn kernel: [58361.780216] [<ffffffff8000e763>] xen_idle+0x43/0xc0
Sep 25 05:50:21 saturn kernel: [58361.780220] [<ffffffff80005255>] cpu_idle+0x55/0xa0
Sep 25 05:50:21 saturn kernel: [58361.780223] ---[ end trace 92ba00751c8e0e8f ]---
Sep 25 05:50:21 saturn kernel: [58361.780226] eth3: Got tx_timeout. irq: 00000036
Sep 25 05:50:21 saturn kernel: [58361.780228] eth3: Ring at 80f8000
Sep 25 05:50:21 saturn kernel: [58361.780229] eth3: Dumping tx registers
Sep 25 05:50:21 saturn kernel: [58361.780235] 0: 00000036 000000df 00000003 0009000d 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780241] 20: 00000000 f0000000 00000000 00000000 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780246] 40: 0420e20e 0000a455 00002e20 00000000 00000000 00000000 00000000 00000000
Sep 25 05:50:21 saturn kernel: [58361.780254] 60: 00000000 00000000 00000000 0000ffff 0000ffff 0000ffff 0000ffff 00000000
[...] continues ...

saturn:/home/disk # uname -a
Linux saturn 2.6.34.7-0.3-xen #1 SMP 2010-09-20 15:27:38 +0200 x86_64 x86_64 x86_64 GNU/Linux

saturn:/home/disk # lspci
00:00.0 Host bridge: nVidia Corporation C55 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a2)
00:00.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.7 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:03.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 RAID bus controller: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
01:00.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:00.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:01.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:02.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
02:03.0 PCI bridge: nVidia Corporation Device 05bf (rev a2)
03:00.0 VGA compatible controller: nVidia Corporation NV44 [GeForce 6200 LE] (rev a1)
07:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
07:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
07:08.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)

Hardware: Asus P5N-D, latest BIOS version
CPU:
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
stepping : 10
cpu MHz : 2666.728

Loaded modules:
saturn:/home/disk # lsmod
Module                  Size  Used by
ip6t_LOG                5898  7
xt_tcpudp               2859  26
xt_pkttype              1288  3
xt_physdev              1867  2
ipt_LOG                 6067  17
xt_limit                2495  24
usbbk                  23847  0
gntdev                  8579  3
netbk                  41414  0 [permanent]
blkbk                  28814  0 [permanent]
it87                   38738  0
blkback_pagemap         2806  1 blkbk
snd_pcm_oss            53487  0
hwmon_vid               3226  1 it87
snd_mixer_oss          18913  1 snd_pcm_oss
blktap                126702  2 [permanent]
domctl                  3227  2 blkbk,blktap
xenbus_be               3706  4 usbbk,netbk,blkbk,blktap
coretemp                6523  0
snd_seq                67827  0
evtchn                 38482  4
snd_seq_device          7834  1 snd_seq
edd                    10176  0
nfsd                  330017  9
lockd                  84204  1 nfsd
nfs_acl                 3107  1 nfsd
auth_rpcgss            49079  1 nfsd
sunrpc                255540  15 nfsd,lockd,nfs_acl,auth_rpcgss
exportfs                4715  1 nfsd
bridge                 85911  2
stp                     2331  1 bridge
llc                     6103  2 bridge,stp
ip6t_REJECT             4828  3
nf_conntrack_ipv6      21550  4
ip6table_raw            1627  1
xt_NOTRACK              1192  4
ipt_REJECT              2672  3
xt_state                1618  18
iptable_raw             1686  1
iptable_filter          1946  1
ip6table_mangle         2036  0
nf_conntrack_netbios_ns     1758  0
nf_conntrack_ipv4      10379  14
nf_conntrack           87570  5 nf_conntrack_ipv6,xt_NOTRACK,xt_state,nf_conntrack_netbios_ns,nf_conntrack_ipv4
nf_defrag_ipv4          1673  1 nf_conntrack_ipv4
ip_tables              21762  2 iptable_raw,iptable_filter
ip6table_filter         1887  1
ip6_tables             23384  4 ip6t_LOG,ip6table_raw,ip6table_mangle,ip6table_filter
x_tables               25752  17 ip6t_LOG,xt_tcpudp,xt_pkttype,xt_physdev,ipt_LOG,xt_limit,ip6t_REJECT,ip6table_raw,xt_NOTRACK,ipt_REJECT,xt_state,iptable_raw,iptable_filter,ip6table_mangle,ip_tables,ip6table_filter,ip6_tables
fuse                   77021  3
loop                   18239  6
snd_hda_codec_realtek   324063  1
firewire_ohci          26970  0
snd_hda_intel          29229  2
firewire_core          61434  1 firewire_ohci
crc_itu_t               1747  1 firewire_core
snd_hda_codec         112811  2 snd_hda_codec_realtek,snd_hda_intel
snd_hwdep               7676  1 snd_hda_codec
snd_pcm               107771  3 snd_pcm_oss,snd_hda_intel,snd_hda_codec
snd_timer              27312  2 snd_seq,snd_pcm
snd                    83454  14 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer
ohci1394               33542  0
soundcore               8757  1 snd
8139too                35962  0
i2c_nforce2             7561  0
usblp                  13961  0
pcspkr                  2222  0
snd_page_alloc          9473  2 snd_hda_intel,snd_pcm
forcedeth              61485  0
8139cp                 25731  0
i2c_core               32104  1 i2c_nforce2
ieee1394              104214  1 ohci1394
sr_mod                 16364  0
shpchp                 34692  0
sg                     33047  0
pci_hotplug            31949  1 shpchp
ext4                  399185  2
jbd2                   98208  1 ext4
crc16                   1715  1 ext4
usbhid                 52713  0
hid                    85698  1 usbhid
dm_mirror              15871  1
dm_region_hash         13661  1 dm_mirror
dm_log                 10948  3 dm_mirror,dm_region_hash
ohci_hcd               36442  0
ehci_hcd               60996  0
sd_mod                 41170  2
usbcore               231747  6 usbbk,usblp,usbhid,ohci_hcd,ehci_hcd
dm_snapshot            35225  0
dm_mod                 86467  17 dm_mirror,dm_log,dm_snapshot
xenblk                 26098  0
cdrom                  43051  2 sr_mod,xenblk
xennet                 37357  0
processor              42760  0
ata_generic             3739  0
pata_amd               12922  0
sata_nv                25589  2
libata                211385  3 ata_generic,pata_amd,sata_nv
scsi_mod              191240  4 sr_mod,sg,sd_mod,libata
thermal_sys            18006  1 processor
hwmon                   2712  3 it87,coretemp,thermal_sys

Other information:
The system runs no specific application in Dom0.
The system runs 2 virtual machines based on openSUSE 11.1.

Hope this will help to fix that bug! Do not hesitate to contact me for more information. It is easy for me to reproduce: just wait a couple of hours for the crash to happen.

Reproducible: Always

Steps to Reproduce:
1. Start the system
2. Wait
3.

Actual Results:
The system crashes. Usually the Ethernet interface fails first and the system becomes unstable. Sometimes I can bring it back for a while with /etc/rc.d/network restart, but not always; generally the disks disappear afterwards (I can't provide those logs, as they are lost along with the disks).

Expected Results:
Continue to run normally.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
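The warning that opens this report always has the same shape (`NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out`). That means the affected NIC and driver can be pulled out of a syslog file mechanically. Below is a minimal Python sketch that assumes the syslog line format shown above. `WatchdogEvent` and `find_watchdog_events` are hypothetical helper names, not part of any SUSE or Xen tool:

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the warning seen in this report, e.g.
# "kernel: [58361.780062] NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NETDEV WATCHDOG: "
    r"(?P<iface>\S+) \((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
)


class WatchdogEvent(NamedTuple):
    uptime_s: float  # seconds since boot, taken from the printk timestamp
    iface: str       # network interface, e.g. "eth3"
    driver: str      # driver bound to it, e.g. "forcedeth"
    queue: int       # index of the transmit queue that stalled


def find_watchdog_events(lines: Iterable[str]) -> List[WatchdogEvent]:
    """Return one WatchdogEvent per transmit-timeout warning found in `lines`."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append(WatchdogEvent(
                uptime_s=float(m.group("uptime")),
                iface=m.group("iface"),
                driver=m.group("driver"),
                queue=int(m.group("queue")),
            ))
    return events
```

Pass it an open `/var/log/messages` file object, opened with `errors="replace"` in case a hard reset left binary junk in the file. It lists every timeout together with its uptime, which shows at a glance whether the same interface keeps failing.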
https://bugzilla.novell.com/show_bug.cgi?id=641900#c1

--- Comment #1 from Paul Pinault <disk_91@hotmail.com> 2010-09-27 19:41:14 UTC ---
It seems that this problem is more related to the bridge: I changed my setup to stop using eth3 and to use eth0 and eth1 instead. Now my kernel is not crashing, but the network stops on some interfaces...

Additional information to help:

saturn:/home/disk # ifconfig
br0       Link encap:Ethernet  HWaddr 00:48:54:67:E3:F9
          inet addr:10.0.0.20  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:668 errors:0 dropped:0 overruns:0 frame:0
          TX packets:555 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:316246 (308.8 Kb)  TX bytes:95004 (92.7 Kb)

br1       Link encap:Ethernet  HWaddr 00:48:54:6F:78:AB
          inet addr:10.0.1.20  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:72 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7516 (7.3 Kb)  TX bytes:3769 (3.6 Kb)

eth0      Link encap:Ethernet  HWaddr 00:48:54:67:E3:F9
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:744 errors:0 dropped:0 overruns:0 frame:0
          TX packets:646 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:335746 (327.8 Kb)  TX bytes:108131 (105.5 Kb)
          Interrupt:10 Base address:0xc000

eth1      Link encap:Ethernet  HWaddr 00:48:54:6F:78:AB
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2885 (2.8 Kb)  TX bytes:11800 (11.5 Kb)
          Interrupt:11 Base address:0x2000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:87 errors:0 dropped:0 overruns:0 frame:0
          TX packets:87 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9508 (9.2 Kb)  TX bytes:9508 (9.2 Kb)

vif1.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:295 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:27623 (26.9 Kb)  TX bytes:32612 (31.8 Kb)

vif1.1    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:65 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:7307 (7.1 Kb)  TX bytes:1241 (1.2 Kb)

saturn:/home/disk # brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.00485467e3f9       no              eth0
                                                        vif1.0
br1             8000.0048546f78ab       no              eth1
                                                        vif1.1

The configuration does not seem to be the cause, since after a reboot everything works well... for a while :(
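The `brctl show` output above is what ties each physical NIC to a bridge and to the guests' `vif` backends. Here is a small parser, assuming the tab-separated layout that bridge-utils prints: one row per bridge, followed by continuation rows that hold only an extra interface name. `parse_brctl_show` is a hypothetical helper:

```python
from typing import Dict, List, Optional


def parse_brctl_show(output: str) -> Dict[str, List[str]]:
    """Map each bridge in `brctl show` output to the interfaces attached to it."""
    bridges: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines()[1:]:      # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # continuation row: one more interface of the previous bridge
            if current is not None:
                bridges[current].append(fields[0])
        else:
            # bridge row: name, bridge id, STP flag, then optionally the first interface
            current = fields[0]
            bridges[current] = fields[3:4]
    return bridges
```

Fed the captured `brctl show` text from this comment, it returns `{"br0": ["eth0", "vif1.0"], "br1": ["eth1", "vif1.1"]}`. That makes it easy to confirm which physical NIC each guest's traffic goes through before and after swapping cables.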
https://bugzilla.novell.com/show_bug.cgi?id=641900#c2

Tony Jones <tonyj@novell.com> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
             Status|NEW                                       |ASSIGNED
                 CC|                                          |tonyj@novell.com
         AssignedTo|kernel-maintainers@forge.provo.novell.com |jbeulich@novell.com

--- Comment #2 from Tony Jones <tonyj@novell.com> 2010-09-28 15:14:32 UTC ---
Jan, do you want to take a look at this, since it's Xen related? Feel free to reassign it back if not appropriate.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c3

Jan Beulich <jbeulich@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
           Found By|---                         |Community User
       InfoProvider|                            |disk_91@hotmail.com

--- Comment #3 from Jan Beulich <jbeulich@novell.com> 2010-09-29 08:51:07 UTC ---
Without seeing the full kernel log we can't really judge whether the netdev watchdog kicking in was just a secondary effect. Please attach the full /var/log/messages fragment(s) of the session(s) in question.
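Comment #3 asks for the complete `/var/log/messages` fragment of each session in which the problem occurred. One way to cut those fragments out is to split the file wherever the kernel's uptime timestamp goes backwards, which marks a new boot, and keep only the sessions that contain the watchdog warning. This is a sketch under that assumption. One limitation: user-space lines logged in a new boot before its first kernel line are still attributed to the previous session. `split_boot_sessions` and `sessions_containing` are hypothetical names:

```python
import re
from typing import Iterable, List, Optional

# Kernel lines in syslog carry the printk uptime, e.g. "kernel: [ 425.812027] ..."
KTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def split_boot_sessions(lines: Iterable[str]) -> List[List[str]]:
    """Split syslog lines into boot sessions. A new session starts whenever the
    kernel uptime stamp goes backwards (assumed to happen only on reboot)."""
    sessions: List[List[str]] = [[]]
    last: Optional[float] = None
    for line in lines:
        m = KTIME_RE.search(line)
        if m:
            t = float(m.group(1))
            if last is not None and t < last and sessions[-1]:
                sessions.append([])
            last = t
        sessions[-1].append(line)
    return [s for s in sessions if s]


def sessions_containing(lines: Iterable[str], needle: str = "NETDEV WATCHDOG") -> List[List[str]]:
    """Keep only the boot sessions in which `needle` appears somewhere."""
    return [s for s in split_boot_sessions(lines) if any(needle in line for line in s)]
```

Lines read from a file keep their trailing newlines, so `"".join(session)` writes each selected session back out unchanged, ready to attach.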
https://bugzilla.novell.com/show_bug.cgi?id=641900#c4

--- Comment #4 from Paul Pinault <disk_91@hotmail.com> 2010-09-29 10:25:45 UTC ---
(In reply to comment #3)
> Without seeing the full kernel log we can't really judge whether the netdev watchdog kicking in was just a secondary effect. Please attach the full /var/log/messages fragment(s) of the session(s) in question.
I posted everything that was interesting in the log. Most of the time, when it crashes, the log is empty (no more messages than when the system works correctly). Do we have a way to get more log messages that could help?
https://bugzilla.novell.com/show_bug.cgi?id=641900#c5

--- Comment #5 from Jan Beulich <jbeulich@novell.com> 2010-09-29 11:05:25 UTC ---
(In reply to comment #4)
> Do we have a way to get more log messages that could help?
Without knowing what we're looking for - no.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c6

--- Comment #6 from Paul Pinault <disk_91@hotmail.com> 2010-09-29 15:16:15 UTC ---
> Without knowing what we're looking for - no.
I'm quite sure it is related to the bridge device, as network traffic is the crash trigger. Since I changed my config to use my two RTL Ethernet cards instead of the MCP51 one, I have nothing in /var/log/messages, but the network is still crashing. In most cases the internal network (between the VMs and Dom0) works correctly, but external communication (Dom0 or a VM talking to an external machine) does not. At this point I can type any command you want for analysis. In other cases the whole server simply crashes and I have no access to anything (I need to reboot), and /var/log/messages has no messages related to this.

I can add this point (it may help): it seems to happen each time on the eth with the higher number, initially eth3 and now eth1. Even when I swap the eth0 and eth1 networks (by reallocating br0 on eth0 and br1 on eth1), it is always eth1 that crashes.

I can also add that it appears more frequently when I start a second VM. In this case br0 (eth0) is shared by 3 systems (Dom0, VM1, VM2) instead of 2 (Dom0 + VM1).

Hope this can help... Let me know what I can do to help fix this.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c7

--- Comment #7 from Jan Beulich <jbeulich@novell.com> 2010-09-29 15:25:30 UTC ---
(In reply to comment #6)
> I'm quite sure it is related to the bridge device, as network traffic is the crash trigger.
Your newer setup is using bridging just like the older one (just on different NICs), so I can't see how you would want to distinguish the two.
> Since I changed my config to use my two RTL Ethernet cards instead of the MCP51 one, I have nothing in /var/log/messages, but the network is still crashing. In most cases the internal network (between the VMs and Dom0) works correctly, but external communication (Dom0 or a VM talking to an external machine) does not. At this point I can type any command you want for analysis. In other cases the whole server simply crashes and I have no access to anything (I need to reboot), and /var/log/messages has no messages related to this.
Again, we'll need a full log, up to and including any messages generated during a possible full machine crash. Those typically don't make it to persistent storage, so you'll have to set up a serial console, which also lets you collect both kernel and hypervisor messages at the same time.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c8

--- Comment #8 from Paul Pinault <disk_91@hotmail.com> 2010-09-29 17:08:36 UTC ---
Created an attachment (id=392185)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392185)
full kernel log since openSUSE 11.3 install; see after Sept 23 for the last kernel update

Full kernel log as requested.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c9

--- Comment #9 from Paul Pinault <disk_91@hotmail.com> 2010-09-29 17:24:33 UTC ---
Looking for a serial cable to enable console tracing... it should be in place tonight.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c10

--- Comment #10 from Paul Pinault <disk_91@hotmail.com> 2010-09-29 18:49:34 UTC ---
Unfortunately, no crossed (X) serial cable :( ... I will have to wait longer for this. Hope the kernel log will help.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c11

--- Comment #11 from Jan Beulich <jbeulich@novell.com> 2010-09-30 07:40:56 UTC ---
The log doesn't tell much, but at least it clarifies that it's not the problem I was suspecting. Instead, the instance on Sep 16 in particular suggests a more general interrupt handling problem, as a SATA device also suffered. Later instances with the 8139 don't, however: did you reconfigure the system in some way (e.g. was the interrupt shared originally, and now it isn't)?

We'll need /var/log/boot.msg for both a native and a Xen kernel boot, and we'll need access to Xen's console (if the system is still usable once this state is reached, the "xm debug-key" and "xm dmesg" commands will do, but if it isn't, a serial console is going to be unavoidable).

One other thing to try would be passing "cpuidle=0" to Xen. And of course I assume you already installed the recently released Xen update, and know the issue is not solved by this.

Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.
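Comment #11 asks whether the interrupt was shared originally. On Linux, `/proc/interrupts` lists the devices attached to each IRQ as a comma-separated list in its last column, so a shared line shows up there as more than one name. The sketch below is a heuristic, not a definitive parser. It assumes the usual layout of one header row of CPU names followed by one row per IRQ. It ignores the interrupt-chip column, whose names differ between native and Xen kernels. `shared_irqs` is a hypothetical helper:

```python
from typing import Dict, List


def shared_irqs(proc_interrupts: str) -> Dict[int, List[str]]:
    """Return {irq: [devices]} for numbered IRQ rows that list more than one device."""
    lines = proc_interrupts.splitlines()
    ncpu = len(lines[0].split())                 # header row: CPU0 CPU1 ...
    shared: Dict[int, List[str]] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue                             # skip NMI, LOC, ERR and other summary rows
        tokens = rest.split()
        if len(tokens) <= ncpu:
            continue                             # per-CPU counts only, no device column
        tail = " ".join(tokens[ncpu:])           # e.g. "IO-APIC-fasteoi eth0, ohci_hcd:usb2"
        parts = tail.split(", ")
        devices = [parts[0].split()[-1]] + parts[1:]  # drop the chip name before the first device
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared
```

Running `shared_irqs(open("/proc/interrupts").read())` once after a native boot and once under the Xen kernel would show whether `eth0` or `eth1` (interrupts 10 and 11 in the `ifconfig` output of comment #1) shares its line with another device in either configuration.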
https://bugzilla.novell.com/show_bug.cgi?id=641900#c12

--- Comment #12 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 08:13:03 UTC ---
(In reply to comment #11)
> The log doesn't tell much, but at least it clarifies that it's not the problem I was suspecting. Instead, the instance on Sep 16 in particular suggests a more general interrupt handling problem, as a SATA device also suffered. Later instances with the 8139 don't, however: did you reconfigure the system in some way (e.g. was the interrupt shared originally, and now it isn't)?
I did not change anything like that; I just changed my network config to keep my system stable for longer. SATA was a secondary side effect: when it crashed, eth3 crashed first, then I stopped and restarted it; it worked for some time, then SATA crashed... but as you say, this does not seem to be the root cause; these are side effects of something else.
> We'll need /var/log/boot.msg for both a native and a Xen kernel boot,

OK, I'll provide this.
> and we'll need access to Xen's console (if the system is still usable once this state is reached, the "xm debug-key" and "xm dmesg" commands will do, but if it isn't, a serial console is going to be unavoidable).

When only the network has crashed, the VMs continue to work well but without external network (the internal network with Dom0 keeps working)... until Dom0 crashes.
> One other thing to try would be passing "cpuidle=0" to Xen. And of course I assume you already installed the recently released Xen update, and know the issue is not solved by this.

All the systems, Dom0 and VMs, are patched with the latest updates for each system. I have openSUSE 11.3 as Dom0, and openSUSE 11.1 and openSUSE 11.2 as VMs.
cpuidle=0: OK, I will change this.
> Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.

I can do that after the other tests... no problem.
I hope to find a serial cable this weekend so I can reproduce the problem with all the log information.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c13

--- Comment #13 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 16:26:50 UTC ---
Created an attachment (id=392393)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392393)
boot.msg normal kernel (no xen)

Normal boot.msg log.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c14

--- Comment #14 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 16:27:45 UTC ---
Created an attachment (id=392394)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392394)
boot.msg xen kernel

boot.msg Xen kernel log file.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c15

--- Comment #15 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 17:31:01 UTC ---
The serial cable is in place... starting to capture logs...
https://bugzilla.novell.com/show_bug.cgi?id=641900#c16

--- Comment #16 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 17:59:26 UTC ---
Created an attachment (id=392409)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392409)
First log booting Dom0 and crashing Dom0

This log was captured from the serial console. It boots the Xen kernel and starts a VM; I start a second VM manually, then crash the system by generating an NFS transfer on br0/eth0 (it takes less than 5 minutes to crash). At that point I was not able to use the system anymore (no keyboard, no mouse... screen on but frozen) and had to reset.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c17

--- Comment #17 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 18:01:07 UTC ---
Created an attachment (id=392410)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392410)
Second log booting Dom0 with acpi=on

As I usually boot with acpi=off, I changed this (removing the option). The Xen kernel started booting but crashed before the end of the boot... the log contains more information on the crash.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c18

--- Comment #18 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 18:05:07 UTC ---
Created an attachment (id=392411)
 --> (http://bugzilla.novell.com/attachment.cgi?id=392411)
Third log booting Dom0 with acpi=off

Third test: back to acpi=off, so the context is the same as in the first log, but this time the crash happened before the boot could finish.

Right now the fourth log is in progress: I just rebooted and the system finished booting correctly (as in log 1). Compared to logs 2 and 3, I power-cycled the machine instead of just using the reset button. I will try to crash it differently so that I keep keyboard access and can type the xm dmesg command.
https://bugzilla.novell.com/show_bug.cgi?id=641900#c19

--- Comment #19 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 20:09:58 UTC ---
Other testing done tonight...
- I'm able to make it crash easily just by generating traffic on any interface.
- I'm currently not able to reach the console after the crash to type xm dmesg or simply dmesg... maybe later.
- During each "home made" crash I did not see any interesting logs on the console.
- After the crash I usually get Input/Output errors on any command (including dmesg); sometimes I lose the keyboard, sometimes I don't.
- The normal kernel is stable: I transferred about 12 GB without any issue, whereas on the Xen kernel I never got past 2 GB (interesting limit...), and generally less is enough to trigger the crash.
- Currently I boot my system with acpi=on, and it behaves just as badly as with acpi=off.
- I'll try the latest kernel version...

I have no more test ideas, as I see nothing interesting in the logs... I hope you will be able to decode the matrix in the logs I attached today.
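Comment #19 reports that the crash follows traffic volume: about 12 GB went through fine on the native kernel, while the Xen kernel never got past 2 GB. To make that comparison repeatable, a fixed amount of data can be pushed over a single TCP connection. Below is a minimal Python sketch, assuming some listener on the peer machine accepts and discards the data. The host, port and the `send_bytes` name are placeholders, not part of the original setup:

```python
import socket

CHUNK = 64 * 1024  # bytes handed to the kernel per sendall() call


def send_bytes(host: str, port: int, total: int) -> int:
    """Stream `total` zero bytes to host:port over one TCP connection; return the count sent."""
    buf = bytes(CHUNK)
    sent = 0
    # timeout=30: if the NIC stalls (e.g. a transmit-queue timeout), sendall() raises
    # socket.timeout instead of hanging forever
    with socket.create_connection((host, port), timeout=30) as s:
        while sent < total:
            n = min(CHUNK, total - sent)
            s.sendall(buf[:n])
            sent += n
    return sent
```

For example, `send_bytes("10.0.0.1", 5001, 3 * 2**30)` would push 3 GiB, just past the point where the Xen kernel has always failed so far. Running the same call on the native kernel gives a like-for-like baseline.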
https://bugzilla.novell.com/show_bug.cgi?id=641900#c20

--- Comment #20 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 20:25:35 UTC ---
> Finally, it would also be useful to know whether the latest kernel-of-the-day (ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/, 2.6.36-rc based, but specifically with some rework of the interrupt handling) would help.
I only found ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm, but I'll try that one... expecting more trace!
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c21 --- Comment #21 from Paul Pinault <disk_91@hotmail.com> 2010-09-30 21:18:57 UTC ---
So, to finish tonight's test: I chose the kernel of the day and it crashed exactly the same way :(

saturn:/home/disk # uname -a
Linux saturn 2.6.34.7-0.3.99.8.0873825-xen #1 SMP 2010-09-27 20:56:41 +0200 x86_64 x86_64 x86_64 GNU/Linux

When it crashed, I got the following:

Sep 30 22:51:44 saturn kernel: [ 201.068023] alloc kstat_irqs on node 0
Sep 30 22:52:37 saturn kernel: [ 254.200733] br0: port 3(vif2.0) entering disabled state
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/2/0
Sep 30 22:52:37 saturn kernel: [ 254.220087] br0: port 3(vif2.0) entering disabled state
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: brctl delif br0 vif2.0 failed
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: ifconfig vif2.0 down failed
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif2.0, bridge br0.
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vkbd/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/console/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vfb/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51712
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/2/0
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51728
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51760
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51760
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51728
Sep 30 22:52:37 saturn logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/2/51712
Sep 30 22:55:01 saturn /usr/sbin/cron[6050]: (root) CMD (/opt/stats/execstat.sh > /dev/null)
Sep 30 22:55:28 saturn kernel: [ 425.812012] ------------[ cut here ]------------
Sep 30 22:55:28 saturn kernel: [ 425.812023] WARNING: at /usr/src/packages/BUILD/kernel-xen-2.6.34.7/linux-2.6.34/net/sched/sch_generic.c:256 dev_watchdog+0x25b/0x270()
Sep 30 22:55:28 saturn kernel: [ 425.812025] Hardware name: System Product Name
Sep 30 22:55:28 saturn kernel: [ 425.812027] NETDEV WATCHDOG: eth1 (8139too): transmit queue 0 timed out
Sep 30 22:55:28 saturn kernel: [ 425.812029] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit usbbk gntdev netbk blkbk blkback_pagemap blktap domctl hwmon_vid xenbus_be snd_pcm_oss evtchn snd_mixer_oss coretemp snd_seq snd_seq_device edd nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop snd_hda_codec_realtek firewire_ohci firewire_core crc_itu_t snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer ohci1394 snd usbhid 8139too soundcore ppdev 8250_pnp hid usblp ieee1394 8139cp forcedeth pcspkr shpchp i2c_nforce2 snd_page_alloc parport_pc sg 8250 sr_mod pci_hotplug parport serial_core floppy asus_atk0110 ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log nouveau ttm drm_kms_helper ohci_hcd drm agpgart i2c_algo_bit i2c_core ehci_hcd sd_mod usbcore button dm_snapshot dm_mod xenblk cdrom xennet fan processor ata_generic pata_amd sata_nv libata scsi_mod thermal thermal_sys hwmon
Sep 30 22:55:28 saturn kernel: [ 425.812110] Pid: 0, comm: swapper Not tainted 2.6.34.7-0.3.99.8.0873825-xen #1
Sep 30 22:55:28 saturn kernel: [ 425.812128] [<ffffffff8040a79b>] dump_stack+0x69/0x6f
Sep 30 22:55:28 saturn kernel: [ 425.812134] [<ffffffff80043943>] warn_slowpath_common+0x73/0xb0
Sep 30 22:55:28 saturn kernel: [ 425.812138] [<ffffffff800439e0>] warn_slowpath_fmt+0x40/0x50
Sep 30 22:55:28 saturn kernel: [ 425.812142] [<ffffffff8034d04b>] dev_watchdog+0x25b/0x270
Sep 30 22:55:28 saturn kernel: [ 425.812149] [<ffffffff80053d34>] run_timer_softirq+0x1d4/0x3d0
Sep 30 22:55:28 saturn kernel: [ 425.812154] [<ffffffff8004b8c8>] __do_softirq+0xe8/0x220
Sep 30 22:55:28 saturn kernel: [ 425.812159] [<ffffffff80007efc>] call_softirq+0x1c/0x30
Sep 30 22:55:28 saturn kernel: [ 425.812163] [<ffffffff80009595>] do_softirq+0xa5/0xe0
Sep 30 22:55:28 saturn kernel: [ 425.812168] [<ffffffff8004bafd>] irq_exit+0x8d/0xa0
Sep 30 22:55:28 saturn kernel: [ 425.812174] [<ffffffff802d27d2>] evtchn_do_upcall+0x222/0x270
Sep 30 22:55:28 saturn kernel: [ 425.812179] [<ffffffff80007a4e>] do_hypervisor_callback+0x1e/0x30
Sep 30 22:55:28 saturn kernel: [ 425.812190] [<ffffffff800033aa>] 0xffffffff800033aa
Sep 30 22:55:28 saturn kernel: [ 425.812199] [<ffffffff80009c0c>] xen_safe_halt+0xc/0x10
Sep 30 22:55:28 saturn kernel: [ 425.812202] [<ffffffff8000e763>] xen_idle+0x43/0xc0
Sep 30 22:55:28 saturn kernel: [ 425.812207] [<ffffffff80005255>] cpu_idle+0x55/0xa0
Sep 30 22:55:28 saturn kernel: [ 425.812213] [<ffffffff80761b0a>] start_kernel+0x3d2/0x3dd
Sep 30 22:55:28 saturn kernel: [ 425.812216] ---[ end trace b6b372b1b3719054 ]---
Sep 30 22:55:31 saturn kernel: [ 428.812028] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
Sep 30 22:59:13 saturn shutdown[6123]: shutting down for system halt
Sep 30 22:59:13 saturn init: Switching to runlevel: 0
Sep 30 22:59:19 saturn sshd[3412]: Received signal 15; terminating.
Sep 30 22:59:19 saturn avahi-daemon[3592]: Leaving mDNS multicast group on interface br1.IPv4 with address 10.0.1.20.
Sep 30 22:59:19 saturn avahi-daemon[3592]: Leaving mDNS multicast group on interface br0.IPv4 with address 10.0.0.20.
Sep 30 22:59:19 saturn auditd[3340]: The audit daemon is exiting.
Sep 30 22:59:19 saturn smartd[4254]: smartd received signal 15: Terminated
Sep 30 22:59:19 saturn smartd[4254]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.ST3250620AS-5QF15S8C.ata.state
Sep 30 22:59:19 saturn smartd[4254]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.ST3250620AS-9QE06V9D.ata.state
Sep 30 22:59:19 saturn smartd[4254]: smartd is exiting (exit status 0)
Sep 30 22:59:19 saturn gnome-keyring-daemon[4985]: dbus failure unregistering from session: Connection is closed
Sep 30 22:59:19 saturn gnome-keyring-daemon[4985]: dbus failure unregistering from session: Connection is closed
Sep 30 22:59:19 saturn polkitd(authority=local): Unregistered Authentication Agent for session /org/freedesktop/ConsoleKit/Session2 (system bus name :1.56, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale fr_FR.utf8) (disconnected from bus)
Sep 30 22:59:19 saturn kernel: [ 656.464725] [drm] nouveau 0000:03:00.0: nouveau_channel_free: freeing fifo 2

On the VM side, I got: mm.c 799:d2 non-privileged(2) attempt to map I/O space 0000...f0

Hope it helps...
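The transmit-timeout warnings quoted in this report name different interfaces and drivers (eth3/forcedeth on Sep 25, eth1/8139too on Sep 30). To compare crashes side by side, here is a minimal Python sketch that extracts interface, driver, queue and kernel uptime from such lines. It assumes the syslog prefix format seen in these logs; the names `find_watchdog_events` and `WatchdogEvent` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# The dev_watchdog() warning text, e.g.
# "NETDEV WATCHDOG: eth1 (8139too): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)

# Syslog prefix as seen in this report (assumed format), e.g.
# "Sep 30 22:55:28 saturn kernel: [ 425.812027]"
PREFIX_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: \[ *(?P<uptime>\d+\.\d+)\]"
)


class WatchdogEvent(NamedTuple):
    stamp: Optional[str]     # syslog wall-clock time, if the prefix matched
    uptime: Optional[float]  # seconds since boot, from the kernel timestamp
    iface: str               # network interface, e.g. "eth1"
    driver: str              # driver bound to it, e.g. "8139too"
    queue: int               # transmit queue that timed out


def find_watchdog_events(lines: Iterable[str]) -> Iterator[WatchdogEvent]:
    """Yield one event per NETDEV WATCHDOG transmit-timeout line."""
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue
        p = PREFIX_RE.match(line)
        yield WatchdogEvent(
            stamp=p.group("stamp") if p else None,
            uptime=float(p.group("uptime")) if p else None,
            iface=m.group("iface"),
            driver=m.group("driver"),
            queue=int(m.group("queue")),
        )


if __name__ == "__main__":
    # Two warning lines quoted in this bug report.
    sample = [
        "Sep 25 05:50:21 saturn kernel: [58361.780062] NETDEV WATCHDOG: eth3 (forcedeth): transmit queue 0 timed out",
        "Sep 30 22:55:28 saturn kernel: [ 425.812027] NETDEV WATCHDOG: eth1 (8139too): transmit queue 0 timed out",
    ]
    for ev in find_watchdog_events(sample):
        print(ev.stamp, ev.uptime, ev.iface, ev.driver, ev.queue)
```

Pointed at a whole log file instead, e.g. `find_watchdog_events(open("/var/log/messages"))` (the path is the usual openSUSE default, assumed here), it would list every transmit timeout together with the uptime at which it occurred.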
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c22 --- Comment #22 from Jan Beulich <jbeulich@novell.com> 2010-10-01 09:35:18 UTC --- (In reply to comment #16)
Created an attachment (id=392409) --> (http://bugzilla.novell.com/attachment.cgi?id=392409) [details] First log booting Dom0 and crashing dom0
Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with your hardware?
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c23 --- Comment #23 from Jan Beulich <jbeulich@novell.com> 2010-10-01 09:37:38 UTC --- (In reply to comment #17)
Created an attachment (id=392410) --> (http://bugzilla.novell.com/attachment.cgi?id=392410) [details] Second Log booting dom0 with acpi=on
As I usually boot with acpi=off, I changed this (removing the option); the Xen kernel started booting but crashed before the end of the boot... the log contains more information on the crash.
The log here is completely meaningless. You pressed arbitrary keys on the serial console (or the remote end sent them without you asking for them) - one can't even tell whether the box was hung, or how far the boot progressed. BUT: if you think you need to disable ACPI, that may be part of your problem. I have yet to understand why you need to...
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c24 --- Comment #24 from Jan Beulich <jbeulich@novell.com> 2010-10-01 09:42:39 UTC --- (In reply to comment #18)
Third test: back to acpi=off, so the context is the same as in the first log, but this time I was not able to finish booting before the crash happened.
Just like for the previous one - there's no evidence that the box crashed, you just had it print huge piles of information. If you didn't ask for it yourself, you'll need to tweak your "other end" of the serial cable (also indicated by the extra blank lines inserted, which make the logs quite hard to read).
I will try to crash it differently so that I get keyboard access to type the xm dmesg command.
No need for "xm dmesg" once you have a serial cable. You get all messages there, and you issue debug keys from the serial console (after switching input to Xen).
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c25 --- Comment #25 from Jan Beulich <jbeulich@novell.com> 2010-10-01 09:43:58 UTC --- (In reply to comment #20)
Only found ftp://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.3/x86_64/kernel-xen-debuginfo-2.6.34.7-0.3.99.8.0873825.x86_64.rpm but I'll try that one... expecting more traces!
Sorry, I really intended to direct you to ftp://ftp.suse.com/pub/projects/kernel/kotd/master/x86_64/.
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c Jan Beulich <jbeulich@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #392393|application/octet-stream    |text/plain
          mime type|                            |
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c Jan Beulich <jbeulich@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #392394|application/octet-stream    |text/plain
          mime type|                            |
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c26 --- Comment #26 from Jan Beulich <jbeulich@novell.com> 2010-10-01 09:52:16 UTC --- Turning off ACPI only for Xen makes things even more suspicious. What's the deal here? Also, can you reproduce your problems on other, very different hardware? Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case.
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c27 --- Comment #27 from Paul Pinault <disk_91@hotmail.com> 2010-10-01 10:04:22 UTC --- (In reply to comment #22)
(In reply to comment #16)
Created an attachment (id=392409) --> (http://bugzilla.novell.com/attachment.cgi?id=392409) [details] First log booting Dom0 and crashing dom0
Did you see "(XEN) APIC error on CPU3: 00(40)"? Are you having problems with your hardware?
I don't think so: the system does not crash when I choose a non-Xen kernel. The CPU is a new one, never overclocked or anything like that. I had a problem with a previous motherboard, but I had this problem before and I still have it...
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c28 --- Comment #28 from Paul Pinault <disk_91@hotmail.com> 2010-10-01 10:08:00 UTC --- (In reply to comment #23)
(In reply to comment #17)
Created an attachment (id=392410) --> (http://bugzilla.novell.com/attachment.cgi?id=392410) [details] Second Log booting dom0 with acpi=on
As I usually boot with acpi=off, I changed this (removing the option); the Xen kernel started booting but crashed before the end of the boot... the log contains more information on the crash.
The log here is completely meaningless. You pressed arbitrary keys on the serial console (or the remote end sent them without you asking for them) - one can't even tell whether the box was hung, or how far the boot progressed.
BUT: if you think you need to disable ACPI, that may be part of your problem. I have yet to understand why you need to...
In fact, with or without ACPI it does not change anything. What I noticed is that ACPI combined with a slow serial console crashes here. I don't know why, but my remote UART is set to 9600 bps and can't be set to a higher baud rate, and at that baud rate I can't boot the Xen kernel with ACPI on... I don't think this log is really relevant to the network problem; I attached it just in case.
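For a sense of scale at 9600 bps: assuming 8N1 framing (one start bit, eight data bits, one stop bit; the framing is not stated in this report), each byte costs 10 bit-times, so the console moves roughly 960 bytes/s. A small Python sketch of that arithmetic follows; the function names are illustrative only, and it says nothing about why ACPI and a slow console interact badly on this machine.

```python
def serial_bytes_per_second(baud: int, data_bits: int = 8,
                            parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Payload bytes per second on an asynchronous serial line.

    Each character costs one start bit plus its data, parity and stop bits,
    so 8N1 framing uses 10 bit-times per byte.
    """
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_char


def seconds_to_send(num_bytes: int, baud: int) -> float:
    """Time needed to push num_bytes of console output at the given baud rate (8N1)."""
    return num_bytes / serial_bytes_per_second(baud)


if __name__ == "__main__":
    for baud in (9600, 115200):
        rate = serial_bytes_per_second(baud)
        # 100 KiB is an arbitrary example size for a large debug-key dump.
        secs = seconds_to_send(100 * 1024, baud)
        print(f"{baud:>6} baud: {rate:.0f} bytes/s, 100 KiB takes {secs:.1f} s")
```

With these assumptions a 100 KiB dump needs about 107 s at 9600 baud versus about 9 s at 115200 baud.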
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c29 --- Comment #29 from Paul Pinault <disk_91@hotmail.com> 2010-10-01 10:13:03 UTC ---
Turning off ACPI only for Xen makes things even more suspicious. What's the deal here? The reason was to be able to detect my sensors, but right now ACPI is on and my sensors work well, so I removed acpi=off. This does not affect the crash (that was the purpose of the different tests: to rule out any impact of this setting).
Also, can you reproduce your problems on other, very different hardware? I don't currently have other hardware available for this.
Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case. I do not understand what you mean by this. What is the "nouveau driver"?
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c30 --- Comment #30 from Paul Pinault <disk_91@hotmail.com> 2010-10-01 10:36:28 UTC ---
Finally, one thing you definitely want to try is disabling the use of the nouveau driver in the Xen case. I do not understand what you mean by this. What is the "nouveau driver"?
Sorry... got it! I'll try ASAP.
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c31 --- Comment #31 from Paul Pinault <disk_91@hotmail.com> 2010-10-01 20:22:12 UTC --- Tonight's tests:
- Blacklisting nouveau: I tried adding "blacklist nouveau" to /etc/modprobe.d/50-blacklist.conf and 00-system.conf... after a reboot, the nouveau module is still loaded. Any idea how to really blacklist it, other than moving nouveau.ko out of the filesystem?
- Kernel upgraded to 2.6.36: the system crashed... this time ata1 and then ata2 crashed. I'll try to attach the log file tomorrow.
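Whether a blacklist entry took effect can be checked after reboot from /proc/modules, where the first field of each line is the module name. Here is a minimal Python sketch (the helper names are hypothetical) that only reports whether nouveau is loaded; it does not try to determine what loaded it.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return the set of module names in /proc/modules-formatted text.

    Each line looks like "name size refcount deps state address";
    the first whitespace-separated field is the module name.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    # /proc/modules uses underscores, while modprobe also accepts dashes.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # only present on a running Linux system
        print("nouveau loaded:", is_module_loaded("nouveau", proc.read_text()))
```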
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c32 --- Comment #32 from Paul Pinault <disk_91@hotmail.com> 2010-10-05 20:09:01 UTC --- For your information, I finished migrating my VMs from Xen to qemu-kvm... now the system looks stable: all VMs are running in parallel and currently work well. I'm still able to reproduce the crash if you need my help to fix it.
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c33 --- Comment #33 from Jan Beulich <jbeulich@novell.com> 2010-11-04 15:45:50 UTC --- Still missing the log promised in #31. Also please try disabling IRQ balancing in Xen ("noirqbalance" on the Xen command line) and/or in Linux (disabling the irq balance daemon in case it is enabled).
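To see whether IRQ balancing is moving a NIC's interrupt between CPUs, one option is to take two samples of /proc/interrupts and compare which CPU columns grow for that interface. Here is a minimal Python sketch, assuming the usual layout (a `CPU0 CPU1 ...` header, then `IRQ: counts... type device[, device...]`); the helper name is hypothetical, and the type column under a Xen dom0 kernel may differ from a native one.

```python
from pathlib import Path


def irq_counts_for(device: str, proc_interrupts_text: str) -> dict:
    """Map IRQ number -> per-CPU interrupt counts for lines naming `device`."""
    lines = proc_interrupts_text.splitlines()
    ncpu = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + ncpu]
        # Summary rows such as "ERR:" carry fewer columns; skip them.
        if len(counts) != ncpu or not all(c.isdigit() for c in counts):
            continue
        # Remaining fields: controller type, then device name(s) sharing this IRQ.
        names = " ".join(fields[1 + ncpu:]).replace(",", " ").split()
        if device in names:
            result[fields[0][:-1]] = [int(c) for c in counts]
    return result


if __name__ == "__main__":
    proc = Path("/proc/interrupts")
    if proc.exists():  # only present on a running Linux system
        print(irq_counts_for("eth1", proc.read_text()))
```

Running it twice a few seconds apart and diffing the per-CPU lists shows which CPUs actually serviced the interrupt in between, which is one way to confirm that a balancing change had a visible effect.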
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c34 --- Comment #34 from Paul Pinault <disk_91@hotmail.com> 2010-11-05 16:13:15 UTC --- (In reply to comment #33)
Still missing the log promised in #31. Nothing attached, as the log is the same as the previous one. There's nothing to see in it.
Also please try disabling IRQ balancing in Xen ("noirqbalance" on the Xen command line) and/or in Linux (disabling the irq balance daemon in case it is enabled). Next time I have to reboot my kvm/qemu setup, I will do the test... It has been working for at least a month now, so I'm not sure I'll be able to answer quickly.
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c35 --- Comment #35 from Jan Beulich <jbeulich@novell.com> 2010-11-05 16:55:23 UTC --- (In reply to comment #34)
Nothing attached, as the log is the same as the previous one. There's nothing to see in it.
How so, if now you don't load the nouveau driver, while previously you did?
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c Ihno Krumreich <ihno@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P4 - Low
                 CC|                            |ihno@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c36 --- Comment #36 from Jan Beulich <jbeulich@novell.com> 2011-04-26 15:11:06 UTC --- Ping?
https://bugzilla.novell.com/show_bug.cgi?id=641900 https://bugzilla.novell.com/show_bug.cgi?id=641900#c37 Jan Beulich <jbeulich@novell.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |CLOSED
       InfoProvider|disk_91@hotmail.com         |
         Resolution|                            |NORESPONSE

--- Comment #37 from Jan Beulich <jbeulich@novell.com> 2011-07-06 08:09:53 UTC --- No response in over half a year. Feel free to re-open if you're ready to continue providing necessary information.