From fn19.site (192.168.40.59) icmp_seq=1 Destination Host Unreachable
Nov 8 14:03:10 fn18 mpd: fn18_38610 (runmainloop 308): no pulse_ack from rhs; re-entering ring
https://bugzilla.novell.com/show_bug.cgi?id=337003#c8

Bernard Delley <bernard.delley@psi.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #181832|0                           |1
        is obsolete|                            |

--- Comment #8 from Bernard Delley <bernard.delley@psi.ch>  2007-11-08 06:50:15 MST ---
Created an attachment (id=182612)
 --> (https://bugzilla.novell.com/attachment.cgi?id=182612)
var/log/messages while crashing the network interface on mpich2 slave

I can reproducibly crash the network interface by running a small MPI test
program a few times. The program sends fake data between the two nodes. The
machines remain almost idle. The program worked fine, for example, on older
nodes operated with suse10.1 and on many others.

Here are the last lines appearing in messages on fn18 after the mentioned
actions were taken on fn19 or on the fn18 console.

ssh fn18 tail -f messages

after reboot, mount, ssh root@fn18
Nov 8 13:56:25 fn18 sshd[3881]: Accepted publickey for root from 192.168.40.42 port 47757 ssh2

mpdboot -n 2 --verbose --chkup
Nov 8 13:58:35 fn18 sshd[3928]: PAM audit_log_acct_message() failed: Operation not permitted

mpdtrace -l
Nov 8 13:59:08 fn18 sshd[3950]: PAM audit_log_acct_message() failed: Operation not permitted

mpiexec -machinefile mpd.hosts -n 2 Mpt
Nov 8 13:59:46 fn18 sshd[3979]: PAM audit_log_acct_message() failed: Operation not permitted

ssh fn18 then exit
Nov 8 14:00:26 fn18 sshd[4028]: Accepted publickey for delley from 192.168.40.59 port 45559 ssh2
Nov 8 14:00:29 fn18 sshd[4030]: PAM audit_log_acct_message() failed: Operation not permitted

mpiexec -machinefile mpd.hosts -n 2 Mpt
Nov 8 14:01:10 fn18 mpdman: mpdman starting new log; fn18_mpdman_1

sh fn18 then several times
mpiexec -machinefile mpd.hosts -n 2 Mpt
after a while:
Nov 8 14:03:10 fn18 mpd: fn18_38610 (runmainloop 308): no pulse_ack from rhs; re-entering ring

ping fn18
PING fn18.site (192.168.40.58) 56(84) bytes of data.

the directly connected console showed messages at
Nov 8 14:07:23 fn18 mpd: mpd ending mpdid=fn18_38610 (inside cleanup)

at console /etc/init.d/network restart eth0
Nov 8 14:11:22 fn18 kernel: eth0: no IPv6 routers present

then mpdboot etc.
After a few mpiexec runs, the network interfaces on fn18 and fn19 were dead
and were later reanimated from the console with "network restart eth0".

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.