From fn19.site (192.168.40.59) icmp_seq=1 Destination Host Unreachable
https://bugzilla.novell.com/show_bug.cgi?id=337003#c8
Bernard Delley changed:
What                             |Removed |Added
----------------------------------------------------------------------------
Attachment #181832 is obsolete   |0       |1
--- Comment #8 from Bernard Delley 2007-11-08 06:50:15 MST ---
Created an attachment (id=182612)
--> (https://bugzilla.novell.com/attachment.cgi?id=182612)
var/log/messages while crashing the network interface on mpich2 slave
I can reproducibly crash the network interface by running a small MPI test
program a few times. The program sends dummy data between the two nodes,
and the machines remain nearly idle.
The same program works fine on older nodes running SUSE 10.1, for example,
and on many other systems.
Below are the last lines appearing in /var/log/messages on fn18, each listed
after the action (taken on fn19 or on the fn18 console) that preceded it.
ssh fn18
tail -f messages
after reboot, mount, ssh root@fn18
Nov 8 13:56:25 fn18 sshd[3881]: Accepted publickey for root from 192.168.40.42 port 47757 ssh2
mpdboot -n 2 --verbose --chkup
Nov 8 13:58:35 fn18 sshd[3928]: PAM audit_log_acct_message() failed: Operation not permitted
mpdtrace -l
Nov 8 13:59:08 fn18 sshd[3950]: PAM audit_log_acct_message() failed: Operation not permitted
mpiexec -machinefile mpd.hosts -n 2 Mpt
Nov 8 13:59:46 fn18 sshd[3979]: PAM audit_log_acct_message() failed: Operation not permitted
ssh fn18 then exit
Nov 8 14:00:26 fn18 sshd[4028]: Accepted publickey for delley from 192.168.40.59 port 45559 ssh2
Nov 8 14:00:29 fn18 sshd[4030]: PAM audit_log_acct_message() failed: Operation not permitted
mpiexec -machinefile mpd.hosts -n 2 Mpt
Nov 8 14:01:10 fn18 mpdman: mpdman starting new log; fn18_mpdman_1
ssh fn18, then several times: mpiexec -machinefile mpd.hosts -n 2 Mpt
after a while:
Nov 8 14:03:10 fn18 mpd: fn18_38610 (runmainloop 308): no pulse_ack from rhs; re-entering ring
ping fn18
PING fn18.site (192.168.40.58) 56(84) bytes of data.
The directly connected console showed:
Nov 8 14:07:23 fn18 mpd: mpd ending mpdid=fn18_38610 (inside cleanup)
at console /etc/init.d/network restart eth0
Nov 8 14:11:22 fn18 kernel: eth0: no IPv6 routers present
Then mpdboot etc. again: after a few mpiexec runs, the network interfaces on
both fn18 and fn19 were dead; they were later revived from the console with
"network restart eth0".