[opensuse] o2net_idle_timer
Hello,

I've set up multiple SLES 10 boxes for an OCFS2/Linux-HA solution, but I'm having some difficulties with o2net. Only one pair of boxes works with OCFS2/Linux-HA. I really need some help with this, because I don't know what's different about the one working pair, except that it received only the heartbeat* and ocfs2* updates while the others received full updates.

---- network config ----
system1:
eth1      Link encap:Ethernet  HWaddr 00:0C:29:1A:7C:6A
          inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0

system2:
eth1      Link encap:Ethernet  HWaddr 00:0C:29:17:E1:D3
          inet addr:192.168.100.2  Bcast:192.168.100.255  Mask:255.255.255.0

system1:~ # ping 192.168.100.2
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=1.12 ms
64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.199 ms
64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.334 ms

system2:~ # ping 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=0.941 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.209 ms

system1:~ # cat /etc/hosts
127.0.0.1       localhost system1.site.pt system1
192.168.229.131 system2.site.pt system2

system2:~ # cat /etc/hosts
127.0.0.1       localhost system2.site.pt system2
192.168.229.130 system1.site.pt system1

system1:~ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.229.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.229.2   0.0.0.0         UG    0      0        0 eth0

system2:~ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.229.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.229.2   0.0.0.0         UG    0      0        0 eth0

---- o2cb config ----
system1:~ # cat /etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=cluster
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_HEARTBEAT_MODE="user"

system2:~ # cat /etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=cluster
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_HEARTBEAT_MODE="user"

---- /etc/ocfs2/cluster.conf ----
system1:~ # cat /etc/ocfs2/cluster.conf
node:
	ip_port = 7777
	ip_address = 192.168.100.1
	number = 0
	name = system1
	cluster = cluster

node:
	ip_port = 7777
	ip_address = 192.168.100.2
	number = 1
	name = system2
	cluster = cluster

cluster:
	node_count = 2
	name = cluster

system1:~ # md5sum /etc/ocfs2/cluster.conf
7cb6fa81132051e8a1951832d02945fc  /etc/ocfs2/cluster.conf
system2:~ # md5sum /etc/ocfs2/cluster.conf
7cb6fa81132051e8a1951832d02945fc  /etc/ocfs2/cluster.conf

---- /var/log/messages ----
Jan 13 21:31:12 system1 kernel: Node system1 is up in group 03AE9F3FE04A4E5DAAD052FC42AE50E2
Jan 13 21:31:12 system1 kernel: Node system2 is up in group 03AE9F3FE04A4E5DAAD052FC42AE50E2
Jan 13 21:31:13 system1 kernel: o2net: accepted connection from node system2 (num 1) at 192.168.100.2:7777
Jan 13 21:31:15 system1 kernel: ocfs2_dlm: Nodes in domain ("03AE9F3FE04A4E5DAAD052FC42AE50E2"): 0 1
Jan 13 21:31:15 system1 kernel: kjournald starting.  Commit interval 5 seconds
Jan 13 21:31:15 system1 kernel: ocfs2: Mounting device (8,17) on (node 0, slot 1)
Jan 13 21:31:16 system1 kernel: o2net: no longer connected to node system2 (num 1) at 192.168.100.2:7777
Jan 13 21:31:23 system2 kernel: o2net: connected to node system1 (num 0) at 192.168.100.1:7777
Jan 13 21:31:23 system2 kernel: ocfs2_dlm: Nodes in domain ("03AE9F3FE04A4E5DAAD052FC42AE50E2"): 1
Jan 13 21:31:23 system2 kernel: (3161,0):ocfs2_find_slot:261 slot 0 is already allocated to this node!
Jan 13 21:31:23 system2 kernel: (3161,0):ocfs2_check_volume:1651 File system was not unmounted cleanly, recovering volume.
Jan 13 21:31:23 system2 kernel: (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 18 to 19
Jan 13 21:31:23 system2 kernel: (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 0 and revoked 0/0 blocks
Jan 13 21:31:23 system2 kernel: kjournald starting.  Commit interval 5 seconds
Jan 13 21:31:24 system2 kernel: ocfs2: Mounting device (8,17) on (node 1, slot 0)
Jan 13 21:31:24 system2 kernel: (3169,0):ocfs2_replay_journal:1174 Recovering node 0 from slot 1 on device (8,17)
Jan 13 21:31:25 system2 kernel: (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 11 to 12
Jan 13 21:31:25 system2 kernel: (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 0 and revoked 0/0 blocks
Jan 13 21:31:25 system2 kernel: kjournald starting.  Commit interval 5 seconds
Jan 13 21:31:31 system2 kernel: ocfs2_dlm: Node 0 joins domain 03AE9F3FE04A4E5DAAD052FC42AE50E2
Jan 13 21:31:31 system2 kernel: ocfs2_dlm: Nodes in domain ("03AE9F3FE04A4E5DAAD052FC42AE50E2"): 0 1
Jan 13 21:31:33 system2 kernel: o2net: connection to node system1 (num 0) at 192.168.100.1:7777 has been idle for 10 seconds, shutting it down.
Jan 13 21:31:33 system2 kernel: (3173,0):o2net_idle_timer:1314 here are some times that might help debug the situation: (tmr 1168723883.85116 now 1168723893.85809 dr 1168723892.403630 adv 1168723892.403647:1168723892.403647 func (ce961a9e:504) 1168723892.403117:1168723892.403161)
Jan 13 21:31:33 system2 kernel: o2net: no longer connected to node system1 (num 0) at 192.168.100.1:7777

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
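The o2net_idle_timer debug line packs several epoch timestamps into one message, which makes it hard to read by eye. The Python sketch below (Python is used only for illustration) pulls out the tmr, now and dr stamps and subtracts them, so the result can be compared with the "idle for 10 seconds" message. It assumes the digits after each dot are a microsecond count printed without zero-padding (the five-digit values such as 85116 suggest this); the function names are hypothetical, and the log itself does not say what event each stamp records, so treat tmr, now and dr purely as field names.

```python
import re

# The o2net_idle_timer debug line from system2's /var/log/messages.
DEBUG_LINE = (
    "(3173,0):o2net_idle_timer:1314 here are some times that might help "
    "debug the situation: (tmr 1168723883.85116 now 1168723893.85809 "
    "dr 1168723892.403630 adv 1168723892.403647:1168723892.403647 "
    "func (ce961a9e:504) 1168723892.403117:1168723892.403161)"
)


def to_usec(stamp: str) -> int:
    """Turn a 'seconds.micro' stamp into integer microseconds.

    ASSUMPTION: the part after the dot is a microsecond count printed
    without zero-padding, so '85116' means 85116 us (0.085116 s).
    Integer arithmetic avoids float rounding on epoch-sized values.
    """
    sec, usec = stamp.split(".")
    return int(sec) * 1_000_000 + int(usec)


def idle_timer_deltas(line: str) -> dict:
    """Return how many microseconds before 'now' the tmr and dr stamps lie."""
    m = re.search(r"tmr (\d+\.\d+) now (\d+\.\d+) dr (\d+\.\d+)", line)
    if m is None:
        raise ValueError("no tmr/now/dr fields found")
    tmr, now, dr = (to_usec(s) for s in m.groups())
    return {"now_minus_tmr_usec": now - tmr, "now_minus_dr_usec": now - dr}


if __name__ == "__main__":
    for name, usec in idle_timer_deltas(DEBUG_LINE).items():
        print(f"{name}: {usec / 1e6:.6f} s")
```

Under the unpadded-microseconds assumption, now minus tmr comes out to 10.000693 s, consistent with the 10-second idle message, while now minus dr is 0.682179 s.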
Okay, with a great deal of luck I finally found the problem. I downgraded the kernel to kernel-default-2.6.16.21-0.25 and everything works. Kernel kernel-default-2.6.16.27-0.6 does not work; it produces the errors shown in my last email. There are only two patches about ocfs2:

patches.fixes/ocfs2-network-send-lock.diff: ocfs2: introduce sc->sc_send_lock to protect outbound network messages [#216912]
patches.suse/ocfs2-13-fix-quorum-work.diff: ocfs2: outstanding scheduled work can oops when quorum is shut down [#220694]
José Costa