[Bug 685470] New: openSUSE:11.4/corosync: Bug
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c0

           Summary: openSUSE:11.4/corosync: Bug
    Classification: openSUSE
           Product: openSUSE.org
           Version: unspecified
          Platform: x86-64
        OS/Version: openSUSE 11.4
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: 3rd party software
        AssignedTo: nix@opensuse.org
        ReportedBy: k.slott@vink-slott.dk
         QAContact: opensuse-communityscreening@forge.provo.novell.com
          Found By: Community User
           Blocker: ---

Created an attachment (id=423424)
 --> (http://bugzilla.novell.com/attachment.cgi?id=423424)
Logfile

On two machines with a reasonably* clean 11.4 install I tried to add corosync:

zypper in openais libglue2 pacemaker libdlm

I configured it following the advice from
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratc...

When I start the first node, pengine and corosync go defunct immediately:

 4194 ?        Ssl    0:00 /usr/sbin/corosync
 4200 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 4201 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 4202 ?        S      0:00  \_ /usr/lib64/heartbeat/lrmd
 4203 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 4204 ?        Z      0:00  \_ [pengine] <defunct>
 4205 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
 4206 ?        Z      0:00  \_ [corosync] <defunct>
 4208 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 4212 ?        S      0:00  \_ /usr/lib64/heartbeat/pengine
 4213 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
 4242 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 4243 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd

I never succeed in contacting the other server, and issuing "rcopenais stop"
either does nothing or makes stonithd and cib go berserk, consuming 100% CPU
until killed with SIGKILL.

ss2:~ # rpm -qa | grep -E 'openais|libglue2|pacemaker|libdlm'
libdlm3-3.00.01-8.2.x86_64
libpacemaker3-1.1.5-3.2.x86_64
libdlm-3.00.01-8.2.x86_64
libopenais3-1.1.4-3.1.x86_64
openais-1.1.4-3.1.x86_64
libglue2-1.0.7-4.2.x86_64
pacemaker-1.1.5-3.2.x86_64

*) reasonably: before trying corosync I experimented with heartbeat, but
removed it using YaST before installing corosync.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c1

Jose Fernando Scheid Mascarenhas <fernando@unitech.inf.br> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fernando@unitech.inf.br

--- Comment #1 from Jose Fernando Scheid Mascarenhas <fernando@unitech.inf.br> 2011-07-14 20:53:08 UTC ---
I'm having the same problem.

 3463 ?        Ssl    0:00 /usr/sbin/corosync
 3469 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 3470 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 3471 ?        Z      0:00  \_ [lrmd] <defunct>
 3472 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 3473 ?        Z      0:00  \_ [pengine] <defunct>
 3474 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
 3475 ?        S      0:00  \_ /usr/lib64/heartbeat/mgmtd
 3477 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 3478 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 3479 ?        S      0:00  \_ /usr/lib64/heartbeat/lrmd
 3480 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 3481 ?        S      0:00  \_ /usr/lib64/heartbeat/pengine
 3482 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd

# rcopenais stop
Stopping OpenAIS/corosync daemon (corosync): ..done OK

# ps ax --forest # again
 3469 ?        R      1:16 /usr/lib64/heartbeat/stonithd
 3470 ?        R      1:09 /usr/lib64/heartbeat/cib
 3475 ?        S      0:00 /usr/lib64/heartbeat/mgmtd

The system is very slow and the load average keeps climbing.

# top
top - 17:49:26 up 2:00, 2 users, load average: 3.94, 1.66, 0.66
Tasks: 74 total, 3 running, 71 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.9%us, 54.7%sy, 0.0%ni, 13.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1013288k total, 316304k used, 696984k free, 13980k buffers
Swap: 1052668k total, 0k used, 1052668k free, 254456k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3469 root      RT   0 81236 2652 2000 R   97  0.3   2:00.60 stonithd
 3470 hacluste  RT   0 82924 4928 2732 R   94  0.5   1:50.96 cib

stonithd and cib are eating my CPU. After "kill -15" on cib and stonithd, the
load average goes down.
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c2

Earl Ruby <eruby@knowledgematters.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eruby@knowledgematters.net

--- Comment #2 from Earl Ruby <eruby@knowledgematters.net> 2011-10-13 20:54:45 UTC ---
I have the same problem using the same configuration. After a fresh reboot I
see:

 1935 ?        Ssl    0:00 /usr/sbin/corosync
 1952 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 1953 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 1954 ?        Z      0:00  \_ [lrmd] <defunct>
 1955 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 1956 ?        Z      0:00  \_ [pengine] <defunct>
 1957 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
 1958 ?        Z      0:00  \_ [mgmtd] <defunct>
 1961 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 1962 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 1963 ?        S      0:00  \_ /usr/lib64/heartbeat/lrmd
 1964 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 1965 ?        S      0:00  \_ /usr/lib64/heartbeat/pengine
 1966 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd

After "rccorosync stop" stonithd and cib are still running and the load
spikes.
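
The symptom in these listings (each Pacemaker daemon appearing twice under corosync, some of them defunct) can be spotted mechanically. What follows is only a rough sketch, not part of the original report: the helper names `parse_ps_forest` and `summarize` are made up for illustration, and the parser assumes the five-column `ps ax --forest` layout shown in the comment above.

```python
import os
from collections import Counter

# Listing copied from comment #2; a raw string keeps the "\_" markers intact.
SAMPLE = r"""
 1935 ?        Ssl    0:00 /usr/sbin/corosync
 1952 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 1953 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 1954 ?        Z      0:00  \_ [lrmd] <defunct>
 1955 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 1956 ?        Z      0:00  \_ [pengine] <defunct>
 1957 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
 1958 ?        Z      0:00  \_ [mgmtd] <defunct>
 1961 ?        S      0:00  \_ /usr/lib64/heartbeat/stonithd
 1962 ?        S      0:00  \_ /usr/lib64/heartbeat/cib
 1963 ?        S      0:00  \_ /usr/lib64/heartbeat/lrmd
 1964 ?        S      0:00  \_ /usr/lib64/heartbeat/attrd
 1965 ?        S      0:00  \_ /usr/lib64/heartbeat/pengine
 1966 ?        S      0:00  \_ /usr/lib64/heartbeat/crmd
"""


def parse_ps_forest(text):
    """Return (pid, name, is_defunct) tuples from `ps ax --forest` output."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip blank lines and the header row
        pid, _tty, stat, _time, rest = fields
        # Drop the tree decoration (" \_ ", "|") that --forest prepends.
        command = rest.lstrip(" |\\_").split()
        if not command:
            continue
        token = command[0]
        if token.startswith("["):
            name = token.strip("[]")        # e.g. "[lrmd]" for a zombie
        else:
            name = os.path.basename(token)  # e.g. ".../heartbeat/cib"
        procs.append((int(pid), name, stat.startswith("Z")))
    return procs


def summarize(procs):
    """Return (names seen more than once, names with a defunct instance)."""
    counts = Counter(name for _pid, name, _zombie in procs)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    defunct = sorted({name for _pid, name, zombie in procs if zombie})
    return duplicated, defunct


if __name__ == "__main__":
    duplicated, defunct = summarize(parse_ps_forest(SAMPLE))
    print("started more than once:", ", ".join(duplicated))
    print("defunct instances:", ", ".join(defunct))
```

On a live system the same functions could be fed the output of `ps ax --forest` instead of `SAMPLE`. A daemon whose name starts with an underscore would be mangled by the decoration stripping, which is acceptable for a quick check like this.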
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c3

--- Comment #3 from Earl Ruby <eruby@knowledgematters.net> 2011-10-13 20:58:54 UTC ---
Correction: Make that "rcopenais stop".
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c

Tim Serong <tserong@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
            Version|unspecified                 |Final
                 CC|                            |tserong@suse.com
          Component|3rd party software          |Network
         AssignedTo|nix@opensuse.org            |bnc-team-screening@forge.pr
                   |                            |ovo.novell.com
          QAContact|opensuse-communityscreening |qa@suse.de
                   |@forge.provo.novell.com     |
            Product|openSUSE.org                |openSUSE 11.4
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c

Tim Serong <tserong@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.pr |tserong@suse.com
                   |ovo.novell.com              |
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c4

Tim Serong <tserong@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #4 from Tim Serong <tserong@suse.com> 2011-10-17 04:02:00 UTC ---
If you look in /etc/corosync/corosync.conf, you will see:

  service {
          # Load the Pacemaker Cluster Resource Manager
          ver:       0
          name:      pacemaker
          use_mgmtd: yes
          use_logd:  yes
  }

Because this is already in /etc/corosync/corosync.conf, you should *not*
create /etc/corosync/service.d/pcmk. If corosync is told to load pacemaker
both in the main config file and in the separate service.d/pcmk file, it will
load pacemaker twice, causing the problem you have observed.

I realise this is slightly different from the configuration described in the
latest revision of Clusters from Scratch from upstream, but this reflects some
packaging differences between Fedora (used in that document) and openSUSE.
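
A quick way to check for the double declaration described in comment #4 is to count the `service { ... name: pacemaker ... }` blocks across the main config file and the service.d directory. The script below is only a minimal sketch under stated assumptions: the function names are invented for illustration, the parser is a naive regular expression rather than corosync's own config reader, and it assumes service blocks contain no nested braces.

```python
import re
from pathlib import Path

# A "service { ... }" block without nested braces (assumption of this sketch).
SERVICE_BLOCK = re.compile(r"\bservice\s*\{([^{}]*)\}")
# A "name: <value>" line inside such a block.
NAME_LINE = re.compile(r"^\s*name\s*:\s*(\S+)\s*$", re.MULTILINE)


def strip_comments(text):
    """Remove everything from '#' to the end of each line."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def count_pacemaker_services(text):
    """Count service blocks in `text` that declare `name: pacemaker`."""
    count = 0
    for body in SERVICE_BLOCK.findall(strip_comments(text)):
        if "pacemaker" in NAME_LINE.findall(body):
            count += 1
    return count


def find_pacemaker_declarations(conf_path, service_dir):
    """Map each file that declares the pacemaker service to its block count."""
    candidates = [Path(conf_path)]
    directory = Path(service_dir)
    if directory.is_dir():
        candidates += sorted(p for p in directory.iterdir() if p.is_file())
    hits = {}
    for path in candidates:
        if path.is_file():
            n = count_pacemaker_services(path.read_text())
            if n:
                hits[str(path)] = n
    return hits


if __name__ == "__main__":
    # Paths as named in comment #4.
    hits = find_pacemaker_declarations(
        "/etc/corosync/corosync.conf", "/etc/corosync/service.d"
    )
    for path, n in hits.items():
        print(f"{path}: {n} pacemaker service block(s)")
    if sum(hits.values()) > 1:
        print("pacemaker is declared more than once; keep only one declaration")
```

If the total is greater than one, removing /etc/corosync/service.d/pcmk (as recommended above for openSUSE) should leave a single declaration in corosync.conf.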
https://bugzilla.novell.com/show_bug.cgi?id=685470

https://bugzilla.novell.com/show_bug.cgi?id=685470#c5

--- Comment #5 from Earl Ruby <eruby@knowledgematters.net> 2011-10-17 19:39:05 UTC ---
Thanks! I had seen the entry for pacemaker in /etc/corosync/corosync.conf, but
didn't realize that it was a duplicate of /etc/corosync/service.d/pcmk.

I rebuilt the VMs from scratch so I had a clean install, then applied my
configuration without the /etc/corosync/service.d/pcmk file, rebooted the VMs,
and all was well.