On Tue, 2014-07-08 at 11:59 +0200, steve wrote:
On Tue, 2014-07-08 at 11:32 +0200, Richard Brown wrote:
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again
OK. With tdb-tools:

node 1
2014/07/08 11:54:32.921389 [ 2856]: CTDB starting on node
2014/07/08 11:54:32.974367 [ 2857]: Starting CTDBD (Version 2.3) as PID: 2857
2014/07/08 11:54:32.985424 [ 2857]: Created PID file /var/run/ctdb/ctdbd.pid
2014/07/08 11:54:32.996422 [ 2857]: Set scheduler to SCHED_FIFO
2014/07/08 11:54:32.997392 [ 2857]: Set runstate to INIT (1)
2014/07/08 11:54:33.789104 [ 2857]: Freeze priority 1
2014/07/08 11:54:33.842150 [ 2857]: Freeze priority 2
2014/07/08 11:54:33.863673 [ 2857]: Freeze priority 3
2014/07/08 11:54:33.899042 [ 2857]: server/ctdb_takeover.c:3239 Released 0 public IPs
2014/07/08 11:54:33.899240 [ 2857]: Set runstate to SETUP (2)
2014/07/08 11:54:34.464217 [ 2857]: Set runstate to FIRST_RECOVERY (3)
2014/07/08 11:54:34.467391 [ 2857]: Keepalive monitoring has been started
2014/07/08 11:54:34.467923 [ 2857]: Monitoring has been started
2014/07/08 11:54:34.482718 [recoverd: 2935]: monitor_cluster starting
2014/07/08 11:54:34.525244 [recoverd: 2935]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election
2014/07/08 11:54:34.527061 [ 2857]: Freeze priority 1
2014/07/08 11:54:34.528474 [ 2857]: Freeze priority 2
2014/07/08 11:54:34.529815 [ 2857]: Freeze priority 3
2014/07/08 11:54:34.540343 [ 2857]: This node (0) is now the recovery master
2014/07/08 11:54:35.469934 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:36.472449 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:37.474411 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:37.545470 [recoverd: 2935]: server/ctdb_recoverd.c:1061 Election timed out
2014/07/08 11:54:37.551942 [recoverd: 2935]: The interfaces status has changed on local node 0 - force takeover run
2014/07/08 11:54:37.554449 [recoverd: 2935]: Trigger takeoverrun
2014/07/08 11:54:37.557044 [recoverd: 2935]: Node:0 was in recovery mode. Start recovery process
2014/07/08 11:54:37.562645 [recoverd: 2935]: server/ctdb_recoverd.c:1601 Starting do_recovery
2014/07/08 11:54:37.563916 [recoverd: 2935]: Taking out recovery lock from recovery daemon
2014/07/08 11:54:37.564405 [recoverd: 2935]: Take the recovery lock
2014/07/08 11:54:37.565214 [recoverd: 2935]: ctdb_recovery_lock: Unable to open /cluster/ctbd/lockfile - (No such file or directory)
2014/07/08 11:54:37.566152 [recoverd: 2935]: Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
2014/07/08 11:54:37.567149 [recoverd: 2935]: Banning node 0 for 300 seconds
2014/07/08 11:54:37.567754 [ 2857]: Banning this node for 300 seconds
2014/07/08 11:54:37.567956 [ 2857]: This node has been banned - forcing freeze and recovery
2014/07/08 11:54:37.568059 [ 2857]: server/ctdb_takeover.c:3239 Released 0 public IPs
2014/07/08 11:54:38.476148 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:39.477229 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:40.478881 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:41.480322 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:42.481311 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:43.482493 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:43.639731 [recoverd: 2935]: Daemon has exited - shutting down client
2014/07/08 11:54:43.640344 [recoverd: 2935]: CTDB recoverd: shutting down
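
The "Unable to open /cluster/ctbd/lockfile - (No such file or directory)" entry in the log above suggests the configured recovery lock path points into a directory that does not exist. A rough Python sketch for checking such a path before starting ctdbd follows. The function name diagnose_lock_path is made up for illustration, and the check assumes the daemon creates the lock file itself as long as the parent directory exists and is writable; it does not test whether the cluster filesystem supports the locking CTDB needs.

```python
import os


def diagnose_lock_path(path):
    """Return a short diagnosis for a recovery lock path (hypothetical helper).

    Assumption: the daemon can create the lock file on its own, so only
    the parent directory is inspected here.
    """
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        return "parent directory %s does not exist" % parent
    if not os.access(parent, os.W_OK):
        return "parent directory %s is not writable" % parent
    return "ok"


if __name__ == "__main__":
    # Both paths are taken from the log output in this thread.
    for candidate in ("/cluster/ctbd/lockfile", "/cluster/ctdb/lockfile"):
        print(candidate, "->", diagnose_lock_path(candidate))
```

Running it on each node before starting the daemon would catch a mistyped directory name without waiting for the 300 second ban.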
Sorry. Corrected the lockfile error:

node 1
2014/07/08 12:04:26.400055 [ 3053]: CTDB starting on node
2014/07/08 12:04:26.451932 [ 3054]: Starting CTDBD (Version 2.3) as PID: 3054
2014/07/08 12:04:26.453895 [ 3054]: Created PID file /var/run/ctdb/ctdbd.pid
2014/07/08 12:04:26.456486 [ 3054]: Set scheduler to SCHED_FIFO
2014/07/08 12:04:26.457501 [ 3054]: Set runstate to INIT (1)
2014/07/08 12:04:27.220775 [ 3054]: Freeze priority 1
2014/07/08 12:04:27.267380 [ 3054]: Freeze priority 2
2014/07/08 12:04:27.298602 [ 3054]: Freeze priority 3
2014/07/08 12:04:27.326972 [ 3054]: server/ctdb_takeover.c:3239 Released 0 public IPs
2014/07/08 12:04:27.327258 [ 3054]: Set runstate to SETUP (2)
2014/07/08 12:04:27.953805 [ 3054]: Set runstate to FIRST_RECOVERY (3)
2014/07/08 12:04:27.957760 [ 3054]: Keepalive monitoring has been started
2014/07/08 12:04:27.958014 [ 3054]: Monitoring has been started
2014/07/08 12:04:28.003672 [recoverd: 3132]: monitor_cluster starting
2014/07/08 12:04:28.041949 [recoverd: 3132]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election
2014/07/08 12:04:28.043504 [ 3054]: Freeze priority 1
2014/07/08 12:04:28.044779 [ 3054]: Freeze priority 2
2014/07/08 12:04:28.045983 [ 3054]: Freeze priority 3
2014/07/08 12:04:28.050910 [ 3054]: This node (0) is now the recovery master
2014/07/08 12:04:28.959722 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:29.961769 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:30.964780 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:31.057385 [recoverd: 3132]: server/ctdb_recoverd.c:1061 Election timed out
2014/07/08 12:04:31.065478 [recoverd: 3132]: The interfaces status has changed on local node 0 - force takeover run
2014/07/08 12:04:31.068229 [recoverd: 3132]: Trigger takeoverrun
2014/07/08 12:04:31.079008 [recoverd: 3132]: Node:0 was in recovery mode. Start recovery process
2014/07/08 12:04:31.080220 [recoverd: 3132]: server/ctdb_recoverd.c:1601 Starting do_recovery
2014/07/08 12:04:31.080779 [recoverd: 3132]: Taking out recovery lock from recovery daemon
2014/07/08 12:04:31.081232 [recoverd: 3132]: Take the recovery lock
2014/07/08 12:04:31.148991 [recoverd: 3132]: Recovery lock taken successfully
2014/07/08 12:04:31.149977 [recoverd: 3132]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile'
2014/07/08 12:04:31.151497 [recoverd: 3132]: Recovery lock taken successfully by recovery daemon
2014/07/08 12:04:31.152500 [recoverd: 3132]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0
2014/07/08 12:04:31.153991 [recoverd: 3132]: server/ctdb_recoverd.c:1651 Recovery - created remote databases
2014/07/08 12:04:31.155126 [recoverd: 3132]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases
2014/07/08 12:04:31.156718 [ 3054]: Freeze priority 1
2014/07/08 12:04:31.157455 [ 3054]: Freeze priority 2
2014/07/08 12:04:31.158227 [ 3054]: Freeze priority 3
2014/07/08 12:04:31.162076 [ 3054]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked
2014/07/08 12:04:31.505047 [recoverd: 3132]: server/ctdb_recoverd.c:1695 Recovery - updated flags
2014/07/08 12:04:31.507208 [recoverd: 3132]: server/ctdb_recoverd.c:1739 started transactions on all nodes
2014/07/08 12:04:31.507641 [recoverd: 3132]: server/ctdb_recoverd.c:1752 Recovery - starting database commits
2014/07/08 12:04:31.508455 [recoverd: 3132]: server/ctdb_recoverd.c:1764 Recovery - committed databases
2014/07/08 12:04:31.509662 [recoverd: 3132]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap
2014/07/08 12:04:31.511421 [recoverd: 3132]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster
2014/07/08 12:04:31.519746 [recoverd: 3132]: server/ctdb_recoverd.c:1840 Recovery - updated flags
2014/07/08 12:04:31.523517 [ 3054]: server/ctdb_recover.c:612 Recovery mode set to NORMAL
2014/07/08 12:04:31.523682 [ 3054]: Thawing priority 1
2014/07/08 12:04:31.523707 [ 3054]: Release freeze handler for prio 1
2014/07/08 12:04:31.524040 [ 3054]: Thawing priority 2
2014/07/08 12:04:31.524076 [ 3054]: Release freeze handler for prio 2
2014/07/08 12:04:31.524231 [ 3054]: Thawing priority 3
2014/07/08 12:04:31.524308 [ 3054]: Release freeze handler for prio 3
2014/07/08 12:04:31.537070 [recoverd: 3132]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode
2014/07/08 12:04:31.545396 [recoverd: 3132]: Failed to find node to cover ip 192.168.1.81
2014/07/08 12:04:31.549055 [recoverd: 3132]: Failed to find node to cover ip 192.168.1.80
2014/07/08 12:04:31.558856 [recoverd: 3132]: Disabling ip check for 9 seconds
2014/07/08 12:04:31.838212 [ 3054]: Recovery has finished
2014/07/08 12:04:31.966612 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:31.967317 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:32.244651 [ 3054]: Set runstate to STARTUP (4)
2014/07/08 12:04:32.247212 [recoverd: 3132]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event
2014/07/08 12:04:32.250004 [recoverd: 3132]: server/ctdb_recoverd.c:1879 Recovery complete
2014/07/08 12:04:32.250004 [recoverd: 3132]: Resetting ban count to 0 for all nodes
2014/07/08 12:04:32.250004 [recoverd: 3132]: Just finished a recovery. New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2014/07/08 12:04:32.968841 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:32.969135 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:33.970932 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:33.971521 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:34.972318 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:34.972676 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:35.973089 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:35.973291 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:36.973796 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:04:36.974241 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:04:37.209923 [recoverd: 3132]: Daemon has exited - shutting down client
2014/07/08 12:04:37.217624 [recoverd: 3132]: CTDB recoverd: shutting down

node 2
2014/07/08 12:09:53.564790 [ 3255]: CTDB starting on node
2014/07/08 12:09:53.621980 [ 3256]: Starting CTDBD (Version 2.3) as PID: 3256
2014/07/08 12:09:53.624703 [ 3256]: Created PID file /var/run/ctdb/ctdbd.pid
2014/07/08 12:09:53.628969 [ 3256]: Set scheduler to SCHED_FIFO
2014/07/08 12:09:53.629969 [ 3256]: Set runstate to INIT (1)
2014/07/08 12:09:54.385416 [ 3256]: Freeze priority 1
2014/07/08 12:09:54.430125 [ 3256]: Freeze priority 2
2014/07/08 12:09:54.442501 [ 3256]: Freeze priority 3
2014/07/08 12:09:54.462958 [ 3256]: server/ctdb_takeover.c:3239 Released 0 public IPs
2014/07/08 12:09:54.463081 [ 3256]: Set runstate to SETUP (2)
2014/07/08 12:09:54.989850 [ 3256]: Set runstate to FIRST_RECOVERY (3)
2014/07/08 12:09:54.992636 [ 3256]: Keepalive monitoring has been started
2014/07/08 12:09:54.993018 [ 3256]: Monitoring has been started
2014/07/08 12:09:55.016430 [recoverd: 3334]: monitor_cluster starting
2014/07/08 12:09:55.048891 [recoverd: 3334]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election
2014/07/08 12:09:55.050346 [ 3256]: Freeze priority 1
2014/07/08 12:09:55.056504 [ 3256]: Freeze priority 2
2014/07/08 12:09:55.057850 [ 3256]: Freeze priority 3
2014/07/08 12:09:55.060245 [ 3256]: This node (1) is now the recovery master
2014/07/08 12:09:55.994680 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:09:56.996779 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:09:57.998366 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:09:58.067749 [recoverd: 3334]: server/ctdb_recoverd.c:1061 Election timed out
2014/07/08 12:09:58.074871 [recoverd: 3334]: The interfaces status has changed on local node 1 - force takeover run
2014/07/08 12:09:58.077478 [recoverd: 3334]: Trigger takeoverrun
2014/07/08 12:09:58.085232 [recoverd: 3334]: Node:1 was in recovery mode. Start recovery process
2014/07/08 12:09:58.086269 [recoverd: 3334]: server/ctdb_recoverd.c:1601 Starting do_recovery
2014/07/08 12:09:58.087228 [recoverd: 3334]: Taking out recovery lock from recovery daemon
2014/07/08 12:09:58.088374 [recoverd: 3334]: Take the recovery lock
2014/07/08 12:09:58.089982 [recoverd: 3334]: Recovery lock taken successfully
2014/07/08 12:09:58.091070 [recoverd: 3334]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile'
2014/07/08 12:09:58.092822 [recoverd: 3334]: Recovery lock taken successfully by recovery daemon
2014/07/08 12:09:58.093407 [recoverd: 3334]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0
2014/07/08 12:09:58.095758 [recoverd: 3334]: server/ctdb_recoverd.c:1651 Recovery - created remote databases
2014/07/08 12:09:58.096337 [recoverd: 3334]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases
2014/07/08 12:09:58.098322 [ 3256]: Freeze priority 1
2014/07/08 12:09:58.108737 [ 3256]: Freeze priority 2
2014/07/08 12:09:58.109651 [ 3256]: Freeze priority 3
2014/07/08 12:09:58.110864 [ 3256]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked
2014/07/08 12:09:58.396307 [recoverd: 3334]: server/ctdb_recoverd.c:1695 Recovery - updated flags
2014/07/08 12:09:58.402983 [recoverd: 3334]: server/ctdb_recoverd.c:1739 started transactions on all nodes
2014/07/08 12:09:58.404367 [recoverd: 3334]: server/ctdb_recoverd.c:1752 Recovery - starting database commits
2014/07/08 12:09:58.406675 [recoverd: 3334]: server/ctdb_recoverd.c:1764 Recovery - committed databases
2014/07/08 12:09:58.410263 [recoverd: 3334]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap
2014/07/08 12:09:58.416546 [recoverd: 3334]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster
2014/07/08 12:09:58.426215 [recoverd: 3334]: server/ctdb_recoverd.c:1840 Recovery - updated flags
2014/07/08 12:09:58.428541 [ 3256]: server/ctdb_recover.c:612 Recovery mode set to NORMAL
2014/07/08 12:09:58.428682 [ 3256]: Thawing priority 1
2014/07/08 12:09:58.428708 [ 3256]: Release freeze handler for prio 1
2014/07/08 12:09:58.429194 [ 3256]: Thawing priority 2
2014/07/08 12:09:58.429232 [ 3256]: Release freeze handler for prio 2
2014/07/08 12:09:58.429382 [ 3256]: Thawing priority 3
2014/07/08 12:09:58.429434 [ 3256]: Release freeze handler for prio 3
2014/07/08 12:09:58.447295 [recoverd: 3334]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode
2014/07/08 12:09:58.466453 [recoverd: 3334]: Failed to find node to cover ip 192.168.1.81
2014/07/08 12:09:58.467386 [recoverd: 3334]: Failed to find node to cover ip 192.168.1.80
2014/07/08 12:09:58.469791 [recoverd: 3334]: Disabling ip check for 9 seconds
2014/07/08 12:09:58.773772 [ 3256]: Recovery has finished
2014/07/08 12:09:59.001209 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:09:59.001812 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:09:59.093872 [ 3256]: Set runstate to STARTUP (4)
2014/07/08 12:09:59.096577 [recoverd: 3334]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event
2014/07/08 12:09:59.098518 [recoverd: 3334]: server/ctdb_recoverd.c:1879 Recovery complete
2014/07/08 12:09:59.098603 [recoverd: 3334]: Resetting ban count to 0 for all nodes
2014/07/08 12:09:59.098634 [recoverd: 3334]: Just finished a recovery. New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2014/07/08 12:10:00.002452 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:00.002770 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:01.004214 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:01.006243 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:02.007199 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:02.007584 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:03.008773 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:03.009497 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:04.010301 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:04.010722 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:04.395050 [recoverd: 3334]: Daemon has exited - shutting down client
2014/07/08 12:10:04.395421 [recoverd: 3334]: CTDB recoverd: shutting down
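
Most of the output above is routine CTDB_WAIT_UNTIL_RECOVERED polling, so the handful of interesting entries is easy to miss. As a rough aid, here is a small Python sketch that keeps only entries matching a keyword list. The regular expression is inferred from the log lines in this thread, and both the KEYWORDS tuple and the function name notable are assumptions made for illustration; neither is part of CTDB.

```python
import re

# Layout inferred from the log lines in this thread:
#   <date> <time> [<process and/or pid>]: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<proc>[^\]]*)\]: (?P<msg>.*)$"
)

# Substrings treated as noteworthy. This list is an assumption; adjust to taste.
KEYWORDS = ("Unable to", "Failed to", "Banning", "Daemon has exited", "Set runstate")


def notable(lines):
    """Yield (timestamp, process, message) for entries containing a keyword."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(k in m.group("msg") for k in KEYWORDS):
            yield m.group("ts"), m.group("proc").strip(), m.group("msg")


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < ctdb.log
    for ts, proc, msg in notable(sys.stdin):
        print(ts, "[%s]" % proc, msg)
```

Applied to the two runs above, this would leave the runstate changes, the two "Failed to find node to cover ip" entries and the final "Daemon has exited" entry on each node.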