[opensuse-ha] Samba AD domain + CTDB
13.1 nodes with drbd and ocfs2 from the ha-factory repo.

Hi. First time here, so please be gentle.

Aim: add a second, failover file server for our AD domain. We have drbd and ocfs2 up on 2 nodes. We want to add ctdb on top of that for failover. The documentation at https://ctdb.samba.org/samba.html makes no mention of AD, in particular how to join the cluster to the domain.

1. What next? Do we go straight to configuring ctdb?
2. Is the 13.1 ctdb OK?
3. Does it work in AD?
4. Is there any openSUSE-specific samba cluster stuff?

Thanks, Steve
On Mon, 2014-07-07 at 12:22 +0200, steve wrote:
13.1 nodes with drbd and ocfs2 from the ha-factory repo.
Hi. First time here, so please be gentle.
Aim: add a second, failover file server for our AD domain. We have drbd and ocfs2 up on 2 nodes.
We want to add ctdb on top of that for failover. The documentation at https://ctdb.samba.org/samba.html makes no mention of AD, in particular how to join the cluster to the domain.
1. What next? Do we go straight to configuring ctdb? 2. Is the 13.1 ctdb OK? 3. Does it work in AD? 4. Is there any openSUSE-specific samba cluster stuff?
Thanks, Steve
OK. First attempt at ctdb. We have drbd syncing fine to the ocfs2 mounted partitions:

cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 3c1f46cb19993f98b22fdf7e18958c21ad75176d build by SuSE Build Service
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:85 nr:168 dw:253 dr:1919 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

On both nodes the ctdb config is:

public_addresses:
192.168.1.80/24 eth0
192.168.1.81/24 eth0

nodes:
192.168.0.10
192.168.0.11

and drbd:

global { usage-count yes; }
common { protocol C; }
resource r0 {
    net {
        protocol C;
        allow-two-primaries yes;
    }
    startup {
        become-primary-on both;
    }
    on smb1 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.0.10:7789;
        meta-disk internal;
    }
    on smb2 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.0.11:7789;
        meta-disk internal;
    }
}

and ctdb:

CTDB_RECOVERY_LOCK="/cluster/ctbd/lockfile"
CTDB_PUBLIC_INTERFACE=eth0
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_LVS_PUBLIC_IP=
CTDB_MANAGES_SAMBA=yes
CTDB_SAMBA_SKIP_SHARE_CHECK=yes
CTDB_NFS_SKIP_SHARE_CHECK=yes
CTDB_MANAGES_WINBIND=yes
CTDB_MANAGES_VSFTPD=yes
CTDB_MANAGES_ISCSI=yes
CTDB_INIT_STYLE=
CTDB_SERVICE_SMB=smb
CTDB_SERVICE_NMB=nmb
CTDB_SERVICE_WINBIND=winbind
CTDB_NODES=/etc/ctdb/nodes
CTDB_NOTIFY_SCRIPT=/etc/ctdb/notify.sh
CTDB_DBDIR=/var/lib/ctdb
CTDB_DBDIR_PERSISTENT=/var/lib/ctdb/persistent
CTDB_EVENT_SCRIPT_DIR=/etc/ctdb/events.d
CTDB_SOCKET=/var/lib/ctdb/ctdb.socket
CTDB_TRANSPORT="tcp"
CTDB_MONITOR_FREE_MEMORY=100
CTDB_START_AS_DISABLED="yes"
CTDB_CAPABILITY_RECMASTER=yes
CTDB_CAPABILITY_LMASTER=yes
NATGW_PUBLIC_IP=
NATGW_PUBLIC_IFACE=
NATGW_DEFAULT_GATEWAY=
NATGW_PRIVATE_IFACE=
NATGW_PRIVATE_NETWORK=
NATGW_NODES=/etc/ctdb/natgw_nodes
CTDB_LOGFILE=/var/log/ctdb/log.ctdb
CTDB_DEBUGLEVEL=2
CTDB_OPTIONS=

The ocfs2 stuff seems OK:

ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw,relatime)
/dev/drbd1 on /cluster type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl)

logs on starting ctdb:

node 1
2014/07/08 11:06:38.417962 [ 4034]: CTDB starting on node
2014/07/08 11:06:38.469001 [ 4035]: Starting CTDBD (Version 2.3) as PID: 4035
2014/07/08 11:06:38.480647 [ 4035]: Created PID file /var/run/ctdb/ctdbd.pid
2014/07/08 11:06:38.482263 [ 4035]: Set scheduler to SCHED_FIFO
2014/07/08 11:06:38.483114 [ 4035]: Set runstate to INIT (1)
2014/07/08 11:06:39.066264 [ 4035]: 00.ctdb: WARNING: Cannot check databases since neither
2014/07/08 11:06:39.067005 [ 4035]: 00.ctdb: 'tdbdump' nor 'tdbtool check' is available.
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
2014/07/08 11:06:39.212555 [ 4035]: Freeze priority 1 2014/07/08 11:06:39.250950 [ 4035]: Freeze priority 2 2014/07/08 11:06:39.270343 [ 4035]: Freeze priority 3 2014/07/08 11:06:39.309777 [ 4035]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:06:39.310120 [ 4035]: Set runstate to SETUP (2) 2014/07/08 11:06:39.858728 [ 4035]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 11:06:39.862470 [ 4035]: Keepalive monitoring has been started 2014/07/08 11:06:39.862725 [ 4035]: Monitoring has been started 2014/07/08 11:06:39.903084 [recoverd: 4107]: monitor_cluster starting 2014/07/08 11:06:39.944166 [recoverd: 4107]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 11:06:39.946208 [ 4035]: Freeze priority 1 2014/07/08 11:06:39.948072 [ 4035]: Freeze priority 2 2014/07/08 11:06:39.949359 [ 4035]: Freeze priority 3 2014/07/08 11:06:39.953538 [ 4035]: This node (0) is now the recovery master 2014/07/08 11:06:40.864448 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:41.867073 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:42.870148 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:42.964212 [recoverd: 4107]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 11:06:42.975829 [recoverd: 4107]: The interfaces status has changed on local node 0 - force takeover run 2014/07/08 11:06:42.983077 [recoverd: 4107]: Trigger takeoverrun 2014/07/08 11:06:42.986293 [recoverd: 4107]: Node:0 was in recovery mode. Start recovery process 2014/07/08 11:06:42.987372 [recoverd: 4107]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 11:06:42.988486 [recoverd: 4107]: Taking out recovery lock from recovery daemon 2014/07/08 11:06:42.989591 [recoverd: 4107]: Take the recovery lock 2014/07/08 11:06:42.991088 [recoverd: 4107]: ctdb_recovery_lock: Unable to open /cluster/ctbd/lockfile - (No such file or directory) 2014/07/08 11:06:42.992301 [recoverd: 4107]: Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds 2014/07/08 11:06:42.994374 [recoverd: 4107]: Banning node 0 for 300 seconds 2014/07/08 11:06:42.995048 [ 4035]: Banning this node for 300 seconds 2014/07/08 11:06:42.995122 [ 4035]: This node has been banned - forcing freeze and recovery 2014/07/08 11:06:42.995224 [ 4035]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:06:43.872389 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:44.873993 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:45.875147 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:46.876290 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:47.877996 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:48.880075 [ 4035]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:06:49.184701 [recoverd: 4107]: Daemon has exited - shutting down client 2014/07/08 11:06:49.196972 [recoverd: 4107]: CTDB recoverd: shutting down node 2 2014/07/08 11:04:45.733986 [ 6067]: CTDB starting on node 2014/07/08 11:04:45.797095 [ 6068]: Starting CTDBD (Version 2.3) as PID: 6068 2014/07/08 11:04:45.814858 [ 6068]: Created PID file /var/run/ctdb/ctdbd.pid 2014/07/08 11:04:45.818567 [ 6068]: Set scheduler to SCHED_FIFO 2014/07/08 11:04:45.819687 [ 6068]: Set runstate to INIT (1) 2014/07/08 11:04:46.446776 [ 6068]: 00.ctdb: WARNING: Cannot check databases since neither 2014/07/08 11:04:46.447162 [ 6068]: 00.ctdb: 'tdbdump' nor 'tdbtool check' is available. 2014/07/08 11:04:46.447231 [ 6068]: 00.ctdb: Consider installing tdbtool or at least tdbdump! 
2014/07/08 11:04:46.599269 [ 6068]: Freeze priority 1 2014/07/08 11:04:46.654344 [ 6068]: Freeze priority 2 2014/07/08 11:04:46.683954 [ 6068]: Freeze priority 3 2014/07/08 11:04:46.721631 [ 6068]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:04:46.721781 [ 6068]: Set runstate to SETUP (2) 2014/07/08 11:04:47.342320 [ 6068]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 11:04:47.346243 [ 6068]: Keepalive monitoring has been started 2014/07/08 11:04:47.346750 [ 6068]: Monitoring has been started 2014/07/08 11:04:47.376362 [recoverd: 6140]: monitor_cluster starting 2014/07/08 11:04:47.420852 [recoverd: 6140]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 11:04:47.422705 [ 6068]: Freeze priority 1 2014/07/08 11:04:47.429362 [ 6068]: Freeze priority 2 2014/07/08 11:04:47.430955 [ 6068]: Freeze priority 3 2014/07/08 11:04:47.441080 [ 6068]: This node (1) is now the recovery master 2014/07/08 11:04:48.349129 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:04:49.351183 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:04:50.353470 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:04:50.447487 [recoverd: 6140]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 11:04:50.459419 [recoverd: 6140]: The interfaces status has changed on local node 1 - force takeover run 2014/07/08 11:04:50.468723 [recoverd: 6140]: Trigger takeoverrun 2014/07/08 11:04:50.471716 [recoverd: 6140]: Node:1 was in recovery mode. Start recovery process 2014/07/08 11:04:50.473419 [recoverd: 6140]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 11:04:50.474646 [recoverd: 6140]: Taking out recovery lock from recovery daemon 2014/07/08 11:04:50.476860 [recoverd: 6140]: Take the recovery lock 2014/07/08 11:04:50.477977 [recoverd: 6140]: Recovery lock taken successfully 2014/07/08 11:04:50.488958 [recoverd: 6140]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile' 2014/07/08 11:04:50.489972 [recoverd: 6140]: Recovery lock taken successfully by recovery daemon 2014/07/08 11:04:50.491188 [recoverd: 6140]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0 2014/07/08 11:04:50.492067 [recoverd: 6140]: server/ctdb_recoverd.c:1651 Recovery - created remote databases 2014/07/08 11:04:50.492657 [recoverd: 6140]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases 2014/07/08 11:04:50.493744 [ 6068]: Freeze priority 1 2014/07/08 11:04:50.494532 [ 6068]: Freeze priority 2 2014/07/08 11:04:50.495896 [ 6068]: Freeze priority 3 2014/07/08 11:04:50.503236 [ 6068]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked 2014/07/08 11:04:50.873447 [recoverd: 6140]: server/ctdb_recoverd.c:1695 Recovery - updated flags 2014/07/08 11:04:50.877767 [recoverd: 6140]: server/ctdb_recoverd.c:1739 started transactions on all nodes 2014/07/08 11:04:50.878428 [recoverd: 6140]: server/ctdb_recoverd.c:1752 Recovery - starting database commits 2014/07/08 11:04:50.879417 [recoverd: 6140]: server/ctdb_recoverd.c:1764 Recovery - committed databases 2014/07/08 11:04:50.892275 [recoverd: 6140]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap 2014/07/08 11:04:50.906800 [recoverd: 6140]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster 2014/07/08 11:04:50.909624 [recoverd: 6140]: server/ctdb_recoverd.c:1840 Recovery - updated flags 2014/07/08 11:04:50.910410 [ 6068]: server/ctdb_recover.c:612 Recovery mode set to NORMAL 2014/07/08 11:04:50.910595 [ 6068]: Thawing priority 1 2014/07/08 11:04:50.910653 [ 
6068]: Release freeze handler for prio 1
2014/07/08 11:04:50.911029 [ 6068]: Thawing priority 2
2014/07/08 11:04:50.911106 [ 6068]: Release freeze handler for prio 2
2014/07/08 11:04:50.911290 [ 6068]: Thawing priority 3
2014/07/08 11:04:50.911362 [ 6068]: Release freeze handler for prio 3
2014/07/08 11:04:50.929367 [recoverd: 6140]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode
2014/07/08 11:04:50.937270 [recoverd: 6140]: Failed to find node to cover ip 192.168.1.81
2014/07/08 11:04:50.938668 [recoverd: 6140]: Failed to find node to cover ip 192.168.1.80
2014/07/08 11:04:50.946251 [recoverd: 6140]: Disabling ip check for 9 seconds
2014/07/08 11:04:51.275565 [ 6068]: Recovery has finished
2014/07/08 11:04:51.355096 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:51.355531 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:51.592071 [ 6068]: Set runstate to STARTUP (4)
2014/07/08 11:04:51.593962 [recoverd: 6140]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event
2014/07/08 11:04:51.595145 [recoverd: 6140]: server/ctdb_recoverd.c:1879 Recovery complete
2014/07/08 11:04:51.595199 [recoverd: 6140]: Resetting ban count to 0 for all nodes
2014/07/08 11:04:51.595224 [recoverd: 6140]: Just finished a recovery. New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2014/07/08 11:04:52.356771 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:52.357135 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:53.357738 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:53.358060 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:54.359481 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:54.359998 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:55.361130 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:55.361652 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:56.363109 [ 6068]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:04:56.363409 [ 6068]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:04:56.609997 [recoverd: 6140]: Daemon has exited - shutting down client
2014/07/08 11:04:56.614990 [recoverd: 6140]: CTDB recoverd: shutting down

apparmor and firewalls are non-existent. 192.168.1.80 and 192.168.1.81 are the 'out to lan' interfaces; 192.168.0.10 and 192.168.0.11 are the drbd crossover interfaces. Not sure about what we need in public_addresses or in nodes for ctdb. Any ideas of where to start to sort this out most welcome.

Thanks folks, Steve
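For reference, since the question is what goes in which file: ctdb's convention is that nodes holds the fixed, private inter-node addresses (the file must be identical on every node), while public_addresses holds the floating addresses ctdb brings up and fails over for clients. Applied to the addresses already quoted in this thread (file paths per the CTDB_NODES and CTDB_PUBLIC_ADDRESSES settings above), that would look roughly like:

    # /etc/ctdb/nodes -- the drbd crossover addresses, same order on both nodes
    192.168.0.10
    192.168.0.11

    # /etc/ctdb/public_addresses -- the 'out to lan' addresses ctdb moves
    # between nodes, tagged with the interface to bring them up on
    192.168.1.80/24 eth0
    192.168.1.81/24 eth0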
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again.
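On openSUSE that should just be a matter of running, on both nodes:

    zypper install tdb-tools    # should provide the tdbdump/tdbtool binaries
                                # the 00.ctdb event script is complaining about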
On Tue, 2014-07-08 at 11:32 +0200, Richard Brown wrote:
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again
OK. With tdb-tools: node 1 2014/07/08 11:54:32.921389 [ 2856]: CTDB starting on node 2014/07/08 11:54:32.974367 [ 2857]: Starting CTDBD (Version 2.3) as PID: 2857 2014/07/08 11:54:32.985424 [ 2857]: Created PID file /var/run/ctdb/ctdbd.pid 2014/07/08 11:54:32.996422 [ 2857]: Set scheduler to SCHED_FIFO 2014/07/08 11:54:32.997392 [ 2857]: Set runstate to INIT (1) 2014/07/08 11:54:33.789104 [ 2857]: Freeze priority 1 2014/07/08 11:54:33.842150 [ 2857]: Freeze priority 2 2014/07/08 11:54:33.863673 [ 2857]: Freeze priority 3 2014/07/08 11:54:33.899042 [ 2857]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:54:33.899240 [ 2857]: Set runstate to SETUP (2) 2014/07/08 11:54:34.464217 [ 2857]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 11:54:34.467391 [ 2857]: Keepalive monitoring has been started 2014/07/08 11:54:34.467923 [ 2857]: Monitoring has been started 2014/07/08 11:54:34.482718 [recoverd: 2935]: monitor_cluster starting 2014/07/08 11:54:34.525244 [recoverd: 2935]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 11:54:34.527061 [ 2857]: Freeze priority 1 2014/07/08 11:54:34.528474 [ 2857]: Freeze priority 2 2014/07/08 11:54:34.529815 [ 2857]: Freeze priority 3 2014/07/08 11:54:34.540343 [ 2857]: This node (0) is now the recovery master 2014/07/08 11:54:35.469934 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:36.472449 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:37.474411 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:37.545470 [recoverd: 2935]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 11:54:37.551942 [recoverd: 2935]: The interfaces status has changed on local node 0 - force takeover run 2014/07/08 11:54:37.554449 [recoverd: 2935]: Trigger takeoverrun 2014/07/08 11:54:37.557044 [recoverd: 2935]: Node:0 was in recovery mode. 
Start recovery process 2014/07/08 11:54:37.562645 [recoverd: 2935]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 11:54:37.563916 [recoverd: 2935]: Taking out recovery lock from recovery daemon 2014/07/08 11:54:37.564405 [recoverd: 2935]: Take the recovery lock 2014/07/08 11:54:37.565214 [recoverd: 2935]: ctdb_recovery_lock: Unable to open /cluster/ctbd/lockfile - (No such file or directory) 2014/07/08 11:54:37.566152 [recoverd: 2935]: Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds 2014/07/08 11:54:37.567149 [recoverd: 2935]: Banning node 0 for 300 seconds 2014/07/08 11:54:37.567754 [ 2857]: Banning this node for 300 seconds 2014/07/08 11:54:37.567956 [ 2857]: This node has been banned - forcing freeze and recovery 2014/07/08 11:54:37.568059 [ 2857]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:54:38.476148 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:39.477229 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:40.478881 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:41.480322 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:42.481311 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:43.482493 [ 2857]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:43.639731 [recoverd: 2935]: Daemon has exited - shutting down client 2014/07/08 11:54:43.640344 [recoverd: 2935]: CTDB recoverd: shutting down node 2 2014/07/08 11:54:12.635083 [ 2590]: CTDB starting on node 2014/07/08 11:54:12.695604 [ 2591]: Starting CTDBD (Version 2.3) as PID: 2591 2014/07/08 11:54:12.708577 [ 2591]: Created PID file /var/run/ctdb/ctdbd.pid 2014/07/08 11:54:12.711440 [ 2591]: Set scheduler to SCHED_FIFO 2014/07/08 11:54:12.712559 [ 2591]: Set runstate to INIT (1) 2014/07/08 11:54:13.641112 [ 2591]: Freeze priority 1 2014/07/08 11:54:13.670636 [ 2591]: Freeze priority 2 2014/07/08 11:54:13.698589 [ 2591]: Freeze priority 3 2014/07/08 11:54:13.744151 [ 2591]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 11:54:13.744348 [ 2591]: Set runstate to SETUP (2) 2014/07/08 11:54:14.296494 [ 2591]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 11:54:14.301281 [ 2591]: Keepalive monitoring has been started 2014/07/08 11:54:14.301672 [ 2591]: Monitoring has been started 2014/07/08 11:54:14.336024 [recoverd: 2669]: monitor_cluster starting 2014/07/08 11:54:14.380332 [recoverd: 2669]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 11:54:14.383895 [ 2591]: Freeze priority 1 2014/07/08 11:54:14.384816 [ 2591]: Freeze priority 2 2014/07/08 11:54:14.385612 [ 2591]: Freeze priority 3 2014/07/08 11:54:14.388271 [ 2591]: This node (1) is now the recovery master 2014/07/08 11:54:15.303181 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:16.304167 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:17.305626 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:17.393629 [recoverd: 2669]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 11:54:17.401504 [recoverd: 2669]: The interfaces status has changed on local node 1 - force takeover run 2014/07/08 11:54:17.404523 [recoverd: 2669]: Trigger takeoverrun 2014/07/08 11:54:17.414122 [recoverd: 2669]: Node:1 was in recovery mode. 
Start recovery process 2014/07/08 11:54:17.415352 [recoverd: 2669]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 11:54:17.415863 [recoverd: 2669]: Taking out recovery lock from recovery daemon 2014/07/08 11:54:17.416346 [recoverd: 2669]: Take the recovery lock 2014/07/08 11:54:17.417857 [recoverd: 2669]: Recovery lock taken successfully 2014/07/08 11:54:17.419955 [recoverd: 2669]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile' 2014/07/08 11:54:17.420958 [recoverd: 2669]: Recovery lock taken successfully by recovery daemon 2014/07/08 11:54:17.422371 [recoverd: 2669]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0 2014/07/08 11:54:17.431247 [recoverd: 2669]: server/ctdb_recoverd.c:1651 Recovery - created remote databases 2014/07/08 11:54:17.431810 [recoverd: 2669]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases 2014/07/08 11:54:17.433554 [ 2591]: Freeze priority 1 2014/07/08 11:54:17.434268 [ 2591]: Freeze priority 2 2014/07/08 11:54:17.435149 [ 2591]: Freeze priority 3 2014/07/08 11:54:17.436605 [ 2591]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked 2014/07/08 11:54:17.702530 [recoverd: 2669]: server/ctdb_recoverd.c:1695 Recovery - updated flags 2014/07/08 11:54:17.706249 [recoverd: 2669]: server/ctdb_recoverd.c:1739 started transactions on all nodes 2014/07/08 11:54:17.707564 [recoverd: 2669]: server/ctdb_recoverd.c:1752 Recovery - starting database commits 2014/07/08 11:54:17.708919 [recoverd: 2669]: server/ctdb_recoverd.c:1764 Recovery - committed databases 2014/07/08 11:54:17.710593 [recoverd: 2669]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap 2014/07/08 11:54:17.712654 [recoverd: 2669]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster 2014/07/08 11:54:17.719478 [recoverd: 2669]: server/ctdb_recoverd.c:1840 Recovery - updated flags 2014/07/08 11:54:17.724280 [ 2591]: server/ctdb_recover.c:612 Recovery mode set to NORMAL 2014/07/08 11:54:17.724609 [ 2591]: Thawing priority 1 2014/07/08 11:54:17.724661 [ 2591]: Release freeze handler for prio 1 2014/07/08 11:54:17.725001 [ 2591]: Thawing priority 2 2014/07/08 11:54:17.725039 [ 2591]: Release freeze handler for prio 2 2014/07/08 11:54:17.725177 [ 2591]: Thawing priority 3 2014/07/08 11:54:17.725210 [ 2591]: Release freeze handler for prio 3 2014/07/08 11:54:17.740456 [recoverd: 2669]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode 2014/07/08 11:54:17.747439 [recoverd: 2669]: Failed to find node to cover ip 192.168.1.81 2014/07/08 11:54:17.748751 [recoverd: 2669]: Failed to find node to cover ip 192.168.1.80 2014/07/08 11:54:17.753058 [recoverd: 2669]: Disabling ip check for 9 seconds 2014/07/08 11:54:18.096692 [ 2591]: Recovery has finished 2014/07/08 11:54:18.307148 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 11:54:18.307710 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 11:54:18.466053 [ 2591]: Set runstate to STARTUP (4) 2014/07/08 11:54:18.467942 [recoverd: 2669]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event 2014/07/08 11:54:18.470367 [recoverd: 2669]: server/ctdb_recoverd.c:1879 Recovery complete 2014/07/08 11:54:18.470421 [recoverd: 2669]: Resetting ban count to 0 for all nodes 2014/07/08 11:54:18.470447 [recoverd: 2669]: Just finished a recovery. 
New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2014/07/08 11:54:19.308460 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:19.308735 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:54:20.309499 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:20.309860 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:54:21.311258 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:21.311583 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:54:22.312506 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:22.312853 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:54:23.314261 [ 2591]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 11:54:23.314539 [ 2591]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 11:54:23.472970 [recoverd: 2669]: Daemon has exited - shutting down client
2014/07/08 11:54:23.483875 [recoverd: 2669]: CTDB recoverd: shutting down
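Worth noting before the next message: in both runs so far, node 1's ctdbd bans itself because it tries to open /cluster/ctbd/lockfile (No such file or directory), while node 2 successfully takes the lock on /cluster/ctdb/lockfile. The sysconfig posted earlier has CTDB_RECOVERY_LOCK="/cluster/ctbd/lockfile" with the transposed name, which matches what node 1 is trying; node 2 evidently already points at the correct path. CTDB needs the recovery lock to sit on the shared (ocfs2) filesystem and to be identical on every node, so the fix, as confirmed in the follow-up below, is simply:

    # in the ctdb sysconfig, on both nodes ("ctdb", not "ctbd")
    CTDB_RECOVERY_LOCK="/cluster/ctdb/lockfile"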
On Tue, 2014-07-08 at 11:59 +0200, steve wrote:
On Tue, 2014-07-08 at 11:32 +0200, Richard Brown wrote:
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again
OK. With tdb-tools: node 1
Sorry. Corrected the lockfile error: node 1 2014/07/08 12:04:26.400055 [ 3053]: CTDB starting on node 2014/07/08 12:04:26.451932 [ 3054]: Starting CTDBD (Version 2.3) as PID: 3054 2014/07/08 12:04:26.453895 [ 3054]: Created PID file /var/run/ctdb/ctdbd.pid 2014/07/08 12:04:26.456486 [ 3054]: Set scheduler to SCHED_FIFO 2014/07/08 12:04:26.457501 [ 3054]: Set runstate to INIT (1) 2014/07/08 12:04:27.220775 [ 3054]: Freeze priority 1 2014/07/08 12:04:27.267380 [ 3054]: Freeze priority 2 2014/07/08 12:04:27.298602 [ 3054]: Freeze priority 3 2014/07/08 12:04:27.326972 [ 3054]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 12:04:27.327258 [ 3054]: Set runstate to SETUP (2) 2014/07/08 12:04:27.953805 [ 3054]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 12:04:27.957760 [ 3054]: Keepalive monitoring has been started 2014/07/08 12:04:27.958014 [ 3054]: Monitoring has been started 2014/07/08 12:04:28.003672 [recoverd: 3132]: monitor_cluster starting 2014/07/08 12:04:28.041949 [recoverd: 3132]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 12:04:28.043504 [ 3054]: Freeze priority 1 2014/07/08 12:04:28.044779 [ 3054]: Freeze priority 2 2014/07/08 12:04:28.045983 [ 3054]: Freeze priority 3 2014/07/08 12:04:28.050910 [ 3054]: This node (0) is now the recovery master 2014/07/08 12:04:28.959722 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:29.961769 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:30.964780 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:31.057385 [recoverd: 3132]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 12:04:31.065478 [recoverd: 3132]: The interfaces status has changed on local node 0 - force takeover run 2014/07/08 12:04:31.068229 [recoverd: 3132]: Trigger takeoverrun 2014/07/08 12:04:31.079008 [recoverd: 3132]: Node:0 was in recovery mode. 
Start recovery process 2014/07/08 12:04:31.080220 [recoverd: 3132]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 12:04:31.080779 [recoverd: 3132]: Taking out recovery lock from recovery daemon 2014/07/08 12:04:31.081232 [recoverd: 3132]: Take the recovery lock 2014/07/08 12:04:31.148991 [recoverd: 3132]: Recovery lock taken successfully 2014/07/08 12:04:31.149977 [recoverd: 3132]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile' 2014/07/08 12:04:31.151497 [recoverd: 3132]: Recovery lock taken successfully by recovery daemon 2014/07/08 12:04:31.152500 [recoverd: 3132]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0 2014/07/08 12:04:31.153991 [recoverd: 3132]: server/ctdb_recoverd.c:1651 Recovery - created remote databases 2014/07/08 12:04:31.155126 [recoverd: 3132]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases 2014/07/08 12:04:31.156718 [ 3054]: Freeze priority 1 2014/07/08 12:04:31.157455 [ 3054]: Freeze priority 2 2014/07/08 12:04:31.158227 [ 3054]: Freeze priority 3 2014/07/08 12:04:31.162076 [ 3054]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked 2014/07/08 12:04:31.505047 [recoverd: 3132]: server/ctdb_recoverd.c:1695 Recovery - updated flags 2014/07/08 12:04:31.507208 [recoverd: 3132]: server/ctdb_recoverd.c:1739 started transactions on all nodes 2014/07/08 12:04:31.507641 [recoverd: 3132]: server/ctdb_recoverd.c:1752 Recovery - starting database commits 2014/07/08 12:04:31.508455 [recoverd: 3132]: server/ctdb_recoverd.c:1764 Recovery - committed databases 2014/07/08 12:04:31.509662 [recoverd: 3132]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap 2014/07/08 12:04:31.511421 [recoverd: 3132]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster 2014/07/08 12:04:31.519746 [recoverd: 3132]: server/ctdb_recoverd.c:1840 Recovery - updated flags 2014/07/08 12:04:31.523517 [ 3054]: server/ctdb_recover.c:612 Recovery mode set to NORMAL 2014/07/08 12:04:31.523682 [ 3054]: Thawing priority 1 2014/07/08 12:04:31.523707 [ 3054]: Release freeze handler for prio 1 2014/07/08 12:04:31.524040 [ 3054]: Thawing priority 2 2014/07/08 12:04:31.524076 [ 3054]: Release freeze handler for prio 2 2014/07/08 12:04:31.524231 [ 3054]: Thawing priority 3 2014/07/08 12:04:31.524308 [ 3054]: Release freeze handler for prio 3 2014/07/08 12:04:31.537070 [recoverd: 3132]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode 2014/07/08 12:04:31.545396 [recoverd: 3132]: Failed to find node to cover ip 192.168.1.81 2014/07/08 12:04:31.549055 [recoverd: 3132]: Failed to find node to cover ip 192.168.1.80 2014/07/08 12:04:31.558856 [recoverd: 3132]: Disabling ip check for 9 seconds 2014/07/08 12:04:31.838212 [ 3054]: Recovery has finished 2014/07/08 12:04:31.966612 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:31.967317 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:32.244651 [ 3054]: Set runstate to STARTUP (4) 2014/07/08 12:04:32.247212 [recoverd: 3132]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event 2014/07/08 12:04:32.250004 [recoverd: 3132]: server/ctdb_recoverd.c:1879 Recovery complete 2014/07/08 12:04:32.250004 [recoverd: 3132]: Resetting ban count to 0 for all nodes 2014/07/08 12:04:32.250004 [recoverd: 3132]: Just finished a recovery. 
New recoveries will now be supressed for the rerecovery timeout (10 seconds) 2014/07/08 12:04:32.968841 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:32.969135 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:33.970932 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:33.971521 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:34.972318 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:34.972676 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:35.973089 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:35.973291 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:36.973796 [ 3054]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:04:36.974241 [ 3054]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:04:37.209923 [recoverd: 3132]: Daemon has exited - shutting down client 2014/07/08 12:04:37.217624 [recoverd: 3132]: CTDB recoverd: shutting down node 2 2014/07/08 12:09:53.564790 [ 3255]: CTDB starting on node 2014/07/08 12:09:53.621980 [ 3256]: Starting CTDBD (Version 2.3) as PID: 3256 2014/07/08 12:09:53.624703 [ 3256]: Created PID file /var/run/ctdb/ctdbd.pid 2014/07/08 12:09:53.628969 [ 3256]: Set scheduler to SCHED_FIFO 2014/07/08 12:09:53.629969 [ 3256]: Set runstate to INIT (1) 2014/07/08 12:09:54.385416 [ 3256]: Freeze priority 1 2014/07/08 12:09:54.430125 [ 3256]: Freeze priority 2 2014/07/08 12:09:54.442501 [ 3256]: Freeze priority 3 2014/07/08 12:09:54.462958 [ 3256]: server/ctdb_takeover.c:3239 Released 0 public IPs 2014/07/08 12:09:54.463081 [ 3256]: Set runstate to SETUP (2) 2014/07/08 12:09:54.989850 [ 3256]: Set runstate to FIRST_RECOVERY (3) 2014/07/08 12:09:54.992636 [ 3256]: Keepalive monitoring has been started 2014/07/08 12:09:54.993018 [ 3256]: Monitoring has been started 2014/07/08 12:09:55.016430 [recoverd: 3334]: monitor_cluster starting 2014/07/08 12:09:55.048891 [recoverd: 3334]: server/ctdb_recoverd.c:3483 Initial recovery master set - forcing election 2014/07/08 12:09:55.050346 [ 3256]: Freeze priority 1 2014/07/08 12:09:55.056504 [ 3256]: Freeze priority 2 2014/07/08 12:09:55.057850 [ 3256]: Freeze priority 3 2014/07/08 12:09:55.060245 [ 3256]: This node (1) is now the recovery master 2014/07/08 12:09:55.994680 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:09:56.996779 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:09:57.998366 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:09:58.067749 [recoverd: 3334]: server/ctdb_recoverd.c:1061 Election timed out 2014/07/08 12:09:58.074871 [recoverd: 3334]: The interfaces status has changed on local node 1 - force takeover run 2014/07/08 12:09:58.077478 [recoverd: 3334]: Trigger takeoverrun 2014/07/08 12:09:58.085232 [recoverd: 3334]: Node:1 was in recovery mode. 
Start recovery process 2014/07/08 12:09:58.086269 [recoverd: 3334]: server/ctdb_recoverd.c:1601 Starting do_recovery 2014/07/08 12:09:58.087228 [recoverd: 3334]: Taking out recovery lock from recovery daemon 2014/07/08 12:09:58.088374 [recoverd: 3334]: Take the recovery lock 2014/07/08 12:09:58.089982 [recoverd: 3334]: Recovery lock taken successfully 2014/07/08 12:09:58.091070 [recoverd: 3334]: ctdb_recovery_lock: Got recovery lock on '/cluster/ctdb/lockfile' 2014/07/08 12:09:58.092822 [recoverd: 3334]: Recovery lock taken successfully by recovery daemon 2014/07/08 12:09:58.093407 [recoverd: 3334]: server/ctdb_recoverd.c:1626 Recovery initiated due to problem with node 0 2014/07/08 12:09:58.095758 [recoverd: 3334]: server/ctdb_recoverd.c:1651 Recovery - created remote databases 2014/07/08 12:09:58.096337 [recoverd: 3334]: server/ctdb_recoverd.c:1658 Recovery - updated db priority for all databases 2014/07/08 12:09:58.098322 [ 3256]: Freeze priority 1 2014/07/08 12:09:58.108737 [ 3256]: Freeze priority 2 2014/07/08 12:09:58.109651 [ 3256]: Freeze priority 3 2014/07/08 12:09:58.110864 [ 3256]: server/ctdb_recover.c:989 startrecovery eventscript has been invoked 2014/07/08 12:09:58.396307 [recoverd: 3334]: server/ctdb_recoverd.c:1695 Recovery - updated flags 2014/07/08 12:09:58.402983 [recoverd: 3334]: server/ctdb_recoverd.c:1739 started transactions on all nodes 2014/07/08 12:09:58.404367 [recoverd: 3334]: server/ctdb_recoverd.c:1752 Recovery - starting database commits 2014/07/08 12:09:58.406675 [recoverd: 3334]: server/ctdb_recoverd.c:1764 Recovery - committed databases 2014/07/08 12:09:58.410263 [recoverd: 3334]: server/ctdb_recoverd.c:1814 Recovery - updated vnnmap 2014/07/08 12:09:58.416546 [recoverd: 3334]: server/ctdb_recoverd.c:1823 Recovery - updated recmaster 2014/07/08 12:09:58.426215 [recoverd: 3334]: server/ctdb_recoverd.c:1840 Recovery - updated flags 2014/07/08 12:09:58.428541 [ 3256]: server/ctdb_recover.c:612 Recovery mode set to NORMAL 2014/07/08 12:09:58.428682 [ 3256]: Thawing priority 1 2014/07/08 12:09:58.428708 [ 3256]: Release freeze handler for prio 1 2014/07/08 12:09:58.429194 [ 3256]: Thawing priority 2 2014/07/08 12:09:58.429232 [ 3256]: Release freeze handler for prio 2 2014/07/08 12:09:58.429382 [ 3256]: Thawing priority 3 2014/07/08 12:09:58.429434 [ 3256]: Release freeze handler for prio 3 2014/07/08 12:09:58.447295 [recoverd: 3334]: server/ctdb_recoverd.c:1849 Recovery - disabled recovery mode 2014/07/08 12:09:58.466453 [recoverd: 3334]: Failed to find node to cover ip 192.168.1.81 2014/07/08 12:09:58.467386 [recoverd: 3334]: Failed to find node to cover ip 192.168.1.80 2014/07/08 12:09:58.469791 [recoverd: 3334]: Disabling ip check for 9 seconds 2014/07/08 12:09:58.773772 [ 3256]: Recovery has finished 2014/07/08 12:09:59.001209 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED 2014/07/08 12:09:59.001812 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second. 2014/07/08 12:09:59.093872 [ 3256]: Set runstate to STARTUP (4) 2014/07/08 12:09:59.096577 [recoverd: 3334]: server/ctdb_recoverd.c:1873 Recovery - finished the recovered event 2014/07/08 12:09:59.098518 [recoverd: 3334]: server/ctdb_recoverd.c:1879 Recovery complete 2014/07/08 12:09:59.098603 [recoverd: 3334]: Resetting ban count to 0 for all nodes 2014/07/08 12:09:59.098634 [recoverd: 3334]: Just finished a recovery. 
New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2014/07/08 12:10:00.002452 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:00.002770 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:01.004214 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:01.006243 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:02.007199 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:02.007584 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:03.008773 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:03.009497 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:04.010301 [ 3256]: CTDB_WAIT_UNTIL_RECOVERED
2014/07/08 12:10:04.010722 [ 3256]: server/ctdb_monitor.c:262 wait for pending recoveries to end. Wait one more second.
2014/07/08 12:10:04.395050 [recoverd: 3334]: Daemon has exited - shutting down client
2014/07/08 12:10:04.395421 [recoverd: 3334]: CTDB recoverd: shutting down
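Even with the lock sorted, both runs still log "Failed to find node to cover ip" for 192.168.1.80/81. A couple of quick checks that usually narrow this down; these are standard ctdb/iproute2 commands and nothing below is specific to this setup beyond the addresses already quoted:

    # is eth0 actually up on each node, so the public addresses can be hosted there?
    ip addr show eth0

    # what does ctdb itself think?
    ctdb status    # nodes should be OK, not DISABLED or BANNED
    ctdb ip        # lists each public IP and the node currently hosting it

Also note that the sysconfig above sets CTDB_START_AS_DISABLED="yes"; an administratively disabled node will not take over public addresses until it is enabled (ctdb enable), which by itself can produce exactly this message while the cluster is still coming up.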
On Tue, 2014-07-08 at 12:10 +0200, steve wrote:
On Tue, 2014-07-08 at 11:59 +0200, steve wrote:
On Tue, 2014-07-08 at 11:32 +0200, Richard Brown wrote:
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again
OK, getting somewhere. The command:

systemctl start ctdb

Issuing:

ctdbd

success:

ps aux|grep ctdb
root      3932  2.8  0.6  3132 3112 ?  SLs  12:22  0:03 ctdbd
root      4003  0.8  0.1  3136  812 ?  S    12:22  0:01 ctdbd

(never thought to check)

and:

node 1:
ctdb status
Number of nodes:2
pnn:0 192.168.0.10       OK (THIS NODE)
pnn:1 192.168.0.11       OK
Generation:538367990
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

node 2:
ctdb status
Number of nodes:2
pnn:0 192.168.0.10       OK
pnn:1 192.168.0.11       OK (THIS NODE)
Generation:538367990
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

Qn. What's the official way of starting the daemon?

Thanks
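As for the official way: on a systemd-based 13.1 install the intent is for the packaged service to do the work, so something along the lines of the sketch below (unit name as already used in the systemctl command above; whether that unit cooperates with this sysconfig is exactly what is being poked at here, so treat this as the intended path rather than a guarantee):

    systemctl enable ctdb     # start at boot, on both nodes
    systemctl start ctdb      # instead of launching ctdbd by hand
    systemctl status ctdb     # confirm it actually came up
    ctdb status               # then check cluster membership as above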
On Tue, 2014-07-08 at 12:29 +0200, steve wrote:
On Tue, 2014-07-08 at 12:10 +0200, steve wrote:
On Tue, 2014-07-08 at 11:59 +0200, steve wrote:
On Tue, 2014-07-08 at 11:32 +0200, Richard Brown wrote:
On Tue, 2014-07-08 at 11:24 +0200, steve wrote:
2014/07/08 11:06:39.067080 [ 4035]: 00.ctdb: Consider installing tdbtool or at least tdbdump!
I'd recommend you install tdb-tools and try again
OK Getting somewhere.
The command: systemctl start ctdb
Issuing: ctdbd
success: ps aux|grep ctdb root 3932 2.8 0.6 3132 3112 ? SLs 12:22 0:03 ctdbd root 4003 0.8 0.1 3136 812 ? S 12:22 0:01 ctdbd
(never thought to check)
and: node 1: ctdb status Number of nodes:2 pnn:0 192.168.0.10 OK (THIS NODE) pnn:1 192.168.0.11 OK Generation:538367990 Size:2 hash:0 lmaster:0 hash:1 lmaster:1 Recovery mode:NORMAL (0) Recovery master:1
node 2: ctdb status Number of nodes:2 pnn:0 192.168.0.10 OK pnn:1 192.168.0.11 OK (THIS NODE) Generation:538367990 Size:2 hash:0 lmaster:0 hash:1 lmaster:1 Recovery mode:NORMAL (0) Recovery master:1
Qn. What's the official way of starting the daemon? Thanks
But unfortunately, smbd will not start:

Maximum core file size limits now 16777216(soft) -1(hard)
smbd version 4.1.9-3.22.1-3256-SUSE-oS13.1-i386 started.
Copyright Andrew Tridgell and the Samba Team 1992-2013
uid=0 gid=0 euid=0 egid=0
lp_load_ex: refreshing parameters
Initialising global parameters
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
params.c:pm_process() - Processing configuration file "/etc/samba/smb.conf"
Processing section "[global]"
connect(/var/lib/ctdb/ctdb.socket) failed: Connection refused
messaging_ctdbd_init failed: NT_STATUS_CONNECTION_REFUSED

[global]
        workgroup = HH3
        realm = HH3.SITE
        security = ADS
        kerberos method = system keytab
        netbios name = smbcluster
        clustering = Yes
        private dir = /cluster/ctdb

[users]
        path = /cluster/users
        read only = No

We are not (yet) joined to the domain. Thanks for your patience. systemd set us back a while :(

Steve
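Two observations on the log and smb.conf above. The connection refused on /var/lib/ctdb/ctdb.socket (the same path set as CTDB_SOCKET in the sysconfig) means nothing was listening there, i.e. ctdbd was not running when smbd started; with clustering = Yes smbd needs a running ctdbd, and with CTDB_MANAGES_SAMBA=yes it is ctdb's event scripts that are expected to start and stop smbd. The domain join itself is a one-off net ads join once the cluster is healthy. A rough sketch of the usual order of operations follows; "Administrator" is only a placeholder for an account with join rights, and none of this has been verified on 13.1:

    # 1. get ctdbd up and healthy on both nodes first
    systemctl start ctdb
    ctdb status                      # wait until both nodes show OK

    # 2. join the cluster to the AD domain once, from one node
    net ads join -U Administrator

    # 3. with CTDB_MANAGES_SAMBA=yes / CTDB_MANAGES_WINBIND=yes, let ctdb's
    #    event scripts start smbd and winbind rather than starting them by hand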
participants (2)
- Richard Brown
- steve