[opensuse-ha] ext4 failure on cluster
[post to both lists]
Hi everyone,
Next month we hope to go into production with our drbd-ocfs2-ctdb cluster. Of course, we must prove to our boss that it is a fair, if not better, replacement for the single AD domain file server it will replace. A few questions remain to which we cannot find a simple answer that a non-technical person would understand. Indeed, much of what we've read we cannot understand ourselves, as self-styled HA setter-uppers:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
2. Following what is happening with vbox and mysql, are Oracle likely to re-licence ocfs2 in the same way any time soon?
3. IF (2), do openSUSE or anyone else plan a fork (thinking e.g. mariadb)?
General chit-chat/verbose/historical answers especially welcome.
Thanks,
L & S pp all the Linux team here in Alicante.
On Thu, 2014-08-14 at 10:08 +0200, steve wrote:
[post to both lists] Hi everyone Next month we hope to go into production with our drbd-ocfs2-ctdb cluster. Of course, we must prove to our boss that it is a fair if not better replacement for our single AD domain file-server which it will replace.
A few questions remain to which we cannot find a simple answer that a non technical person would understand. Indeed, much of what we've read we cannot understand ourselves as self styled ha setter-uppers: 1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
I'm unable to comment without a better understanding of how your drbd/ext4 setup is configured. Are you using clvm? Are you mounting the ext4 exclusively on one node?
2. Following what is happening with vbox and mysql, are Oracle likely to re-licence ocfs2 in the same way any time soon?
No idea, I'd hope not.
3. IF (2), do openSUSE or anyone else plan a fork (thinking e.g. mariadb)?
If it happened, I'm sure we'd find a way through it.
On Thu, 2014-08-14 at 11:02 +0200, Richard Brown wrote:
On Thu, 2014-08-14 at 10:08 +0200, steve wrote:
[post to both lists] Hi everyone Next month we hope to go into production with our drbd-ocfs2-ctdb cluster. Of course, we must prove to our boss that it is a fair if not better replacement for our single AD domain file-server which it will replace.
A few questions remain to which we cannot find a simple answer that a non technical person would understand. Indeed, much of what we've read we cannot understand ourselves as self styled ha setter-uppers: 1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
I'm unable to comment without a better understanding of how your drbd/ext4 setup is?
Hi,
Sorry to be unclear. Our existing file server uses ext4. For our 2-node cluster we had to change to ocfs2: http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-o...
I suppose the question is: what does ocfs2 have that ext4 doesn't? But in really simple terms.
Hi Steve, On 08/14/2014 04:52 AM, steve wrote:
On Thu, 2014-08-14 at 11:02 +0200, Richard Brown wrote:
On Thu, 2014-08-14 at 10:08 +0200, steve wrote:
[post to both lists] Hi everyone Next month we hope to go into production with our drbd-ocfs2-ctdb cluster. Of course, we must prove to our boss that it is a fair if not better replacement for our single AD domain file-server which it will replace.
A few questions remain to which we cannot find a simple answer that a non technical person would understand. Indeed, much of what we've read we cannot understand ourselves as self styled ha setter-uppers: 1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
I'm unable to comment without a better understanding of how your drbd/ext4 setup is?
Hi Sorry to be unclear. Our existing file server uses ext4. For our 2 node cluster we had to change to ocfs2: http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-o... I suppose the question is, what does ocfs2 have that ext4 doesn't? But in really simple terms.
ocfs2 is a clustered filesystem, whereas ext4 is a local filesystem. ocfs2 can be mounted simultaneously on both nodes without any data-integrity problems, so both nodes can serve at the same time. ext4, being a local filesystem, must be mounted on a single node at a time, which means only one node can serve at a time.
-- Goldwyn
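A minimal illustration of that difference, with hypothetical device and mount-point names (the shared block device is assumed to be /dev/drbd0; adjust to your own setup):

# ext4 is a local filesystem: only ever mount it on ONE node at a time.
nodeA# mkfs.ext4 /dev/drbd0
nodeA# mount /dev/drbd0 /srv/share           # fine
nodeB# mount /dev/drbd0 /srv/share           # unsafe: ext4 has no cluster-wide locking,
                                             # a concurrent mount corrupts the filesystem

# ocfs2 is a cluster filesystem: with the cluster stack (dlm/o2cb) running,
# both nodes may mount and serve the same device at the same time.
nodeA# mkfs.ocfs2 -N 2 /dev/drbd0            # -N 2 reserves journal slots for two nodes
nodeA# mount -t ocfs2 /dev/drbd0 /srv/share
nodeB# mount -t ocfs2 /dev/drbd0 /srv/share  # fine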
steve <steve@steve-ss.com> wrote on 14.08.2014 at 10:08 in message <1408003706.1428.22.camel@hh16.hh3.site>: [post to both lists] Hi everyone Next month we hope to go into production with our drbd-ocfs2-ctdb cluster. Of course, we must prove to our boss that it is a fair if not better replacement for our single AD domain file-server which it will replace.
A few questions remain to which we cannot find a simple answer that a non technical person would understand. Indeed, much of what we've read we cannot understand ourselves as self styled ha setter-uppers: 1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
"clustered ext4"? Never heard of it!
2. Following what is happening with vbox and mysql, are Oracle likely to re-licence ocfs2 in the same way any time soon? 3. IF (2), do openSUSE or anyone else plan a fork (thinking e.g. mariadb)?
General chit-chat/verbose/historical answers especially welcome.
Thanks, L & S pp all the Linux team here in Alicante.
On 2014-08-14T10:08:26, steve <steve@steve-ss.com> wrote:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
ext4 cannot be concurrently mounted on multiple nodes. The file system just plain-out does not support that; if that is required in your use case, you have no choice but to use OCFS2 (or try GFS2). If you do not need to mount the file system on multiple nodes at the same time, do not use OCFS2(/GFS2).
2. Following what is happening with vbox and mysql, are Oracle likely to re-licence ocfs2 in the same way any time soon?
No. OCFS2 is part of the upstream kernel, and not Oracle's to relicense.
3. IF (2), do openSUSE or anyone else plan a fork (thinking e.g. mariadb)?
We have no intention nor need to fork OCFS2. We have multiple developers contributing to OCFS2 file system development and also the rest of the cluster stack.
If you want to go to production, consider this a quick advertisement for SUSE Linux Enterprise High Availability, just in case you want professional support and tested code ;-)
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
On Thu, 2014-08-14 at 15:33 +0200, Lars Marowsky-Bree wrote:
On 2014-08-14T10:08:26, steve <steve@steve-ss.com> wrote:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
ext4 cannot be concurrently mounted on multiple nodes.
Hi. Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/ We proved it doesn't work well: it usually freezes hard, needing us to hold down the power button.
The file system just plain-out does not support that; if that is required in your use case, you have no choice but to use OCFS2 (or try GFS2).
That's fine. We just wondered if there was a simple one-liner which explained this. That it just doesn't work is OK. But some people need convincing.
If you do not need to mount the file system on multiple nodes at the same time, do not use OCFS2(/GFS2).
2. Following what is happening with vbox and mysql, are Oracle likely to re-licence ocfs2 in the same way any time soon?
No. OCFS2 is part of the upstream kernel, and not Oracle's to relicense.
Thanks. That's good to know and to be able to pass on.
3. IF (2), do openSUSE or anyone else plan a fork (thinking e.g. mariadb)?
We have no intention nor need to fork OCFS2. We have multiple developers contributing to OCFS2 file system development and also the rest of the cluster stack.
If you want to go to production, consider this a quick advertisement for SUSE Linux Enterprise High Availability, just in case you want professional support and tested code ;-)
Nice plug. We asked, but SUSE seem to have no intention of supporting Linux-only AD domains. Rumour has it that samba support is behind the times too ;)
Regards, Lars
-- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
On Thu, 14 Aug 2014 18:02:05 +0200, steve <steve@steve-ss.com> wrote:
On Thu, 2014-08-14 at 15:33 +0200, Lars Marowsky-Bree wrote:
On 2014-08-14T10:08:26, steve <steve@steve-ss.com> wrote:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up.
ext4 cannot be concurrently mounted on multiple nodes.
Hi. Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/
It is a failover cluster where the filesystem is always mounted by a single node at a time. What made you think it is used as a cluster filesystem?
On 2014-08-14T18:02:05, steve <steve@steve-ss.com> wrote:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up. ext4 cannot be concurrently mounted on multiple nodes. Hi. Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/
But that's not what they are doing. They are only mounting it once; with drbd (in its normal configuration) it's only possible to write to the device from one node anyway, so a concurrent mount would not be possible in any case. (They are also using a very old heartbeat-v1-style fail-over, not a more modern pacemaker setup.)
We proved it doesn't work well and it usually freezes hard needing to hold down the power off button.
This would be a bug. This shouldn't happen.
The file system just plain-out does not support that; if that is required in your use case, you have no choice but to use OCFS2 (or try GFS2). That's fine. We just wondered if there was a simple one liner which explained this. That it just doesn't work is OK. But some people need convincing.
The simple one-liner is that ext4 is a local file system (and thus suitable only for local or fail-over style use in a cluster), and OCFS2 is a concurrent cluster file system.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
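The same distinction is visible at the DRBD layer: the fail-over setup in the blog post runs DRBD in its default single-primary mode, while an OCFS2 setup like Steve's has to opt into dual-primary explicitly. A sketch of what that opt-in looks like (DRBD 8.4 syntax; the resource name, disks, hostnames and addresses below are made up, not taken from either setup):

# /etc/drbd.d/r0.res -- illustration only
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;

    net {
        protocol C;                # synchronous replication, required for dual-primary
        allow-two-primaries yes;   # default is no: only one node may be Primary,
                                   # which is the normal fail-over (ext4/XFS) case
    }
    startup {
        become-primary-on both;    # dual-primary/OCFS2 only (when not letting
                                   # Pacemaker do the promotion); omit for fail-over
    }

    on nodeA { address 10.0.0.1:7789; }
    on nodeB { address 10.0.0.2:7789; }
}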
On Fri, 2014-08-15 at 13:20 +0200, Lars Marowsky-Bree wrote:
On 2014-08-14T18:02:05, steve <steve@steve-ss.com> wrote:
1. Where our single ext4 file server is predictable under load (it just gets slower), with both nodes up, why does the cluster fail so badly under ext4 but absolutely screams under ocfs2? The strange thing is that the clustered ext4 actually performs better when only one node is up. ext4 cannot be concurrently mounted on multiple nodes. Hi. Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/
But that's not what they are doing. They are only mounting it once; with drbd (in normal configuration) it's only possible to write to the device once anyway, so no-concurrent mount would be possible anyway.
Ok, I see: they are using fail-over only? Presumably they will have only one node up at a time. Is that what you term a local file system? Or is that something else? It's as clear as mud over here, sorry!
In our case, we have drbd primary:primary with the data mounted on both nodes at the same time. Is it that configuration which crashes ext4? With ocfs2, it's fine. It still works even if we disable one of the nodes. We have only old hardware, so it would be pointless having ext4 nodes with only one active at a time: we may as well go back to our single file server.
Thanks for your time,
L x
On 2014-08-15T13:39:45, steve <steve@steve-ss.com> wrote:
Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/
But that's not what they are doing. They are only mounting it once; with drbd (in normal configuration) it's only possible to write to the device once anyway, so no-concurrent mount would be possible anyway. Ok, I see: they are using fail over only? Presumably they will have only one node up at a time. Is that what you term local file system? Or is that something else? It's as clear as mud over here, sorry!
Right. A traditional "local" file system can only be mounted on one node at a time without crashing.
In our case, we have drbd primary:primary with the data mounted on both nodes at the same time. Is it that configuration which cashes ext4?
Yes. This does not work and causes data corruption and crashes.
With ocfs2, it's fine. It still works though, even if we disable one of the nodes. We have only old hardware so it would be pointless having ext4 nodes with only one active at a time: we may as well go back to our single file server.
I'm not sure this holds. There's a penalty for syncing and locking. Have you benchmarked whether a single node is really slower than two? A write-heavy workload in particular probably won't benefit, and it'll go very much downhill if you have metadata-intensive jobs (e.g., creating/deleting/stat'ing lots of files/directories).
The main reason for HA is, well, availability, not performance, so going back to a single server is likely worse than this anyway.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
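To make the benchmarking suggestion concrete, a rough sketch of the kind of comparison meant here, run from a client against each configuration in turn (paths, sizes and file counts are arbitrary examples, not values from the thread):

# streaming write, flushed to disk at the end
dd if=/dev/zero of=/mnt/share/bigfile bs=1M count=1024 conv=fsync

# metadata-heavy workload: create, list and delete many small files;
# this is where cluster-filesystem locking tends to hurt the most
mkdir -p /mnt/share/meta
time sh -c 'for i in $(seq 1 5000); do echo x > /mnt/share/meta/f$i; done'
time ls -l /mnt/share/meta > /dev/null
time rm -rf /mnt/share/meta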
On 8/15/14, 7:45 AM, Lars Marowsky-Bree wrote:
With ocfs2, it's fine. It still works though, even if we disable one
of the nodes. We have only old hardware so it would be pointless having ext4 nodes with only one active at a time: we may as well go back to our single file server. I'm not sure this holds. There's a penalty needed for syncing and locking. Have you benchmarked if a single node is really slower than two? Especially a write-heavy workload probably won't, and it'll go very much downhill if you have metadata-intensive jobs (e.g., creating/deleting/stating lots of files/directories).
The main reason for HA is, well, availability, not performance, thus going back to a single server is likely worse than this anyway.
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many times that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
If a single node can drive your disk hardware at peak bandwidth, then as Lars says you will get no performance benefit (in fact you get a performance degradation) from an active-active HA configuration ... just the availability benefit.
--
Ron Kerry rkerry@sgi.com
Global Product Support - SGI Federal
On 2014-08-15T07:57:31, Ron Kerry <rkerry@sgi.com> wrote:
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many time that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
... but in this case, the shared storage is provided by drbd, so the *write* performance of the combined setup is limited by the slower of the two disks and the network interconnect, and is also reduced by the locking needed by OCFS2.
Read performance could theoretically benefit, yes.
There's, however, also the higher complexity of an active/active environment, and the tighter coupling of the nodes, which reduces their ability to cope with faults.
Personally, I'd probably avoid this setup unless there was a really substantial reason for it.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
On 8/15/14, 8:03 AM, Lars Marowsky-Bree wrote:
On 2014-08-15T07:57:31, Ron Kerry <rkerry@sgi.com> wrote:
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many time that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
... but in this case, the shared storage is provided by drbd, so the *write* performance of the combined setup is limited to the slowest of the two disks, and the network interconnect, and also reduced by the locking needed by OCFS2.
Read-performance could theoretically benefit, yes.
There's however also the higher complexity of an active/active environment, and the tighter coupling of the nodes which reduces their ability to cope with faults.
Personally, I'd probably avoid this setup, unless there was a really substantial reason for it.
I missed that the underlying storage was DRBD. I agree totally, there is little benefit to this sort of configuration - performance or availability - except for certain particular sorts of workloads and failure scenarios.
--
Ron Kerry rkerry@sgi.com
Global Product Support - SGI Federal
On Fri, 2014-08-15 at 08:05 -0400, Ron Kerry wrote:
On 8/15/14, 8:03 AM, Lars Marowsky-Bree wrote:
On 2014-08-15T07:57:31, Ron Kerry <rkerry@sgi.com> wrote:
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many time that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
... but in this case, the shared storage is provided by drbd, so the *write* performance of the combined setup is limited to the slowest of the two disks, and the network interconnect, and also reduced by the locking needed by OCFS2.
Read-performance could theoretically benefit, yes.
There's however also the higher complexity of an active/active environment, and the tighter coupling of the nodes which reduces their ability to cope with faults.
Personally, I'd probably avoid this setup, unless there was a really substantial reason for it.
I missed that the underlying storage was DRBD. I agree totally, there is little benefit to this sort of configuration - performance or availability except for certain particular sorts of workloads and failure scenarios.
We are amazed that obviously knowledgeable engineers are questioning the availability advantages of a 2-node cluster, which keeps us all working, over a single file server, which doesn't if it fails! But OK, we'll scrap this idea.
We have a mix of 80 Linux and XP computers, some of which are over 10 years old. Our file servers are second-hand computers which a bank was throwing out. We want to keep working. We have €150 and 2 old AMD computers. What do you recommend?
L x
--
Ron Kerry rkerry@sgi.com Global Product Support - SGI Federal
On 08/15/2014 02:29 PM, steve wrote:
Our file servers are second hand computers which a bank were throwing out. We want to keep working. We have €150 and 2 old AMD computers. What do you recommend?
It depends on your knowledge and how you value your own time. High Availability stacks are a way to increase availability in an automatic way. If it's about performance, I fear there is not really much to do; such a stack is not intended to cover issues caused by weak hardware.
It also must be managed appropriately, which usually includes a test system, fail-over tests, and more than one admin who can manage the system. Above all, make it easy - so easy that anyone can switch it over and start it at 3am. It's not unusual for Mr. Murphy to crash something when the admin is on vacation; if this is a single-admin system, that becomes a real issue. Every time you build a stack too heavy for its base to carry, it will fall down at some point.
Depending on the environment there are many solutions to mitigate a potential failure, but the right one depends on the impact of a failure. https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/b... describes a scenario with DRBD based on SUSE Linux Enterprise Server, but that should also apply to openSUSE. It is about NFS, but the transition to samba should not be much of a problem.
With limited resources (time & money) someone could just rsync the content to a second box and implement a boot option to start the samba server there. This is highly manual and has its own pros and cons, but it works well in non-techie environments (= without an on-site admin).
greetings
Kai Dupke
Senior Product Manager Server Product Line
--
Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com
Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
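A minimal sketch of the rsync-based approach Kai describes above (the hostname "standby", paths, schedule and service names are illustrative assumptions only; it presumes passwordless ssh as root to the standby box):

# On the primary server: periodically copy the share to the standby box.
# /etc/cron.d/share-sync  (every 15 minutes; adjust to taste)
*/15 * * * *  root  rsync -aHAX --delete /srv/share/ standby:/srv/share/

# On the standby box: samba is installed and configured identically,
# but the services stay disabled until the primary dies.
systemctl disable smb nmb

# Manual "fail-over" when the primary is down (run on the standby box):
systemctl start smb nmb
# ...then point clients at the standby (DNS change or takeover of the old IP).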
On 2014-08-15T14:29:46, steve <steve@steve-ss.com> wrote:
We are amazed that obviously knowledgeable engineers are questioning the availability advantages of a 2 node cluster that keeps us all working over a single file server which doesn't if it fails! But OK, we'll scrap this idea.
No, nobody did question the availability advantage. Just the performance benefits of going active/active. And in your scenario, using a traditional fail-over setup like drbd + ext4 (or XFS) is a much better choice compared to drbd (pri/pri) + OCFS2.
We have a mix of 80 linux and xp computers, some of which are over 10 years old. Our file servers are second hand computers which a bank were throwing out. We want to keep working. We have €150 and 2 old AMD computers. What do you recommend?
Doing this with this hardware is okay; that can definitely help with availability.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
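A sketch of the traditional fail-over setup Lars recommends above (DRBD single-primary, ext4 mounted and samba running on only one node at a time, managed by Pacemaker), expressed in crm shell syntax. All resource names, the device, mount point, IP address and the samba agent (systemd:smb) are placeholders/assumptions, not taken from the thread:

primitive p_drbd ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
ms ms_drbd p_drbd \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
primitive p_fs ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/srv/share fstype=ext4
primitive p_ip ocf:heartbeat:IPaddr2 params ip=192.168.1.50 cidr_netmask=24
primitive p_smb systemd:smb
group g_fileserver p_fs p_ip p_smb
colocation col_fs_on_master inf: g_fileserver ms_drbd:Master
order ord_drbd_before_fs inf: ms_drbd:promote g_fileserver:start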
On Fri, 2014-08-15 at 08:05 -0400, Ron Kerry wrote:
On 8/15/14, 8:03 AM, Lars Marowsky-Bree wrote:
On 2014-08-15T07:57:31, Ron Kerry <rkerry@sgi.com> wrote:
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many time that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
... but in this case, the shared storage is provided by drbd, so the *write* performance of the combined setup is limited to the slowest of the two disks, and the network interconnect, and also reduced by the locking needed by OCFS2.
Read-performance could theoretically benefit, yes.
There's however also the higher complexity of an active/active environment, and the tighter coupling of the nodes which reduces their ability to cope with faults.
Personally, I'd probably avoid this setup, unless there was a really substantial reason for it.
I missed that the underlying storage was DRBD. I agree totally, there is little benefit to this sort of configuration - performance or availability except for certain particular sorts of workloads and failure scenarios.
--
Hi,
Red Hat don't give an option. They say that a cluster fs must be used: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/htm...
On 08/16/2014 04:19 PM, steve wrote:
Red Hat don't give an option. They say that a cluster fs must be used:
Right - if you use CTDB, then you need a cluster FS. But does CTDB really help you?
With the hardware you described I really doubt that you get more performance using CTDB compared to a single-access samba server. If you have high-speed disks and high-speed network connections, then CTDB provides more performance; if one of these is missing, you will miss out on the performance. In addition, you're using DRBD, which is different from accessing shared - and fast - storage.
If you do not gain performance by using CTDB, then it only increases the complexity of the setup - in the worst case you lose performance. That said, you might get a better experience using traditional samba in a fail-over configuration.
greetings
Kai Dupke
Senior Product Manager Server Product Line
--
Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com
Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
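For context, the reason CTDB requires a cluster filesystem is that its recovery lock (and the data it coordinates between nodes) must live on storage every node can access at the same time. A sketch of the relevant pieces, with made-up paths; the settings shown are standard CTDB/Samba options of that era, not the poster's actual configuration:

# /etc/sysconfig/ctdb (paths are examples)
CTDB_RECOVERY_LOCK=/cluster/ctdb/.recovery.lock   # MUST be on the cluster FS (ocfs2/gfs2)
CTDB_NODES=/etc/ctdb/nodes                        # private addresses of all nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses  # floating client-facing IPs
CTDB_MANAGES_SAMBA=yes                            # ctdb starts/stops smbd itself

# /etc/samba/smb.conf (global section, identical on every node)
[global]
    clustering = yes            # use the clustered TDB databases via ctdbd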
On Mon, 2014-08-18 at 09:10 +0200, Kai Dupke wrote:
On 08/16/2014 04:19 PM, steve wrote:
Red Hat don't give an option. They say that a cluster fs must be used:
Right - if you use CTDB, then you need a cluster FS.
But, does CTDB really helps you?
With the hardware you described I really doubt that you get more performance using CTDB compared to a single access samba server.
If you have high speed disks, high speed network connections, then CTDB provides more performance. If one of this is missing you will miss the performance.
In addition, you're using DRBD, which is different from accessing a shared - and fast - storage.
If you do not gain performance by using CTDB then it only increases the complexity of the setup - worst case is that you loose performance.
That said, you might get a better experience using traditional samba in a fail-over configuration.
Hi Kai,
Thanks for the input. It's not performance we want - we're never going to get that. It's reliability. What do you mean by fail-over? I think we already have that: if one node fails, there is always the other one already up. If you mean having only one node available at a time, what's the advantage of that? Also, how are we going to serve a Windows domain without ctdb?
L x
On 08/18/2014 10:51 AM, steve wrote:
Thanks for the input. It's not performance we want. We're never going to get that. It's reliability. What do you mean by fail over? I think we already have that. If one node fails, there is always the other one already up. If you mean have only one node available at a time, what's the advantage of that? Also, how are we going to serve a windows domain without ctdb?
I assume you can handle a Windows domain with Samba alone and don't need CTDB for that?
Fail-over means having samba running on either A xor B. The advantage is no OCFS2, which means less communication overhead and a less complex setup.
'The other one is already up' does not really help, as CTDB does not provide transparent fail-over for a client connection AFAIK. The time the system needs to identify the issue and switch from node A to B isn't that long; usually it isn't an issue if it doesn't happen multiple times a week.
greetings
Kai Dupke
Senior Product Manager Server Product Line
--
Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com
Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On Mon, 2014-08-18 at 13:16 +0200, Kai Dupke wrote:
On 08/18/2014 10:51 AM, steve wrote:
Thanks for the input. It's not performance we want. We're never going to get that. It's reliability. What do you mean by fail over? I think we already have that. If one node fails, there is always the other one already up. If you mean have only one node available at a time, what's the advantage of that? Also, how are we going to serve a windows domain without ctdb?
I assume you can handle a windows domain with Samba and don't need CTDB?
fail-over means to have samba either running on A xor B.
The advantage is noOCFS2, which means less communication overhead and less complex setup.
'Already the other is up' does not help really as CTDB does not provide a transparent fail-over for a client connection AFIK. The time the system needs to identify the issue and switch from node A to B isn't that long. Usually it isn't an issue if not happens multiple times a week.
I'm not sure what you mean by transparent fail-over. Do you mean that the IP is not taken over without user intervention? Or that smbd on the other node does not take over the share? Both, or something else? What do your SLES tests show? Thanks, L x
greetings Kai Dupke Senior Product Manager Server Product Line -- Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On Mon, 18 Aug 2014 18:56:44 +0200, steve <steve@steve-ss.com> wrote:
On Mon, 2014-08-18 at 13:16 +0200, Kai Dupke wrote:
On 08/18/2014 10:51 AM, steve wrote:
Thanks for the input. It's not performance we want. We're never going to get that. It's reliability. What do you mean by fail over? I think we already have that. If one node fails, there is always the other one already up. If you mean have only one node available at a time, what's the advantage of that? Also, how are we going to serve a windows domain without ctdb?
I assume you can handle a windows domain with Samba and don't need CTDB?
fail-over means to have samba either running on A xor B.
The advantage is noOCFS2, which means less communication overhead and less complex setup.
'Already the other is up' does not help really as CTDB does not provide a transparent fail-over for a client connection AFIK. The time the system needs to identify the issue and switch from node A to B isn't that long. Usually it isn't an issue if not happens multiple times a week.
I'm not sure what you mean by transparent fail-over.
After failover the TCP connection is broken and the client has to reconnect. Depending on the client application it may have more or less impact (Windows Explorer will probably reconnect; a database will likely crash). It has nothing to do with the server, but simply with the way SMB works, and it was the same with a Windows Server cluster. There is limited support for durable/persistent handles in SMB 3.0; I do not know whether Samba implements it.
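For what it's worth, Samba does expose a "durable handles" option for SMB2 clients, but it is a reconnect aid on a single server, not the SMB3 persistent/continuously-available handles Andrei mentions, and the thread does not establish that it helps across a cluster fail-over. A sketch of the documented settings, purely as a pointer (share name and path are made up):

# smb.conf excerpt -- treat as a hint to research, not a tested fail-over fix
[share]
    path = /srv/share
    durable handles = yes      # allow SMB2 clients to re-claim a handle after a short network drop
    kernel oplocks = no        # the smb.conf man page requires these three to be
    kernel share modes = no    # disabled for durable handles to be granted, which
    posix locking = no         # means locks are only visible to SMB clients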
Do you mean that the IP is not taken over without user intervention? Or that smbd on the other node does not take over the share? Both, or something else?
What do your SLES tests show? Thanks, L x
greetings Kai Dupke Senior Product Manager Server Product Line -- Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On Mon, 2014-08-18 at 21:04 +0400, Andrei Borzenkov wrote:
В Mon, 18 Aug 2014 18:56:44 +0200 steve <steve@steve-ss.com> пишет:
On Mon, 2014-08-18 at 13:16 +0200, Kai Dupke wrote:
On 08/18/2014 10:51 AM, steve wrote:
Thanks for the input. It's not performance we want. We're never going to get that. It's reliability. What do you mean by fail over? I think we already have that. If one node fails, there is always the other one already up. If you mean have only one node available at a time, what's the advantage of that? Also, how are we going to serve a windows domain without ctdb?
I assume you can handle a windows domain with Samba and don't need CTDB?
fail-over means to have samba either running on A xor B.
The advantage is noOCFS2, which means less communication overhead and less complex setup.
'Already the other is up' does not help really as CTDB does not provide a transparent fail-over for a client connection AFIK. The time the system needs to identify the issue and switch from node A to B isn't that long. Usually it isn't an issue if not happens multiple times a week.
I'm not sure what you mean by transparent fail-over.
After failover TCP connection is broken and client has to reconnect. Depending on client application it may have more or less impact (Windows Explorer will probably reconnect; database will likely crash).
Ah, I see. So it depends upon the client reconnecting. I thought it was the cluster's responsibility to make the takeover invisible. We can still work on LibreOffice documents, for example, and disable and enable nodes at will. I imagine that to be the exception rather than the norm though. Or, more likely, we've just been lucky.
It has nothing to do with server, but simply with the way SMB works and was the same also using Windows Server cluster. There is limited support for durable/persistent shares in SMB 3.0, do not know whether SAMBA implements it.
OK. So there's really no such thing as foolproof HA. Thanks, S pp li
Do you mean that the IP is not taken over without user intervention? Or that smbd on the other node does not take over the share? Both, or something else?
What do your SLES tests show? Thanks, L x
greetings Kai Dupke Senior Product Manager Server Product Line -- Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On 08/18/2014 08:27 PM, steve wrote:
Ah, I see. So it depends upon the client reconnecting. I thought it was the cluster's responsibility to make the takeover invisible. We can still work on LibreOffice documents for example and disable and enable nodes at will. I Imagine that to be the exception rather than the norm though. Or, more likely, we've just been lucky.
Right - this is a timing issue. When you click save in LibreOffice, a connection is opened and the file is saved. There is no issue if the smb share is unavailable before or after, but if a fail-over happens during saving (think about autosave and bigger file operations) then you are out of luck.
This is independent of CTDB or Samba, which was the starting point of this branch of the discussion. That said, you do not gain real benefits from using CTDB, just more complexity.
greetings
Kai Dupke
Senior Product Manager Server Product Line
--
Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com
Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On Mon, 2014-08-18 at 23:35 +0200, Kai Dupke wrote:
On 08/18/2014 08:27 PM, steve wrote:
Ah, I see. So it depends upon the client reconnecting. I thought it was the cluster's responsibility to make the takeover invisible. We can still work on LibreOffice documents for example and disable and enable nodes at will. I Imagine that to be the exception rather than the norm though. Or, more likely, we've just been lucky.
Right - this is a timing issue. When you click save in libreoffice then a connection is opened and the file is saved. No issue if the smb share isn't available before or after but if a fail-over happens during saving (think about autosave and bigger file operations) then you are in bad luck.
This is independent of CTDB or Samba, which was the starting point of the last discussion path. That said, you do not gain real benefits from using CTDB but more complexity.
Hi,
OK, so what is the recommended method for IP and smbd failover in an AD domain?
Cheers,
On 08/19/2014 10:19 AM, steve wrote:
OK, so what is the recommended method for IP and smbd failover in an AD domain?
AFAIK the AD controller is a function of samba 4. High availability can be added with the Linux HA stack.
greetings
Kai Dupke
Senior Product Manager Server Product Line
--
Phone: +49-(0)5102-9310828 Mail: kdupke@suse.com
Mobile: +49-(0)173-5876766 WWW: www.suse.com
SUSE Linux Products GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nurnberg)
On Tue, 2014-08-19 at 10:38 +0200, Kai Dupke wrote:
On 08/19/2014 10:19 AM, steve wrote:
OK, so what is the recommended method for IP and smbd failover in an AD domain?
AFAIK AD controller is a function of samba 4. High Availability can be added with the Linux HA stack.
AD can be provided by samba4 but is more often provided by a Windows server. We already have HA for DCs. What we are asking about is a file server cluster. What is the Linux HA stack? Doesn't it include ctdb? The SLES documentation says it does...
Thanks,
L x
On Fri, 2014-08-15 at 07:57 -0400, Ron Kerry wrote:
On 8/15/14, 7:45 AM, Lars Marowsky-Bree wrote:
With ocfs2, it's fine. It still works though, even if we disable one
of the nodes. We have only old hardware so it would be pointless having ext4 nodes with only one active at a time: we may as well go back to our single file server. I'm not sure this holds. There's a penalty needed for syncing and locking. Have you benchmarked if a single node is really slower than two? Especially a write-heavy workload probably won't, and it'll go very much downhill if you have metadata-intensive jobs (e.g., creating/deleting/stating lots of files/directories).
The main reason for HA is, well, availability, not performance, thus going back to a single server is likely worse than this anyway.
True enough, but the performance of an active-active HA configuration with two or more nodes serving NFS or CIFS can be well in excess of what a single server is capable of doing. This all depends on the underlying disk hardware. Many time that hardware may be capable of far more bandwidth than a single server can drive by itself. In this active-active clustered environment the performance achievable by any single node will be less than what it can do on its own, but the combined performance of many nodes to the same shared-clustered disk will be able to reach the bandwidth capability of the underlying disk hardware.
If a single node can drive your disk hardware at peak bandwidth, than as Lars says you will get no performance benefit (in fact you get a performance degradation) from an active-active HA configuration ... just the availability benefit.
Hi. It's the availability which is more important here. We're never going to get performance anyway as we can't afford new hardware.
--
Ron Kerry rkerry@sgi.com Global Product Support - SGI Federal
On Fri, 2014-08-15 at 13:45 +0200, Lars Marowsky-Bree wrote:
On 2014-08-15T13:39:45, steve <steve@steve-ss.com> wrote:
Other setups use it: http://sigterm.sh/2014/02/highly-available-nfs-cluster-on-debian-wheezy/
But that's not what they are doing. They are only mounting it once; with drbd (in normal configuration) it's only possible to write to the device once anyway, so no-concurrent mount would be possible anyway. Ok, I see: they are using fail over only? Presumably they will have only one node up at a time. Is that what you term local file system? Or is that something else? It's as clear as mud over here, sorry!
Right. A traditional "local" file system can only be mounted on one node at once without crashing.
In our case, we have drbd primary:primary with the data mounted on both nodes at the same time. Is it that configuration which cashes ext4?
Yes. This does not work and causes data corruption and crashes.
With ocfs2, it's fine. It still works though, even if we disable one of the nodes. We have only old hardware so it would be pointless having ext4 nodes with only one active at a time: we may as well go back to our single file server.
I'm not sure this holds. There's a penalty needed for syncing and locking. Have you benchmarked if a single node is really slower than two? Especially a write-heavy workload probably won't, and it'll go very much downhill if you have metadata-intensive jobs (e.g., creating/deleting/stating lots of files/directories).
The main reason for HA is, well, availability, not performance, thus going back to a single server is likely worse than this anyway.
That's our main reason too. Mainly for peace of mind: we can fix the file server that's failed whilst everyone can still work. If we go back to our old single file server and it goes down, nobody can do anything. Maybe that's not what ocfs2 is about, but if it keeps us working then any technical explanation is worthless to us.
Regards, Lars
-- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
participants (8)
- Andrei Borzenkov
- Goldwyn Rodrigues
- Kai Dupke
- Lars Marowsky-Bree
- Richard Brown
- Ron Kerry
- steve
- Ulrich Windl