[opensuse] Leap 15.2: Mounting XFS filesystem fails on large RAID partitions
Hi Folks,

I ran into a problem installing 15.2 on a system with a couple of large RAID partitions. I just filed bugid 1174056, but I thought I'd mention it here just in case.

Bug# 1174056: Installation from full Leap 15.2 ISO fails when attempting to mount XFS filesystems on large RAID partitions. Error dialog follows:

  mount -t XFS /dev/sdc1 /mnt
  mount: /export/data: mount(2) system call failed: structure needs cleaning.

After booting without mounting the filesystems, xfs_repair returns:

  xfs_repair -n /dev/sdc1:
  Phase 1 - Find and Verify Superblock...
  Bad Primary Superblock - bad stripe width in Superblock!
  Attempting to find secondary superblock...
  ........ (I didn't wait around, too many dots)

Formatting the same partitions with Ext4 works as expected.

RAID Controller: Avago 3108 MegaRAID
  /dev/sdc1: 267-TB
  /dev/sdd1: 127-TB

I haven't seen this on other systems with similar RAID partitions with Leap 15.1 and lower. Any ideas? mkfs.xfs seems to work, at least it doesn't complain about anything.

Regards, Lew
On Sun, 12 Jul 2020 18:23:05 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
Hi Folks,
I ran into a problem installing 15.2 on a system with a couple of large RAID partitions. I just filed bugid 1174056, but I thought I'd mention it here just in case.
Bug# 1174056:
I'm confused.
Installation from full Leap 15.2 ISO fails when attempting to mount XFS filesystems on large RAID partitions.
Error dialog follows:
mount -t XFS /dev/sdc1 /mnt
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it? Why is the type XFS rather than the expected xfs?
mount: /export/data: mount(2) system call failed: structure needs cleaning.
After booting without mounting the filesystems, xfs_repair returns:
xfs_repair -n /dev/sdc1:
Phase 1 - Find and Verify Superblock... Bad Primary Superblock - bad stripe width in Superblock! Attempting to find secondary superblock... ........ (I didn't wait around, too many dots)
Formatting the same partitions with Ext4 works as expected.
RAID Controller: Avago 3108 MegaRAID /dev/sdc1: 267-TB /dev/sdd1: 127-TB
What RAID type are you using? You don't say. I had guessed RAID1 with two partitions, but now you show they are different sizes?
I haven't seen this on other systems with similar RAID partitions with Leap 15.1 and lower.
Any ideas? mkfs.xfs seems to work, at least it doesn't complain about anything.
Regards, Lew
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it?
I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible. -- Per Jessen, Zürich (21.9°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On 07/13/2020 03:58 AM, Per Jessen wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it? I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.
Yes, the RAID controller assembles, in this case, 36 14-TB SAS disks into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.

The system is remote (I'm teleworking), but I'll go in today and try a couple of things, like booting a 15.1 rescue ISO to see if it can mount the partitions. If it can't, I'll try the 15.1 mkfs.xfs and see what happens. Obviously 15.2 mkfs.xfs works elsewhere; I installed it a couple of days ago on a high-end gamer laptop and used XFS on its /home partition.

BTW, the 15.2 install on that laptop was the easiest SuSE install, on a laptop, that I've ever done. The special function keys for volume and screen brightness worked, and NetworkManager seamlessly works with wired and WiFi too! I've seen only one small issue, where settings in a konsole window, like font size and background color, aren't persistent. Setting changes don't survive logout/login. Maybe I'm doing something wrong?

Regards, Lew
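For reference, a minimal sketch of the sequence described above, assuming parted is used for the labeling/partitioning step (the thread doesn't say which tool was actually used; device names are as reported here):

  # Label the RAID volume with GPT and create a single partition spanning it
  parted -s /dev/sdc mklabel gpt
  parted -s /dev/sdc mkpart primary 0% 100%

  # Create the XFS filesystem with defaults (mkfs.xfs derives its stripe
  # geometry from the I/O sizes the RAID volume reports), then try to mount it
  mkfs.xfs /dev/sdc1
  mount -t xfs /dev/sdc1 /mnt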
Lew Wolfgang wrote:
On 07/13/2020 03:58 AM, Per Jessen wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it? I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.
Yes, the RAID controller assembles, in this case 36 14-TB SAS disks, into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.
You do have some pretty sizeable volumes, but while we may not be testing that at openSUSE, somebody will have, I'm sure.
I've seen only one small issue, where settings in a konsole window, like font-size/background-color aren't persistent. Setting changes don't survive logout/login. Maybe I'm doing something wrong?
It has worked for me except for the 1st tab in the console window which remains at the default. I set a bigger font and scheme "Linux Colours". -- Per Jessen, Zürich (26.6°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Mon, 13 Jul 2020 08:15:03 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 07/13/2020 03:58 AM, Per Jessen wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it? I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.
Yes, the RAID controller assembles, in this case 36 14-TB SAS disks, into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.
The system is remote (I'm teleworking), but I'll go in to day and try a couple of things, like booting a 15.1 rescue ISO to see if can mount the partitions. If it can't, I'll try the 15.1 mkfs.xfs and see what happens.
Obviously 15.2 mkfs.xfs works elsewhere, I installed it a couple of days ago on a high-end gamer laptop and used XFS on its /home partition. BTW, the 15.2 install on that laptop was the easiest SuSE install, on a laptop, that I've ever done. The special function keys for volume and screen brightness worked, NetworkManager seamlessly works with wired and WiFi too! I've seen only one small issue, where settings in a konsole window, like font-size/background-color aren't persistent. Setting changes don't survive logout/login. Maybe I'm doing something wrong?
Thanks for the explanation. I'm still a bit confused though. Your bug report is about installation but what you're discussing appears to be a problem creating an xfs filesystem? But you haven't shown any details of that creation. Neither any output nor any arguments supplied to it.
On 07/13/2020 12:32 PM, Dave Howorth wrote:
On Mon, 13 Jul 2020 08:15:03 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it?

On 07/13/2020 03:58 AM, Per Jessen wrote:
I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.

Yes, the RAID controller assembles, in this case 36 14-TB SAS disks, into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.
The system is remote (I'm teleworking), but I'll go in to day and try a couple of things, like booting a 15.1 rescue ISO to see if can mount the partitions. If it can't, I'll try the 15.1 mkfs.xfs and see what happens.
Thanks for the explanation.
I'm still a bit confused though. Your bug report is about installation but what you're discussing appears to be a problem creating an xfs filesystem? But you haven't shown any details of that creation. Neither any output nor any arguments supplied to it.
I kept it short and sweet for the bug report; a failed installation is something you can hang your hat on. I instructed the installation process to create the two large filesystems. The partitioner complained that the "structure needs cleaning". I hit the "ignore" button, but the first boot failed with the mount problem.

After the install failed to boot, I commented out the two fstab entries and booted without mounting the RAID partitions. I then tried to build new filesystems using YaST's partitioner, gparted, and mkfs.xfs. In all cases the failure appeared when trying to mount the just-created filesystems.

Mount returns:

  mount -t XFS /dev/sdc1 /mnt
  mount: /export/data: mount(2) system call failed: structure needs cleaning.

Then, xfs_repair returns:

  Phase 1 - Find and Verify Superblock...
  Bad Primary Superblock - bad stripe width in Superblock!
  Attempting to find secondary superblock...
  ........ (I didn't wait around, too many dots)

This morning, Arvin Schnell (Bugzilla) noticed this in /var/log/messages:

  [ 1361.758237] XFS (sdc1): SB stripe unit sanity check failed
  [ 1361.758315] XFS (sdc1): Metadata corruption detected at xfs_sb_read_verify+0xfe/0x170 [xfs], xfs_sb block 0xffffffffffffffff
  [ 1361.758315] XFS (sdc1): Unmount and run xfs_repair
  [ 1361.758316] XFS (sdc1): First 128 bytes of corrupted metadata buffer:

Same entries for sdd1. Note the 0xffffffffffffffff, an overflow somewhere?

Again, ext4 built without issue.

I'm leaving right now to try some additional things. Further news when I return.

Regards, Lew
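One way to see where the bad geometry comes from (a sketch, not commands from the thread) is to compare what the superblock recorded against what the controller advertises to the block layer:

  # Read-only dump of the stripe fields mkfs.xfs wrote into the superblock
  # ("unit"/"width" are the stripe unit and width in filesystem blocks)
  xfs_db -r -c 'sb 0' -c 'p blocksize unit width' /dev/sdc1

  # What the RAID volume reports to the kernel; mkfs.xfs derives its
  # default sunit/swidth from these values
  cat /sys/block/sdc/queue/minimum_io_size /sys/block/sdc/queue/optimal_io_size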
On 07/13/2020 12:52 PM, Lew Wolfgang wrote:
On 07/13/2020 12:32 PM, Dave Howorth wrote:
On Mon, 13 Jul 2020 08:15:03 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it?

On 07/13/2020 03:58 AM, Per Jessen wrote:
I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.

Yes, the RAID controller assembles, in this case 36 14-TB SAS disks, into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.
The system is remote (I'm teleworking), but I'll go in to day and try a couple of things, like booting a 15.1 rescue ISO to see if can mount the partitions. If it can't, I'll try the 15.1 mkfs.xfs and see what happens.
Thanks for the explanation.
I'm still a bit confused though. Your bug report is about installation but what you're discussing appears to be a problem creating an xfs filesystem? But you haven't shown any details of that creation. Neither any output nor any arguments supplied to it.
I kept it short and sweet for the bug report, a failed installation is something you can hang your hat on. I instructed the installation process to create the two large filesystems. The partitioner complained that the "structure needs cleaning". I hit the "ignore" button, but the first boot failed with the mount problem.
After the install failed to boot, I commented out the two fstab entries and booted without mounting the RAID partitions. I then tried to build new filesystems using YaST's partitioner, gparted, and mkfs.xfs. In all cases the failure appeared when trying to mount the just-created filesystems.
Mount returns:
mount -t XFS /dev/sdc1 /mnt mount: /export/data: mount(2) system call failed: structure needs cleaning.
Then, xfs_repair returns:
Phase 1 - Find and Verify Superblock... Bad Primary Superblock - bad stripe width in Superblock! Attempting to find secondary superblock... ........ (I didn't wait around, too many dots)
This morning, Arvin Schnell (Bugzilla) noticed this in /var/log/messages:
[ 1361.758237] XFS (sdc1): SB stripe unit sanity check failed [ 1361.758315] XFS (sdc1): Metadata corruption detected at xfs_sb_read_verify+0xfe/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 1361.758315] XFS (sdc1): Unmount and run xfs_repair [ 1361.758316] XFS (sdc1): First 128 bytes of corrupted metadata buffer:
Same entries for sdd1.
Note the 0xffffffffffffffff, an overflow somewhere?
Again, ext4 built without issue.
I'm leaving right now to try some additional things. Further news when I return.
I'm back. The xfs_repair that I started yesterday finished, saying:

  "Sorry, could not find valid secondary superblock"

Today I booted the 15.1 rescue system and determined that mount works! It reported:

  4096 byte physical blocks
  574218043392 blocks for sda1 (drive lettering changed)
  273437163520 blocks for sdb1

Back running 15.2, fdisk reports the same block counts as 15.1.

Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis. This implies to me that the problem is probably in the 15.2 XIS kernel module?

Regards, Lew
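Once the filesystem mounts under 15.1, the geometry that 15.2's mkfs actually recorded can be read back directly (a sketch; the mount point is illustrative and the device lettering differs under the rescue system):

  # sunit/swidth in the xfs_info output are the stripe values the
  # superblock carries, in filesystem blocks
  mount -t xfs /dev/sdc1 /mnt
  xfs_info /mnt
  umount /mnt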
On 07/13/2020 04:21 PM, Lew Wolfgang wrote:
On 07/13/2020 12:52 PM, Lew Wolfgang wrote:
On 07/13/2020 12:32 PM, Dave Howorth wrote:
On Mon, 13 Jul 2020 08:15:03 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
Dave Howorth wrote:
Assuming /dev/sdc1 is part of the RAID, why are you trying to mount just it?

On 07/13/2020 03:58 AM, Per Jessen wrote:
I read /dev/sdc and /dev/sdd to be volumes "backed" by hardware RAID, individual drives not visible.

Yes, the RAID controller assembles, in this case 36 14-TB SAS disks, into two volumes: /dev/sdc and /dev/sdd. The volumes are each GPT labeled and partitions created as /dev/sdc1 and /dev/sdd1. mkfs.xfs is then used to create the two filesystems. mkfs.ext4 worked okay, which leads me to think that mkfs.xfs or something in the XFS libraries is broken.
The system is remote (I'm teleworking), but I'll go in to day and try a couple of things, like booting a 15.1 rescue ISO to see if can mount the partitions. If it can't, I'll try the 15.1 mkfs.xfs and see what happens.
Thanks for the explanation.
I'm still a bit confused though. Your bug report is about installation but what you're discussing appears to be a problem creating an xfs filesystem? But you haven't shown any details of that creation. Neither any output nor any arguments supplied to it.
I kept it short and sweet for the bug report, a failed installation is something you can hang your hat on. I instructed the installation process to create the two large filesystems. The partitioner complained that the "structure needs cleaning". I hit the "ignore" button, but the first boot failed with the mount problem.
After the install failed to boot, I commented out the two fstab entries and booted without mounting the RAID partitions. I then tried to build new filesystems using YaST's partitioner, gparted, and mkfs.xfs. In all cases the failure appeared when trying to mount the just-created filesystems.
Mount returns:
mount -t XFS /dev/sdc1 /mnt mount: /export/data: mount(2) system call failed: structure needs cleaning.
Then, xfs_repair returns:
Phase 1 - Find and Verify Superblock... Bad Primary Superblock - bad stripe width in Superblock! Attempting to find secondary superblock... ........ (I didn't wait around, too many dots)
This morning, Arvin Schnell (Bugzilla) noticed this in /var/log/messages:
[ 1361.758237] XFS (sdc1): SB stripe unit sanity check failed [ 1361.758315] XFS (sdc1): Metadata corruption detected at xfs_sb_read_verify+0xfe/0x170 [xfs], xfs_sb block 0xffffffffffffffff [ 1361.758315] XFS (sdc1): Unmount and run xfs_repair [ 1361.758316] XFS (sdc1): First 128 bytes of corrupted metadata buffer:
Same entries for sdd1.
Note the 0xffffffffffffffff, an overflow somewhere?
Again, ext4 built without issue.
I'm leaving right now to try some additional things. Further news when I return.
I'm back.
The xfs_repair that I started yesterday finished, saying:
"Sorry, could not find valid secondary superblock"
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module?
s/xis/xfs, of course.

Regards, Lew
Lew Wolfgang wrote:
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module?
I would be tempted to think so too, yes. OTOH, I went and googled it -
https://bugzilla.kernel.org/show_bug.cgi?id=202127 (also large XFS filesystem on hardware RAID).

I only skimmed it quickly, but they seem to be thinking it is a config issue at time of filesystem creation.

--
Per Jessen, Zürich (18.9°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
On 07/13/2020 11:32 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module? I would be tempted to think so too, yes. OTOH, I went and googled it -
https://bugzilla.kernel.org/show_bug.cgi?id=202127 (also large XFS filesystem on hardware RAID).
I only skimmed it quickly, but they seem to be thinking it is a config issue at time of filesystem creation.
Wow, thanks Per! That sure looks like my problem. It may be a Broadcom controller firmware bug that was revealed with more recent Linux kernels. I'll look into updating the firmware, tomorrow. I just finished 16-oz of sangria with frozen blueberries, and it's 11:52-PM. One should not do root-ish things when drinking wine!

Regards, Lew
On Mon, 13 Jul 2020 23:57:45 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 07/13/2020 11:32 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module? I would be tempted to think so too, yes. OTOH, I went and googled it -
https://bugzilla.kernel.org/show_bug.cgi?id=202127 (also large XFS filesystem on hardware RAID).
I only skimmed it quickly, but they seem to be thinking it is a config issue at time of filesystem creation.
Wow, thanks Per! That sure looks like my problem. It may be a Broadcom controller firmware bug that was revealed with more recent Linux kernels. I'll look into updating the firmware, tomorrow. I just finished 16-oz of sangria with frozen blueberries, and it's 11:52-PM. One should not do root-ish things when drinking wine!
Regards, Lew
It appears you may need to add your data and customer opinion weight to daimh's bug report to Broadcom.

It sounds like you can fix the new filesystems by overriding the firmware's buggy values as described in #23 and #24, and if you have any existing filesystems, you'll be able to mount them as described in #30.

That kind of response from Eric and Dave is why I like XFS! :)
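For the record, the shape of those two fixes looks roughly like this (the numbers are placeholders, not values for Lew's arrays; su/sw must match the controller's real geometry, and per xfs(5) the sunit/swidth mount options are given in 512-byte units):

  # New filesystems: override the firmware-reported geometry at mkfs time
  mkfs.xfs -f -d su=256k,sw=16 /dev/sdc1

  # Existing filesystems: mount with explicit stripe values
  # (256 KiB = 512 x 512-byte units; 16 data disks -> swidth = 16 * 512)
  mount -o sunit=512,swidth=8192 /dev/sdc1 /mnt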
On 07/14/2020 02:10 AM, Dave Howorth wrote:
On Mon, 13 Jul 2020 23:57:45 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 07/13/2020 11:32 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module? I would be tempted to think so too, yes. OTOH, I went and googled it -
https://bugzilla.kernel.org/show_bug.cgi?id=202127 (also large XFS filesystem on hardware RAID).
I only skimmed it quickly, but they seem to be thinking it is a config issue at time of filesystem creation. Wow, thanks Per! That sure looks like my problem. It may be a Broadcom controller firmware bug that was revealed with more recent Linux kernels. I'll look into updating the firmware, tomorrow. I just finished 16-oz of sangria with frozen blueberries, and it's 11:52-PM. One should not do root-ish things when drinking wine!
Regards, Lew It appears you may need to add your data and customer opinion weight to daimh's bug report to Broadcom.
It sounds like you can fix the new filesystems by overriding the firmware's buggy values as described in #23, #24 and if you have any existing filesystems, you'll be able to mount them as described in #30
That kind of response from Eric and Dave is why I like XFS! :)
I think I'll try updating the firmware first, if applicable. We have dozens of older boxes with petabytes of stuff that may be threatened by Leap 15.2. At least this current box is new and doesn't have any production data on it yet.

Regards, Lew
On 07/14/2020 08:26 AM, Lew Wolfgang wrote:
On 07/14/2020 02:10 AM, Dave Howorth wrote:
On Mon, 13 Jul 2020 23:57:45 -0700 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 07/13/2020 11:32 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Today I booted the 15.1 rescue system and determined that mount works! It reported:
4096 byte physical blocks 574218043392 blocks for sda1 (drive lettering changed) 273437163520 blocks for sdb1
Back running 15.2, fdisk reports the same block counts as 15.1.
Note that 15.1 was able to mount the XFS partitions created by 15.2's mks.xis.
This implies to me that the problem is probably in the 15.2 XIS kernel module? I would be tempted to think so too, yes. OTOH, I went and googled it -
https://bugzilla.kernel.org/show_bug.cgi?id=202127 (also large XFS filesystem on hardware RAID).
I only skimmed it quickly, but they seem to be thinking it is a config issue at time of filesystem creation. Wow, thanks Per! That sure looks like my problem. It may be a Broadcom controller firmware bug that was revealed with more recent Linux kernels. I'll look into updating the firmware, tomorrow. I just finished 16-oz of sangria with frozen blueberries, and it's 11:52-PM. One should not do root-ish things when drinking wine!
Regards, Lew It appears you may need to add your data and customer opinion weight to daimh's bug report to Broadcom.
It sounds like you can fix the new filesystems by overriding the firmware's buggy values as described in #23, #24 and if you have any existing filesystems, you'll be able to mount them as described in #30
That kind of response from Eric and Dave is why I like XFS! :)
I think I'll try updating the firmware first, if applicable. We have dozens of older boxes with petabytes of stuff that may be threatened by Leap 15.2. At least this current box is new and doesn't have any production data on it yet.
I still haven't gotten a confirmation about where/how to update the firmware in this case. It's a Broadcom controller modified by SuperMicro.

But in any case, in chasing down issues with methods suggested by Anthony Iliopoulos (Kernel Team), I discovered that when the RAID volumes in question were created with the Broadcom GUI, a parameter for stripe size was increased beyond the suggested default. This change created the I/O size clash that was flagged by the Leap 15.2 XFS kernel module. I deleted the two volumes, re-created them without changing the defaults, and all is well!

Left as a problem for the near future is that we'll probably see this again when upgrading existing 15.1 servers, but at least we'll be prepared for the unpleasantness and be able to take appropriate action.

Regards, Lew
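A quick way to sanity-check a rebuilt volume before trusting it again (an assumption on my part, not something described in the thread) is mkfs.xfs's dry-run mode, which prints the geometry it would use without writing anything, so the sunit/swidth it now derives from the controller defaults can be inspected first:

  # -N prints the filesystem parameters without creating the filesystem
  mkfs.xfs -N /dev/sdc1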
On 07/15/2020 10:00 PM, Lew Wolfgang wrote:
I still haven't gotten a confirmation about where/how to update the firmware in this case. It's Broadcom controller modified by SuperMicro.
But in any case, in chasing down issues with methods suggested by Anthony Iliopoulos (Kernel Team) I discovered that the RAID volumes in question, when they were created with the Broadcom GUI, a parameter for stripe size was increased beyond the suggested default. This change created the I/O size clash that was flagged by the Leap 15.2 XFS kernel module. I deleted the two volumes, and re-created them without changing the defaults, and all is well!
Left as a problem for the near future is that we'll probably see this again when upgrading existing 15.1 servers, but at least we'll be prepared for the unpleasantness and be able to take appropriate action.
Interesting. I never get to play with large numbers of spares or large arrays above the normal little-office world of 1-3T, but I did try to keep the chunk, stride and stripe of the arrays optimized for the arrays I was messing with:

Chunks: the hidden key to RAID performance
http://www.zdnet.com/article/chunks-the-hidden-key-to-raid-performance/

and

Calculating the stride and stripe width
https://wiki.archlinux.org/index.php/RAID#Calculating_the_stride_and_stripe_...

So what you ran into was an issue setting the stripe too large via the Broadcom GUI when adding the filesystem to your array?

--
David C. Rankin, J.D.,P.E.
On 07/15/2020 08:58 PM, David C. Rankin wrote:
On 07/15/2020 10:00 PM, Lew Wolfgang wrote:
I still haven't gotten a confirmation about where/how to update the firmware in this case. It's Broadcom controller modified by SuperMicro.
But in any case, in chasing down issues with methods suggested by Anthony Iliopoulos (Kernel Team) I discovered that the RAID volumes in question, when they were created with the Broadcom GUI, a parameter for stripe size was increased beyond the suggested default. This change created the I/O size clash that was flagged by the Leap 15.2 XFS kernel module. I deleted the two volumes, and re-created them without changing the defaults, and all is well!
Left as a problem for the near future is that we'll probably see this again when upgrading existing 15.1 servers, but at least we'll be prepared for the unpleasantness and be able to take appropriate action.
Interesting. I never get to play with large number of spares or large arrays above the normal little-office world of 1-3T, but did try and keep the chunk, stride and stripe of the arrays optimized for the arrays I was messing with:
Chunks: the hidden key to RAID performance http://www.zdnet.com/article/chunks-the-hidden-key-to-raid-performance/
and
Calculating the stride and stripe width https://wiki.archlinux.org/index.php/RAID#Calculating_the_stride_and_stripe_...
So what you ran into was an issue setting the stripe too large via the Broadcom Gui when adding the filesystem to your array?
No, it was when the Broadcom GUI was building the array volume itself. It proceeded without apparent error. When mkfs.xfs read the parameters given by the RAID controller for the volume, it used parameters that didn't make sense for optimal and minimum block sizes, or something to that effect. mkfs.xfs finished without error too. The problem manifested when mount tried to mount the filesystem. Apparently the XFS kernel module for 15.2 is more picky about the XFS filesystem parameters and triggered the failure. Mount reported that it couldn't find the primary and secondary superblocks. Leap 15.1 had absolutely no problem with the filesystems; it was just 15.2 being picky about the parameters.

So my future problem will be fixing old XFS filesystems when the time comes soon to update from 15.1 to 15.2. It may involve saving the data elsewhere, removing the volume, recreating it with the right default stripe size, recreating the filesystem on the volume, and restoring the data. Lots of it... Anthony came up with a process to edit the filesystem metadata; I haven't tried that yet. It's a bit scary.

https://bugzilla.suse.com/show_bug.cgi?id=1174056

Regards, Lew
16.07.2020 07:47, Lew Wolfgang пишет:
So my future problem will be fixing old XFS filesystems when the time comes soon to update from 15.1 to 15.2. It may involve saving the data elsewhere, remove the volume, recreate with the right default stripe size, recreate the filesystem on the volume, and restore the data. Lots of it... Anthony came up with a process to edit the filesystem metadata, I haven't tried that yet. It's a bit scary.
If you read the kernel bug report that was mentioned previously to the end, you will see that you can simply mount the XFS filesystem on an older kernel with the correct parameters and they will be persisted to the filesystem. The parameters were initialized incorrectly based on what the device reported; remounting with the correct parameters will update them in the superblock.
Of course using filesystem debugger hammer is always possible, but far more error prone ...
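A sketch of what Andrei describes, with illustrative numbers (the real sunit/swidth would have to be worked out from the array's actual chunk size and data-disk count; the options are in 512-byte units):

  # Under 15.1, or any kernel that still mounts the filesystem, remount
  # with corrected stripe values so they get written back to the superblock
  mount -o sunit=512,swidth=8192 /dev/sdc1 /mnt
  umount /mnt

  # Afterwards a plain mount under 15.2 should pass the sanity check
  mount /dev/sdc1 /mnt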
On 07/15/2020 10:03 PM, Andrei Borzenkov wrote:
16.07.2020 07:47, Lew Wolfgang пишет:
So my future problem will be fixing old XFS filesystems when the time comes soon to update from 15.1 to 15.2. It may involve saving the data elsewhere, remove the volume, recreate with the right default stripe size, recreate the filesystem on the volume, and restore the data. Lots of it... Anthony came up with a process to edit the filesystem metadata, I haven't tried that yet. It's a bit scary.
If you read the kernel bug report that was mentioned previously to the end, you will see that you can simply mount the XFS filesystem on an older kernel with the correct parameters and they will be persisted to the filesystem. The parameters were initialized incorrectly based on what the device reported; remounting with the correct parameters will update them in the superblock.
Right, but I had no idea what the correct parameters should be!

Going forward, we have multiple RAID controller model numbers, with different RAID volume and disk types. Then, how do those numbers translate to the sunit and swidth metadata parameters? It might be safer to start from scratch and let the RAID controller do its thing, correctly this time. I'll have a 10-gigE link to back them up with, so it might not be too bad.

Regards, Lew
On 07/16/2020 12:32 AM, Lew Wolfgang wrote:
Right, but I had no idea what the correct parameters should be!
Going forward. we have multiple RAID controller model numbers, with different RAID volume and disk types. Then, how do those numbers translate to the sunit and swidth metadata parameters? It might be safer to start from scratch and let the RAID controller do its thing, correctly this time. I'll have a 10-gigE link to back them up with, so it might not be too bad.
That's where those two links I provided can help. You will have to figure out what the Broadcom uses for 'chunk' size, but then you can use:

  stride = chunk size / block size
  stripe width = number of data disks * stride

You will have to check, but it looks like your sunit is the stride and the swidth is the stripe width.

I have a couple of 8-port LSI MegaRAID cards. The GUI (if you can call it that) was always fairly easy to deal with. IIRC it used values that were in line with what mdadm did for RAID1.

I always hated having to mess with well-working filesystems. Even replacing failed disks in the arrays, even though very simple, always left you with that feeling of "what if this blows up?". Thankfully it hasn't yet, but that doesn't do away with that feeling altogether :)

--
David C. Rankin, J.D.,P.E.
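A worked example of those formulas with made-up numbers (not Lew's actual chunk size or disk count):

  # Hypothetical array: RAID6 over 18 disks (16 data disks),
  # 256 KiB controller chunk size, 4 KiB filesystem block size
  chunk_kib=256
  block_kib=4
  data_disks=16

  stride=$((chunk_kib / block_kib))       # 64 filesystem blocks per chunk
  stripe_width=$((data_disks * stride))   # 1024 blocks across a full stripe
  echo "stride=$stride stripe_width=$stripe_width"

  # For mkfs.xfs this maps to:  -d su=${chunk_kib}k,sw=${data_disks}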