[opensuse-factory] gdbm failing basic creation test; how could perl pass if it uses this?
I tried a sample gdbm program while trying to discover why perl (5.18) won't pass its test suite (since 5.16, or since last year) with regard to gdbm. The perl people kindly gave me a C test program to check whether gdbm works from C. I had gdbm-devel and libgdbm4 v1.10-6.1 and upgraded to the latest in factory today (v1.10-6.5). The C program I used:
more test_gdbm.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <gdbm.h>
#include <errno.h>

int main (int argc, char **argv)
{
    GDBM_FILE dbf;
    datum key = { "somekey", 8 };
    datum value = { "somevalue", 8 };
    int ret;
    datum content;

    errno = 0;
    printf("gdbm_version = %s\n", gdbm_version);

    dbf = gdbm_open ("trial.gdbm", 0, GDBM_NEWDB, 0644, 0);
    if (!dbf) {
        printf("%s, errno = %d, gdbm_errno = %d\n",
               "Fatal error opening trial.gdbm", errno, gdbm_errno);
        exit(1);
    }

    ret = gdbm_store (dbf, key, value, GDBM_INSERT);
    if (ret != -1) {
        printf("Successfully stored key.\n");
    } else {
        printf("%s ret = %d, errno = %d, gdbm_errno = %d\n",
               "Failed to store key.", ret, errno, gdbm_errno);
    }

    content = gdbm_fetch(dbf, key);
    if (content.dptr && strncmp(content.dptr, value.dptr, 7) == 0) {
        printf("Successfully fetched key.\n");
    } else {
        printf("%s errno = %d, gdbm_errno = %d\n",
               "Failed to retrieve key.", errno, gdbm_errno);
    }

    gdbm_close(dbf);
    unlink("trial.gdbm");
    printf ("done.\n");
    return 0;
}

--- Building & running the above:
gcc -lgdbm -lgdbm_compat test_gdbm.c -o tg
tg
gdbm_version = GDBM version 1.10. 13/11/2011
Fatal error opening trial.gdbm, errno = 0, gdbm_errno = 2
Can someone verify this? If it is broken, that would be bad, since perl uses the above in its test suite -- and on my system, I've been getting breakages in building (and testing) perl for over a year... Thing is, I can't see how perl would have built and tested in factory with the above not working. So, what am I missing?
On Wednesday 2013-09-04 03:37, Linda Walsh wrote:
dbf = gdbm_open ("trial.gdbm", 0, GDBM_NEWDB, 0644, 0);
if (!dbf) {
    printf("%s, errno = %d, gdbm_errno = %d\n", "Fatal error opening trial.gdbm", errno, gdbm_errno);
Fatal error opening trial.gdbm, errno = 0, gdbm_errno = 2
(2 = GDBM_BLOCK_SIZE_ERROR)
If it is broken, that would be bad, since perl uses the above in its test suite -- and on my system, I've been getting breakages in building (and testing) perl for over a year...
Your system is known to be broken in various strange ways every now and then. I suggest you gdb your gdbm library (by way of single stepping through the test program) and see where it goes to set GDBM_BLOCK_SIZE_ERROR.
Thing is, I can't see how perl would have built and tested in factory with the above not working. So, what am I missing?
Because perl in factory is built using a clean well-known state. Every time.
On Wednesday 2013-09-04 03:37, Linda Walsh wrote:
dbf = gdbm_open ("trial.gdbm", 0, GDBM_NEWDB, 0644, 0);
if (!dbf) {
    printf("%s, errno = %d, gdbm_errno = %d\n", "Fatal error opening trial.gdbm", errno, gdbm_errno);
Fatal error opening trial.gdbm, errno = 0, gdbm_errno = 2
(2 = GDBM_BLOCK_SIZE_ERROR)
If it is broken, that would be bad, since perl uses the above in its test suite -- and on my system, I've been getting breakages in building (and testing) perl for over a year...
Jan Engelhardt wrote:
Your system is known to be broken in various strange ways every now and then.
====
Has it? Notice I haven't posted much in the way of problems for ~3+ months now. The system boots in ~25 seconds directly from disk, just like it has been able to do for the last ~15 years.
Because perl in factory is built using a clean well-known state. Every time.
Which means it is *untested* in the real world (example follows). If the extent of build & test is a single-config, sterile environment, how can you show it will work in any environment that differs from the one config used for build & test? There is no proof or credibility that a SuSE package will work on a given SuSE installed system. This is a perfect example.

GDBM appears to have been broken by someone from BSD assuming that the "st_blksize" member of "stat" is the "default block size" and will be a *power* (not just a multiple) of 2. This isn't always true on linux or POSIX, though it is likely to be true in a sterile build+test environment. If they can't find the struct member, they use a fixed 1024 as the size of the blocks returned in "st_blocks":

    blkcnt_t st_blocks;  /* number of 512B blocks allocated */

So the code defaults to using the wrong blksize on linux. But the usage of blksize is faulty & incorrect as well. On POSIX (and linux), that blksize is the **preferred I/O size**. Which means it gets set based on the block device and probably the filesystem. A RAID uses stripe-size x width (#data stripes) as the "blksize" in the stat call. So a RAID with 64k stripes & 12 data disks would have a stripe size of 768k. That stripe size is the optimal I/O size (and it isn't a power of 2).

One would assume SuSE hasn't dropped support for RAID filesystems, but it appears that any time stripe-size*width is not a **power** of 2, gdbm will fail -- as will the perl tests. They are unlikely to fail on an artificially constructed build machine, though, where it is unlikely anyone would simulate a RAID for the build & test of perl. Yet that is exactly the type of cross-testing SuSE needs to do, but no longer does. That effectively creates a less useful release each time another bug like this gets covered up.

As far as I can tell, this appears to be a bug that goes back a ways, and it only shows up for people who don't use a "build-system"-configured machine as their production machine.
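As a minimal, self-contained illustration of that claim (just a sketch, not anything from gdbm or the perl test suite): st_blksize is whatever preferred I/O size stat() reports for the filesystem, and nothing in POSIX promises that it is a power of two. On an XFS filesystem created with sunit/swidth matching a 64k x 12 stripe, it can come back as 786432:

    /* Sketch: print the preferred I/O size (st_blksize) for a path and
     * note whether it happens to be a power of two.  On a striped RAID,
     * a filesystem made with matching stripe geometry can report e.g.
     * 786432 here. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        const char *path = argc > 1 ? argv[1] : ".";

        if (stat(path, &st) != 0) {
            perror("stat");
            return 1;
        }
        long bs = (long) st.st_blksize;
        printf("%s: st_blksize = %ld (%s power of two)\n",
               path, bs, (bs & (bs - 1)) == 0 ? "a" : "NOT a");
        return 0;
    }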
On Thursday 2013-09-05 05:45, Linda Walsh wrote:
On POSIX (and linux), that blksize is the **preferred I/O size**. Which means it gets set based on the block-device and probably the filesystem.
A RAID uses stripe-size x width (#data stripes) as the "blksize" in the stat call.
So a RAID with 64k stripes & 12 data disks would have a stripe size of 768k. That stripe size is the optimal I/O size (and it isn't a power of 2).
I cannot reproduce this with at least MD RAIDs in modes 0 5 and 6 using 64k×12 devices. The filesystem created on it reports 4096 for st_blksize, as does fstat on a block device fd when directly opened. So if you can describe your RAID more thoroughly..
On Fri, 6 Sep 2013 09:32:19 +0200 (CEST), Jan Engelhardt <jengelh@inai.de> wrote:
On Thursday 2013-09-05 05:45, Linda Walsh wrote:
On POSIX (and linux), that blksize is the **preferred I/O size**. Which means it gets set based on the block-device and probably the filesystem.
A RAID uses stripe-size x width (#data stripes) as the "blksize" in the stat call.
So a RAID with 64k stripes & 12 data disks would have a stripe size of 768k. That stripe size is the optimal I/O size (and it isn't a power of 2).
I cannot reproduce this with at least MD RAIDs in modes 0 5 and 6 using 64k×12 devices. The filesystem created on it reports 4096 for st_blksize, as does fstat on a block device fd when directly opened. So if you can describe your RAID more thoroughly..
Out of interest - what does /sys/block/mdX/queue/optimal_io_size say?
On Friday 2013-09-06 10:05, Andrey Borzenkov wrote:
So a RAID with 64k stripes & 12 data disks would have a stripe size of 768k. That stripe size is the optimal I/O size (and it isn't a power of 2).
I cannot reproduce this with at least MD RAIDs in modes 0 5 and 6 using 64k×12 devices. The filesystem created on it reports 4096 for st_blksize, as does fstat on a block device fd when directly opened. So if you can describe your RAID more thoroughly..
Out of interest - what does /sys/block/mdX/queue/optimal_io_size say?
chunk=64 disks=12
  raid0  - 786432
  raid5  - 720896
  raid6  - 655360
  raid10 - 393216
disks=2
  raid1  - 0
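(For what it's worth, those figures are consistent with optimal_io_size being the chunk size times the number of data-bearing disks: 64k x 12 = 786432 for raid0, 64k x 11 = 720896 for raid5 with one disk of parity, 64k x 10 = 655360 for raid6 with two, and 64k x 6 = 393216 for a 12-disk raid10 assuming the default 2-copy layout -- none of which is a power of 2.)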
Andrey Borzenkov wrote:
On Fri, 6 Sep 2013 09:32:19 +0200 (CEST), Jan Engelhardt <jengelh@inai.de> wrote:
On Thursday 2013-09-05 05:45, Linda Walsh wrote:
On POSIX (and linux), that blksize is the **preferred I/O size**. Which means it gets set based on the block-device and probably the filesystem.
A RAID uses stripe-size x width (#data stripes) as the "blksize" in the stat call.
So a RAID with 64k stripes & 12 data disks would have a stripe size of 768k. That stripe size is the optimal I/O size (and it isn't a power of 2).
I cannot reproduce this with at least MD RAIDs in modes 0 5 and 6 using 64k×12 devices. The filesystem created on it reports 4096 for st_blksize, as does fstat on a block device fd when directly opened. So if you can describe your RAID more thoroughly..
Out of interest - what does /sys/block/mdX/queue/optimal_io_size say?
I don't have an mdX... it's hardware RAID... Setup w/15 disks total: 3 groups of RAID5, so 4 data + 1 parity per group. The 3 groups are striped like a RAID0.

[    2.754292] megasas: 06.506.00.00-rc1 Sat. Feb. 9 17:00:00 PDT 2013
[    2.760727] megasas: 0x1000:0x0079:0x1000:0x9275: bus 5:slot 0:func 0
[    2.767549] megasas: FW now in Ready state
[    2.771843] megaraid_sas 0000:05:00.0: irq 88 for MSI/MSI-X
[    2.835906] megasas_init_mfi: fw_support_ieee=0
[    2.840312] megasas: INIT adapter done
[    2.906998] scsi0 : LSI SAS based MegaRAID driver
[    2.912046] megasas: 0x1000:0x0060:0x1028:0x1f0c: bus 2:slot 0:func 0
[    2.918936] megasas: FW now in Ready state
[    2.925806] scsi 0:0:14:0: Enclosure LSI DE1600-SAS 0313 PQ: 0 ANSI: 5
[    2.937228] scsi 0:0:15:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.948416] scsi 0:0:16:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.959674] scsi 0:0:17:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.966044] megasas_init_mfi: fw_support_ieee=0
[    2.966044] megasas: INIT adapter done
[    2.979455] scsi 0:0:18:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    2.991140] scsi 0:0:19:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.002303] scsi 0:0:20:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.013486] scsi 0:0:21:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.024320] scsi 0:0:22:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.029128] scsi1 : LSI SAS based MegaRAID driver
[    3.040407] scsi 0:0:23:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.043045] scsi 1:0:8:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.043236] scsi 0:0:24:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.045904] scsi 1:0:9:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.046077] scsi 0:0:25:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.048589] scsi 0:0:26:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.048747] scsi 1:0:10:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.050180] scsi 0:0:27:0: Enclosure LSI DE1600-SAS 0313 PQ: 0 ANSI: 5
[    3.051957] scsi 1:0:11:0: Direct-Access ATA Hitachi HDS72202 A28A PQ: 0 ANSI: 5
[    3.053299] scsi 0:0:28:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.054807] scsi 1:0:12:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.056146] scsi 0:0:29:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.058992] scsi 0:0:30:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.061840] scsi 0:0:31:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.064680] scsi 0:0:32:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.280761] scsi 0:0:34:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.292011] scsi 0:0:35:0: Direct-Access ATA Hitachi HDS72202 A20N PQ: 0 ANSI: 5
[    3.303268] scsi 0:0:36:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.314413] scsi 0:0:37:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.341222] scsi 1:0:32:0: Enclosure DP BACKPLANE 1.07 PQ: 0 ANSI: 5
[    3.344068] scsi 0:0:38:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.346923] scsi 0:0:39:0: Direct-Access ATA Hitachi HUA72202 A3EA PQ: 0 ANSI: 5
[    3.389002] scsi 0:2:0:0: Direct-Access LSI MR9280DE-8e 2.0. PQ: 0 ANSI: 5
[    3.397434] scsi 0:2:1:0: Direct-Access LSI MR9280DE-8e 2.0. PQ: 0 ANSI: 5

-----
I don't know what you are looking for... I'm guessing that using a software RAID may not give the same results.
Jan Engelhardt wrote:
Your system is known to be broken in various strange ways every now and then.
And whose system runs without any bugs, might I ask? ;-)

In this case the gdbm package is bugged in a few places.

1) gdbm_open, which allows specifying a block size that is used for subsequent I/O, takes '0' to mean "use the filesystem's optimal I/O size as returned by the stat call". The current gdbm code requires that the record size be a power of 2. This is required of parameters passed in, but also of the default 'ideal I/O' size as returned by stat. Unfortunately, there is no way to guarantee that (other than to build & test in a "clean" build environment), since stat can return a different value for each filesystem it gets the I/O size from.
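A minimal sketch of that interface (an illustration, not a proposed fix): a caller that passes an explicit power-of-two block size instead of 0 never consults the stat-derived default at all. Built the same way as the test program above (gcc ... -lgdbm).

    /* Sketch: open a gdbm file with an explicit, power-of-two block size
     * instead of 0, so the st_blksize reported by the filesystem is never
     * consulted.  4096 here is an arbitrary but safe choice. */
    #include <stdio.h>
    #include <gdbm.h>

    int main(void)
    {
        GDBM_FILE dbf = gdbm_open("trial.gdbm", 4096, GDBM_NEWDB, 0644, 0);
        if (!dbf) {
            printf("gdbm_open failed, gdbm_errno = %d\n", gdbm_errno);
            return 1;
        }
        gdbm_close(dbf);
        return 0;
    }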
I suggest you gdb your gdbm library (by way of single stepping through the test program) and see where it goes to set GDBM_BLOCK_SIZE_ERROR.
Looking at the source worked better, as gdb stepped over the gdbm_open call rather than into it (maybe I didn't have the right debug packages loaded, so gdb ignored my request to step into the affected routine). Using source inspection as well as strace/ltrace, I found problems in gdbm/odbm/ndbm. While the gdbm tests in perl can be fixed not to use a '0' record size, the same isn't true for ndbm/odbm, which bury the record size in the lib's source (and they use '0' to tell the lib to stat the device for the ideal I/O size)...
Because perl in factory is built using a clean well-known state. Every time.
Yeah... that's a problem. Not that it is **BUILT** that way, but that it is *tested* that way. AFAIK, openSuSE is building a *distribution* where the pieces are expected to work together -- not just in isolation in clean-room environments. While building that way can be a positive step, testing that way is like testing a car's gas mileage by running the car at 'idle'. Real-world numbers tend to vary a bit.
On Friday 2013-09-06 02:10, Linda Walsh wrote:
Jan Engelhardt wrote:
Your system is known to be broken in various strange ways every now and then.
And whose system runs without any bugs, might I ask? ;-)
In this case the gdbm package is bugged in a few places.
1) gdbm_open, which allows specifying a block size that is used for subsequent I/O, takes '0' to mean "use the filesystem's optimal I/O size as returned by the stat call".
I had something like that in the back of my head, but all of
- xfs block sizes 512, 1024, 2048, 4096
- ext4 block sizes 1024, 2048, 4096
- btrfs block size 4096
ran your test program well.
The current gdbm code requires that the record size be a power of 2.
I have yet to see a filesystem whose ideal IO size, especially if it is derived from the fs block/sector/whatever size, is not a power of 2. But there is a first time for everything. Are you on some odd flash filesystem, perhaps?
Looking at the source worked better, as gdb stepped over the gdbm_open call rather than into it (maybe I didn't have the right debug packages loaded, so gdb ignored my request to step into the affected routine).
For stepping, having gdbm-debugsource in addition to gdbm-debuginfo/libgdbm3-debuginfo is a must-have.
While building that way can be a positive step, testing that way is like testing a car's gas mileage by running the car at 'idle'. Real-world numbers tend to vary a bit.
Unfortunately, determining mileage works the same way - also in a clean room, too clean in fact :( "Despite economical driving style and avoiding phases of leadfooting, fuel consumption is 25% above the advertised mileages." - http://www.spiegel.de/auto/aktuell/studie-des-icct-zum-realen-spritverbrauch...
On Thu, Sep 5, 2013 at 9:18 PM, Jan Engelhardt <jengelh@inai.de> wrote:
The current gdbm code requires that the record size be a power of 2.
I have yet to see a filesystem whose ideal IO size, especially if it is derived from the fs block/sector/whatever size, is not a power of 2. But there is a first time for everything.
Are you on some odd flash filesystem, perhaps?
Linda likes raid systems and that was what she specifically said.

For raid 5 & 6 arrays in particular, the optimum write size is the stripe size. The stripe size is typically not a power of 2 (but it can be).

For a raid 5 or 6, writing optimally means the disk controller can implement a pure write for that stripe and the writes all happen in parallel. A non-optimal write means it has to implement a read-modify-write for both the updated data blocks and for the parity block. Thus it takes roughly twice as long. It is highly worth optimizing the workload if feasible.

Greg
--
Greg Freemyer
Greg Freemyer wrote:
On Thu, Sep 5, 2013 at 9:18 PM, Jan Engelhardt <jengelh@inai.de> wrote:
The current gdbm code requires that the record size be a power of 2.
I have yet to see a filesystem whose ideal IO size, especially if it is derived from the fs block/sector/whatever size, is not a power of 2. But there is a first time for everything.
Are you on some odd flash filesystem, perhaps?
Linda likes raid systems and that was what she specifically said.
For raid 5 & 6 arrays in particular, the optimum write size is the stripe size. The stripe size is typically not a power of 2 (but it can be).
Bingo.
For a raid 5 or 6, writing optimally means the disk controller can implement a pure write for that stripe and the writes all happen in parallel. A non-optimal write means it has to implement a read-modify-write for both the updated data blocks and for the parity block. Thus it takes roughly twice as long. It is highly worth optimizing the workload if feasible.
--- Double bingo? ;-)
Jan Engelhardt wrote:
I had something like that in the back of my head, but all of
- xfs block sizes 512, 1024, 2048, 4096
- ext4 block sizes 1024, 2048, 4096
- btrfs block size 4096
ran your test program well.
-----
The block size != the ideal I/O size. The block size is the *minimum* I/O size. But using the stat -c "%o" call on my mounted partitions, I see:

mount|grep '^\s*/.*on /.*'|sed 's/^\s*// ;s/on //'|sort|uniq|while read dev mp rest
do
    printf "%10d %40s %s\n" "$(sudo stat -c "%o" "$mp")" "$mp" "$dev"
done|sort -n

      1024  /net                                      /etc/auto.net
      1024  /smb                                      /etc/auto.smb
      1024  /misc                                     /etc/auto.misc
      1024  /homes                                    /etc/auto.homes
      4096  /                                         /dev/sdc1
      4096  /home/.snapdir/@GMT-2013.08.12-04.07.02   /dev/HnS/Home-2013.08.12-04.07.02
      4096  /home/.snapdir/@GMT-2013.08.22-03.52.42   /dev/HnS/Home-2013.08.22-03.52.42
      4096  /home/.snapdir/@GMT-2013.08.24-04.07.03   /dev/HnS/Home-2013.08.24-04.07.03
      4096  /home/.snapdir/@GMT-2013.08.26-04.07.02   /dev/HnS/Home-2013.08.26-04.07.02
      4096  /home/.snapdir/@GMT-2013.08.28-04.07.02   /dev/HnS/Home-2013.08.28-04.07.02
      4096  /home/.snapdir/@GMT-2013.08.29-04.07.02   /dev/HnS/Home-2013.08.29-04.07.02
      4096  /home/.snapdir/@GMT-2013.08.30-04.07.03   /dev/HnS/Home-2013.08.30-04.07.03
      4096  /home/.snapdir/@GMT-2013.08.31-04.07.04   /dev/HnS/Home-2013.08.31-04.07.04
      4096  /home/.snapdir/@GMT-2013.09.02-04.07.10   /dev/HnS/Home-2013.09.02-04.07.10
      4096  /home/.snapdir/@GMT-2013.09.03-04.07.04   /dev/HnS/Home-2013.09.03-04.07.04
      4096  /home/.snapdir/@GMT-2013.09.04-04.07.03   /dev/HnS/Home-2013.09.04-04.07.03
      4096  /home/.snapdir/@GMT-2013.09.06-04.07.03   /dev/HnS/Home-2013.09.06-04.07.03
     65536  /usr                                      /dev/sdc6
    131072  /tmp                                      /dev/sdc2
    131072  /var                                      /dev/sdc2
    131072  /Nroot                                    /dev/sdc8
    131072  /Nroot/var                                /dev/sdc9
------ non power of 2:
    655360  /Media                                    /dev/Media/Media
    655360  /backups/Media                            /dev/HnS/Media_Back
    655360  /var/cache/squid                          /dev/HnS/Squid_Cache
    786432  /home                                     /dev/HnS/Home
    786432  /Share                                    /dev/HnS/Share
    786432  /root2                                    /dev/HnS/Sys
    786432  /backups                                  /dev/Backups/Backups
    786432  /home/Win                                 /dev/HnS/Win
    786432  /home.diff                                /dev/HnS/Home.diff
    786432  /root2/var                                /dev/HnS/Sysvar
    786432  /usr/share                                /dev/HnS/Home
    786432  /root2/boot                               /dev/HnS/Sysboot
---- (power of 2)
   2097152  /boot                                     /dev/sdc3
-----
The above figures come from a combination of how the disk is formatted, as well as mount options (e.g. for /boot my allocsize=2M, so the images won't be as likely to fragment). So, xfs_info on my /home partition:
meta-data=/dev/mapper/HnS-Home   isize=256    agcount=4, agsize=67108864 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=16     swidth=192 blks   **********************
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The ***** line is where it is getting the ideal I/O size.

When you created your RAID disks, I know XFS has params for you to specify the sunit size and its width -- so you can have the I/Os lined up with the HW. Did you specify a size & width? Do the other filesystems also have the ability to specify the HW RAID sizes?
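(If that xfs_info output is read with sunit and swidth in 4096-byte filesystem blocks, the stripe unit is 16 x 4096 = 64k and the full stripe width is 192 x 4096 = 786432 bytes -- exactly the "%o"/st_blksize value reported for /home above, and not a power of 2.)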
On Fri, 06 Sep 2013 19:20:38 -0700, Linda Walsh <suse@tlinx.org> wrote:
When you created your RAID disks, I know XFS has params for you to specify the sunit size and its width -- so you can have the I/Os lined up with the HW.
You need to raise this up on gdbm list/bug tracker.
Andrey Borzenkov wrote:
On Fri, 06 Sep 2013 19:20:38 -0700, Linda Walsh <suse@tlinx.org> wrote:
When you created your RAID disks, I know XFS has params for you to specify the sunit size and its width -- so you can have the I/Os lined up with the HW.
You need to raise this up on gdbm list/bug tracker.
Already attended to, but no response... Maybe it's no longer supported?

-------- Original Message --------
Subject: bug in 1.10: optimal i/o size is *assumed* to be a power of 2. This is not always true.
Date: Wed, 04 Sep 2013 23:09:35 -0700
To: bug-gdbm@gnu.org

In gdbmopen.c, ~lines 235-242, we can see the dir size starting off with 8 'datums' of size (off_t), which it says take 3 bits to store. Fine. Then it loops on dir_size < block_size, using a left shift on the size and +1 on the bits until dir_size >= block_size.

Using a 12-data-disk RAID with 64KB/segment => 768K = 1 full-width stripe on the RAID -- which is exactly what is returned from "stat" when asked for the block size. When dir_size becomes > 768K, it will have jumped from 512K to 1M. Following that, at line 244, is a check:

    /* Check for correct block_size. */
    if (dbf->header->dir_size != dbf->header->block_size)
      {
        gdbm_close (dbf);
        gdbm_errno = GDBM_BLOCK_SIZE_ERROR;
        return NULL;
      }
----
But the dir block size cannot be equal to the desired block size, due to the way it is calculated in powers of 2. Either the program needs to use dir_size/(sizeof(off_t)) to get the number of dir entries, or, if a power of 2 is needed for other reasons, then 256K needs to be "allocated" as padding after the dir_block_size reaches 512K -- thus causing further DB writes to be aligned.

The above was detected using the SuSE factory source rpm for what will be "13.1" of openSUSE. Note -- it is also the case that, because of this bug, perl won't build and pass its DB tests on some machines (like mine).
-----
Also, a bit of oddness -- I know that several places use a hard-coded '31' as the top end for the number of bits. Might this cause problems on 64-bit machines? (bucket.c being the main offender)
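To make the failure mode concrete, here is a small standalone sketch of the sizing loop as described above (paraphrased, not the real gdbmopen.c): dir_size can only ever land on 8*sizeof(off_t) times a power of two, so a stat block size of 786432 can never satisfy the dir_size == block_size check, while 4096 can.

    /* Standalone sketch of the directory-sizing logic described above
     * (paraphrased, not the actual gdbm source): start at 8 off_t-sized
     * entries and double until the requested block size is reached. */
    #include <stdio.h>
    #include <sys/types.h>

    static void try_block_size(long block_size)
    {
        long dir_size = 8 * sizeof(off_t);   /* 8 directory entries */
        int  dir_bits = 3;

        while (dir_size < block_size) {
            dir_size <<= 1;
            dir_bits++;
        }
        printf("block_size=%7ld  dir_size=%7ld (dir_bits=%d)  -> %s\n",
               block_size, dir_size, dir_bits,
               dir_size == block_size ? "ok" : "GDBM_BLOCK_SIZE_ERROR");
    }

    int main(void)
    {
        try_block_size(4096);     /* power of two: lands exactly            */
        try_block_size(786432);   /* 64k x 12 stripe: overshoots to 1048576 */
        return 0;
    }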
Linda A. Walsh <gnu@tlinx.org> wrote:
Already attended to, but no response... Maybe its no longer supported?
It is. I'll attend to the matter as soon as my schedule permits.

Regards,
Sergey
Sergey Poznyakoff wrote:
Linda A. Walsh wrote:
Already attended to, but no response... Maybe it's no longer supported?
It is. I'll attend to the matter as soon as my schedule permits.
Does anyone know of a test suite for gdbm to test for sanity and previous-version compatibility?

The attached patch allows perl to build and successfully test on my machine (w/stat blocksize = 768K), both when using the "compat" libs and when using gdbm with a 0 record size (0 = auto-select based on the stat blocksize). It keeps the blocksize the same, but bails out of the auto-size algorithm when the dirsize is more than 50% of the blocksize. I.e. before, it continued until dirsize was >= blocksize; now it will stop if 2*dirsize > blocksize.

Theoretically (though I have no way to really test it), if you are running on a filesystem with blocksize = 2**X for some integer X, the data and dir blocksizes should remain the same. The downside would be for people for whom the previous algorithm DIDN'T work: their dir entries could be about half of what they could be if the full dir block were used (since it cuts out at the previous power of 2), with the data block size being unchanged. I.e., in my case, w/iosize = 768K, the dir will stop at 512K, leaving 256K unused -- but allowing the DB as a whole to use the optimal I/O size. It would take more work than I care to spend to change the various algorithms to use the full space for the dir entries.

I think the area for hashes and data should be fully utilized, as the sizing problem appeared to be in sizing the dir block upwards by a factor of 2 until it was >= the iosize (in my case, the dir block size would not have stopped increasing until 1024K, which caused the internal consistency check to fail).

Anyway -- testing would be good now. All I know is it let perl pass on build (about 10 tests had been failing due to the DB problem).
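For reference, a sketch of what the described change amounts to (paraphrased from the description above -- not the actual patch): the doubling loop stops as soon as the next doubling would exceed the block size, instead of requiring an exact power-of-two landing.

    /* Paraphrase of the change described above (not the actual patch):
     * stop growing the directory once doubling it again would exceed the
     * block size, rather than insisting that dir_size == block_size. */
    #include <stdio.h>
    #include <sys/types.h>

    static long pick_dir_size(long block_size)
    {
        long dir_size = 8 * sizeof(off_t);

        while (dir_size < block_size && 2 * dir_size <= block_size)
            dir_size <<= 1;
        return dir_size;
    }

    int main(void)
    {
        /* 786432 (64k x 12) now stops at 524288 (512K) instead of failing;
         * a power-of-two block size such as 4096 still lands exactly on 4096. */
        printf("%ld %ld\n", pick_dir_size(786432), pick_dir_size(4096));
        return 0;
    }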
On Saturday 2013-09-07 04:20, Linda Walsh wrote:
Jan Engelhardt wrote:
I had something like that in the back of my head, but all of
- xfs block sizes 512, 1024, 2048, 4096
- ext4 block sizes 1024, 2048, 4096
- btrfs block size 4096
ran your test program well.
The block size != the ideal I/O size. The block size is the *minimum* I/O size. But using the stat -c "%o" call on my mounted partitions, I see:
The optimum IO size "%o" is influenced by the fs block size option, which is why I went through a few values. Nothing more, nothing less :)
Did you specify a size & width?
Nope.
participants (6)
- Andrey Borzenkov
- Greg Freemyer
- Jan Engelhardt
- Linda A. Walsh
- Linda Walsh
- Sergey Poznyakoff