I just downloaded the new packet writing code and tried it out on my system:
no SMP, HP 9200 SCSI CD-RW drive.
I was copying over a ~260MB dir with cp -a, but the copying seemed kind of
strange. While the "cp" was running, my CD-RW "write LED" wasn't on. I assumed
it was some write caching/buffering, but the cp was taking >30min, so I typed
"sync" and all of a sudden the write LED came on. Shouldn't the kernel be
syncing more frequently?
I got tired of the cp so I kill -9'ed the cp process. When I tried to umount
the cdrw I got the message below. I then cp -a'ed the data back from the
cdrw to the hard drive. One file has a directory entry but no real contents.
I ran md5sum on the original files and on the files copied back from the cdrw,
and about 10 of 60 files have incorrect md5sums.
assert failed sectors == rq->nr_sectors,pkt_recheck_segments at 142
kernel BUG at pktcdvd.c:142!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c018da28>]
EFLAGS: 00010286
eax: 0000001d ebx: 0000001c ecx: 00000001 edx: 00000000
esi: 00000080 edi: dff6ade0 ebp: dff6ade0 esp: dff5df2c
ds: 0018 es: 0018 ss: 0018
Process kpacketd (pid: 5, stackpage=dff5d000)
Stack: c024b865 c024ba6e 0000008e df8ccda0 df8ccda0 00002800 c018e342 dff6ade0
dff6ade0 dff5e000 dff5e06c dff6e480 dff3c940 00000004 00000070 00002800
00002780 c018e429 dff5e000 dff6ade0 dff6ade0 00000000 c018e513 dff5e000
Call Trace: [<c024b865>] [<c024ba6e>] [<c018e342>] [<c018e429>] [<c018e513>] [<c018e662>] [<c0108aaf>]
Code: 0f 0b 83 c4 0c 8d 76 00 5b 5e 5f c3 83 ec 0c 55 31 c9 57 8b
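The md5sum check described in this report (original files against the files copied back from the CD-RW) can be sketched as a small script. This is only an illustration, assuming Python 3; the helper names md5_of and compare_trees are hypothetical and not part of the packet writing code or of md5sum itself.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original, copy):
    """Return relative paths whose copy is missing or has a different MD5."""
    bad = []
    for root, _dirs, files in os.walk(original):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, original)
            dst = os.path.join(copy, rel)
            # A file that exists only as an empty entry still shows up here,
            # because its digest will not match the original's.
            if not os.path.isfile(dst) or md5_of(src) != md5_of(dst):
                bad.append(rel)
    return sorted(bad)
```

Calling compare_trees on the source directory and the directory copied back from the disc returns the list of files that differ, which corresponds to the "about 10 of 60" count above.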
On Sat, Sep 09 2000, Ravi K. Swamy wrote:
I just downloaded the new packet writing code and tried it out on my system: no SMP, HP 9200 SCSI CD-RW drive.
I was copying over a ~260MB dir with cp -a, but the copying seemed kind of strange. While the "cp" was running, my CD-RW "write LED" wasn't on. I assumed it was some write caching/buffering, but the cp was taking >30min, so I typed "sync" and all of a sudden the write LED came on. Shouldn't the kernel be syncing more frequently?
Well, yes and no. If you have enough memory to hold it all in cache, then there's no point in starting a flush early. However, if the cp has stalled, we are indeed not flushing when we should. I'll try and reproduce it here.
I got tired of the cp so I kill -9'ed the cp process. When I tried to umount the cdrw I got the message below. I then cp -a'ed the data back from the cdrw to the hard drive. One file has a directory entry but no real contents. I ran md5sum on the original files and on the files copied back from the cdrw, and about 10 of 60 files have incorrect md5sums.
The corruption is now a known issue that I'm looking into.
assert failed sectors == rq->nr_sectors,pkt_recheck_segments at 142
kernel BUG at pktcdvd.c:142!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c018da28>]
EFLAGS: 00010286
eax: 0000001d ebx: 0000001c ecx: 00000001 edx: 00000000
esi: 00000080 edi: dff6ade0 ebp: dff6ade0 esp: dff5df2c
ds: 0018 es: 0018 ss: 0018
Process kpacketd (pid: 5, stackpage=dff5d000)
Stack: c024b865 c024ba6e 0000008e df8ccda0 df8ccda0 00002800 c018e342 dff6ade0
dff6ade0 dff5e000 dff5e06c dff6e480 dff3c940 00000004 00000070 00002800
00002780 c018e429 dff5e000 dff6ade0 dff6ade0 00000000 c018e513 dff5e000
Call Trace: [<c024b865>] [<c024ba6e>] [<c018e342>] [<c018e429>] [<c018e513>] [<c018e662>] [<c0108aaf>]
Code: 0f 0b 83 c4 0c 8d 76 00 5b 5e 5f c3 83 ec 0c 55 31 c9 57 8b
Can you run this through ksymoops to get some symbols attached to
the call trace?
--
* Jens Axboe
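On the flushing question above: when the data fits in the cache, nothing forces it out early, which is why typing "sync" by hand made the write LED come on. A user-space program that does not want to depend on that timing can request the flush itself. The following is a minimal sketch, assuming Python 3 on a POSIX system; copy_with_fsync is a hypothetical helper name, and it does not address the stalled cp or the corruption reported in this thread.

```python
import os
import shutil


def copy_with_fsync(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, then ask the kernel to write dst out to the device."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
        fout.flush()             # drain Python's own userspace buffer
        os.fsync(fout.fileno())  # block until the kernel has written the file
```

The call returns only after os.fsync has completed, so a slow device shows up as a slow copy rather than as a long silent delay before write-back starts.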
On Sat, 9 Sep 2000, Jens Axboe wrote:
On Sat, Sep 09 2000, Ravi K. Swamy wrote:
I just downloaded the new packet writing code and tried it out on my system: no SMP, HP 9200 SCSI CD-RW drive.
I was copying over a ~260MB dir with cp -a, but the copying seemed kind of strange. While the "cp" was running, my CD-RW "write LED" wasn't on. I assumed it was some write caching/buffering, but the cp was taking >30min, so I typed "sync" and all of a sudden the write LED came on. Shouldn't the kernel be syncing more frequently?
Well, yes and no. If you have enough memory to hold it all in cache, then there's no point in starting a flush early. However, if the cp has stalled, we are indeed not flushing when we should. I'll try and reproduce it here.
I have 512MB of RAM. When I do things like untar a 20MB kernel src tree, it does untar really quickly, but then within a minute I hear the 10-20 seconds of syncing to the hard disk. The write to CD-RW was taking far longer than I usually see to sync to disk.
I got tired of the cp so I kill -9'ed the cp process. When I tried to umount the cdrw I got the message below. I then cp -a'ed the data back from the cdrw to the hard drive. One file has a directory entry but no real contents. I ran md5sum on the original files and on the files copied back from the cdrw, and about 10 of 60 files have incorrect md5sums.
The corruption is now a known issue that I'm looking into.
Thanks a lot. This is the first version of the packet writing code that has worked for me. I'm going to set it up on a machine at work, which has an IDE HP CD-RW drive.
assert failed sectors == rq->nr_sectors,pkt_recheck_segments at 142
kernel BUG at pktcdvd.c:142!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c018da28>]
EFLAGS: 00010286
eax: 0000001d ebx: 0000001c ecx: 00000001 edx: 00000000
esi: 00000080 edi: dff6ade0 ebp: dff6ade0 esp: dff5df2c
ds: 0018 es: 0018 ss: 0018
Process kpacketd (pid: 5, stackpage=dff5d000)
Stack: c024b865 c024ba6e 0000008e df8ccda0 df8ccda0 00002800 c018e342 dff6ade0
dff6ade0 dff5e000 dff5e06c dff6e480 dff3c940 00000004 00000070 00002800
00002780 c018e429 dff5e000 dff6ade0 dff6ade0 00000000 c018e513 dff5e000
Call Trace: [<c024b865>] [<c024ba6e>] [<c018e342>] [<c018e429>] [<c018e513>] [<c018e662>] [<c0108aaf>]
Code: 0f 0b 83 c4 0c 8d 76 00 5b 5e 5f c3 83 ec 0c 55 31 c9 57 8b
Can you run this through ksymoops to get some symbols attached to the call trace?
Sorry about that...
Is this version of ksymoops okay?
ksymoops 0.7c on i686 2.4.0-test7. Options used
    -V (default)
    -k /proc/ksyms (default)
    -l /proc/modules (default)
    -o /lib/modules/2.4.0-test7/ (default)
    -m /usr/src/linux/System.map (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
invalid operand: 0000
CPU: 0
EIP: 0010:[<c018da28>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001d ebx: 0000001c ecx: 00000001 edx: 00000000
esi: 00000080 edi: dff6ade0 ebp: dff6ade0 esp: dff5df2c
ds: 0018 es: 0018 ss: 0018
Process kpacketd (pid: 5, stackpage=dff5d000)
Stack: c024b865 c024ba6e 0000008e df8ccda0 df8ccda0 00002800 c018e342 dff6ade0
dff6ade0 dff5e000 dff5e06c dff6e480 dff3c940 00000004 00000070 00002800
00002780 c018e429 dff5e000 dff6ade0 dff6ade0 00000000 c018e513 dff5e000
Call Trace: [<c024b865>] [<c024ba6e>] [<c018e342>] [<c018e429>] [<c018e513>] [<c018e662>] [<c0108aaf>]
Code: 0f 0b 83 c4 0c 8d 76 00 5b 5e 5f c3 83 ec 0c 55 31 c9 57 8b
EIP; c018da28 <=====
Trace; c024b865 <__mon_yday+4385/5fa0>
Trace; c024ba6e <__mon_yday+458e/5fa0>
Trace; c018e342
Trace; c018e429
Trace; c018e513
Trace; c018e662
Trace; c0108aaf
Code; c018da28
00000000 <_EIP>:
Code; c018da28 <=====
    0: 0f 0b ud2a <=====
Code; c018da2a
    2: 83 c4 0c addl $0xc,%esp
Code; c018da2d
    5: 8d 76 00 leal 0x0(%esi),%esi
Code; c018da30
    8: 5b popl %ebx
Code; c018da31
    9: 5e popl %esi
Code; c018da32
    a: 5f popl %edi
Code; c018da33
    b: c3 ret
Code; c018da34
    c: 83 ec 0c subl $0xc,%esp
Code; c018da37
    f: 55 pushl %ebp
Code; c018da38
    10: 31 c9 xorl %ecx,%ecx
Code; c018da3a
    12: 57 pushl %edi
Code; c018da3b
    13: 8b 00 movl (%eax),%eax
2 warnings issued. Results may not be reliable.
On Sat, Sep 09 2000, Ravi K. Swamy wrote:
Well, yes and no. If you have enough memory to hold it all in cache, then there's no point in starting a flush early. However, if the cp has stalled, we are indeed not flushing when we should. I'll try and reproduce it here.
I have 512MB of RAM. When I do things like untar a 20MB kernel src tree it does untar really quickly but then within a minute I hear the 10-20 seconds of syncing to the hard disk.
Right and that didn't happen. If cp doesn't return, it is an entirely different problem. I'm banging my head against fs/buffer.c atm looking at this.
The corruption is now a known issue that I'm looking into.
Thanks a lot. This is the first version of the packet writing code that has worked for me. I'm going to set it up on a machine at work, which has an IDE HP CD-RW drive.
Great, remember to be careful, we are still some way from a beta quality product 8)
Sorry about that...
Is this version of ksymoops okay?
[snip]
Trace; c024b865 <__mon_yday+4385/5fa0>
Looks a bit weird, where the heck is this coming from?
--
* Jens Axboe
On Sat, 9 Sep 2000, Jens Axboe wrote:
On Sat, Sep 09 2000, Ravi K. Swamy wrote:
Well, yes and no. If you have enough memory to hold it all in cache, then there's no point in starting a flush early. However, if the cp has stalled, we are indeed not flushing when we should. I'll try and reproduce it here.
I have 512MB of RAM. When I do things like untar a 20MB kernel src tree it does untar really quickly but then within a minute I hear the 10-20 seconds of syncing to the hard disk.
Right and that didn't happen. If cp doesn't return, it is an entirely different problem. I'm banging my head against fs/buffer.c atm looking at this.
Well, hopefully I will soon have two machines to test this on, so if you can suggest other types of test cases, just let me know.
Sorry about that...
Is this version of ksymoops okay?
[snip]
Trace; c024b865 <__mon_yday+4385/5fa0>
Looks a bit weird, where the heck is this coming from?
I did compile another kernel afterward, but I had saved the System.map from the kernel that was running when the bug happened, and I rebooted into that kernel before running ksymoops. What's the weird part? I'm not a kernel expert... Is the trace the list of functions that have been called? And __mon_yday isn't a function but a data structure? I do see it used in fs/udf/udftime.c.
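As background to the __mon_yday question: a symbolizer of this kind generally labels an address with the nearest symbol at or below it, plus the offset into that symbol and the distance to the next one. The sketch below illustrates that lookup in Python 3. The address of __mon_yday and the 0x5fa0 span are back-computed from the two "Trace;" lines above; the neighbouring symbol names are hypothetical placeholders, and resolve is not ksymoops code.

```python
import bisect

# Address-sorted (address, name) pairs, in the spirit of a System.map excerpt.
# 0xc02474e0 = 0xc024b865 - 0x4385, taken from "<__mon_yday+4385/5fa0>".
# The first and last entries are hypothetical neighbours.
SYMBOLS = [
    (0xC0247000, "hypothetical_prev_symbol"),
    (0xC02474E0, "__mon_yday"),
    (0xC024D480, "hypothetical_next_symbol"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return 'name+offset/size' for the nearest symbol at or below addr."""
    starts = [start for start, _name in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(symbols):
        return None  # below the table, or no following symbol to size against
    start, name = symbols[i]
    size = symbols[i + 1][0] - start
    return f"{name}+{addr - start:x}/{size:x}"


print(resolve(0xC024B865))  # __mon_yday+4385/5fa0
print(resolve(0xC024BA6E))  # __mon_yday+458e/5fa0
```

Under this scheme, a name in the trace only says which known symbol precedes the address. An address can be reported against a data symbol such as __mon_yday without being a function call into it.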