map_user_kiobuf on 2.4.21-241-numa kernel running suse 8.1 amd64
Hi,

I have a filesystem driver using map_user_kiobuf:

error = map_user_kiobuf(WRITE, user_mem, (unsigned long)buf, length_to_be_written);

On this new kernel, any 'dd' with bs > 16k hangs the process, and 'top' indicates it is spinning:

cheetah:/home# dd if=/dev/zero of=test bs=16834 count=10 <---hangs-->

I see that this function has changed a bit in 2.4.21-241 and I am wondering if the new changes are causing this. The thread is looping in map_user_kiobuf and it cannot be killed.

  PID USER  PRI NI SIZE RSS SHARE STAT %CPU %MEM   TIME COMMAND
 4281 root   19  0  564 548   444 R    99.6  0.0 108:21 dd

Any help is appreciated.

Thanks,
Skumar
On Fri, Oct 15, 2004 at 05:13:25PM -0400, Satish Kumar wrote:
I have a filesystem driver using the map_user_kiobuf
error = map_user_kiobuf(WRITE, user_mem, (unsigned long)buf, length_to_be_written);
On this new kernel all 'dd' with bs>16k hangs the process and 'top' indicates it's spinning.
cheetah:/home# dd if=/dev/zero of=test bs=16834 count=10 <---hangs-->
I see that this function has changed a bit in 2.4.21-241 and I am wondering if the new changes are causing this. The thread is looping in map_user_kiobuf and it cannot be killed.
Are you sure the problem is in map_user_kiobuf()? That function is used for raw I/O, and that certainly works with 16k transfers.

I would enable sysrq in /etc/sysconfig/sysctl and do sysrq-t / sysrq-r on the console and check the backtrace of your process. With a few samples you can see where the time is spent.

-Andi
I rebuilt the kernel and added printk() calls to confirm where we are getting stuck. Our module was working fine on 2.4.21-143, which was our first port on SuSE-amd64. The block size I am using is GREATER than 16k.
	printk("entering point 2 \n");   <== No message after this.

	/* Try to fault in all of the necessary pages */
	down_read(&mm->mmap_sem);
	/* rw==READ means read from disk, write into memory area */
	err = get_user_pages_pte_pin(current, mm, va, pgcount,
				     (rw == READ), 0, iobuf->maplist, NULL);
	up_read(&mm->mmap_sem);
	/* get_user_pages returns the amount of mapped pages,
	 * which can be less than the amount of requested pages
	 * in some cases. To avoid surprises downstream, we
	 * unmap and return an error in those cases. -bjornw
	 */
	printk("entering point 3 \n");
	if (err > 0)
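The caller-side contract described in the -bjornw comment above (a short count from get_user_pages is treated as failure and everything is unmapped) can be modeled in plain user-space C. This is only an illustrative sketch: stub_get_user_pages and map_pages are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>

/* Hypothetical stand-in for get_user_pages(): pretend only
 * 'available' of the requested pages can be faulted in, so the
 * return value may be a short count. */
static int stub_get_user_pages(int requested, int available)
{
    return requested <= available ? requested : available;
}

/* Mirrors the pattern in map_user_kiobuf: a short count is treated
 * as failure, the caller would unmap everything, and an error is
 * returned (the real code returns -EFAULT). */
static int map_pages(int pgcount, int available)
{
    int err = stub_get_user_pages(pgcount, available);
    if (err != pgcount) {
        /* unmap_kiobuf(iobuf); would go here in the kernel */
        return -1;
    }
    return 0;
}
```

With enough pages available the mapping succeeds; with fewer available than requested, the whole mapping is rejected rather than returned partially filled.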
I enabled sysrq, but I am not getting any stack trace for the dd process (PID 4075).
Oct 16 00:18:58 porting-10 kernel:
Oct 16 00:18:58 porting-10 kernel: tcsh S 0000000000000000 2760
4045 4044 4207 (NOTLB)
Oct 16 00:18:58 porting-10 kernel:
Oct 16 00:18:58 porting-10 kernel: Call Trace:
[init_level4_pgt+0/4096]{init_level4_pgt+0}
Oct 16 00:18:58 porting-10 kernel: Call Trace:
[<ffffffff80101000>]{init_level4_pgt+0}
Oct 16 00:18:58 porting-10 kernel:
[system_call+119/124]{system_call+119}
[sys_rt_sigsuspend+213/256]{sys_rt_sigsuspend+213}
Oct 16 00:18:58 porting-10 kernel:
[<ffffffff801100b3>]{system_call+119}
[<ffffffff8010f205>]{sys_rt_sigsuspend+213}
Oct 16 00:18:58 porting-10 kernel:
[ptregscall_common+103/172]{ptregscall_common+103}
Oct 16 00:18:58 porting-10 kernel:
[<ffffffff801103eb>]{ptregscall_common+103}
Oct 16 00:18:58 porting-10 kernel: dd R 000001000b7d5b00 0
4075 997 (NOTLB)
Oct 16 00:18:58 porting-10 kernel: sshd S ffffffffa0009c82 0
4076 578 4078 4042 (NOTLB)
Oct 16 00:18:58 porting-10 kernel:
Oct 16 00:18:58 porting-10 kernel: Call Trace:
[init_level4_pgt+0/4096]{init_level4_pgt+0}
Oct 16 00:18:58 porting-10 kernel: Call Trace:
[<ffffffff80101000>]{init_level4_pgt+0}
Oct 16 00:18:58 porting-10 kernel:
[schedule_timeout+37/240]{schedule_timeout+37}
[refile_buffer+16/48]{__refile_buffer+64}
Oct 16 00:18:58 porting-10 kernel:
[<ffffffff80130225>]{schedule_timeout+37}
[<ffffffff80157c10>]{__refile_buffer+64}
Oct 16 00:18:58 porting-10 kernel:
[unix_stream_data_wait+366/368]{unix_stream_data_wait+270}
[_end+531682729/213175910
Thx,
Skumar
-- Check the List-Unsubscribe header to unsubscribe For additional commands, email: suse-amd64-help@suse.com
I tried using a raw device and saw that it hangs too.
porting-10 home/perf-36# fdisk -l
Disk /dev/hda: 255 heads, 63 sectors, 9729 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1      6001  48203001   83  Linux
/dev/hda2          6002      9728  29937127+  82  Linux swap
porting-10 home/perf-36# raw /dev/raw/raw2 3 2
/dev/raw/raw2: bound to major 3, minor 2
porting-10 home/skumar# raw -q /dev/raw/raw2
/dev/raw/raw2: bound to major 3, minor 2
porting-10 home/skumar#
porting-10 home/perf-36# dd if=/dev/zero of=/dev/raw/raw2 bs=65536 count=1
<----hangs--->
What seems to be happening when using block size >= 16k is that page 1 comes in as "pinned", so __get_user_pages unpins page 0 and retries infinitely. It's surprising that page 1 is pinned when we are trying to do a fresh mapping.
Using bs=1024 works:
porting-10 home/skumar# dd if=/dev/zero of=/dev/raw/raw2 bs=1024 count=1
1+0 records in
1+0 records out
What seems to be happening when using block size >= 16k is that page 1 comes in as "pinned", so __get_user_pages unpins page 0 and retries infinitely. It's surprising that page 1 is pinned when we are trying to do a fresh mapping.
I think it's related to you using /dev/zero. /dev/zero always hands out the same zero page, and locking it multiple times doesn't work. I filed a bug.

-Andi
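From user space the shared zero page is invisible; all a program sees is that /dev/zero reads back as zeros. A minimal sketch of that observable behavior (the read_zeros helper is illustrative, and the single-shared-page backing is a kernel implementation detail this code cannot demonstrate):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Read 'len' bytes (up to 64) from /dev/zero and verify that every
 * byte is zero.  Returns 1 on success, 0 on any failure. */
static int read_zeros(size_t len)
{
    unsigned char buf[64];
    int ok = 1;
    int fd;

    if (len > sizeof(buf))
        return 0;
    fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return 0;
    if (read(fd, buf, len) != (ssize_t)len)
        ok = 0;
    for (size_t i = 0; ok && i < len; i++)
        if (buf[i] != 0)
            ok = 0;
    close(fd);
    return ok;
}
```

This is why switching the dd input away from /dev/zero sidesteps the problem: a regular file or a different device is backed by distinct pages rather than one shared page pinned repeatedly.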
Thanks, dd works with a non-/dev/zero input file. Could you please pass me the bug number and tell me how I can track it?

Thx,
Satish
participants (3)
- Andi Kleen
- Satish Kumar
- skumar