map_user_kiobuf on 2.4.21-241-numa kernel running suse 8.1 amd64

Hi,

I have a filesystem driver using the map_user_kiobuf

    error = map_user_kiobuf(WRITE, user_mem, (unsigned long)buf, length_to_be_written);

On this new kernel all 'dd' with bs>16k hangs the process and 'top' indicates it's spinning.

    cheetah:/home# dd if=/dev/zero of=test bs=16834 count=10
    <---hangs-->

I see that this function has changed a bit in 2.4.21-241 and I am wondering if the new changes are causing this. The thread is looping in map_user_kiobuf and it cannot be killed.

      PID USER  PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
     4281 root   19   0   564  548   444 R    99.6  0.0 108:21 dd

Any help is appreciated.

Thanks,
Skumar

On Fri, Oct 15, 2004 at 05:13:25PM -0400, Satish Kumar wrote:
> I have a filesystem driver using the map_user_kiobuf
> error = map_user_kiobuf(WRITE, user_mem, (unsigned long)buf, length_to_be_written);
> On this new kernel all 'dd' with bs>16k hangs the process and 'top' indicates it's spinning.
> cheetah:/home# dd if=/dev/zero of=test bs=16834 count=10 <---hangs-->
> I see that this function has changed a bit in 2.4.21-241 and i am wondering if the new changes are causing this. The thread is looping in map_user_kiobuf and it can not be killed.

Are you sure the problem is in map_user_kiobuf()? That function is used for raw I/O, and that certainly works with 16k transfers.

I would enable sysrq in /etc/sysconfig/sysctl and do sysrq-t / sysrq-r on the console and check the backtrace of your process. With a few samples you can see where the time is spent.

-Andi

I rebuilt the kernel and added printk calls to confirm where we are getting stuck. Our module was working fine on 2.4.21-143, which was our first port on SuSE amd64. The block size I am using is GREATER than 16k.

    printk("entering point 2 \n");      <== No message after this.

    /* Try to fault in all of the necessary pages */
    down_read(&mm->mmap_sem);
    /* rw==READ means read from disk, write into memory area */
    err = get_user_pages_pte_pin(current, mm, va, pgcount,
                                 (rw==READ), 0, iobuf->maplist, NULL);
    up_read(&mm->mmap_sem);

    /* get_user_pages returns the amount of mapped pages,
     * which can be less than the amount of requested pages
     * in some cases. To avoid surprises downstream, we
     * unmap and return an error in those cases. -bjornw */
    printk("entering point 3 \n");
    if(err > 0)

I enabled sysrq, but I am not getting any stack trace for the dd process '4075':

    Oct 16 00:18:58 porting-10 kernel:
    Oct 16 00:18:58 porting-10 kernel: tcsh S 0000000000000000 2760 4045 4044 4207 (NOTLB)
    Oct 16 00:18:58 porting-10 kernel:
    Oct 16 00:18:58 porting-10 kernel: Call Trace: [init_level4_pgt+0/4096]{init_level4_pgt+0}
    Oct 16 00:18:58 porting-10 kernel: Call Trace: [<ffffffff80101000>]{init_level4_pgt+0}
    Oct 16 00:18:58 porting-10 kernel: [system_call+119/124]{system_call+119} [sys_rt_sigsuspend+213/256]{sys_rt_sigsuspend+213}
    Oct 16 00:18:58 porting-10 kernel: [<ffffffff801100b3>]{system_call+119} [<ffffffff8010f205>]{sys_rt_sigsuspend+213}
    Oct 16 00:18:58 porting-10 kernel: [ptregscall_common+103/172]{ptregscall_common+103}
    Oct 16 00:18:58 porting-10 kernel: [<ffffffff801103eb>]{ptregscall_common+103}
    Oct 16 00:18:58 porting-10 kernel: dd R 000001000b7d5b00 0 4075 997 (NOTLB)
    Oct 16 00:18:58 porting-10 kernel: sshd S ffffffffa0009c82 0 4076 578 4078 4042 (NOTLB)
    Oct 16 00:18:58 porting-10 kernel:
    Oct 16 00:18:58 porting-10 kernel: Call Trace: [init_level4_pgt+0/4096]{init_level4_pgt+0}
    Oct 16 00:18:58 porting-10 kernel: Call Trace: [<ffffffff80101000>]{init_level4_pgt+0}
    Oct 16 00:18:58 porting-10 kernel: [schedule_timeout+37/240]{schedule_timeout+37} [refile_buffer+16/48]{__refile_buffer+64}
    Oct 16 00:18:58 porting-10 kernel: [<ffffffff80130225>]{schedule_timeout+37} [<ffffffff80157c10>]{__refile_buffer+64}
    Oct 16 00:18:58 porting-10 kernel: [unix_stream_data_wait+366/368]{unix_stream_data_wait+270} [_end+531682729/213175910

Thx,
Skumar

I tried using the raw device and saw that it hangs too.

    porting-10 home/perf-36# fdisk -l

    Disk /dev/hda: 255 heads, 63 sectors, 9729 cylinders
    Units = cylinders of 16065 * 512 bytes

       Device Boot    Start       End    Blocks   Id  System
    /dev/hda1   *         1      6001  48203001   83  Linux
    /dev/hda2          6002      9728  29937127+  82  Linux swap

    porting-10 home/perf-36# raw /dev/raw/raw2 3 2
    /dev/raw/raw2: bound to major 3, minor 2
    porting-10 home/skumar# raw -q /dev/raw/raw2
    /dev/raw/raw2: bound to major 3, minor 2
    porting-10 home/perf-36# dd if=/dev/zero of=/dev/raw/raw2 bs=65536 count=1
    <----hangs--->

What seems to be happening when using block size >= 16k is that page 1 is coming in as "pinned", so __get_user_pages unpins page 0 and retries infinitely. It's surprising that page 1 is pinned when we are trying to do a fresh map.

Using bs=1024 works:

    porting-10 home/skumar# dd if=/dev/zero of=/dev/raw/raw2 bs=1024 count=1
    1+0 records in
    1+0 records out
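For readers following the "page 0" / "page 1" discussion, here is a minimal user-space sketch of how many pages a transfer touches. It assumes 4 KiB pages (the usual x86-64 default); the helper name pages_spanned and the SKETCH_* macros are made up for illustration and are not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual default on x86-64. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper (not a kernel function): number of pages
 * touched by the byte range [va, va + len). */
static size_t pages_spanned(uintptr_t va, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t first = va >> SKETCH_PAGE_SHIFT;             /* page of first byte */
    uintptr_t last  = (va + len - 1) >> SKETCH_PAGE_SHIFT; /* page of last byte  */

    return (size_t)(last - first + 1);
}
```

Under that assumption, a page-aligned buffer with bs=1024 touches a single page, so there is no "page 1" to collide with, while bs=16834 touches five pages and bs=65536 touches sixteen. This only shows when more than one page has to be pinned; it does not by itself explain why the hang was reported to start at 16k.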

> What seems to be happening when using block size >= 16k is page 1 is coming in as "pinned", so __get_user_pages unpins page 0 and retries infinitely. It's surprising that page 1 is pinned when we are trying to do a fresh map.

I think it's related to you using /dev/zero. /dev/zero always hands out the same zero page, and locking it multiple times doesn't work. I filed a bug.

-Andi
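The failure mode described above can be illustrated with a small user-space toy model. This is only a sketch, not the kernel's get_user_pages_pte_pin code: the function pin_all_pages, the MAX_FRAMES limit, and the retry cap are all assumptions made for illustration. It models "pin every page of the buffer; if a page is found already pinned, unpin everything and retry", and shows that the attempt can never succeed when several virtual pages are backed by the same physical page.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 64   /* arbitrary size of the toy "physical memory" */

/* Toy model (hypothetical, not kernel code).
 * frame_of[i] is the physical frame backing virtual page i.
 * Returns true if all pages could be pinned within max_retries attempts.
 * The hang reported in this thread corresponds to retrying without a cap. */
static bool pin_all_pages(const int *frame_of, size_t npages, int max_retries)
{
    for (int attempt = 0; attempt < max_retries; attempt++) {
        bool pinned[MAX_FRAMES] = { false };  /* fresh state = everything unpinned */
        size_t i;

        for (i = 0; i < npages; i++) {
            int f = frame_of[i];

            if (f < 0 || f >= MAX_FRAMES)
                return false;                 /* outside the toy model */
            if (pinned[f])
                break;                        /* already pinned: give up this attempt */
            pinned[f] = true;
        }

        if (i == npages)
            return true;                      /* every page pinned exactly once */
        /* otherwise: all pins are dropped and the loop retries */
    }
    return false;                             /* never made progress */
}
```

With distinct frames, for example {3, 4, 5, 6}, the first attempt succeeds. With every page backed by the same frame, for example {0, 0, 0, 0} as a stand-in for a buffer filled from the shared zero page, each attempt fails at page 1 no matter how many retries are allowed. A single page always succeeds, which is consistent with bs=1024 working.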

Thanks, dd works when the input file ('if') is not /dev/zero. Could you please pass me the bug number and tell me how I can track the bug?

Thx,
Satish