FYI: aiotest is working fine on my SuSE 9.1 Personal box.

-----Original Message-----
From: Matthew Gregan [mailto:kinetik@orcon.net.nz]
Sent: 27 September 2004 11:33
To: suse-programming-e@suse.com
Subject: [suse-programming-e] Problems with POSIX Asynchronous IO

Hi all,

I'm having some problems with the POSIX Asynchronous IO support on a SLES 9
box I've been working with. The machine has all of the available SLES 9
updates applied.

After submitting a single IO using the POSIX AIO interface provided by
glibc, I notice that the worker thread created by glibc enters a tight loop
and consumes 100% of one CPU. Once this occurs, AIO continues to work fine
(apart from this problem), but the worker thread remains in this state
until the program exits. The kernel logs no errors while the problem
occurs, and the machine has otherwise been running fine.

I've written a small test case to reproduce the problem (attached as
aiotest.c). Build it with 'cc -Wall -o aiotest aiotest.c -lrt' and run
it--it creates an 8 MB file in the current directory, submits a single
128 kB IO, and then sleeps for 10 seconds. The test case is small and does
little error checking; for the sake of this test case, I prefer exiting to
handling any errors nicely.

Since writing the test case, I've reproduced the problem on a number of
other SLES 9 boxes, both x86 and z/Series. The same test case runs without
issue on SLES 8 and RHEL 3AS boxes, where the CPU stays close to idle, as
expected. I suspect the problem lies in the new kernel AIO library provided
as /usr/lib/librtkaio.so.1, in the kernel itself, or in an interaction
between the two. RHEL 3AS also provides librtkaio for kernel AIO; it
appears to be a new addition in SLES 9, as it was not present on my SLES 8
test boxes.
Running aiotest with LD_ASSUME_KERNEL=2.4.0 to force use of the older
pre-NPTL glibc results in the userspace AIO implementation provided by
librt.so.1 being used, and the test case runs without consuming excessive
CPU.

Any suggestions would be appreciated.

'vmstat 1' during an aiotest run shows:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa
 0  0  83356 399012  10168  61196    0    0     0     0 1012     55  0  0 100 0
 0  0  83356 399012  10168  61196    0    0     0     0 1036     93  1  0  99 0
[aiotest starts]
 2  0  83356 400204  10192  59508    0    0     0   788 1026  54722 20 29  51 0
[aiotest should be idle at this point]
 1  0  83356 398476  10192  61252    0    0     0     0 1052 111790 36 64   0 0
 1  0  83356 396684  10200  63000    0    0     0     0 1010 111877 38 62   0 0
 1  0  83356 394892  10200  64752    0    0     0     0 1011 112199 37 63   0 0
 1  0  83356 393164  10200  66504    0    0     0     0 1009 112273 39 61   0 0
 1  1  83356 391372  10204  68232    0    0     0  8740 1071 110521 35 65   0 0
 1  0  83356 389708  10212  69972    0    0     0    40 1019 111472 38 62   0 0
 1  0  83356 387916  10216  71724    0    0     0     0 1008 112259 40 60   0 0
 1  0  83356 386188  10216  73480    0    0     0     0 1009 112332 39 61   0 0
 1  0  83356 384396  10220  75232    0    0     0     0 1011 112208 38 62   0 0
 0  1  83356 383516  10220  76288    0    0     0  8372 1075  67572 25 36  21 18
[10 second sleep ends, aiotest exits]
 0  0  83356 383580  10228  76288    0    0     0    40 1013     57  0  0  95 5
 0  0  83356 383580  10228  76288    0    0     0     0 1010     52  1  0  99 0

A few seconds after starting aiotest:

$ ps -e -L -o pcpu,pid,tid,comm,wchan | grep [a]iotest
 0.0 23316 23316 aiotest2        schedule_timeout
71.7 23316 23317 aiotest2        -

Attaching with GDB and looking at the thread backtraces:

Thread 2 (Thread 1077447600 (LWP 23317)):
#0  0x400335b6 in handle_kernel_aio () from /lib/tls/librt.so.1
#1  0x401529ed in start_thread () from /lib/tls/libpthread.so.0
#2  0x400f09ca in clone () from /lib/tls/libc.so.6

Thread 1 (Thread 1075172000 (LWP 23316)):
#0  0xffffe410 in ?? ()
#1  0xbfffef08 in ?? ()
#2  0x40149bb8 in __elf_set___libc_thread_subfreeres_element___rpc_thread_destroy__ () from /lib/tls/libc.so.6
#3  0xbfffed64 in ?? ()
#4  0x400c0696 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#5  0x400c0473 in sleep () from /lib/tls/libc.so.6
#6  0x080486d7 in main ()

Further observation of the worker thread indicates that it spins inside
handle_kernel_aio(), which is part of kaio_misc.c in the rtkaio module of
glibc.

I've quickly compared the source of the rtkaio module from SLES 9 against
a RHEL 3AS box. The module source in each glibc is identical, which
suggests that the difference between the working RHEL 3AS box and the
broken SLES 9 box lies elsewhere in the glibc code, or within the kernel.

Running the test case under 'strace -f' shows thousands of syscalls like
the one below. Compared to a run on a RHEL 3AS box, the number of these
io_getevents() calls is abnormally high.

Example output from 'strace -f ./aiotest':

[...]
[pid 23317] io_getevents(0x4017f000, 0, 0xa, 0x4038894c, 0) = 0
[...the line above is repeated thousands of times...]

Cheers,
-mjg

--
Matthew Gregan  |/
               /|  kinetik@orcon.net.nz