On 5/3/2010 at 03:05 PM, in message <20100503210507.GI3470@quack.suse.cz>, Jan Kara <jack@suse.cz> wrote:
On Mon 03-05-10 14:23:50, Cameron Seader wrote:
On 5/3/2010 at 01:47 PM, in message <20100503194736.GH3470@quack.suse.cz>, Jan Kara <jack@suse.cz> wrote:
Hello,
On Mon 03-05-10 13:12:40, Cameron Seader wrote:
First, we're given an inode 'ainode', which should be the correct inode for the file we're looking at. (If it were incorrect, we would have gotten an error much earlier.)
If we have iget, we call iget. The 2.6.16.60-* kernels lack iget, I believe, so instead we do:

So we are talking about SLE10 based kernels, right? In fact these kernels do have iget() but I guess you do not want to do all the writing by hand and want to use standard write path and thus you need open file descriptor for which you need a dentry...
No, this was my mistake. I thought the lack of an 'iget' symbol in the core meant that it wasn't available, but iget itself is just a static inline function, so it wouldn't be in there. We use iget if it's available, so we are using iget here.
    fid.i32.ino = ainode;
    fid.i32.gen = 0;
    dp = afs_cacheSBp->s_export_op->fh_to_dentry(afs_cacheSBp, &fid, sizeof(fid), FILEID_INO32_GEN);
    filp = dentry_open(dp, mntget(afs_cacheMnt), O_RDWR);

Hmm, so about which kernel are we speaking? fh_to_dentry has been introduced only in 2.6.24...
Yes, sorry, that's my mistake. With iget, we actually call:
    tip = iget(afs_cacheSBp, (u_long) ainode);
    dp = d_alloc_anon(tip);
    tip->i_flags |= MS_NOATIME;
    filp = dentry_open(dp, mntget(afs_cacheMnt), O_RDWR);

OK.
However, there would almost always be several reads of the same file between successive writes (again, in an 'open(); read(); close();' fashion). But they are probably all happening very quickly; I assume the cache for the stuff in this file is thrashing.

Well, the cache could be thrashing but still you'll get ENOMEM only if the kernel cannot find enough memory to pull in a page you are writing to. And that should not happen unless the machine has real problems. My personal tip would be that your code leaks some memory (or reference or so) and thus kernel really gets out of memory after enough reading / writing...
To be clear, I mean the OpenAFS cache is thrashing, not kernel memory caches et al... I just meant to say that this particular file is getting written to and read from a lot.
Here is output from kmem -i
    crash> kmem -i
                  PAGES        TOTAL      PERCENTAGE
     TOTAL MEM  4089940      15.6 GB         ----
          FREE  1587155       6.1 GB   38% of TOTAL MEM

Indeed a lot of free memory...
Seems like we have enough memory. Do you know why we could be getting an ENOMEM at all? Is there anything in an ext2/3 write that could require allocating a lot of memory?

On a standard write path, there's not too much ext2/3 specific code and I don't see a big potential for returning ENOMEM, especially with this much free memory. Looking at the generic code, generic_file_buffered_write has:

    if (unlikely(sigismember(&current->pending.signal, SIGKILL))) {
            /*
             * Must not hang almost forever in D state in
             * presence of sigkill and lots of ram/swap
             * (think during OOM).
             */
            status = -ENOMEM;
            break;
    }

So maybe this could be the path we are taking?
A look at the core makes it look very much to me like that is what it is (hooray). Can you confirm the following?

    crash> print ((struct task_struct*)0xffff8103432a2080)->pending.signal
    $5 = {
      sig = {256}
    }

SIGKILL is 9, 9-1==8, and (1 & (256 >> 8)) == 1. So, if I'm reading sigismember correctly, yes, we have a SIGKILL pending. A little C test program confirms, but I'd like to get confirmation from someone who's actually used to the Linux kernel code :)

I don't suppose there's any way to tell if this is caused via the OOM killer, is there? Any structures or something in the core I can analyze to see if it's been activated for some reason?

Thanks,
Cameron
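
For reference, the bit test worked through above (SIGKILL is 9, so bit 8 of the mask) can be sketched in user-space C. This is only a minimal sketch, not the kernel's own code: the name sketch_sigismember and the SKETCH_BITS_PER_WORD macro are hypothetical, and it assumes the generic shift-and-mask form of sigismember, with signals numbered from 1 and the mask stored as an array of unsigned long words.

```c
#include <limits.h>

/* Bits in one word of the signal mask (64 on x86_64). Hypothetical name. */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical user-space stand-in for the kernel's sigismember().
 * Assumes signal N is stored at bit N-1 of the mask, as in the
 * arithmetic above. Returns 1 if the signal's bit is set, else 0.
 */
static int sketch_sigismember(const unsigned long *sig, int signo)
{
    unsigned long bit = (unsigned long)signo - 1;        /* signals start at 1 */
    unsigned long word = sig[bit / SKETCH_BITS_PER_WORD]; /* word holding the bit */

    return (int)(1UL & (word >> (bit % SKETCH_BITS_PER_WORD)));
}
```

Called with a one-word mask holding 256 (the value printed by crash) and signal number 9, this returns 1, which agrees with the hand calculation; with signal number 15 (SIGTERM) it returns 0.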