Re: [opensuse-kernel] ext3 - ENOMEM on file write

3 May 2010

      ...
...
...
On 5/3/2010 at 03:05 PM, in message <20100503210507.GI3470@quack.suse.cz>,
Jan Kara <jack@suse.cz> wrote:
On Mon 03-05-10 14:23:50, Cameron Seader wrote:
...
...
On 5/3/2010 at 01:47 PM, in message <20100503194736.GH3470@quack.suse.cz>,
Jan Kara <jack@suse.cz> wrote:
Hello,
On Mon 03-05-10 13:12:40, Cameron Seader wrote:
...
First, we're given an inode 'ainode', which should be the correct inode
for the file we're looking at. (If it were incorrect, we would have
gotten an error much earlier.)
If we have iget, we call iget. The 2.6.16.60-* kernels lack iget, I
believe, so instead we do:
  So we are talking about SLE10 based kernels, right? In fact these kernels
do have iget() but I guess you do not want to do all the writing by hand
and want to use standard write path and thus you need open file descriptor
for which you need a dentry...
No, this was my mistake. I thought the lack of an 'iget' symbol in the
core meant that it wasn't available, but iget itself is just a static
inline function, so it wouldn't be in there. We use iget if it's
available, so we are using iget here.
...
...
fid.i32.ino = ainode;
fid.i32.gen = 0;
dp = afs_cacheSBp->s_export_op->fh_to_dentry(afs_cacheSBp,
                                             &fid, sizeof(fid),
                                             FILEID_INO32_GEN);
filp = dentry_open(dp, mntget(afs_cacheMnt), O_RDWR);
  Hmm, so about which kernel are we speaking? fh_to_dentry has been
introduced only in 2.6.24...
Yes, sorry, that's my mistake. With iget, we actually call:
tip = iget(afs_cacheSBp, (u_long) ainode);
dp = d_alloc_anon(tip);
tip->i_flags |= MS_NOATIME;
filp = dentry_open(dp, mntget(afs_cacheMnt), O_RDWR);
  OK.
...
...
...
However, there would almost always be several reads of the same file
between successive writes. (Again, in an 'open(); read(); close();'
fashion) But they are probably all happening very quickly; I assume the
cache for the stuff in this file is thrashing.
  Well, the cache could be thrashing but still you'll get ENOMEM only if
the kernel cannot find enough memory to pull in a page you are writing to.
And that should not happen unless the machine has real problems. My
personal tip would be that your code leaks some memory (or reference or so)
and thus kernel really gets out of memory after enough reading / writing...
To be clear, I mean the OpenAFS cache is thrashing, not kernel memory
caches et al... I just meant to say that this particular file is getting
written to and read from a lot.
Here is output from kmem -i
crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  4089940      15.6 GB         ----
      FREE  1587155       6.1 GB   38% of TOTAL MEM
  Indeed a lot of free memory...
...
Seems like we have enough memory. Do you know why we could be getting an
ENOMEM at all? Is there anything in an ext2/3 write that could require
allocating a lot of memory?
  On a standard write path, there's not much ext2/3-specific code, and I
don't see much potential for returning ENOMEM, especially with this much
free memory. Looking at the generic code,
generic_file_buffered_write has:
                if (unlikely(sigismember(&current->pending.signal, SIGKILL))) {
                        /*
                         * Must not hang almost forever in D state in
                         * presence of sigkill and lots of ram/swap
                         * (think during OOM).
                         */
                        status = -ENOMEM;
                        break;
                }
So maybe this could be the path we are taking?
A look at the core makes it look very much to me like that is what it is
(hooray). Can you confirm the following?

crash> print ((struct task_struct*)0xffff8103432a2080)->pending.signal
$5 = {
  sig = {256}
}

SIGKILL is 9, 9-1==8, and (1 & (256 >> 8)) == 1. So, if I'm reading
sigismember correctly, yes, we have a SIGKILL pending. A little C test
program confirms, but I'd like to get confirmation from someone who's
actually used to the Linux kernel code :)
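
For reference, here is a minimal userspace sketch of that bit test. It is not
the kernel's sigismember() and not the test program mentioned above; the
names mask_has_signal, report_pending and SIGKILL_NR are made up for
illustration, and it assumes the whole signal set fits in a single 64-bit
word, as the one-element 'sig' array in the crash output suggests.

```c
#include <assert.h>
#include <stdio.h>

/* SIGKILL is signal number 9 on Linux (named SIGKILL_NR here to avoid
 * clashing with <signal.h>). */
#define SIGKILL_NR 9

/* Hypothetical helper: signal N is stored in bit N-1 of the mask word.
 * Returns 1 if the signal is set in the mask, 0 otherwise. */
int mask_has_signal(unsigned long long mask, int sig)
{
    assert(sig >= 1 && sig <= 64);
    return (int)((mask >> (sig - 1)) & 1ULL);
}

/* Hypothetical helper: print whether SIGKILL is set in a pending mask. */
void report_pending(unsigned long long mask)
{
    printf("mask=%llu SIGKILL pending: %d\n",
           mask, mask_has_signal(mask, SIGKILL_NR));
}
```

Calling report_pending(256) exercises the value from the core: 256 is 1 << 8,
which is the bit for signal 9.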

I don't suppose there's any way to tell if this is caused by the OOM
killer, is there? Any structures or something in the core I can
analyze to see if it's been activated for some reason?

Thanks,
Cameron
