[opensuse-kernel] XFS leaking memory?
Hi all,

I have seen the following repeatedly, from time to time over the last few years, but only today have I probably found a way to trigger it:

* create some heavy FS workload on XFS, probably pushing the machine heavily into swap
* stop the workload (probably that "make -j" in a big C++ project is not such a good idea after all...)
* machine stays slow, as if it had no memory, and starts swapping even though there are apparently gigabytes of free memory
* free shows there is lots of memory free:

  susi:~ # free
               total       used       free     shared    buffers     cached
  Mem:       3949728    3583132     366596          0        348     199180
  -/+ buffers/cache:    3383604     566124
  Swap:      2093052     317460    1775592

* swapoff often fails, even though there should be enough memory according to "free"
* slabtop shows there is quite some amount of space in use:

   Active / Total Objects (% used)    : 160024 / 14304077 (1.1%)
   Active / Total Slabs (% used)      : 11111 / 778760 (1.4%)
   Active / Total Caches (% used)     : 117 / 220 (53.2%)
   Active / Total Size (% used)       : 104352.71K / 3073816.64K (3.4%)
   Minimum / Average / Maximum Object : 0.02K / 0.21K / 4096.00K

       OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
   13434482      0   0%    0.20K 707078       19   2828312K xfs_btree_cur
     507270      0   0%    0.39K  50727       10    202908K xfs_efi_item
     167297      8   0%    0.22K   9841       17     39364K xfs_buf_item
      24752  21011  84%    0.03K    221      112       884K size-32

* there is no way to get rid of this short of unloading the XFS kernel module. Just unmounting the filesystems does not help.

I guess this is not really a SUSE kernel bug. Where should I report this? And has anyone else seen something similar?

Running FACTORY, 2.6.38-24-desktop (Kernel:HEAD), Thinkpad X200s with 4GB RAM, but I have seen similar things at least two years ago on a then-current openSUSE server at home, also x86_64, also XFS.

Until today I had guessed it had something to do with sparse files (running lots of kvm guests from images stored on XFS seemed to trigger it sometimes), but today it was clearly triggered by "overload the machine heavily by unbounded make -j", no sparse files involved and no kvm guests running at all.

Thanks,
seife
--
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"
--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-kernel+help@opensuse.org
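[Editor's sketch] The slabtop rows in the message above can be totalled with a few lines of Python to see how much memory sits in slab caches that have almost no active objects. This is only an illustration: the rows are copied from the message, while the function names, the 1% threshold and the reading of slabtop's "K" as KiB are assumptions made for this sketch.

```python
# Data rows copied from the slabtop output quoted in this thread.
SLABTOP_ROWS = """\
13434482      0   0%  0.20K 707078  19 2828312K xfs_btree_cur
507270        0   0%  0.39K  50727  10  202908K xfs_efi_item
167297        8   0%  0.22K   9841  17   39364K xfs_buf_item
24752     21011  84%  0.03K    221 112     884K size-32
"""


def parse_rows(text):
    """Return one dict per slabtop data row (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        objs, active, _use, _obj_size, _slabs, _per_slab, cache_size, name = fields
        rows.append({
            "name": name,
            "objs": int(objs),
            "active": int(active),
            "cache_kib": int(cache_size.rstrip("K")),
        })
    return rows


def idle_caches(rows, max_active_ratio=0.01):
    """Caches whose active/total object ratio is at or below the threshold."""
    return [r for r in rows
            if r["objs"] and r["active"] / r["objs"] <= max_active_ratio]


rows = parse_rows(SLABTOP_ROWS)
idle = idle_caches(rows)
idle_kib = sum(r["cache_kib"] for r in idle)

print([r["name"] for r in idle])
print(f"{idle_kib} KiB ({idle_kib / 1024**2:.2f} GiB) held by nearly idle caches")
```

With the figures from the thread, the three xfs_* caches are the ones flagged, and together they account for a little under 3 GiB.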
On Mon, 21 Mar 2011 15:02:22 +0100 Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
> * slabtop shows there is quite some amount of space in use:
>  Active / Total Objects (% used)    : 160024 / 14304077 (1.1%)
>  Active / Total Slabs (% used)      : 11111 / 778760 (1.4%)
>  Active / Total Caches (% used)     : 117 / 220 (53.2%)
>  Active / Total Size (% used)       : 104352.71K / 3073816.64K (3.4%)
>  Minimum / Average / Maximum Object : 0.02K / 0.21K / 4096.00K
>
>      OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>  13434482      0   0%    0.20K 707078       19   2828312K xfs_btree_cur
>    507270      0   0%    0.39K  50727       10    202908K xfs_efi_item
>    167297      8   0%    0.22K   9841       17     39364K xfs_buf_item
>     24752  21011  84%    0.03K    221      112       884K size-32
After unmounting all XFS filesystems, the situation is the same; after "rmmod xfs" it looks like this:

 Active / Total Objects (% used)    : 164213 / 195172 (84.1%)
 Active / Total Slabs (% used)      : 11174 / 11187 (99.9%)
 Active / Total Caches (% used)     : 109 / 203 (53.7%)
 Active / Total Size (% used)       : 105308.29K / 110456.62K (95.3%)
 Minimum / Average / Maximum Object : 0.02K / 0.57K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 24304  20402  83%    0.03K    217      112       868K size-32
 20736  20601  99%    0.08K    432       48      1728K sysfs_dir_cache
 20034  19445  97%    0.18K    954       21      3816K vm_area_struct
 18800  14065  74%    0.19K    940       20      3760K dentry
 17169  10284  59%    0.06K    291       59      1164K size-64

Unfortunately, "unmount all xfs and rmmod xfs" is almost equivalent to rebooting the machine, so it is not really an option :-)
--
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"
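[Editor's sketch] To put a number on the difference between the two slabtop snapshots, the "Active / Total Size" summary lines can be compared directly. A minimal sketch, assuming slabtop's "K" means KiB; the two lines are copied from the thread, the regular expression and helper name are made up for this illustration.

```python
import re

# "Active / Total Size" lines copied from the two slabtop snapshots in this thread.
BEFORE = "Active / Total Size (% used) : 104352.71K / 3073816.64K (3.4%)"
AFTER = "Active / Total Size (% used) : 105308.29K / 110456.62K (95.3%)"

SIZE_LINE = re.compile(
    r"Active / Total Size \(% used\)\s*:\s*([\d.]+)K\s*/\s*([\d.]+)K"
)


def active_and_total_kib(line):
    """Extract (active, total) slab size in KiB from a slabtop summary line."""
    match = SIZE_LINE.search(line)
    if match is None:
        raise ValueError(f"not a slabtop size line: {line!r}")
    return float(match.group(1)), float(match.group(2))


active_before, total_before = active_and_total_kib(BEFORE)
active_after, total_after = active_and_total_kib(AFTER)

# How much the total shrank after "rmmod xfs", and how little the active part moved.
released_kib = total_before - total_after
active_delta_kib = active_after - active_before

print(f"total slab size dropped by {released_kib:.2f} KiB "
      f"({released_kib / 1024**2:.2f} GiB)")
print(f"active slab size changed by {active_delta_kib:+.2f} KiB")
```

The active size is nearly identical in both snapshots, so essentially all of the memory released by unloading the module had been sitting in allocated but unused slab objects.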
On Mon, 21 Mar 2011 15:07:33 +0100 Stefan Seyfried <stefan.seyfried@googlemail.com> wrote:
> Unfortunately, "unmount all xfs and rmmod xfs" is almost equivalent to rebooting the machine, so it is not really an option :-)
And of course the problem is not easy to reproduce. The second try went just fine without leaking :-(
--
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"
Hello,

On Mon 21-03-11 15:02:22, Stefan Seyfried wrote:
> I have seen the following repeatedly, from time to time during the last
> years, but only today I have probably found a way to trigger it:
>
> * create some heavy FS workload on XFS, probably pushing the machine
>   heavily into swap
> * stop the workload (probably that "make -j" in a big C++ project is not
>   such a good idea after all...)
> * machine stays slow, as if it had no memory, starts swapping even though
>   there are apparently gigabytes of free memory
> * free shows there is lots of memory free:
>   susi:~ # free
>                total       used       free     shared    buffers     cached
>   Mem:       3949728    3583132     366596          0        348     199180
>   -/+ buffers/cache:    3383604     566124
>   Swap:      2093052     317460    1775592

Well, 366 MB free isn't that much given you have 317 MB in swap and 3.9 GB of memory.

> * swapoff often fails, even though there should be enough memory according
>   to "free"
> * slabtop shows there is quite some amount of space in use:
>  Active / Total Objects (% used)    : 160024 / 14304077 (1.1%)
>  Active / Total Slabs (% used)      : 11111 / 778760 (1.4%)
>  Active / Total Caches (% used)     : 117 / 220 (53.2%)
>  Active / Total Size (% used)       : 104352.71K / 3073816.64K (3.4%)
>  Minimum / Average / Maximum Object : 0.02K / 0.21K / 4096.00K
>
>      OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>  13434482      0   0%    0.20K 707078       19   2828312K xfs_btree_cur
>    507270      0   0%    0.39K  50727       10    202908K xfs_efi_item
>    167297      8   0%    0.22K   9841       17     39364K xfs_buf_item
>     24752  21011  84%    0.03K    221      112       884K size-32

Hmm, so you have about 3 GB of memory in unused slabs. That's indeed a bug. This reminds me of one swap-over-NFS bug which was causing a similar effect for journal_handle slab. And after some digging - it was bug 554081 - I see that openSUSE 11.4 and master branches don't have the fix. Nick forgot to commit the fix to master branch and I didn't realize that either. So I'd bet on this patch... I'm also going to push it to master branch.

> * there is no way to get rid of this short of unloading the XFS kernel
>   module. Only unmounting the FS's does not help.
>
> I guess this is not really a SUSE kernel bug. Where to report this? And
> has maybe someone else seen something similar?
>
> Running FACTORY, 2.6.38-24-desktop (Kernel:HEAD), Thinkpad X200s with 4GB
> RAM, but I have seen similar things at least two years ago on a then
> current openSUSE server at home, also x86_64, also XFS.
>
> Until today I had guessed it had something to do with sparse files
> (running lots of kvm guests from images stored on XFS seemed to trigger it
> sometimes), but today it was clearly triggered by "overload the machine
> heavily by unbounded make -j", no sparse files involved and no kvm guests
> running at all.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
On Mon, 21 Mar 2011 18:53:45 +0100 Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> On Mon 21-03-11 15:02:22, Stefan Seyfried wrote:
> > I have seen the following repeatedly, from time to time during the last
> > years, but only today I have probably found a way to trigger it:
> >
> > * create some heavy FS workload on XFS, probably pushing the machine
> >   heavily into swap
> > * stop the workload (probably that "make -j" in a big C++ project is not
> >   such a good idea after all...)
> > * machine stays slow, as if it had no memory, starts swapping even though
> >   there are apparently gigabytes of free memory
> > * free shows there is lots of memory free:
> >   susi:~ # free
> >                total       used       free     shared    buffers     cached
> >   Mem:       3949728    3583132     366596          0        348     199180
> >   -/+ buffers/cache:    3383604     566124
> >   Swap:      2093052     317460    1775592
> Well, 366 MB free isn't that much given you have 317 MB in swap and 3.9
> GB of memory.

Yes, I actually switched "used" and "free" in my mind X-). Anyway, there is no process using the memory. If I add it all up, I have something like 300MB used, and I think the slabtop output clearly shows that the kernel grabbed all the memory.

> > * swapoff often fails, even though there should be enough memory according
> >   to "free"
> > * slabtop shows there is quite some amount of space in use:
> >  Active / Total Objects (% used)    : 160024 / 14304077 (1.1%)
> >  Active / Total Slabs (% used)      : 11111 / 778760 (1.4%)
> >  Active / Total Caches (% used)     : 117 / 220 (53.2%)
> >  Active / Total Size (% used)       : 104352.71K / 3073816.64K (3.4%)
> >  Minimum / Average / Maximum Object : 0.02K / 0.21K / 4096.00K
> >
> >      OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >  13434482      0   0%    0.20K 707078       19   2828312K xfs_btree_cur
> >    507270      0   0%    0.39K  50727       10    202908K xfs_efi_item
> >    167297      8   0%    0.22K   9841       17     39364K xfs_buf_item
> >     24752  21011  84%    0.03K    221      112       884K size-32
> Hmm, so you have about 3 GB of memory in unused slabs. That's indeed a
> bug. This reminds me of one swap-over-NFS bug which was causing a similar
> effect for journal_handle slab. And after some digging - it was bug 554081
> - I see that openSUSE 11.4 and master branches don't have the fix. Nick
> forgot to commit the fix to master branch and I didn't realize that
> either. So I'd bet on this patch... I'm also going to push it to master
> branch.

Thanks. Unfortunately it's not easy to reproduce, so I cannot instantly tell you if it's fixed.

Is it possible that the same bug is / was in 11.2 or 11.3? I ask because I'm pretty sure that I saw similar things on my server at home quite some time ago. The box is now 11.3, but I'm pretty sure it was 11.2 or 11.1 when I first saw it (about 1.5 years ago).

Maybe we should drop all those enterprise patches from openSUSE. Makes life easier by getting closer to mainline and only the paying customers suffer :-)

Lucky me that I did *not* report this on lkml or fsdevel :-)
--
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"
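[Editor's sketch] The used/free mix-up discussed above is easy to check by redoing the arithmetic behind the "-/+ buffers/cache" row of the quoted `free` output. All figures are copied from the thread; the variable names are made up for this sketch, and the last step (subtracting the three nearly idle XFS slab caches from "used") is only a rough cross-check that assumes slab memory is counted as "used" by `free`.

```python
# Figures in KiB, copied from the `free` output quoted in this thread.
total, used, free_kib = 3949728, 3583132, 366596
buffers, cached = 348, 199180

# Sanity check: in this output, used and free add up to total.
assert used + free_kib == total

# The "-/+ buffers/cache" row discounts buffers and page cache.
used_without_cache = used - buffers - cached
free_with_cache = free_kib + buffers + cached

# CACHE SIZE of the three nearly idle XFS slab caches from the slabtop output.
xfs_slab_kib = 2828312 + 202908 + 39364

# What remains of "used" once those slab caches are subtracted.
remainder_kib = used_without_cache - xfs_slab_kib

print(f"used - buffers - cached = {used_without_cache} KiB")
print(f"free + buffers + cached = {free_with_cache} KiB")
print(f"used minus idle XFS slabs = {remainder_kib} KiB "
      f"({remainder_kib / 1024:.0f} MiB)")
```

The first two results reproduce the "-/+ buffers/cache" row exactly, and the remainder of roughly 300 MiB is in line with the estimate of actual usage given in the message above.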
>>> On 21.03.11 at 18:53, Jan Kara <jack@suse.cz> wrote:
> Hmm, so you have about 3 GB of memory in unused slabs. That's indeed a
> bug. This reminds me of one swap-over-NFS bug which was causing a similar
> effect for journal_handle slab. And after some digging - it was bug 554081
> - I see that openSUSE 11.4 and master branches don't have the fix. Nick
> forgot to commit the fix to master branch and I didn't realize that
> either. So I'd bet on this patch... I'm also going to push it to master
> branch.

While I see the fix on master, there's nothing so far on 11.4 and 11.3, though they both appear affected - there's a certain chance that bug 681540, despite its completely different description, is actually a result of this.

Thanks, Jan
On Wed 23-03-11 11:05:15, Jan Beulich wrote:
> >>> On 21.03.11 at 18:53, Jan Kara <jack@suse.cz> wrote:
> > Hmm, so you have about 3 GB of memory in unused slabs. That's indeed a
> > bug. This reminds me of one swap-over-NFS bug which was causing a similar
> > effect for journal_handle slab. And after some digging - it was bug 554081
> > - I see that openSUSE 11.4 and master branches don't have the fix. Nick
> > forgot to commit the fix to master branch and I didn't realize that
> > either. So I'd bet on this patch... I'm also going to push it to master
> > branch.
>
> While I see the fix on master, there's nothing so far on 11.4 and
> 11.3, though they both appear affected - there's a certain chance
> that bug 681540, despite its completely different description, is
> actually a result of this.

Yup, it looks like that bug. The fix is now also committed to the openSUSE 11.3 and 11.4 branches.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
participants (3)
- Jan Beulich
- Jan Kara
- Stefan Seyfried