https://bugzilla.novell.com/show_bug.cgi?id=802625
https://bugzilla.novell.com/show_bug.cgi?id=802625#c0

Summary: Crash kernel size limitations cause dumps on large systems to fail
Classification: openSUSE
Product: openSUSE 11.4
Version: Factory
Platform: x86-64
OS/Version: openSUSE 11.4
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: lisa.mitchell@hp.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB7.4; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 3.5.30729; InfoPath.2)

I have already filed a bugzilla about this against SLES11SP3. This bugzilla is to get the user-space solution we want, makedumpfile v1.5.1, into an openSUSE release for the exposure it needs before it can be taken into a SLES release.

The version of makedumpfile currently shipping in SLES11SP2 builds a bitmap of all the memory in the system, which requires 64 MB per terabyte of system memory. As memory size increases, the crashkernel size must increase to hold this bitmap. For a 12 TB system, the bitmap must be 768 MB to dump this memory. With at least 128 MB needed for the kernel (and with various IO and driver configurations we have seen the kernel take up more space), this would require crashkernel sizes greater than 896 MB to get a successful dump of this memory size.

On upstream kernel versions 3.3 and later, crashkernel sizes are now limited to less than 512 MB, because a page table allocation now required at 512 MB bisects the space the crashkernel can occupy. See
https://lkml.org/lkml/2012/3/13/372
https://lkml.org/lkml/2012/3/13/36

This clearly will not be enough to allow a dump to succeed for a 12 TB system.
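The sizing argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the 64 MB per terabyte bitmap cost, the 128 MB kernel baseline, and the 512 MB ceiling from this report as constants. The function and constant names are made up for the example and are not part of makedumpfile or kexec-tools.

```python
# Figures taken from this report; treat them as assumptions, not measured values.
BITMAP_MB_PER_TB = 64      # bitmap cost of the non-cyclic makedumpfile
KERNEL_BASELINE_MB = 128   # minimum the capture kernel itself is said to need
UPSTREAM_LIMIT_MB = 512    # crashkernel must stay below this on 3.3+ kernels


def bitmap_mb(memory_tb):
    """Size of the full-memory bitmap for a system with memory_tb terabytes."""
    return memory_tb * BITMAP_MB_PER_TB


def required_crashkernel_mb(memory_tb):
    """Lower bound on the crashkernel reservation: bitmap plus kernel baseline."""
    return bitmap_mb(memory_tb) + KERNEL_BASELINE_MB


if __name__ == "__main__":
    for tb in (4, 12):
        need = required_crashkernel_mb(tb)
        fits = need < UPSTREAM_LIMIT_MB
        print(f"{tb} TB: bitmap {bitmap_mb(tb)} MB, "
              f"crashkernel >= {need} MB, fits under limit: {fits}")
```

This is only the lower bound; as noted in the report, IO cards and drivers can push the kernel's own share well above the 128 MB baseline.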
We have also seen indications that crashkernel sizes greater than 1 GB are needed for certain configurations of our largest DL980 systems with 4 TB of memory, and for certain IO cards or file system types that use a lot of kernel memory.

We have been testing a new upstream version of makedumpfile, v1.5.1, which uses a fixed amount of memory. The new version uses a fixed-size cyclic buffer to work through all of memory in turn. This allows the crashkernel to remain at a fixed size below the 512 MB maximum and still dump arbitrarily large memory sizes successfully. We have tested this new version of makedumpfile on a DL980 with 4 TB of memory and were able to dump the system successfully with crashkernel sizes of 256 to 386 MB, on kernel versions 3.3 and 2.6.32.

Therefore, we request that the new makedumpfile, v1.5.1, be incorporated in SLES11SP3 as the best solution for scaling dumps to 12 TB and beyond, and for preventing dumps in our largest configurations of the DL980 from failing due to lack of memory.

The following link gives the source of the upstream posted makedumpfile v1.5.1 command:
http://lists.infradead.org/pipermail/kexec/2012-December/007460.html

As part of this request, there is also a kernel patch submitted upstream that we would like included in the SLES11SP3 solution. It improves performance with the new makedumpfile, so that there is no regression in dump times.
https://lkml.org/lkml/2012/11/21/90

Reproducible: Didn't try

Steps to Reproduce:
1. Install a DL980 with 4 TB of memory, SLES11SP2, and the maximum number of IO cards, with Fusion cards or a VXVM/VXFS root file system from Symantec installed later.
2. With a 3.3 kernel, the maximum crashkernel size is less than 512 MB; with the SLES11SP2 kernel, the maximum is 896 MB.
3. Configure the maximum crashkernel size and trigger a dump.

Actual Results:
The dump will fail due to out of memory while makedumpfile attempts to create the bitmap.
Expected Results:
The dump should complete successfully.
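To make the fixed-memory idea behind the cyclic buffer concrete, here is a minimal sketch of processing pages in windows with one reusable bitmap. It is not makedumpfile's actual code: the function name, the `is_dumpable` callback, and the window size are hypothetical and chosen only to show why the bitmap size stops depending on total memory.

```python
def dump_in_cycles(total_pages, pages_per_cycle, is_dumpable):
    """Yield the page frame numbers to dump, using a bitmap of fixed size.

    The bitmap covers only pages_per_cycle pages and is reused for each
    window, so its size does not grow with total_pages.
    """
    bitmap = bytearray((pages_per_cycle + 7) // 8)

    for start in range(0, total_pages, pages_per_cycle):
        end = min(start + pages_per_cycle, total_pages)

        # Clear the buffer before reusing it for the next window.
        for i in range(len(bitmap)):
            bitmap[i] = 0

        # Pass 1: mark the pages in this window that should be dumped.
        for pfn in range(start, end):
            if is_dumpable(pfn):
                off = pfn - start
                bitmap[off >> 3] |= 1 << (off & 7)

        # Pass 2: emit the marked pages of this window.
        for pfn in range(start, end):
            off = pfn - start
            if bitmap[off >> 3] & (1 << (off & 7)):
                yield pfn


if __name__ == "__main__":
    # Toy example: 20 pages, a window of 8 pages, keep every third page.
    kept = list(dump_in_cycles(20, 8, lambda pfn: pfn % 3 == 0))
    print(kept)
```

The trade-off is that memory is scanned one window at a time rather than in a single pass over one large bitmap, which is consistent with the report's concern about dump-time regressions.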