[Bug 913885] New: PC with Intel DH55TC mainboard fails to resume, if GPT/UEFI is used
http://bugzilla.opensuse.org/show_bug.cgi?id=913885

            Bug ID: 913885
           Summary: PC with Intel DH55TC mainboard fails to resume, if GPT/UEFI is used
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: 13.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: bjoernv@arcor.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 620163
  --> http://bugzilla.opensuse.org/attachment.cgi?id=620163&action=edit
Kernel 3.12.16 with failed resume

I used openSUSE 13.1 successfully with a DOS-partitioned hard disk. Suspend/resume worked fine. Some weeks ago I replaced the old hard disk with a GPT-partitioned SSD and a GPT-partitioned hard disk (3 TB) and upgraded to openSUSE 13.2. Suspend still works, but resume fails often (though not always). I get the following error (the complete "dmesg" output is attached):

[ 3.762641] PM: 0x02551000 in e820 nosave region: [mem 0x02551000-0x02551fff]
[ 3.808021] PM: Read 6888700 kbytes in 0.28 seconds (24602.50 MB/s)
[ 3.809098] PM: Error -14 resuming
[ 3.809116] PM: Failed to load hibernation image, recovering.
[ 3.854094] PM: Basic memory bitmaps freed
[ 3.854096] Restarting tasks ... done.

My setup:
- openSUSE 13.2
- Intel mainboard DH55TC, BIOS TCIBX10H.86A.0048.2011.1206.1342 12/06/2011 (latest BIOS)
- UEFI boot with GRUB 2
- Kernel 3.12.36 vanilla (I am trying to get similar logs for the openSUSE 13.2 standard kernels and the kernels from the Kernel_stable repository)
- as SLEEP_MODULE I use the "kernel" method: /etc/pm/config.d/sleep-module.config contains "SLEEP_MODULE=kernel"

-- 
You are receiving this mail because:
You are on the CC list for the bug.
--- Comment #1 from Björn Voigt
--- Comment #9 from Björn Voigt
--- Comment #10 from Björn Voigt
--- Comment #11 from Mark Scott
--- Comment #12 from Joey Lee
> Created attachment 620573 [details]
> Dmesg output after suspend (Kernel 3.18.3-1.gc3e148f-desktop)

It looks like there is a bug in e820_mark_nosave_regions(): it misjudges a hole when an e820 region is not aligned to PAGE_SIZE:

void __init e820_mark_nosave_regions(unsigned long limit_pfn)
{
        int i;
        unsigned long pfn = 0;

        for (i = 0; i < e820.nr_map; i++) {
                struct e820entry *ei = &e820.map[i];

                if (pfn < PFN_UP(ei->addr))
                        register_nosave_region(pfn, PFN_UP(ei->addr));  /* register a nosave region if there is a hole between 2 e820 entries */

                pfn = PFN_DOWN(ei->addr + ei->size);
[...]

The purpose of the block above is to compare the end PFN of the previous region with the start PFN of the current region. It is used to find a hole between two e820 entries and register that hole in the nosave regions list. The code works as long as all e820 entries are aligned to PAGE_SIZE, but it misbehaves for non-aligned entries. For example, from dmesg:

Aligned case:

[ 0.000000] reserve setup_data: [mem 0x000000000270b658-0x00000000cbc4efff] usable
[ 0.000000] reserve setup_data: [mem 0x00000000cbc4f000-0x00000000cbc93fff] ACPI NVS

PFN_DOWN(0xcbc4efff + 1) = (0xCBC4F000 >> PAGE_SHIFT) = 0xCBC4F
PFN_UP(0xcbc4f000) = ((0xcbc4f000 + PAGE_SIZE - 1) >> PAGE_SHIFT)
                   = ((0xcbc4f000 + 0x1000 - 1) >> PAGE_SHIFT)
                   = (0xCBC4FFFF >> PAGE_SHIFT)
                   = 0xCBC4F

if (0xCBC4F < 0xCBC4F)    /* NOT TRUE! */
        register_nosave_region(...);

Non-aligned case:

[ 0.000000] reserve setup_data: [mem 0x0000000000100000-0x0000000002576017] usable
[ 0.000000] reserve setup_data: [mem 0x0000000002576018-0x0000000002585857] usable

PFN_DOWN(0x2576017 + 1) = (0x2576018 >> PAGE_SHIFT) = 0x2576
PFN_UP(0x2576018) = ((0x2576018 + PAGE_SIZE - 1) >> PAGE_SHIFT)
                  = ((0x2576018 + 0x1000 - 1) >> PAGE_SHIFT)
                  = (0x2577017 >> PAGE_SHIFT)
                  = 0x2577

if (0x2576 < 0x2577)    /* TRUE! Misjudgment! */
        register_nosave_region(0x2576, 0x2577);    /* registered as a nosave region */

The issue happens randomly because the non-aligned e820 entries are generated by e820_reserve_setup_data(): the setup data is allocated by the kernel during early boot, so its address within the usable region varies from boot to boot. I think this is an old issue in the code that only became visible after I added a patch that checks whether the e820 regions persist between hibernation and resume. I am trying to make a patch to fix this issue.
--- Comment #13 from Joey Lee
--- Comment #14 from Joey Lee
--- Comment #15 from Björn Voigt
--- Comment #16 from Björn Voigt
--- Comment #17 from Joey Lee
> I am currently testing the patch "0001-x86-mm-hibernate-Fix-misjudgment-of-nosave-region-to.patch". At first I tested it with kernel 3.12.36. This kernel worked reliably with BIOS/GPT booting on my system. The original patch didn't apply cleanly, so I refreshed it.
> So far, hibernate/resume works reliably with this patch and kernel 3.12.36. I will continue testing. Later I will test kernel 3.18.*.

Many thanks for testing my patch. Yinghai Lu raised a better idea: remove the E820_RESERVED_KERN type entirely, because we already use memblock instead of the e820 table to reserve setup_data: http://lists-archives.com/linux-kernel/28235454-x86-mm-hibernate-fix-misjudg...
--- Comment #18 from Joey Lee
--- Comment #19 from Björn Voigt
--- Comment #20 from Björn Voigt
--- Comment #21 from Björn Voigt
--- Comment #22 from Joey Lee
> I tested the new patch "kill_e820_reserved_kern.patch" for three days. I applied the patch to kernel-desktop-3.18.5-1.1.gf378da4.x86_64 from the Kernel_stable repository.
> Suspend and resume worked fine with this patch on my Intel DH55TC system.

Great!

> But in two of five cases I had a minor problem: during resume, the screen was black for 1-2 minutes and the keyboard LED was also off. At first I thought that resume had failed, but 1-2 minutes later the keyboard LED came on and the graphics switched on as well.

Hmm... Does this issue also happen without kill_e820_reserved_kern.patch applied?

> I also tested the patch on another notebook (Lenovo IdeaPad U430 with an Intel Core i5 Haswell CPU). Unfortunately, resume failed on this notebook: it rebooted during resume. I have no log entries for the failed resume.

The same thing needs to be confirmed here: does this issue also happen without kill_e820_reserved_kern.patch applied?

> The problems with the notebook and the black screen problem on my Intel DH55TC system are probably unrelated to this patch. I will test again without the patch.

That's good, thanks!
--- Comment #23 from Björn Voigt
--- Comment #24 from Björn Voigt
--- Comment #25 from Joey Lee
> I checked suspend/resume with the original openSUSE Kernel_stable kernel 3.18.5-1.gf378da4-desktop on my Intel DH55TC system. The black screen / keyboard LED off problem for 1-2 minutes during/after resume also appears with this kernel, so it is unrelated to the latest patch (kill_e820_reserved_kern.patch).

Great! Another concern is the Lenovo machine you raised in comment #19:

> I also tested the patch on another notebook (Lenovo IdeaPad U430 with an Intel Core i5 Haswell CPU). Unfortunately, resume failed on this notebook: it rebooted during resume. I have no log entries for the failed resume.

Is the resume failure caused by kill_e820_reserved_kern.patch? Thanks
--- Comment #26 from Björn Voigt
--- Comment #27 from Björn Voigt
--- Comment #28 from Björn Voigt
--- Comment #29 from Joey Lee
> Created attachment 622907 [details]
> Dmesg output after failed resume (Kernel 3.12.37-desktop-bv2, patched Kernel_stable)

Hmm... Based on the dmesg, the hibernation subsystem didn't find any hibernation image, so the system didn't resume:

[ 3.472078] PM: Basic memory bitmaps freed
[ 3.474816] PM: Starting manual resume from disk
[ 3.474819] PM: Hibernation image partition 8:2 present
[ 3.474820] PM: Looking for hibernation image.
[ 3.475052] PM: Image not found (code -22)
[ 3.475056] PM: Hibernation image not present or could not be loaded.

But currently I have no idea why the hibernation image wasn't found.

Another doubtful thing is the ioremap warning when trying to add a PCI device:

[ 0.267916] ------------[ cut here ]------------
[ 0.267921] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xb3/0xc0()
[ 0.267922] ioremap on RAM pfn 0x2551

According to the kernel source, 0x2551000 should be PG_reserved System RAM, and the kernel tries to ioremap it. The message above is only a warning, but it confuses me: the pages of setup_data are reserved, so why is the message emitted?
--- Comment #30 from Joey Lee
--- Comment #31 from Björn Voigt
--- Comment #32 from Björn Voigt
--- Comment #33 from Björn Voigt
--- Comment #34 from Björn Voigt
--- Comment #35 from Björn Voigt
--- Comment #36 from Björn Voigt
--- Comment #37 from Björn Voigt
--- Comment #38 from Joey Lee
> After lots of successful suspend/resume cycles in the last few days, I again had a failed resume with kernel 3.12.38 and kill_e820_reserved_kern.patch.
> [ 3.334134] PM: Using 3 thread(s) for decompression.
> PM: Loading and decompressing image data (1884119 pages)...
> [ 3.334138] PM: Image mismatch: memory size

The above is another strange thing: the memory size changed on resume. This issue also shows up in the dmesg from comment #10. That was before you applied kill_e820_reserved_kern.patch, so it should NOT be related to kill_e820_reserved_kern.patch; it is a separate issue.

Before tracing this problem further, I have a question about the hibernation mode on your machine. Are you using the [platform] mode rather than another mode?

$ cat /sys/power/disk
[platform] shutdown reboot suspend

Thanks
--- Comment #39 from Björn Voigt
--- Comment #40 from Joey Lee
> Yes, I have the same output as you (platform method):
> $ cat /sys/power/disk
> [platform] shutdown reboot suspend
> I use the "kernel" method for suspend/resume, and systemd uses pm-utils:
> $ cat /etc/pm/config.d/sleep-module.config
> SLEEP_MODULE=kernel
> HIBERNATE_RESUME_POST_VIDEO="yes"
> $ rpm -q systemd pm-utils
> systemd-210-25.12.1.x86_64
> pm-utils-1.4.1-38.4.1.x86_64

Thanks for your information; at least we have confirmed that the BIOS _S4 method is used.
--- Comment #41 from Joey Lee
--- Comment #42 from Joey Lee
> Hi Björn,
>
> Is the resume issue (hibernation image not found) 100% reproducible with the "patched Kernel 3.12.37"? And when the issue happens, does the "ioremap on RAM pfn 0x2551" warning always appear?
>
> [ 0.267916] ------------[ cut here ]------------
> [ 0.267921] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xb3/0xc0()
> [ 0.267922] ioremap on RAM pfn 0x2551
>
> And, if possible, please test Yinghai Lu's patch on the 3.19 kernel.
>
> Thanks

This is the mailing list thread on x86/mm upstream discussing the __ioremap_check_ram() warning: http://marc.info/?l=linux-kernel&m=142492905425130
It is a separate issue and not related to any of our patches.
--- Comment #43 from Joey Lee
--- Comment #45 from Björn Voigt