[Bug 913885] New: PC with Intel DH55TC mainboard fails to resume, if GPT/UEFI is used
http://bugzilla.opensuse.org/show_bug.cgi?id=913885

            Bug ID: 913885
           Summary: PC with Intel DH55TC mainboard fails to resume, if GPT/UEFI is used
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: 13.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: bjoernv@arcor.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 620163
  --> http://bugzilla.opensuse.org/attachment.cgi?id=620163&action=edit
Kernel 3.12.16 with failed resume

I used openSUSE 13.1 successfully with a DOS-partitioned hard disk. Suspend/Resume worked fine.

Some weeks ago I replaced the old hard disk with a GPT-partitioned SSD and a GPT-partitioned hard disk (3 TB) and upgraded to openSUSE 13.2. Suspend still works, but Resume fails often (but not always). I get the following error (the complete "dmesg" output is attached).

  [ 3.762641] PM: 0x02551000 in e820 nosave region: [mem 0x02551000-0x02551fff]
  [ 3.808021] PM: Read 6888700 kbytes in 0.28 seconds (24602.50 MB/s)
  [ 3.809098] PM: Error -14 resuming
  [ 3.809116] PM: Failed to load hibernation image, recovering.
  [ 3.854094] PM: Basic memory bitmaps freed
  [ 3.854096] Restarting tasks ... done.

My setup:
- openSUSE 13.2
- Intel Mainboard DH55TC, BIOS TCIBX10H.86A.0048.2011.1206.1342 12/06/2011 (latest BIOS)
- UEFI boot with Grub 2
- Kernel 3.12.36 Vanilla (I will try to get similar logs for the openSUSE 13.2 standard kernels and for kernels from the Kernel_stable repository)
- as SLEEP_MODULE I use the "kernel" method: /etc/pm/config.d/sleep-module.config contains "SLEEP_MODULE=kernel"

-- 
You are receiving this mail because:
You are on the CC list for the bug.
--- Comment #1 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 620164
  --> http://bugzilla.opensuse.org/attachment.cgi?id=620164&action=edit
Suspend log file (suspending was successful)
Mark Scott <markcscott2003@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |markcscott2003@hotmail.com

--- Comment #8 from Mark Scott <markcscott2003@hotmail.com> ---
I note that you have a WDC WD30EFRX (3TB), which does not support APM IIRC. I'm having an issue with WD 2TB Caviar Blacks, which also do not support APM; initially I thought it was an Nvidia driver issue. Could you please try the system without the WD and see if it resumes from S2RAM?

Please see my issue here: https://bugzilla.opensuse.org/show_bug.cgi?id=913105
--- Comment #9 from Björn Voigt <bjoernv@arcor.de> ---
I tested my system with and without my Western Digital Red (WD30EFRX). The first time, suspend and resume both worked; the second time, suspend worked but resume failed.

My first "disk" is a Samsung SSD 840 Evo. Neither the Western Digital Red disk nor the Samsung SSD 840 Evo supports APM:

  $ hdparm -i /dev/sda|grep PM
  AdvancedPM=no WriteCache=enabled
  $ hdparm -i /dev/sdb|grep PM
  AdvancedPM=no WriteCache=enabled

I have issues with my new Western Digital Red disk too (exceptions like "failed command: READ FPDMA QUEUED"). They are unrelated to suspend/resume, but they may be related to power management problems. I am currently debugging this problem with cable changes, hdparm optimizations and WD support.
--- Comment #10 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 620798
  --> http://bugzilla.opensuse.org/attachment.cgi?id=620798&action=edit
Resume also fails without Western Digital WD30EFRX disk (Kernel 3.18.3-1.gc3e148f-desktop)
--- Comment #11 from Mark Scott <markcscott2003@hotmail.com> ---
Hi Björn, thanks for trying that. It looks unlikely that our problems are related, then; it would have been nice if it were that simple :)

Just as a side note, I do remember those WD Reds having some issue regarding load cycles, but I think those were older batches. I seem to remember reading a good thread on it over at the Synology Western Digital Discussion Room: http://forum.synology.com/enu/viewforum.php?f=178

Good luck with your bug hunting, and thanks.
--- Comment #12 from Joey Lee <jlee@suse.com> ---
Hi Björn,

Thanks for your dmesg output in comment #5 and comment #6. It is VERY useful for checking the logic bug in the e820-to-nosave-region code.

(In reply to Björn Voigt from comment #6)
> Created attachment 620573 [details]
> Dmesg output after suspend (Kernel 3.18.3-1.gc3e148f-desktop)

It looks like there is a bug in e820_mark_nosave_regions() that leads to a misjudgment when an e820 region is not aligned to PAGE_SIZE:

  void __init e820_mark_nosave_regions(unsigned long limit_pfn)
  {
          int i;
          unsigned long pfn = 0;

          for (i = 0; i < e820.nr_map; i++) {
                  struct e820entry *ei = &e820.map[i];

                  if (pfn < PFN_UP(ei->addr))
                          /* registers a nosave region when there is a hole between two e820 entries */
                          register_nosave_region(pfn, PFN_UP(ei->addr));

                  pfn = PFN_DOWN(ei->addr + ei->size);
  [...]

The purpose of the above block in e820_mark_nosave_regions() is to compare the end PFN of the previous region with the start PFN of the current region. It is used to find the hole between two e820 entries and register it in the nosave regions list.

The code works as long as all e820 entries are aligned to PAGE_SIZE, but it is wrong for non-aligned e820 entries. For example, from dmesg:

Aligned case:

  [ 0.000000] reserve setup_data: [mem 0x000000000270b658-0x00000000cbc4efff] usable
  [ 0.000000] reserve setup_data: [mem 0x00000000cbc4f000-0x00000000cbc93fff] ACPI NVS

  PFN_DOWN(0xcbc4efff + 1) = 0xCBC4F000 >> PAGE_SHIFT
                           = 0xCBC4F
  PFN_UP(0xcbc4f000)       = (0xcbc4f000 + PAGE_SIZE - 1) >> PAGE_SHIFT
                           = (0xcbc4f000 + 0x1000 - 1) >> PAGE_SHIFT
                           = 0xCBC4FFFF >> PAGE_SHIFT
                           = 0xCBC4F

  if (0xCBC4F < 0xCBC4F)   /* NOT TRUE! */
          register_nosave_region(...);

Non-aligned case:

  [ 0.000000] reserve setup_data: [mem 0x0000000000100000-0x0000000002576017] usable
  [ 0.000000] reserve setup_data: [mem 0x0000000002576018-0x0000000002585857] usable

  PFN_DOWN(0x2576017 + 1) = 0x2576018 >> PAGE_SHIFT
                          = 0x2576
  PFN_UP(0x2576018)       = (0x2576018 + PAGE_SIZE - 1) >> PAGE_SHIFT
                          = (0x2576018 + 0x1000 - 1) >> PAGE_SHIFT
                          = 0x2577017 >> PAGE_SHIFT
                          = 0x2577

  if (0x2576 < 0x2577)    /* TRUE! Misjudgment! */
          register_nosave_region(0x2576, 0x2577);  /* registered as nosave region */

As for why the issue happens randomly: the non-aligned e820 entries are generated by e820_reserve_setup_data(), and the setup data is allocated by the kernel during the early boot stage, so the allocated addresses are random within the usable region.

I think this is an old issue in the code that stayed hidden until I added the patch that checks the persistence of e820 regions between hibernate and resume. I am working on a patch to fix this issue.
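The arithmetic in comment #12 can be replayed outside the kernel. The following is only a small userspace model in Python, assuming 4 KiB pages (PAGE_SHIFT = 12); the helper names pfn_up, pfn_down and hole_between are made up for this illustration and are not kernel functions. It feeds the two pairs of boundary addresses from the dmesg excerpts through the same comparison.

```python
PAGE_SHIFT = 12                  # assumption: 4 KiB pages, as on x86
PAGE_SIZE = 1 << PAGE_SHIFT


def pfn_up(addr):
    """Model of PFN_UP(): page frame number, rounding the address up."""
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT


def pfn_down(addr):
    """Model of PFN_DOWN(): page frame number, rounding the address down."""
    return addr >> PAGE_SHIFT


def hole_between(prev_end_inclusive, next_start):
    """Apply the quoted comparison to two neighbouring e820 entries.

    Returns the (start_pfn, end_pfn) range that would be registered as
    nosave, or None when no hole is detected.
    """
    pfn = pfn_down(prev_end_inclusive + 1)
    if pfn < pfn_up(next_start):
        return (pfn, pfn_up(next_start))
    return None


# Boundary addresses taken from the two dmesg excerpts in comment #12.
aligned = hole_between(0xCBC4EFFF, 0xCBC4F000)
unaligned = hole_between(0x2576017, 0x2576018)

print("aligned:  ", aligned)
print("unaligned:", None if unaligned is None else [hex(p) for p in unaligned])
```

In this model the aligned pair yields no hole, while the contiguous but non-aligned pair yields the one-page range 0x2576 to 0x2577, although there is no gap between the two entries.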
Mark Scott <markcscott2003@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|markcscott2003@hotmail.com  |
--- Comment #13 from Joey Lee <jlee@suse.com> ---
Created attachment 621379
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621379&action=edit
0001-x86-mm-hibernate-Fix-misjudgment-of-nosave-region-to.patch

This patch compares the addresses of the regions instead of their PFNs. Please try it and see whether it fixes the issue.
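The attached patch itself is not reproduced in this thread, so the following Python sketch only illustrates the idea stated in comment #13 (compare byte addresses rather than PFNs) and is not the patch. The function name has_hole_by_addr and the second pair of addresses are hypothetical.

```python
def has_hole_by_addr(prev_end_inclusive, next_start):
    """Hypothetical address-based check: a hole exists only if at least
    one byte lies between the end of one entry and the start of the next."""
    return prev_end_inclusive + 1 < next_start


# Contiguous, non-page-aligned entries from the dmesg in comment #12.
contiguous = has_hole_by_addr(0x2576017, 0x2576018)

# Made-up pair with a real gap of one page between the entries.
real_gap = has_hole_by_addr(0x2576017, 0x2577018)

print("contiguous entries have a hole:", contiguous)
print("separated entries have a hole: ", real_gap)
```

Under this comparison the two contiguous entries are no longer treated as having a hole, because rounding to page frames never enters the decision.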
Joey Lee <jlee@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bjoernv@arcor.de
              Flags|                            |needinfo?(bjoernv@arcor.de)
--- Comment #14 from Joey Lee <jlee@suse.com> ---
Sent to kernel upstream for review: https://lkml.org/lkml/2015/1/29/1113
--- Comment #15 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 621519
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621519&action=edit
Patch 0001-x86-* adapted for Kernel 3.12.36
--- Comment #16 from Björn Voigt <bjoernv@arcor.de> ---
I am currently testing the patch "0001-x86-mm-hibernate-Fix-misjudgment-of-nosave-region-to.patch". At first I am testing it with Kernel 3.12.36. This kernel worked reliably with BIOS/GPT booting on my system. The original patch didn't apply cleanly, so I refreshed the patch.

Until now, hibernate/resume works reliably with this patch and Kernel 3.12.36. I will continue testing. Later I will test Kernel 3.18.*.
--- Comment #17 from Joey Lee <jlee@suse.com> ---
(In reply to Björn Voigt from comment #16)
> I am currently testing the patch "0001-x86-mm-hibernate-Fix-misjudgment-of-nosave-region-to.patch". At first I am testing it with Kernel 3.12.36. This kernel worked reliably with BIOS/GPT booting on my system. The original patch didn't apply cleanly, so I refreshed the patch.
>
> Until now, hibernate/resume works reliably with this patch and Kernel 3.12.36. I will continue testing. Later I will test Kernel 3.18.*.

Thank you very much for testing my patch.

Yinghai Lu raised a better idea: remove the E820_RESERVED_KERN type entirely, because memblock is already used to reserve setup_data instead of the e820 table:
http://lists-archives.com/linux-kernel/28235454-x86-mm-hibernate-fix-misjudg...
--- Comment #18 from Joey Lee <jlee@suse.com> ---
Created attachment 621528
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621528&action=edit
kill_e820_reserved_kern.patch

The attached patch is from Yinghai Lu on kernel upstream and removes the E820_RESERVED_KERN type entirely. It is against the v3.19-rc kernel. I tested it on a v3.18 kernel, and it also appears to fix our issue on my side.

Please try it if you want.
--- Comment #19 from Björn Voigt <bjoernv@arcor.de> ---
I tested the new patch "kill_e820_reserved_kern.patch" for three days. I applied the patch to Kernel kernel-desktop-3.18.5-1.1.gf378da4.x86_64 from the Kernel_stable repository.

Suspend and resume worked fine with this patch on my Intel DH55TC system.

But in two of five cases I had a minor problem: during resume, the screen was black for 1-2 minutes and the keyboard LED was also off. At first I thought that resume had failed, but 1-2 minutes later the keyboard LED was on and the graphics also switched on.

I also tested the patch on another notebook (Lenovo Ideapad U430 with Intel Core i5 Haswell CPU). Unfortunately resume failed on this notebook. The notebook rebooted during resume. I have no log entries for the failed resume.

The problems with the notebook and the black screen problem on my Intel DH55TC system are probably unrelated to this patch. I will test again without the patch.
--- Comment #20 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 621769
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621769&action=edit
Dmesg output after suspend (Kernel 3.18.5-desktop, short black screen problem)
--- Comment #21 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 621770
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621770&action=edit
Dmesg output after suspend (Kernel 3.18.5-desktop, no problem)
--- Comment #22 from Joey Lee <jlee@suse.com> ---
(In reply to Björn Voigt from comment #19)
> I tested the new patch "kill_e820_reserved_kern.patch" for three days. I applied the patch to Kernel kernel-desktop-3.18.5-1.1.gf378da4.x86_64 from the Kernel_stable repository.
>
> Suspend and resume worked fine with this patch on my Intel DH55TC system.

Great!

> But in two of five cases I had a minor problem: during resume, the screen was black for 1-2 minutes and the keyboard LED was also off. At first I thought that resume had failed, but 1-2 minutes later the keyboard LED was on and the graphics also switched on.

Hm... Did this issue happen before applying kill_e820_reserved_kern.patch?

> I also tested the patch on another notebook (Lenovo Ideapad U430 with Intel Core i5 Haswell CPU). Unfortunately resume failed on this notebook. The notebook rebooted during resume. I have no log entries for the failed resume.

The same thing needs to be confirmed here: did this issue happen before applying kill_e820_reserved_kern.patch?

> The problems with the notebook and the black screen problem on my Intel DH55TC system are probably unrelated to this patch. I will test again without the patch.

That's good, thanks!
--- Comment #23 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 621970
  --> http://bugzilla.opensuse.org/attachment.cgi?id=621970&action=edit
Dmesg output after suspend (Kernel 3.18.5-1.gf378da4-desktop, original Kernel_stable, no problem)
--- Comment #24 from Björn Voigt <bjoernv@arcor.de> ---
I checked suspend/resume with the original openSUSE Kernel_stable Kernel 3.18.5-1.gf378da4-desktop on my Intel DH55TC system. The black screen / keyboard LED off problem for 1-2 minutes after/during resume also appears with this kernel. So it is not caused by the latest patch (kill_e820_reserved_kern.patch).
--- Comment #25 from Joey Lee <jlee@suse.com> ---
(In reply to Björn Voigt from comment #24)
> I checked suspend/resume with the original openSUSE Kernel_stable Kernel 3.18.5-1.gf378da4-desktop on my Intel DH55TC system. The black screen / keyboard LED off problem for 1-2 minutes after/during resume also appears with this kernel. So it is not caused by the latest patch (kill_e820_reserved_kern.patch).

Great!

Another concern is the Lenovo machine you mentioned in comment #19:

> I also tested the patch on another notebook (Lenovo Ideapad U430 with Intel Core i5 Haswell CPU). Unfortunately resume failed on this notebook. The notebook rebooted during resume. I have no log entries for the failed resume.

Is the resume failure caused by kill_e820_reserved_kern.patch?

Thanks
--- Comment #26 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 622906
  --> http://bugzilla.opensuse.org/attachment.cgi?id=622906&action=edit
kill_e820_reserved_kern.patch adapted for Kernel 3.12.37
--- Comment #27 from Björn Voigt <bjoernv@arcor.de> ---
I also tested the patch kill_e820_reserved_kern.patch with Kernel 3.12.37. The original kill_e820_reserved_kern.patch didn't apply cleanly, so I had to edit it a bit (hopefully correctly, see attachment).

I tested Kernel 3.12.37 because Kernel 3.12.* is the latest kernel series that always had stable suspend/resume on my DH55TC system with BIOS/MBR. There is a suspend/resume problem with my old graphics card (Radeon HD 3450).

The patched Kernel 3.12.37 doesn't have reliable suspend/resume on the DH55TC with UEFI/GPT. I suspended my system on Monday, and now on Wednesday I wasn't able to resume it. See the attached dmesg output. Other suspend/resume cycles worked with this kernel.
--- Comment #28 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 622907
  --> http://bugzilla.opensuse.org/attachment.cgi?id=622907&action=edit
Dmesg output after failed resume (Kernel 3.12.37-desktop-bv2, patched Kernel_stable)
--- Comment #29 from Joey Lee <jlee@suse.com> ---
(In reply to Björn Voigt from comment #28)
> Created attachment 622907 [details]
> Dmesg output after failed resume (Kernel 3.12.37-desktop-bv2, patched Kernel_stable)

Hm... Based on the dmesg output, the hibernate subsystem didn't find any hibernation image, which is why the system didn't resume:

  [ 3.472078] PM: Basic memory bitmaps freed
  [ 3.474816] PM: Starting manual resume from disk
  [ 3.474819] PM: Hibernation image partition 8:2 present
  [ 3.474820] PM: Looking for hibernation image.
  [ 3.475052] PM: Image not found (code -22)
  [ 3.475056] PM: Hibernation image not present or could not be loaded.

But currently I have no idea why the hibernation image was not found.

Another doubtful thing is the ioremap warning that appears when a PCI device is added:

  [ 0.267916] ------------[ cut here ]------------
  [ 0.267921] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xb3/0xc0()
  [ 0.267922] ioremap on RAM pfn 0x2551

According to the kernel source, 0x2551000 should be PG_reserved System RAM, and the kernel tries to ioremap it. The above message is only a warning. I am confused by it, because the setup_data pages are reserved, so I don't see why the message is emitted.
--- Comment #30 from Joey Lee <jlee@suse.com> ---
Hi Björn,

Is the resume issue (hibernation image not found) 100% reproducible on the "patched Kernel 3.12.37"? And when the issue happens, does the "ioremap on RAM pfn 0x2551" warning always appear?

  [ 0.267916] ------------[ cut here ]------------
  [ 0.267921] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xb3/0xc0()
  [ 0.267922] ioremap on RAM pfn 0x2551

And, if possible, please check Yinghai Lu's patch on a 3.19 kernel.

Thanks
--- Comment #31 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 623457
  --> http://bugzilla.opensuse.org/attachment.cgi?id=623457&action=edit
Dmesg output after successful resume (Kernel 3.12.37-desktop, patched Kernel_stable)
--- Comment #32 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 623458
  --> http://bugzilla.opensuse.org/attachment.cgi?id=623458&action=edit
Dmesg output after successful resume (Kernel 3.19.0-desktop-bv1, patched Kernel_stable)
--- Comment #33 from Björn Voigt <bjoernv@arcor.de> ---
I attached two dmesg outputs for successful resumes with the patched (kill_e820_reserved_kern.patch) Kernels 3.12.37 and 3.19.0. Currently I cannot say whether the warning "ioremap on RAM pfn 0x2551" is always shown before resume fails. I will test this further.
--- Comment #34 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 623610
  --> http://bugzilla.opensuse.org/attachment.cgi?id=623610&action=edit
Dmesg output after successful resume and with ioremap exception (Kernel 3.19.0-desktop-bv1, patched Kernel_stable)
--- Comment #35 from Björn Voigt <bjoernv@arcor.de> ---
After an ioremap exception on a patched Kernel 3.19.x I can still successfully suspend and resume.

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xa7/0xc0()
  ioremap on RAM pfn 0x2551
  Modules linked in:
  CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.19.0-desktop-bv1 #1
  Hardware name: /DH55TC, BIOS TCIBX10H.86A.0048.2011.1206.1342 12/06/2011
  [...]
  PM: restore of devices complete after 2110.165 msecs
  PM: Image restored successfully.
  PM: Basic memory bitmaps freed
  Restarting tasks ... done.

(see full log in attachment)
--- Comment #36 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 624897
  --> http://bugzilla.opensuse.org/attachment.cgi?id=624897&action=edit
Dmesg output after failed resume (Kernel 3.12.38-desktop-bv1, patched Kernel_stable)
--- Comment #37 from Björn Voigt <bjoernv@arcor.de> ---
After lots of successful suspend/resume cycles in the last days, I again had a failed resume operation with Kernel 3.12.38 and the patch kill_e820_reserved_kern.patch.

  [ 3.334134] PM: Using 3 thread(s) for decompression.
  PM: Loading and decompressing image data (1884119 pages)...
  [ 3.334138] PM: Image mismatch: memory size
  [ 3.334164] PM: Read 7536476 kbytes in 0.01 seconds (753647.60 MB/s)
  [ 3.335127] PM: Error -1 resuming
  [ 3.335131] PM: Failed to load hibernation image, recovering.
  [ 3.364559] usb 1-1.1.1: new low-speed USB device number 5 using ehci-pci
  [ 3.380217] PM: Basic memory bitmaps freed
  [ 3.380219] Restarting tasks ... done.
  [ 3.380925] PM: Hibernation image not present or could not be loaded.

(see attachment for complete dmesg output)
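Three different numeric codes show up in the PM messages quoted in this thread: "Error -14 resuming", "Image not found (code -22)" and "Error -1 resuming". Assuming these follow the usual kernel convention of negated errno values, a short Python sketch can translate them to symbolic names. The helper errno_name is made up for this illustration, and the mapping says nothing about which code path produced each value.

```python
import errno
import os


def errno_name(code):
    """Translate a negative kernel-style return code to its errno symbol."""
    return errno.errorcode[-code]


# Codes as they appear in the dmesg excerpts of this bug report.
for code in (-14, -22, -1):
    print(code, errno_name(code), os.strerror(-code))
```

On Linux this maps -14 to EFAULT, -22 to EINVAL and -1 to EPERM.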
--- Comment #38 from Joey Lee <jlee@suse.com> ---
Hi Björn,

Your testing has uncovered more hibernation problems on the affected machine.

(In reply to Björn Voigt from comment #37)
> After lots of successful suspend/resume cycles in the last days, I again had a failed resume operation with Kernel 3.12.38 and the patch kill_e820_reserved_kern.patch.
>
> [ 3.334134] PM: Using 3 thread(s) for decompression.
> PM: Loading and decompressing image data (1884119 pages)...
> [ 3.334138] PM: Image mismatch: memory size

The above is another strange thing: the memory size changed when resuming. This issue also shows up in the dmesg output in comment #10. That was before you applied kill_e820_reserved_kern.patch, so it should NOT be related to kill_e820_reserved_kern.patch. This is a separate issue.

Before tracing this problem further, I have a question about the hibernate mode on your machine. Are you using the [platform] mode rather than another mode?

  $ cat /sys/power/disk
  [platform] shutdown reboot suspend

Thanks
--- Comment #39 from Björn Voigt <bjoernv@arcor.de> ---
Yes, I have the same output as you (platform method):

  $ cat /sys/power/disk
  [platform] shutdown reboot suspend

I use the "kernel" method for suspend/resume and systemd uses pm-utils:

  $ cat /etc/pm/config.d/sleep-module.config
  SLEEP_MODULE=kernel
  HIBERNATE_RESUME_POST_VIDEO="yes"

  $ rpm -q systemd pm-utils
  systemd-210-25.12.1.x86_64
  pm-utils-1.4.1-38.4.1.x86_64
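In the /sys/power/disk output above, the currently selected hibernation mode is the entry in square brackets. A minimal Python sketch for picking it out of such a line follows; the function name active_mode is made up, and the sample string is the output quoted in this thread rather than a live read of the sysfs file.

```python
def active_mode(text):
    """Return the bracketed entry of a /sys/power/disk style line, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Output as quoted in comment #38 and comment #39.
sample = "[platform] shutdown reboot suspend"
print(active_mode(sample))
```

To check a running system, the same function could be applied to the contents of /sys/power/disk instead of the sample string.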
--- Comment #40 from Joey Lee <jlee@suse.com> ---
(In reply to Björn Voigt from comment #39)
> Yes, I have the same output as you (platform method):
>
> $ cat /sys/power/disk
> [platform] shutdown reboot suspend
>
> I use the "kernel" method for suspend/resume and systemd uses pm-utils:
>
> $ cat /etc/pm/config.d/sleep-module.config
> SLEEP_MODULE=kernel
> HIBERNATE_RESUME_POST_VIDEO="yes"
>
> $ rpm -q systemd pm-utils
> systemd-210-25.12.1.x86_64
> pm-utils-1.4.1-38.4.1.x86_64

Thanks for the information. At least this confirms that _S4 from the BIOS is used.
--- Comment #41 from Joey Lee <jlee@suse.com> ---
I checked the kernel code behind the "memory size" mismatch again. Before v3.11, it checks that the num_physpages global variable stays constant between hibernating and resuming. Since v3.11, the kernel code uses get_num_physpages(), whose return value is the sum of all nodes' totalpages.

I checked each dmesg attached to this bug and found:

v3.12:

Description:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12.36-desktop-bv1 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#28:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12.37-desktop-bv2 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#31:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12.37-desktop-bv2 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#36:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12.38-desktop-bv1 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177256
  [ 3.334138] PM: Image mismatch: memory size

v3.18:

Comment#5:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.3-1.gc3e148f-desktop
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#6:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.3-1.gc3e148f-desktop
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#10:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.3-1.gc3e148f-desktop
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177256
  [ 3.834831] PM: Image mismatch: memory size

Comment#20:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.5-desktop-bv2 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#21:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.5-desktop-bv2 root=UUID=2d968229-
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177255

Comment#23:
  [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.5-1.gf378da4-desktop
  [ 0.000000] e820: last_pfn = 0x430000 max_arch_pfn = 0x400000000
  [ 0.000000] e820: last_pfn = 0xcc000 max_arch_pfn = 0x400000000
  [ 0.000000] On node 0 totalpages: 4177256
  [ 3.847437] PM: Image mismatch: memory size

As the dmesg excerpts above show, on both v3.12 and v3.18 kernels the last_pfn values of e820 always stay at 0x430000/0xcc000. The totalpages value of node 0, however, is normally 4177255 but sometimes 4177256. The memory size mismatch always happened when totalpages=4177256.

This needs a more detailed trace of how the code calculates a node's totalpages. I suggest opening another bug for this newly found issue.
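The correlation described in comment #41 can be cross-checked with a few lines of Python. The tuples below are transcribed by hand from the dmesg excerpts in that comment (source of the log, kernel version, totalpages of node 0, and whether an "Image mismatch: memory size" line is listed); nothing is parsed from the actual attachments, and the 4 KiB page size is an assumption.

```python
# (log, kernel, totalpages of node 0, "Image mismatch: memory size" listed?)
observations = [
    ("Description", "3.12.36", 4177255, False),
    ("Comment#28",  "3.12.37", 4177255, False),
    ("Comment#31",  "3.12.37", 4177255, False),
    ("Comment#36",  "3.12.38", 4177256, True),
    ("Comment#5",   "3.18.3",  4177255, False),
    ("Comment#6",   "3.18.3",  4177255, False),
    ("Comment#10",  "3.18.3",  4177256, True),
    ("Comment#20",  "3.18.5",  4177255, False),
    ("Comment#21",  "3.18.5",  4177255, False),
    ("Comment#23",  "3.18.5",  4177256, True),
]

mismatch_totals = {total for _, _, total, bad in observations if bad}
normal_totals = {total for _, _, total, bad in observations if not bad}

# Difference between the two observed values, in pages and in bytes.
delta_pages = max(mismatch_totals) - min(normal_totals)
delta_bytes = delta_pages * 4096  # assumption: 4 KiB pages

print("totalpages with mismatch:   ", sorted(mismatch_totals))
print("totalpages without mismatch:", sorted(normal_totals))
print("difference:", delta_pages, "page(s) =", delta_bytes, "bytes")
```

For the listed logs the two sets do not overlap, and the difference between the two observed values is a single page.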
--- Comment #42 from Joey Lee <jlee@suse.com> ---
(In reply to Joey Lee from comment #30)
> Hi Björn,
>
> Is the resume issue (hibernation image not found) 100% reproducible on the "patched Kernel 3.12.37"? And when the issue happens, does the "ioremap on RAM pfn 0x2551" warning always appear?
>
> [ 0.267916] ------------[ cut here ]------------
> [ 0.267921] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xb3/0xc0()
> [ 0.267922] ioremap on RAM pfn 0x2551
>
> And, if possible, please check Yinghai Lu's patch on a 3.19 kernel.
>
> Thanks

This is the mail thread on x86/mm upstream that discusses the warning from __ioremap_check_ram():
http://marc.info/?l=linux-kernel&m=142492905425130

It is a separate issue and not related to any of our patches.
Joey Lee <jlee@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|PC with Intel DH55TC        |PC with Intel DH55TC
                   |mainboard fails to resume,  |mainboard fails to resume,
                   |if GPT/UEFI is used         |if GPT/UEFI is used
                   |                            |(E820_RESERVED_KERN ranges
                   |                            |are continuous and
                   |                            |boundary is not page
                   |                            |aligned.)
--- Comment #43 from Joey Lee <jlee@suse.com> ---
Hi Björn,

I just filed another bug, boo#921549, for the issue described in comment #41.

I also changed the subject of this bug. Let's wait for Yinghai Lu's patch set to be merged into the v4.1 kernel:
https://lkml.org/lkml/2015/2/28/272

Then I will backport the patches to openSUSE 13.2.
Björn Voigt <bjoernv@arcor.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(bjoernv@arcor.de) |

--- Comment #44 from Björn Voigt <bjoernv@arcor.de> ---
I have some updates:

1) For several months (since March) suspend/resume has been stable on my system. I primarily use Kernel 3.12.* (currently 3.12.41) with the adapted patch "kill_e820_reserved_kern.patch adapted for Kernel 3.12.37" (see attachments).

2) Kernel 4.0.* also seems to work fine without any patches (except the openSUSE patches from the Kernel:stable repository). I need to test Kernel 4.0.1 and later further.

3) One of my four Kingston memory modules had an error. I detected the error with Memtest86 and removed the module. The hardware change probably had no influence on this bug.
--- Comment #45 from Björn Voigt <bjoernv@arcor.de> ---
Created attachment 633644
  --> http://bugzilla.opensuse.org/attachment.cgi?id=633644&action=edit
Dmesg output after 9 successful suspend/resume cycles and four days uptime (Kernel 3.12.41-desktop-bv1, patched Kernel_stable)