[Bug 439461] kernel-pae-2.6.25.18-0.2 doesn't resume from StR

3 Nov 2008

      https://bugzilla.novell.com/show_bug.cgi?id=439461

User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=439461#c21

--- Comment #21 from Pavel Machek   2008-11-03 02:10:58 MDT ---
I went through 2.6.25.16->18 changelogs, and:

This one could be it but then it would probably not be widespread:

commit 8e023f85b670c9f6008df675b6b213025b3387b3
Author: Yinghai Lu 
Date:   Fri Aug 22 17:40:05 2008 +0000

    x86: work around MTRR mask setting

    commit 38cc1c3df77c1bb739a4766788eb9fa49f16ffdf upstream

    Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
    usable. Booting with mtrr_show showed us the BIOS-initialized
    MTRR settings - which are all wrong.

Timer/clockevents issues could also be it, and I believe I actually debugged
something like that... but not in -stable:

commit 22e4330618d27748cc69b62d3c96223bcefe6c6c
Author: Thomas Gleixner 
Date:   Sat Sep 6 03:06:08 2008 +0200

    x86: HPET: read back compare register before reading counter

    commit 72d43d9bc9210d24d09202eaf219eac09e17b339 upstream

    After fixing the u32 thinko I still had occasional hiccups on ATI chipsets
    with small deltas. There seems to be a delay between writing the compare
    register and the transfer to the internal register which triggers the
    interrupt. Reading back the value makes sure that it hit the internal
    match register before we compare against the counter value.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Greg Kroah-Hartman 

commit 59ff733c6b6ef547bb09a9902020750dfbb2200f
Author: Thomas Gleixner 
Date:   Sat Sep 6 03:03:32 2008 +0200

    x86: HPET fix moronic 32/64bit thinko

    commit f7676254f179eac6b5244a80195ec8ae0e9d4606 upstream

    We use the HPET only in 32bit mode because:
    1) some HPETs are 32bit only
    2) on i386 there is no way to read/write the HPET atomic 64bit wide

    The HPET code unification done by the "moron of the year" did
    not take into account that unsigned long is different on 32 and
    64 bit.

    This thinko results in a possible endless loop in the clockevents
    code, when the return comparison fails due to the 64bit/32bit
    unawareness.

    unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit,
    but the final compare will fail and return -ETIME, causing endless
    loops.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Greg Kroah-Hartman 
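
For illustration only, here is a minimal user-space sketch of the kind of mismatch that commit message describes. It is not the kernel code: the function names expired_u32 and expired_wide are made up, and the 64-bit variant merely stands in for "unsigned long" on a 64-bit kernel. It shows that a deadline check done in wraparound 32-bit arithmetic and the same check done in wider arithmetic disagree once start + delta crosses 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: deadline check done entirely in 32-bit arithmetic.
 * The sum wraps modulo 2^32, just like a 32-bit hardware counter does. */
static bool expired_u32(uint32_t start, uint32_t delta, uint32_t now)
{
    uint32_t cnt = start + delta;
    return (int32_t)(now - cnt) >= 0;
}

/* Hypothetical helper: same check, but the sum is kept in a 64-bit
 * variable, so it no longer wraps together with the 32-bit counter. */
static bool expired_wide(uint32_t start, uint32_t delta, uint32_t now)
{
    uint64_t cnt = (uint64_t)start + delta;
    return (int64_t)((uint64_t)now - cnt) >= 0;
}
```

With start = 0xFFFFFFF0 and delta = 0x20 the 32-bit deadline is 0x10. Once the counter reads 0x20, the 32-bit check reports the deadline as passed while the wide check does not, so any caller relying on the result sees an inconsistent answer.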

commit 9d29a18def727d9e0d5c656cfc86a278988b7926
Author: Thomas Gleixner 
Date:   Sat Sep 6 03:01:45 2008 +0200

    clockevents: broadcast fixup possible waiters

    commit 7300711e8c6824fcfbd42a126980ff50439d8dd0 upstream

    Until the C1E patches arrived there were no users of periodic broadcast
    before switching to oneshot mode. Now we need to trigger a possible
    waiter for a periodic broadcast when switching to oneshot mode.
    Otherwise we can starve them forever.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Greg Kroah-Hartman 

commit 4f8e2bf785bd7e5ba9b93a7d5ad5c18ba409a199
Author: Thomas Gleixner 
Date:   Wed Sep 3 21:37:24 2008 +0000

    HPET: make minimum reprogramming delta useful

    commit 7cfb0435330364f90f274a26ecdc5f47f738498c upstream

    The minimum reprogramming delta was hardcoded in HPET ticks,
    which is stupid as it does not work with faster running HPETs.
    The C1E idle patches made this prominent on AMD/RS690 chipsets,
    where the HPET runs at 25MHz. Set it to 5us which seems to be
    a reasonable value and fixes the problems on the bug reporters'
    machines. We have a further sanity check now in the clock events,
    which increases the delta when it is not sufficient.

    Signed-off-by: Thomas Gleixner 
    Tested-by: Luiz Fernando N. Capitulino 
    Tested-by: Dmitry Nezhevenko 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 
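
To make the arithmetic behind that commit concrete, here is a small sketch of converting a minimum delta given in microseconds into HPET ticks for a given counter frequency. The helper name min_delta_ticks is hypothetical and this is not how the kernel computes the value; it only shows why a fixed tick count means a different amount of time on differently clocked HPETs.

```c
#include <stdint.h>

/* Hypothetical helper: number of whole HPET ticks in 'usec' microseconds
 * for a counter running at 'hz' ticks per second (rounded down). */
static uint64_t min_delta_ticks(uint64_t hz, uint64_t usec)
{
    return hz * usec / 1000000u;
}
```

At the 25MHz mentioned in the commit message, 5us corresponds to 125 ticks; at the common 14.31818MHz HPET rate the same 5us is only 71 ticks.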

commit 19ab6cbbf02a7d4ca81ef44cc856ce11870e202b
Author: Thomas Gleixner 
Date:   Wed Sep 3 21:37:14 2008 +0000

    clockevents: prevent endless loop lockup

    commit 1fb9b7d29d8e85ba3196eaa7ab871bf76fc98d36 upstream

    The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a
    too small value of the HPET minimum delta for programming an event.

    The clockevents code needs to enforce an interrupt event on the clock event
    device in some cases. The enforcement code was stupid and naive, as it just
    added the minimum delta to the current time and tried to reprogram the
    device. When the minimum delta is too small, then this loops forever.

    Add a sanity check. Allow reprogramming to fail 3 times, then print a
    warning and double the minimum delta value to make sure that this does not
    happen again. Use the same function for both tick-oneshot and
    tick-broadcast code.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 
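
The sanity check described in that commit message can be sketched roughly as follows. This is a user-space model under stated assumptions, not the kernel implementation: struct fake_dev, its latency field, fake_program() and force_event() are all invented for the example, and -1 stands in for -ETIME.

```c
#include <stdint.h>

/* Hypothetical model of a clock event device. */
struct fake_dev {
    uint64_t min_delta;   /* current minimum delta, in ticks */
    uint64_t latency;     /* assumed: deltas at or below this always fail */
    unsigned doublings;   /* how often min_delta had to be doubled */
};

/* Stand-in for programming the device; -1 plays the role of -ETIME. */
static int fake_program(const struct fake_dev *dev, uint64_t delta)
{
    return delta > dev->latency ? 0 : -1;
}

/* Force an event: retry with min_delta, and after 3 failures double
 * min_delta instead of looping forever with a value that cannot work. */
static void force_event(struct fake_dev *dev)
{
    unsigned failures = 0;

    if (dev->min_delta == 0)
        dev->min_delta = 1;   /* doubling zero would never make progress */

    while (fake_program(dev, dev->min_delta) != 0) {
        if (++failures >= 3) {
            dev->min_delta *= 2;   /* the kernel also prints a warning here */
            dev->doublings++;
            failures = 0;
        }
    }
}
```

Without the doubling step, a device whose min_delta is below what the hardware can actually honor would spin in the retry loop indefinitely, which is the lockup the commit title refers to.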

commit ffa4da2a25bb4ac08f710ac99827baf48a8f8d57
Author: Thomas Gleixner 
Date:   Wed Sep 3 21:37:08 2008 +0000

    clockevents: prevent multiple init/shutdown

    commit 9c17bcda991000351cb2373f78be7e4b1c44caa3 upstream

    While chasing the C1E/HPET bugreports I went through the clock events
    code inch by inch and found that the broadcast device can be initialized
    and shut down multiple times. Multiple shutdowns are not critical, but a
    useless waste of time. Multiple initializations are simply broken. Another
    CPU might have the device in use already after the first initialization and
    the second init could just render it unusable again.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 

commit e73068458bf253c2e738cd55080c3a54c61037ef
Author: Thomas Gleixner 
Date:   Wed Sep 3 21:37:03 2008 +0000

    clockevents: enforce reprogram in oneshot setup

    commit 7205656ab48da29a95d7f55e43a81db755d3cb3a upstream

    In tick_setup_oneshot() we program the device to the given next_event,
    but we do not check the return value. We need to make sure that the
    programming of the device is enforced so the interrupt handler engine
    starts working. Split out the reprogramming function from
    tick_program_event() and call it with the device, which was handed in to
    tick_setup_oneshot(). Set the force argument, so the device fires an
    interrupt.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 

commit fbbece349081a689d5687d9ebc769a847fdf423a
Author: Thomas Gleixner 
Date:   Wed Sep 3 21:36:57 2008 +0000

    clockevents: prevent endless loop in periodic broadcast handler

    commit d4496b39559c6d43f83e4c08b899984f8b8089b5 upstream

    The reprogramming of the periodic broadcast handler was broken,
    when the first programming returned -ETIME. The clockevents code
    stores the new expiry value in the clock events device next_event field
    only when the programming time has not elapsed yet. The loop in
    question calculates the new expiry value from the next_event value
    and therefore never increases.

    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 

commit 6141266c43db890ada7df589358b8553de2e6322
Author: Venkatesh Pallipadi 
Date:   Wed Sep 3 21:36:50 2008 +0000

    clockevents: prevent clockevent event_handler ending up handler_noop

    commit 7c1e76897492d92b6a1c2d6892494d39ded9680c upstream

    There is an ordering-related problem in the clockevents code, due to which
    clockevents_register_device() called after the tickless/highres switch
    will not work. The new clockevent ends up with clockevents_handle_noop as
    event handler, resulting in no timer activity.

    The problematic path seems to be

    * old device already has hrtimer_interrupt as the event_handler
    * new clockevent device registers with a higher rating
    * tick_check_new_device() is called
      * clockevents_exchange_device() gets called
        * old->event_handler is set to clockevents_handle_noop
      * tick_setup_device() is called for the new device
        * which sets new->event_handler using the old->event_handler, which
          is noop.

    Change the ordering so that new device inherits the proper handler.

    This does not cause any issue in the normal case, as most likely all the
    clockevent devices are set up before the highres switch. But it can
    potentially affect some corner cases where HPET force detect happens after
    the highres switch. This was a problem with the HPET in MSI mode code that
    we have been experimenting with.

    Signed-off-by: Venkatesh Pallipadi 
    Signed-off-by: Shaohua Li 
    Signed-off-by: Thomas Gleixner 
    Signed-off-by: Ingo Molnar 
    Signed-off-by: Greg Kroah-Hartman 
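
The ordering problem listed in that commit message can be modelled in a few lines. This is only a sketch: struct fake_clockevent, real_handler() and the two helper functions are hypothetical stand-ins, not the kernel's data structures or the actual fix.

```c
typedef void (*handler_t)(void);

static void handle_noop(void) { }
static void real_handler(void) { }   /* stands in for hrtimer_interrupt */

/* Hypothetical, stripped-down clock event device. */
struct fake_clockevent {
    handler_t event_handler;
};

/* Problematic order: the old device is neutralized first, so the new
 * device inherits the noop handler. */
static void exchange_then_setup(struct fake_clockevent *old,
                                struct fake_clockevent *newdev)
{
    old->event_handler = handle_noop;
    newdev->event_handler = old->event_handler;
}

/* Safe order: remember the handler before neutralizing the old device. */
static void setup_then_exchange(struct fake_clockevent *old,
                                struct fake_clockevent *newdev)
{
    handler_t inherited = old->event_handler;

    old->event_handler = handle_noop;
    newdev->event_handler = inherited;
}
```

In the first variant the new device ends up with handle_noop and no timer activity follows; in the second it inherits real_handler as intended.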

..and of course, EC is always suspect:
commit dc317ed0f9cb83f616b95ae6abdba44832c60f39
Author: Zhao Yakui 
Date:   Tue Sep 23 13:38:13 2008 +0800

    ACPI: Avoid bogus EC timeout when EC is in Polling mode

    commit 9d699ed92a459cb408e2577e8bbeabc8ec3989e1 upstream

    When EC is in Polling mode, OS will check the EC status continually by
    using the following source code:
           clear_bit(EC_FLAGS_WAIT_GPE, &ec->flags);
           while (time_before(jiffies, delay)) {
                   if (acpi_ec_check_status(ec, event))
                           return 0;
                   msleep(1);
           }
    But msleep is implemented using schedule_timeout. At the same time,
    although a process has already been woken up by some event, it won't be
    scheduled immediately. So the following situations may occur:
        a. The current jiffies is already after the predefined jiffies.
           But before timeout happens, OS has no chance to check the EC
           status again.
        b. If preemptible schedule is enabled, a preempt schedule may happen
           before the checking loop. When the process is resumed again, the
           timeout may already have happened, which means that OS has no
           chance to check the EC status.

    In such a case the EC status may already be what OS expects when the
    timeout happens. But OS has no chance to check the EC status and regards
    it as AE_TIME.

    So it is more appropriate for OS to check the EC status again when the
    timeout happens. If the EC status is what we expect, it won't be regarded
    as a timeout. Only when the EC status is not what we expect will it be
    regarded as a timeout, which means that the EC controller can't give a
    response in time.

    http://bugzilla.kernel.org/show_bug.cgi?id=9823
    http://bugzilla.kernel.org/show_bug.cgi?id=11141

    Signed-off-by: Zhao Yakui 
    Signed-off-by: Zhang Rui  
    Signed-off-by: Andi Kleen 
    Signed-off-by: Greg Kroah-Hartman 
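
The idea of that fix, checking the status once more after the deadline has passed, can be sketched with a simulated clock. Everything here is an assumption made for the example: struct fake_ec, its fields, ec_ready() and poll_ec() are invented, the clock only advances inside the loop, and -1 stands in for AE_TIME.

```c
#include <stdbool.h>

/* Hypothetical model: a clock that only advances when we "sleep",
 * and an EC whose status becomes ready at a fixed time. */
struct fake_ec {
    unsigned long now;         /* simulated jiffies */
    unsigned long ready_at;    /* time at which the status becomes ready */
    unsigned long sleep_step;  /* how long one "msleep(1)" really takes */
};

static bool ec_ready(const struct fake_ec *ec)
{
    return ec->now >= ec->ready_at;
}

/* Poll until 'deadline'. With 'recheck' set, look at the status one more
 * time after the loop before reporting a timeout (-1). */
static int poll_ec(struct fake_ec *ec, unsigned long deadline, bool recheck)
{
    while (ec->now < deadline) {
        if (ec_ready(ec))
            return 0;
        ec->now += ec->sleep_step;   /* may overshoot the deadline */
    }

    if (recheck && ec_ready(ec))
        return 0;                    /* status arrived while we slept */

    return -1;
}
```

If a single sleep overshoots the deadline, the loop exits without ever seeing that the EC became ready in the meantime; the final recheck turns that bogus timeout into a success while still reporting a real timeout when the EC never responds.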

..actually, can you try a 2.6.27 kernel and/or the openSUSE 11.1 beta?
