Bug ID 1192767
Summary Kernel boot logging gets mixed up when bringing up CPUs on SMP system
Classification openSUSE
Product openSUSE Distribution
Version Leap 15.2
Hardware VMWare
OS openSUSE Leap 15.2
Status NEW
Severity Minor
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter archie.cobbs@gmail.com
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

I booted kernel-default-5.3.18-lp152.95.1.x86_64 (running on VMWare).

Here is a snippet from dmesg -T:

    [Tue Nov  2 20:14:51 2021] smpboot: CPU0: Intel(R) Xeon(R) Gold 6226 CPU @
2.70GHz (family: 0x6, model: 0x3f, stepping: 0x0)
    [Tue Nov  2 20:14:51 2021] Performance Events: Haswell events, core PMU
driver.
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'cpu cycles'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'instructions'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'bus cycles'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'cache references'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'cache misses'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'branch instructions'
unavailable
    [Tue Nov  2 20:14:51 2021] core: CPUID marked event: 'branch misses'
unavailable
    [Tue Nov  2 20:14:51 2021] ... version:                1
    [Tue Nov  2 20:14:51 2021] ... bit width:              48
    [Tue Nov  2 20:14:51 2021] ... generic registers:      4
    [Tue Nov  2 20:14:51 2021] ... value mask:             0000ffffffffffff
    [Tue Nov  2 20:14:51 2021] ... max period:             000000007fffffff
    [Tue Nov  2 20:14:51 2021] ... fixed-purpose events:   0
    [Tue Nov  2 20:14:51 2021] ... event mask:             000000000000000f
    [Tue Nov  2 20:14:51 2021] rcu: Hierarchical SRCU implementation.
    [Tue Nov  2 20:14:51 2021] NMI watchdog: Perf NMI watchdog permanently
disabled
    [Tue Nov  2 20:14:51 2021] smp: Bringing up secondary CPUs ...
    [Tue Nov  2 20:14:51 2021] x86: Booting SMP configuration:
    [Tue Nov  2 20:14:51 2021] .... node  #0, CPUs:      #1
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #2
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #3
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #4
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #5
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #6
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021]  #7
    [Tue Nov  2 20:14:51 2021] Disabled fast string operations
    [Tue Nov  2 20:14:51 2021] smp: Brought up 1 node, 8 CPUs

Here is what showed up in /var/log/warn:

    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'cpu cycles'
unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'instructions'
unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'bus cycles'
unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'cache references'
unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'cache misses'
unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'branch
instructions' unavailable
    Nov  2 20:14:53 prod kernel: core: CPUID marked event: 'branch misses'
unavailable
    Nov  2 20:14:53 prod kernel:  #2
    Nov  2 20:14:53 prod kernel:  #3
    Nov  2 20:14:53 prod kernel:  #4
    Nov  2 20:14:53 prod kernel:  #5
    Nov  2 20:14:53 prod kernel:  #6
    Nov  2 20:14:53 prod kernel:  #7

The "marked event" log messages are normal/expected. The problem here is the
meaningless log messages "#2", "#3", etc. appearing in /var/log/warn (on my
systems, this file is always monitored automatically and any unexpected
warnings result in escalation to humans).

The file smpboot.c uses pr_cont() [1] apparently assuming nothing will be
logged when each CPU is brought up. But this is incorrect, as obviously
"Disabled fast string operations" is a counter-example. Moreover, there is some
mixup with pr_cont() being bumped up from log level INFO to WARN by the
"Disabled fast string operations" message being logged, although it's logged at
INFO [2]. So there are two related problems here.

In any case, it's obvious some cleanup is needed in the code's assumptions
about how messages are logged at this point in the bootup sequence - both with
respect to separate lines and mixing up INFO vs. WARN.

[1]
https://github.com/torvalds/linux/blob/fc661f2dcb7e41dcda9ae862efb822bb2f461646/arch/x86/kernel/smpboot.c#L957
[2]
https://github.com/torvalds/linux/blob/1b1cf8fe99830e8c95f0fe110b02ba51c2bbc4e0/arch/x86/kernel/cpu/intel.c#L293


You are receiving this mail because: