Bug ID 1227345
Summary Kernel 6.9 (MicroOS): a single cpu core is available.
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware x86-64
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter maxime.thirion@r-virtuel.net
QA Contact qa-bugs@suse.de
Target Milestone ---
Found By ---
Blocker ---

Created attachment 875851 [details]
htop showing only one core

Hello everyone!

Overview: 

I'm experiencing a strange problem on two production servers hosted by
Hetzner.

Since upgrading to kernel 6.9, only one cpu core is available, giving
catastrophic performance.

The strange thing is that the two machines are not identical:

The first server has an Intel(R) Xeon(R) CPU E3-1275 v5 with 2x 512 GB NVMe
SSDs and 64 GB RAM.
The second server has an Intel(R) Core(TM) i7-3770 CPU with 4x 6 TB HDDs and
32 GB RAM.

I contacted the hosting company first; they were not aware of the problem. At
their suggestion, I rebooted one of the machines into their rescue system,
which runs a 6.9.7 kernel (so also very recent), and the problem does not
occur there: all the CPU cores are active.

Steps to Reproduce:

Update to kernel 6.9

Actual Results:

Only one CPU core is available, and performance is very poor.
lscpu reports only one core per socket and only one thread per core.

kutta:~ # lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   1
  On-line CPU(s) list:    0
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz
    BIOS Model name:      Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz To Be Filled By O.E.M. CPU @ 3.6GHz
    BIOS CPU family:      179
    CPU family:           6
    Model:                94
    Thread(s) per core:   1
    Core(s) per socket:   1
    Socket(s):            1
    Stepping:             3
    CPU(s) scaling MHz:   36%
    CPU max MHz:          4000.0000
    CPU min MHz:          800.0000
    BogoMIPS:             7202.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    32 KiB (1 instance)
  L1i:                    32 KiB (1 instance)
  L2:                     256 KiB (1 instance)
  L3:                     8 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0
Vulnerabilities:          
  Gather data sampling:   Vulnerable: No microcode
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
  Mds:                    Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Mitigation; TSX disable
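
For anyone trying to narrow this down, the lscpu output above does not say
whether the missing cores were never detected or are merely offline. A minimal
sketch that prints the kernel's own CPU lists from sysfs may help tell the two
apart. It assumes the standard files under /sys/devices/system/cpu are
present; the function names are illustrative only and it has not been run on
the affected servers.

```python
from pathlib import Path

# Standard sysfs location for CPU topology information on Linux.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file (e.g. no offline CPUs) yields no entries
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_lists(base=CPU_SYSFS):
    """Return the possible/present/online/offline CPU lists, or None if missing."""
    lists = {}
    for name in ("possible", "present", "online", "offline"):
        path = base / name
        lists[name] = parse_cpu_list(path.read_text()) if path.exists() else None
    return lists


if __name__ == "__main__":
    for name, cpus in read_cpu_lists().items():
        print(f"{name:9s} {cpus}")
    # smt/control reports whether SMT is on, off, forceoff or notsupported.
    smt = CPU_SYSFS / "smt" / "control"
    if smt.exists():
        print(f"smt       {smt.read_text().strip()}")
```

If "present" lists several CPUs while "online" lists only CPU 0, the cores
were detected but not brought up; if "present" itself contains only CPU 0,
the kernel never enumerated them.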

Expected Results:

Correct and optimal operation of the cpu, with all cores activated, as with
kernels 6.8 and earlier or with 6.9 supplied by the provider's rescue system.
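
Since the provider's 6.9.7 rescue kernel brings up all cores on the same
hardware, one thing that could be compared between the two systems is the
kernel command line. The sketch below scans /proc/cmdline for boot parameters
that are documented to limit the number of CPUs or disable SMT. The parameter
list is a non-exhaustive assumption, the function name is hypothetical, and
nothing here implies that such a parameter is actually set on these servers.

```python
from pathlib import Path

# Boot parameters that restrict CPU bring-up (non-exhaustive, assumed list).
LIMITING_PREFIXES = ("maxcpus=", "nr_cpus=", "possible_cpus=")


def cpu_limiting_params(cmdline):
    """Return the tokens of a kernel command line that can limit CPUs or SMT."""
    hits = []
    for token in cmdline.split():
        if token == "nosmp" or token.startswith("nosmt"):
            hits.append(token)
        elif token.startswith(LIMITING_PREFIXES):
            hits.append(token)
        elif token.startswith("mitigations=") and "nosmt" in token:
            hits.append(token)
    return hits


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        cmdline = path.read_text()
        print("cmdline:", cmdline.strip())
        print("CPU-limiting parameters:", cpu_limiting_params(cmdline) or "none found")
```

Running it once on the installed system and once in the rescue system would
show whether the two boots differ in any of these parameters.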

Additional Information:

I couldn't find the system in the list, but it's openSUSE MicroOS, so the
servers have a read-only file system, and they're in production (which limits
my options for testing and reinstalling).

I rolled the Xeon server back to a snapshot still on kernel 6.4, and the
problem no longer appears.

I no longer have a snapshot with an older kernel on the i7, as I didn't notice
the problem immediately.
I can run some tests on this server if necessary.

The servers were up to date (before the Xeon rollback).

