https://bugzilla.suse.com/show_bug.cgi?id=1220119
https://bugzilla.suse.com/show_bug.cgi?id=1220119#c8

--- Comment #8 from Jiri Wiesner <jwiesner@suse.com> ---
(In reply to Jiri Wiesner from comment #2)
> There is almost certainly something wrong with the tsc clocksource (not
> kvm-clock). Perhaps, the TSC counter is somehow virtualized (or
> para-virtualized by Qemu, I don't know) and the virtualized TSC misbehaves
> under certain conditions.

A brief look at the code executed by the guest shows that kvm-clock is the paravirtualized clocksource, whereas the tsc clocksource executes rdtscp directly. My guess is that the hypervisor shields the guest from the fact that time flows differently on its pCPUs. In line with this, a readout of kvm-clock would yield smaller counter values, probably related to the CPU time for which the vCPUs were able to run on the hypervisor, whereas a direct readout of the real TSC counter (on the pCPU) gives much larger counter values and larger differences. If this were true, though, I am unsure how NTP would be able to synchronize time in the guest so that userspace gets correct timestamps.

Whether or not the tsc clocksource gets initialized during boot depends on the flags exposed by the vCPU, hence it depends on the exact type of the vCPU. On guests where the vCPU sets the TSC-related flags (as seen in /proc/cpuinfo on x86: tsc rdtscp constant_tsc tsc_known_freq tsc_deadline_timer tsc_adjust), the tsc clocksource will be initialized and checked by the watchdog. It is possible that the values read by the tsc clocksource are entirely misleading and irrelevant as far as timekeeping in the guest is concerned. Passing tsc=nowatchdog to the guest kernel would disable the clocksource watchdog checks on the tsc clocksource.

I have looked at the ppc64le and aarch64 logs. There are no clocksource watchdog errors:
$ grep -E 'clocksource|watchdog' serial0.ppc64le.3939629.txt
[ 0.000000][ T0] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[ 0.000002][ T0] clocksource: timebase mult[1f40000] shift[24] registered
[ 0.007938][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[ 0.058782][ T1] clocksource: Switched to clocksource timebase
$ grep -E 'clocksource|watchdog' serial0.aarch64.3940457.txt
[ 0.000000][ T0] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
[ 0.050199][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[ 0.594978][ T1] clocksource: Switched to clocksource arch_sys_counter
[ 11.709872][ T49] watchdog: Delayed init of the lockup detector failed: -19
[ 11.711996][ T49] watchdog: Hard watchdog permanently disabled

Then I realized that the clocksource watchdog is only enabled on x86_64 and i386:
$ grep -r CLOCKSOURCE_WATCHDOG config
config/x86_64/default:CONFIG_CLOCKSOURCE_WATCHDOG=y
config/x86_64/default:CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
config/x86_64/default:# CONFIG_TEST_CLOCKSOURCE_WATCHDOG is not set
config/i386/pae:CONFIG_CLOCKSOURCE_WATCHDOG=y
config/i386/pae:CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
config/i386/pae:# CONFIG_TEST_CLOCKSOURCE_WATCHDOG is not set

How are the ppc64le and aarch64 results relevant to the clocksource watchdog errors observed on x86_64 guests? The ltp_net_tcp_cmds test failing should not have anything to do with the clocksource watchdog error - there is no evidence for that as far as I can see.
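
To see which of the TSC-related flags listed above a given guest exposes, and which clocksource it ended up using, something like the following could be run inside the guest. This is only a sketch: the helper names (tsc_flags_present, read_text_or_none) are made up for illustration, the flag list is the one quoted from /proc/cpuinfo above, and it assumes the usual clocksource sysfs files under /sys/devices/system/clocksource/clocksource0/ are present. It only reports what the guest exposes; it does not say whether the watchdog would flag the tsc clocksource.

```python
from pathlib import Path

# TSC-related flags named in the comment above (as seen in /proc/cpuinfo on x86).
TSC_FLAGS = ("tsc", "rdtscp", "constant_tsc", "tsc_known_freq",
             "tsc_deadline_timer", "tsc_adjust")


def tsc_flags_present(cpuinfo_text):
    """Return the TSC-related flags found on the first 'flags' line, in TSC_FLAGS order."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain 'flags' key, not e.g. 'vmx flags'.
        if sep and key.strip() == "flags":
            exposed = set(value.split())
            return [flag for flag in TSC_FLAGS if flag in exposed]
    return []  # no 'flags' line, e.g. on a non-x86 guest


def read_text_or_none(path):
    """Read a small procfs/sysfs file, or return None if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    cpuinfo = read_text_or_none("/proc/cpuinfo") or ""
    print("TSC-related flags:", " ".join(tsc_flags_present(cpuinfo)) or "(none)")

    # Assumed location of the clocksource sysfs files on the guest.
    base = "/sys/devices/system/clocksource/clocksource0/"
    for name in ("current_clocksource", "available_clocksource"):
        text = read_text_or_none(base + name)
        print(name + ":", text.strip() if text is not None else "(unavailable)")
```

On a ppc64le or aarch64 guest the flag list would be expected to come back empty, since /proc/cpuinfo has no x86-style flags line there.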