[Bug 1206321] New: livepatch test klp_tc_8.sh fails: rmmod: ERROR: Module klp_tc_8_4_livepatch is in use
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Bug ID: 1206321 Summary: livepatch test klp_tc_8.sh fails: rmmod: ERROR: Module klp_tc_8_4_livepatch is in use Classification: openSUSE Product: openSUSE Tumbleweed Version: Current Hardware: x86-64 OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: petr.vorel@suse.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- This looks to be some race (does not fail always) [1]. It has been here at least since build 20221123 [2] which has 6.0.8-1.1 (1579d93) (likely before, but older failed logs cleared, it might be just on 6.0.x stable branch). ppc64le version looks to be OK [3]. I was not able to reproduce it on my laptop (6.1.0-rc8-2.g2fb1790-default), nor in my openQA instance [4] (might be related to o3 setup/infrastructure). Logs from openQA run (./klp_tc_8.sh) [5], nothing obvious in dmesg [6]. [19:26:53] Test Case 8: Patch with replace-all [19:26:53] *** Compiling live patches [19:26:53] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:54] CC [M] /tmp/live-patch/tc_8/patch_replace-all_1/klp_tc_8_1_livepatch.o [19:26:55] MODPOST /tmp/live-patch/tc_8/patch_replace-all_1/Module.symvers [19:26:55] CC [M] /tmp/live-patch/tc_8/patch_replace-all_1/klp_tc_8_1_livepatch.mod.o [19:26:55] LD [M] /tmp/live-patch/tc_8/patch_replace-all_1/klp_tc_8_1_livepatch.ko [19:26:55] BTF [M] /tmp/live-patch/tc_8/patch_replace-all_1/klp_tc_8_1_livepatch.ko [19:26:55] Skipping BTF generation for /tmp/live-patch/tc_8/patch_replace-all_1/klp_tc_8_1_livepatch.ko due to unavailability of vmlinux [19:26:55] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:55] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:56] CC [M] /tmp/live-patch/tc_8/patch_replace-all_2/klp_tc_8_2_livepatch.o [19:26:56] MODPOST /tmp/live-patch/tc_8/patch_replace-all_2/Module.symvers [19:26:56] CC [M] /tmp/live-patch/tc_8/patch_replace-all_2/klp_tc_8_2_livepatch.mod.o [19:26:57] LD [M] /tmp/live-patch/tc_8/patch_replace-all_2/klp_tc_8_2_livepatch.ko [19:26:57] BTF [M] /tmp/live-patch/tc_8/patch_replace-all_2/klp_tc_8_2_livepatch.ko [19:26:57] Skipping BTF generation for /tmp/live-patch/tc_8/patch_replace-all_2/klp_tc_8_2_livepatch.ko due to unavailability of vmlinux [19:26:57] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:57] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:58] CC [M] /tmp/live-patch/tc_8/patch_replace-all_3/klp_tc_8_3_livepatch.o [19:26:58] MODPOST /tmp/live-patch/tc_8/patch_replace-all_3/Module.symvers [19:26:58] CC [M] /tmp/live-patch/tc_8/patch_replace-all_3/klp_tc_8_3_livepatch.mod.o [19:26:58] LD [M] /tmp/live-patch/tc_8/patch_replace-all_3/klp_tc_8_3_livepatch.ko [19:26:58] BTF [M] /tmp/live-patch/tc_8/patch_replace-all_3/klp_tc_8_3_livepatch.ko [19:26:58] Skipping BTF generation for /tmp/live-patch/tc_8/patch_replace-all_3/klp_tc_8_3_livepatch.ko due to unavailability of vmlinux [19:26:58] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:58] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:26:59] CC [M] /tmp/live-patch/tc_8/patch_replace-all_4/klp_tc_8_4_livepatch.o [19:27:00] MODPOST /tmp/live-patch/tc_8/patch_replace-all_4/Module.symvers [19:27:00] CC [M] /tmp/live-patch/tc_8/patch_replace-all_4/klp_tc_8_4_livepatch.mod.o [19:27:00] LD [M] /tmp/live-patch/tc_8/patch_replace-all_4/klp_tc_8_4_livepatch.ko [19:27:00] BTF [M] /tmp/live-patch/tc_8/patch_replace-all_4/klp_tc_8_4_livepatch.ko [19:27:00] Skipping BTF generation for /tmp/live-patch/tc_8/patch_replace-all_4/klp_tc_8_4_livepatch.ko due to unavailability of vmlinux [19:27:00] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:27:00] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:27:01] CC [M] /tmp/live-patch/tc_8/patch_replace-all_5/klp_tc_8_5_livepatch.o [19:27:01] MODPOST /tmp/live-patch/tc_8/patch_replace-all_5/Module.symvers [19:27:01] CC [M] /tmp/live-patch/tc_8/patch_replace-all_5/klp_tc_8_5_livepatch.mod.o [19:27:01] LD [M] /tmp/live-patch/tc_8/patch_replace-all_5/klp_tc_8_5_livepatch.ko [19:27:01] BTF [M] /tmp/live-patch/tc_8/patch_replace-all_5/klp_tc_8_5_livepatch.ko [19:27:01] Skipping BTF generation for /tmp/live-patch/tc_8/patch_replace-all_5/klp_tc_8_5_livepatch.ko due to unavailability of vmlinux [19:27:01] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [19:27:01] *** Inserting getpid patch 1 [19:27:02] *** Wait for completion (klp_tc_8_1_livepatch) [19:27:02] *** Inserting getpid patch 2 [19:27:02] *** Wait for completion (klp_tc_8_2_livepatch) [19:27:02] *** Inserting getpid patch 3 [19:27:02] *** Wait for completion (klp_tc_8_3_livepatch) [19:27:02] *** Inserting getpid patch 4 [19:27:02] *** Wait for completion (klp_tc_8_4_livepatch) [19:27:02] *** Inserting getpid patch 5 [19:27:02] *** Wait for completion (klp_tc_8_5_livepatch) [19:27:02] *** Removing getpid patch 1 [19:27:02] *** Removing getpid patch 2 [19:27:02] *** Removing getpid patch 3 [19:27:02] *** Removing getpid patch 4 [19:27:02] rmmod: ERROR: Module klp_tc_8_4_livepatch is in use [19:27:02] *** Removing patches [19:27:02] TEST FAILED while executing 'rmmod "$PATCH_MOD_NAME"' [1] https://openqa.opensuse.org/tests/2947725#next_previous [2] https://openqa.opensuse.org/tests/2899233 [3] https://openqa.opensuse.org/tests/2948060#next_previous [4] http://quasar.suse.cz/tests/1446#next_previous [5] https://openqa.opensuse.org/tests/2947725/file/serial_terminal.txt [6] https://openqa.opensuse.org/tests/2947725/file/serial0.txt -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c1 --- Comment #1 from Petr Vorel <petr.vorel@suse.com> --- Created attachment 863462 --> https://bugzilla.suse.com/attachment.cgi?id=863462&action=edit Test output (https://openqa.opensuse.org/tests/2947725/file/serial_terminal.txt) -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c2 --- Comment #2 from Petr Vorel <petr.vorel@suse.com> --- Created attachment 863463 --> https://bugzilla.suse.com/attachment.cgi?id=863463&action=edit dmesg (https://openqa.opensuse.org/tests/2947725/file/serial0.txt) -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mbenes@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jslaby@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nstange@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |petr.vorel@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Found By|--- |openQA -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c3 Miroslav Bene�� <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mpdesouza@suse.com, | |pmladek@suse.com --- Comment #3 from Miroslav Bene�� <mbenes@suse.com> --- Is it easily reproducible in openQA then? If yes, would it be possible to enable a dynamic debug output? # echo "file kernel/livepatch/* +p" > /sys/kernel/debug/dynamic_debug/control It might provide more insight. The logs do not contain anything interesting now. Given that klp_tc_8_4_livepatch is not removed, klp_tc_8_5_livepatch stays applied and is then applied to all the remaining tests (if relevant). -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c4 --- Comment #4 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Miroslav Bene�� from comment #3)
Is it easily reproducible in openQA then? If yes, would it be possible to enable a dynamic debug output?
# echo "file kernel/livepatch/* +p" > /sys/kernel/debug/dynamic_debug/control
It might provide more insight.
Sure, I added the debugging to my fork. I reproduced it only once [1] of 6 runs [2]. Dmesg [3] contains more debug info in the end, but it also caused also test 14 to fail (only in this case where also test 8 fails). [1] https://openqa.opensuse.org/tests/2952626 [2] https://openqa.opensuse.org/tests/overview?build=livepatch-debugging&version... [3] https://openqa.opensuse.org/tests/2952626/file/serial0.txt [4] https://openqa.opensuse.org/tests/2952626/file/serial_terminal.txt -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c5 --- Comment #5 from Petr Vorel <petr.vorel@suse.com> --- Created attachment 863504 --> https://bugzilla.suse.com/attachment.cgi?id=863504&action=edit dmesg with debugging (https://openqa.opensuse.org/tests/2952626/file/serial0.txt) -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c6 --- Comment #6 from Petr Vorel <petr.vorel@suse.com> --- Created attachment 863505 --> https://bugzilla.suse.com/attachment.cgi?id=863505&action=edit Test output with debugging (https://openqa.opensuse.org/tests/2952626/file/serial_terminal.txt) -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c7 --- Comment #7 from Miroslav Bene�� <mbenes@suse.com> --- There is something strange going on. klp_tc_8_{1,2,3}_livepatch modules are just rmmoded. Their refcount is 0, so it is straightforward. klp_tc_8_4_livepatch should be the same. They are all replace_all patches so once klp_tc_8_5_livepatch is installed, and it is successfully, the refcount of klp_tc_8_4_livepatch should be 0 and it should be possible to just rmmod it. But it is clearly not for some reason. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c8 --- Comment #8 from Miroslav Bene�� <mbenes@suse.com> --- I am not able to reproduce on v6.1, but... There is a race condition. The kernel infrastructure clears klp_transition_patch once a transition succeeds. And only after that it calls module_put() on all replaced previous live patches if klp_transition_patch was replace_all. klp_in_progress() in qa_test_klp checks /sys/kernel/livepatch/*/transition which is basically an export of klp_transition_patch. klp_wait_complete() then waits for klp_in_progress() to return false when a live patch module is loaded and that is it. So what very likely happens in klp_tc_8.sh is that klp_tc_8_*_livepatch are loaded sequentially. When it is klp_tc_8_5_livepatch turn, the module is loaded successfully and klp_transition_patch is cleared. klp_wait_complete() sees it and tries to rmmod all previous modules because their refcnt should be zero. But that does not have to be true for klp_tc_8_4_livepatch because its module_put() has not been called yet in the kernel. Let's just call klp_wait_complete() also before rmmod in klp_tc_8.sh. It has the logic to wait for refcnt being zero anyway (used in klp_tc_exit() for example, which does the right thing). Petr, could you test with the attached patch for qa_test_klp, please? -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c9 --- Comment #9 from Miroslav Bene�� <mbenes@suse.com> --- Created attachment 863525 --> https://bugzilla.suse.com/attachment.cgi?id=863525&action=edit Fix for klp_tc_8.sh -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 Miroslav Bene�� <mbenes@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|kernel-bugs@opensuse.org |mbenes@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c10 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |IN_PROGRESS --- Comment #10 from Petr Vorel <petr.vorel@suse.com> --- Thanks for a patch, I'm running tests this patch [1], half of them also with echo "file kernel/livepatch/* +p" > /sys/kernel/debug/dynamic_debug/control debugging. I'll keep restarting it, because it's hard to trigger this bug so that we have some confidence with the results. [1] https://openqa.opensuse.org/tests/overview?distri=opensuse&build=livepatch-f... -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c11 --- Comment #11 from Petr Vorel <petr.vorel@suse.com> --- Hm, test 14 failed, even on on test which does *not* run echo "file kernel/livepatch/* +p" ... [1]: == Trace patched function == [10:38:50] Test Case 14: Trace patched function [10:38:50] *** Compiling kernel live patch [10:38:50] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:50] CC [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.o [10:38:51] MODPOST /tmp/live-patch/tc_14/Module.symvers [10:38:51] CC [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.mod.o [10:38:51] LD [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko [10:38:51] BTF [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko [10:38:51] Skipping BTF generation for /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko due to unavailability of vmlinux [10:38:51] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:51] *** Compile test support module [10:38:51] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:23] make: Warning: File '/tmp/live-patch/tc_14/modules.order' has modification time 27 s in the future [10:38:24] make[1]: Warning: File '/tmp/live-patch/tc_14/Makefile' has modification time 27 s in the future [10:38:24] CC [M] /tmp/live-patch/tc_14/klp_test_support_mod.o [10:38:24] make[1]: warning: Clock skew detected. Your build may be incomplete. [10:38:24] make[1]: Warning: File '/tmp/live-patch/tc_14/Makefile' has modification time 27 s in the future [10:38:24] make[2]: *** No rule to make target '/tmp/live-patch/tc_14/klp_test_support_mod.mod.o', needed by '/tmp/live-patch/tc_14/klp_test_support_mod.ko'. Stop. [10:38:24] make[1]: *** [/usr/src/linux-6.0.12-1/scripts/Makefile.modpost:140: __modpost] Error 2 [10:38:24] make: *** [../../../linux-6.0.12-1/Makefile:1780: modules] Error 2 [10:38:24] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:24] *** Load test support module [10:38:24] insmod: ERROR: could not load module : No such file or directory [10:38:24] *** Removing patches [10:38:24] TEST FAILED while executing 'insmod "$SUPPORT_KO"' Obviously debugging does not cause the problem on test 14. Not sure if you have nough info in dmesg [2]. [1] https://openqa.opensuse.org/tests/2955134/file/serial_terminal.txt [2] https://openqa.opensuse.org/tests/2955134/file/serial0.txt -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c12 --- Comment #12 from Petr Mladek <pmladek@suse.com> --- (In reply to Petr Vorel from comment #11)
Hm, test 14 failed, even on on test which does *not* run echo "file kernel/livepatch/* +p" ... [1]:
== Trace patched function == [10:38:50] Test Case 14: Trace patched function [10:38:50] *** Compiling kernel live patch [10:38:50] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:50] CC [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.o [10:38:51] MODPOST /tmp/live-patch/tc_14/Module.symvers [10:38:51] CC [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.mod.o [10:38:51] LD [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko [10:38:51] BTF [M] /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko
[10:38:51] Skipping BTF generation for /tmp/live-patch/tc_14/klp_tc_14_livepatch.ko due to unavailability of vmlinux [10:38:51] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:51] *** Compile test support module [10:38:51] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:23] make: Warning: File '/tmp/live-patch/tc_14/modules.order' has modification time 27 s in the future [10:38:24] make[1]: Warning: File '/tmp/live-patch/tc_14/Makefile' has modification time 27 s in the future [10:38:24] CC [M] /tmp/live-patch/tc_14/klp_test_support_mod.o [10:38:24] make[1]: warning: Clock skew detected. Your build may be incomplete. [10:38:24] make[1]: Warning: File '/tmp/live-patch/tc_14/Makefile' has modification time 27 s in the future
I wonder where the modification time in the future comes from. It usually happens when some directory is shared via NFS.
[10:38:24] make[2]: *** No rule to make target '/tmp/live-patch/tc_14/klp_test_support_mod.mod.o', needed by '/tmp/live-patch/tc_14/klp_test_support_mod.ko'. Stop.
The module has generic name "klp_test_support_mod" in compare with, for example, klp_tc_14_livepatch.mod.o. I wonder if this is somehow shared with other tests via NFS. And the tests are stepping into each other food.
[10:38:24] make[1]: *** [/usr/src/linux-6.0.12-1/scripts/Makefile.modpost:140: __modpost] Error 2 [10:38:24] make: *** [../../../linux-6.0.12-1/Makefile:1780: modules] Error 2 [10:38:24] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [10:38:24] *** Load test support module [10:38:24] insmod: ERROR: could not load module : No such file or directory [10:38:24] *** Removing patches [10:38:24] TEST FAILED while executing 'insmod "$SUPPORT_KO"'
Obviously debugging does not cause the problem on test 14. Not sure if you have nough info in dmesg [2].
[1] https://openqa.opensuse.org/tests/2955134/file/serial_terminal.txt [2] https://openqa.opensuse.org/tests/2955134/file/serial0.txt
I do not see anything interesting in dmesg. This compilation problem seems to be on the user-space side. Note that Test Case 8 passed in this run. This compilation error seems to be another unrelated issue. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c13 --- Comment #13 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #12)
Note that Test Case 8 passed in this run. This compilation error seems to be another unrelated issue.
I haven't noticed 14 failing when 8 was failing, but it can be really unrelated (14 likely needs it's own fix). -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c14 --- Comment #14 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #12)
I wonder where the modification time in the future comes from. It usually happens when some directory is shared via NFS.
We might use NFS on openQA workers, which run SUT over QEMU. IMHO we don't use any NFS in SUT, but I'll check it, also ask others. BTW It's not a first bug reproducible in openQA which I was not able to reproduce outside openQA on https://openqa.opensuse.org/ or https://openqa.suse.de/ (some bugs I'm not able to trigger not only when I run qcow created by openQA, but not even in my openQA worker). -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c15 --- Comment #15 from Miroslav Bene�� <mbenes@suse.com> --- Yes, that build failure is unrelated and is really strange, so I would say it must be really something specific to openQA setup as Petr M. said. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c16 --- Comment #16 from Petr Vorel <petr.vorel@suse.com> --- BTW, running extensively found there are more errors: https://openqa.opensuse.org/tests/2956639/file/serial_terminal.txt [03:50:05] Test Case 10: Patch caller of graph traced callee [03:50:05] *** Compiling kernel live patch [03:50:05] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [03:50:06] CC [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.o [03:50:06] MODPOST /tmp/live-patch/tc_10/Module.symvers [03:50:06] CC [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.mod.o [03:50:07] LD [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.ko [03:50:07] BTF [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.ko [03:50:07] Skipping BTF generation for /tmp/live-patch/tc_10/klp_tc_10_livepatch.ko due to unavailability of vmlinux [03:50:07] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [03:50:07] *** Compile test support module [03:50:07] make: Entering directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [03:49:41] make: Warning: File '/tmp/live-patch/tc_10/modules.order' has modification time 25 s in the future [03:49:41] make[1]: Warning: File '/tmp/live-patch/tc_10/Makefile' has modification time 25 s in the future [03:49:41] CC [M] /tmp/live-patch/tc_10/klp_test_support_mod.o [03:49:41] make[1]: warning: Clock skew detected. Your build may be incomplete. [03:49:42] make[1]: Warning: File '/tmp/live-patch/tc_10/Makefile' has modification time 25 s in the future [03:49:42] make[2]: *** No rule to make target '/tmp/live-patch/tc_10/klp_test_support_mod.mod.o', needed by '/tmp/live-patch/tc_10/klp_test_support_mod.ko'. Stop. [03:49:42] make[1]: *** [/usr/src/linux-6.0.12-1/scripts/Makefile.modpost:140: __modpost] Error 2 [03:49:42] make: *** [../../../linux-6.0.12-1/Makefile:1780: modules] Error 2 [03:49:42] make: Leaving directory '/usr/src/linux-6.0.12-1-obj/x86_64/default' [03:49:42] *** Load test support module [03:49:42] insmod: ERROR: could not load module : No such file or directory [03:49:42] *** Removing patches [03:49:42] TEST FAILED while executing 'insmod "$SUPPORT_KO"' Not sure how this can even happen. BTW there is for sure no NFS on SUT. I wonder what else should I look for. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c17 --- Comment #17 from Miroslav Bene�� <mbenes@suse.com> --- I am clueless. This is really strange. The recipe for building everything is always the same (there is one function only to do that in klp_tc_functions.sh). And since the compilation succeeds elsewhere... All those warnings about a time modification do not give me much confidence either. Petr, what about klp_tc_8.sh? Is it ok now with the fix? I would push it out in that case. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c18 Petr Mladek <pmladek@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |martin.doucha@suse.com --- Comment #18 from Petr Mladek <pmladek@suse.com> --- I do not know how the openqa tests are separated. I guess that they are running in some virtualized hosts. I wonder if the host share some disk for temporary files or if the same host is reused. Anyway, the log: [03:49:41] make: Warning: File '/tmp/live-patch/tc_10/modules.order' has modification time 25 s in the future [03:49:41] make[1]: Warning: File '/tmp/live-patch/tc_10/Makefile' has modification time 25 s in the future shows that files in /tmp have timestamps in the future. It means that either the system clock was updated on this system. Or that the files were generated on another system with another system clock. If the files were generated in another system then /tmp is somehow shared. If it is shared then different systems might break each other, for example, delete/recreate files when the other system are using them. One problem here is that the path "/tmp/live-patch/tc_10" is always the same. It is not safe. If a tool wants to use /tmp a safe way then it should create random subdirectory, e.g. TEMP_DIR=`mktemp -d /tmp/livepatch-XXXXXX`. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c19 --- Comment #19 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #18)
I do not know how the openqa tests are separated. I guess that they are running in some virtualized hosts. I wonder if the host share some disk for temporary files or if the same host is reused.
Tests are run each in it's own VM => separated. SUT is using it's own QCOW2 image (can be downloaded e.g. [1]), each one obviously in it's own directory in /var/lib/openqa/pool/, details from log [2]: [2022-12-19T03:56:10.441605+01:00] [debug] running `/usr/bin/qemu-img info --output=json /var/lib/openqa/pool/1/opensuse-Tumbleweed-x86_64-20221218-DVD@64bit-with-ltp.qcow2` [2022-12-19T03:56:10.495901+01:00] [debug] running `/usr/bin/qemu-img info --output=json /var/lib/openqa/pool/1/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20221218-Media.iso` [2022-12-19T03:56:10.533396+01:00] [debug] Initializing block device images [2022-12-19T03:56:10.533875+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/1/opensuse-Tumbleweed-x86_64-20221218-DVD@64bit-with-ltp.qcow2 /var/lib/openqa/pool/1/raid/hd0-overlay0 32212254720` [2022-12-19T03:56:10.577109+01:00] [debug] Formatting '/var/lib/openqa/pool/1/raid/hd0-overlay0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=32212254720 backing_file=/var/lib/openqa/pool/1/opensuse-Tumbleweed-x86_64-20221218-DVD@64bit-with-ltp.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 [2022-12-19T03:56:10.577320+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 -F raw -b /var/lib/openqa/pool/1/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20221218-Media.iso /var/lib/openqa/pool/1/raid/cd0-overlay0 4675600384` [2022-12-19T03:56:10.617701+01:00] [debug] Formatting '/var/lib/openqa/pool/1/raid/cd0-overlay0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=4675600384 backing_file=/var/lib/openqa/pool/1/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20221218-Media.iso backing_fmt=raw lazy_refcounts=off refcount_bits=16 [2022-12-19T03:56:10.617851+01:00] [debug] init_blockdev_images: Finished creating block devices [2022-12-19T03:56:10.620309+01:00] [debug] starting: /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 1536 -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot order=c -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :91,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/1/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/1/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on,discard=unmap -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0
Anyway, the log:
[03:49:41] make: Warning: File '/tmp/live-patch/tc_10/modules.order' has modification time 25 s in the future [03:49:41] make[1]: Warning: File '/tmp/live-patch/tc_10/Makefile' has modification time 25 s in the future
This is really strange, I need to download the qcow to investigate. Thanks for finding the main issue.
shows that files in /tmp have timestamps in the future. It means that either the system clock was updated on this system. Or that the files were generated on another system with another system clock.
If the files were generated in another system then /tmp is somehow shared. If it is shared then different systems might break each other, for example, delete/recreate files when the other system are using them.
/tmp is definitely not shared, there is no NFS on the system.
One problem here is that the path "/tmp/live-patch/tc_10" is always the same. It is not safe.
If a tool wants to use /tmp a safe way then it should create random subdirectory, e.g. TEMP_DIR=`mktemp -d /tmp/livepatch-XXXXXX`.
+1 qa_test_klp should fix it with mktemp. But agree, there might be something else wrong in way how openQA prepare/run the VM, I'll have look, but might need to ask openQA experts. openQA uses NVMe, but that should not affect this (fast testing caused some problems in different testsuites). [1] https://openqa.opensuse.org/tests/2962772/asset/hdd/opensuse-Tumbleweed-x86_... [2] https://openqa.opensuse.org/tests/2962772/file/autoinst-log.txt -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c20 --- Comment #20 from Petr Mladek <pmladek@suse.com> --- Just for record. I tried to update qa_test_klp/klp_tc_functions.sh and klp_tc_10.sh to create the files and directories under /tmp using mktemp. But it all blown up. The various paths are not used consistently. It created the files in one directory and searched them in another one... It still might be the right solution but it would require cleaning/redesigning the test framework. We should first agree that there is no easy workaround on the QA side. If openqa shares or reuses /tmp then it might cause another hard-to-reproduce problems like this one. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c21 --- Comment #21 from Petr Vorel <petr.vorel@suse.com> --- I tried to reproduce the problem running qcow with qemu locally (run test 50x times). I'll try to add some debug code to test which is run in openQA, but I have no idea what could be wrong (as I stated, there is no NFS, no shared folder). openQA workers, which run VMs are often heavily overbooked, but that should not cause time shift. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c22 --- Comment #22 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #20)
Just for record. I tried to update qa_test_klp/klp_tc_functions.sh and klp_tc_10.sh to create the files and directories under /tmp using mktemp. But it all blown up. The various paths are not used consistently. It created the files in one directory and searched them in another one...
IMHO any software should expect being run simultaneously, that's why qa_test_klp should have used mktemp -d. But this will not fix problem in openQA, because test is run only once, /tmp is not shared. "it all blown up" I always wonder why kernel developers like to write testsuites in bash which does not help testsuite to be maintainable and readable. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c23 --- Comment #23 from Miroslav Bene�� <mbenes@suse.com> --- (In reply to Petr Vorel from comment #22)
(In reply to Petr Mladek from comment #20)
Just for record. I tried to update qa_test_klp/klp_tc_functions.sh and klp_tc_10.sh to create the files and directories under /tmp using mktemp. But it all blown up. The various paths are not used consistently. It created the files in one directory and searched them in another one...
IMHO any software should expect being run simultaneously, that's why qa_test_klp should have used mktemp -d. But this will not fix problem in openQA, because test is run only once, /tmp is not shared.
Yes, it is worth fixing.
"it all blown up" I always wonder why kernel developers like to write testsuites in bash which does not help testsuite to be maintainable and readable.
You would have to ask the original developer who is not at SUSE anymore, but developers usually choose whatever suits them the best for whatever reason :). -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c24 --- Comment #24 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Miroslav Bene�� from comment #23)
(In reply to Petr Vorel from comment #22)
(In reply to Petr Mladek from comment #20)
Just for record. I tried to update qa_test_klp/klp_tc_functions.sh and klp_tc_10.sh to create the files and directories under /tmp using mktemp. But it all blown up. The various paths are not used consistently. It created the files in one directory and searched them in another one...
IMHO any software should expect being run simultaneously, that's why qa_test_klp should have used mktemp -d. But this will not fix problem in openQA, because test is run only once, /tmp is not shared.
Yes, it is worth fixing.
IMHO the best would be to port test to LTP, it's API is much more reliable and readable (both result and the code). Also there are many useful features you don't want to reinvent. At least if you're not happy with the current framework or you plan to rewrite it.
"it all blown up" I always wonder why kernel developers like to write testsuites in bash which does not help testsuite to be maintainable and readable.
You would have to ask the original developer who is not at SUSE anymore, but developers usually choose whatever suits them the best for whatever reason :).
Actually, I talked with Libor in the past and he admitted he didn't know about LTP API quality. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c25 --- Comment #25 from Petr Mladek <pmladek@suse.com> --- An alternative implementation of kernel livepatch tests already exists. It is in the kernel sources in tools/testing/selftests/livepatch/. These testcases are updated when a new features are added upstream. I always wanted to migrate the missing tests from our qa_test_klp there but I never found time for this :-( -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c26 --- Comment #26 from Petr Mladek <pmladek@suse.com> --- If /tmp is not reused between various openQA instances then the system time gets probably updated during the tests. It seems that also journalctl notices it: From https://openqa.opensuse.org/tests/2955134/file/serial0.txt : [ 149.066718][ T4925] livepatch: 'klp_tc_13_livepatch': unpatching complete [ 151.766355][ T479] systemd-journald[479]: Time jumped backwards, rotating. [ 156.255882][T11438] livepatch: enabling patch 'klp_tc_15_livepatch' From https://openqa.opensuse.org/tests/2956639/logfile?filename=serial0.txt : [ 75.692049][T25067] livepatch: 'klp_tc_8_5_livepatch': unpatching complete [ 78.480561][ T479] systemd-journald[479]: Time jumped backwards, rotating. [ 82.448435][ T2539] livepatch: enabling patch 'klp_tc_11_livepatch' It seems that it is not connected with a particular test. I wonder if it is done by a ntp service that manages to connect a ntp server later when the tests are already running. It would be a generic problem that might affect basically any test. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c27 --- Comment #27 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #25)
An alternative implementation of kernel livepatch tests already exists. It is in the kernel sources in tools/testing/selftests/livepatch/. These testcases are updated when a new features are added upstream.
Thanks for info.
I always wanted to migrate the missing tests from our qa_test_klp there but I never found time for this :-(
I'll probably port kselftest tests to LTP (I was even ask to port some tests as example by folks from Linaro, so that they could follow this effort). The problem with kselftest, that the API is really poor, therefore quality of the varies a lot. Proof about quality of LTP API: even xfstests got inspired [1]. [1] https://lwn.net/Articles/788339/ -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c28 --- Comment #28 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #26)
[ 151.766355][ T479] systemd-journald[479]: Time jumped backwards, rotating.
Thanks a lot for a hint, I noted this also on VM run locally. Also on my laptop, isn't it quite common that time shifts and needs to be updated? OTOH I haven't noticed time shift on x86_64 on SLES 15-SP5 [1], nor on Tumbleweed ppc64le [2].
I wonder if it is done by a ntp service that manages to connect a ntp server later when the tests are already running. It would be a generic problem that might affect basically any test.
I have no idea. There is running chronyd.service, run with no modifications (untouched /etc/sysconfig/chronyd). I noted, that manual attempt to run 'chronyd' and 'chronyc makestep' is in openQA code, but this code is not run in the installer, whose output (produced qcow2) we use for livepatch testing (and for all LTP tests). I suppose openQA really uses the default setup from the distribution. [1] https://openqa.suse.de/tests/10088727/file/serial0.txt [2] https://openqa.opensuse.org/tests/2968575/file/serial0.txt -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c29 --- Comment #29 from Miroslav Bene�� <mbenes@suse.com> --- (In reply to Petr Mladek from comment #25)
An alternative implementation of kernel livepatch tests already exists. It is in the kernel sources in tools/testing/selftests/livepatch/. These testcases are updated when a new features are added upstream.
I always wanted to migrate the missing tests from our qa_test_klp there but I never found time for this :-(
That is the idea behind Marcos's patch set sent to upstream. We would then port everything from qa_test_klp to kselftests. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c30 Richard Palethorpe <richard.palethorpe@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |richard.palethorpe@suse.com --- Comment #30 from Richard Palethorpe <richard.palethorpe@suse.com> --- kselftests are generally awful and hardly anyone automates running them. When they do it's only a small subset. If you want the tests to be maintained and run consistently across all kernels then you should send them to LTP. The advantage of having a separate repository is that we do not have to backport tests, we write them so that they function on multiple kernels. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c31 --- Comment #31 from Miroslav Bene�� <mbenes@suse.com> --- (In reply to Richard Palethorpe from comment #30)
kselftests are generally awful and hardly anyone automates running them. When they do it's only a small subset. If you want the tests to be maintained and run consistently across all kernels then you should send them to LTP.
I understand that and we have already discussed that a couple of times. However, we would like to keep the live patching tests really close to the kernel. Our policy is that every new substantial change or feature should also have a selftest for that. It is easily enforced and our experience with that is only positive. It would be much harder to mandate it with tests living elsewhere. Another thing is that live patches are really close to the kernel and its API. It is not completely like a testing from userspace. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c32 --- Comment #32 from Petr Mladek <pmladek@suse.com> --- Recapitulation: 1. This bug started with an error that a module was not removed. It was fix by adding klp_wait_complete(), see comment #9. This fix is already pushed as the commit 23038844639c831765d ("klp_tc_8: Wait for module reference count to drain after replace_all patch is applied"). 2. Then we saw another problem that a module could not be loaded. The most promising theory is that the module is not build properly because the system clock is updated and some files have timestamps in the future. Both "make" and "journalctl" complain about this time shift, see the comment #18 and comment #26. And it would explain that the module was not build properly. At the moment we do not know why it happens. But I could imagine that it takes some time until a NTP service (chronyd.service) connects the ntp servers and synchronizes the time. We would need to check the chronyd service logs to prove this theory. It might also be in the full journalctl log. 3. We realized that qa_test_klp is not using "mktemp" when creating files and directories in /tmp. We agreed that this should get fixed. But this should not cause the problem because openQA does not share /tmp. This triggered the discussion about fixing/rewriting the tests. We all agree that rewrite is the right approach. The ideas are to move the tests into LTP or upstream kselftests. As a result, this bugs is very messy now. My proposal: 1. Close this bug as the original problem is solved. 2. Open another bug to solve the problem with timestamps in the future and building problems. 3. Open EPIC (or how it is called) in Jira to migrate qa_test_klp and discuss it there. -- You are receiving this mail because: You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1206321 https://bugzilla.suse.com/show_bug.cgi?id=1206321#c33 Petr Vorel <petr.vorel@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED --- Comment #33 from Petr Vorel <petr.vorel@suse.com> --- (In reply to Petr Mladek from comment #32) ...
As a result, this bugs is very messy now.
I'm sorry for abusing single bug report for discussiong several other topics.
My proposal:
1. Close this bug as the original problem is solved.
2. Open another bug to solve the problem with timestamps in the future and building problems.
3. Open EPIC (or how it is called) in Jira to migrate qa_test_klp and discuss it there.
Makes sense, closing. -- You are receiving this mail because: You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@suse.com