[Bug 1203854] New: Command criu dump fails on Tumbleweed and ALP
https://bugzilla.suse.com/show_bug.cgi?id=1203854

            Bug ID: 1203854
           Summary: Command criu dump fails on Tumbleweed and ALP
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: jlopez@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Dumping a process does not work. Running this simple loop test [1] fails on TW using criu-3.17.1 (although the same version works on Leap 15.4):

(00.015852) Collecting fds (pid: 8313)
(00.015856) ----------------------------------------
(00.015904) Found 4 file descriptors
(00.015908) ----------------------------------------
(00.015931) Dump private signals of 8313
(00.015940) Dump shared signals of 8313
(00.015948) Dump rseq of 8313: ptr = 0x7f8df8e18f60 sign = 0x53053053
(00.015973) Parasite syscall_ip at 0x55f78eb6f000
(00.016150) Set up parasite blob using memfd
(00.016201) Putting parasite blob into 0x7f5700f79000->0x7f8df8b34000
(00.016318) Dumping general registers for 8313 in native mode
(00.016333) Dumping GP/FPU registers for 8313
(00.016357) x86: xsave runtime structure
(00.016364) x86: -----------------------
(00.016367) x86: cwd:0x37f swd:0 twd:0 fop:0 mxcsr:0x1f80 mxcsr_mask:0xffff
(00.016371) x86: magic1:0x46505853 extended_size:1092 xstate_bv:0x17 xstate_size:1088
(00.016378) x86: xstate_bv: 0x17
(00.016381) x86: -----------------------
(00.016384) Putting tsock into pid 8313
(00.016432) Error (criu/parasite-syscall.c:88): si_code=4 si_pid=8313 si_status=11
(00.016441) Error (criu/parasite-syscall.c:95): 8313 was stopped by 11 unexpectedly

This was detected when testing podman checkpoints in TW and ALP (with cockpit), as part of the 1:1 system management workgroup.

[1] https://criu.org/Simple_loop

--
You are receiving this mail because:
You are on the CC list for the bug.
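The two error lines at the end of the log above can be read mechanically. The following is a minimal sketch, not part of CRIU: the helper name `decode_stop` and the regular expression are illustrative assumptions, and the `si_code` table assumes the Linux numbering for SIGCHLD codes. Under those assumptions, `si_code=4` reads as `CLD_TRAPPED` and `si_status=11` as `SIGSEGV`, which matches the "was stopped by 11 unexpectedly" message.

```python
import re
import signal

# Matches the fields printed in the CRIU error line quoted in the report.
# The pattern is an assumption based on that one log line, not on CRIU sources.
STOP_RE = re.compile(r"si_code=(\d+) si_pid=(\d+) si_status=(\d+)")

# SIGCHLD si_code values, assuming the Linux numbering.
CLD_CODES = {
    1: "CLD_EXITED",
    2: "CLD_KILLED",
    3: "CLD_DUMPED",
    4: "CLD_TRAPPED",
    5: "CLD_STOPPED",
    6: "CLD_CONTINUED",
}


def decode_stop(line):
    """Return a dict describing the stop event, or None if the line has no match."""
    match = STOP_RE.search(line)
    if match is None:
        return None
    code, pid, status = (int(value) for value in match.groups())
    try:
        # For trapped/killed/stopped children, si_status holds a signal number.
        sig_name = signal.Signals(status).name
    except ValueError:
        sig_name = f"signal {status}"
    return {
        "pid": pid,
        "si_code": CLD_CODES.get(code, str(code)),
        "signal": sig_name,
    }


if __name__ == "__main__":
    line = ("(00.016432) Error (criu/parasite-syscall.c:88): "
            "si_code=4 si_pid=8313 si_status=11")
    print(decode_stop(line))
```

Run against the quoted line, this suggests the traced task took a segmentation fault while the parasite code was being set up, which is consistent with a build-level problem in criu rather than with the test program itself.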
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c1
--- Comment #1 from Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c2
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c3
José Iván López González
It seems to be a breakage caused by enabling LTO. I turned off the LTO build, and it worked, at least for the local test with simple-loop.
The fix was submitted to TW.
Could you verify with the package in OBS devel:tools/criu?
I have tested the new packages for TW [1] and ALP [2]. In TW everything works as expected (using Btrfs file system). In ALP (also Btrfs but read-only), the criu simple-loop test works, but podman-checkpoint still complains:

$ podman container checkpoint distracted_babbage
ERRO[0000] container still running
ERRO[0000] criu failed: type NOTIFY errno 0 log file: /var/lib/containers/storage/btrfs-containers/ee203bab8fadb391481ec3179657eccd3970e3f9acc087ab8869890cd7938239/userdata/dump.log
Error: `/usr/bin/runc checkpoint --image-path /var/lib/containers/storage/btrfs-containers/ee203bab8fadb391481ec3179657eccd3970e3f9acc087ab8869890cd7938239/userdata/checkpoint --work-path /var/lib/containers/storage/btrfs-containers/ee203bab8fadb391481ec3179657eccd3970e3f9acc087ab8869890cd7938239/userdata ee203bab8fadb391481ec3179657eccd3970e3f9acc087ab8869890cd7938239` failed: exit status 1

$ tail -n 10 /var/lib/containers/storage/btrfs-containers/ee203bab8fadb391481ec3179657eccd3970e3f9acc087ab8869890cd7938239/userdata/dump.log
(00.185430) sockets: Sockects collect procedure family AF_INET6 proto IPPROTO_UDPLITE: -2
(00.189057) sockets: Sockects collect procedure family AF_INET6 proto IPPROTO_RAW: -2
(00.192736) sockets: Sockects collect procedure family AF_PACKET proto IPPROTO_IP: -2
(00.196484) sockets: Sockects collect procedure family AF_NETLINK proto IPPROTO_RAW: -2
(00.196555) Unlock network
(00.196566) Running network-unlock scripts
(00.196587) RPC
(00.201989) Unfreezing tasks into 1
(00.202011) Unseizing 1377 into 1
(00.202053) Error (criu/cr-dump.c:2053): Dumping FAILED.

I have also tried this workaround [3] without success. Any further ideas?

Thanks!

[1] https://build.opensuse.org/package/show/devel:tools/criu
[2] https://build.opensuse.org/package/show/SUSE:ALP/criu
[3] https://criu.org/Filesystems_pecularities#BTRFS_Workaround
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c4
--- Comment #4 from José Iván López González
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c5
--- Comment #5 from Takashi Iwai
Takashi Iwai
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c6
--- Comment #6 from José Iván López González
https://bugzilla.suse.com/show_bug.cgi?id=1203854#c7
José Iván López González