Bug ID | 1230062 |
---|---|
Summary | RISC-V: virtio-console sometimes gets stuck in set_termios.part.0 |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | RISC-V |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-bugs@opensuse.org |
Reporter | fvogt@suse.com |
QA Contact | qa-bugs@suse.de |
Target Milestone | --- |
Found By | --- |
Blocker | --- |
Created attachment 877101 [details] /proc/*/task/*/stack + strace of bash openQA types in commands to the serial console, sends a newline and then waits until the prompt appears. Running this in a loop causes it to get stuck sometimes: https://openqa.opensuse.org/tests/4445068#step/http_srv/16 In the QEMU chardev log it appears like this: # systemctl --no-pager is-active apache2; echo hjhkw-$?- active hjhkw-0- echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html openQA sends 'systemctl --no-pager is-active apache2; echo hjhkw-$?-\n', receives 'active\nhjhkw-0-\n', then sends 'echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html\n' and waits for '# '. That does not arrive however and the wait times out. In most cases that's just ignored by openQA and it moves on to the next command. When it sends that ("curl -f ..." here), the missing part suddenly appears: # echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html # curl -f http://localhost/ | grep 'Lorem ipsum dolor sit amet'; echo _Z5Xg-$?- This issue appears to happen on riscv64 only, I have not observed it anywhere else so far. Thankfully the arbitrarily long delay of the prompt allows to debug this. I created a testcase that exercises the serial terminal in a loop and so far it always gets stuck before the 20th attempt. I logged onto the graphical console and collected some info. The /bin/bash process on the console is stuck in the kernel: /proc/1624/task/1624/stack:[<0>] set_termios.part.0+0x130/0x206 /proc/1624/task/1624/stack:[<0>] tty_mode_ioctl+0x46e/0x576 /proc/1624/task/1624/stack:[<0>] n_tty_ioctl_helper+0x2a/0xe4 /proc/1624/task/1624/stack:[<0>] n_tty_ioctl+0x9a/0x148 /proc/1624/task/1624/stack:[<0>] tty_ioctl+0x170/0x64e /proc/1624/task/1624/stack:[<0>] __riscv_sys_ioctl+0x82/0xaa /proc/1624/task/1624/stack:[<0>] do_trap_ecall_u+0x200/0x212 /proc/1624/task/1624/stack:[<0>] handle_exception+0xde/0xde FWICT this is inside the "retry_write_wait" loop: https://github.com/torvalds/linux/blob/67784a74e258a467225f0e68335df77acd67b7ab/drivers/tty/tty_ioctl.c#L487 No other process trace has any related tty call pending. When I attached to the process with strace, it immediately continued and the missing output appeared on the serial console. /proc/*/task/*/stack and strace log are attached. The first syscall that strace reports is rt_sigprocmask(SIG_BLOCK, ...), which usually follows ioctl(0, TCSETSW, ...), so it was probably stuck in that ioctl. As it helps to provide more input or attach with strace, it appears like some wakeup signal is missing or got lost that would notify the wait that the write has completed.