[Bug 1230062] New: RISC-V: virtio-console sometimes gets stuck in set_termios.part.0
https://bugzilla.suse.com/show_bug.cgi?id=1230062

            Bug ID: 1230062
           Summary: RISC-V: virtio-console sometimes gets stuck in set_termios.part.0
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: RISC-V
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: fvogt@suse.com
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Created attachment 877101
  --> https://bugzilla.suse.com/attachment.cgi?id=877101&action=edit
/proc/*/task/*/stack + strace of bash

openQA types commands into the serial console, sends a newline and then waits until the prompt appears. Running this in a loop sometimes causes it to get stuck: https://openqa.opensuse.org/tests/4445068#step/http_srv/16

In the QEMU chardev log it looks like this:

# systemctl --no-pager is-active apache2; echo hjhkw-$?-
active
hjhkw-0-
echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html

openQA sends 'systemctl --no-pager is-active apache2; echo hjhkw-$?-\n', receives 'active\nhjhkw-0-\n', then sends 'echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html\n' and waits for '# '. That does not arrive, however, and the wait times out. In most cases openQA just ignores this and moves on to the next command. When it sends that command ("curl -f ..." here), the missing part suddenly appears:

# echo Lorem ipsum dolor sit amet > /srv/www/htdocs/index.html
# curl -f http://localhost/ | grep 'Lorem ipsum dolor sit amet'; echo _Z5Xg-$?-

This issue appears to happen on riscv64 only; I have not observed it anywhere else so far. Fortunately, the arbitrarily long delay of the prompt makes it possible to debug this. I created a testcase that exercises the serial terminal in a loop, and so far it has always gotten stuck before the 20th attempt. I logged into the graphical console and collected some info.
The /bin/bash process on the console is stuck in the kernel:

/proc/1624/task/1624/stack:[<0>] set_termios.part.0+0x130/0x206
/proc/1624/task/1624/stack:[<0>] tty_mode_ioctl+0x46e/0x576
/proc/1624/task/1624/stack:[<0>] n_tty_ioctl_helper+0x2a/0xe4
/proc/1624/task/1624/stack:[<0>] n_tty_ioctl+0x9a/0x148
/proc/1624/task/1624/stack:[<0>] tty_ioctl+0x170/0x64e
/proc/1624/task/1624/stack:[<0>] __riscv_sys_ioctl+0x82/0xaa
/proc/1624/task/1624/stack:[<0>] do_trap_ecall_u+0x200/0x212
/proc/1624/task/1624/stack:[<0>] handle_exception+0xde/0xde

FWICT this is inside the "retry_write_wait" loop: https://github.com/torvalds/linux/blob/67784a74e258a467225f0e68335df77acd67b...

None of the other processes' stack traces shows a related tty call pending. When I attached to the process with strace, it immediately continued and the missing output appeared on the serial console. /proc/*/task/*/stack and the strace log are attached.

The first syscall that strace reports is rt_sigprocmask(SIG_BLOCK, ...), which usually follows ioctl(0, TCSETSW, ...), so it was probably stuck in that ioctl. Providing more input or attaching with strace unblocks it. That suggests a missing or lost wakeup: the signal that should tell the wait that the write has completed never arrives.

-- You are receiving this mail because: You are the assignee for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c1

--- Comment #1 from Andreas Schwab <schwab@suse.de> ---
Does that also happen if you use U-Boot instead of OVMF?
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c2

Jiri Slaby <jslaby@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jslaby@suse.com

--- Comment #2 from Jiri Slaby <jslaby@suse.com> ---
(In reply to Fabian Vogt from comment #0)
> FWICT this is inside the "retry_write_wait" loop: https://github.com/torvalds/linux/blob/67784a74e258a467225f0e68335df77acd67b7ab/drivers/tty/tty_ioctl.c#L487
This means the driver never sends the chars in xmit_fifo completely (or forgets to call tty_wakeup()).

Which driver is driving this tty0? amba pl011?
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c3

--- Comment #3 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Andreas Schwab from comment #1)
> Does that also happen if you use U-Boot instead of OVMF?
Yes. I used u-boot.bin from u-boot:qemu-riscv64smode with "-kernel u-boot.bin", and the issue remains unchanged.

(In reply to Jiri Slaby from comment #2)
> (In reply to Fabian Vogt from comment #0)
> > FWICT this is inside the "retry_write_wait" loop: https://github.com/torvalds/linux/blob/67784a74e258a467225f0e68335df77acd67b7ab/drivers/tty/tty_ioctl.c#L487
>
> This means the driver never sends the chars in xmit_fifo completely (or forgets to call tty_wakeup()).
>
> Which driver is driving this tty0? amba pl011?
It's hvc0 here, which is a virtio-console, i.e. drivers/char/virtio_console.c.
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c4

--- Comment #4 from Andreas Schwab <schwab@suse.de> ---
Anything suspicious in /sys/kernel/debug/virtio-ports/vport*?
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c5

--- Comment #5 from Fabian Vogt <fvogt@suse.com> ---
(In reply to Andreas Schwab from comment #4)
> Anything suspicious in /sys/kernel/debug/virtio-ports/vport*?
Looks fine to me:

name: org.openqa.console.virtio_console
guest_connected: 1
host_connected: 1
outvq_full: 0
bytes_sent: 1583
bytes_received: 808
bytes_discarded: 0
is_console: yes
console_vtermno: 1
https://bugzilla.suse.com/show_bug.cgi?id=1230062#c6

--- Comment #6 from Andreas Schwab <schwab@suse.de> ---
Which version of qemu does that use?