Comment # 93 on bug 1055641 from Franck Bui

OK, I think we have made some progress here, thanks to Olli for all the
feedback and testing he has provided so far.

Here is what is happening, as I understand it:

During shutdown, systemd stops all services and then kills every process that
is still alive (the systemd killing spree). Some processes are excluded from
the killing spree, however: those whose zeroth command-line argument (argv[0])
starts with '@'.
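
To illustrate the rule, here is a minimal sketch of such a check. It is not
systemd's code: the helper names are made up, and it assumes a Linux /proc
where /proc/<pid>/cmdline holds the NUL-separated argument list.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given argv[0] string starts with '@'.
argv0_is_marked() {
    case "$1" in
        @*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: print argv[0] from a NUL-separated cmdline file,
# e.g. /proc/<pid>/cmdline. Fails if the file cannot be read.
read_argv0() {
    [ -r "$1" ] || return 1
    tr '\0' '\n' < "$1" | head -n 1
}

# Hypothetical helper: succeed if the process with PID $1 would be spared.
pid_is_marked() {
    argv0_is_marked "$(read_argv0 "/proc/$1/cmdline")"
}
```

For example, "pid_is_marked $$" fails for an ordinary shell, since its argv[0]
does not start with '@'.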

Plymouth is one of those, since it needs to keep running after dracut takes
over.

Once dracut becomes PID1, it kills all remaining processes that still
reference /oldroot (the path where the rootfs is now mounted RO) so that it
can finally unmount it. Plymouth is the process that still holds a reference
to /oldroot at that point and is therefore killed by dracut.

And I think this is where the problem lies: dracut doesn't wait for plymouth
to exit. Instead it sends the KILL signal (done by "killall_proc_mountpoint
/oldroot") and immediately afterwards tries to umount /oldroot (done by
umount_a()). In this case plymouth still exists at that point, so the umount
fails.
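
A minimal sketch of what "waiting" could look like is shown below. This is
not a proposed dracut patch: wait_for_pids_gone is a hypothetical helper, and
it assumes the caller already knows the PIDs that were sent the KILL signal
(the real killall_proc_mountpoint may not report them).

```shell
#!/bin/sh
# Hypothetical helper: wait until none of the given PIDs exists any more,
# or until roughly $1 seconds have elapsed.
# Usage: wait_for_pids_gone TIMEOUT_SECONDS PID...
wait_for_pids_gone() {
    _timeout=$1; shift
    _waited=0
    while :; do
        _alive=0
        for _pid in "$@"; do
            # kill -0 delivers no signal; it only tests whether the PID can
            # be signalled. This assumes we run as root, as dracut does.
            kill -0 "$_pid" 2>/dev/null && _alive=1
        done
        [ "$_alive" -eq 0 ] && return 0
        [ "$_waited" -ge "$_timeout" ] && return 1
        sleep 1
        _waited=$((_waited + 1))
    done
}
```

The idea is to call something like this between sending the KILL signal and
the first umount attempt, and to report an error if it times out.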

Another strange part is the way dracut tries to umount /oldroot in a loop:

  _cnt=0
  while [ $_cnt -le 40 ]; do
      umount_a 2>/dev/null || break
      _cnt=$(($_cnt+1))
  done

  [ $_cnt -ge 40 ] && umount_a

Contrary to what its name suggests, "umount_a" only unmounts /oldroot.

If umount_a() succeeds, dracut keeps trying to umount /oldroot, even though
the exit status indicates that /oldroot was unmounted and there is no point in
continuing. OTOH, if umount_a() fails, dracut makes no further attempt and
breaks out of the loop.

Note also that all errors are hidden, since the stderr of umount_a() is
redirected to /dev/null.
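
For comparison, here is a sketch of a retry loop with the opposite logic: it
stops at the first success, retries on failure, and leaves stderr alone. The
retry helper and the RETRY_DELAY variable are hypothetical and only meant to
illustrate the point; this is not dracut code.

```shell
#!/bin/sh
# Hypothetical helper: run the given command up to $1 times and stop at the
# first success. RETRY_DELAY (seconds between attempts) is an assumption of
# this sketch and defaults to 1.
# Usage: retry MAX_ATTEMPTS COMMAND [ARG...]
retry() {
    _max=$1; shift
    _cnt=0
    while [ "$_cnt" -lt "$_max" ]; do
        # stderr is deliberately not redirected, so failures stay visible.
        "$@" && return 0
        _cnt=$((_cnt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

With this shape, "retry 40 umount_a" would return as soon as umount_a reports
success and would keep trying while it fails.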

Anyway, this hopefully shows that the issue lies in dracut, so I'm reassigning
this bug to Daniel.