Joe S changed bug 1206287
What    Removed    Added
CC                 jack@suse.com

Comment # 2 on bug 1206287 from
(In reply to Jiri Slaby from comment #1)
> Provided the trace shows this came from the vmw_vmci module, you'd have to
> complain to VMWare.

Hi Jiri,

When you said "Provided the trace", did you mean the trace I included in the
report, or did you want me to run some kind of trace?

If the latter, could you please tell me how to run it?
Is it as simple as running 'strace vmplayer'?

While waiting for a response, I did some additional research into the problem,
and I am including it here so that it is available to help others who might
encounter this problem.

This link talks about a problem which seems very similar to what I was
experiencing.

    https://communities.vmware.com/t5/VMware-Workstation-Player/VMWare-Player-crashing-on-Windows-Guest-Shutdown/td-p/1751650

However, it is from 2017 and involves VMware Player 12.5 and older 4.x kernels.

Despite those BIG differences, it still seemed quite similar.

A few key points from the link:

    When the VMs were stored on XFS or btrfs instead of ext4, their problems
    went away.

    One of the people suspected a race condition in ext4 that had existed
    since kernel 4.10.

They provided 3 different workarounds.

I tested the last one, which was to modify the VMX file to change

    vmci0.present = "TRUE"

        to

    vmci0.present = "FALSE"

I chose that one in particular because you also mentioned vmci in your response.

I have now booted and shut down the Windows 10 VM more than 10 times, and the
ext4 fault has NOT occurred.

Previously the fault did not occur every time, but it occurred almost every
time, so turning vmci off clearly addressed the issue.

I don't use Shared Folders (they were already disabled in the VM), and I don't
use anything related to vmci for inter-machine communication.

Having said that, I still wonder whether a bug introduced by recent ext4 code
changes could also be part of the problem.

Here's why I say that.

Back in March 2022, I reported this bug:

    https://bugzilla.opensuse.org/show_bug.cgi?id=1196832

If you review that bug report, you will see that it was a similar issue that
was resolved by a kernel fix.

Another key point in that prior bug report (whose fix resolved the issue) was
that I use the "data=journal" mount option for my ext4 file systems, and that
option is rarely used.

This leads me to believe that recent kernel changes may have reintroduced that
bug, or a similar one, which only shows up now because I use data=journal
instead of the default (data=ordered).

Jan Kara provided the fix for that prior bug.

Here is what he said about that prior bug:

    Thanks for report! Interesting. This is a BUG in page_buffers(page) call
    inside ext4_journalled_writepage_callback(). So this function got passed
    a page without buffers attached which is indeed wrong. I'll have a look 
    if I can find out how that could have happened.

    That being said data=journal mode is not used that much and gets much less 
    testing.

Would it be worth Jan taking a look at this issue to see whether it is related
to the prior problem that was fixed?
