Mailinglist Archive: yast-devel (61 mails)

< Previous Next >
[yast-devel] Call for ideas: parted vs. udev-events vs. kernel
In the YaST storage area, we have a problem that manifests itself in very subtle bugs:

Every once in a while, device nodes (in particular for partitions) are not present after a seemingly harmless call like "parted -l", not even after calling "udevadm settle".

That means that even though parted just told us that there should be a /dev/sda1, /dev/sda2 etc., those device nodes are just not there (yet - they are recreated moments later), and operations on those device nodes (like trying to determine the filesystem, if any, on them) fail with ENOENT.

After investigating this quite some bit, it turned out that even though it should be purely read-only, "parted -l" opens the disk device first read-only, then closes it, then reopens it read-write (for reasons we don't understand yet) - and in that process the kernel gets the ioctl() call to re-read the partition table, which triggers that the device nodes for the partitions on that disk are first completely removed and then recreated (by udev AFAIK).

snwint even tried to add a real read-only mode to parted, but so far this was not successful.

Since our parted maintainer is very busy with other things at this time (and for the immediate future), here is the call to our developer community:

What can we reasonably do about this? Does anybody have a good idea, or maybe even might want to contribute?


Likely causes
=============

It is a spurious problem, but it happens most often in virtual machines. It might have to do with only one CPU typically being configured, so there is no real parallel processing; when parted sends the "re-read partition table" ioctl() to the kernel, the device nodes for the partitions are first removed, then the partition table is re-read, then udev events are generated to recreate them.

We suspect that it might be a race condition due to task switches between udev and YaST at unlucky moments: The kernel would generate more udev events, but udev is still busy, and there is a task switch to YaST which fires off an "udevadm settle" call - which returns immediately because right now the udev event queue is indeed empty, and YaST continues, in the next step trying to open one of those partition devices, which fails because the device node is not present yet.


Real life relevance
===================

It would be bad enough if this would only hurt OpenQA, but this is very likely very relevant for customers, too: Virtual machines for web hosters, for example.


Brainstorming approach #1: Suspend udev
=======================================

We talked about suspending udev for a while, maybe at least for purely passive operations like that "parted -l". Can this realistically be done? Would any pending udev events get completely lost, or would they remain in the queue? What would we lose in either case?

We can't do that for every parted call, in particular not for those where we create or delete partitions since in those cases, we would want the updates what the partitioning looks like now. Not sure if the same problem would not reappear, just a little less likely.


Brainstorming approach #2: Use someting other than parted
=========================================================

In the past, we had used "fdisk" on i386, and IIRC there were other tools on other architecturs. parted was the promise to unify them all and make the others obsolete, which it did pretty good - except for this issue. But would we really want to trade that one problem for all the others we had gotten rid of when we moved to parted? I don't think so.

Verdict: Not desirable.


Brainstorming approach #3: Tactical sleep
==========================================

We try our best to avoid dysfunctional and broken-by-design approaches like adding sleep() calls to hope everything has settled down after that time. Those things tend to accumulate in code and never go away, and while they cost time constantly, those timeouts are always too short (if we are unlucky) and too long (the user has to wait) at the same time.

Verdict: Not desirable.


Brainstorming approach #4: Add read-only mode to parted
=======================================================

"parted -l", which we are using in this situation, should already be a read-only operation. Unfortunately, strace shows that this is not the case: It starts with opening the disk device read-only, reads information, closes it - and then for whatever reason opens it again read-write which triggers the code that sends the ioctl() to make the kernel re-read the partition table.

We consider that a bug (https://bugzilla.suse.com/show_bug.cgi?id=979275), but it does not seem to be easy to fix. Any contribution to that would be very welcome.



Brainstorming approach #5: [add your idea here]
===============================================




[this area intentionally left blank]






- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Related bugs:

https://bugzilla.suse.com/show_bug.cgi?id=979275
https://bugzilla.suse.com/show_bug.cgi?id=978137


Please note that while I will not be in the office during the next week, Arvin, Ancor and Steffen will be here to follow up on this.



Kind regards
--
Stefan Hundhammer <shundhammer@xxxxxxx>
YaST Developer

SUSE Linux GmbH
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)
Maxfeldstr. 5, 90409 Nürnberg, Germany
--
To unsubscribe, e-mail: yast-devel+unsubscribe@xxxxxxxxxxxx
To contact the owner, e-mail: yast-devel+owner@xxxxxxxxxxxx

< Previous Next >