[yast-devel] Call for ideas: parted vs. udev-events vs. kernel
In the YaST storage area, we have a problem that manifests itself in very subtle bugs: Every once in a while, device nodes (in particular for partitions) are not present after a seemingly harmless call like "parted -l", not even after calling "udevadm settle". That means that even though parted just told us that there should be a /dev/sda1, /dev/sda2 etc., those device nodes are just not there (yet - they are recreated moments later), and operations on those device nodes (like trying to determine the filesystem, if any, on them) fail with ENOENT. After investigating this quite some bit, it turned out that even though it should be purely read-only, "parted -l" opens the disk device first read-only, then closes it, then reopens it read-write (for reasons we don't understand yet) - and in that process the kernel gets the ioctl() call to re-read the partition table, which triggers that the device nodes for the partitions on that disk are first completely removed and then recreated (by udev AFAIK). snwint even tried to add a real read-only mode to parted, but so far this was not successful. Since our parted maintainer is very busy with other things at this time (and for the immediate future), here is the call to our developer community: What can we reasonably do about this? Does anybody have a good idea, or maybe even might want to contribute? Likely causes ============= It is a spurious problem, but it happens most often in virtual machines. It might have to do with only one CPU typically being configured, so there is no real parallel processing; when parted sends the "re-read partition table" ioctl() to the kernel, the device nodes for the partitions are first removed, then the partition table is re-read, then udev events are generated to recreate them. We suspect that it might be a race condition due to task switches between udev and YaST at unlucky moments: The kernel would generate more udev events, but udev is still busy, and there is a task switch to YaST which fires off an "udevadm settle" call - which returns immediately because right now the udev event queue is indeed empty, and YaST continues, in the next step trying to open one of those partition devices, which fails because the device node is not present yet. Real life relevance =================== It would be bad enough if this would only hurt OpenQA, but this is very likely very relevant for customers, too: Virtual machines for web hosters, for example. Brainstorming approach #1: Suspend udev ======================================= We talked about suspending udev for a while, maybe at least for purely passive operations like that "parted -l". Can this realistically be done? Would any pending udev events get completely lost, or would they remain in the queue? What would we lose in either case? We can't do that for every parted call, in particular not for those where we create or delete partitions since in those cases, we would want the updates what the partitioning looks like now. Not sure if the same problem would not reappear, just a little less likely. Brainstorming approach #2: Use someting other than parted ========================================================= In the past, we had used "fdisk" on i386, and IIRC there were other tools on other architecturs. parted was the promise to unify them all and make the others obsolete, which it did pretty good - except for this issue. But would we really want to trade that one problem for all the others we had gotten rid of when we moved to parted? I don't think so. Verdict: Not desirable. Brainstorming approach #3: Tactical sleep ========================================== We try our best to avoid dysfunctional and broken-by-design approaches like adding sleep() calls to hope everything has settled down after that time. Those things tend to accumulate in code and never go away, and while they cost time constantly, those timeouts are always too short (if we are unlucky) and too long (the user has to wait) at the same time. Verdict: Not desirable. Brainstorming approach #4: Add read-only mode to parted ======================================================= "parted -l", which we are using in this situation, should already be a read-only operation. Unfortunately, strace shows that this is not the case: It starts with opening the disk device read-only, reads information, closes it - and then for whatever reason opens it again read-write which triggers the code that sends the ioctl() to make the kernel re-read the partition table. We consider that a bug (https://bugzilla.suse.com/show_bug.cgi?id=979275), but it does not seem to be easy to fix. Any contribution to that would be very welcome. Brainstorming approach #5: [add your idea here] =============================================== [this area intentionally left blank] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Related bugs: https://bugzilla.suse.com/show_bug.cgi?id=979275 https://bugzilla.suse.com/show_bug.cgi?id=978137 Please note that while I will not be in the office during the next week, Arvin, Ancor and Steffen will be here to follow up on this. Kind regards -- Stefan Hundhammer <shundhammer@suse.de> YaST Developer SUSE Linux GmbH GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg) Maxfeldstr. 5, 90409 Nürnberg, Germany -- To unsubscribe, e-mail: yast-devel+unsubscribe@opensuse.org To contact the owner, e-mail: yast-devel+owner@opensuse.org
Hello, maily FYI because I don't know a solution On May 25 17:06 Stefan Hundhammer wrote (excerpt):
Every once in a while, device nodes (in particular for partitions) are not present after a seemingly harmless call like "parted -l", not even after calling "udevadm settle".
In the Relax-and-Recover (rear) installer we have the same kind of problems, see https://github.com/rear/rear/issues/791 And FYI a somehow related issue is https://github.com/rear/rear/issues/799
Brainstorming approach #1: Suspend udev
Something like this was tried in the rear installer, see https://github.com/rear/rear/pull/518 and it had interesting complicated bad effects, see https://github.com/rear/rear/issues/533 which lead to https://bugzilla.opensuse.org/show_bug.cgi?id=914245 and https://bugzilla.opensuse.org/show_bug.cgi?id=914311 In short: For non-udev experts: Don't touch how udev works ;-)
Brainstorming approach #3: Tactical sleep
I also tried to avoid that but meanwhile I think it is needed (as some kind of fallback) in addition to other stuff, cf. https://github.com/rear/rear/issues/791#issuecomment-211318620 (excerpt): -------------------------------------------------------------- Perhaps what is needed to make it really work reilably is - "udevadm settle" plus - waiting for the actual "thingy" plus - retries of commands that need the actual "thingy" -------------------------------------------------------------- where "waiting" and "retries" implicitly mean to sleep. Kind Regards Johannes Meixner -- SUSE LINUX GmbH - GF: Felix Imendoerffer, Jane Smithard, Graham Norton - HRB 21284 (AG Nuernberg) -- To unsubscribe, e-mail: yast-devel+unsubscribe@opensuse.org To contact the owner, e-mail: yast-devel+owner@opensuse.org
Hi, On Wed, 25 May 2016, Stefan Hundhammer wrote:
Brainstorming approach #4: Add read-only mode to parted =======================================================
"parted -l", which we are using in this situation, should already be a read-only operation. Unfortunately, strace shows that this is not the case: It starts with opening the disk device read-only, reads information, closes it - and then for whatever reason opens it again read-write which triggers the code that sends the ioctl() to make the kernel re-read the partition table.
That's of course stupid of parted.
We consider that a bug (https://bugzilla.suse.com/show_bug.cgi?id=979275), but it does not seem to be easy to fix. Any contribution to that would be very welcome.
See attached, fixes the bug by always opening read-only and lazily switching to read-write only when necessary (i.e. a write or flush operation occurs). No libparted API changes, purely internal to the linux "backend". I think I got all places where _ensure_read_write must be called, if you hit problems it should be easy to diagnose, because a forgotten call will lead to obvious errors for using a write on a read-only FD, so should be easy to diagnose and add. (also contains a local fix when not using blkid) Ciao, Michael.
On 2016-05-25 18:23, Michael Matz wrote:
We consider that a bug (https://bugzilla.suse.com/show_bug.cgi?id=979275), but it does not seem to be easy to fix. Any contribution to that would be very welcome.
See attached, fixes the bug by always opening read-only and lazily switching to read-write only when necessary (i.e. a write or flush operation occurs). No libparted API changes, purely internal to the linux "backend". I think I got all places where _ensure_read_write must be called, if you hit problems it should be easy to diagnose, because a forgotten call will lead to obvious errors for using a write on a read-only FD, so should be easy to diagnose and add. (also contains a local fix when not using blkid)
Wow, thanks! That looks *very* promising. Since I won't be back before June 6 (and then there will be that week-long CSM workshop in Z-Bau in the Südstadt), I'd ask Steffen, Arvin or Ancor to start testing with that patch so we can get it into SP2. Kind regards HuHa -- To unsubscribe, e-mail: yast-devel+unsubscribe@opensuse.org To contact the owner, e-mail: yast-devel+owner@opensuse.org
On Wed, May 25, 2016 at 06:23:21PM +0200, Michael Matz wrote:
Hi,
On Wed, 25 May 2016, Stefan Hundhammer wrote:
Brainstorming approach #4: Add read-only mode to parted =======================================================
"parted -l", which we are using in this situation, should already be a read-only operation. Unfortunately, strace shows that this is not the case: It starts with opening the disk device read-only, reads information, closes it - and then for whatever reason opens it again read-write which triggers the code that sends the ioctl() to make the kernel re-read the partition table.
That's of course stupid of parted.
We consider that a bug (https://bugzilla.suse.com/show_bug.cgi?id=979275), but it does not seem to be easy to fix. Any contribution to that would be very welcome.
See attached, fixes the bug by always opening read-only and lazily switching to read-write only when necessary (i.e. a write or flush operation occurs). No libparted API changes, purely internal to the linux "backend". I think I got all places where _ensure_read_write must be called, if you hit problems it should be easy to diagnose, because a forgotten call will lead to obvious errors for using a write on a read-only FD, so should be easy to diagnose and add. (also contains a local fix when not using blkid)
Michael, many thanks for the patch. I was really convinced I would have change the libparted API, but you have proved me wrong. I did some adjustments to the patch (e.g. it was necessary to call _flush_cache() from _ensure_read_write()), attached it to bsc#979275 and submitted to Factory. Whether to submit it to Beta2 or wait for Beta3 is being discussed now. Thanks again, Petr -- Petr Uzel TL SUSE L3 Team 2
Hello, Am Mittwoch, 25. Mai 2016, 18:23:21 CEST schrieb Michael Matz:
On Wed, 25 May 2016, Stefan Hundhammer wrote:
Brainstorming approach #4: Add read-only mode to parted =======================================================
See attached, fixes the bug by always opening read-only and lazily switching to read-write only when necessary (i.e. a write or flush operation occurs). No libparted API changes, purely internal to the linux "backend". I think I got all places where _ensure_read_write must be called, if you hit problems it should be easy to diagnose, because a forgotten call will lead to obvious errors for using a write on a read-only FD, so should be easy to diagnose and add. (also contains a local fix when not using blkid)
Minor nitpicking on the patch: It seems the parted code (mostly) uses spaces, but the patch introduces several tabs (I also verified this with the updated patch in Base:System/ parted). Can you fix this detail, please? Regards, Christian Boltz -- <hendersj> I guess part of the question then becomes what the ultimate purpose of the board is. I've always been under the impression that they guide the project <suseROCKs> hendersj, "What's the meaning of life" is a much less complex question :-D [from #opensuse-project] -- To unsubscribe, e-mail: yast-devel+unsubscribe@opensuse.org To contact the owner, e-mail: yast-devel+owner@opensuse.org
Hi all, On Wed, May 25, 2016 at 05:06:07PM +0200, Stefan Hundhammer wrote:
In the YaST storage area, we have a problem that manifests itself in very subtle bugs:
Stefan: Thank you for starting this thread. [...]
After investigating this quite some bit, it turned out that even though it should be purely read-only, "parted -l" opens the disk device first read-only,
This is the probing done by parted itself,
then closes it, then reopens it read-write (for reasons we don't understand yet)
This second RW open is done by libparted - it simply does not care if the operation to be performed needs RW access or not. BTW, as I have less and less time for parted, I would like to offer it to anybody who would like to become a (co?)maintainer. Best, Petr -- Petr Uzel TL SUSE L3 Team 2
participants (6)
-
Christian Boltz
-
Johannes Meixner
-
Michael Matz
-
Petr Uzel
-
shundhammer
-
Stefan Hundhammer