[Bug 754259] New: kiwi/zypper: Installation of packages fails randomly, sometimes silently
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c0

           Summary: kiwi/zypper: Installation of packages fails randomly, sometimes silently
    Classification: openSUSE
           Product: openSUSE 12.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.1
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Other
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: Yarny@public-files.de
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created an attachment (id=483428)
 --> (http://bugzilla.novell.com/attachment.cgi?id=483428)
Script/config to reproduce. kiwi logfile.

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2

Since I started building kiwi live systems with openSUSE 12.1, I've been observing a subtle bug: sometimes not all files/directories get created properly. For example, the directory /var/lib/empty is regularly missing. When this happens, there is no indication of it in the kiwi log.

I'm attaching a tar file which contains:
* config.xml: A sample configuration to trigger the bug.
* config.sh: A very simple script; it just checks for /var/lib/empty and fails ('exit 1') if this directory doesn't exist.
* doit.sh: A script which mounts /dev/sr1 to /mnt/suse and then repeatedly runs 'kiwi --prepare...' until it fails.
* kiwi.log: A log from one of my tests where it failed.

Reproducible: Sometimes

Steps to Reproduce:
1. Unpack the attached tar.
2. Insert the SUSE-12.1 DVD into /dev/sr0.
3. Run doit.sh.

doit.sh will call kiwi at most 10 times. On my test system, this is usually enough to trigger the bug, but sometimes I have to run it again.

Actual Results:
config.sh fails because /var/lib/empty is missing.

Expected Results:
No missing files/directories.

I'm currently testing openSUSE 12.1 inside VirtualBox, which is hosted on 11.4. Therefore, it is also possible that data corruption by 11.4's VirtualBox is causing this.
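The attached helper scripts are not reproduced in this thread. Going only by the description above, the check in config.sh and the retry loop in doit.sh might look roughly like the following POSIX sh sketch; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-ins for config.sh and doit.sh (not the attached files).

# config.sh-style check: fail if a directory normally created by the
# filesystem package is missing. Inside kiwi's config.sh the root is "/".
check_root() {
    cr_root=${1:-/}
    for cr_rel in var/lib/empty; do
        if [ ! -d "$cr_root/$cr_rel" ]; then
            echo "missing: /$cr_rel" >&2
            return 1
        fi
    done
    return 0
}

# doit.sh-style driver: run a command at most MAX times, stop at the
# first failure and print the number of the failing iteration.
run_until_failure() {
    ruf_max=$1
    shift
    ruf_i=0
    while [ "$ruf_i" -lt "$ruf_max" ]; do
        ruf_i=$((ruf_i + 1))
        if ! "$@"; then
            echo "$ruf_i"
            return 1
        fi
    done
    return 0
}
```

Inside config.sh, `check_root || exit 1` would give the described behaviour. doit.sh's loop corresponds to something like `run_until_failure 10 ./build-once.sh`, where build-once.sh is a hypothetical wrapper around the actual `kiwi --prepare ...` call.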
I tried to rule out VirtualBox data corruption by repeatedly creating sha1sums of all files on the SUSE-12.1 DVD. Since 30 loops produced the same result, I don't think VirtualBox is at fault.

It's not just /var/lib/empty which goes missing. I once observed that /etc/alternatives had lost most of its content, e.g. vim was missing and I couldn't call vi directly.

During my test runs, I also observed other failures of the kiwi build process. I suspect that these failures are related to the same bug. Here are two excerpts from kiwi's log:
    [...] Installing: yast2-theme-openSUSE-2.21.18-2.1.1 [error]
    Installation of yast2-theme-openSUSE-2.21.18-2.1.1 failed: (with --nodeps --force)
    Error: Subprocess failed. Error: RPM failed: warning: /var/cache/kiwi/zypper/packages/http:__localhost:50080_0/rpm/noarch/yast2-theme-openSUSE-2.21.18
    error: unpacking of archive failed on file /usr/share/YaST2/theme/openSUSE/control-center/title-bar-right.png;4f6ef69f: cpio: open failed - No such file or directory
    error: yast2-theme-openSUSE-2.21.18-2.1.1.noarch: install failed
    [...] Installing: acpid-2.0.10-10.1.3 [error]
    Installation of acpid-2.0.10-10.1.3 failed: (with --nodeps --force)
    Error: Subprocess failed. Error: RPM failed: warning: /var/cache/kiwi/zypper/packages/tmp_suse/rpm/x86_64/acpid-2.0.10-10.1.3.x86_64.rpm: Header V3 RS
    touch: cannot touch `/var/lib/systemd/migrated/acpid': No such file or directory
    error: %pre(acpid-2.0.10-10.1.3.x86_64) scriptlet failed, exit status 1
    error: acpid-2.0.10-10.1.3.x86_64: install failed

Such failures occur very rarely, maybe once in 100 builds, so I cannot reproduce them reliably.
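The DVD check mentioned above (hashing all files repeatedly, 30 identical results) could be sketched like this. The function names are hypothetical, and /mnt/suse is assumed as the mount point because doit.sh uses it. Repeated reads may be served from the page cache rather than from the virtual drive, so dropping caches between runs (as root: `echo 3 > /proc/sys/vm/drop_caches`) would make such a test stricter.

```shell
#!/bin/sh
# Hash every regular file below a directory and reduce the per-file
# hashes to one digest. Sorting by path makes the digest independent
# of readdir order. Returns 1 if the directory does not exist.
content_hash() {
    [ -d "$1" ] || return 1
    (cd "$1" && find . -type f -exec sha1sum {} + | LC_ALL=C sort -k 2) |
        sha1sum | cut -d' ' -f1
}

# Re-hash a directory N times; return 1 as soon as a run disagrees
# with the first one, 0 if all N runs agree.
hash_is_stable() {
    his_dir=$1
    his_runs=$2
    his_ref=$(content_hash "$his_dir")
    his_i=1
    while [ "$his_i" -lt "$his_runs" ]; do
        his_i=$((his_i + 1))
        [ "$(content_hash "$his_dir")" = "$his_ref" ] || return 1
    done
    return 0
}
```

For example, `hash_is_stable /mnt/suse 30 && echo stable` corresponds to the 30-loop test described in the report.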
The package/pattern list in config.xml is the smallest one I found that still triggers the bug: I wasn't able to reproduce it without pattern:base and kernel-default. I'm not sure what causes this behaviour; maybe it's a bug somewhere in rpm? On the other hand, I also occasionally install openSUSE 12.1 directly via "zypper -R /mnt/new_root install packages...", and I have never observed this bug when doing so. Apologies for blaming kiwi+zypper, but kiwi is currently the only tool with which I can trigger this bug.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c

kk zhang <kkzhang@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kkzhang@novell.com
         AssignedTo|bnc-team-screening@forge.pr |ms@suse.com
                   |ovo.novell.com              |
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c1

Marcus Schaefer <ms@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
       InfoProvider|                            |Yarny@public-files.de

--- Comment #1 from Marcus Schaefer <ms@suse.com> 2012-03-28 09:52:52 UTC ---
Hmm, I'm afraid I can't reproduce this. The directory /var/lib/empty belongs to the filesystem package, so it is part of the rpm package list. Thus the install process would fail if the package couldn't be installed, but I couldn't reproduce a silent failure in any of the tests I made.

Other than that, errors like

    Error: RPM failed: warning: /var/cache/kiwi/zypper/packages/tmp_suse/rpm/x86_64/acpid-2.0.10-10.1.3.x86_64.rpm: Header V3 RS
    touch: cannot touch `/var/lib/systemd/migrated/acpid': No such file or directory
    error: %pre(acpid-2.0.10-10.1.3.x86_64) scriptlet failed, exit status 1
    error: acpid-2.0.10-10.1.3.x86_64: install failed

sound more like packaging bugs and should be assigned to the maintainer.

You said you are running kiwi in VirtualBox, which runs 12.1 and builds a 12.1 image with kiwi. Is that correct? Your host system is 11.4. Can you try to build a 12.1 image on your 11.4 host, so that there is no VirtualBox instance in between? Can you reproduce the same error there too?
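Since /var/lib/empty is owned by the filesystem package, rpm's own verify mode (`rpm -V`) is one way to find files that vanished after a seemingly successful install, independently of config.sh. A minimal sketch; the function names are hypothetical, and it assumes that file paths contain no whitespace.

```shell
#!/bin/sh
# "rpm -V" prints one line per deviating file; lines for files that
# are gone start with "missing". Print only those paths.
missing_paths() {
    awk '$1 == "missing" { print $NF }'
}

# Verify package PKG inside the image root ROOT.
verify_missing() {
    rpm --root "$1" -V "$2" | missing_paths
}
```

For example, `verify_missing /path/to/image-root filesystem` (the root path is a placeholder) would be expected to list /var/lib/empty in the failing case; an empty result means rpm found no missing files for that package.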
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c2

Marcus Schaefer <ms@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
       InfoProvider|Yarny@public-files.de       |
         Resolution|                            |NORESPONSE
           Severity|Normal                      |Major

--- Comment #2 from Marcus Schaefer <ms@suse.com> 2012-03-30 09:36:36 UTC ---
Looks like there is no further information?
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c3

Yarny Yarny <Yarny@public-files.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|NORESPONSE                  |

--- Comment #3 from Yarny Yarny <Yarny@public-files.de> 2012-03-30 11:26:14 UTC ---
Sorry Marcus, you're too quick for me.

OK, I tried it with 11.4's kiwi: the bug did not appear within 100 cycles. I built a PXE image for 12.1, booted one of my machines from this image, and tried it with the 12.1 kiwi inside this PXE environment: the bug did not appear within 100 cycles either.

By the way: I built this PXE image in the 12.1 VirtualBox. /var/lib/empty was missing again, but my test code in config.sh did /not/ catch it. It didn't harm the later kiwi test (I hope). I'm currently trying to find out why config.sh didn't detect this.
> sounds more like packaging bugs and should be assigned to the maintainer

Hmmm, maybe the "No such file or directory" errors are also a result of filesystem...rpm not being installed properly. But as this only happens sometimes, I guess there is a race condition somewhere. I'm not sure how the filesystem package works, or whether there can be a race condition inside an rpm at all. It might be the fault of the rpm binary.
Do you see any chance to narrow down the cause of this bug further? I think it would be a big advantage to get kiwi out of the way. On the other hand, I would understand if you don't want to go any further, since at the moment this cannot be reproduced without VirtualBox.
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c4

Marcus Schaefer <ms@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #4 from Marcus Schaefer <ms@suse.com> 2012-03-30 12:52:43 UTC ---
The filesystem rpm is basically a collection of directories representing the basic root filesystem structure. It is a pretty lightweight package and shouldn't create any kind of trouble... famous last words.
From your report, it really looks like something related to VirtualBox, maybe to the handling of I/O operations from within the VM. A simple out-of-disk-space issue is not possible, right?
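The disk-space question can be checked quickly from inside the guest. A minimal sketch (the function name is hypothetical) based on POSIX `df -P` output, where the fourth column of the data line is the available space in 1024-byte blocks:

```shell
#!/bin/sh
# Print the free space, in KiB, of the filesystem that holds PATH.
# "df -P" guarantees one line per filesystem; column 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, `free_kib /tmp` for the directory holding temporary roots. A full disk would typically surface as an explicit "No space left on device" error rather than as silently vanishing files.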
Other than that, you can try to call the 'zypper ... install ...' command from within your VirtualBox instance into a directory, e.g. /tmp/test-inst, and let it install packages. It's the same thing kiwi does here, and you should see the same errors if they appear.

I'm sorry, but I don't see how I could fix this in the kiwi code, so I'm closing this as WONTFIX. Hope you don't mind.
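The suggested experiment could be scripted roughly as follows. This is a sketch, not kiwi's actual invocation: it assumes the DVD is mounted at /mnt/suse (as doit.sh does), uses a made-up repository alias "dvd", and relies on zypper's global --root, --non-interactive and --gpg-auto-import-keys options. The ZYPPER variable exists only so the function can be dry-run.

```shell
#!/bin/sh
# Let zypper install packages into a scratch root, as kiwi does.
# Set ZYPPER=echo to print the commands instead of running them.
ZYPPER=${ZYPPER:-zypper}

install_into_root() {
    iir_root=$1
    shift
    mkdir -p "$iir_root" || return 1
    # Register the (assumed) DVD mount as a repository inside the root.
    $ZYPPER --non-interactive --root "$iir_root" \
        addrepo dir:///mnt/suse dvd || return 1
    # Install the requested packages into the root.
    $ZYPPER --non-interactive --gpg-auto-import-keys --root "$iir_root" \
        install "$@"
}
```

`install_into_root /tmp/test-inst filesystem`, followed by `test -d /tmp/test-inst/var/lib/empty`, mirrors the config.sh check without kiwi. Starting from an empty directory each time avoids the addrepo step failing on an existing alias.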
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c5

Yarny Yarny <Yarny@public-files.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |
            Summary|kiwi/zypper: Installation   |rpm in qemu-kvm or VBox:
                   |of packages fails randomly, |Installed files/directories
                   |sometimes silently          |vanish mysteriously

--- Comment #5 from Yarny Yarny <Yarny@public-files.de> 2012-09-08 12:43:04 UTC ---
> you can try to call the 'zypper ... install ...' command [...] It's the same thing kiwi does here and you should see the same errors if they appear

I followed that path to get closer to the cause of this bug. This report/REOPEN is the result.
I can now reproduce this problem within qemu-kvm and within VirtualBox. What happens is basically this: rpm installs a package (possibly indirectly through zypper or kiwi), and milliseconds after the rpm binary has exited, some files/directories it created mysteriously disappear.

You can try the attached script to reproduce it. I'm using an x86_64 machine and this qemu command line:

    qemu-kvm -smp 4 -drive file=suse121a.qcow2,media=disk,cache=writethrough -m 768 -monitor stdio -enable-kvm -vnc :0

The script installs the filesystem rpm (taken from /tmp/) into an empty, temporary root tree, then generates a hash sum of the content of this root tree. It does so continuously until the hash sum differs from the previous one. When I run it, it mostly stops after ~4000 iterations because lots of directories are missing (e.g. /media, /var/...). The hash sum is generated by `(cd $tmproot ; find) | sha1sum`, and sometimes 'find' reports errors of the form

    find: `./media': No such file or directory
    find: `./mnt': No such file or directory
    find: `./sys': No such file or directory
    find: `./home': No such file or directory

This makes me believe that the directories were present when 'find' parsed the root tree, but vanished before 'find' had the chance to enter these subdirectories and look for further entries.

The likelihood of this bug occurring seems to depend on the cache mode chosen in the qemu command line: "writethrough" seems best for triggering it, "unsafe" also works, but I couldn't reproduce it with "writeback" or "none". My current host is openSUSE 12.1 (dual-core x86_64), and I observed this bug with 12.1 and 12.2 (last RC) guests. I faintly recall having observed it without -smp, but I couldn't reproduce that today. I also failed to reproduce it on an old i686 laptop (oS 12.1, inside VBox), but this is a slow machine without hardware virtualization, and I had to stop after ~6000 iterations.

This bug does not depend on direct invocation of rpm: as mentioned in my original report, it cripples about every fifth kiwi build, and it also, rarely, causes errors while installing openSUSE from DVD (rpm fails because a directory is missing which should have been created by another package).

I consulted the forum:
<URL:http://forums.opensuse.org/english/get-technical-help-here/virtualization/477701-rpm-installation-randomly-fails-inside-virtualbox-ideas-needed.html>
(They pointed me to qemu-kvm as another test environment.)
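The attached reproducer (attachment 504951) is not included in this thread. Based only on the description above, a loop of this kind could look roughly like the following POSIX sh sketch. Assumptions: the package path in RPM_FILE is hypothetical, `rpm --initdb` is run first so each fresh root gets its own database, and the path list is sorted before hashing, which the original one-liner does not do.

```shell
#!/bin/sh
# Reinstall a package into a fresh, empty root over and over and stop
# when the list of paths in the root differs from the previous run.
RPM_FILE=${RPM_FILE:-/tmp/filesystem.rpm}   # hypothetical location

# Populate an empty root with the package.
install_into() {
    rpm --root "$1" --initdb &&
        rpm --root "$1" -i --nodeps "$RPM_FILE"
}

# Hash the path list below the root (sorted, unlike the original).
tree_hash() {
    (cd "$1" && find . | LC_ALL=C sort) | sha1sum | cut -d' ' -f1
}

# Run at most MAX iterations. On a hash change, print the iteration
# number and return 1; return 2 if an install fails outright.
repro_loop() {
    rl_max=${1:-100000}
    rl_prev=
    rl_i=0
    while [ "$rl_i" -lt "$rl_max" ]; do
        rl_i=$((rl_i + 1))
        rl_root=$(mktemp -d) || return 2
        if ! install_into "$rl_root"; then
            rm -rf "$rl_root"
            return 2
        fi
        rl_cur=$(tree_hash "$rl_root")
        rm -rf "$rl_root"
        if [ -n "$rl_prev" ] && [ "$rl_cur" != "$rl_prev" ]; then
            echo "$rl_i"
            return 1
        fi
        rl_prev=$rl_cur
    done
    return 0
}
```

Running `repro_loop 100000; echo "status: $?"` inside the guest would print the failing iteration (status 1) once a run produced a different path list; any `find: ... No such file or directory` messages go to stderr, as in the report. The default limit of 100000 is arbitrary.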
Marcus, I'm sorry if this REOPEN ends up on your desk again. I see that this is not connected to kiwi at all. Do you have an idea to whom this would best be forwarded? Maybe a qemu, rpm, or kernel/filesystem maintainer? I'm really desperate about this: I can't use openSUSE reliably in virtualized environments, and I don't know how to track down this bug any further.
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c6

--- Comment #6 from Yarny Yarny <Yarny@public-files.de> 2012-09-08 12:46:50 UTC ---
Created an attachment (id=504951)
 --> (http://bugzilla.novell.com/attachment.cgi?id=504951)
Shell script to repeatedly call rpm until the bug occurs

This script installs the filesystem rpm in a loop until some files/dirs are missing (see last comment).

P.S.: I forgot to mention: I couldn't reproduce this bug when the qemu guest booted from a live CD. It requires a hard-disk installation of SUSE.
https://bugzilla.novell.com/show_bug.cgi?id=754259
https://bugzilla.novell.com/show_bug.cgi?id=754259#c7

Marcus Schaefer <ms@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WORKSFORME

--- Comment #7 from Marcus Schaefer <ms@suse.com> 2013-01-04 20:44:02 UTC ---
I'm afraid I can't reproduce this anymore.