[Bug 1018262] New: Installation failure "cpio: rename" PowerPC multipath openQA test
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262

            Bug ID: 1018262
           Summary: Installation failure "cpio: rename" PowerPC multipath openQA test
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: PowerPC-64
               URL: http://openqa.opensuse.org/tests/329391/modules/install_and_reboot/steps/21
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: YaST2
          Assignee: yast2-maintainers@suse.de
          Reporter: normand@linux.vnet.ibm.com
        QA Contact: jsrain@suse.com
          Found By: ---
           Blocker: ---

Created attachment 708670
  --> http://bugzilla.opensuse.org/attachment.cgi?id=708670&action=edit
install_and_reboot-y2logs.tar.bz2

This bug is created as follow-up of previous boo#1009472
to continue investigation of same error after worker update.
I am using the same Summary:
Installation failure "cpio: rename" PowerPC multipath openQA test

As said below I need help to continue investigation of this problem.

[Build 20170104] openQA test fails in install_and_reboot

## Observation

openQA test in scenario
opensuse-Tumbleweed-DVD-ppc64le-install_only_ppc@ppc64le-multipath fails in
[install_and_reboot](http://openqa.opensuse.org/tests/329391/modules/install_and_reboot/steps/21)

## Reproducible

Fails since (at least) Build
[20161110](http://openqa.opensuse.org/tests/303570)

## Expected result

Last good: [20161107](http://openqa.opensuse.org/tests/303068) (or more recent)

## Further details

Always latest result in this scenario:
[latest](http://openqa.opensuse.org/tests/latest?flavor=DVD&arch=ppc64le&version=Tumbleweed&test=install_only_ppc&distri=opensuse&machine=ppc64le-multipath)

I am appending below the same status from
https://bugzilla.suse.com/show_bug.cgi?id=1009472#c15
I need suggestion to continue investigation as per following status.
Current_Status:
* The failure is specific to the disk multipath test with btrfs for TW PowerPC;
  the error reported in y2log is "cpio: rename".
* No failure for Leap 42.2.
* Unable to recreate the failure outside the openQA environment.
* The failure does not occur with ext4 in place of btrfs.
* The error reported by YaST is a generic package installation failure,
  and the y2log reports a "cpio: rename" error with no error number.
* The "cpio: rename" string is related to an error from the fsmRename function
  in lib/fsm.c:
  It is reported by rpm via the zypp traces from libzypp
  (ExternalProgram.cc, Exception.cc, RpmDb.cc).
  The last error is reported by the rpmpsmUnpack function in rpm's psm.c
  as an error from rpmPackageFilesInstall.
  The related string comes from emsg (the output of rpmfileStrerror);
  the string "cpio: rename" is built in rpmfileStrerror
  by decoding the RPMERR_RENAME_FAILED return code.

Summary of related source lines:
===
./rpm-4.12.0.1/lib/psm.c:671:    fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files,
===
    fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files,
                                   psm, &failedFile);
    emsg = rpmfileStrerror(fsmrc);
    rpmlog(RPMLOG_ERR,
           _("unpacking of archive failed%s%s: %s\n"),
           (failedFile != NULL ? _(" on file ") : ""),
           (failedFile != NULL ? failedFile : ""),
           emsg);
===
./rpm-4.12.0.1/lib/rpmfi.c:2111:char * rpmfileStrerror(int rc)
./rpm-4.12.0.1/lib/fsm.c:535: static int fsmRename(const char *opath, const char *path)
./rpm-4.12.0.1/lib/rpmarchive.h    RPMERR_RENAME_FAILED = -32774,
===

--
You are receiving this mail because:
You are on the CC list for the bug.
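To make the decoding chain in the report easier to follow, here is a small Python model of it. It is only a sketch and not rpm's actual code: the constant value and the format string are taken from the source lines quoted in the report, while the function names `file_strerror` and `unpack_error` and the fallback for unknown codes are hypothetical.

```python
from typing import Optional

# Value quoted from lib/rpmarchive.h in the report.
RPMERR_RENAME_FAILED = -32774

# Only the code discussed in this bug is listed; this table is illustrative.
_MESSAGES = {RPMERR_RENAME_FAILED: "rename"}


def file_strerror(rc: int) -> str:
    """Rough stand-in for rpmfileStrerror(): map a return code to 'cpio: <what>'."""
    what = _MESSAGES.get(rc)
    if what is None:
        # Hypothetical fallback for codes this sketch does not know about.
        return f"cpio: (unknown error {rc})"
    return f"cpio: {what}"


def unpack_error(fsmrc: int, failed_file: Optional[str]) -> str:
    """Mirror the rpmlog() format string quoted from lib/psm.c."""
    on_file = " on file " if failed_file is not None else ""
    name = failed_file if failed_file is not None else ""
    return f"unpacking of archive failed{on_file}{name}: {file_strerror(fsmrc)}"
```

Called with the rename code and a file name, `unpack_error` yields a message of the same shape as the one seen in y2log; the message carries no errno text, which matches the observation that no error number is reported.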
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262#c18
--- Comment #18 from Oliver Kurz
> There is work underway to fix this bug.
>
> Unfortunately the bug is not reliably reproducible inside QA and is very hard to reproduce outside QA. So finding the bug may take some time.

Well, it *is* reproducible within the openQA tests and therefore what I consider "inside QA". https://openqa.opensuse.org/tests/418998 is the latest example from yesterday and the logs explicitly show that it is the same error:

```
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 error: xorg-x11-fonts-7.6-32.1.noarch: install failed
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
2017-06-10 21:45:02 <1> install(3321) [Ruby] modules/PackageCallbacks.rb:422 DonePackage(error: 3, reason: 'Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
error: xorg-x11-fonts-7.6-32.1.noarch: install failed
```
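For comparing runs, it may help to pull the failing file and reason out of a y2log automatically. The following Python sketch assumes the rpm message has the same shape as in the excerpt above; the function name `find_unpack_failures` is made up for illustration.

```python
import re
from typing import List, Tuple

# Pattern based on the rpm message visible in the y2log excerpt.
_UNPACK_RE = re.compile(
    r"unpacking of archive failed on file (?P<path>\S+): (?P<reason>cpio: \w+)"
)


def find_unpack_failures(log_text: str) -> List[Tuple[str, str]]:
    """Return unique (file, reason) pairs in order of first appearance."""
    seen: List[Tuple[str, str]] = []
    for match in _UNPACK_RE.finditer(log_text):
        item = (match.group("path"), match.group("reason"))
        if item not in seen:
            # The same failure is logged by zypp and again by the Ruby callback.
            seen.append(item)
    return seen
```

Running it over the logs of several failed jobs would show whether the failure always hits the same file or a different one each time.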
> If you can provide a test case that reproduces the bug without running a full QA installation test that would be helpful.

It might be possible to reproduce the same error by just repeatedly trying to install/uninstall a package using rpm. Other than this, what is the problem with the "full QA installation test"? The only other alternative I have in mind right now is running a specific subset of "xfstests", but I don't know which one would be feasible. @Michel Normand: Maybe you can try running xfstests in an environment similar to the one that fails here?
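The install/uninstall idea could be tried with a loop along these lines. This is an untested sketch: nothing in this thread confirms that it triggers the bug, the function name `stress_install` and its parameters are hypothetical, and it would need root on a btrfs system resembling the failing one. It relies only on the standard `rpm -U --force` and `rpm -e` invocations.

```python
import subprocess
from typing import Callable, Optional


def stress_install(rpm_file: str, pkg_name: str, rounds: int,
                   run: Callable = subprocess.run) -> Optional[int]:
    """Install and erase one package repeatedly.

    Returns the 0-based round in which rpm reported "cpio: rename",
    or None if the error never appeared.
    """
    for i in range(rounds):
        # -U --force: (re)install the package file even if already present.
        result = run(["rpm", "-U", "--force", rpm_file],
                     capture_output=True, text=True)
        if "cpio: rename" in result.stderr:
            return i
        # -e: erase by package name so the next round unpacks it again.
        run(["rpm", "-e", pkg_name], capture_output=True, text=True)
    return None
```

The `run` parameter exists only so the loop can be exercised with a fake runner; stopping at the first hit leaves the file system in the failing state for inspection.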
> Also using such test to point out a particular kernel commit that causes the bug or makes it more prominent would be helpful.

In case no one has done that yet, I recommend checking the kernel version differences between the first failed and the last good build, and then looking into the changelog to identify the corresponding submit requests and commits.
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262#c19
--- Comment #19 from Michal Suchanek
> (In reply to Michal Suchanek from comment #17)
> > There is work underway to fix this bug.
> >
> > Unfortunately the bug is not reliably reproducible inside QA and is very hard to reproduce outside QA. So finding the bug may take some time.
>
> Well, it *is* reproducible within the openQA tests and therefore what I consider "inside QA". https://openqa.opensuse.org/tests/418998 is the latest example from yesterday and the logs explicitly show that it is the same error:
>
> ```
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 error: xorg-x11-fonts-7.6-32.1.noarch: install failed
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
> 2017-06-10 21:45:02 <1> install(3321) [Ruby] modules/PackageCallbacks.rb:422 DonePackage(error: 3, reason: 'Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
> error: xorg-x11-fonts-7.6-32.1.noarch: install failed
> ```
And about half of the tests succeed for recent builds and most of them for Build20170527. That is what I call not reliably reproducible.
> > If you can provide a test case that reproduces the bug without running a full QA installation test that would be helpful.
>
> It might be possible to reproduce the same error by just repeatedly trying to install/uninstall a package using rpm.
Yes, it *might*. But nobody reproduced it that way so far. So if you have exact steps that lead to the error with reasonable probability go ahead and share them.
> Other than this, what is the problem with the "full QA installation test"?
That it happens after a lengthy process on a virtual machine somewhere in QA which is trashed after the test rather than on a developer machine where the state of the system can be analyzed after the error.
> Only other alternative I have in mind right now is running a specific subset of "xfstests" but I don't know which one would be feasible.

Or some tar or cpio benchmarks come to mind, yes.
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262#c21
--- Comment #21 from Michel Normand
FYI, as a bypass I added a retry of the package installation in openQA (1); this retry allows the Leap 42.3 ppc64le Build0089 installation to complete.

(1) https://openqa.opensuse.org/tests/421918#step/install_and_reboot/3

Similarly, the same bypass also works for the latest TW snapshot 20170615 (ppc64/ppc64le):
https://openqa.opensuse.org/tests/422452#step/install_and_reboot/3
https://openqa.opensuse.org/tests/422451#step/install_and_reboot/3
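The general shape of such a retry bypass is sketched below in Python. This is not the actual openQA change (the openSUSE test code is a separate code base and is not shown in this thread); `install_with_retry`, the `install` callable and `max_attempts` are hypothetical names.

```python
from typing import Callable


def install_with_retry(install: Callable[[], bool], max_attempts: int = 3) -> int:
    """Call install() until it succeeds; return the number of attempts used.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if install():
            return attempt
    raise RuntimeError(f"installation failed after {max_attempts} attempts")
```

A retry of this kind only works around the intermittent failure so that the remaining test modules can run; it does not address the underlying "cpio: rename" error.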
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262
http://bugzilla.opensuse.org/show_bug.cgi?id=1018262#c22
--- Comment #22 from Michel Normand
> And about half of the tests succeed for recent builds and most of them for Build20170527. That is what I call not reliably reproducible.
> ...[CUT]...

With the latest Leap 42.3 Build 0101 the failure is reproducible on trial, as per the two examples (1) and (2).
Some btrfs disk capacity data was captured for the similar bug #1039504 (I do not have access to this bug, could you add me in cc?), as detailed in (3).
Is that data capture sufficient, and if not, what needs to be added?

Note that (1) and (2) are clone_job runs with increased HDDSIZEGB, as per (4).

(1) https://openqa.opensuse.org/tests/433628#step/install_and_reboot/6 (DVD)
(2) https://openqa.opensuse.org/tests/433630#step/install_and_reboot/6 (NET)
(3) https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/22add07cf4...
(4)
====
$/usr/share/openqa/script/clone_job.pl --from https://openqa.opensuse.org 433351 --host https://openqa.opensuse.org HDDSIZEGB=20 BETA=1 --skip-download
Created job #433628: opensuse-42.3-DVD-ppc64le-Build0101-minimalx@ppc64le -> https://openqa.opensuse.org/t433628
===
$/usr/share/openqa/script/clone_job.pl --from https://openqa.opensuse.org 433343 --host https://openqa.opensuse.org HDDSIZEGB=20 BETA=1 --skip-download
Created job #433630: opensuse-42.3-NET-ppc64le-Build0101-minimalx@ppc64le -> https://openqa.opensuse.org/t433630
===