[Bug 929806] New: glibc: SR#295007 needed be reverted (threaded-trim-threshold.patch)
http://bugzilla.opensuse.org/show_bug.cgi?id=929806

Bug ID: 929806
Summary: glibc: SR#295007 needed be reverted (threaded-trim-threshold.patch)
Classification: openSUSE
Product: openSUSE Factory
Version: 201503*
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: dimstar@opensuse.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

The latest submission of glibc has been reverted in openSUSE:Factory again. Already in the staging area we saw an increased number of 'installation hangs' with this patch: the system would deadlock and never finish installing. To rule out that this was just an anomaly, we checked it in to openSUSE:Factory, where we saw the same hang across multiple test runs.

As a consequence, the patch has been reverted again until somebody can find and fix the installation hangs it introduces.

A sample test: https://openqa.opensuse.org/tests/60367/modules/livecdreboot/steps/21

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Martin Pluskal
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #1 from Mel Gorman
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #2 from Mel Gorman
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #3 from Dominique Leuenberger
Marcus, I see you assigned this to Andreas but did you see comment 2 where it was stated that this is very likely to be a bug in the installer using uninitialised memory?
Or rpm, or any of the rpm scriptlets running code, or libzypp, or [...]
In the various tests I'd seen, the lockup was not always in the same package(s).
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #4 from Mel Gorman
(In reply to Mel Gorman from comment #2)
Marcus, I see you assigned this to Andreas but did you see comment 2 where it was stated that this is very likely to be a bug in the installer using uninitialised memory?
Or rpm, or any of the rpm scriptlets running code, or libzypp, or [...]
In the various tests I'd seen, the lockup was not always in the same package(s).
I think the installation scripts are a bad fit because we'd expect the same packages to freeze each time. It's also very likely that they are single-threaded, which means they are unaffected by the glibc patch. rpm also feels like a bad fit because it's short-lived and I don't see calls to pthread_create in there. libzypp, zypper or the installer are better candidates because at least zypper is threaded and they're long-lived enough to eventually see an unlucky allocation pattern that hands back uninitialised memory. I guessed the installer simply because zypper use on an installed system seems ok. Bugs due to uninitialised memory are not bugs in glibc though, so the assignee is still inappropriate.
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Mel Gorman
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #8 from Mel Gorman
One way of testing would be to force the installer to globally set MALLOC_CHECK_=2 during installation and see whether that "fixes" it. I don't know how to set up a temporary installation environment like that, but some of the yast people should.
Sure :-)
Simply use a boot parameter MALLOC_CHECK_=2 and the installer will export it to the environment, producing the desired result. It seems even PID 1 has it.
All righty Martin, thanks. Dominique, I know these are dumb questions but I never deal with the installer and just want to push this along so we don't get burned in the future when glibc updates again. Is there still an ISO image available that freezes during install? I can at least download it and see if MALLOC_CHECK_=2 "fixes" it. That would at least indicate that something in the installer has an uninitialised memory bug.
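As a rough illustration of the boot-parameter route described above, the sketch below pulls a NAME=value entry out of a kernel command line string and runs a command with it exported. The helper names boot_param and run_with_malloc_check and the sample command line are hypothetical, and this is not claimed to be how the installer itself does it; on a live system the string would come from /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: print the value of NAME from a kernel command line
# passed as one string (on a live system: "$(cat /proc/cmdline)").
boot_param() {
    bp_name=$1
    bp_cmdline=$2
    # Unquoted on purpose: split the command line into words.
    for bp_word in $bp_cmdline; do
        case $bp_word in
            "$bp_name"=*) printf '%s\n' "${bp_word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: run a command with MALLOC_CHECK_ set for that
# command (and its children) only, leaving the calling shell untouched.
run_with_malloc_check() {
    mc_level=$1
    shift
    env MALLOC_CHECK_="$mc_level" "$@"
}

# Made-up sample command line, for illustration only.
cmdline='initrd=initrd splash=silent MALLOC_CHECK_=2 quiet'
level=$(boot_param MALLOC_CHECK_ "$cmdline")
run_with_malloc_check "$level" sh -c 'echo "child sees MALLOC_CHECK_=$MALLOC_CHECK_"'
```

To confirm that an already running process inherited the variable, something like `tr '\0' '\n' < /proc/1/environ | grep MALLOC_CHECK_` (as root) should show it.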
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Mel Gorman
@Mel,
The link in the original comment to openQA also allows you to get the ISO file used for the task.
https://openqa.opensuse.org/tests/60367 => https://openqa.opensuse.org/tests/60367/asset/3037
Well, I get a duh prize. I used the ISO and KVM to reproduce this. About 1 in 5 installations appears to fail with a freeze where the UI stops responding: the X pointer still moves, but no text can be selected and nothing in the UI can be interacted with. Terminal switching still works, and using that I checked what was active. There were no RPM scripts or any portion of rpm active. tar existed as a zombie process that was a child of y2base. Even if the packages were the problem, the UI would not freeze and, besides, it would always be the same package that froze. The window manager is not threaded, so that's not likely to be the problem. What appears to be frozen is y2base.

I'll now test with MALLOC_CHECK_=2 and see whether it still freezes, but right now y2base appears to be the primary candidate.

Martin, would you be able to run the installer through valgrind, or identify someone on the yast team who could, to see if it spits out any warnings about uninitialised memory use, and debug it? Ideally it would be with the devel version of glibc, but that's not strictly necessary as uninitialised memory use is unconditionally a bug regardless of the system libraries used.
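The kind of check done from the text console can be sketched roughly as follows. This assumes a procps-style ps; the helper name filter_zombies is hypothetical and the sample rows are made up for illustration, not taken from the frozen system.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID STAT COMMAND" rows on stdin and print
# those whose state starts with Z (zombie), along with the parent PID.
filter_zombies() {
    awk '$3 ~ /^Z/ { printf "zombie pid=%s ppid=%s comm=%s\n", $1, $2, $4 }'
}

# Made-up sample rows; on a live system feed it real data instead:
#   ps -eo pid=,ppid=,stat=,comm= | filter_zombies
filter_zombies <<'EOF'
  812     1 Ss   y2base
  990   812 Z    tar
 1001   812 S    sh
EOF
# prints: zombie pid=990 ppid=812 comm=tar
```

With procps, `ps -o nlwp= -p PID` additionally reports how many threads a process has, which is one way to judge whether a candidate is threaded at all.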
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Mel Gorman
(In reply to Dominique Leuenberger from comment #9)
@Mel,
The link in the original comment to openQA also allows you to get the ISO file used for the task.
https://openqa.opensuse.org/tests/60367 => https://openqa.opensuse.org/tests/60367/asset/3037
<SNIP> I used the ISO and KVM to reproduce this. About 1 in 5 installations appears to fail with a freeze where the UI stops responding: the X pointer still moves, but no text can be selected and nothing in the UI can be interacted with. Terminal switching still works, and using that I checked what was active.
I'll now test with MALLOC_CHECK_=2 and see whether it still freezes
I successfully installed 10 times without freezes with MALLOC_CHECK_=2 specified as a boot parameter. At this point, it really looks like y2base is the source. Based on the experiences with llvm regression suites, I also suspect it's due to an uninitialised memory bug. I updated the bug title accordingly.
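For a sense of how much weight ten clean runs carry, here is a back-of-the-envelope check. It assumes each installation freezes independently with the roughly 1-in-5 rate seen earlier; both are assumptions for illustration, not measurements from this bug, and the helper name prob_all_succeed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: probability that N independent runs all succeed
# when each run fails with probability P.
prob_all_succeed() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

prob_all_succeed 0.2 10   # prints 0.107
```

Under those assumptions, ten clean installs in a row would happen by chance about one time in ten even if MALLOC_CHECK_=2 made no difference, so further runs would firm up the conclusion.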
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #12 from Mel Gorman
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
Mel Gorman
Thank you, Mel. But https://openqa.opensuse.org/tests/60367/asset/3037 seems to have expired. Do you have the image around?
I have a copy locally but it could take a few days to complete an upload due to limited upstream bandwidth. Does anyone cc'd have a copy on a machine in an office that they could make available?
I am testing with https://w3.suse.de/~mvidner/glibc-bsc929806.iso which I run as
qemu-kvm -m 4096 -smp 2 -cdrom ~/svn/mksusecd/glibc-bsc929806.iso scratch.qcow2
with the boot option VALGRIND=1
I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April? It's possible it got accidentally fixed or worked around since the original glibc submission. Related to that, is the version of yast used the same as the one in Factory? If so, then it might be appropriate to try resubmitting SR#295007. At worst, the same problem will recur but there will be a problematic ISO available.
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #21 from Dominique Leuenberger
I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April?
Actually, yes, we have fixed some GCC warnings: https://github.com/yast/yast-core/pull/100 I *think* this should not change things related to uninitialized memory, but it seems best to retry the glibc submission. I am not sure how to do that since https://build.opensuse.org/request/show/295007 is marked as Accepted.
As glibc was reverted post-accept, you will have to create a new submit request:
osc sr Base:System glibc openSUSE:Factory -m "Let's retry to see what this brings"
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #22 from Mel Gorman
(In reply to Martin Vidner from comment #20)
I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April?
Actually, yes, we have fixed some GCC warnings: https://github.com/yast/yast-core/pull/100 I *think* this should not change things related to uninitialized memory, but it seems best to retry the glibc submission. I am not sure how to do that since https://build.opensuse.org/request/show/295007 is marked as Accepted.
As glibc was reverted post-accept, you will have to create a new submit request:
As there have been no changes to the Base:System glibc project since, I went ahead and created a new request 309677. Thanks.
http://bugzilla.opensuse.org/show_bug.cgi?id=929806
--- Comment #23 from Mel Gorman
Mel, can you resubmit glibc and then resolve this as Works For Me please?
It's resubmitted, but I will not close this as resolved until we see whether the ISO created for openQA testing reproduces the problem.
http://bugzilla.opensuse.org/show_bug.cgi?id=929806#c26
Mel Gorman