[Bug 970239] New: [openQA][20160308] Kernel 4.4.4 fails to boot in all i586 tests
http://bugzilla.opensuse.org/show_bug.cgi?id=970239

Bug ID: 970239
Summary: [openQA][20160308] Kernel 4.4.4 fails to boot in all i586 tests
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: 2015*
Hardware: Other
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: dimstar@opensuse.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

The latest openSUSE Tumbleweed snapshot (20160308) fails in all i586 tests (ok, only two tests left).

openQA does not report any visible error on screen (just a black screen); a local test was 'more verbose' and printed two lines of text:
Probing EDD (edd=off to disable)... ok
Failed to allocate space for phdrs
-- System halted
Fastest way to reproduce:
- Grab the asset/NET installer from https://openqa.opensuse.org/tests/128465/asset/iso/openSUSE-Tumbleweed-NET-i...
- Load it in qemu; I used:
qemu-kvm -cdrom openSUSE-Tumbleweed-NET-i586-Snapshot20160308-Media.iso -m 2048
- In the boot menu, choose 'install' or 'rescue system' (basically anything booting the kernel from the disk)

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c1
Dominique Leuenberger
From their report, the kernel might not be at fault per se, but just the victim of a binutils upgrade (which we happened to integrate into TW on 20160305; the kernel was only rebuilt with the new kernel checkin, which masked the actual 'source').
CC Richard
Ludwig Nussel
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c2
Ismail Donmez
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c3
--- Comment #3 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c5
--- Comment #5 from Dominique Leuenberger
Let me push two fixes from the 2.26 branch to Factory (one for some fallout Andreas observed). Maybe it magically fixes things here, too - the patch from H.J looks completely unrelated to me unless you start building the kernel with LTO.
The update will fix swo#19739 and swo#19775, checked into devel:gcc, SR#369051
Thanks. As staging will take quite some time and does not test i586, I set up an additional branch: https://build.opensuse.org/project/show/home:dimstar:boo970239 (currently building your binutils as well as kernel-default, all limited to i586).
Out of the resulting binaries I will spin a NET ISO for testing (that should be much faster than waiting for staging/openQA).
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c7
Felix Miata
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c8
--- Comment #8 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c9
--- Comment #9 from Dominique Leuenberger
The error printed on screen comes from arch/x86/boot/compressed/misc.c (if I found the right spot)
    phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
    if (!phdrs)
        error("Failed to allocate space for phdrs");
Something must have gone awfully wrong by the time we are here...
objdump -a vmlinz:
  Size of program headers: 32 (bytes)
  Number of program headers: 3
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c10
--- Comment #10 from Dominique Leuenberger
https://build.opensuse.org/project/show/home:dimstar:boo970239 (currently building your binutils as well as kernel-default, all limited to i586)
Out of the resulting binaries I will spin a NET ISO for testing (that should be much faster than waiting for staging/openQA)
Kernel build completed - the resulting kernel-default does not boot
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c11
Fabian Vogt
(In reply to Dominique Leuenberger from comment #8)
The error printed on screen comes from arch/x86/boot/compressed/misc.c (if I found the right spot)
    phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
    if (!phdrs)
        error("Failed to allocate space for phdrs");
Something must have gone awfully wrong by the time we are here...
objdump -a vmlinz:
  Size of program headers: 32 (bytes)
  Number of program headers: 3
I had a quick GDB session: malloc returns NULL because free_mem_ptr gets overwritten at some point, probably during decompression.
Björn Voigt
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c15
--- Comment #15 from Dominique Leuenberger
You might be right, as it is actually broken in a way (and it affects x86 only): https://sourceware.org/bugzilla/show_bug.cgi?id=19520 The fix is in 2.26.1. (I wonder whether using GOLD would work as well here) I'm currently trying to compile a kernel to reproduce it (and not succeeding in doing so with tinyconfig), so I can try that option after the first "successful failure".
I can provide a VM disk image which has a couple kernels installed (most from RPM and one self-built inside the VM. The locally built one failed to boot in my tests, so the kernel tree / config seems 'fine' to reproduce
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c16
--- Comment #16 from Fabian Vogt
(In reply to Fabian Vogt from comment #13)
The fix is in 2.26.1. (I wonder whether using GOLD would work as well here)
Unless something changed gold won't be able to link the linux kernel, afaik it doesn't support its ldscript usage.
What a pity, I'm using gold everywhere and now I'm used to the speed boost...
(In reply to Dominique Leuenberger from comment #15)
(In reply to Fabian Vogt from comment #13)
You might be right, as it is actually broken in a way (and it affects x86 only): https://sourceware.org/bugzilla/show_bug.cgi?id=19520 The fix is in 2.26.1. (I wonder whether using GOLD would work as well here) I'm currently trying to compile a kernel to reproduce it (and not succeeding in doing so with tinyconfig), so I can try that option after the first "successful failure".
I can provide a VM disk image which has a couple kernels installed (most from RPM and one self-built inside the VM. The locally built one failed to boot in my tests, so the kernel tree / config seems 'fine' to reproduce
That would be useful. I'm currently installing TW on a more powerful machine to make full rebuilds a bit quicker.
) <- Closing parenthesis to relieve tension
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c17
--- Comment #17 from Fabian Vogt
Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c19
Markus Trippelsdorf
(In reply to Ismail Donmez from comment #14)
(In reply to Fabian Vogt from comment #13)
The fix is in 2.26.1. (I wonder whether using GOLD would work as well here)
Unless something changed gold won't be able to link the linux kernel, afaik it doesn't support its ldscript usage.
What a pity, I'm using gold everywhere and now I'm used to the speed boost...
gold has been able to link the kernel for several years already...
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c21
Hendrik Woltersdorf
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c22
Dominique Leuenberger
PatrickD Garvey
VERSINI Arnaud
Max Lin
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c29
--- Comment #29 from Fabian Vogt
Mobeen Azhar
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c30
--- Comment #30 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c31
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c32
Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c33
--- Comment #33 from Björn Voigt
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c38
Dieter Nützel
OK, I took Fabian's patch as is, and pushed to my git branch. Now waiting for pull and the package submission.
Hip, hip, hooray!
Hello Takashi, et al.,
you all saved my faith in S.u.S.E, SuSE, openSUSE...;-)
And I've been with you since ~1993, first on floppies, and then on CDs.
My home server (VIA C7-D) stopped working (would not boot) under openSUSE 13.2 plus stable kernel 4.4.5-1.2 on Friday night, 11.03.2016. I was running around, plugging and carrying monitors, keyboards and systems, all that shit, the whole night...
i686-pae
4.4.5-1.1 GOOD
4.4.5-1.2 BAD
Then I had to go back to kernel-pae-3.16.6-2.1.i686 with USB recovery.
And lost the network configuration.
After that was fixed I was on 3.16.7-35-pae, until now.
I hoped for the new 4.5.0 series, but...
4.5.0-1.g95a9976-pae BAD
and now you...
Linux XXXX 2.6.65-2.gb2c9ae5-pae #1 SMP PREEMPT Wed Mar 16 17:30:21 UTC 2016 (b2c9ae5) i686 i686 i386 GNU/Linux
Running.
So you got from me:
Tested-by: Dieter Nützel
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c39
--- Comment #39 from Dieter Nützel
(In reply to Takashi Iwai from comment #37)
OK, I took Fabian's patch as is, and pushed to my git branch. Now waiting for pull and the package submission.
Hip, hip, hooray!
Hello Takashi, et al.,
you all saved my faith in S.u.S.E, SuSE, openSUSE...;-)
And I've been with you since ~1993, first on floppies, and then on CDs.
My home server (VIA C7-D) stopped working (would not boot) under openSUSE 13.2 plus stable kernel 4.4.5-1.2 on Friday night, 11.03.2016. I was running around, plugging and carrying monitors, keyboards and systems, all that shit, the whole night...
i686-pae
4.4.5-1.1 GOOD
4.4.5-1.2 BAD
Then I had to go back to kernel-pae-3.16.6-2.1.i686 with USB recovery.
And lost the network configuration.
After that was fixed I was on 3.16.7-35-pae, until now.
I hoped for the new 4.5.0 series, but...
4.5.0-1.g95a9976-pae BAD
and now you...
Linux XXXX 2.6.65-2.gb2c9ae5-pae #1 SMP PREEMPT Wed Mar 16 17:30:21 UTC 2016 (b2c9ae5) i686 i686 i386 GNU/Linux
Argh.... uname26...
Should be:
Linux XXXX 4.5.0-2.gb2c9ae5-pae #1 SMP PREEMPT Wed Mar 16 17:30:21 UTC 2016 (b2c9ae5) i686 i686 i386 GNU/Linux
Sorry!
-Dieter
Running.
So you got from me:
Tested-by: Dieter Nützel
Thank you so much!
Normally I'm on my Xeon machine for DRM/Mesa/Radeon development;-)
Linux XXXX 4.5.0-2.gb2c9ae5-default #1 SMP PREEMPT Wed Mar 16 17:30:21 UTC 2016 (b2c9ae5) x86_64 x86_64 x86_64 GNU/Linux
-Dieter
PS If I (we) could only find a solution for my openSUSE 13.2 wicked/DSL/modem problem... (all versions newer than wicked-0.6.19-18.1 can't bring up dsl0)
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c43
Bruno Friedmann
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c44
--- Comment #44 from Dominique Leuenberger
@Dominique could you also post a status update here (and perhaps on the ML too) with a rough estimate for the light at the end of the tunnel
The kernel has been checked in to openSUSE:Factory and the repo is building... It is believed that this is all solved, but honestly, I will believe it when openQA passes the test (probably tomorrow). Of course I trust the work of Fabian and Takashi, but many variables have been seen that caused it to go wrong, so I'm a bit cautious. As soon as I know for sure I will certainly post an update on the ML.
For reference, the fix was in https://build.opensuse.org/request/show/373926
http://bugzilla.opensuse.org/show_bug.cgi?id=970239#c45
--- Comment #45 from Hendrik Woltersdorf
Linux XXXX 4.5.0-2.gb2c9ae5-pae #1 SMP PREEMPT Wed Mar 16 17:30:21 UTC 2016 (b2c9ae5) i686 i686 i386 GNU/Linux
I've downloaded this kernel package too and installed it on my hardware:
- Intel(R) Pentium(R) M processor 730
- gfx card: ATI Mobility FireGL V5000 564B (M26) (PCIE) with the free radeon driver
- encrypted disks (everything but /boot): encrypted single partitions with ext4 and xfs and a "LVM on LUKS" container
And it works!
cees vorstenbosch
cees vorstenbosch
Aaron Burgemeister