[opensuse-arm] Most ARM v7 workers are broken
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
Guillaume
-- To unsubscribe, e-mail: opensuse-arm+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-arm+owner@opensuse.org
On Thursday 26 November 2015, 11:23:27, Guillaume Gardet wrote:
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
armbuild13 (and most likely the others as well) stumbled over the non-working hardware random number generator support in the kernel.
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Guillaume
-- Adrian Schroeter email: adrian@suse.de SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) Maxfeldstraße 5 90409 Nürnberg Germany
On 26/11/2015 11:48, Adrian Schröter wrote:
On Thursday 26 November 2015, 11:23:27, Guillaume Gardet wrote:
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
armbuild13 (and most likely the others as well) stumbled over the non-working hardware random number generator support in the kernel.
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
How is it failing? On which board/kernel?
Guillaume
Adrian Schröter <adrian@suse.de> writes:
armbuild13 (and most likely the others as well) stumbled over the non-working hardware random number generator support in the kernel.
That should already be solved by <https://github.com/openSUSE/obs-build/pull/209>. Does that not work?
Andreas.
-- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Hi Adrian,
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Not all ARMv7 workers have hardware support for a random number generator. Most do, so disabling it globally is not a good idea.
I'm not sure what the problem was, and I haven't had a chance to debug it yet; isn't this already solved by the autodetection that Andreas submitted?
Greetings, Dirk
On Thursday 26 November 2015, 22:57:57, Dirk Müller wrote:
Hi Adrian,
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Not all ARMv7 workers have hardware support for a random number generator. Most do, so disabling it globally is not a good idea.
I'm not sure what the problem was, and I haven't had a chance to debug it yet; isn't this already solved by the autodetection that Andreas submitted?
The problem is that we can't detect from outside whether the kernel being loaded will complain about the extra parameter. And "complain" here means stopping the build.
The commits from the mentioned pull request are merged and active, but they only affect the qemu-kvm behaviour, not the kernel behaviour.
IMHO, all we can do is check on the host whether the hardware is able to provide a hardware RNG and then just guess whether the guest kernel will support it.
Something like
dd if=/dev/hwrng of=/dev/null bs=1 count=1 || kvm_rng_device=
which disables the RNG options when there is no such device on the host. Do you know a better way to check for hwrng that avoids such a potentially hanging read? It seems I don't have any 32-bit ARM system with a hwrng device around.
-- Adrian Schroeter
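One possible answer to the question about avoiding a hanging read is sketched below. It is a minimal sketch, not what obs-build does: it assumes the host kernel's hw_random core exposes `/sys/class/misc/hw_random/rng_current` (which reads `none`, or is absent, when no hardware RNG is registered), and it uses coreutils `timeout` to bound the `dd` probe. The helper names `host_has_hwrng` and `hwrng_readable` are hypothetical. As noted above, this only says something about the host; it still cannot tell whether the guest kernel will accept the device.

```shell
# Sketch only: host-side checks before passing a hardware RNG to a guest.
# host_has_hwrng and hwrng_readable are hypothetical helper names.

# Succeeds if the hw_random core reports an active hardware RNG.
# $1: path to rng_current (defaults to the usual sysfs location).
host_has_hwrng() {
  rng_current_file=${1:-/sys/class/misc/hw_random/rng_current}
  [ -r "$rng_current_file" ] || return 1
  current=$(cat "$rng_current_file" 2>/dev/null) || return 1
  # The hw_random core reports "none" when no RNG is registered.
  [ -n "$current" ] && [ "$current" != "none" ]
}

# Bounded version of the dd probe: give up after 2 seconds instead of hanging.
# $1: device to read (defaults to /dev/hwrng).
hwrng_readable() {
  timeout 2 dd if="${1:-/dev/hwrng}" of=/dev/null bs=1 count=1 2>/dev/null
}

# Disable the RNG options unless both checks pass.
if ! host_has_hwrng || ! hwrng_readable; then
  kvm_rng_device=
fi
```

The sysfs check is cheap and never blocks, so it runs first; the timed `dd` read only happens when the kernel already claims a device is present.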
On Friday 27 November 2015, 07:58:47, Adrian Schröter wrote:
On Thursday 26 November 2015, 22:57:57, Dirk Müller wrote:
Hi Adrian,
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Not all ARMv7 workers have hardware support for a random number generator. Most do, so disabling it globally is not a good idea.
I'm not sure what the problem was, and I haven't had a chance to debug it yet; isn't this already solved by the autodetection that Andreas submitted?
The problem is that we can't detect from outside whether the kernel being loaded will complain about the extra parameter. And "complain" here means stopping the build.
The commits from the mentioned pull request are merged and active, but they only affect the qemu-kvm behaviour, not the kernel behaviour.
IMHO, all we can do is check on the host whether the hardware is able to provide a hardware RNG and then just guess whether the guest kernel will support it.
Something like
dd if=/dev/hwrng of=/dev/null bs=1 count=1 || kvm_rng_device=
which disables the RNG options when there is no such device on the host. Do you know a better way to check for hwrng that avoids such a potentially hanging read? It seems I don't have any 32-bit ARM system with a hwrng device around.
Okay, the second commit already tries to do this... checking why it breaks... (sorry, first mail before coffee)
-- Adrian Schroeter
On Friday 27 November 2015, 08:02:19, Adrian Schröter wrote:
On Friday 27 November 2015, 07:58:47, Adrian Schröter wrote:
On Thursday 26 November 2015, 22:57:57, Dirk Müller wrote:
Hi Adrian,
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Not all ARMv7 workers have hardware support for a random number generator. Most do, so disabling it globally is not a good idea.
I'm not sure what the problem was, and I haven't had a chance to debug it yet; isn't this already solved by the autodetection that Andreas submitted?
The problem is that we can't detect from outside whether the kernel being loaded will complain about the extra parameter. And "complain" here means stopping the build.
The commits from the mentioned pull request are merged and active, but they only affect the qemu-kvm behaviour, not the kernel behaviour.
IMHO, all we can do is check on the host whether the hardware is able to provide a hardware RNG and then just guess whether the guest kernel will support it.
Something like
dd if=/dev/hwrng of=/dev/null bs=1 count=1 || kvm_rng_device=
which disables the RNG options when there is no such device on the host. Do you know a better way to check for hwrng that avoids such a potentially hanging read? It seems I don't have any 32-bit ARM system with a hwrng device around.
Okay, the second commit already tries to do this... checking why it breaks... (sorry, first mail before coffee)
[ 14s] linux64 /usr/bin/qemu-system-arm -nodefaults -no-reboot -nographic -vga none -enable-kvm -M virt -cpu host -object rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 -mem-prealloc -mem-path /dev/hugepages -net none -kernel /boot/zImage.guest -initrd /boot/initrd -append root=/dev/disk/by-id/virtio-0 rootfstype=ext4 rootflags=noatime panic=1 quiet no-kvmclock nmi_watchdog=0 rw rd.driver.pre=binfmt_misc elevator=noop console=ttyAMA0 init=/.build/build -m 1020 -drive file=/var/cache/obs/worker/root_1/root,format=raw,if=none,id=disk,serial=0,cache=unsafe -device virtio-blk-device,drive=disk -drive file=/var/cache/obs/worker/root_1.swap,format=raw,if=none,id=swap,serial=1,cache=unsafe -device virtio-blk-device,drive=swap -serial stdio -smp 1
[ 14s] qemu-system-arm: -device virtio-rng-pci,rng=rng0: No 'PCI' bus found for device 'virtio-rng-pci'
So, the virtio RNG device does not seem to work with a non-hardware RNG source.
-- Adrian Schroeter
On 27.11.15 08:07, Adrian Schröter wrote:
On Friday 27 November 2015, 08:02:19, Adrian Schröter wrote:
On Friday 27 November 2015, 07:58:47, Adrian Schröter wrote:
On Thursday 26 November 2015, 22:57:57, Dirk Müller wrote:
Hi Adrian,
For now I have disabled kernel RNG support in the build script for ARM, but we need to discuss whether we want to keep it in the future (with proper kernel and initrd support for it) or disable it in general for ARM.
Not all ARMv7 workers have hardware support for a random number generator. Most do, so disabling it globally is not a good idea.
I'm not sure what the problem was, and I haven't had a chance to debug it yet; isn't this already solved by the autodetection that Andreas submitted?
The problem is that we can't detect from outside whether the kernel being loaded will complain about the extra parameter. And "complain" here means stopping the build.
The commits from the mentioned pull request are merged and active, but they only affect the qemu-kvm behaviour, not the kernel behaviour.
IMHO, all we can do is check on the host whether the hardware is able to provide a hardware RNG and then just guess whether the guest kernel will support it.
Something like
dd if=/dev/hwrng of=/dev/null bs=1 count=1 || kvm_rng_device=
which disables the RNG options when there is no such device on the host. Do you know a better way to check for hwrng that avoids such a potentially hanging read? It seems I don't have any 32-bit ARM system with a hwrng device around.
Okay, the second commit already tries to do this... checking why it breaks... (sorry, first mail before coffee)
[ 14s] linux64 /usr/bin/qemu-system-arm -nodefaults -no-reboot -nographic -vga none -enable-kvm -M virt -cpu host -object rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 -mem-prealloc -mem-path /dev/hugepages -net none -kernel /boot/zImage.guest -initrd /boot/initrd -append root=/dev/disk/by-id/virtio-0 rootfstype=ext4 rootflags=noatime panic=1 quiet no-kvmclock nmi_watchdog=0 rw rd.driver.pre=binfmt_misc elevator=noop console=ttyAMA0 init=/.build/build -m 1020 -drive file=/var/cache/obs/worker/root_1/root,format=raw,if=none,id=disk,serial=0,cache=unsafe -device virtio-blk-device,drive=disk -drive file=/var/cache/obs/worker/root_1.swap,format=raw,if=none,id=swap,serial=1,cache=unsafe -device virtio-blk-device,drive=swap -serial stdio -smp 1
[ 14s] qemu-system-arm: -device virtio-rng-pci,rng=rng0: No 'PCI' bus found for device 'virtio-rng-pci'
So, the virtio RNG device does not seem to work with a non-hardware RNG source.
The error says that there is no bus we can plug virtio-rng-pci into, which is true: we don't have a PCI bus by default on ARM virtual machines ;). You can either use virtio-rng-device (which plugs into virtio-mmio instead) or spawn a PCI bus using -device gpex-pcihost.
Alex
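The first of Alex's two options translates into QEMU arguments roughly as follows. This is a minimal sketch assuming the guest uses `-M virt` without a PCI host bridge, so the RNG device has to sit on virtio-mmio via `virtio-rng-device`, while machines that do provide a PCI bus keep `virtio-rng-pci`. The helper name `pick_rng_device` and the machine-type switch are hypothetical, not taken from build-vm-kvm.

```shell
# Sketch: pick a virtio-rng transport that matches the guest machine type.
# pick_rng_device is a hypothetical helper; the mapping is an assumption.
pick_rng_device() {
  case "$1" in
    virt) echo virtio-rng-device ;;  # ARM -M virt: no PCI bus by default, use virtio-mmio
    *)    echo virtio-rng-pci ;;     # machines that provide a PCI bus
  esac
}

kvm_rng_device=$(pick_rng_device virt)
kvm_options="-M virt -object rng-random,filename=/dev/random,id=rng0 -device $kvm_rng_device,rng=rng0"
echo "$kvm_options"
```

The second option, adding a PCI bus with `-device gpex-pcihost` and keeping `virtio-rng-pci`, is not shown here.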
Hi Adrian,
[ 14s] qemu-system-arm: -device virtio-rng-pci,rng=rng0: No 'PCI' bus found for device 'virtio-rng-pci'
I already fixed that a bit over a month ago (see https://github.com/openSUSE/obs-build/pull/210/). That needs to be deployed.
In case you're wondering why all build workers on all architectures are broken right now: there is a typo in the changes that you deployed on the server, from build-vm-kvm line 192:
- kvm_options="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device kvm_rng_device,rng=rng0"
+ kvm_options="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device $kvm_rng_device,rng=rng0"
On the aarch64 workers I have deployed a local hack so that they no longer fetch the broken build script.
Greetings, Dirk
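The typo in build-vm-kvm line 192 is easy to miss: without the `$`, the shell passes the literal word `kvm_rng_device` to `-device`, which QEMU then rejects as an unknown device type. A minimal sketch contrasting the two lines, assuming `kvm_rng_device` was set to `virtio-rng-device` earlier in the script (that value is an assumption for illustration):

```shell
# Sketch: the buggy and fixed kvm_options lines, side by side.
kvm_rng_device=virtio-rng-device   # assumed value, set earlier in the real script
kvm_options=""

# Buggy: "kvm_rng_device" is a literal string, not a variable expansion.
broken="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device kvm_rng_device,rng=rng0"

# Fixed: "$kvm_rng_device" expands to the configured device name.
fixed="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device $kvm_rng_device,rng=rng0"

echo "broken: $broken"
echo "fixed:  $fixed"
```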
On Saturday 28 November 2015, 18:29:36, Dirk Müller wrote:
Hi Adrian,
[ 14s] qemu-system-arm: -device virtio-rng-pci,rng=rng0: No 'PCI' bus found for device 'virtio-rng-pci'
I already fixed that a bit over a month ago (see https://github.com/openSUSE/obs-build/pull/210/). That needs to be deployed.
That was the version that caused the problems with the kernel/QEMU startup on 32-bit ARM.
In case you're wondering why all build workers on all architectures are broken right now
Huh? At least x86_64 is working fine.
: there is a typo in the changes that you deployed on the server:
from build-vm-kvm line 192:
- kvm_options="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device kvm_rng_device,rng=rng0"
+ kvm_options="$kvm_options -object rng-random,filename=/dev/hwrng,id=rng0 -device $kvm_rng_device,rng=rng0"
k, adapted that.
-- Adrian Schroeter
Adrian Schröter <adrian@suse.de> writes:
On Saturday 28 November 2015, 18:29:36, Dirk Müller wrote:
In case you're wondering why all build workers on all architectures are broken right now
Huh? At least x86_64 is working fine.
Probably because most of the x86-64 workers have no hwrng.
Andreas.
-- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
On Monday 30 November 2015, 09:11:34, Andreas Schwab wrote:
Adrian Schröter <adrian@suse.de> writes:
On Saturday 28 November 2015, 18:29:36, Dirk Müller wrote:
In case you're wondering why all build workers on all architectures are broken right now
Huh? At least x86_64 is working fine.
Probably because most of the x86-64 workers have no hwrng.
That may well be. Actually, I wonder whether the RNG stuff is really worth the effort when the majority of our systems don't support it, especially since it seems to be untestable (if it were set up in the initrd only, we could test it through kernel builds, but at the moment it requires QEMU parameters). Since its introduction we have always had breakage somewhere :/
-- Adrian Schroeter
Hi,
Actually, I wonder whether the RNG stuff is really worth the effort when the majority of our systems don't support it.
Passing through virtio-rng is definitely a good idea, especially with the terrible OpenSSL on SLE12, which requires several kilobytes of entropy to generate a trivial RSA key. However, I'm not sure selecting /dev/hwrng was a good idea, since its quality is unspecified (it might return 00s all day). The normal setup is to run rngd, which tests the entropy of the hwrng before feeding it into the kernel entropy pool, so that /dev/random is seeded from the entropy sources that provide the required quality. Just using /dev/random from the host should minimize both the amount of code needed and the level of breakage across architectures.
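A minimal sketch of the setup suggested here: always back the guest's virtio-rng with the host's `/dev/random` instead of `/dev/hwrng`, and skip the options entirely when no RNG device type is configured. The function name `add_rng_options` is hypothetical, and this is not the code that ended up in obs-build.

```shell
# Sketch: append virtio-rng options backed by the host's /dev/random.
# add_rng_options is a hypothetical helper; $1 is the virtio-rng device type
# (e.g. virtio-rng-device or virtio-rng-pci), or empty to disable.
add_rng_options() {
  rng_dev=$1
  if [ -z "$rng_dev" ]; then
    return 0  # no RNG device configured: leave kvm_options untouched
  fi
  kvm_options="$kvm_options -object rng-random,filename=/dev/random,id=rng0 -device $rng_dev,rng=rng0"
}

kvm_options="-nodefaults -M virt"
add_rng_options virtio-rng-device
echo "$kvm_options"
```

Because the backend is the host's kernel pool, quality filtering is left to the host (for example via rngd), and the guest side stays identical across architectures apart from the device type.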
Hi Adrian,
That was the version that caused the problems with the kernel/QEMU startup on 32-bit ARM.
Not really :-) The change fixed the problem. The issue was that somebody, while editing the build script, forgot to remove the reference to the -pci device and only added the -device virtio-rng-block. That is why we now had two random devices and why it was failing.
k, adapted that.
Thanks, Dirk
On Monday 30 November 2015, 09:35:10, Dirk Müller wrote:
Hi Adrian,
That was the version that caused the problems with the kernel/QEMU startup on 32-bit ARM.
Not really :-) The change fixed the problem. The issue was that somebody, while editing the build script, forgot to remove the reference to the -pci device and only added the -device virtio-rng-block. That is why we now had two random devices and why it was failing.
The current git master is the one that caused the ARMv7 failures; are you talking about that one? Or about some of my local hack versions on the server?
-- Adrian Schroeter
Hi Adrian,
The current git master is the one that caused the ARMv7 failures; are you talking about that one?
If we're talking about why ARMv7 fails, then yes: that's because https://github.com/openSUSE/obs-build/pull/210 is not merged in git master. Don't be confused by the "merged" state on GitHub; I think somebody clicked the merge button but afterwards force-pushed master to the GitHub repo without that change. So it is not in, and merging that change would fix it, and we'd be happy again.
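Whether a pull request's commits really reached master can be checked locally instead of trusting the GitHub badge. A minimal sketch using `git merge-base --is-ancestor`; the helper name `commit_in_branch` and the local branch name `pr-210` are hypothetical. The check assumes the PR was merged with its original commits: a squash or rebase merge creates new commits, so the PR head would not appear in master even though its changes did.

```shell
# Sketch: succeed only if commit $1 is reachable from branch $2.
# commit_in_branch is a hypothetical helper name.
commit_in_branch() {
  git merge-base --is-ancestor "$1" "${2:-origin/master}"
}

# Example usage in an obs-build clone (not run here):
#   git fetch origin
#   git fetch origin pull/210/head:pr-210
#   commit_in_branch pr-210 origin/master && echo "in master" || echo "NOT in master"
```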
Or about some of my local hack versions on the server?
The local hack version was causing the other breakage (for aarch64 and powerpc).
Greetings, Dirk
On Monday 30 November 2015, 11:12:57, Dirk Müller wrote:
Hi Adrian,
The current git master is the one that caused the ARMv7 failures; are you talking about that one?
If we're talking about why ARMv7 fails, then yes: that's because https://github.com/openSUSE/obs-build/pull/210 is not merged in git master. Don't be confused by the "merged" state on GitHub; I think somebody clicked the merge button but afterwards force-pushed master to the GitHub repo without that change. So it is not in, and merging that change would fix it, and we'd be happy again.
;/ ... k, reverted the earlier commit that disabled virtio when /dev/random is used, and applied that one again. Let's see what happens...
-- Adrian Schroeter
On 26/11/2015 11:23, Guillaume Gardet wrote:
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
Things seem to be back to normal, except for armbuild15, which seems to be very slow.
Guillaume
On Thursday 26 November 2015, 11:52:14, Guillaume Gardet wrote:
On 26/11/2015 11:23, Guillaume Gardet wrote:
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
Things seem to be back to normal, except for armbuild15, which seems to be very slow.
15 seems to run at relatively normal speed for an ARMv7 system. How do you measure "very slow"?
-- Adrian Schroeter
On 26/11/2015 11:53, Adrian Schröter wrote:
On Thursday 26 November 2015, 11:52:14, Guillaume Gardet wrote:
On 26/11/2015 11:23, Guillaume Gardet wrote:
Hi,
Sorry for the cross-post (-arm and -buildservice), but I'm not sure who should fix this problem.
Some ARM v7 workers are broken. They start to boot and then fail. As a result, lots of packages go to the building state but quickly return to the scheduled state.
The failing workers are armbuild13, 16, 17, 18 and 19. armbuild15 does not seem to be OK either, but that seems to be a slightly different bug.
Things seem to be back to normal, except for armbuild15, which seems to be very slow.
15 seems to run at relatively normal speed for an ARMv7 system.
How do you measure "very slow"?
No real measurement, I just saw a "slow" worker start (package installation, ...), but it seems to be OK now.
Guillaume
participants (6)
- Adrian Schröter
- Alexander Graf
- Andreas Schwab
- Andreas Schwab
- Dirk Müller
- Guillaume Gardet