On Mon, 2018-02-12 at 11:53 +0100, Lars Marowsky-Bree wrote:
On 2018-02-09T13:17:21, "H.Merijn Brand" <h.m.brand@xs4all.nl> wrote:
TumbleWeed is bleeding edge. I hope you are aware that problems may arise. If you want automatic fixes, stick to Leap.
I think this is unfair to the promise of Tumbleweed.
I could understand how minor issues get past openQA, but a whitescreen after boot for a large portion of the user base deserves a second look as to how that slipped.
Multiple things are playing together here and, so far, openQA indeed does not catch this kind of problem. One of the biggest reasons openQA does not catch it is that most tests are based on 'fresh installs' and 'upgrades from released openSUSE versions': the fresh install obviously gets a clean cache state, which does not cause the problem. The upgrade scenario comes from Mesa 17.0 and apparently does not trigger the caches to go berserk. The reasons are not totally clear just yet, as there is a reboot towards the end of the test, after which we run more program tests. But one thing we DO miss in openQA is installs without autologin enabled. So we reach sddm only once, in the very specific test for 'logout, get to sddm and try to log back in'.
(Yes, I know, it's FLOSS, so the result might end up implying that I need to do something myself and contribute, but so far I don't yet understand enough as to where ;-)
That's part of the problem: the issue is not fully understood yet. From what I have gathered so far, it seems to be an unfortunate interplay of Mesa invalidating the cache and Qt not liking it, resulting in render errors.
Is that because openQA runs virtualized and not with actual hardware?
This may well be part of the equation, making it even harder to detect with openQA.
That said, this problem was more intrusive than the other problems in the last 6 months, as it prevents users from working at all. You need strong nerves to not start cursing.
Well, I was able to eventually fix it locally (noticing how sddm broke, failing to figure out why, and switching to kdm). But I'd still like to understand the failure and process better.
So do I, and hopefully I can come up with an openQA test. So far, the best thing we found is: it does not matter whether Mesa changes version; a simple rebuild of it is enough to trigger the error again, as the cache is validated/invalidated based on build_ids. And I cannot guarantee that a rebuild will not happen (possibly changed deps).
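To illustrate the build_id point, here is a minimal sketch, not Mesa's actual code: it assumes a cache whose keys mix a build id into the hash of the cached input. The names `cache_key` and `ShaderCache` and the build id strings are hypothetical, made up for this example. It only shows why a plain rebuild with an unchanged version number can make every existing cache entry miss.

```python
import hashlib


def cache_key(build_id: str, shader_source: str) -> str:
    """Hypothetical key: hash of the build id plus the shader source."""
    h = hashlib.sha256()
    h.update(build_id.encode())
    h.update(b"\0")  # separator so the two fields cannot run together
    h.update(shader_source.encode())
    return h.hexdigest()


class ShaderCache:
    """Toy in-memory cache; a real one would live on disk."""

    def __init__(self):
        self._entries = {}

    def store(self, build_id, source, binary):
        self._entries[cache_key(build_id, source)] = binary

    def lookup(self, build_id, source):
        # Returns None when no entry exists for this build id + source.
        return self._entries.get(cache_key(build_id, source))


source = "void main() {}"
cache = ShaderCache()
cache.store("build-A", source, b"compiled")

hit = cache.lookup("build-A", source)   # same build: entry is found
miss = cache.lookup("build-B", source)  # rebuilt package: entry is not found
```

In this toy model the version number never appears in the key, so a version bump and a rebuild look the same to the cache: both change the build id and both leave the old entries unreachable.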
• FUN! I think I'm just masochistic enough to enjoy the occasional throwback.
There's certainly that.
But TW should also not be a dumping ground for unstable or untested code, and it shouldn't mean developers push without a local test.
Rest assured, it's not seen as a dumping ground at all.
And maybe it means TW's release process should eventually evolve to include a canary release stage, instead of instant "everyone, everywhere"?
In short: you want to have 'Factory' back? I'm open to hearing proposals; I know of at least one user who regularly tests in parallel to openQA, before things 'hit the shelves', and he did report issues in the past (which we used to block snapshots).

Cheers,
Dominique