Re: [opensuse-factory] Re: snapshot 20180206 - boots to white screen

12 Feb 2018

On Mon, 2018-02-12 at 11:53 +0100, Lars Marowsky-Bree wrote:
...
On 2018-02-09T13:17:21, "H.Merijn Brand" <h.m.brand@xs4all.nl> wrote:
...
Tumbleweed is bleeding edge. I hope you are aware that problems may
arise. If you want automatic fixes, stick to Leap.
I think this is unfair to the promise of Tumbleweed.
I could understand how minor issues get past openQA, but a whitescreen
after boot for a large portion of the user base deserves a second look
as to how that slipped.
There are multiple things playing together and so far, openQA indeed
does not catch this kind of problem.

One of the biggest reasons why openQA does not catch it is that most
tests are based on 'fresh installs' and 'upgrades from released
openSUSE versions': the fresh install obviously gets a clean cache
state, which does not cause the problem. The upgrade scenario comes
from Mesa 17.0 and apparently does not trigger the caches to go berserk.

The reasons are not totally clear just yet, as there is a reboot towards
the end of the test, after which we run more program tests. But one
thing we DO miss in openQA is installs without autologin enabled. So we
reach sddm only once, in the very specific test for 'logout, get to
sddm and try to log back in'.
...
(Yes, I know, it's FLOSS, so the result might end up implying that I
need to do something myself and contribute, but so far I don't yet
understand enough as to where ;-)
That's part of the problem: the issue is not fully understood. From
what I have gathered so far, it seems to be a bad interplay of Mesa
invalidating the cache and Qt not coping with it, resulting in render
errors.
...
Is that because openQA runs virtualized and not with actual hardware?
This may well be part of the equation, making it even harder to detect
with openQA.
...
...
That said, this problem was more intrusive than the other problems in
the last 6 months, as it prevents users from working at all. You need
strong nerves to not start cursing.
Well, I was able to eventually fix it locally (noticing how sddm broke,
failing to figure out why, and switching to kdm). But I'd still like to
understand the failure and process better.
So do I, and hopefully we can come up with an openQA test. So far, the
best thing we have found is this: it does not matter whether Mesa
changes version; a simple rebuild of it is enough to trigger the error
again, as the cache is validated/invalidated based on build_ids. And a
rebuild is nothing I can guarantee will not happen (possibly changed
deps).
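
To illustrate why a plain rebuild is enough, here is a minimal toy
sketch of a cache whose entries are only valid for the build that wrote
them. This is not Mesa's actual code; the class name, the store layout
and the build id strings are all made up for illustration.

```python
class BuildKeyedCache:
    """Toy cache: an entry is only valid for the build that stored it.

    Hypothetical illustration only, not how Mesa implements its cache.
    """

    def __init__(self, store: dict, build_id: str):
        self.store = store        # stands in for the on-disk cache
        self.build_id = build_id  # identifier of the running build

    def put(self, key: str, blob: bytes) -> None:
        # Remember which build produced this entry.
        self.store[key] = (self.build_id, blob)

    def get(self, key: str):
        entry = self.store.get(key)
        if entry is None:
            return None
        stored_build, blob = entry
        if stored_build != self.build_id:
            # Written by a different build: drop it and report a miss.
            del self.store[key]
            return None
        return blob


# Same version string, different (made-up) build ids.
disk = {}
before_rebuild = BuildKeyedCache(disk, "x.y.z-build-aaaa")
before_rebuild.put("shader-1", b"compiled")

after_rebuild = BuildKeyedCache(disk, "x.y.z-build-bbbb")
hit_before = before_rebuild.get("shader-1")
miss_after = after_rebuild.get("shader-1")
```

The point is only that the version never enters the comparison: the
entry written before the rebuild is a hit for the old build and a miss
for the new one, even though both carry the same version.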
...
...
• FUN! I think I'm just masochistic enough to enjoy the occasional
  throwback.
There's certainly that.
But TW should also not be a dumping ground for unstable or untested
code, and it shouldn't mean developers push without a local test.
Rest assured, it's not seen as a dumping ground at all.
...
And maybe it means TW's release process should eventually evolve to
include a canary release stage, instead of instant "everyone,
everywhere"?
In short: you want to have 'Factory' back? I'm open to hearing
proposals; I know of at least one user who regularly tests in parallel
to openQA, before things 'hit the shelves', and he has reported issues
in the past (which we used to block snapshots).

Cheers
Dominique

Dominique Leuenberger / DimStar