On Wed, 2013-12-04 at 11:57 +0100, Alberto Planas Dominguez wrote:
Hello list,
as outlined in the mail sent by coolo last Thursday, the current approach of having openQA only run after Factory emits an ISO image does not prevent Factory from getting broken. Problems can only be detected when it's basically already too late. Therefore coolo's mail proposed the following workflow, which strengthens openQA's importance by including pre-integration tests in the Factory submission process.
https://progress.opensuse.org/workflow/factory-proposal.html
That mode of operation adds new requirements on openQA. In no particular order of importance:
1. Accessible
In order to allow Factory and staging project maintainers to evaluate and fix test runs, openQA's web interface needs to be enhanced to work with multiple users. That means introducing an actual web framework instead of using CGI.pm, introducing a proper database backend, coming up with a permissions model, etc. Also, there needs to be a way to allow users to edit test cases. Right now one has to do that outside of the tool via direct git access. Fulfilling this requirement is basically a precondition for hosting openQA in public, i.e. needed for full upstream acceptance.
This would be great. As I've expressed more verbosely in the past, I think it's a huge shame that all the work the openSUSE Team have done to date on openQA is 'hidden away' from the majority of our contributors because of this issue. If nothing else you propose gets done, fixing this so all our contributors have the opportunity to make use of new openQA would be a huge benefit.
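Just to make the idea concrete for myself, "a proper database backend plus a permissions model" could start out really small. This is purely a hypothetical sketch - the table and column names are mine, not anything openQA actually has:

    use DBI;

    # Hypothetical minimal schema: users with a role, jobs owned by a user.
    my $dbh = DBI->connect("dbi:SQLite:dbname=openqa.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do("CREATE TABLE IF NOT EXISTS users (
                  id    INTEGER PRIMARY KEY,
                  login TEXT UNIQUE NOT NULL,
                  role  TEXT NOT NULL DEFAULT 'user')");
    $dbh->do("CREATE TABLE IF NOT EXISTS jobs (
                  id       INTEGER PRIMARY KEY,
                  iso      TEXT NOT NULL,
                  state    TEXT NOT NULL DEFAULT 'scheduled',
                  owner_id INTEGER REFERENCES users(id))");

Obviously the real thing needs more than that, but even something this simple would already let the web UI know who is allowed to restart or edit which jobs.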
2. Reliable
In its current state openQA needs constant caretaking by an operator with root access. Way too often jobs get stuck or fail in an unpredictable way. Many times the reasons for that are deadlocks in the threading model, issues with the monitor interface of QEMU, race conditions that depend on load and disk IO, or side effects of tests. Effort needs to be put into eliminating random failures by e.g. avoiding Perl threads, enhancing QEMU's QMP interface, avoiding sleep() or leveraging VM snapshots in a smarter way.
Sounds fair enough. Given I've seen little of the current state of your openQA I can't really comment further, but these changes sound reasonable.
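For what it's worth, QEMU's QMP interface is just JSON over a socket, so driving the VM state directly (instead of sleeping and hoping) looks roughly like this. A minimal sketch, assuming QEMU was started with something like -qmp unix:/tmp/qmp.sock,server,nowait; the socket path is made up:

    use IO::Socket::UNIX;
    use JSON;

    # Talk to the QMP socket QEMU was started with (the path is an example).
    my $qmp = IO::Socket::UNIX->new(Peer => "/tmp/qmp.sock") or die "connect: $!";
    my $greeting = <$qmp>;                                  # QMP greets first
    print $qmp encode_json({ execute => "qmp_capabilities" }), "\n";
    my $ack = <$qmp>;                                       # {"return": {}}
    print $qmp encode_json({ execute => "query-status" }), "\n";
    my $status = decode_json(scalar <$qmp>);
    print "VM state: $status->{return}{status}\n";          # e.g. "running"

Polling a well-defined answer like that seems a lot less fragile than fixed sleep() calls.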
3. Scalable
openQA is already able to run several QEMU instances on one machine. There are natural limits on the amount of RAM and CPU of affordable machines though. So even the fastest machine seen so far can only run eight installations in parallel. To be able to run both the Factory tests and the pre-integration tests for staging projects in a reasonable amount of time, openQA needs to be parallelized to run on several machines. The way OBS uses workers can serve as inspiration here. This feature requires implementing smart ways to distribute input files like test scripts, reference images, RPMs or ISO images among workers. It requires a database model that allows live updating of test results and a new way to provide the back channel for the live view.
If you go this route, will it be possible for contributors to donate/add 'testers' (my name for the openQA equivalent of OBS 'builders') to the pool? We can't offer that for OBS for lots of very sensible security reasons, but if it is an option for openQA we'd hopefully have contributors willing to volunteer hardware, bandwidth, etc. for testing Factory, which could really help with that scalability (and at least mean the responsibility and cost for all hardware isn't solely SUSE's).
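In my head the OBS-style split boils down to each 'tester' polling a central scheduler for work. A completely made-up API, just to illustrate the shape of it (the URL and endpoints are invented):

    use LWP::UserAgent;
    use JSON;

    # Completely made-up API: a worker asks the central openQA server for work,
    # runs the test, and reports back.
    my $ua     = LWP::UserAgent->new;
    my $server = "https://openqa.example.org";            # invented URL

    while (1) {
        my $res = $ua->get("$server/api/jobs/grab?worker=worker01");
        if ($res->is_success) {
            my $job = decode_json($res->decoded_content);
            # ... download $job->{iso}, run the test, collect logs/screenshots ...
            $ua->post("$server/api/jobs/$job->{id}/done",
                      Content => encode_json({ result => "passed" }));
        }
        else {
            sleep 30;                                      # no work, back off
        }
    }

If it really is pull-based like that, volunteer machines behind NAT could join without the server needing to reach them, which is exactly what would make donated hardware practical.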
4. Debuggable
Sometimes tests just fail in a way that is not debuggable after the fact. Therefore there needs to be a mode that allows developers to interactively take control of the virtual machine that runs the test. If that doesn't help either, there should be a way to clone jobs to a local instance to reproduce them there.
I've had some thoughts about that. I'm not convinced you need to make that 'openQA's problem'. If we have an issue which openQA is detecting but unable to debug, I think all openQA 'needs' to do is flag the issue and provide all the information it can - this then becomes an issue for a real human tester to investigate. In my opinion, the best way openQA can help there is perhaps some kind of dashboard to show open/undebugged test cases. But I certainly agree that having openQA offer downloadable VM/disk images so the tester can download and run those tests locally would be really nice, and would dramatically cut down the work required for those cases which can't be debugged after the fact.
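On the 'download and run it locally' part: since the worker is ultimately just driving QEMU, all a tester really needs is the disk image. A hand-wavy sketch (the image file name is invented); -snapshot keeps the downloaded image pristine and -vnc gives an interactive screen to poke at:

    # Hand-wavy local reproduction of a failed job: boot the downloaded disk
    # image with writes discarded (-snapshot) and a VNC display to interact with.
    system("qemu-kvm",
           "-m",   "1024",
           "-hda", "factory-build0513-kde.qcow2",   # invented file name
           "-snapshot",                             # discard writes to the image
           "-vnc", ":1") == 0                       # then point a VNC viewer at :5901
        or die "qemu-kvm failed: $?";

That alone would cover a lot of the 'can't debug it after the fact' cases without openQA having to grow an interactive mode straight away.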
5. Interfacable
So far openQA test results are meant to be interpreted by humans. To be able to serve as good input for automatically judging whether a staging project is good enough, the results must be available in a form a machine can understand, e.g. for displaying test results directly in OBS projects.
Without knowing the current system, again, it's hard for me to comment, but perhaps the answer is to flip the problem on its head? Have the initial results be machine-parsable, and then have a parser transform those results to generate the human-readable ones?
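i.e. something in this spirit - the result structure and names below are entirely invented, but the point is that the machine-readable data is the canonical output and the human-readable page is just one renderer over it:

    use JSON;

    # Invented result structure: the JSON form is what OBS or a bot consumes,
    # the pretty view is derived from it.
    my $results = {
        iso     => "openSUSE-Factory-DVD-x86_64-Build0513.iso",   # example name
        overall => "fail",
        tests   => [
            { name => "bootloader", result => "ok"   },
            { name => "desktop",    result => "fail" },
        ],
    };

    print encode_json($results), "\n";       # machine-readable, canonical
    printf "%-12s %s\n", $_->{name}, uc $_->{result}
        for @{ $results->{tests} };          # human-readable, derived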
To implement the listed features, several man months of concentrated development effort would be needed.
Great, what opportunities will there be for non-openSUSE Team contributors to help out? I assume you'll be accepting pull requests to https://github.com/openSUSE-Team/openQA?
P.S. The original author of this email is Ludwig Nussel, but I edited it a bit before sending. That means that the good parts are from him, and the mistakes are mine.
Can I at least give you credit for your very humble disclaimer? :)