[opensuse-project] AppStream/SC/PK GSoC report #1
Hi!

This is my first report on the work I'm doing on AppStream and PackageKit / the Software-Center. It took a little longer to write than planned because university is giving me a hard time, sorry for that!

Before the SoC development started, I had already familiarized myself with the SC code and learned a bit more Python, which will definitely help a lot :-) I ported the SC from PkClient to PkTask, because PkTask is a much more generic interface and the thing we want to use in the SC. Unfortunately, the SC was still extremely slow, so I applied some PackageKit tricks to make it over 420% faster, and I introduced some new PK API for that. Still, this was not enough: if you install software, you are - due to PK's design - unable to query package details at the same time, which makes it impossible to view application details while installing stuff. This makes using the SC a pain and also causes problems when searching.

In the last two weeks, I discussed all these issues with Richard Hughes, the PackageKit maintainer. I wrote a prototype of a SQLite based package-cache, which will make querying data super-fast and will make it possible to do this in parallel to PK transactions. First, the cache was super-slow, but with some optimizations in PkPackageSack routines, it now runs with acceptable speed so we can use it. I also found some other optimizations together with Richard, which will make PK itself a lot faster on some operations (they will save from a few msecs up to 10 secs, depending on what you're trying to do).

In the process of talking about changes and reviewing patches and suggestions, Richard decided that it's now time to _really_ break stuff and make it more sane. So right now we're doing a new PackageKit 0.8.x series, which will contain lots of cool improvements, not only relevant for the stuff I want to do, but also for other new features, like systemd-offline-upgrade.

My changes are already merged upstream, and I will now add a public API to access the cache, after discussing that upstream. There are also some other changes to PK's DBus API that I and some other people would like to have; the discussion about them is going on right now. We're also improving backend interaction and have broken the plugin API, so there's lots going on right now which will make developing a SC based on PackageKit a lot easier. At the moment, all PackageKit backends are broken, and there's lots of stuff which needs to be fixed and even more to be discussed. Doing cross-distro work is sometimes more discussion than coding, but I am certain right now that we will have a usable AppStream Software-Center by the end of my SoC project.

If you read through this now, I'd like to ask you if you have anything you want to do with PackageKit (which is not very backend-specific) and haven't been able to do until now. If there are any such issues, we might now be able to fix them.

Next steps are even more discussion and work on PackageKit to get the API right, as well as some changes to the SC and probably a C++ implementation of the AppStream Xapian generator, so other implementations of an AppStream-SC are possible (hey, KDE!) and we can use it even without the SC.

Also, backend developers, please fix your backends soon! I want to find the people responsible for the Zypper backend, so they can take a look at it and adjust it to our changes. (I have no knowledge about Zypper, and developing in a VM is a pain, but I could of course make some changes myself later, once the basic stuff is working.) Test results about overall performance are helpful too.

--

Overall the start of this SoC was promising. At first, after running into the first issues, I thought it might become extremely difficult to complete this project, as there were many issues to work around, but now the problems are solved and we're really doing amazing stuff! Working with Richard and the PK crowd is a pleasure, as always, and very productive. I'm looking forward to the next weeks, as everything is now working as planned again. :-) (In particular, I will need feedback from other distributions later.)

Thanks,
Matthias

-- To unsubscribe, e-mail: opensuse-project+unsubscribe@opensuse.org To contact the owner, email: opensuse-project+owner@opensuse.org
Hi! On Mon, Jun 04, 2012 at 11:45:44AM +0200, Matthias Klumpp wrote:
In the last two weeks, I discussed all these issues with Richard Hughes, the PackageKit maintainer. I wrote a prototype of a SQLite based package-cache, which will make querying data super-fast and will make it possible to do this in parallel to PK transactions. First, the cache was super-slow, but with some optimizations in PkPackageSack routines, it now runs with acceptable speed so we can use it.
Hmm, I don't understand this. Is this sqlite database in PK or in SC? If it's in SC, I don't see why you need it as SC puts everything in its xapian database anyway. And I don't see why PK would need it, as libzypp has its own database to query.
Confused, Michael.
-- Michael Schroeder mls@suse.de SUSE LINUX Products GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Hi!
Let's clarify this ;-)
The database is in PK and therefore generic, it can be used by every
other tool too. The Xapian database you mentioned is the AppStream DB
with application data and the packages in which the application is, it
does not contain other data. (well, not entirely true, but that's the
most important part)
Now the SC needs to query certain information from PK, e.g. if a
package is already installed or the description of a package to
display it.
Problem here is that you can't run thousands of Resolve() calls on
PackageKit, as this is very slow and there's no chance to optimize it.
Also, all data has to go through DBus, which slows down the process
too.
Solution was to first optimize these calls, which is done already.
Unfortunately, if a transaction is running you also can't spawn
another one, so e.g. while you're installing Foo you can't view the
details page for Bar.
We now use the SQL cache to query this information async and very
fast, without DBus.
I'm not 100% happy with this solution, but it's the best we have at
the moment. I first suggested avoiding the DBus trip and cache-reopen-run
by linking the PackageKit access libraries directly against PK backends,
so we would be able to access the cache directly for read actions, but
this idea was rejected upstream.
We're preparing some other solutions there anyway, but nothing as fast
as a cache would be, unfortunately.
I hope this clears things up :)
Cheers,
Matthias
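For readers who want something concrete, the cache idea described in this message can be sketched roughly as follows. This is only an illustration under assumptions: the thread does not describe the real cache layout, so the `packages` table, its columns, and the helper names `build_cache` and `lookup` are all hypothetical. The point is that a frontend can answer "is it installed?" and "what is the summary?" from a local SQLite file with one query, without starting a PackageKit transaction.

```python
import sqlite3

# Hypothetical schema: the real PackageKit cache layout is not given in this thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS packages (
    name      TEXT PRIMARY KEY,
    version   TEXT NOT NULL,
    installed INTEGER NOT NULL,
    summary   TEXT NOT NULL
)
"""


def build_cache(conn, packages):
    """(Re)fill the cache from (name, version, installed, summary) tuples."""
    conn.execute(SCHEMA)
    with conn:  # a single commit for the whole batch
        conn.execute("DELETE FROM packages")
        conn.executemany("INSERT INTO packages VALUES (?, ?, ?, ?)", packages)


def lookup(conn, names):
    """Return {name: (version, installed, summary)} for names present in the cache."""
    names = list(names)
    if not names:
        return {}
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        "SELECT name, version, installed, summary FROM packages "
        f"WHERE name IN ({placeholders})",
        names,
    )
    return {
        name: (version, bool(installed), summary)
        for name, version, installed, summary in rows
    }


# Example data is invented; an in-memory database stands in for the cache file.
conn = sqlite3.connect(":memory:")
build_cache(conn, [
    ("foo", "1.0", 0, "Example application Foo"),
    ("bar", "2.3", 1, "Example application Bar"),
])
details = lookup(conn, ["bar", "missing"])
```

Names that are not in the cache are simply absent from the result, so the caller can decide whether to fall back to a regular PackageKit call for them.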
On Mon, Jun 04, 2012 at 01:36:51PM +0200, Matthias Klumpp wrote:
The database is in PK and therefore generic, it can be used by every other tool too. The Xapian database you mentioned is the AppStream DB with application data and the packages in which the application is, it does not contain other data. (well, not entirely true, but that's the most important part) Now the SC needs to query certain information from PK, e.g. if a package is already installed or the description of a package to display it. Problem here is that you can't run thousands of Resolve() calls on PackageKit, as this is very slow and there's no chance to optimize it.
Why not? How about adding a "ResolveMultiple()" or something like that?
Also, all data has to go through DBus, which slows down the process too. Solution was to first optimize these calls, which is done already. Unfortunately, if a transaction is running you also can't spawn another one, so e.g. while you're installing Foo you can't view the details page for Bar.
I don't see why you need to run a transaction for Resolve(). (It's probably also pretty PK-backend specific, I guess)
We now use the SQL cache to query this information async and very fast, without DBus.
So SC opens the PK sqlite database? That sounds pretty awful. Even if there's a database it would be much saner to go through dbus for the access (and use something like ResolveMultiple).
Cheers, Michael.
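As a rough illustration of the batching argument made above, the following sketch counts round trips for per-name calls versus chunked calls. It is a toy model, not PackageKit code: `CountingBackend`, `resolve_one_by_one`, `resolve_batched`, and the chunk size of 500 are all assumptions made for the example, and no real IPC takes place.

```python
class CountingBackend:
    """Stand-in for a remote service; counts round trips instead of doing IPC."""

    def __init__(self, known):
        self.known = dict(known)
        self.round_trips = 0

    def resolve(self, names):
        """Resolve a list of names in one call; unknown names are omitted."""
        self.round_trips += 1
        return {name: self.known[name] for name in names if name in self.known}


def resolve_one_by_one(backend, names):
    """One round trip per package name."""
    result = {}
    for name in names:
        result.update(backend.resolve([name]))
    return result


def resolve_batched(backend, names, chunk_size=500):
    """One round trip per chunk of names."""
    names = list(names)
    result = {}
    for start in range(0, len(names), chunk_size):
        result.update(backend.resolve(names[start:start + chunk_size]))
    return result


# Invented data: 1200 package names with made-up package IDs.
names = [f"pkg{i}" for i in range(1200)]
known = {name: f"{name};1.0;noarch;repo" for name in names}
slow, fast = CountingBackend(known), CountingBackend(known)
same_answer = resolve_one_by_one(slow, names) == resolve_batched(fast, names)
```

Both strategies return the same mapping; only the number of round trips differs. Batching alone does not address the other objection raised in this thread, namely that a query cannot run while another transaction is in progress.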
Hi!
Your suggestions still don't solve the issue that I can't run multiple
transactions at the same time, and some backends don't support that
either. So I will have to run super-big transactions while starting the
SC and cache all results, which is slow.
To clarify: a "transaction" is any action which performs
any task on the package database using a PackageKit backend, for
example InstallPackages(), RemovePackages(), GetDetails(), Resolve()
... You can run these actions on multiple packages, but that is slow.
Just running it for one package as soon as we need the results is
better but still slow, and if we're doing InstallPackages(), we can't
run GetDetails() at the same time.
Nearly all package-manager backends aren't thread-safe, so
parallelizing that could cause unexpected behavior.
To see how slow these actions are, you can just try an older SC
version or maybe run "pkcon get-packages", but without having the
caches enabled. (PackageKit already caches some of this data, to
enable fast filtering in frontends)
Regards,
Matthias
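To make the cost of serialized transactions concrete, here is a small model of a backend that runs one transaction at a time in submission order. It is a sketch under assumptions, not PackageKit code: the function `completion_times` is hypothetical and the durations are invented purely for illustration.

```python
def completion_times(jobs):
    """Simulate a backend that runs one transaction at a time, in submission order.

    jobs: iterable of (name, submit_time, duration) tuples, times in seconds.
    Returns {name: time at which the job finishes}.
    """
    clock = 0.0
    finished = {}
    for name, submitted, duration in sorted(jobs, key=lambda job: job[1]):
        # A job starts when it is submitted or when the previous job ends,
        # whichever is later.
        clock = max(clock, submitted) + duration
        finished[name] = clock
    return finished


# Invented numbers: a 30 s install, and a 0.5 s details query issued 1 s later.
times = completion_times([
    ("InstallPackages(foo)", 0.0, 30.0),
    ("GetDetails(bar)", 1.0, 0.5),
])
wait_for_details = times["GetDetails(bar)"] - 1.0
```

In this model the details query itself is cheap, but the user still waits until the install has finished, which is the behavior described in the message above.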
Hi, me again ;) On Mon, Jun 04, 2012 at 01:36:51PM +0200, Matthias Klumpp wrote:
The database is in PK and therefore generic, it can be used by every other tool too.
Just a couple of words about storing package data into a big sqlite database: that's what opensuse-10.X did, and updating the database with new repository data was so slow that this led to the development of libsatsolver (now libsolv) used in newer opensuse versions. I really don't see why there's need for yet another database when the packagekit backends already maintain their own ones.
Cheers, Michael.
Hi!
I already answered that question very extensively, I think on this
list too (not only on distributions@fd.o).
However, I dislike the cache approach very much and am right now trying
other things which might solve the issue too, if upstream accepts my
suggestions.
Was SQLite really the problem? Which database are you using now?
Cheers,
Matthias
On Tue, Jun 05, 2012 at 11:33:58AM +0200, Matthias Klumpp wrote:
I already answered that question very extensively, I think on this list too (not only on distributions@fd.o)
What question? ;)
However, I dislike the cache approach very much and am right now trying other things which might solve the issue too, if upstream accepts my suggestions. Was SQLite really the problem? Which database are you using now?
Yes, the slowness was caused by sqlite (but that was a couple of years ago, I don't know if newer sqlite versions are faster). Some users tend to have lots of repositories enabled, so the number of packages that need to be stored in the database may be quite big. Yum also uses sqlite databases, but they create them on the server for each repo, so the client never has to update them. libsolv (and thus also libzypp) uses "solv" databases instead of sqlite databases; like yum, it uses one database for each repository so that updating one repository is still fast when there are lots of other repositories.
Cheers, Michael.
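The one-store-per-repository layout described above can be illustrated with a short sketch. It is not how libsolv's "solv" files or yum's metadata actually work; SQLite is used here only because it ships with Python, and the class `PerRepoCache`, its methods, and the repository names are assumptions made for the example. What it shows is that refreshing one repository touches only that repository's store.

```python
import sqlite3


class PerRepoCache:
    """One SQLite database per repository (in-memory here, files in practice)."""

    def __init__(self):
        self._dbs = {}
        self.rebuilds = {}  # repo name -> number of rebuilds, for illustration

    def refresh(self, repo, packages):
        """Rebuild only the database belonging to `repo`."""
        old = self._dbs.pop(repo, None)
        if old is not None:
            old.close()
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT NOT NULL)"
        )
        with conn:
            conn.executemany("INSERT INTO packages VALUES (?, ?)", packages)
        self._dbs[repo] = conn
        self.rebuilds[repo] = self.rebuilds.get(repo, 0) + 1

    def find(self, name):
        """Return [(repo, version), ...] for every repository providing `name`."""
        hits = []
        for repo in sorted(self._dbs):
            row = self._dbs[repo].execute(
                "SELECT version FROM packages WHERE name = ?", (name,)
            ).fetchone()
            if row is not None:
                hits.append((repo, row[0]))
        return hits


# Invented repositories and versions.
cache = PerRepoCache()
cache.refresh("oss", [("foo", "1.0"), ("bar", "2.3")])
cache.refresh("update", [("foo", "1.1")])
cache.refresh("update", [("foo", "1.2")])  # only "update" is rebuilt
```

The trade-off is that a lookup has to consult every repository's store, while a refresh stays proportional to the size of the repository that changed.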
On Monday, 04 June 2012, Matthias Klumpp wrote:
If you read through this now, I'd like to ask you if you have anything you want to do with PackageKit (which is not very backend-specific) and haven't been able to do until now. If there are any such issues, we might now be able to fix them.
I'm not involved in all this but there are reasons (which I don't know) explaining why aptdaemon got introduced when the natural choice should have been to decide to use PK. Have you looked into those reasons and can PK fulfill everything that aptdaemon does? It would be great to have a cross-distro solution that's really cross-distro and that does not force everybody into the lowest common denominator.
Cheers,
-- Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer Get the Debian Administrator's Handbook: → http://debian-handbook.info/get/
Hi!
2012/6/4 Raphael Hertzog
On Monday, 04 June 2012, Matthias Klumpp wrote:
If you read through this now, I'd like to ask you if you have anything you want to do with PackageKit (which is not very backend-specific) and haven't been able to do until now. If there are any such issues, we might now be able to fix them.
I'm not involved in all this but there are reasons (which I don't know) explaining why aptdaemon got introduced when the natural choice should have been to decide to use PK.

Yes, this is part of the current discussion and something which has bothered me a lot for some time. I think we now have the chance to fix this once and for all. The reason why aptd was introduced was that PK wasn't able to fulfill some special Debian/Ubuntu requirements, for example Debconf support. Debconf is a very flexible system to ask questions during installation of packages, something which PK explicitly forbids by policy. (See hughsie's law) We (mostly Daniel, I did a few things too) managed to work around this issue last year. Also, aptd only has one PolicyKit policy to do all actions and a DBus interface to perform actions on, allowing some more advanced tools to use aptd too. Right now, something similar is being discussed, but it's a very slow discussion. Last but not least, aptd supports some Ubuntu specifics (purchasing apps, plugins) and does some other things which are more or less covered by PK already.

Have you looked into those reasons and can PK fulfill everything that aptdaemon does?

Right now, PK can do some stuff aptd can't and aptd can do things PK can't. In theory, PK can do everything aptd can, if some API changes are made, and this needs to be discussed with Richard, who usually has some good comments about such changes.

It would be great to have a cross-distro solution that's really cross-distro and that does not force everybody into the lowest common denominator.

Agreed :-) At least we managed to get all stuff ready for Debian now, and I'm happy that PK will be part of the new Wheezy release. (You can already try it there!) PK is fully functional in Debian, there are just some things missing that I need to implement the SC properly. It would be cool if you, as a DD, could maybe comment on that stuff too, as I am the only "Debian" person right now and the other people are from Ubuntu. The cool part is that improving PK will help every distribution, not just openSUSE or Debian :-)
Regards,
Matthias
participants (3): Matthias Klumpp, Michael Schroeder, Raphael Hertzog