Re: [zypp-devel] zypper performance gone bad, bad, bad
Stephan Kulow napsal(a):
Hi,
Looks like the implementation of locks has all the old problems ;(
I cancelled the callgrind call, but till then it called 415 826 times PoolQueryIterator::matchSolvable, which in turn costed 28 382 428 146 instructions.
Greetings, Stephan
Locks depends on PoolQuery which uses DataIterator...if we expect lot of locks from ui, which is exact string match, then we can optimalize PoolQuery (or directly locks by extracting information about lock from query) for this (e.g. using index on names). Pepa
linux-wcg3:/home/coolo # time zypper in fate Reading installed packages... 'fate' is already installed. Nothing to do.
real 0m1.648s user 0m1.240s sys 0m0.360s linux-wcg3:/home/coolo # mv /etc/zypp/locks{.away,} linux-wcg3:/home/coolo # grep kind /etc/zypp/locks | wc -l 8 linux-wcg3:/home/coolo # time zypper in fate Reading installed packages... 'fate' is already installed. Nothing to do.
real 0m21.176s user 0m20.449s sys 0m0.592s
-- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Fri, 16 May 2008, josef reidiner wrote:
Stephan Kulow napsal(a):
Hi,
Looks like the implementation of locks has all the old problems ;(
I cancelled the callgrind call, but till then it called 415 826 times PoolQueryIterator::matchSolvable, which in turn costed 28 382 428 146 instructions.
Greetings, Stephan
Locks depends on PoolQuery which uses DataIterator...if we expect lot of locks from ui, which is exact string match, then we can optimalize PoolQuery (or directly locks by extracting information about lock from query) for this (e.g. using index on names).
Nah. The problem seems to be that right now PoolQuery wants to match strings on it's own instead of letting it be done by the DataIterator, so it has to iterate over all solvables itself, and somewhere there must be a bug that leads to actually viting all solvables again leading to quadraticness. There's clearly something going very wrong in PoolQuerys use of the DataIterator, I see it in gdb. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Fri, 16 May 2008, Michael Matz wrote:
Nah. The problem seems to be that right now PoolQuery wants to match strings on it's own instead of letting it be done by the DataIterator, so it has to iterate over all solvables itself, and somewhere there must be a bug that leads to actually viting all solvables again leading to quadraticness. There's clearly something going very wrong in PoolQuerys use of the DataIterator, I see it in gdb.
I know coolos problem I think. He had a lock file written with the old attribute names still, like: kind: package solvable_name: build string_type: glob When parsing this in, "kind" and "string_type" are parsed as normal attributes to match, because they meanwhile are called "type" and "match_type". This makes it a three-attribute query (where two of them never will match anything because they are special attributes that aren't recognized anymore). And queries with more than one attribute are not matched by the dataiterator but by libzypp itself, which is _very_ inefficient (it basically retrieves _all_ attributes of all solvables and tries to match them against the query). To show you the difference, with my locks file using the old attribute names: % time zypper in fate real 0m42.440s user 0m41.127s sys 0m1.204s When I edit the locks file by hand to use the meanwhile correct attribute names (but otherwise the same): % time zypper in fate real 0m2.161s user 0m1.312s sys 0m0.252s So, coolo: do this on your locks file: sed -e 's/^kind:/type:/' -e 's/^string_type:/match_type:/' Josef: it would be worthwhile I think to still recognize and parse the old names (without exposing them in the API) to avoid this problem with lock files written in the intermediate form. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Am Samstag, 17. Mai 2008 schrieb Michael Matz:
Hi,
On Fri, 16 May 2008, Michael Matz wrote:
Nah. The problem seems to be that right now PoolQuery wants to match strings on it's own instead of letting it be done by the DataIterator, so it has to iterate over all solvables itself, and somewhere there must be a bug that leads to actually viting all solvables again leading to quadraticness. There's clearly something going very wrong in PoolQuerys use of the DataIterator, I see it in gdb.
I know coolos problem I think. He had a lock file written with the old attribute names still, like:
kind: package solvable_name: build string_type: glob
When parsing this in, "kind" and "string_type" are parsed as normal attributes to match, because they meanwhile are called "type" and "match_type". This makes it a three-attribute query (where two of them never will match anything because they are special attributes that aren't recognized anymore). And queries with more than one attribute are not matched by the dataiterator but by libzypp itself, which is _very_ inefficient (it basically retrieves _all_ attributes of all solvables and tries to match them against the query).
To show you the difference, with my locks file using the old attribute names:
% time zypper in fate real 0m42.440s user 0m41.127s sys 0m1.204s
When I edit the locks file by hand to use the meanwhile correct attribute names (but otherwise the same):
% time zypper in fate real 0m2.161s user 0m1.312s sys 0m0.252s
So, coolo: do this on your locks file: sed -e 's/^kind:/type:/' -e 's/^string_type:/match_type:/'
Josef: it would be worthwhile I think to still recognize and parse the old names (without exposing them in the API) to avoid this problem with lock files written in the intermediate form.
What I don't understand is, why zypper addlock fate adds type: package match_type: glob case_sensitive: on solvable_name: fate Shouldn't the match be exact unless I specify glob chars? Or is this logic while reading? But you're right, if I redo my locks with zypper addlock I get a different result, that is not _that_ bad: with 8 package names locked: real 0m2.308s user 0m1.876s without any locks: real 0m1.718s user 0m1.272s So it's only 50% slower with 8 locks. That's an improvement against the 1700% with my old locks file. Greetings, Stephan -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Sat, 17 May 2008, Stephan Kulow wrote:
What I don't understand is, why zypper addlock fate adds
type: package match_type: glob case_sensitive: on solvable_name: fate
Shouldn't the match be exact unless I specify glob chars? Or is this logic while reading?
I would guess so, yes, but right now zypper just simply hardcodes glob matching. In the end it doesn't really matter as matching a string with fnmatch(3) that doesn't contain glob chars is equivalent to string matching.
So it's only 50% slower with 8 locks. That's an improvement against the 1700% with my old locks file.
Still not ideal, but at least no blocker anymore. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Michael Matz napsal(a):
Hi,
On Sat, 17 May 2008, Stephan Kulow wrote:
What I don't understand is, why zypper addlock fate adds
type: package match_type: glob case_sensitive: on solvable_name: fate
Shouldn't the match be exact unless I specify glob chars? Or is this logic while reading?
I would guess so, yes, but right now zypper just simply hardcodes glob matching. In the end it doesn't really matter as matching a string with fnmatch(3) that doesn't contain glob chars is equivalent to string matching.
So it's only 50% slower with 8 locks. That's an improvement against the 1700% with my old locks file.
Still not ideal, but at least no blocker anymore.
Ciao, Michael.
I still think, that the best way is different implementation for exact matching (YaST use it (?) ). Good is same index on name (is something liek that already exist on pool?) which can fast find by exact matching like some tree(O(log(n))) or hashing (O(1)). Then when user choose from UI e.g. 100 exact packaga for lock, you only need 100 quick look to index during each start. Pepa -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Sun, 18 May 2008, josef reidiner wrote:
I still think, that the best way is different implementation for exact matching (YaST use it (?) ). Good is same index on name (is something liek that already exist on pool?) which can fast find by exact matching like some tree(O(log(n))) or hashing (O(1)). Then when user choose from UI e.g. 100 exact packaga for lock, you only need 100 quick look to index during each start.
I never ever want to see libzypp needing many megabytes of RAM any more, except if absolutely required. That's why I'm so against any indexes and try to find solutions that wouldn't require any, or could make use of data we already have. For instance in this case: if we want to speedup searches for exact package names (and only for names), then we can reuse the whatprovides arrays. Every package N will have a provide "N = version", and the provides _are_ indexed already by libsatsolver. There will potentially be more packages also providing N, but that can be quickly sorted out. So, the strategy would be: make sure whatprovides index exist for all solvables S in whatprovides[N] if nameof(S) == N found(S) Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
On Mon, May 19, Michael Matz wrote:
Hi,
On Sun, 18 May 2008, josef reidiner wrote:
I still think, that the best way is different implementation for exact matching (YaST use it (?) ). Good is same index on name (is something liek that already exist on pool?) which can fast find by exact matching like some tree(O(log(n))) or hashing (O(1)). Then when user choose from UI e.g. 100 exact packaga for lock, you only need 100 quick look to index during each start.
I never ever want to see libzypp needing many megabytes of RAM any more, except if absolutely required. That's why I'm so against any indexes and try to find solutions that wouldn't require any, or could make use of data we already have.
For instance in this case: if we want to speedup searches for exact package names (and only for names), then we can reuse the
Use ResPool::byIdent_iterator ResPool::byIdentBegin( ResKind::package, name ) ResPool::byIdentEnd ( ResKind::package, name ) It's purpose is to iterate over all items with that name. It's implemementation should be 'as fast as possible', but no need for the application to know the details. If it can be improved, all users will benefit. ( ResPool::byIdent is actually based on an (Id->PoolItem) index. The only index we maintain, and the implementation provided by matz@ is reasonable fast ;) ) -- cu, Michael Andres +------------------------------------------------------------------+ Key fingerprint = 2DFA 5D73 18B1 E7EF A862 27AC 3FB8 9E3A 27C6 B0E4 +------------------------------------------------------------------+ Michael Andres YaST Development ma@novell.com SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg) Maxfeldstrasse 5, D-90409 Nuernberg, Germany, ++49 (0)911 - 740 53-0 +------------------------------------------------------------------+ -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Tue, 27 May 2008, Michael Andres wrote:
( ResPool::byIdent is actually based on an (Id->PoolItem) index. The only index we maintain, and the implementation provided by matz@ is reasonable fast ;)
Whoops, I'm becoming old it seems :-) Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Michael Matz napsal(a):
Hi,
On Fri, 16 May 2008, Michael Matz wrote:
Nah. The problem seems to be that right now PoolQuery wants to match strings on it's own instead of letting it be done by the DataIterator, so it has to iterate over all solvables itself, and somewhere there must be a bug that leads to actually viting all solvables again leading to quadraticness. There's clearly something going very wrong in PoolQuerys use of the DataIterator, I see it in gdb.
I know coolos problem I think. He had a lock file written with the old attribute names still, like:
kind: package solvable_name: build string_type: glob
When parsing this in, "kind" and "string_type" are parsed as normal attributes to match, because they meanwhile are called "type" and "match_type". This makes it a three-attribute query (where two of them never will match anything because they are special attributes that aren't recognized anymore). And queries with more than one attribute are not matched by the dataiterator but by libzypp itself, which is _very_ inefficient (it basically retrieves _all_ attributes of all solvables and tries to match them against the query).
To show you the difference, with my locks file using the old attribute names:
% time zypper in fate real 0m42.440s user 0m41.127s sys 0m1.204s
When I edit the locks file by hand to use the meanwhile correct attribute names (but otherwise the same):
% time zypper in fate real 0m2.161s user 0m1.312s sys 0m0.252s
So, coolo: do this on your locks file: sed -e 's/^kind:/type:/' -e 's/^string_type:/match_type:/'
Josef: it would be worthwhile I think to still recognize and parse the old names (without exposing them in the API) to avoid this problem with lock files written in the intermediate form.
Ciao, Michael.
OK, good idea, only read and during rewrite new name is used. I add it. Pepa -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Michael Matz wrote:
Hi,
On Fri, 16 May 2008, Michael Matz wrote:
Nah. The problem seems to be that right now PoolQuery wants to match strings on it's own instead of letting it be done by the DataIterator, so it has to iterate over all solvables itself, and somewhere there must be a bug that leads to actually viting all solvables again leading to quadraticness. There's clearly something going very wrong in PoolQuerys use of the DataIterator, I see it in gdb.
I know coolos problem I think. He had a lock file written with the old attribute names still, like:
kind: package solvable_name: build string_type: glob
When parsing this in, "kind" and "string_type" are parsed as normal attributes to match, because they meanwhile are called "type" and "match_type". This makes it a three-attribute query (where two of them never will match anything because they are special attributes that aren't recognized anymore). And queries with more than one attribute are not matched by the dataiterator but by libzypp itself, which is _very_ inefficient (it basically retrieves _all_ attributes of all solvables and tries to match them against the query).
Another problem is when user mistype when edit lockfile. Exist any method like known solvable string??? I can remove all non-exist attributes and also log it. Pepa -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Tue, 20 May 2008, Josef Reidinger wrote:
When parsing this in, "kind" and "string_type" are parsed as normal attributes to match, because they meanwhile are called "type" and "match_type". This makes it a three-attribute query (where two of them never will match anything because they are special attributes that aren't recognized anymore). And queries with more than one attribute are not matched by the dataiterator but by libzypp itself, which is _very_ inefficient (it basically retrieves _all_ attributes of all solvables and tries to match them against the query).
Another problem is when user mistype when edit lockfile. Exist any method like known solvable string??? I can remove all non-exist attributes and also log it.
Not really, as the solvable attribute names are extensible. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Michael Matz wrote:
Josef: it would be worthwhile I think to still recognize and parse the old names (without exposing them in the API) to avoid this problem with lock files written in the intermediate form.
Why? The old names existed only during two betas or so. Why not just get rid of them and not pollute the code? Instead a mechanism to detect and log (at least) an unknown attribute (whether it is a sat attribute or PoolQuery's own) should be added. cheers, jano -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
Hi, On Tue, 20 May 2008, Jan Kupec wrote:
Josef: it would be worthwhile I think to still recognize and parse the old names (without exposing them in the API) to avoid this problem with lock files written in the intermediate form.
Why?
Why not? :-) It doesn't clutter up the sources too much and is not in a performance sensitive area, so we can do that. OTOH you are right that it existed only very shortly, but as you can see in this thread even that was enough that someone was hit by the problem.
The old names existed only during two betas or so. Why not just get rid of them and not pollute the code? Instead a mechanism to detect and log (at least) an unknown attribute (whether it is a sat attribute or PoolQuery's own) should be added.
I'm not sure this would work. The attributes are extensible, hence libzypp can't know which attributes are unknown, and which aren't. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
participants (6)
-
Jan Kupec
-
josef reidiner
-
Josef Reidinger
-
Michael Andres
-
Michael Matz
-
Stephan Kulow