Mailinglist Archive: zypp-devel (70 mails)
| < Previous | Next > |
[zypp-devel] Re: [zypp-commit] r7678 - in /trunk/sat-solver/tools: ./ CMakeLists.txt attr_store.c attr_store.h attr_store_p.h dumpattr.c repo_susetags.c repo_susetags.h susetags2solv.c
- From: Michael Matz <matz@xxxxxxx>
- Date: Mon, 29 Oct 2007 20:27:53 +0100 (CET)
- Message-id: <Pine.LNX.4.64.0710291938560.23011@xxxxxxxxxxxxx>
Hi,
On Mon, 29 Oct 2007, Duncan Mac-Vicar Prett wrote:
I agree that an attribute store and the solvable data structures need to
be integrated somewhat, exactly for the reason you state, sometimes a
solver (or something sitting on top of it) needs to know assorted
information about solvables for whatever it wants to achieve.
A common theme for such information is that it's used only relatively
seldomly (e.g. only when there are several alternatives, which are
completely equivalent from a dependency perspective, but for which other
information could provide guidance which of the alternatives would be
preferred). I.e. it's used only when the depsolver itself can't determine
everything on it's own. To that end such additional information usually
is only accessed on demand.
This property (seldomly used) is shared with purely diagnostic information
(for instance to show in a GUI, like package summary or description),
which also only needs to be retrieved when needed.
There's another property of all this information: once determined for a
repository it usually doesn't change, i.e. is read-only mostly.
So, we have already one separation of all the information bits: necessary
for the depsolver, hence accessed all the time, and all of it, and "rest".
My intent is to implement all this "rest" information as side-band info to
the solver itself, so that it doesn't take much space when it isn't
actually accessed. Collectively I call this rest information
"attributes". At the same time I want accesses to be cumulatively "fast
enough", i.e. accessing much of that information should still be fairly
fast, even though I'm willing to pay a certain setup price (for initially
accessing it).
Additionally for such rest information constant memory usage (what has to
be kept in-core once it is accessed at all) is more important to me than
speed of retrieval, as long as retrieval is "fast enough".
For some of these attributes it makes sense to be able to "grep" them,
i.e. search for the set of attributes which match certain criteria.
Now, at this point this all looks much like I want a database :-) If it
weren't for memory requirements (and perhaps ease of use) :-)
So, the rough design goals I had in my brain were such (in no particular
order):
* needs to be possible to store all kinds of information bits: numbers,
string, chunks of large data (e.g. for descriptions)
* the set of information bits associated with one data-set needs to be
dynamic (i.e. can be extended later)
* memory usage should be low
* access reasonable fast
* grepable (i.e. some sort of query interface, but not too elaborate)
* should be able to integrate into sat-solver proper
To reach these goals I made certain design decisions:
* large data blobs are stored separately and are _not_ generally held in
core. Only (offset,length) pairs are stored directly.
* attributes are name,type,value triples, with name being a string, and
value being of several basic types (number, string, blob)
* a set of attributes simply is a list of such triples
* attribute names actually are uniquified strings, i.e. represented only
as IDs (local to one store, other stores might use the same ID for a
different name, but the name strings itself are stored and hence
uniquied in the Pool). This leads to those name IDs being small numbers
* maximum 1<<12 different attribute names are possible per store
* maximum 1<<4 types are possible (so (name,type) fits into one short)
* for space and efficiency reason we support a store local uniquified
string pool. So the values can be two types of strings:
- an inline string, represented as is in the list of attributes
the same string in different attributes is represented multiple
times
- an string ID, pointing into the store-local string pool, so all
equal strings store in this manner are represented only once (very
good e.g. for the License: tag)
this string pool is _not_ merged with the Pool stringpool, it would just
cost time in loading an attribute store and wouldn't provide many
advantages.
* two internal representations: one read-only (the space efficient one),
one where you can dynamically add attributes.
* the read-only representation uses schema abbreviations, which are a list
of sorted (name,type) pairs, which can be references by the individual
attribute lists (which then only need to store the actual data per
data-set, not all the (name,type) specifiers again and again, which tend
to be equal for many data-sets. That's like in DWARF (the debug
format).
I only started to implement this on saturday, so I'm sure I'll revisit
some of those decisions over time when I implement the other things
necessary. In particular these things are still missing:
* easily usable iterator over the read-only representation
* grep interface
here I intend to provide some functions with callbacks which then get
either the matching data-sets (like grep -l), or the data-sets and all
(name,value) pair which matched (like normal grep).
I also intend to provide different search strategies:
* only in certain attributes, or in all of them
* only a substring search or a regexp or a fileglob (let's see how
determined I am during my holidays :-))
* support for big blobs (that means generating them, storing them away,
and loading them on demand, I have in mind either a paging scheme with
compressed pages, or an uncompressed scheme)
And of course making use of all of this in the several xxx2solv
generators. A start can be seen in susetags2solv which (when called with
'-a') emits a file "test.attr" containing license, authors, group and
keywords of each solvable.
Ciao,
Michael.
--
To unsubscribe, e-mail: zypp-devel+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: zypp-devel+help@xxxxxxxxxxxx
On Mon, 29 Oct 2007, Duncan Mac-Vicar Prett wrote:
It's not finished yet, neither the on-disk not the in-memory layout.
Not the interfaces :-) You can be sure that I'll document everything
when I'm more satisfied with the whole thing.
I am really interested in the discussion about putting non-solvable data
in the solver, or how to define solvable data, for example vendor can be
used for solver policies (cross vendor upgrades), sizes can be use by
solver for installation summary, etc. and at the same time they cost
time and space.
I agree that an attribute store and the solvable data structures need to
be integrated somewhat, exactly for the reason you state, sometimes a
solver (or something sitting on top of it) needs to know assorted
information about solvables for whatever it wants to achieve.
A common theme for such information is that it's used only relatively
seldomly (e.g. only when there are several alternatives, which are
completely equivalent from a dependency perspective, but for which other
information could provide guidance which of the alternatives would be
preferred). I.e. it's used only when the depsolver itself can't determine
everything on it's own. To that end such additional information usually
is only accessed on demand.
This property (seldomly used) is shared with purely diagnostic information
(for instance to show in a GUI, like package summary or description),
which also only needs to be retrieved when needed.
There's another property of all this information: once determined for a
repository it usually doesn't change, i.e. is read-only mostly.
So, we have already one separation of all the information bits: necessary
for the depsolver, hence accessed all the time, and all of it, and "rest".
My intent is to implement all this "rest" information as side-band info to
the solver itself, so that it doesn't take much space when it isn't
actually accessed. Collectively I call this rest information
"attributes". At the same time I want accesses to be cumulatively "fast
enough", i.e. accessing much of that information should still be fairly
fast, even though I'm willing to pay a certain setup price (for initially
accessing it).
Additionally for such rest information constant memory usage (what has to
be kept in-core once it is accessed at all) is more important to me than
speed of retrieval, as long as retrieval is "fast enough".
For some of these attributes it makes sense to be able to "grep" them,
i.e. search for the set of attributes which match certain criteria.
Now, at this point this all looks much like I want a database :-) If it
weren't for memory requirements (and perhaps ease of use) :-)
So, the rough design goals I had in my brain were such (in no particular
order):
* needs to be possible to store all kinds of information bits: numbers,
string, chunks of large data (e.g. for descriptions)
* the set of information bits associated with one data-set needs to be
dynamic (i.e. can be extended later)
* memory usage should be low
* access reasonable fast
* grepable (i.e. some sort of query interface, but not too elaborate)
* should be able to integrate into sat-solver proper
To reach these goals I made certain design decisions:
* large data blobs are stored separately and are _not_ generally held in
core. Only (offset,length) pairs are stored directly.
* attributes are name,type,value triples, with name being a string, and
value being of several basic types (number, string, blob)
* a set of attributes simply is a list of such triples
* attribute names actually are uniquified strings, i.e. represented only
as IDs (local to one store, other stores might use the same ID for a
different name, but the name strings itself are stored and hence
uniquied in the Pool). This leads to those name IDs being small numbers
* maximum 1<<12 different attribute names are possible per store
* maximum 1<<4 types are possible (so (name,type) fits into one short)
* for space and efficiency reason we support a store local uniquified
string pool. So the values can be two types of strings:
- an inline string, represented as is in the list of attributes
the same string in different attributes is represented multiple
times
- an string ID, pointing into the store-local string pool, so all
equal strings store in this manner are represented only once (very
good e.g. for the License: tag)
this string pool is _not_ merged with the Pool stringpool, it would just
cost time in loading an attribute store and wouldn't provide many
advantages.
* two internal representations: one read-only (the space efficient one),
one where you can dynamically add attributes.
* the read-only representation uses schema abbreviations, which are a list
of sorted (name,type) pairs, which can be references by the individual
attribute lists (which then only need to store the actual data per
data-set, not all the (name,type) specifiers again and again, which tend
to be equal for many data-sets. That's like in DWARF (the debug
format).
I only started to implement this on saturday, so I'm sure I'll revisit
some of those decisions over time when I implement the other things
necessary. In particular these things are still missing:
* easily usable iterator over the read-only representation
* grep interface
here I intend to provide some functions with callbacks which then get
either the matching data-sets (like grep -l), or the data-sets and all
(name,value) pair which matched (like normal grep).
I also intend to provide different search strategies:
* only in certain attributes, or in all of them
* only a substring search or a regexp or a fileglob (let's see how
determined I am during my holidays :-))
* support for big blobs (that means generating them, storing them away,
and loading them on demand, I have in mind either a paging scheme with
compressed pages, or an uncompressed scheme)
And of course making use of all of this in the several xxx2solv
generators. A start can be seen in susetags2solv which (when called with
'-a') emits a file "test.attr" containing license, authors, group and
keywords of each solvable.
Ciao,
Michael.
--
To unsubscribe, e-mail: zypp-devel+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: zypp-devel+help@xxxxxxxxxxxx
| < Previous | Next > |