Hi, On Mon, 29 Oct 2007, Duncan Mac-Vicar Prett wrote:
It's not finished yet, neither the on-disk not the in-memory layout. Not the interfaces :-) You can be sure that I'll document everything when I'm more satisfied with the whole thing.
I am really interested in the discussion about putting non-solvable data in the solver, or how to define solvable data, for example vendor can be used for solver policies (cross vendor upgrades), sizes can be use by solver for installation summary, etc. and at the same time they cost time and space.
I agree that an attribute store and the solvable data structures need to be integrated somewhat, exactly for the reason you state, sometimes a solver (or something sitting on top of it) needs to know assorted information about solvables for whatever it wants to achieve. A common theme for such information is that it's used only relatively seldomly (e.g. only when there are several alternatives, which are completely equivalent from a dependency perspective, but for which other information could provide guidance which of the alternatives would be preferred). I.e. it's used only when the depsolver itself can't determine everything on it's own. To that end such additional information usually is only accessed on demand. This property (seldomly used) is shared with purely diagnostic information (for instance to show in a GUI, like package summary or description), which also only needs to be retrieved when needed. There's another property of all this information: once determined for a repository it usually doesn't change, i.e. is read-only mostly. So, we have already one separation of all the information bits: necessary for the depsolver, hence accessed all the time, and all of it, and "rest". My intent is to implement all this "rest" information as side-band info to the solver itself, so that it doesn't take much space when it isn't actually accessed. Collectively I call this rest information "attributes". At the same time I want accesses to be cumulatively "fast enough", i.e. accessing much of that information should still be fairly fast, even though I'm willing to pay a certain setup price (for initially accessing it). Additionally for such rest information constant memory usage (what has to be kept in-core once it is accessed at all) is more important to me than speed of retrieval, as long as retrieval is "fast enough". For some of these attributes it makes sense to be able to "grep" them, i.e. search for the set of attributes which match certain criteria. Now, at this point this all looks much like I want a database :-) If it weren't for memory requirements (and perhaps ease of use) :-) So, the rough design goals I had in my brain were such (in no particular order): * needs to be possible to store all kinds of information bits: numbers, string, chunks of large data (e.g. for descriptions) * the set of information bits associated with one data-set needs to be dynamic (i.e. can be extended later) * memory usage should be low * access reasonable fast * grepable (i.e. some sort of query interface, but not too elaborate) * should be able to integrate into sat-solver proper To reach these goals I made certain design decisions: * large data blobs are stored separately and are _not_ generally held in core. Only (offset,length) pairs are stored directly. * attributes are name,type,value triples, with name being a string, and value being of several basic types (number, string, blob) * a set of attributes simply is a list of such triples * attribute names actually are uniquified strings, i.e. represented only as IDs (local to one store, other stores might use the same ID for a different name, but the name strings itself are stored and hence uniquied in the Pool). This leads to those name IDs being small numbers * maximum 1<<12 different attribute names are possible per store * maximum 1<<4 types are possible (so (name,type) fits into one short) * for space and efficiency reason we support a store local uniquified string pool. So the values can be two types of strings: - an inline string, represented as is in the list of attributes the same string in different attributes is represented multiple times - an string ID, pointing into the store-local string pool, so all equal strings store in this manner are represented only once (very good e.g. for the License: tag) this string pool is _not_ merged with the Pool stringpool, it would just cost time in loading an attribute store and wouldn't provide many advantages. * two internal representations: one read-only (the space efficient one), one where you can dynamically add attributes. * the read-only representation uses schema abbreviations, which are a list of sorted (name,type) pairs, which can be references by the individual attribute lists (which then only need to store the actual data per data-set, not all the (name,type) specifiers again and again, which tend to be equal for many data-sets. That's like in DWARF (the debug format). I only started to implement this on saturday, so I'm sure I'll revisit some of those decisions over time when I implement the other things necessary. In particular these things are still missing: * easily usable iterator over the read-only representation * grep interface here I intend to provide some functions with callbacks which then get either the matching data-sets (like grep -l), or the data-sets and all (name,value) pair which matched (like normal grep). I also intend to provide different search strategies: * only in certain attributes, or in all of them * only a substring search or a regexp or a fileglob (let's see how determined I am during my holidays :-)) * support for big blobs (that means generating them, storing them away, and loading them on demand, I have in mind either a paging scheme with compressed pages, or an uncompressed scheme) And of course making use of all of this in the several xxx2solv generators. A start can be seen in susetags2solv which (when called with '-a') emits a file "test.attr" containing license, authors, group and keywords of each solvable. Ciao, Michael. -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org