[zypp-devel] reading file lists from metadata
I noticed that now we read filenames from <file> tags of YUM metadata in a separate parser run (see YUMSourceImpl::providePackages() - YUMFileListParser, YUMPrimaryParser, ...). Is there any particular reason for this? I intend to read all the metadata in one run and insert it one chunk after another into the db (e.g. read one <package> after another and store it using CacheStore::appendResolvable() with all available metadata). jano -- To unsubscribe, e-mail: zypp-devel+unsubscribe@opensuse.org For additional commands, e-mail: zypp-devel+help@opensuse.org
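A minimal sketch of the one-package-at-a-time idea, for illustration only. It is a naive line-based scan, not an XML parser; real code would use a streaming parser such as libxml2's xmlTextReader. `forEachPackage` and `opensPackage` are hypothetical helpers, not libzypp functions, and the `store` callback is where something like CacheStore::appendResolvable() would be called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// True if `line` opens a <package> element. The character after "<package"
// is checked so that <packager> (a child element in primary.xml) and
// <packages> do not match.
bool opensPackage(const std::string& line) {
  const std::string tag = "<package";
  std::string::size_type pos = line.find(tag);
  while (pos != std::string::npos) {
    std::string::size_type next = pos + tag.size();
    if (next < line.size() && (line[next] == ' ' || line[next] == '>'))
      return true;
    pos = line.find(tag, next);
  }
  return false;
}

// Reads `in` line by line and calls `store` once per <package>...</package>
// chunk. Only the current package is buffered, so memory stays bounded by
// the largest single package rather than by the file size.
// Assumption: a closing </package> and the next opening tag never share a line.
std::size_t forEachPackage(std::istream& in,
                           const std::function<void(const std::string&)>& store) {
  std::string line, chunk;
  bool inside = false;
  std::size_t count = 0;
  while (std::getline(in, line)) {
    if (!inside && opensPackage(line)) {
      inside = true;
      chunk.clear();
    }
    if (!inside) continue;
    chunk += line;
    chunk += '\n';
    if (line.find("</package>") != std::string::npos) {
      store(chunk);   // e.g. hand the package over to the cache store
      chunk.clear();  // drop it before reading the next one
      inside = false;
      ++count;
    }
  }
  return count;
}

// Convenience overload for in-memory input.
std::size_t forEachPackage(const std::string& xml,
                           const std::function<void(const std::string&)>& store) {
  std::istringstream in(xml);
  return forEachPackage(in, store);
}
```

Because each chunk is handed over and discarded before the next one is read, peak memory does not grow with the size of primary.xml. This alone does not address combining it with the data from the other files.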
I'm not sure I understand the question properly, but is the possibility that the packages are in a different order in the different files the answer for you? Another question: How will libxml behave if you parse three files simultaneously? (primary.xml, other.xml, filelists.xml) Jiri
On Friday, 27 April 2007 13:16, Jan Kupec wrote:
I noticed that now we read filenames from <file> tags of YUM metadata in a separate parser run (see YUMSourceImpl::providePackages() - YUMFileListParser, YUMPrimaryParser, ...).
Is there any particular reason for this?
I intend to read all the metadata in one run and insert it one chunk after another into the db (e.g. read one <package> after another and store it using CacheStore::appendResolvable() with all available metadata).
jano
-- Regards, Jiri Srain YaST Team Leader --------------------------------------------------------------------- SUSE LINUX, s.r.o. e-mail: jsrain@suse.cz Lihovarska 1060/12 tel: +420 284 028 959 190 00 Praha 9 fax: +420 284 028 951 Czech Republic http://www.suse.cz
* Jiri Srain
I'm not sure I understand the question properly, but is the possibility that the packages are in a different order in the different files the answer for you?
Another question: How will libxml behave if you parse three files simultaneously? (primary.xml, other.xml, filelists.xml)
Huh? Why should one do that? And why is order important? Parse primary first, read one package, write one package. Then other. Then filelist. If there is a package in primary with no data in other or filelist, it will have incomplete attributes. This shouldn't do any harm. If there is a package in either other or filelist, but without representation in primary, log a warning and continue. Klaus
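A minimal sketch of this sequential approach, assuming entries are matched across files by the package checksum (pkgid) that the YUM files carry. `MetadataImport` and its methods are hypothetical, and an in-memory std::unordered_map stands in for the database.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record. Fields filled from other.xml / filelists.xml simply
// stay empty ("incomplete attributes") if those files have no entry.
struct PackageRecord {
  std::string name;
  std::vector<std::string> files;      // from filelists.xml
  std::vector<std::string> changelog;  // from other.xml
};

class MetadataImport {
 public:
  // Pass 1 (primary.xml): every package gets a record.
  void addPrimary(const std::string& pkgid, const std::string& name) {
    assert(!pkgid.empty());
    db_[pkgid].name = name;
  }

  // Pass 2 (other.xml): attach changelog entries to a known package.
  void addChangelog(const std::string& pkgid, const std::string& entry) {
    if (PackageRecord* rec = lookup(pkgid, "other.xml")) rec->changelog.push_back(entry);
  }

  // Pass 3 (filelists.xml): attach file names to a known package.
  void addFile(const std::string& pkgid, const std::string& path) {
    if (PackageRecord* rec = lookup(pkgid, "filelists.xml")) rec->files.push_back(path);
  }

  const PackageRecord* find(const std::string& pkgid) const {
    auto it = db_.find(pkgid);
    return it == db_.end() ? nullptr : &it->second;
  }
  std::size_t warnings() const { return warnings_; }

 private:
  // Entries without a package from primary.xml: log a warning and continue.
  PackageRecord* lookup(const std::string& pkgid, const char* source) {
    auto it = db_.find(pkgid);
    if (it == db_.end()) {
      std::cerr << "warning: " << source << " entry " << pkgid
                << " has no package in primary.xml, skipping\n";
      ++warnings_;
      return nullptr;
    }
    return &it->second;
  }

  std::unordered_map<std::string, PackageRecord> db_;
  std::size_t warnings_ = 0;
};
```

Since every entry is looked up by key rather than paired by position, the order of packages within each file does not matter here.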
On Friday, 27 April 2007 13:30, Klaus Kaempf wrote:
* Jiri Srain [Apr 27. 2007 13:26]:
I'm not sure I understand the question properly, but is the possibility that the packages are in a different order in the different files the answer for you?
Another question: How will libxml behave if you parse three files simultaneously? (primary.xml, other.xml, filelists.xml)
Huh? Why should one do that?
See below.
And why is order important?
Parse primary first, read one package, write one package. Then other.
Find each package in the database, modify it. I thought that was what Jano wanted to avoid. Jiri
Then filelist.
If there is a package in primary with no data in other or filelist, it will have incomplete attributes. This shouldn't do any harm.
If there is a package in either other or filelist, but without representation in primary, log a warning and continue.
Klaus
On Friday 27 April 2007 13:41:46 Jiri Srain wrote:
Find each package in the database, modify it. I thought that was what Jano wanted to avoid.
When you insert a package, you get the id. You can get a map with the ids. Those optimizations could also be introduced in the cache store. Duncan
Duncan Mac-Vicar Prett wrote:
On Friday 27 April 2007 13:41:46 Jiri Srain wrote:
Find each package in the database, modify it. I thought that was what Jano wanted to avoid.
I meant to avoid the memory consumption that would otherwise be necessary: filelists.xml.gz for Factory is currently 13 MB. If it is done package by package (read into memory, insert into the db, free), memory consumption is practically zero...
When you insert a package, you get the id. You can get a map with the ids.
... if done this way, the memory overhead would be only this map with the ids. Still much better than before.
Those optimizations could also be introduced in the cache store.
I would go for this if it means a good memory vs. speed trade-off. jano
* Jan Kupec
When you insert a package, you get the id. You can get a map with the ids.
... if done this way, the memory overhead would be only this map with the ids. Still much better than before.
Those optimizations could also be introduced in the cache store.
Yes, that's the right approach. So it can be implemented as (costly) selects first and optimized by (in-memory) maps later. Klaus
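The two-stage plan could be sketched as one lookup interface with two interchangeable implementations. `PackageIdLookup`, `SelectLookup`, `CachedLookup`, and `insertPackage` are hypothetical names, not libzypp API, and the linear scan in `SelectLookup` merely stands in for a SELECT against the cache database.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using PackageId = long long;
using Table = std::vector<std::pair<std::string, PackageId>>;  // (pkgid, id) rows

// Resolves the pkgid of an other.xml/filelists.xml entry to the id the
// package got when it was inserted from primary.xml.
class PackageIdLookup {
 public:
  virtual ~PackageIdLookup() = default;
  virtual void inserted(const std::string& pkgid, PackageId id) = 0;
  virtual std::optional<PackageId> find(const std::string& pkgid) const = 0;
};

// First version: keeps nothing in memory and asks the database each time.
// The linear scan over `table_` stands in for
//   SELECT id FROM packages WHERE pkgid = ?
class SelectLookup : public PackageIdLookup {
 public:
  explicit SelectLookup(const Table& table) : table_(table) {}
  void inserted(const std::string&, PackageId) override {}  // nothing to remember
  std::optional<PackageId> find(const std::string& pkgid) const override {
    for (const auto& row : table_)
      if (row.first == pkgid) return row.second;
    return std::nullopt;
  }

 private:
  const Table& table_;
};

// Later optimization: remember each id as it is handed out on insert,
// trading one map entry per package for skipping the query.
class CachedLookup : public PackageIdLookup {
 public:
  void inserted(const std::string& pkgid, PackageId id) override { ids_[pkgid] = id; }
  std::optional<PackageId> find(const std::string& pkgid) const override {
    auto it = ids_.find(pkgid);
    if (it == ids_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<std::string, PackageId> ids_;
};

// Stand-in for the insert in the cache store: the new id is returned
// (like an autoincrement row id) and passed to whichever lookup is in use.
PackageId insertPackage(Table& table, PackageIdLookup& lookup, const std::string& pkgid) {
  assert(!pkgid.empty());
  PackageId id = static_cast<PackageId>(table.size()) + 1;
  table.emplace_back(pkgid, id);
  lookup.inserted(pkgid, id);
  return id;
}
```

Callers depend only on PackageIdLookup, so switching from the select-based version to the map-based one would not touch the parsing code.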
* Jiri Srain
And why is order important?
Parse primary first, read one package, write one package. Then other.
Find each package in the database, modify it. I thought that was what Jano wanted to avoid.
SQLite is reasonably fast and does quite good caching. It might be helpful to have different tables for (the attributes contained in) primary, other, and filelists. Then you don't have to modify a table entry; you just query the 'master' table for the package id and do 'stream' writing to the other tables. With an index, this should be cheap. Klaus
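One way to picture that layout, as a hypothetical sketch assuming packages are keyed by pkgid: the SQL in the comments shows the kind of schema meant, and plain C++ containers stand in for the SQLite tables and the index. The other.xml and filelists.xml passes only look up an id and append rows; they never update the master row.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Master table, written once per package from primary.xml:
//   CREATE TABLE packages (id INTEGER PRIMARY KEY, pkgid TEXT, name TEXT);
//   CREATE UNIQUE INDEX packages_pkgid ON packages (pkgid);
struct PackageRow { long long id; std::string pkgid; std::string name; };

// Detail tables, only ever appended to:
//   CREATE TABLE files (package_id INTEGER, path TEXT);
//   CREATE TABLE changelogs (package_id INTEGER, entry TEXT);
struct FileRow { long long packageId; std::string path; };
struct ChangelogRow { long long packageId; std::string entry; };

struct CacheDb {
  std::vector<PackageRow> packages;
  std::unordered_map<std::string, long long> pkgidIndex;  // plays the index
  std::vector<FileRow> files;
  std::vector<ChangelogRow> changelogs;

  // primary.xml pass: one insert into the master table.
  long long addPackage(const std::string& pkgid, const std::string& name) {
    assert(pkgidIndex.count(pkgid) == 0);  // the unique index forbids duplicates
    long long id = static_cast<long long>(packages.size()) + 1;
    packages.push_back({id, pkgid, name});
    pkgidIndex.emplace(pkgid, id);
    return id;
  }

  // filelists.xml pass: indexed lookup of the id, then an append.
  // Returns false for an unknown pkgid so the caller can log a warning.
  bool addFile(const std::string& pkgid, const std::string& path) {
    auto it = pkgidIndex.find(pkgid);
    if (it == pkgidIndex.end()) return false;
    files.push_back({it->second, path});
    return true;
  }

  // other.xml pass: same pattern; the master row is never modified.
  bool addChangelog(const std::string& pkgid, const std::string& entry) {
    auto it = pkgidIndex.find(pkgid);
    if (it == pkgidIndex.end()) return false;
    changelogs.push_back({it->second, entry});
    return true;
  }
};
```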
On Fri, Apr 27, Jan Kupec wrote:
I noticed that now we read filenames from <file> tags of YUM metadata in a separate parser run (see YUMSourceImpl::providePackages() - YUMFileListParser, YUMPrimaryParser, ...).
Is there any particular reason for this?
I intend to read all the metadata in one run and insert it one chunk after another into the db (e.g. read one <package> after another and store it using CacheStore::appendResolvable() with all available metadata).
As the data for one package are spread across multiple files, you would have to be sure that all files contain the packages in the same order. IMO you should not assume that. -- cu, Michael Andres +------------------------------------------------------------------+ Key fingerprint = 2DFA 5D73 18B1 E7EF A862 27AC 3FB8 9E3A 27C6 B0E4 +------------------------------------------------------------------+ Michael Andres YaST Development ma@novell.com SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg) Maxfeldstrasse 5, D-90409 Nuernberg, Germany, ++49 (0)911 - 740 53-0 +------------------------------------------------------------------+
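To make the ordering concern concrete: a single-pass reader would have to pair the n-th package of primary.xml with the n-th package of filelists.xml and other.xml. The hypothetical check below pairs two pkgid sequences by position and reports the first position where they disagree; any such position means a one-pass merge would attach data to the wrong package. Having the pkgid sequences available as vectors is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns the first index at which the two files list different packages
// (or where one file runs out), or std::nullopt if they are in lockstep.
std::optional<std::size_t> firstOrderMismatch(const std::vector<std::string>& primaryIds,
                                              const std::vector<std::string>& otherIds) {
  std::size_t n = primaryIds.size() < otherIds.size() ? primaryIds.size() : otherIds.size();
  for (std::size_t i = 0; i < n; ++i)
    if (primaryIds[i] != otherIds[i]) return i;
  if (primaryIds.size() != otherIds.size()) return n;
  return std::nullopt;
}
```

A reader that relies on positional pairing would need such a check anyway, which is why matching entries by key is the safer default.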
Participants (5):
- Duncan Mac-Vicar Prett
- Jan Kupec
- Jiri Srain
- Klaus Kaempf
- Michael Andres