How to remove duplicates (all but installed ver) from /var/cache/zypp/packages ?
All,
We have touched on this before, years ago, in the 10/11.x days, but the opensuse package name format and variations have changed a bit.
What I want to do is
find /var/cache/zypp/packages -type f -name "*rpm" | sort --version-sort
all the rpm files in /var/cache/zypp/packages and remove all duplicates, leaving the currently installed rpm version in /var/cache, using awk to split path/rpmname and then outputting the record while last == current to pipe to xargs rm -f
(which seems to work fine, and I'm not worrying about rpm names containing the '\n' character, which would necessitate creating and processing records with a nul-character at the end, e.g. -print0 and xargs -0).
How to best do that? So long as find ... | sort --version-sort will give the rpms in version sort order, piping to a short awk script and then to xargs will do. I did this on a few zypp/packages dirs by hand, but now I want to automate processing the entire cache for those repos that have --keep-packages set.
But before I spend an hour or so testing and tweaking, is there a standard way to discard all but installed from /var/cache -- that would keep me from reinventing the wheel?
(I have additional repos like devel-gcc, etc., so package names can have +gitxxx as a version number suffix -- if that makes a difference)
-- David C. Rankin, J.D.,P.E.
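A minimal sketch of the sort-then-awk idea described above, assuming GNU sort (for version sort) and file names that follow the default name-version-release.arch.rpm convention. The function name list_older_duplicates is hypothetical. It sorts on the parsed package name first and the version second, so files from different repo directories still end up adjacent, and it prints every file except the highest-sorting one per package and architecture. Note that "highest version in the cache" is only an approximation of "currently installed".

```shell
#!/bin/sh
# Hypothetical helper: read cached rpm paths on stdin and print every path
# except the highest version of each name.arch pair.
# Assumes file names of the form name-version-release.arch.rpm and GNU sort.
list_older_duplicates() {
    tab=$(printf '\t')
    awk -F/ -v OFS='\t' '
        {
            base = $NF                          # file name without directory
            if (base !~ /\.rpm$/) next          # ignore anything that is not an rpm
            sub(/\.rpm$/, "", base)
            n = split(base, part, "-")
            if (n < 3) next                     # not name-version-release.arch
            arch = part[n]
            sub(/^.*\./, "", arch)              # text after the last dot
            name = part[1]
            for (i = 2; i <= n - 2; i++) name = name "-" part[i]
            # key, version-release.arch, original path
            print name "." arch, part[n-1] "-" part[n], $0
        }' |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2V |
    awk -F'\t' '
        $1 == last_key { print last_path }      # previous record is an older duplicate
        { last_key = $1; last_path = $3 }'
}
```

A dry run would be `find /var/cache/zypp/packages -type f -name "*rpm" | list_older_duplicates`, reviewing the list before piping it to `xargs rm -f`.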
On Fri, 10 Mar 2023 23:21:44 -0600, "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote:
All,
We have touched on this before, years ago, 10/11.x days, but the opensuse package name format and variations have changed a bit.
What I want to do is
find /var/cache/zypp/packages -type f -name "*rpm" | sort --version-sort
all the rpm files in /var/cache/zypp/packages and remove all duplicates leaving the currently installed rpm version in /var/cache using awk to split path/rpmname and then outputting the record while last == current to pipe to xargs rm -f
(which seems to work fine and I'm not worrying about rpm names containing the '\n' character which would necessitate creating and processing records with a nul-character at the end, e.g. -print0 and xargs -0)
How to best do that? So long as find ... | sort --version-sort will give the rpms in version sort order, piping to a short awk script and then to xargs will do. I did this on a few zypp/packages dirs by hand, but now I want to automate processing the entire cache for those repos that have --keep-packages set.
But before I spend an hour or so testing and tweaking, is there a standard way to discard all but installed from /var/cache -- that would keep me from reinventing the wheel?
I don't know an off-the-shelf method. Wouldn't it be possible to generate a list of all installed packages, and their rpm filenames? Then if you searched for all rpm files, your job would be to for each file, check the list and don't delete it if it is listed. -- Robert Webb
On Sat, 11 Mar 2023 09:37:58 +0300, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On 11.03.2023 08:50, Robert Webb wrote:
Wouldn't it be possible to generate a list of all installed packages, and their rpm filenames?
No. Package does not care how its container was named before/during installation. You have to query each file to see what is inside.
Alright, but then how is a name of an rpm file generated in the first place, to be able to install it? An rpm filename cannot be the starting point (information wise) for the process of package installation. The filename must be derivable from other information. -- Robert Webb
On 11.03.2023 12:05, Robert Webb wrote:
On Sat, 11 Mar 2023 09:37:58 +0300, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On 11.03.2023 08:50, Robert Webb wrote:
Wouldn't it be possible to generate a list of all installed packages, and their rpm filenames?
No. Package does not care how its container was named before/during installation. You have to query each file to see what is inside.
Alright, but then how is a name of an rpm file generated in the first place, to be able to install it?
Name of RPM file is just a convention, nothing more. If you want to know how it is generated, you need to read sources of a program used to create it. But name of file has no impact on ability to install package in this file.
An rpm filename cannot be the starting point (information wise) for the process of package installation.
Again - filename is completely irrelevant and includes package NVR just for your (general user) convenience. Zypper works with repository index which associates filenames with actual package (name, version, release, architecture) tuple. Repository index is generated by scanning files.
The filename must be derivable from other information.
By default filename is derived from package name, version, release and rpmbuild creates these names. I am pretty sure it is possible to change in RPM settings ... yep
bor@bor-Latitude-E5450:~$ rpm --eval %{_rpmfilename}
%{ARCH}/%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}.rpm
bor@bor-Latitude-E5450:~$
But nothing would prevent renaming files in repository to their hash as example (before generating index). Zypper would continue to work as before.
If you really want to search for cached RPM files which are currently installed, you need to query each file for the package inside and look whether this package is installed. Anything else is going to break sooner or later.
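The per-file query suggested above could look roughly like the following sketch. It uses `rpm -qp --queryformat` to read the package identity from the file itself and `rpm -q` to ask the database whether that exact package is installed, so the file name plays no role. The function names and the RPM variable (which only exists so the rpm command can be swapped out) are hypothetical. Files that cannot be read are never listed, which errs on the side of not deleting.

```shell
#!/bin/sh
# RPM is a hypothetical override hook; by default the real rpm is used.
: "${RPM:=rpm}"

# Returns 0 if the package inside file $1 is installed, 1 if it is not,
# and 2 if the file could not be queried at all.
cached_rpm_is_installed() {
    nvra=$("$RPM" -qp --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        "$1" 2>/dev/null </dev/null) || return 2
    [ -n "$nvra" ] || return 2
    if "$RPM" -q "$nvra" >/dev/null 2>&1 </dev/null; then
        return 0
    fi
    return 1
}

# Reads cached rpm paths on stdin and prints those whose package is
# definitely not installed (status 1 only; unreadable files are skipped).
list_uninstalled_cached_rpms() {
    while IFS= read -r file; do
        status=0
        cached_rpm_is_installed "$file" || status=$?
        if [ "$status" -eq 1 ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

This runs rpm twice per cached file, so it is the slowest option in the thread, but it is the only one that does not depend on naming conventions.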
On Sat, 11 Mar 2023 13:31:38 +0300, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On 11.03.2023 12:05, Robert Webb wrote:
On Sat, 11 Mar 2023 09:37:58 +0300, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
On 11.03.2023 08:50, Robert Webb wrote:
Wouldn't it be possible to generate a list of all installed packages, and their rpm filenames?
No. Package does not care how its container was named before/during installation. You have to query each file to see what is inside.
Alright, but then how is a name of an rpm file generated in the first place, to be able to install it?
Name of RPM file is just a convention, nothing more. If you want to know how it is generated, you need to read sources of a program used to create it. But name of file has no impact on ability to install package in this file.
An rpm filename cannot be the starting point (information wise) for the process of package installation.
Again - filename is completely irrelevant and includes package NVR just for your (general user) convenience. Zypper works with repository index which associates filenames with actual package (name, version, release, architecture) tuple. Repository index is generated by scanning files.
By scanning the contents of rpm files, right? OK, I was missing that concept. So the source for the repo index is the rpm file, and by extension, its filename. I was thinking of the process starting from the user requesting a certain package name from zypper, possibly with a version and architecture.
The filename must be derivable from other information.
By default filename is derived from package name, version, release and rpmbuild creates these names. I am pretty sure it is possible to change in RPM settings ... yep
bor@bor-Latitude-E5450:~$ rpm --eval %{_rpmfilename}
%{ARCH}/%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}.rpm
bor@bor-Latitude-E5450:~$
But nothing would prevent renaming files in repository to their hash as example (before generating index). Zypper would continue to work as before.
If you really want to search for cached RPM files which are currently installed, you need to query each file for the package inside and look whether this package is installed. Anything else is going to break sooner or later.
And it can break because the repo index can change to associate a different rpm filename with a certain package, version, etc, or that entry can even disappear. OK, thanks for the great info. Very enlightening. -- Robert Webb
On 2023-03-11 11:31, Andrei Borzenkov wrote: ...
If you really want to search for cached RPM files which are currently installed, you need to query each file for the package inside and look whether this package is installed. Anything else is going to break sooner or later.
And one use of the cache is to export it via NFS and use it on other machines. A package might not be used on this machine but used on another.
I would perhaps go by dates: for each rpm, check if older/newer versions exist, find the newest, then delete those older than whatever interval.
But I don't use TW, so this is not a problem I have. On Leap I keep all.
-- Cheers / Saludos, Carlos E. R. (from 15.4 x86_64 at Telcontar)
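One way to sketch the date-based idea is as a filter applied to a list of deletion candidates (for example, the older versions of each package), passing through only files whose modification time is more than a given number of days in the past. The function name only_older_than_days is hypothetical, and the sketch assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical filter: read candidate paths on stdin and print only the
# regular files last modified more than $1 days ago.
only_older_than_days() {
    days=$1
    while IFS= read -r file; do
        # -maxdepth 0 restricts find to the named file itself
        find "$file" -maxdepth 0 -type f -mtime "+$days"
    done
}
```

With an interval of 30, a superseded rpm stays in the cache for about a month, which leaves time for other machines sharing the cache to pick it up.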
On 3/10/23 23:50, Robert Webb wrote:
I don't know an off-the-shelf method. Wouldn't it be possible to generate a list of all installed packages, and their rpm filenames? Then if you searched for all rpm files, your job would be to for each file, check the list and don't delete it if it is listed.
It is possible, but that greatly increases the processing needed. Not all files are cached. Main and update repos are not cached. So doing, e.g., rpm -qa to generate the list would generate thousands of packages not cached and require a two-file pass with awk: one to process the output of rpm -qa and build an array of installed versions, and then another to process the files found in /var/cache.
It's easy enough to parse the version number from the rpm -qa records returned (essentially the rpm name without the .rpm -- but I haven't tested to see if that holds in all cases).
As long as the --version-sort provides a sorted list, it's an easy single pass to parse the package name and drop all but the last.
Essentially I want either rpm or zypper to behave like pacman -Sc (--sync --clean). (That would also be a very very nice addition for Tumbleweed ...)
-- David C. Rankin, J.D.,P.E.
From: "David C. Rankin" <drankinatty@suddenlinkmail.com>
Date: Fri, 10 Mar 2023 23:21:44 -0600
All,
We have touched on this before, years ago, 10/11.x days, but the opensuse package name format and variations have changed a bit.
What I want to do is
find /var/cache/zypp/packages -type f -name "*rpm" | sort --version-sort
all the rpm files in /var/cache/zypp/packages and remove all duplicates leaving the currently installed rpm version in /var/cache . . .
But before I spend an hour or so testing and tweaking, is there a standard way to discard all but installed from /var/cache -- that would keep me from reinventing the wheel?
(I have additional repos like devel-gcc, etc.. so package names can have +gitxxx as a version number suffix -- if that makes a difference)
Probably not -- as long as "rpm -qa" reports the version part properly as part of the package name.
-- David C. Rankin, J.D.,P.E.
Here's a Perl solution that doesn't require comparing versions except for exact matches; if the RPM is installed, the file is filtered out of the file list.
# find /var/cache/zypp/packages -type f | find-old-rpms.pl | xargs rm -f
But it seems like it should be the same problem since RPM Day 1, since the format used for file names and "rpm -q" hasn't really changed (except to add the architecture), has it?
-- Bob Rogers http://www.rgrjr.com/
------------------------------------------------------------------------
#!/usr/bin/perl
#
# Given a list of RPM file names on stdin (as from ls or find), eliminate the
# ones that correspond to installed versions.
#
# [created.  -- rgr, 10-Mar-23.]
#

use strict;
use warnings;

# Map "pkg-version-release.arch" to the cached file's full path.
my %file_from_package;
while (<>) {
    chomp;
    # Get the "pkg.arch" in ".../pkg.arch.rpm".
    my ($package) = m@/([^/]+)[.]d?rpm$@
        or next;
    $file_from_package{$package} = $_;
}

# Drop every entry that "rpm -qa" reports as installed.
open(my $in, 'rpm -qa |')
    or die "$0: Can't open pipe from rpm: $!";
while (<$in>) {
    chomp;
    delete($file_from_package{$_});
}

# Whatever is left is not installed; print the file paths.
for my $package (sort(keys(%file_from_package))) {
    print $file_from_package{$package}, "\n";
}
On 3/11/23 00:23, Bob Rogers wrote:
# find /var/cache/zypp/packages -type f | find-old-rpms.pl | xargs rm -f
But it seems like it should be the same problem since RPM Day 1, since the format used for file names and "rpm -q" hasn't really changed (except to add the architecture), has it?
The only real issue is the --version-sort, which is what is needed. Any numeric or natural sort based on LOCALE suffers from the difference in version digits problem:
name-102.4[+gitxxx]-lp154.relmaj.relmin.x86_64.rpm
and
name-96.4[+gitxxx]-lp154.relmaj.relmin.x86_64.rpm
At least for an overwhelming majority, opensuse is consistent with the names, and sort with --version-sort has been 100% accurate so far. Worst case you could output additional (date) info from find with -printf that would decorate the rpm name with a date-time that could also be easily sorted, though the --version-sort is preferable.
awk or perl is fine, but the preference for awk just comes from the blistering processing speed. Thousands or tens of thousands of rpm files handled in a fraction of a second. (Though in reality it really doesn't matter if it takes minutes -- as long as it can be 100% reliable on leaving the currently installed rpm.)
I guess we could build the list of files to pass to xargs for deletion, but before deletion compare with the list of installed rpms to verify the deletion list doesn't contain the installed version of the rpm (though that would require a sizeable array in awk to do the lookup for ( file in arr ) -- but that would still be orders of magnitude faster than looping grep).
Or, you could just write out the rpm name from awk in the same format as returned by rpm -qa and then do a single grep -qf to test.
(this was part of why I was really hoping zypper had gotten smarter ...)
-- David C. Rankin, J.D.,P.E.
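The final safety check described above (one grep against the installed list before anything is deleted) might be sketched as follows. The function name deletion_list_is_safe and both file arguments are hypothetical. It assumes the installed list holds one name-version-release.arch per line, as `rpm -qa` prints by default on current rpm versions, and that cached files use the matching default naming.

```shell
#!/bin/sh
# Hypothetical check: $1 is a file of rpm paths slated for deletion,
# $2 is a file holding the output of "rpm -qa".
# Returns 0 if no candidate matches an installed package, 1 if at least
# one does, and 2 if the installed list is empty (so nothing can be verified).
deletion_list_is_safe() {
    candidates=$1
    installed=$2
    [ -s "$installed" ] || return 2
    # Strip directory and .rpm suffix, then look for exact whole-line,
    # fixed-string matches among the installed package names.
    if sed -e 's|.*/||' -e 's|\.rpm$||' "$candidates" |
        grep -Fxq -f "$installed"; then
        return 1
    fi
    return 0
}
```

The -F and -x options matter here: without them, the dots and plus signs in version strings would be treated as regex metacharacters and partial names could match.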
From: "David C. Rankin" <drankinatty@suddenlinkmail.com> Date: Sat, 11 Mar 2023 03:23:44 -0600 On 3/11/23 00:23, Bob Rogers wrote:
# find /var/cache/zypp/packages -type f | find-old-rpms.pl | xargs rm -f
But it seems like it should be the same problem since RPM Day 1, since the format used for file names and "rpm -q" hasn't really changed (except to add the architecture), has it?
The only real issue is the --version-sort which is what is needed . . .
There is no need to sort; all you need is an exact match against the installed version, since that's all you care about. Which is what the Perl script does by using a hash table, so it should be O(N) instead of O(N log(N)).
awk or perl is fine, but the preference for awk just comes from the blistering processing speed. Thousands or tens of thousands of rpm files handled in a fraction of a second. (though in reality it really doesn't matter if it takes minutes -- as long as it can be 100% reliable on leaving the currently installed rpm)
From a single timing of find-old-rpms.pl (on an admittedly small cache, since I don't keep updates), almost the entire 3sec processing time is due to "rpm -qa". And (theoretically, at least) hashing ought to scale better than sorting.
I guess we could build the list of files to pass to xargs for deletion, but before deletion compare with the list of installed rpms to verify the deletion list doesn't contain the installed version of the rpm (though that would require a sizeable array in awk to do the lookup for ( file in arr ) -- but that would still be orders of magnitude faster than looping grep.
And memory is cheap.
Or, you could just write out the rpm name from awk in the same format as returned by rpm -qa and then do a single grep -qf to test.
But a hash test is faster. Doesn't awk have hashes? If so, you could rewrite the perl into awk, and skip the grep.
(this was part of why I was really hoping zypper had gotten smarter ...)
-- David C. Rankin, J.D.,P.E.
;-}
-- Bob
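awk's associative arrays are hash tables, so the suggested rewrite is short. The sketch below is not a translation of the Perl script line for line: it loads the installed list into an array first and then streams the cached paths past it, and it only handles .rpm (not .drpm) files. The function name list_not_installed is hypothetical, and it assumes the installed list is the default `rpm -qa` output, one name-version-release.arch per line.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a file holding the output of "rpm -qa";
# cached rpm paths arrive on stdin. Prints the paths whose
# name-version-release.arch is not in the installed list.
list_not_installed() {
    installed_list=$1
    awk '
        NR == FNR { installed[$0] = 1; next }   # first file: installed packages
        {
            base = $0
            sub(/^.*\//, "", base)              # strip the directory
            # sub() returns 0 for non-rpm files, which are then skipped
            if (sub(/\.rpm$/, "", base) && !(base in installed)) print
        }' "$installed_list" -
}
```

Typical use would be `rpm -qa > installed.txt` followed by `find /var/cache/zypp/packages -type f | list_not_installed installed.txt`. If the installed list is empty, the NR == FNR test stays true for stdin and nothing is printed, so a failed `rpm -qa` cannot cause every cached file to be listed for deletion.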
On 11.03.2023 at 06:21, David C. Rankin wrote:
All,
We have touched on this before, years ago, 10/11.x days, but the opensuse package name format and variations have changed a bit.
What I want to do is
find /var/cache/zypp/packages -type f -name "*rpm" | sort --version-sort
all the rpm files in /var/cache/zypp/packages and remove all duplicates leaving the currently installed rpm version in /var/cache using awk to split path/rpmname and then outputting the record while last == current to pipe to xargs rm -f
[snip] On a related note: I checked Zypper sources and there is only one way of cleaning cache: deleting everything. There is an issue pending about implementing something smarter: https://github.com/openSUSE/zypper/issues/229
On Saturday 11 March 2023, David C. Rankin wrote:
All,
We have touched on this before, years ago, 10/11.x days, but the opensuse package name format and variations have changed a bit.
What I want to do is
find /var/cache/zypp/packages -type f -name "*rpm" | sort --version-sort
all the rpm files in /var/cache/zypp/packages and remove all duplicates leaving the currently installed rpm version in /var/cache using awk to split path/rpmname and then outputting the record while last == current to pipe to xargs rm -f
(which seems to work fine and I'm not worrying about rpm names containing the '\n' character which would necessitate creating and processing records with a nul-character at the end, e.g. -print0 and xargs -0)
How to best do that? So long as find ... | sort --version-sort will give the rpms in version sort order, piping to a short awk script and then to xargs will do. I did this on a few zypp/packages dirs by hand, but now I want to automate processing the entire cache for those repos that have --keep-packages set.
But before I spend an hour or so testing and tweaking, is there a standard way to discard all but installed from /var/cache -- that would keep me from reinventing the wheel?
(I have additional repos like devel-gcc, etc.. so package names can have +gitxxx as a version number suffix -- if that makes a difference)
Would this help - I wrote a bash gist to "Purge off any /var/cache/zypp rpms that are not installed in this system" https://gist.github.com/digitaltrails/cfd6617be9f0e5cfbdaaf951d4526aef Michael
On 3/12/23 04:16, Michael Hamilton wrote:
Would this help - I wrote a bash gist to "Purge off any /var/cache/zypp rpms that are not installed in this system"
https://gist.github.com/digitaltrails/cfd6617be9f0e5cfbdaaf951d4526aef
Michael
Yep, that's basically the 2nd option discussed in the thread: using `rpm -qa` output as a lookup, and outputting any cached version not in the rpm -qa output for piping to xargs. I like the rpm_index array -- that would also let you use a "cached in rpm_index" test to output all not in rpm_index -- which may tweak performance a bit. I'll grab it and finish going through it. Thanks!
-- David C. Rankin, J.D.,P.E.
participants (7)
- Adam Mizerski
- Andrei Borzenkov
- Bob Rogers
- Carlos E. R.
- David C. Rankin
- Michael Hamilton
- Robert Webb