md5sum of a package

Hi list,

I am wondering, how is the md5sum of a package[1] calculated? I have tried the md5sum of all the files[2], but that did not appear to match (also, the question is in which order would the files be concatenated).

Thanks in advance,

Dan

Footnotes:
[1] The srcmd5 attribute of the directory element that is displayed via the GET /source/${project}/${package} route
[2] cat *|md5sum

--
Dan Čermák <dcermak@suse.com>
Software Engineer Development tools
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nuremberg
Germany
(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer

On Jan 25 2021, Dan Čermák wrote:
> I am wondering, how is the md5sum of a package[1] calculated? I have tried the md5sum of all the files[2], but that did not appear to match

For non-linked packages:

    md5sum * | md5sum

I.e., the MD5 sum of the md5sum output.

For linked packages, it is quite a bit more complicated.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Hi Andreas,

Andreas Schwab <schwab@linux-m68k.org> writes:
> On Jan 25 2021, Dan Čermák wrote:
>> I am wondering, how is the md5sum of a package[1] calculated? I have tried the md5sum of all the files[2], but that did not appear to match
> For non-linked packages: md5sum * | md5sum I.e., the MD5 sum of the md5sum output.

Is that done exactly via this shell command in bash?

> For linked packages, it is quite a bit more complicated.

Could you elaborate on what is done differently here? I would have expected to run the above command over the expanded sources.

Thanks,

Dan

On Monday, 25 January 2021, 16:21:43 CET Dan Čermák wrote:
> Could you elaborate what is done differently here? I would have expected to run above command over the expanded sources.

You would get the verifymd5 sum then, but not the xsrcmd5, which also includes the link information of the _link files.

So it depends on which md5sum you actually want.

--
Adrian Schroeter <adrian@suse.de>
Build Infrastructure Project Manager
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
(HRB 247165, AG München), Managing Director: Felix Imendörffer

Adrian Schröter <adrian@suse.de> writes:
> you would get the verifymd5 sum then. But not the xsrcmd5, which includes also the link information of the _link files.

Uh, what's verifymd5? My google-skills have failed me here.

> So it depends which md5sum you actually want.

I want to calculate some unique hash of a (potentially modified and checked-out) package. I could use some custom scheme, but I thought it would be more productive to use the same method that OBS itself employs. Using the package's md5sum appeared to be the simplest choice to me.

Cheers,

Dan

On Mon, Jan 25, 2021 at 10:24:48PM +0100, Dan Čermák wrote:
> Uh, what's verifymd5? My google-skills have failed me here.

That's the md5sum over the expanded files. It's called "verifymd5" because it is what the obs build workers use to verify that the sources they downloaded match the build job.

For packages without an expanded _link, the srcmd5 is equal to the verifymd5. For expanded links, the srcmd5 is different, see my other mail.

Cheers,
Michael.

--
Michael Schroeder
SUSE Software Solutions Germany GmbH
mls@suse.de
GF: Felix Imendoerffer HRB 36809, AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}

On 2021-01-25 22:24:48 +0100, Dan Čermák wrote:
> Uh, what's verifymd5? My google-skills have failed me here.

See above, the md5 sum of the md5sum outputs (where the filenames are sorted lexicographically). That is, in bash

    md5sum - <<EOF
    $(md5sum file1)
    ...
    $(md5sum fileN)
    EOF

where file1 < ... < fileN (wrt. the lexicographical order).

For instance, if you have a package (or more precisely a fileset of a package) that consists of two files

    md5sum                            filename
    401b30e3b8b5d629635a5c613cdb7919  x
    009520053b00386d1173f3988c55d192  y

its verifymd5 is cd5c0c5e01f0843238b2dab8205cb27b.

Note that the verifymd5 always belongs to a "fileset" of a package. For instance, the unexpanded fileset and the expanded fileset of a linked/branched package usually have different verifymd5s (unless there is an md5 collision...).

(I tried to come up with a concrete example that illustrates the different verifymd5s, via the "GET /source/<prj>/<a branched pkg>?view=info" vs. the "GET /source/<prj>/<a branched pkg>?view=info&noexpand=1" routes, but the API ignores the "noexpand" parameter :/ )
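The verifymd5 computation described above can also be sketched in Python, for tooling that would rather not shell out to md5sum. This is only an illustration under stated assumptions: the hashed listing is assumed to have exactly the md5sum output layout (hash, two spaces, file name, newline), names are sorted in plain code-point order, and hidden files are skipped to mirror the shell's `*`. The function names `file_md5`, `verifymd5` and `verifymd5_of_dir` are made up for this example and are not part of osc or OBS.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verifymd5(entries):
    """Hash an md5sum-style listing built from a {filename: md5} mapping.

    Assumed line layout: md5, two spaces, name, newline; names sorted.
    """
    listing = "".join(
        f"{md5}  {name}\n" for name, md5 in sorted(entries.items())
    )
    return hashlib.md5(listing.encode("utf-8")).hexdigest()


def verifymd5_of_dir(directory):
    """Apply verifymd5 to the regular, non-hidden files of a directory."""
    entries = {
        path.name: file_md5(path)
        for path in Path(directory).iterdir()
        if path.is_file() and not path.name.startswith(".")
    }
    return verifymd5(entries)
```

Calling `verifymd5({"x": "401b30e3b8b5d629635a5c613cdb7919", "y": "009520053b00386d1173f3988c55d192"})` hashes the same two-line listing as the x/y example; whether it reproduces the value quoted there depends on the layout assumption and has not been re-checked here. When using the shell variant instead, note that the order produced by `*` follows the locale's collation, so running it under LC_ALL=C is the safer way to get a plain bytewise order.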
>> So it depends which md5sum you actually want.

> I want to calculate some unique hash of a (potentially modified and checked-out) package. I could use some custom scheme, but I thought it would be more productive to use the same method that OBS itself employs. Using the package's md5sum appeared to be the simplest choice to me.

In order to detect "local changes" it is probably best to simply compare the actual md5 with the expected md5 of the individual files. In essence, that's also what osc does. Or, if you need this per package fileset, you could compare the verifymd5s (actual/computed vs. expected).

Marcus
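The per-file comparison suggested here could look roughly like the following Python sketch. It is not how osc implements it; `local_changes` is a hypothetical helper, and the `expected` mapping of file names to md5 sums is assumed to have been obtained elsewhere (for example from the server's file listing for the package). Hidden files are skipped, which is also an assumption.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of one file (read fully into memory)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def local_changes(directory, expected):
    """Compare files on disk against a {filename: md5} mapping.

    Returns the names that differ, exist only locally, or are missing locally.
    """
    actual = {
        path.name: file_md5(path)
        for path in Path(directory).iterdir()
        if path.is_file() and not path.name.startswith(".")
    }
    return {
        "modified": sorted(
            name
            for name in actual.keys() & expected.keys()
            if actual[name] != expected[name]
        ),
        "added": sorted(actual.keys() - expected.keys()),
        "missing": sorted(expected.keys() - actual.keys()),
    }
```

An unchanged checkout yields three empty lists. Compared with a single fileset hash, this also tells you which files changed, at the cost of needing the expected per-file sums.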

Hi Marcus,

Marcus Hüwe <suse-tux@gmx.de> writes:
> See above, the md5 sum of the md5sum outputs (where the filenames are sorted lexicographically). That is, in bash
>
>     md5sum - <<EOF
>     $(md5sum file1)
>     ...
>     $(md5sum fileN)
>     EOF
>
> where file1 < ... < fileN (wrt. the lexicographical order)
>
> For instance, if you have a package (or more precisely a fileset of a package) that consists of two files
>
>     md5sum                            filename
>     401b30e3b8b5d629635a5c613cdb7919  x
>     009520053b00386d1173f3988c55d192  y
>
> its verifymd5 is cd5c0c5e01f0843238b2dab8205cb27b.

Thank you for this in-depth explanation!
> In order to detect "local changes" it is probably the best to simply compare the actual md5 with the expected md5 of the individual files. In essence, that's also what osc does. Or if you need this per package fileset, you could compare the verifymd5s (actual/computed vs. expected).

I'm probably going to use the verifymd5 for this, as it appears to be the simplest to use in this case.

Thanks again, everyone, for your answers!

Dan

On Jan 25 2021, Dan Čermák wrote:
> Could you elaborate what is done differently here? I would have expected to run above command over the expanded sources.

If you want to dig deep, take a look at src/backend/BSSrcServer/Link.pm in the OBS sources. Good luck.

Running osc info in the local checkout can help here.

Andreas.

> Could you elaborate what is done differently here? I would have expected to run above command over the expanded sources.

It's basically the md5sum over two lines describing the srcmd5 of the link source and the srcmd5 of the link target. Here's an example for Base:System/screen:

    3de15dbf3a2aae904a49b461f79faf47  /LINK
    90e45521074cf279fb2feec55993a405  /LOCAL

i.e. the unexpanded link had srcmd5 90e45521074cf279fb2feec55993a405, the link pointed to package screen in openSUSE:Factory with srcmd5 3de15dbf3a2aae904a49b461f79faf47.

The md5sum over the above lines is 7161f02f1ada22b812ae3660d942b03b, the srcmd5 of the expanded link.

Cheers,
Michael.
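As a rough Python sketch of the two-line construction described in this mail: the function name `expanded_link_srcmd5` is invented for illustration, and the exact byte layout (two spaces between hash and name, a newline after each line) is an assumption modelled on md5sum output, since the mail only shows the two lines. The authoritative logic lives in the OBS backend sources.

```python
import hashlib


def expanded_link_srcmd5(link_srcmd5, local_srcmd5):
    """Hash the two-line /LINK + /LOCAL listing.

    link_srcmd5:  srcmd5 of the package the link points to
    local_srcmd5: srcmd5 of the unexpanded link itself
    The separator and trailing newlines are assumptions.
    """
    listing = f"{link_srcmd5}  /LINK\n{local_srcmd5}  /LOCAL\n"
    return hashlib.md5(listing.encode("ascii")).hexdigest()


# Values taken from the Base:System/screen example in the mail.
print(expanded_link_srcmd5(
    "3de15dbf3a2aae904a49b461f79faf47",
    "90e45521074cf279fb2feec55993a405",
))
```

If the layout assumption holds, the printed value should match the srcmd5 quoted for the expanded link; if it does not, the whitespace in the listing is the first thing to check.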
participants (5)
- Adrian Schröter
- Andreas Schwab
- Dan Čermák
- Marcus Hüwe
- Michael Schroeder