Re: ALP Git-Packaging-Workflow group created
On Wed, Jul 20, 2022 at 8:44 AM Jan Engelhardt wrote:
On Wednesday 2022-07-20 14:13, Dirk Müller wrote:
* Exploration of the Git Hosting options (run by me)
openSUSE already has some experience with Pagure.io (used by https://code.opensuse.org/, for example), with GitHub (used by various openSUSE projects under https://github.com/openSUSE), and with GitLab (used by a few community efforts as well as SUSE-internally for hosting), so we explored another option named Gitea, which is now available to the community under https://gitea.opensuse.org/.
gitea with the lfs approach works well.
I wonder how the ipfs symlinks stored in pagure would be handled. Fuse mount, or another global hook like git-lfs?
Right now, we don't have any blob storage system attached to Pagure. We can do whatever we want for this. Fedora's Dist-Git setup[0] leverages a wrapper tool called fedpkg[1] built on the rpkg[2] framework. We don't have to do it the same way, but there are a lot of nice things that come from having a Swiss Army knife tool for packaging. We have this today with OBS through osc, for example.
After working with Git-LFS for years for normal development, I'm not a huge fan of it. A couple of issues I have with LFS:
* It's incredibly difficult to mirror LFS, and there's no straightforward way to do it with git repos, given how the integration works
* LFS easily breaks when modifications occur across forks, and it's difficult to identify the problems there and fix them
There are other issues specific to particular LFS server implementations. For example, GitLab's implementation imposes ACLs on each blob, which makes it difficult for forks made by non-members to work reliably. GitHub's implementation has issues where smudging can break during complex merges that touch LFS objects.
For all the awkwardness around Dist-Git, these problems don't happen there because it's a transparent blob cache and the only thing written to Git is a plain text file with the filename and checksum, which is used as the blob reference. Authorization to upload and download those blobs is controlled by packager ACLs or whatever you wish to use to control that, making it effectively global, so it works properly across forks, or even cross-server forks.
With my downstream hat on, that's also useful if I'm building patched derivative packages, because I can use the Dist-Git server references even without my own Git repo on my own Git server. It makes CI and all kinds of other things easier too.
[0]: https://github.com/release-engineering/dist-git
[1]: https://pagure.io/fedpkg
[2]: https://pagure.io/rpkg
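To illustrate the blob reference described above, here is a minimal Python sketch. It assumes the plain text file uses the BSD-style "SHA512 (filename) = hexdigest" line format that fedpkg writes to its "sources" file; the function names are hypothetical, and downloading from the lookaside cache is not shown.

```python
import hashlib
import re

# Assumed line format: "SHA512 (name.tar.gz) = <hex digest>"
SOURCES_LINE = re.compile(
    r"^(?P<algo>[A-Z0-9]+) \((?P<name>.+)\) = (?P<digest>[0-9a-f]+)$"
)


def parse_sources(text):
    """Return a list of (algorithm, filename, hexdigest) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SOURCES_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized sources line: {line!r}")
        entries.append((match["algo"].lower(), match["name"], match["digest"]))
    return entries


def verify_blob(path, algo, expected_digest):
    """Check a downloaded blob against the checksum recorded in Git."""
    digest = hashlib.new(algo)
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest
```

Because the reference is only a name and a checksum, any fork or downstream copy of the Git repository can resolve it against the same blob cache without carrying extra per-repository state.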
* Exploration of the history import from the SUSE and openSUSE build service source history [...] Your feedback on the history import quality and how it could be made better is very much appreciated.
I would make the case that OBS history is not particularly interesting for its messages, because (a) we have had .changes files, (b) the SVN-ish nature of the OBS SCM meant that you can't go back and edit messages or commits, so there is little incentive to write up much if you missed it in the first commit.
The history is better than nothing because things like Git blame work reasonably okay. When Bernhard was working on the original setup, he didn't bother because it was a read-only setup. If we transition to Git, we probably want the attribution and messages to be more accurate because the Git history is useful for working through the package's development. It would also open the door for things like rpmautospec to be used in the future[3].
[3]: https://docs.pagure.org/Fedora-Infra.rpmautospec/
--
真実はいつも一つ!/ Always, there's only one truth!
On Wednesday, July 20, 2022 4:02:12 PM CEST Neal Gompa wrote:
GitHub's implementation has issues where smudging can be broken when doing complex merges that have LFS objects touched.
I do not think that LFS is set in stone yet, and this kind of interaction should be tested.
IMHO one of the advantages of LFS is that it is a very simple protocol that can work very nicely with OBS. For example, the PoC currently has a proxy in place that routes back to OBS for old tgz files, avoiding duplication. Future commits can delegate to OBS or to another LFS server from this point on, without users noticing much.
The dist-git / fedpkg idea was commented on too. I personally do not like it. The point is a bit irrelevant, but as a developer, when I am using a model like the one in OBS (tarball, spec, changes and patches), I do not like the context switch that fedpkg requires to manage the tgz at a different level. In that regard "osc" is better, and with git-LFS there is a chance to replicate this experience using a mostly pure git model.
In any case this should be stress-tested, validated and revisited.
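As a rough illustration of how little Git LFS actually stores in the repository, the following Python sketch builds and parses a version 1 pointer file (the small text file committed in place of the tarball). The helper names are hypothetical; only the three standard keys (version, oid, size) are handled, and this is not the code used in the PoC.

```python
import hashlib

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def make_pointer(blob: bytes) -> str:
    """Build the pointer text that Git stores in place of the blob."""
    oid = hashlib.sha256(blob).hexdigest()
    return f"version {LFS_SPEC_V1}\noid sha256:{oid}\nsize {len(blob)}\n"


def parse_pointer(text: str) -> dict:
    """Parse pointer text into a dict with 'version', 'oid' and 'size'."""
    # Each line is "key value"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, oid = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash: {algo}")
    return {"version": fields["version"], "oid": oid, "size": int(fields["size"])}
```

The oid is just the SHA-256 of the content, so any server that can map that hash to the original file (for example, one backed by OBS) can serve it.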
The history is better than nothing because things like Git blame work reasonably okay.
+1 on this.
On Wednesday, 20 July 2022, 16:39:55 CEST Alberto Planas wrote:
On Wednesday, July 20, 2022 4:02:12 PM CEST Neal Gompa wrote:
GitHub's implementation has issues where smudging can be broken when doing complex merges that have LFS objects touched.
I do not think that LFS is set in stone yet, and this kind of interaction should be tested.
Yes, definitely. It was mainly used to represent the file blobs from OBS source server commits. We wanted to avoid duplicating all these files and therefore created a _read-only_ LFS bridge to the OBS source server. This also means that
* the .lfsconfig will still work when you move a git repository to another git host, as the URL to the LFS storage is hardcoded
* the .lfsconfig needs to be removed or adapted when replacing the assets.
IMHO using the pbuild/OBS remote assets would be a good way to go here, but that does not mean it is the only allowed way. Using remote assets, we would avoid the mentioned problems when pushing the git repo to another host, and we would also keep track of where the resources came from (something that does not work with LFS out of the box). It would also help us create proper provenance files for reproducibility, which can only increase trust in our code base ...
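A read-only bridge of this kind could, in principle, look like the following Python sketch of a Git LFS Batch API handler. This is not the actual OBS implementation: the function name, the blob URL layout, and the choice of a 403 status for uploads are assumptions made for illustration only.

```python
def handle_batch(request: dict, blob_base_url: str):
    """Return (http_status, json_body) for a Git LFS Batch API request.

    `blob_base_url` is a hypothetical prefix under which blobs can be
    fetched by their oid; how a real bridge maps oids to files is not
    covered here.
    """
    if request.get("operation") != "download":
        # Read-only bridge: anything other than a download is refused.
        return 403, {"message": "this LFS bridge is read-only"}

    objects = []
    for obj in request.get("objects", []):
        objects.append({
            "oid": obj["oid"],
            "size": obj["size"],
            # Point the client at the existing storage instead of a copy.
            "actions": {"download": {"href": f"{blob_base_url}/{obj['oid']}"}},
        })
    return 200, {"transfer": "basic", "objects": objects}
```

The client only ever sees a download URL per object, which is why old blobs can stay where they are without being duplicated into a separate LFS store.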
IMHO one of the advantages of LFS is that it is a very simple protocol that can work very nicely with OBS. For example, the PoC currently has a proxy in place that routes back to OBS for old tgz files, avoiding duplication. Future commits can delegate to OBS or to another LFS server from this point on, without users noticing much.
Right, you can find an example package here btw:
https://gitea.opensuse.org/adrianSuSE/git-example-lfs
The .lfsconfig points to a (non-final) OBS LFS provider.
But again, this is only intended for past commits, not for future ones
when git is the master source.
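For readers unfamiliar with .lfsconfig: it is a git-config-style file committed to the repository, and its lfs.url key tells clients which LFS server to use. The Python sketch below reads that key with the standard library; the example URL is a placeholder rather than the real provider, and configparser only approximates git's config syntax, so treat this as an illustration.

```python
import configparser

# Placeholder content; the real provider URL is not reproduced here.
EXAMPLE_LFSCONFIG = "[lfs]\n\turl = https://obs.example.invalid/lfs\n"


def lfs_url(lfsconfig_text: str):
    """Return lfs.url from .lfsconfig text, or None if it is not set."""
    # Interpolation is disabled so '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(lfsconfig_text)
    return parser.get("lfs", "url", fallback=None)
```

Since the URL travels with the repository, a clone pushed to another host keeps resolving its LFS objects against the same server until the file is changed.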
--
Adrian Schroeter
On Wed, Jul 20, 2022 at 10:40 AM Alberto Planas wrote:
On Wednesday, July 20, 2022 4:02:12 PM CEST Neal Gompa wrote:
GitHub's implementation has issues where smudging can be broken when doing complex merges that have LFS objects touched.
I do not think that LFS is set in stone yet, and this kind of interaction should be tested.
IMHO one of the advantages of LFS is that it is a very simple protocol that can work very nicely with OBS. For example, the PoC currently has a proxy in place that routes back to OBS for old tgz files, avoiding duplication. Future commits can delegate to OBS or to another LFS server from this point on, without users noticing much.
The dist-git / fedpkg idea was commented on too. I personally do not like it. The point is a bit irrelevant, but as a developer, when I am using a model like the one in OBS (tarball, spec, changes and patches), I do not like the context switch that fedpkg requires to manage the tgz at a different level. In that regard "osc" is better, and with git-LFS there is a chance to replicate this experience using a mostly pure git model.
Git LFS is a bolt-on to Git, it just so happens that GitLab and GitHub know how to dereference it when the blobs are stored on *their* LFS servers. It is not hard to bolt on Dist-Git the same way. There's just been no drive to do that in Fedora because the "fedpkg" porcelain is more than sufficient. Git LFS does it with the .gitattributes file and storing extra goop in the git repository metadata, Dist-Git does it with the "sources" file. Six one way, half dozen the other.
That said, if OBS has its own LFS server, then it really doesn't matter. Git repos in Pagure can be pointed to it like any other. Right now, Pagure has no specific plugin/integration for *any* blob storage system.
The biggest annoyance I have with Git LFS from a general workflow perspective is that "git clone --mirror" produces unusable git repos with it, which makes downstream usage annoying.
--
真実はいつも一つ!/ Always, there's only one truth!
participants (3)
- Adrian Schröter
- Alberto Planas
- Neal Gompa