[opensuse-packaging] Re: [opensuse-factory] RFC Generic Packaging for Languages that have vendor/ Trees
On Tue, Dec 19, 2017 at 4:32 PM, Aleksa Sarai <asarai@suse.de> wrote:
Hello *,
This is a proposal for having a generic packaging system of RPMs for languages that use "vendor/" trees. Please respond with any feedback you have on the details of this proposal.
The main justification for this proposal is the recent rise of languages that have an *enormous* number of "micro-packages" (JavaScript is the most well-known offender here, where the majority of widely used packages are only several lines long, but Rust has a similar issue, and Go/Ruby do too). This has made it impractical (or even impossible for some languages) to create a 1-to-1 RPM mapping for each package. So while a 1-to-1 RPM mapping is arguably ideal (from both an ideological perspective and a tooling perspective), the maintenance burden is far too high.
Another problem is that many projects written in these sorts of languages these days "vendor" their dependencies, usually using a language-specific package manager to do so. (This is slightly ironic in my opinion, because if they'd integrated more with distributions this ideally wouldn't be necessary, but that ship has sailed.) This is a problem that also needs to be resolved. Luckily such projects usually have some sort of "lock file" that describes what is present inside the "vendor/" tree -- this is something that will be useful later. It should be noted that the 1-to-1 RPM mapping doesn't help here either, as it would further balloon the number of packages we would need to have (as each project might depend on different versions). Debian has been attempting to do this with Go packages, and as far as I can see it's quite a futile effort because of the maintenance burden that comes from it.
At the moment the way that most packages deal with this problem is that they just punt completely on reproducibility and auditability: they vendor all dependencies in a project, tar up the vendor/ tree, and include it in the OBS project. For a JavaScript project this would involve just running `yarn <blah>` (or whatever the command is) and then taking node_modules/ and creating a node_modules.tar.xz that is included in the specfile. The main problem with this approach currently is that it is completely unauditable and nobody knows what's inside that magic vendor blob. *However* the core idea is not completely insane. The Rust folks have also started doing the same thing with cargo-vendor.
And here we come to my proposal. The idea is to take what is already being done in these projects, and create better tooling around it to make the work of development, maintenance, security, and legal much easier.
First, we need to provide more metadata about these vendor blobs in the RPM layer, so that security could at least *track* what versions of things are used by a project. And in the worst case, it should be possible to patch a vendor blob. This would likely best be done through RPM macros, by creating a virtual Provides for each of the vendored libraries. This matches what Fedora does for bundled libraries[1]. The Provides could be just as simple as
Provides: bundled(rust:nix) = 0.8.1
Or something more involved to be extra paranoid:
Provides: bundled(rust:registry+https://github.com/rust-lang/crates.io-index:nix) = 0.8.1
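As a rough illustration of the two Provides forms above, here is a minimal Python sketch that renders them from a list of vendored dependencies. The function name `bundled_provides` and the dictionary keys (`language`, `name`, `version`, `source`) are hypothetical and chosen only for this example; they are not part of any existing RPM macro or OBS tooling.

```python
def bundled_provides(deps, with_source=False):
    """Render one RPM 'Provides:' line per vendored dependency.

    deps is an iterable of dicts with 'language', 'name' and 'version'
    keys, plus an optional 'source' key (a hypothetical layout).
    """
    lines = []
    # Sort so that the generated spec fragment is stable between runs.
    ordered = sorted(deps, key=lambda d: (d["language"], d["name"], d["version"]))
    for dep in ordered:
        ident = dep["name"]
        # The "extra paranoid" form prefixes the name with its source.
        if with_source and dep.get("source"):
            ident = f'{dep["source"]}:{ident}'
        lines.append(f'Provides: bundled({dep["language"]}:{ident}) = {dep["version"]}')
    return lines
```

Calling it with a single entry for the nix crate at version 0.8.1 yields the simple form shown above; passing `with_source=True` together with a `source` value yields the longer form.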
Secondly, in order to make this vendor archive reproducible, I propose we have an OBS service that can be used to vendor a source tree (which can obviously be run either locally or on OBS). It will produce all of the vendor archives created by language-specific tools, and produce a language-agnostic manifest of what was downloaded (the name, language, version, git commit, and so on). The idea is that this manifest could be used by the RPM macros above rather than writing language-specific macros.
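To make the manifest idea a little more concrete, the following Python sketch shows one possible shape for it. Everything here is an assumption made for illustration: the `VendoredDep` fields simply mirror the attributes listed above (name, language, version, git commit), and the JSON layout with a top-level `vendored` key is invented, not the format of the actual OBS service.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class VendoredDep:
    """One vendored dependency (hypothetical manifest entry)."""
    language: str
    name: str
    version: str
    commit: Optional[str] = None  # git commit, when the tool reports one


def dump_manifest(deps):
    """Serialise deps to JSON, sorted so that repeated runs match byte for byte."""
    entries = sorted(
        (asdict(d) for d in deps),
        key=lambda e: (e["language"], e["name"], e["version"]),
    )
    return json.dumps({"vendored": entries}, indent=2, sort_keys=True)


def load_manifest(text):
    """Parse a manifest produced by dump_manifest back into VendoredDep objects."""
    return [VendoredDep(**entry) for entry in json.loads(text)["vendored"]]
```

Sorting the entries before writing is a deliberate choice in this sketch: a manifest that does not depend on the order in which tools report dependencies is easier to diff and review.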
I have already started working on the OBS service, but I would love to hear your feedback on this proposal.
[1]: https://fedoraproject.org/wiki/Bundled_Libraries?rd=Packaging:Bundled_Librar...
I don't fully disagree with your proposal, but I will point out a few things:
* The current vendoring of rust crates is temporary. We're waiting on RPM 4.14[1] and the new product builder to come online (DimStar already slapped me once for breaking Tumbleweed with rich deps before...). I'm working on making rust2rpm make openSUSE-friendly spec files (mainly add the boilerplate header, skip conversion of SPDX to Fedora license tags, generate changes file) so that crates can be easily packaged and shipped in the distribution. Right now, Fedora has well over 230 Rust crates packaged[2], and the packaging for them is pretty trivial[3]. We've also got a good handle on cargo integration, so crates function as if they're in a local cargo registry for things to depend on.
* I'm not sure why openSUSE hasn't adopted the bundled() Provides thing across the board anyway. There are plenty of packages that ship vendored trees/libraries and no one knows what they are. In general, it's really not a bad idea to do that. In my opinion, it's irresponsible to not require what you bundle to be defined.
Generally speaking, I think this is a solid idea, but I solidly do not believe we will be continuing the vendored crates practice for much longer in Rust.
[1]: https://build.opensuse.org/request/show/558345
[2]: https://koji.fedoraproject.org/koji/search?match=glob&type=package&terms=rust-*
[3]: https://pagure.io/fedora-rust/playground
-- 真実はいつも一つ!/ Always, there's only one truth!
-- To unsubscribe, e-mail: opensuse-packaging+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-packaging+owner@opensuse.org
On 20 December 2017 at 11:49, Neal Gompa <ngompa13@gmail.com> wrote:
Just to add to what Neal wrote - where possible we should absolutely be using rpm packaged deps, especially in the case of Rust. However, I am fairly certain that there will be cases where using vendored blobs of sources may be acceptable (though not for distribution in the main openSUSE trees) for user built and provided packages - I wouldn't expect a hobbyist to package a pile of dependencies, and so maybe something in place for tracking vendoring would be wise here?
If we aim to package each and every dependency it's going to turn into lunacy pretty damn quickly, so the goal here should be distribution of only the... well, top packages? I don't know, but we should be selective and work backwards from there.
And yeah nah, we won't be continuing with vendored packages for long once rich deps are in place. The current vendored packages are only a temporary thing to keep the Rust structure and packaging ticking over.
On 2017-12-19, Neal Gompa <ngompa13@gmail.com> wrote:
* The current vendoring of rust crates is temporary. We're waiting on RPM 4.14[1] and the new product builder to come online (DimStar already slapped me once for breaking Tumbleweed with rich deps before...). I'm working on making rust2rpm make openSUSE-friendly spec files (mainly add the boilerplate header, skip conversion of SPDX to Fedora license tags, generate changes file) so that crates can be easily packaged and shipped in the distribution. Right now, Fedora has well over 230 Rust crates packaged[2], and the packaging for them is pretty trivial[3]. We've also got a good handle on cargo integration, so crates function as if they're in a local cargo registry for things to depend on.
Is there a document somewhere that explains how it works? I read through the Fedora wiki page on Rust packaging[1] last time the RPM feature was mentioned on this list, but it doesn't explain anything about the current status (unless "rust2rpm" is the current status?).
* I'm not sure why openSUSE hasn't adopted the bundled() Provides thing across the board anyway. There are plenty of packages that ship vendored trees/libraries and no one knows what they are. In general, it's really not a bad idea to do that. In my opinion, it's irresponsible to not require what you bundle to be defined.
Generally speaking, I think this is a solid idea, but I solidly do not believe we will be continuing the vendored crates practice for much longer in Rust.
Okay. I just want to make sure that we don't run into the same maintenance problem we already have with Ruby packages (which will end up being worse due to the multi-versioning support in Rust, as well as the existence of far more micro-packages than in the Ruby universe). Does the current plan for Rust packaging account for that?
[1]: https://fedoraproject.org/wiki/SIGs/Rust
-- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
On Tue, Dec 19, 2017 at 11:54 PM, Aleksa Sarai <asarai@suse.de> wrote:
Is there a document somewhere that explains how it works? I read through the Fedora wiki page on Rust packaging[1] last time the RPM feature was mentioned on this list, but it doesn't explain anything about the current status (unless "rust2rpm" is the current status?).
Well, if you want to do it by hand, we do document how you're supposed to do it:
https://fedoraproject.org/wiki/Packaging:Rust
Unlike Go, which is mostly B.S. on packaging, we have been taking a careful approach to ensure we're on a solid path for Rust.
Okay. I just want to make sure that we don't run into the same maintenance problem we already have with Ruby packages (which will end up being worse due to the multi-versioning support in Rust, as well as the existence of far more micro-packages than in the Ruby universe). Does the current plan for Rust packaging account for that?
Our Rust packaging is deliberately designed around the need to package multiple versions of things. That said, when we encounter such situations, we are encouraged to try upgrading to the latest crate versions and to submit patches upstream. Igor Gnatenko has been like a machine, doing just that across most of Fedora's crates. But yes, we handle multiple versions of crates within a dep tree perfectly fine. :)
-- 真実はいつも一つ!/ Always, there's only one truth!
participants (3)
- Aleksa Sarai
- Luke Jones
- Neal Gompa