[opensuse-buildservice] handling one thousand source tarballs in one package
Hi,
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. Hoping to make that more manageable, I wrote a PoC script¹ to produce a file includable from the spec and to also download the tarballs. Uploading them to OBS as part of the sources is rather annoying, though, as you can't see the forest for the trees anymore. So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results², though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
So would it be possible to either let services access the previous results or have a cache dir for all? Any other ideas how to handle one thousand source tarballs?
cu Ludwig
[1] https://github.com/lnussel/nodejs-tarballs
[2] https://github.com/openSUSE/osc/blob/7b5d10552352295ebdd21193b6dfbf830e7d236...
[3] https://github.com/openSUSE/open-build-service/blob/master/src/backend/run-s...
--
Ludwig Nussel, http://www.suse.com/
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer
HRB 36809 (AG Nürnberg)
--
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
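For illustration, a minimal sketch of what such a generator could look like. It assumes a `package-lock.json` in the lockfileVersion 1 layout (nested `dependencies` objects that carry a `resolved` tarball URL) and is not the PoC script¹ itself; the function names and the numbering starting at `Source1000` are hypothetical.

```python
import json

def collect_tarballs(deps, seen):
    """Walk a lockfile-v1 'dependencies' tree; map each resolved URL to (name, version)."""
    for name, info in sorted(deps.items()):
        url = info.get("resolved")
        # Linked or bundled entries may have no 'resolved' URL; skip them.
        if url and url not in seen:
            seen[url] = (name, info.get("version", ""))
        # Nested 'dependencies' hold packages that could not be hoisted to the top level.
        collect_tarballs(info.get("dependencies", {}), seen)
    return seen

def spec_sources(lockfile_text, first_index=1000):
    """Render one SourceN tag per unique tarball, for a file pulled in via %include."""
    lock = json.loads(lockfile_text)
    tarballs = collect_tarballs(lock.get("dependencies", {}), {})
    lines = []
    for i, (url, (name, version)) in enumerate(sorted(tarballs.items()), start=first_index):
        lines.append(f"# {name} {version}")
        lines.append(f"Source{i}: {url}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        sys.stdout.write(spec_sources(f.read()))
```

Deduplicating by URL matters here: the same package version can appear at several nesting levels of the lockfile but only needs to be listed and fetched once.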
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
Hi,
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results²
actually they have, your service should be able to find them in .old/ subdir
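A minimal sketch of how a download service could take advantage of that, assuming the previous results appear as plain files directly under `.old/` in the service's working directory and that the expected SHA-512 hex digest of each tarball is known up front. `fetch_or_reuse` and its `download` callback are hypothetical names, not part of any existing service.

```python
import hashlib
import os
import shutil

def sha512_hex(path):
    """Hex SHA-512 of a file, read in 1 MiB chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fetch_or_reuse(filename, expected_sha512, outdir, download, old_dir=".old"):
    """Reuse old_dir/filename if its hash matches; otherwise call download(dest).

    Returns "reused" or "downloaded". download is a caller-supplied function
    (e.g. wrapping urllib.request) that writes the file to dest.
    """
    target = os.path.join(outdir, filename)
    previous = os.path.join(old_dir, filename)
    if os.path.isfile(previous) and sha512_hex(previous) == expected_sha512:
        shutil.copy2(previous, target)  # no network access needed
        return "reused"
    download(target)
    if sha512_hex(target) != expected_sha512:
        os.remove(target)
        raise ValueError(f"checksum mismatch for {filename}")
    return "downloaded"
```

Verifying the hash before reuse keeps a stale or truncated file in `.old/` from silently ending up in the build.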
though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
that is specific implementation to keep scm histories.
So would it be possible to either let services access the previous results or have a cache dir for all? Any other ideas how to handle one thousand source tarballs?
dunno if it matters here, but often you need to postprocess the npm modules with random scripts. So you end up with disabled runs there, unfortunately...
--
Adrian Schroeter <adrian@suse.de>
Build Infrastructure Project Manager
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
(HRB 247165, AG München), Managing Director: Felix Imendörffer
On Wednesday, 30 September 2020, 07:17:22 CEST, Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...]
btw, please try to store them into .obscpio archives. That way we can store the incremental changes only. You can construct them using cpio --create --format=newc --reproducible, for example.
--
Adrian Schroeter <adrian@suse.de>
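For illustration, a minimal pure-Python sketch of a newc ("070701") cpio writer for a flat set of regular files. GNU cpio's `--reproducible` pursues the same goal, for instance by renumbering inodes and ignoring device numbers; this sketch does likewise and additionally sorts the entries, fixes the mode at 0644 and zeroes uid/gid. It keeps file mtimes, writes no directory entries and skips the final padding to 512-byte blocks that GNU cpio writes, so its output will not be byte-identical to the cpio command above. The function names are hypothetical.

```python
import os
import stat

HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits

def _pad4(length):
    """Zero bytes needed to round length up to a multiple of 4."""
    return b"\0" * (-length % 4)

def _entry(ino, mode, mtime, name, data):
    namez = name.encode() + b"\0"          # names are NUL-terminated
    fields = [ino, mode, 0, 0,              # ino, mode, uid, gid
              1, mtime, len(data),          # nlink, mtime, filesize
              0, 0, 0, 0,                   # devmajor, devminor, rdevmajor, rdevminor
              len(namez), 0]                # namesize, check (unused in newc)
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    return header + namez + _pad4(HEADER_LEN + len(namez)) + data + _pad4(len(data))

def write_newc(out, root):
    """Archive all regular files below root into the binary stream out."""
    names = []
    for dirpath, _dirs, files in os.walk(root):
        for fn in files:
            names.append(os.path.relpath(os.path.join(dirpath, fn), root))
    # Sorted names plus sequential inode numbers give a stable member order.
    for ino, rel in enumerate(sorted(names), start=1):
        path = os.path.join(root, rel)
        with open(path, "rb") as f:
            data = f.read()
        mtime = int(os.stat(path).st_mtime)
        out.write(_entry(ino, stat.S_IFREG | 0o644, mtime, rel, data))
    out.write(_entry(0, 0, 0, "TRAILER!!!", b""))
```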
Adrian Schröter wrote:
On Wednesday, 30 September 2020, 07:17:22 CEST, Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] Uploading them to OBS as part of the sources is rather annoying, though, as you can't see the forest for the trees anymore. [...]
btw, please try to store them into .obscpio archives. That way we can store the incremental changes only.
I've updated the service to produce an obscpio now. That makes it easier to handle in disabled mode: https://build.opensuse.org/package/show/home:lnussel:branches:systemsmanagem...
Nevertheless, this is still suboptimal, as the packager has to download and upload that 85MB obscpio. So doing that on the server side would be more convenient. It would also make sure the obscpio was actually created by the service. Right now there is no guarantee that whatever is specified in _service was actually used by the packager (this applies to any service, not just this one). Last but not least, doing it on the server side would allow rebuilding from git master purely by means of webhooks.
cu Ludwig
Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results²
actually they have, your service should be able to find them in .old/ subdir
Ah, so there's an inconsistency. While OBS does that, osc just deletes old files :-( I've filed an issue now¹.
So far it's not a special service, btw. I thought about using download_files, but that one does not process spec file includes. Given the insane number of sources, I feel that editing the spec file directly, as done with bundle_gems, wouldn't be a good option here. So the next best thing is download_url. That one is extremely basic, though, so I wanted to make it a bit smarter.
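A sketch of one piece of such a "smarter" service: pulling the remote Source URLs out of an included file. It assumes plain `SourceN: URL` lines and does not expand RPM macros such as %{version}; a real service would have to handle those, e.g. by letting `rpmspec --parse` expand the spec first. The function name is hypothetical.

```python
import re

# RPM tag names are case-insensitive; "Source" may or may not carry a number.
SOURCE_TAG = re.compile(r"^\s*(Source\d*)\s*:\s*(\S+)", re.IGNORECASE)

def source_urls(spec_text):
    """Return (tag, url) pairs for remote Source tags, in file order, without duplicates."""
    seen = set()
    result = []
    for line in spec_text.splitlines():
        m = SOURCE_TAG.match(line)
        if not m:
            continue  # comments, macros, other tags
        tag, url = m.groups()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            result.append((tag, url))
    return result
```

Local sources such as `Source0: foo.tar.gz` are skipped, since only remote URLs need fetching.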
though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
that is specific implementation to keep scm histories.
Obviously. The question is whether a cache directory could be made available to other services as well. The /usr/lib/obs/service/*.service file could specify, for example, whether a service needs a cache. That would remove the need for 'run-service-containerized' to hardcode things based on the service name.
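To make the proposal concrete, a sketch of how a runner could honour such a flag in an XML .service file. The `<cache/>` element, the `/var/cache/obs/services` layout and the function name are all hypothetical; nothing in this thread says such an element exists.

```python
import os
import xml.etree.ElementTree as ET

def cache_dir_for(service_xml, cache_root="/var/cache/obs/services"):
    """Return a per-service cache directory if the .service file asks for one.

    Looks for a hypothetical <cache/> child of <service>; returns None otherwise.
    """
    root = ET.fromstring(service_xml)
    if root.tag != "service" or root.find("cache") is None:
        return None
    name = root.get("name")
    # The name becomes a path component, so reject anything that could escape cache_root.
    if not name or "/" in name or name.startswith("."):
        raise ValueError(f"refusing suspicious service name: {name!r}")
    return os.path.join(cache_root, name)
```

The runner would then bind-mount the returned directory into the service container instead of matching on service names.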
So would it be possible to either let services access the previous results or have a cache dir for all? Any other ideas how to handle one thousand source tarballs?
dunno if it matters here, but often you need to postprocess the npm modules with random scripts. So you end up with disabled runs there, unfortunately...
So far not. The goal is pristine sources.
cu Ludwig
[1] https://github.com/openSUSE/osc/issues/845
On Wednesday, 30 September 2020, 08:34:48 CEST, Ludwig Nussel wrote:
Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results²
actually they have, your service should be able to find them in .old/ subdir
Ah, so there's an inconsistency. While OBS does that, osc just deletes old files :-( I've filed an issue now¹.
yeah, it is a bit of a question whether osc should put files from server-side runs or from former runs into .old/ ... IMHO the latter is better, because it avoids duplicate downloads. But it breaks concepts like automatic changelog generation.
So far it's not a special service, btw. I thought about using download_files, but that one does not process spec file includes. Given the insane number of sources, I feel that editing the spec file directly, as done with bundle_gems, wouldn't be a good option here. So the next best thing is download_url. That one is extremely basic, though, so I wanted to make it a bit smarter.
though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
that is specific implementation to keep scm histories.
Obviously. The question is whether a cache directory could be made available to other services as well.
well, that will remain a service-specific decision...
The /usr/lib/obs/service/*.service file could specify, for example, whether a service needs a cache. That would remove the need for 'run-service-containerized' to hardcode things based on the service name.
so far no service needs a cache; it can just optionally implement one. Also it might be a decision of a service to hide content from others. So IMHO it does not make much sense to put this into .service files.
--
Adrian Schroeter <adrian@suse.de>
Adrian Schröter wrote:
On Wednesday, 30 September 2020, 08:34:48 CEST, Ludwig Nussel wrote:
Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results²
actually they have, your service should be able to find them in .old/ subdir
Ah, so there's an inconsistency. While OBS does that, osc just deletes old files :-( I've filed an issue now¹.
yeah, it is a bit of a question whether osc should put files from server-side runs or from former runs into .old/ ...
IMHO the latter is better, because it avoids duplicate downloads. But it breaks concepts like automatic changelog generation.
How does that work on the server side then? It doesn't seem to make sense for osc to behave differently from the server side.
So far it's not a special service, btw. I thought about using download_files, but that one does not process spec file includes. Given the insane number of sources, I feel that editing the spec file directly, as done with bundle_gems, wouldn't be a good option here. So the next best thing is download_url. That one is extremely basic, though, so I wanted to make it a bit smarter.
though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
that is specific implementation to keep scm histories.
Obviously. The question is whether a cache directory could be made available to other services as well.
well, that will remain a service-specific decision...
The /usr/lib/obs/service/*.service file could specify, for example, whether a service needs a cache. That would remove the need for 'run-service-containerized' to hardcode things based on the service name.
so far no service needs a cache; it can just optionally implement one. Also it might be a decision of a service to hide content from others. So IMHO it does not make much sense to put this into .service files.
Ok, there is no service that strictly *needs* a cache, yet it totally makes sense for tar_scm or obs_scm. There's no way for a service to communicate that, though. Extra options for services are hardcoded in 'run-service-containerized', so it's not a service decision but one that 'run-service-containerized' makes. What I'm saying is that instead of hardcoding a cache dir for a specific service name in 'run-service-containerized', there could be a way for a service to tell 'run-service-containerized' that it wants a cache. Once such a mechanism is available, any service could use a cache if desired.
cu Ludwig
On Wednesday, 30 September 2020, 09:13:01 CEST, Ludwig Nussel wrote:
Adrian Schröter wrote:
On Wednesday, 30 September 2020, 08:34:48 CEST, Ludwig Nussel wrote:
Adrian Schröter wrote:
On Tuesday, 29 September 2020, 17:59:11 CEST, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. [...] So it would be convenient if a service like download_files or download_url could do that on the server side, so the files would be hidden as _service:*
AFAICS services do not have access to previously generated results²
actually they have, your service should be able to find them in .old/ subdir
Ah, so there's an inconsistency. While OBS does that, osc just deletes old files :-( I've filed an issue now¹.
yeah, it is a bit of a question whether osc should put files from server-side runs or from former runs into .old/ ...
IMHO the latter is better, because it avoids duplicate downloads. But it breaks concepts like automatic changelog generation.
How does that work on the server side then? It doesn't seem to make sense for osc to behave differently from the server side.
server-side run results are always commits, so the problem does not exist there.
So far it's not a special service, btw. I thought about using download_files, but that one does not process spec file includes. Given the insane number of sources, I feel that editing the spec file directly, as done with bundle_gems, wouldn't be a good option here. So the next best thing is download_url. That one is extremely basic, though, so I wanted to make it a bit smarter.
though, so they would have to download those files on every source change, which seems rather wasteful and also takes time. OTOH there's a hack in OBS to have a cache directory for tar_scm³ only.
that is specific implementation to keep scm histories.
Obviously. The question is whether a cache directory could be made available to other services as well.
well, that will remain a service-specific decision...
The /usr/lib/obs/service/*.service file could specify, for example, whether a service needs a cache. That would remove the need for 'run-service-containerized' to hardcode things based on the service name.
so far no service needs a cache; it can just optionally implement one. Also it might be a decision of a service to hide content from others. So IMHO it does not make much sense to put this into .service files.
Ok, there is no service that strictly *needs* a cache, yet it totally makes sense for tar_scm or obs_scm. There's no way for a service to communicate that, though.
right, since it is an admin decision anyway whether to run them containerized or not, with or without network.
Extra options for services are hardcoded in 'run-service-containerized', so it's not a service decision but one that 'run-service-containerized' makes. What I'm saying is that instead of hardcoding a cache dir for a specific service name in 'run-service-containerized', there could be a way for a service to tell 'run-service-containerized' that it wants a cache. Once such a mechanism is available, any service could use a cache if desired.
I do not see an advantage here; any service can make such a thing configurable via /etc/ configs, like the existing ones do. If we add this to .service files, we just limit the options, because it needs to stay compatible across all services IMHO.
--
Adrian Schroeter <adrian@suse.de>
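A sketch of the /etc/-based alternative, assuming a shell-style `KEY="value"` file such as a hypothetical /etc/obs/services/download_url with a `CACHEDIRECTORY` entry. The key name is modeled on tar_scm's cache setting but should be treated as an assumption here. The functions take the file's text, so reading the file is left to the caller.

```python
import shlex

def read_service_config(text):
    """Parse simple shell-style KEY="value" lines; blanks and comments are ignored."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw, comments=True)  # strips quotes and trailing comments
        config[key.strip()] = parts[0] if parts else ""
    return config

def cache_directory(config_text):
    """Return the configured cache dir, or None when it is unset or empty."""
    value = read_service_config(config_text).get("CACHEDIRECTORY", "")
    return value or None
```

This keeps the decision with the service and the admin, as argued above: an empty or missing entry simply means no cache.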
Hey,
On 29.09.20 17:59, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. Hoping to make that more manageable, I wrote a PoC script¹ to produce a file includable from the spec and to also download the tarballs.
I have the feeling you are reinventing the vendoring NPM provides. The `node_modules` directory is most likely already the result of the `package.json` file shipped with the software you are trying to package. npm install's default behavior is to install packages locally, just for your app, into the `node_modules` directory. You can also choose to install npm packages globally (to the system, so to speak) with the `npm install -g` option.
Now you have a decision to make:
Either you ship the vendored (local to your app) dependencies in your RPM package. That would "just" require tar'ing up the `node_modules` directory with all the dependencies in it, unpacking it during the build and making it accessible to your app. This is the strategy services like `bundled_gems`, `cargo_vendor` or `go_modules` follow.
Or, and I guess for a distribution this would be the correct way, you package all of the dependencies of your npm-based software as RPM packages too. I guess you can automate this to a certain degree, as we do with rubygems.
People before you have gone through this :-)
-> https://github.com/theforeman/npm2rpm
-> https://docs.fedoraproject.org/en-US/packaging-guidelines/Node.js/
Henne
--
Henne Vogelsang
http://www.opensuse.org
Everybody has a plan, until they get hit. - Mike Tyson
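A minimal sketch of the first option (vendoring), assuming the dependencies are already installed in `node_modules`. It resets owner and mtime so that repeated runs over the same tree yield identical archives; the function names are illustrative and not what `bundled_gems`, `cargo_vendor` or `go_modules` actually do internally.

```python
import os
import tarfile

def _normalize(info):
    """tarfile filter: drop owner and timestamp details of the packaging host."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    info.mtime = 0
    return info

def vendor_tarball(project_dir, out_path):
    """Pack <project_dir>/node_modules into an xz-compressed tarball at out_path."""
    src = os.path.join(project_dir, "node_modules")
    # Since Python 3.7, recursive add() visits directory entries in sorted order.
    with tarfile.open(out_path, "w:xz") as tar:
        tar.add(src, arcname="node_modules", filter=_normalize)
```

Note that this only makes the archive stable; it says nothing about where the files inside `node_modules` came from.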
Henne Vogelsang wrote:
On 29.09.20 17:59, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. Hoping to make that more manageable, I wrote a PoC script¹ to produce a file includable from the spec and to also download the tarballs.
I have the feeling you are reinventing the vendoring NPM provides. [...] People before you have gone through this :-)
I wish it were so. Cockpit is not a nodejs module, though, and does not ship nodejs stuff. Nodejs is only used for building. So while Fedora has some nice nodejs policies in theory, in practice they just ship the prebuilt files that some external GitHub CI produces. The goal here is to actually rebuild from source in OBS, including the full chain of npm modules.
Just including the node_modules directory as a source would be a cheap and dirty hack. It does not tell where the various things in there come from, i.e. it violates the pristine sources concept. And worst of all, it may contain binaries: not just ones that were actually built on the packager's machine (bad enough), but ones that were just randomly downloaded when you called "npm rebuild". So you thought it would rebuild from source, but it didn't. IOW the node_modules directory cannot be bundled as a source.
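A rough heuristic sketch for spotting such binaries in a `node_modules` tree, assuming a Linux build host: it flags files that start with the ELF magic bytes and native addons ending in `.node`. It will miss other formats (e.g. Mach-O or PE files, or binaries packed inside archives), so treat it as an illustration, not a guarantee. The function name is hypothetical.

```python
import os

ELF_MAGIC = b"\x7fELF"

def find_prebuilt_binaries(root):
    """Return relative paths of ELF files or '.node' addons below root."""
    hits = []
    for dirpath, dirnames, files in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for fn in sorted(files):
            path = os.path.join(dirpath, fn)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                magic = f.read(4)
            if magic == ELF_MAGIC or fn.endswith(".node"):
                hits.append(os.path.relpath(path, root))
    return hits
```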
That one contains code that could potentially be reused to avoid "npm install" indeed. I'll take a look at that, thanks!
cu Ludwig
Hey,
On 30.09.20 18:12, Ludwig Nussel wrote:
Henne Vogelsang wrote:
On 29.09.20 17:59, Ludwig Nussel wrote:
I'm trying to build a package from source that uses npm modules. Doing so involves more than one thousand(!) upstream tarballs. Hoping to make that more manageable, I wrote a PoC script¹ to produce a file includable from the spec and to also download the tarballs.
I have the feeling you are reinventing the vendoring NPM provides. [...] People before you have gone through this :-)
I wish it were so. Cockpit is not a nodejs module, though, and does not ship nodejs stuff. Nodejs is only used for building.
Not sure there is a difference. Are we talking about cockpit-project.org? You can see the first-level build/runtime deps in their `package.json`: https://github.com/cockpit-project/cockpit/blob/master/package.json
Each of those dependencies has its own `package.json`, which potentially pulls in more dependencies. In the end you have 1,000 :-)
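To illustrate that fan-out, a small breadth-first walk over a dependency graph. The toy mapping stands in for the per-package `package.json` files; real npm resolution also deals with version ranges and may install several versions of the same package, which this sketch ignores. The function name is hypothetical.

```python
from collections import deque

def transitive_closure(direct, registry):
    """Breadth-first walk from the direct deps over a name -> [deps] mapping."""
    seen = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also makes dependency cycles harmless
        seen.add(name)
        queue.extend(registry.get(name, []))
    return seen

if __name__ == "__main__":
    toy = {"app": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["a"]}
    print(len(transitive_closure(toy["app"], toy)))  # 4: a, b, c, d
```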
The goal here is to actually rebuild from source in OBS, including the full chain of npm modules.
So you choose option 2: you create RPM packages for all the NPM packages that are mentioned in the Cockpit `package.json`, and RPM packages for all the NPM packages in their dependency chain. You end up with 1,001 RPM packages that all follow the pristine sources concept.
Henne
participants (3)
- Adrian Schröter
- Henne Vogelsang
- Ludwig Nussel