[opensuse-packaging] Proposal to remove pyc/pyo from Python on TW
Hi,

As you know, the Python packages include the pyc/pyo precompiled binaries inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped. But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim. But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.

For Python 2.7 and 3.7 it is possible to remove the pyc code from the system and instruct the interpreter not to recreate the pyc once the code is executed. By default, the Python interpreter compiles and stores the pyc on disk for each `import`, but this behavior can be disabled when we call Python. But this will make the execution of a big Python stack a bit slow, as the pyc needs to be recreated in memory on each invocation. The slowness can be relevant in some situations, so it is better not to enable this feature.

But in Python 3.8 there is a new feature in place, bpo-33499 [1], that recognizes a new environment variable (PYTHONPYCACHEPREFIX) to change the place where __pycache__ is stored [2]. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image. I chose salt-minion because Saltstack is a relevant Python codebase; I needed to port 150 Python libraries to 3.7 to create the first PoC.

The PoC works properly locally. I still have some bits to publish in the repo, but the general idea seems to work OK. I can also publish the size gain for the ISO with and without the patch, to have more data to compare.

I also estimated the gains for different scenarios. For example, in a normal TW installation:

* Python 2.7 + 3.6
  - pyc/pyo: 127M total
  - py: 109M total

* Python 3.6 only
  - pyc/pyo: 91M total
  - py: 70M total

The pyc/pyo size is more than the py code size, so we can potentially halve the size of the Python 3 stack. Maybe for a normal TW installation the absolute gain is not much (91M). But for other scenarios it can be relevant, like in OpenStack Cloud, where the size of the Python code is big. I made some calculations based on all the different OpenStack services:

* Python 2.7 OpenStack services
  - pyc/pyo: 1.2G total
  - py: 804M total

Saving 1.2G on each node is a more significant number.

So, my proposal is to remove the pyc from the Python 3 packages and enable the cache layer on Tumbleweed starting with Python 3.7. I do not know whether to do that by default or only under certain configurations, as I am not sure how to make that feature optional.

Any ideas? Any suggestions? What do you think if I follow this path?

Some ideas that I have: add a new %pycache-clean macro that removes __pycache__ from the RPM, add a new rpmlint check to make sure there are no pyc files in a python3 package, update the wiki, and update the py2pack code to generate good python3 spec files for openSUSE. But if most of the community does not agree with this approach, I can drop the idea : )

[1] https://bugs.python.org/issue33499
[2] https://docs.python.org/3.8/whatsnew/3.8.html

p.s.: Sorry if the message is delivered twice, once from the .com account. My bad.
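A minimal sketch of what such a shim could look like, assuming the backported interpreter honors PYTHONPYCACHEPREFIX exactly as 3.8 does (the python3.7.real name is made up for the example):

    #!/bin/sh
    # Wrapper standing in for /usr/bin/python3.7; the real interpreter
    # is assumed to have been renamed to /usr/bin/python3.7.real.
    PYTHONPYCACHEPREFIX="/var/cache/pycache/$(id -un)"
    export PYTHONPYCACHEPREFIX
    mkdir -p "$PYTHONPYCACHEPREFIX"
    exec /usr/bin/python3.7.real "$@"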
On 10/4/18 4:52 PM, Alberto Planas Dominguez wrote:
I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image.
The above sounds to me like the compiled code goes into several user-specific pycache directories. How does that save space? Ciao, Michael.
On Thu, Oct 4, 2018 at 10:52 AM Alberto Planas Dominguez <aplanas@suse.de> wrote:
Hi,
As you know, the Python packages include the pyc/pyo precompiled binaries inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim. But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.
For Python 2.7 and 3.7 it is possible to remove the pyc code from the system and instruct the interpreter not to recreate the pyc once the code is executed. By default, the Python interpreter compiles and stores the pyc on disk for each `import`, but this behavior can be disabled when we call Python.
But this will make the execution of a big Python stack a bit slow, as the pyc needs to be recreated in memory on each invocation. The slowness can be relevant in some situations, so it is better not to enable this feature.
But in Python 3.8 there is a new feature in place, bpo-33499, that recognizes a new env variable (PYTHONPYCACHEPREFIX) to change the place where __pycache__ is stored [2]. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image.
I chose salt-minion because Saltstack is a relevant Python codebase. I needed to port 150 Python libraries to 3.7 to create the first PoC.
The PoC works properly locally. I still have some bits to publish in the repo, but the general idea seems to work OK. I can also publish the size gain for the ISO with and without the patch, to have more data to compare.
I've heard variations of this theme for almost a decade now. There are three major problems with this:

* Python is _very_ slow without the cache, and generating the cache is a slow operation. This is a terrible penalty for systems that heavily rely on Python. And failure to write the cache means every run is fully interpreted and very slow.

* Generating the bytecode on the system means that we aren't evaluating the code to check whether it actually works for the target Python at build time. This is a huge problem for ensuring the code is actually compatible with the target version of Python. While it's of course possible to have some things slip by, even with bytecode generation, it's a lot less likely.

* It makes it much more likely that we'll leave garbage on the filesystem with installs and uninstalls of Python software. That just adds up to unaccounted space being taken up for relatively static content that should be pre-generated and tracked.

OpenMandriva went with this approach for a while, and they're switching back because of these issues, especially as they've become more aggressive about upgrading the Python stack and keeping modules up to date since they switched /usr/bin/python to point to Python 3.

-- 真実はいつも一つ!/ Always, there's only one truth!
On Thursday 2018-10-04 18:04, Neal Gompa wrote:
But in Python 3.8 there is a new feature in place, bpo-33499, that recognizes a new env variable (PYTHONPYCACHEPREFIX) to change the place where __pycache__ is stored [2]. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image.
I've heard variations of this theme for almost a decade now. There are three major problems with this:
* Python is _very_ slow without the cache, and generating the cache is a slow operation. This is a terrible penalty for systems that heavily rely on Python. And failure to write the cache means every run is fully interpreted and very slow.
It seems weird that among all the scripting interpreters in a Linux distribution, this seems to be a CPython-only issue. What is it that sh and perl are doing right? I can't remember seeing equally vocal lamentations about their execution speeds or techniques here.
On Thursday, October 4, 2018 6:04:27 PM CEST Neal Gompa wrote:
I've heard variations of this theme for almost a decade now. There are three major problems with this:
* Python is _very_ slow without the cache, and generating the cache is a slow operation. This is a terrible penalty for systems that heavily rely on Python. And failure to write the cache means every run is fully interpreted and very slow.
But in the proposal there is a cache; it just lives in a different place. Also, you are wrong to believe that Python will not generate the pyc if it is not there: it will, in memory. Python can only execute pyc code, so the speed is exactly the same. The penalty is paid only the first time the pyc is generated.
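The switches involved can be checked with the stock interpreter; both of these exist in CPython today:

    # With bytecode writing disabled the import still works; the module
    # is compiled to bytecode in memory, just again on every run.
    PYTHONDONTWRITEBYTECODE=1 python3 -c 'import xml.dom.minidom'
    python3 -B -c 'import xml.dom.minidom'   # same effect via a flag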
* Generating the bytecode on the system means that we aren't evaluating the code to check whether it actually works for the target Python at build-time. This is a huge problem for ensuring the code is actually compatible with the target version of Python. While it's of course possible to have some things slip by, even with bytecode generation, it's a lot less likely.
Actually no: the pyc will still be generated on OBS when %check happens. The proposal is simply not to include those pyc files in the RPM.
* It makes it much more likely that we'll leave garbage on the filesystem with installs and uninstalls of Python software. That just adds up to unaccounted space being taken up for relatively static content that should be pre-generated and tracked.
That is true, but it is true for any cache management. The cache can be cleaned regularly, or we can think about a new macro that makes sure the pyc for an upgraded module is not there anymore. Or do nothing and delegate it to the normal management of any server.
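A minimal sketch of the "clean it regularly" option, assuming the /var/cache/pycache location from the proposal:

    # Drop cached bytecode that has not been read in 30 days; the
    # interpreter simply regenerates it on the next import.
    find /var/cache/pycache -name '*.pyc' -atime +30 -delete
    find /var/cache/pycache -type d -empty -delete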
OpenMandriva went with this approach for a while, and they're switching back because of these issues, especially as they've become more aggressive about upgrading the Python stack and keeping modules up to date since they switched /usr/bin/python to point to Python 3.
Note that the proposal backports a new feature from 3.8 that lets the pyc files live in a different place. I am not sure how OpenMandriva implemented this feature.
On Thursday 2018-10-04 16:52, Alberto Planas Dominguez wrote:
I also estimated some gains for different scenarios. For example in a normal TW installation:
* Python 2.7 + 3.6
  - pyc/pyo: 127M total
  - py: 109M total

* Python 3.6 only
  - pyc/pyo: 91M total
  - py: 70M total
Or one could remove the py and keep the pyc/pyo. That's basically how GNU C, C++ & Fortran, Erlang, OCaml, etc. all work ;-)
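For what it's worth, the pyc-only layout Jan alludes to is already supported by the stock compileall module (the path below is just an example):

    # -b writes legacy foo.pyc next to foo.py instead of into
    # __pycache__, after which the .py sources could be dropped.
    python3 -m compileall -b /usr/lib/python3.6/site-packages/foo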
On Thursday, October 4, 2018 6:39:41 PM CEST Jan Engelhardt wrote:
On Thursday 2018-10-04 16:52, Alberto Planas Dominguez wrote:
I also estimated some gains for different scenarios. For example in a normal TW installation:
* Python 2.7 + 3.6
- pyc/pyo: 127M total - py: 109M total
* Python 3.6 only
- pyc/pyo: 91M total - py: 70M total
Or one could remove the py and keep the pyc/pyo. That's basically how GNU C, C++ & Fortran, Erlang, OCaml, etc. all work ;-)
I would definitely not do that, as the tracebacks would lack context. Imagine supporting a system that fails without pointing to the source code that generated the error. In my experience, having the .py files makes the debugging experience a lot better.
On 10/4/18 10:52 AM, Alberto Planas Dominguez wrote:
Hi,
As you know, the Python packages include the pyc/pyo precompiled binaries inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim.
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger? I can think of one trade-off: in the cloud, when a new instance is created, the image file is copied, so a smaller image improves the overall instance start-up as there is less data to copy. However, from my experience in GCE, where we at some point built 8GB images and then switched to 10GB images, there was no noticeable difference between the two image sizes w.r.t. the start-up time of an instance.
But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.
Well, especially for cloud-init: at the last couple of get-together events of upstream contributors, start-up time for cloud-init was a big discussion point. A lot of effort has gone into making cloud-init faster. The results of this effort would be eliminated with such a move. The result here is that we would trade a measurable hit, i.e. X seconds slower for cloud-init, for a non-quantified "size goal" with non-quantified benefits.
For Python 2.7 and 3.7 it is possible to remove the pyc code from the system and instruct the interpreter not to recreate the pyc once the code is executed. By default, the Python interpreter compiles and stores the pyc on disk for each `import`, but this behavior can be disabled when we call Python.
But this will make the execution of a big Python stack a bit slow, as the pyc needs to be recreated in memory on each invocation. The slowness can be relevant in some situations, so it is better not to enable this feature.
But in Python 3.8 there is a new feature in place, bpo-33499, that recognizes a new env variable (PYTHONPYCACHEPREFIX) to change the place where __pycache__ is stored [2]. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image.
I chose salt-minion because Saltstack is a relevant Python codebase. I needed to port 150 Python libraries to 3.7 to create the first PoC.
The PoC works properly locally. I still have some bits to publish in the repo, but the general idea seems to work OK. I can also publish the size gain for the ISO with and without the patch, to have more data to compare.
I also estimated some gains for different scenarios. For example in a normal TW installation:
* Python 2.7 + 3.6
  - pyc/pyo: 127M total
  - py: 109M total

* Python 3.6 only
  - pyc/pyo: 91M total
  - py: 70M total
The Python pyc/pyo size is more than the py code size, so we can potentially halve the size of the Python 3 stack.
And we need to consider all the points made when we had this discussion sometime last year; I think at that point it was started by Duncan. Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
Maybe for a normal TW installation the absolute gain is not much (91M).
Well, it is not just the install. We would be penalizing every user with a start-up time penalty to save 91M. Sorry, that appears to me as an optimization for a corner case at the expense of the most common path.
But for other scenarios it can be relevant, like in OpenStack Cloud, where the size of the Python code is big. I made some calculations based on all the different OpenStack services:
* Python 2.7 OpenStack services
  - pyc/pyo: 1.2G total
  - py: 804M total
Saving 1.2G on each node is a more significant number.
See above; w.r.t. the start-up time of an instance, I think we'd have to show that this saving actually makes a difference when the image is copied to start a new instance. Backend storage is very fast these days, and I am not convinced this actually makes a difference.
So, my proposal is to remove the pyc from the Python 3 packages and enable the cache layer on Tumbleweed starting with Python 3.7. I do not know whether to do that by default or only under certain configurations, as I am not sure how to make that feature optional.
Any ideas?
IMHO there are mechanisms for you to do this for the corner cases, i.e. the JeOS and MicroOS image builds. It is very easy with kiwi to run `find / -name '*.pyc' | xargs rm` during the image build stage. This gives you what you are after, a smaller image size, without penalizing everyone else.
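A slightly more complete sketch of that cleanup, e.g. for a kiwi config.sh, since Python 3 keeps its bytecode in __pycache__ directories and there may be .pyo files as well:

    # Strip all byte-compiled Python from the image at build time.
    find / -xdev -name '*.py[co]' -delete
    find / -xdev -type d -name '__pycache__' -prune -exec rm -rf {} +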
Any suggestions?
See above, remove the files during image build for JeOS and MicroOS.
What do you think if I follow this path?
I oppose this path. We'd be penalizing every start-up of every instance in EC2. We have feature requests to improve our boot performance, and this is counteracting our efforts. Also, it'll be uncomfortable to explain why, when someone runs 'systemd-analyze', our cloud-init in EC2 is significantly slower than the same version of cloud-init on other distros.

Later, Robert
On Thu, Oct 04, Robert Schweikert wrote:
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
Ever downloaded an image at every boot via LTE? That's what some of our customers are doing, as they don't want to send technicians out into the wild to do that via USB stick. And in virtualisation environments (not public cloud), disks are no longer cheap, as you have many, many virtual machines. So while for you a jump from 8 GB to 10 GB in the public cloud is no big problem, a lot of customers would like to see us use only 4 GB, as that would allow them to store twice as many virtual machines as today. And the LTE fraction would even like to see images in the 150MB range...
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
The requested goal from our big customers is less than 500MB; else see above. And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.

Thorsten
On 10/4/18 1:33 PM, Thorsten Kukuk wrote:
On Thu, Oct 04, Robert Schweikert wrote:
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
Ever downloaded an image at every boot via LTE? That's what some of our customers are doing, as they don't want to send technicians out into the wild to do that via USB stick.
And in virtualisation environments (not public cloud), disks are no longer cheap, as you have many, many virtual machines. So while for you a jump from 8 GB to 10 GB in the public cloud is no big problem, a lot of customers would like to see us use only 4 GB,
But we can build a 4GB functional openSUSE or SLES image with pyc code included, I'm pretty sure. Our images in the Public Cloud are 10GB at the request of the providers, not because we need the space.
as that would allow them to store twice as many virtual machines as today. And the LTE fraction would even like to see images in the 150MB range...
I see the problem with that request.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
The requested goal from our big customers is less than 500MB; else see above.
And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.
OK, fair enough. Although I do not consider image manipulation via images.sh a hack, I agree that it will cause issues with the update path. Then again, an image that gets downloaded for every boot should get rebuilt for updates rather than being updated in place.

So we have two groups with seemingly conflicting interests that we'll try to make happy without favoring one at the expense of the other. We've been here before ;)

Can we teach rpm to handle this better? I am thinking of something like

  --no-pycache

as an option for install. This would skip the install of pyc/pyo, egg, and other Python artifacts and leave a record in the rpmdb that the package was installed with this option. Then it could be handled properly in the update path, neither installing the byte-compiled files on update nor trying to remove them from the previous package install. Of course there needs to be an option to get the "missing" files back, maybe something like "--add-pycache".

For MicroOS and JeOS image builds this could then be set as an option for the kiwi image build. Well, kiwi would need to learn about the new option, but that is not too terribly difficult. We might need to introduce a macro to mark the files in the rpm package appropriately, maybe something like

  %{_pycache_file}

The concept already exists in rpm (--excludedocs) to piggyback on. But I certainly don't know enough about the state of the rpm community to say whether something like this would fly upstream, or if this is something we'd consider doing on our own. mls?

Something along those lines would meet both of our needs.

Later, Robert
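The existing analog can be tried today; the pycache switches above are hypothetical:

    # What rpm already supports: skip %doc files at install time and
    # record that in the rpmdb.
    rpm -ivh --excludedocs some-package.rpm
    # The proposed analog for bytecode (not an existing rpm option):
    # rpm -ivh --no-pycache python3-foo.rpm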
On Thu, Oct 4, 2018 at 2:36 PM Robert Schweikert <rjschwei@suse.com> wrote:
On 10/4/18 1:33 PM, Thorsten Kukuk wrote:
On Thu, Oct 04, Robert Schweikert wrote:
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
Ever downloaded an image at every boot via LTE? That's what some of our customers are doing, as they don't want to send technicians out into the wild to do that via USB stick.
And in virtualisation environments (not public cloud), disks are no longer cheap, as you have many, many virtual machines. So while for you a jump from 8 GB to 10 GB in the public cloud is no big problem, a lot of customers would like to see us use only 4 GB,
But we can build a 4GB functional openSUSE or SLES image with pyc code included, I'm pretty sure. Our images in the Public Cloud are 10GB at the request of the providers, not because we need the space.
as that would allow them to store twice as many virtual machines as today. And the LTE fraction would even like to see images in the 150MB range...
I see the problem with that request.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
The requested goal from our big customers is less than 500MB; else see above.
And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.
OK, fair enough. Although I do not consider image manipulation via images.sh a hack, I agree that it will cause issues with the update path. Then again, an image that gets downloaded for every boot should get rebuilt for updates rather than being updated in place.
So we have two groups with seemingly conflicting interests that we'll try to make happy without favoring one at the expense of the other. We've been here before ;)
Can we teach rpm to handle this better? I am thinking of something like
--no-pycache
as an option for install. This would skip the install of pyc/pyo, egg, and other Python artifacts and leave a record in the rpmdb that the package was installed with this option. Then it could be handled properly in the update path, neither installing the byte-compiled files on update nor trying to remove them from the previous package install. Of course there needs to be an option to get the "missing" files back, maybe something like "--add-pycache".
For MicroOS and JeOS image builds this could then be set as an option for the kiwi image build. Well, kiwi would need to learn about the new option, but that is not too terribly difficult. We might need to introduce a macro to mark the files in the rpm package appropriately, maybe something like
%{_pycache_file}
The concept already exists in rpm (--excludedocs) to piggyback on. But I certainly don't know enough about the state of the rpm community to say whether something like this would fly upstream, or if this is something we'd consider doing on our own.
There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too. This is already used for the newer upstream debuginfo stuff, too. It should be present now with rpm 4.14.1, and I'm working on moving us to rpm 4.14.2.

-- 真実はいつも一つ!/ Always, there's only one truth!
On 10/4/18 2:41 PM, Neal Gompa wrote:
On Thu, Oct 4, 2018 at 2:36 PM Robert Schweikert <rjschwei@suse.com> wrote:
<snip>
And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.
OK, fair enough. Although I do not consider image manipulation via images.sh a hack, I agree that it will cause issues with the update path. Then again, an image that gets downloaded for every boot should get rebuilt for updates rather than being updated in place.
So we have two groups with seemingly conflicting interests that we'll try to make happy without favoring one at the expense of the other. We've been here before ;)
Can we teach rpm to handle this better? I am thinking of something like
--no-pycache
as an option for install. This would skip the install of pyc/pyo, egg, and other Python artifacts and leave a record in the rpmdb that the package was installed with this option. Then it could be handled properly in the update path, neither installing the byte-compiled files on update nor trying to remove them from the previous package install. Of course there needs to be an option to get the "missing" files back, maybe something like "--add-pycache".
For MicroOS and JeOS image builds this could then be set as an option for the kiwi image build. Well, kiwi would need to learn about the new option, but that is not too terribly difficult. We might need to introduce a macro to mark the files in the rpm package appropriately, maybe something like
%{_pycache_file}
The concept already exists in rpm (--excludedocs) to piggyback on. But I certainly don't know enough about the state of the rpm community to say whether something like this would fly upstream, or if this is something we'd consider doing on our own.
There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too.
This is already used for the newer upstream debuginfo stuff, too. It should be present now with rpm 4.14.1, and I'm working on moving us to rpm 4.14.2.
Sounds like one part of our problem is already solved. Now, getting it back to SLES 12 and SLES 15, that's a different question. And of course fixing up all those Python spec files...

Later, Robert
On Thursday, October 4, 2018 9:07:50 PM CEST Robert Schweikert wrote:
On 10/4/18 2:41 PM, Neal Gompa wrote:
On Thu, Oct 4, 2018 at 2:36 PM Robert Schweikert <rjschwei@suse.com> wrote: <snip> There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too.
This is already used for the newer upstream debuginfo stuff, too. It should be present now with rpm 4.14.1, and I'm working on moving us to rpm 4.14.2.
Sounds like one part of our problem is already solved. Now, getting it back to SLES 12 and SLES 15, that's a different question. And of course fixing up all those Python spec files...
I do not see it for SLES 12 or 15, only for Tumbleweed. I think I failed to make clear that this is a Python 3.7 feature, backported from the still unreleased 3.8. I would not follow the -B path except for very specific use cases that I do not have in mind. The %artifact idea sounds nice to me.
Robert Schweikert schrieb:
On 10/4/18 2:41 PM, Neal Gompa wrote:
[...] There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too.
This is already used for the newer upstream debuginfo stuff, too. It should be present now with rpm 4.14.1, and I'm working on moving us to rpm 4.14.2.
Sounds like one part of our problem is already solved. Now getting it back to SLES 12 and SLES 15 that's a different question. And of course fixing up all those python spec files......
Adding %artifact could be done fully automatically by rpm itself, based on file patterns. AFAICS tagging artifacts is so far hardcoded to debuginfo, though, so there's still work to do: rpmbuild would need to extend e.g. the fileattrs mechanism [1] to also support hooks for %artifact. With that, it would be just a matter of a rebuild. Also, a config option would be needed to have rpm -i/-U automatically use that mode, like %_excludedocs and %_install_langs.

cu Ludwig

[1] /usr/lib/rpm/fileattrs/python.attr
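Purely as a sketch of that direction (none of this exists in rpm today; the file name and the "artifact" flag value are made up):

    # Hypothetical fileattrs entry tagging bytecode as %artifact,
    # dropped next to the existing classifiers such as python.attr.
    cat > /usr/lib/rpm/fileattrs/pycache.attr <<'EOF'
    %__pycache_path .*/__pycache__/.*\.py[co]$
    %__pycache_flags artifact
    EOF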
On Thu, Oct 04, 2018 at 02:41:51PM -0400, Neal Gompa wrote:
There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too.
I don't think so. There's a query filter switch, but not an install filter switch. %artifact was introduced to hide files from queries.

Cheers, Michael.
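For reference, the query side that does exist (the --artifactfiles selector documented for rpm >= 4.14; the package name is just an example, and this is worth double-checking against the rpm on the target system):

    # List only the files marked %artifact in a package; there is no
    # corresponding install-time filter.
    rpm -ql --artifactfiles python3-foo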
On Fri, Oct 5, 2018 at 6:04 AM Michael Schroeder <mls@suse.de> wrote:
On Thu, Oct 04, 2018 at 02:41:51PM -0400, Neal Gompa wrote:
There is a feature in the latest rpm versions: the %artifact attribute, which it would actually make sense to mark *.py[co] files with. And there's an install filter switch for it, too.
I don't think so. There's a query filter switch, but not an install filter switch. %artifact was introduced to hide files from queries.
You're right, my mistake. But it probably wouldn't take much to extend it to that, too.

-- 真実はいつも一つ!/ Always, there's only one truth!
Am 04.10.18 um 20:36 schrieb Robert Schweikert:
Can we teach rpm to handle this better? I am thinking of something like
--no-pycache
as an option for install. This would skip the install of pyc/pyo, egg, and other Python artifacts and leave a record in the rpmdb that the package was installed with this option.

Just mark them as %doc; there's already an option to omit documentation files ;-) The best thing would really be to not ship the sources by default (put them in an extra RPM), but apparently the Python compiler is not good enough to produce easily runnable binaries.
(OT: I'm really happy I never moved away from perl when it comes to scripting languages ;-))
On Thu, 4 Oct 2018, Robert Schweikert wrote:
On 10/4/18 1:33 PM, Thorsten Kukuk wrote:
On Thu, Oct 04, Robert Schweikert wrote:
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
Ever downloaded an image at every boot via LTE? That's what some of our customers are doing, as they don't want to send technicians out into the wild to do that via USB stick.
And in virtualisation environments (not public cloud), disks are no longer cheap, as you have many, many virtual machines. So while for you a jump from 8 GB to 10 GB in the public cloud is no big problem, a lot of customers would like to see us use only 4 GB,
But we can build a 4GB functional openSUSE or SLES image with pyc code included, I'm pretty sure. Our images in the Public Cloud are 10GB at the request of the providers, not because we need the space.
as that would allow them to store twice as many virtual machines as today. And the LTE fraction would even like to see images in the 150MB range...
I see the problem with that request.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
The requested goal from our big customers is less than 500MB; else see above.
And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.
OK, fair enough. Although I do not consider image manipulation via images.sh a hack, I agree that it will cause issues with the update path. Then again, an image that gets downloaded for every boot should get rebuilt for updates rather than being updated in place.
So we have two groups with seemingly conflicting interests that we'll try to make happy without favoring one at the expense of the other. We've been here before ;)
Can we teach rpm to handle this better? I am thinking of something like
--no-pycache
as an option for install. This would skip the install of pyc/pyo, egg, and other Python artifacts and leave a record in the rpmdb that the package was installed with this option. Then it could be handled properly in the update path, neither installing the byte-compiled files on update nor trying to remove them from the previous package install. Of course there needs to be an option to get the "missing" files back, maybe something like "--add-pycache".
For MicroOS and JeOS image builds this could then be set as an option for the kiwi image build. Well, kiwi would need to learn about the new option, but that is not too terribly difficult. We might need to introduce a macro to mark the files in the rpm package appropriately, maybe something like
%{_pycache_file}
The concept already exists in rpm (--excludedocs) to piggyback on. But I certainly don't know enough about the state of the rpm community to say whether something like this would fly upstream, or if this is something we'd consider doing on our own.
mls?
Something along those lines would meet both of our needs.
Well, I suppose first splitting off the .py[co] from the .py into separate sub-packages would make sense. It has already been noted that one can drop the .py if you keep the .py[co], and this would leave the choice open. I guess for dependencies you'd then have:

  $foo-py:  Provides: $foo
  $foo-pyc: Provides: $foo
  $bar:     Requires: $foo

so installing either -py or -pyc resolves the dependency. We would have to somehow prefer one or the other to make our dependency solver happy, I guess; not sure if it supports a systemwide "pattern" like "Prefer: *-py" ;)

Richard.
Am 04.10.18 um 19:33 schrieb Thorsten Kukuk:
And the LTE fraction would even like to see images in the 150MB range...
How about telling them to just avoid Python altogether, then? If I have some Internet-of-Shit thing that boots via LTE, it surely also has a cheap CPU and as little RAM as possible, so I'd better use something that does more work per watt. Python obviously (as can be seen from this thread) is one of the worst possible choices for that task.
And no, we don't want any of your proposed hacks; we are very happy that we were able to remove all of these hacks from building images. They only break RPM and updating the images.
They are booted every time via LTE and then updated with rpm? Come on, get serious; you are not going to believe what you are trying to sell us here, are you? Penalizing all openSUSE users (this is opensuse-factory, after all) and all serious SLES customers for some crazy embedded fringe scenario is -- ahem -- worth discussing.
On Fri, Oct 05, Stefan Seyfried wrote:
They are booted every time via LTE and then updated with rpm? Come on, get serious; you are not going to believe what you are trying to sell us here, are you?
Please read again. What you claim is not what I wrote.

Thorsten
On Thursday 2018-10-04 19:23, Robert Schweikert wrote:
Backend storage is very fast these days
For some definition of "fast".
On Thursday, October 4, 2018 7:23:45 PM CEST Robert Schweikert wrote:
On 10/4/18 10:52 AM, Alberto Planas Dominguez wrote:
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim.
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
I can think of one trade-off: in the cloud, when a new instance is created, the image file is copied, so a smaller image improves the overall instance start-up as there is less data to copy. However, from my experience in GCE, where we at some point built 8GB images and then switched to 10GB images, there was no noticeable difference between the two image sizes w.r.t. the start-up time of an instance.
That is a good example, indeed. The main problem is that the ratio of pyc to py is around 1.3 to 1.5, so for each KB of .py I have to add 1.3 to 1.5 KB of .pyc. We are more than doubling the space. I can see how removing more than half of the size can overall improve the speed of upgrading a system, the size of the rpm and drpm, the download time of JeOS and MicroOS, or the time to upload something to Cinder.
But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.
Well, especially for cloud-init: at the last couple of get-together events of upstream contributors, start-up time for cloud-init was a big discussion point. A lot of effort has gone into making cloud-init faster. The results of this effort would be eliminated with such a move.
I plan to measure this. The first boot can be slower, but I do not have numbers here yet. This argument can indeed be relevant and make the proposal a bad one, but I do not think that the big chunk of time in the cloud-init case goes into pyc generation by far, as there are bigger architectural problems there.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
I have a local one that I am replicating in OBS. It will appear today or this weekend in the repo pointed to in the first email.
Maybe for a normal TW installation the absolute gain is not much (91M).
Well, it is not just the install. We would be penalizing every user with a start-up time penalty to save 91M. Sorry, that appears to me as an optimization for a corner case at the expense of the most common path.
I do not see the penalization, sorry. The proposal is not to wipe out the pyc and use -B when calling the Python code; it is about moving the pyc generation to /var/cache (or some other cache place) and delaying the pyc generation until the first boot. The difference is that the pyc will be there, maybe in a RAM disk, or in a faster fs, or on the same old hard disk as always. If there is a considerable penalty on the first launch, we can think of alternatives, like making the feature optional or prepopulating a subset of the stack.
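A sketch of the prepopulation idea, assuming the PYTHONPYCACHEPREFIX backport is in place (the salt path is just an example):

    # Byte-compile the modules a service needs into the relocated
    # cache ahead of time, so the first launch pays no compile cost.
    PYTHONPYCACHEPREFIX=/var/cache/pycache/root \
        python3 -m compileall -q /usr/lib/python3.6/site-packages/salt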
So, my proposal is to remove the pyc from the Python 3 packages and enable the cache layer on Tumbleweed starting with Python 3.7. I do not know whether to do that by default or only under certain configurations, as I am not sure how to make that feature optional.
Any ideas?
IMHO there are mechanisms for you to do this for the corner cases, i.e. the JeOS and MicroOS image builds. It is very easy with kiwi to run `find / -name '*.pyc' | xargs rm` during the image build stage. This gives you what you are after, a smaller image size, without penalizing everyone else.
Well, this is how I am testing it now.
What do you think if I follow this path?
I oppose this path. We'd be penalizing every start-up of every instance in EC2. We have feature requests to improve our boot performance, and this is counteracting our efforts.
Not true, as the cache will be populated after the first boot. And again, by far the slowest path is not the pyc generation. But I agree that I need to deliver the numbers.
On 10/5/18 4:23 AM, Alberto Planas Dominguez wrote:
On Thursday, October 4, 2018 7:23:45 PM CEST Robert Schweikert wrote:
On 10/4/18 10:52 AM, Alberto Planas Dominguez wrote:
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim.
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
I can think of one trade-off: in the cloud, when a new instance is created, the image file is copied, so a smaller image improves the overall instance start-up as there is less data to copy. However, from my experience in GCE, where we at some point built 8GB images and then switched to 10GB images, there was no noticeable difference between the two image sizes w.r.t. the start-up time of an instance.
That is a good example, indeed. The main problem is that the ratio of pyc to py is around 1.3 to 1.5, so for each KB of .py I have to add 1.3 to 1.5 KB of .pyc. We are more than doubling the space.
I can see how removing more than half of the size can overall improve the speed of upgrading a system, the size of the rpm and drpm, the download time of JeOS and MicroOS, or the time to upload something to Cinder.
Can you please formulate your goals concisely and stick to them? Are we back to discussing side effects? This is confusing.

You started out stating that there is a goal to reduce image sizes for JeOS and MicroOS builds and that Python is a contributing factor to image "bloat". This was seconded by Thorsten, providing a specific use case where we have people asking for 150 MB images.

The image size and the size of the rpm are only tangentially related. For example, an rpm contains docs, which contribute to the rpm size, but these can easily be excluded at install time with the --excludedocs option and thus do not contribute to a measure such as image size. So the relationship you are creating with the download and upgrade example is really a different cup of tea than the goal, as I read it, of the original proposal.

If your goal is to reduce the size of the Python packages, then we probably need a different solution compared to a goal that produces a smaller image size when Python is part of an image.
But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.
Well, especially for cloud-init: at the last couple of get-together events of upstream contributors, start-up time for cloud-init was a big discussion point. A lot of effort has gone into making cloud-init faster. The results of this effort would be eliminated with such a move.
I plan to measure this. The first boot can be slower, but I do not have numbers here yet. This argument can indeed be relevant and make the proposal a bad one, but I do not think that the big chunk of time in the cloud-init case goes into pyc generation by far, as there are bigger architectural problems there.
Well, I think the common agreement is that pyc generation is pretty slow. But let's put some perspective behind that and look at data rather than taking common beliefs as facts.

On a t2.micro instance in AWS, running the stock SUSE SLES 15 BYOS image, the instance was booted (first boot), then the cloud-init cache was cleared with

  # cloud-init clean

then "shutdown -r now", i.e. a soft reboot of the VM.

  # systemd-analyze blame | grep cloud
  6.505s cloud-init-local.service
  1.013s cloud-config.service
  982ms cloud-init.service
  665ms cloud-final.service

All these services are part of cloud-init. Clear the cloud-init cache so it will re-run:

  # cloud-init clean

Clear out all Python artifacts:

  # cd /
  # find . -name '__pycache__' | xargs rm -rf
  # find . -name '*.pyc' | xargs rm
  # find . -name '*.pyo' | xargs rm

This should reasonably approximate the state you are proposing, I think. Reboot:

  # systemd-analyze blame | grep cloud
  7.469s cloud-init-local.service
  1.070s cloud-init.service
  976ms cloud-config.service
  671ms cloud-final.service

So a 13% increase in the runtime of the cloud-init-local service. And this is just a quick and dirty test with a soft reboot of the VM. The numbers would probably be worse with a stop-start cycle; I'll leave that to be disproven by those interested.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS or MicroOS?
I have a local one that I am replicating in OBS. It will appear today or this weekend in the repo pointed to in the first email.
Maybe for a normal TW installation the absolute gain is not much (91M).
Well, it is not just the install. We would be penalizing every user with a start-up time penalty to save 91M. Sorry, that appears to me as an optimization for a corner case at the expense of the most common path.
I do not see the penalization, sorry.
Well, I'd say the penalty is shown above: 13% in one particular example. This, or worse, would hit our users every time they start a new instance in AWS, GCE, Azure, OpenStack, ...
The proposal is not to wipe out pyc
The way I read your proposal, it was to eliminate the py{c,o} from the packages, i.e. we have to byte-compile when any Python module is used.
and use -B when calling the Python code; it is about moving the pyc generation to /var/cache (or some other cache place) and delaying the pyc generation until the first boot.
OK, this part of the statement seems in line with my understanding of your proposal. You say "first boot": are you implying a process that does a system-wide byte compilation of all installed Python code? That would probably add a rather large time penalty to the first boot and is not going to work for us in the Public Cloud, but data would have to be collected on such a process to make a real decision. Or do you mean "module load" when you say "first boot", i.e. the byte compilation takes place when a Python module is loaded for the first time? The effect of this is shown in the example above. I have an issue with a 13% drop in performance for every user on initial start-up. This is penalizing the majority to cover one specific use case; sorry, it is hard for me to see this any other way.
The difference is that the pyc will be there, maybe in a RAM disk, or in a faster fs, or on the same old hard disk as always.
If there is a considerable penalty on the first launch, we can think of alternatives, like making the feature optional or prepopulating a subset of the stack.
So, my proposal is to remove the pyc from the Python 3 packages and enable the cache layer on Tumbleweed starting with Python 3.7. I do not know whether to do that by default or only under certain configurations, as I am not sure how to make that feature optional.
Any ideas?
IMHO there are mechanisms for you to do this for the corner cases, i.e. the JeOS and MicroOS image builds. It is very easy with kiwi to run `find / -name '*.pyc' | xargs rm` during the image build stage. This gives you what you are after, a smaller image size, without penalizing everyone else.
Well, this is how I am testing it now.
What do you think if I follow this path?
I oppose this path. We'd be penalizing every start-up of every instance in EC2. We have feature requests to improve our boot performance, and this is counteracting our efforts.
Not true, as the cache will be populated after the first boot.
How is my statement not true? Above you state that the cache is filled on "first boot". I hope we can all agree that byte compilation is not a zero-time operation; therefore, there is a time penalty in the boot process in some way, shape, or form. Increasing the boot time is counter to our efforts to reduce the boot time of our cloud images.
And again, by far the slowest path is not the pyc generation. But I agree that I need to deliver the numbers.
The execution-time numbers in my crude test above show that pyc generation is not the slowest part of cloud-init, but a >10% slowdown is not insignificant. I think fiddling at the Python package level is not the best approach to solve the problem, unless of course making the Python packages smaller is your primary goal.

Later, Robert
On Sat, Oct 06, Robert Schweikert wrote:
Can you please formulate your goals concisely and stick to them? Are we back to discussing side effects? This is confusing.
The goal is the same as with every iteration of this discussion: Python is too fat, and Python 3 even more so. A lot of people don't like that, for various different reasons. As I already wrote: size matters again today, for various reasons and use cases, so we need a solution for this.

We don't discuss or talk here about a single use case with a special solution; we talk here about a generic problem, and the solutions are quite different. I don't think we can solve the problem with one change; we need to tackle it from various directions, and we are doing that already. Alberto's suggestion is only one of several things currently going on. Some people rewrite the Python scripts in Go or bash; others plan to drop them. Like Red Hat, which according to two presentations last week plans to drop cloud-init, as it is too fat and slow, and either revive the Go implementation from CoreOS or replace it with Ignition.

Thorsten
On Sat, Oct 6, 2018 at 12:39 PM Thorsten Kukuk <kukuk@suse.de> wrote:
On Sat, Oct 06, Robert Schweikert wrote:
Can you please formulate your goals concisely and stick to them? Are we back to discussing side effects? This is confusing.
The goal is the same as with every iteration of this discussion: Python is too fat, and Python 3 even more so. A lot of people don't like that, for various different reasons. As I already wrote: size matters again today, for various reasons and use cases, so we need a solution for this.
It may be worth talking to upstream RPM about making %artifact files optionally filtered out at install time and auto-marking *.py[co] files as artifacts.
We don't discuss or talk here about a single use case with a special solution; we talk here about a generic problem, and the solutions are quite different. I don't think we can solve the problem with one change; we need to tackle it from various directions, and we are doing that already. Alberto's suggestion is only one of several things currently going on. Some people rewrite the Python scripts in Go or bash; others plan to drop them. Like Red Hat, which according to two presentations last week plans to drop cloud-init, as it is too fat and slow, and either revive the Go implementation from CoreOS or replace it with Ignition.
Within Fedora, the current effort is around extending Ignition to support everything we use cloud-init for in Fedora instance configuration. Currently it's scoped to Fedora CoreOS, but there are plans to support the main Fedora Cloud Edition, too. If there's some interest in this for openSUSE, I can port the Ignition packaging over from Fedora to openSUSE for people to poke at.

-- 真実はいつも一つ!/ Always, there's only one truth!
On Donnerstag, 4. Oktober 2018 16:52:13 CEST Alberto Planas Dominguez wrote:
Hi,
As you know, the Python packages include the pyc/pyo precompiled binaries inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim. But we want to include a bit of Python in there, like salt-minion or cloud-init, and there the relative size of Python becomes evident.
Some different ideas for reducing the footprint:

1. Make sure your stack is completely migrated to Python 3.
2. Make sure you ship just a single optimization level [1].
3. Enable FS-level compression, e.g. with btrfs.

For (2), on TW python3-base e.g. ships three bytecode files per source file: unoptimized, level 1, and level 2:

---
$> du -b /usr/lib64/python3.6/xml/dom/{,__pycache__}/minidom*
66819 /usr/lib64/python3.6/xml/dom//minidom.py
55738 /usr/lib64/python3.6/xml/dom/__pycache__/minidom.cpython-36.opt-1.pyc
54164 /usr/lib64/python3.6/xml/dom/__pycache__/minidom.cpython-36.opt-2.pyc
55840 /usr/lib64/python3.6/xml/dom/__pycache__/minidom.cpython-36.pyc
---

For Python, optimized just means stripping "assert"s (opt-1) plus stripping docstrings (opt-2). As long as you don't execute Python with the "-O" option, explicitly or as part of the shebang, the opt-1 and opt-2 files are completely ignored. You can verify this e.g. with:

$> strace -efile -o'|grep minidom.' python3 -O -c 'import xml.dom.minidom'
stat("/usr/lib64/python3.6/xml/dom/minidom.py", {st_mode=S_IFREG|0644, st_size=66819, ...}) = 0
openat(AT_FDCWD, "/usr/lib64/python3.6/xml/dom/__pycache__/minidom.cpython-36.opt-1.pyc", O_RDONLY|O_CLOEXEC) = 3

For (3), compressing with zstd level 6, Python sources can be compressed by about a factor of three, and bytecode still by more than two.

Kind regards,
Stefan
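Acting on (2) at image build time could be as blunt as this, using the naming visible in the du output above:

    # Keep only the unoptimized bytecode; the opt-1/opt-2 variants are
    # dead weight unless the interpreter runs with -O/-OO.
    find /usr/lib64/python3.6 -name '*.opt-[12].pyc' -delete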
* Alberto Planas Dominguez <aplanas@suse.de> [Oct 04. 2018 16:52]:
Hi,
As you know, the Python packages include the pyc/pyo precompiled binaries inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing.
Oh, deja-vu: https://hackweek.suse.com/projects/minimal-salt-packaging

Klaus
On Friday, October 5, 2018 8:25:09 AM CEST Klaus Kaempf wrote:
* Alberto Planas Dominguez <aplanas@suse.de> [Oct 04. 2018 16:52]:
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing.
Oh, deja-vu: https://hackweek.suse.com/projects/minimal-salt-packaging
Removing all the .py files will make the system a bit hard to debug, as the tracebacks will not point to the source code lines, and you lose the advantage of inspecting and changing the code in place. Cool project.
Am 05.10.18 um 09:59 schrieb Alberto Planas Dominguez:
Removing all the .py files will make the system a bit hard to debug, as the tracebacks will not point to the source code lines, and you lose the advantage of inspecting and changing the code in place.
The same argument applies to C, C++, whatever code. Just install python-foobarbaz-debugsource.rpm
Participants (12): Alberto Planas Dominguez, Stefan Brüns, Jan Engelhardt, Klaus Kaempf, Ludwig Nussel, Michael Schroeder, Michael Ströder, Neal Gompa, Richard Biener, Robert Schweikert, Stefan Seyfried, Thorsten Kukuk