On Thu, Oct 4, 2018 at 10:52 AM Alberto Planas Dominguez <aplanas@suse.de> wrote:
Hi,
As you know, Python packages ship the precompiled pyc/pyo bytecode inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
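As background, here is a minimal sketch of where that precompiled bytecode lives. The standard `compileall` module, roughly what a package build runs over the installed tree, writes one `.pyc` per module into a `__pycache__` directory next to the source, tagged with the interpreter version. The package `demo_pkg` and its contents are made up for illustration, and the sketch assumes no `PYTHONPYCACHEPREFIX` redirection is active.

```python
from __future__ import annotations

import compileall
import pathlib
import tempfile


def byte_compile_tree(root: pathlib.Path) -> list[pathlib.Path]:
    """Byte-compile every .py file under root and return the .pyc files written."""
    # quiet=1 prints only errors; compile_dir returns a false value on failure.
    if not compileall.compile_dir(str(root), quiet=1):
        raise RuntimeError(f"byte-compilation failed under {root}")
    return sorted(root.rglob("*.pyc"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = pathlib.Path(tmp) / "demo_pkg"  # hypothetical package for the demo
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "util.py").write_text("def answer():\n    return 42\n")
        for pyc in byte_compile_tree(pkg):
            # e.g. demo_pkg/__pycache__/util.cpython-37.pyc on Python 3.7
            print(pyc.relative_to(tmp))
```

These per-module files are what a package carries in addition to the sources, which is the size cost discussed below.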
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technology) that aim to be small and slim. But we want to include a bit of Python in there, like salt-minion or cloud-init, and that is where the relative size of Python becomes evident.
For Python 2.7 and 3.7 it is possible to remove the pyc files from the system and instruct the interpreter not to recreate them when the code is executed. By default, the Python interpreter compiles and stores the pyc on disk for each `import`, but this behavior can be disabled when we call Python.
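In practice the switch is the interpreter's `-B` option or the `PYTHONDONTWRITEBYTECODE` environment variable, both of which set `sys.dont_write_bytecode`. The self-contained check below imports a throwaway module with and without the flag and reports whether a `.pyc` appeared. The module name and layout are illustrative only, and it assumes no `PYTHONPYCACHEPREFIX` redirection is active.

```python
import importlib
import pathlib
import sys
import tempfile


def import_writes_pyc(dont_write_bytecode: bool) -> bool:
    """Import a throwaway module and report whether a .pyc landed next to it.

    Setting sys.dont_write_bytecode at runtime has the same effect as starting
    the interpreter with -B or with PYTHONDONTWRITEBYTECODE set.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = pathlib.Path(tmp)
        (src_dir / "throwaway_mod.py").write_text("VALUE = 1\n")

        saved_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = dont_write_bytecode
        sys.path.insert(0, str(src_dir))
        try:
            importlib.invalidate_caches()  # let the import system see the new file
            importlib.import_module("throwaway_mod")
        finally:
            # Undo every global change so repeated calls are independent.
            sys.path.remove(str(src_dir))
            sys.modules.pop("throwaway_mod", None)
            sys.dont_write_bytecode = saved_flag

        # Check before the temporary directory is deleted.
        return any(src_dir.rglob("*.pyc"))


if __name__ == "__main__":
    print("default import writes a .pyc:", import_writes_pyc(False))
    print("with -B semantics:", import_writes_pyc(True))
```

The flag only stops the write: the source is still compiled in memory on every import, which is where the start-up cost comes from.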
But this makes the initial execution of a big Python stack slower, as the bytecode needs to be regenerated in memory on every invocation. The slowdown can be relevant in some situations, so it is better not to enable this feature.
But Python 3.8 has a new feature in place, bpo-33499, that recognizes a new environment variable (PYTHONPYCACHEPREFIX) that changes where the __pycache__ bytecode is stored [2]. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I created a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python code from the image.
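A rough way to see the prefix in action, assuming Python 3.8+ (where bpo-33499 landed) rather than the backported 3.7 build: the sketch launches the current interpreter with `PYTHONPYCACHEPREFIX` set, much as a wrapper in front of the real binary would, and checks that the bytecode ends up under the prefix instead of in a `__pycache__` directory beside the source. It is not the shim from the PoC; `prefix_for`, `run_with_pycache_prefix` and the temporary directories are hypothetical, and only the `/var/cache/pycache/<username>` layout comes from the message.

```python
import os
import pathlib
import subprocess
import sys
import tempfile

# Per-user layout taken from the message; adjust the base for other setups.
DEFAULT_BASE = "/var/cache/pycache"


def prefix_for(username: str, base: str = DEFAULT_BASE) -> str:
    """Return the per-user bytecode cache directory, e.g. /var/cache/pycache/alice."""
    return os.path.join(base, username)


def run_with_pycache_prefix(args, prefix, cwd=None):
    """Run the current interpreter with PYTHONPYCACHEPREFIX set to prefix.

    Needs Python 3.8+ (or a build carrying the bpo-33499 backport); older
    interpreters ignore the variable and fall back to __pycache__ directories.
    """
    env = dict(os.environ, PYTHONPYCACHEPREFIX=str(prefix))
    env.pop("PYTHONDONTWRITEBYTECODE", None)  # we want bytecode to be written
    return subprocess.run([sys.executable, *args], env=env, cwd=cwd,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as cache:
        (pathlib.Path(src) / "demo_mod.py").write_text("X = 1\n")
        run_with_pycache_prefix(["-c", "import demo_mod"], cache, cwd=src)
        print("under prefix:", sorted(p.name for p in pathlib.Path(cache).rglob("*.pyc")))
        print("next to source:", sorted(p.name for p in pathlib.Path(src).rglob("*.pyc")))
```

Under the prefix, the bytecode mirrors the source's absolute directory path (without a `__pycache__` component), and the per-user subdirectory keeps each account writing to its own tree.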
I chose salt-minion because SaltStack is a relevant Python codebase. To create the first PoC I needed to port 150 Python libraries to 3.7.
The PoC works properly locally. I still have some bits that I need to publish in the repo, but the general idea seems to work. I can also publish the ISO size with and without the patch, to have more data to compare.
I've heard variations of this theme for almost a decade now. There are three major problems with this:

* Python is _very_ slow without the cache, and generating the cache is a slow operation. This is a terrible penalty for systems that heavily rely on Python. And failure to write the cache means every run is fully interpreted and very slow.
* Generating the bytecode on the system means that we aren't evaluating the code to check whether it actually works for the target Python at build-time. This is a huge problem for ensuring the code is actually compatible with the target version of Python. While it's of course possible to have some things slip by, even with bytecode generation, it's a lot less likely.
* It makes it much more likely that we'll leave garbage on the filesystem with installs and uninstalls of Python software. That just adds up to unaccounted space being taken up for relatively static content that should be pre-generated and tracked.

OpenMandriva went with this approach for a while, and they're switching back because of these issues, especially as they've become more aggressive about upgrading the Python stack and keeping modules up to date since they switched /usr/bin/python to point to Python 3.

--
真実はいつも一つ!/ Always, there's only one truth!