On 10/4/18 10:52 AM, Alberto Planas Dominguez wrote:
As you know, the Python packages ship the precompiled pyc/pyo bytecode files inside the RPM. This is mostly a good idea, as it makes the first execution of the Python code faster: the stage where the interpreter compiles the .py code is skipped.
But this also makes the Python stack a bit fat. Most of the time this is not a problem, but things are changing. We have JeOS and MicroOS, both minimal images (built with different goals and technologies) that aim to be small and slim.
Can you share the definition of "small and slim"? What is the target size we want to get to, and why does it matter if the image is a "bit" bigger?
I can think of one trade-off: in the Cloud, when a new instance is created the image file is copied, so a smaller image improves the overall instance start up as there is less data to copy. However, from my experience in GCE, where we at some point built 8 GB images and then switched to 10 GB images, there was no noticeable difference between the two image sizes w.r.t. start up time of an instance.
But we want to include a bit of Python in there, like
cloud-init. And now the relative size of Python is evident.
Well, start up time was a big discussion point for cloud-init at the last couple of get-togethers of upstream contributors. A lot of effort has gone into making cloud-init faster, and the results of this effort would be eliminated with such a move. We would be trading a measurable hit, i.e. X seconds slower for cloud-init, for a non-quantified "size goal" with no measurable benefit.
For Python 2.7 and 3.7 it is possible to remove the pyc files from the system and instruct the interpreter to avoid recreating them once the code is executed. The Python interpreter, by default, compiles and stores the pyc on disk for each `import`, but this behavior can be disabled when we call the interpreter with the `-B` option or set the PYTHONDONTWRITEBYTECODE environment variable.
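Concretely, the pyc write can be suppressed per invocation with the interpreter's `-B` option (equivalent to setting PYTHONDONTWRITEBYTECODE). A minimal check, using a throwaway module in a temp directory:

```shell
#!/bin/sh
# Import a module normally, then with -B, and see whether __pycache__ appears.
tmp=$(mktemp -d)
echo 'x = 1' > "$tmp/mod.py"

# Default behavior: the interpreter writes the pyc next to the source.
( cd "$tmp" && python3 -c 'import mod' )
if test -d "$tmp/__pycache__"; then wrote_default=yes; else wrote_default=no; fi
rm -rf "$tmp/__pycache__"

# With -B the bytecode is still produced, but only in memory.
( cd "$tmp" && python3 -B -c 'import mod' )
if test -d "$tmp/__pycache__"; then wrote_b=yes; else wrote_b=no; fi

echo "pyc written by default: $wrote_default, with -B: $wrote_b"
rm -rf "$tmp"
```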
But this will make the initial execution of a big Python stack a bit slower, as the pyc needs to be recreated in memory for each invocation. The slowness can be relevant in some situations, so it is better not to enable this feature.
But Python 3.8 has a new feature, bpo-33499, which recognizes a new environment variable (PYTHONPYCACHEPREFIX) that changes the place where __pycache__ is stored. I backported this feature to 3.7 and created a JeOS image that includes salt-minion. I wrote a small shim that replaces the python3.7 binary to enable this cache prefix feature, pointing it to /var/cache/pycache/<username>, and I removed all the compiled Python files from the image.
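The mechanism such a shim relies on can be sketched with a stock python3 (3.8 or later, or a 3.7 carrying the backport); the shim itself would just export the variable and exec the real binary. The temp paths here are only for illustration:

```shell
#!/bin/sh
# With PYTHONPYCACHEPREFIX set, the interpreter mirrors the __pycache__ tree
# under the prefix instead of writing next to the sources. A shim in front of
# the real interpreter would simply export this variable and exec the binary.
tmp=$(mktemp -d)
echo 'x = 1' > "$tmp/mod.py"

PYTHONPYCACHEPREFIX="$tmp/pycache" python3 -c "
import sys; sys.path.insert(0, '$tmp'); import mod"

test -d "$tmp/pycache" && under_prefix=yes || under_prefix=no
test -d "$tmp/__pycache__" && beside_source=yes || beside_source=no
echo "pyc under prefix: $under_prefix, beside source: $beside_source"
rm -rf "$tmp"
```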
I chose salt-minion because SaltStack is a relevant Python codebase. I needed to port 150 Python libraries to 3.7 to create the first PoC.
The PoC works properly locally. I still have some bits that I need to publish in the repo, but the general idea seems to work OK. I can also publish the size gain for the ISO with and without the patch, to have more data to compare.
I also estimated the gains for different scenarios. For example, in a normal TW installation:
* Python 2.7 + 3.6
- pyc/pyo: 127M total
- py: 109M total
* Python 3.6 only
- pyc/pyo: 91M total
- py: 70M total
The pyc/pyo files take more space than the py code itself, so we can potentially halve the size of the Python 3 stack.
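For reference, numbers of this kind can be reproduced with a quick tally over an installed tree (the default path below is an assumption; adjust it to the Python version at hand; GNU find is assumed for -printf):

```shell
#!/bin/sh
# Sum the sizes of source vs. compiled files under a Python directory.
size_of() {
    find "$1" -name "$2" -type f -printf '%s\n' 2>/dev/null |
        awk '{ s += $1 } END { printf "%d", s }'
}

pydir=${1:-/usr/lib/python3.6}
py_bytes=$(size_of "$pydir" '*.py')
pyc_bytes=$(size_of "$pydir" '*.pyc')
echo "py: $py_bytes bytes, pyc: $pyc_bytes bytes"
```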
And we need to consider all the points made when we had this discussion sometime last year; I think at that point it was started by Duncan.
Having these numbers is interesting, but what is the goal of the image you want to build, and what is the benefit of the smaller size for JeOS?
Maybe for a normal TW installation the absolute gain is not much (91M).
Well, it is not just the install. We would be penalizing every user with a start up time penalty to save 91M; sorry, that appears to me as an optimization for a corner case at the expense of the most common path.
But for other scenarios the gain can be relevant, like in OpenStack Cloud, where the size of the Python code is big. I made some calculations based on all the different OpenStack services:
* Python 2.7 OpenStack services
- pyc/pyo: 1.2G total
- py: 804M total
Saving 1.2G on each node is a more significant number.
See above, w.r.t. start up time of an instance I think we'd have to show
that this saving actually makes a difference when the image is copied to
start a new instance. Backend storage is very fast these days and I am
not convinced this actually makes a difference.
So, my proposal is to remove the pyc files from the Python 3 packages and enable the cache layer on Tumbleweed starting with Python 3.7. I do not know whether to do that by default or only under certain configurations, as I am not sure how that feature should be exposed.
IMHO there are mechanisms for you to do this for the corner cases, i.e. the JeOS and MicroOS image builds. It is very easy with kiwi to run "find / -name '*.pyc' | xargs rm" during the image build stage. This gives you what you are after, a smaller image size, without penalizing everyone else.
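A slightly more careful variant of that cleanup (a plain `find | xargs rm` breaks on paths with spaces and leaves empty `__pycache__` directories behind) could look like the following sketch, demonstrated here on a throwaway tree rather than a real image root:

```shell
#!/bin/sh
# Build a miniature image tree, then strip compiled Python files from it,
# the way an image-build hook (e.g. kiwi's config.sh) could.
root=$(mktemp -d)
mkdir -p "$root/usr/lib/python3.6/__pycache__"
touch "$root/usr/lib/python3.6/mod.py" \
      "$root/usr/lib/python3.6/__pycache__/mod.cpython-36.pyc" \
      "$root/usr/lib/python3.6/legacy.pyc"

# Delete pyc/pyo files, then the emptied __pycache__ directories;
# the .py sources are left untouched.
find "$root" \( -name '*.pyc' -o -name '*.pyo' \) -type f -delete
find "$root" -depth -type d -name '__pycache__' -exec rm -rf {} +

remaining=$(find "$root" -type f)
echo "files left: $remaining"
```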
See above, remove the files during image build for JeOS and MicroOS.
What do you think if I follow this path?
I oppose this path. We would be penalizing every start up of every instance in EC2. We have feature requests to improve our boot performance, and this is counteracting our efforts.
Also, it will be uncomfortable to explain why, when someone runs 'systemd-analyze', our cloud-init in EC2 is significantly slower than the same version of cloud-init on other distros.
Robert Schweikert MAY THE SOURCE BE WITH YOU
Distinguished Architect LINUX
Team Lead Public Cloud