
On Saturday, October 6, 2018 11:24:46 AM CEST Robert Schweikert wrote:
On 10/5/18 4:23 AM, Alberto Planas Dominguez wrote:
[...]
Can you please formulate your goals concisely and stick to them? Are we back to discussing side effects? This is confusing.
Uhmm. Is it too hard to understand that the proposal is to remove the pyc files, because they are doubling the size of the Python stack? Doing this has some benefits and some drawbacks. The good parts are related to smaller RPMs, less disk usage and the implications of that; the bad parts are maybe related to security and to a penalty the first time the Python code runs. If those side effects confuse you, I am not sure how to reach an informed decision without analyzing them.

In any case, let me be clear: my goal is to decrease the size of the Python stack, and my proposal is to remove the pyc from the initial install, backporting a feature from 3.8 so that the pyc can live in a different file system. The backported code is this one:

https://build.opensuse.org/package/view_file/home:aplanas:Images/python3/bpo-33499_Add_PYTHONPYCACHEPREFIX_env_var_for_alt_bytecode.patch?expand=1

I tested it and it works.

[...]
If your goal is to reduce the size of the Python packages then we probably need a different solution compared to a goal that produces a smaller image size when Python is part of an image.
I am open to reading about other alternatives to make the Python stack smaller. I can see only two: removing the pyc (and delegating their creation to a different file system during the first execution), and analyzing all the Requirements to be sure that no unneeded subtrees are installed. IMHO both are needed, but my proposal was only about how to use a new feature from 3.8 to achieve a good compromise of speed / size when the pycs are removed from the RPM.
But we want to include a bit of Python in there, like salt-minion or cloud-init. And now the relative size of Python is evident.
Well, especially for cloud-init: at the last couple of get-togethers of upstream contributors, start up time for cloud-init was a big discussion point. A lot of effort has gone into making cloud-init faster. The results of this effort would be eliminated by such a move.
I plan to measure this. The first boot can be slower, but I do not have numbers yet. This argument can indeed be relevant and make the proposal a bad one, but I do not think that the big chunk of time in the cloud-init case goes into pyc generation; there are bigger architectural problems there.
Well I think the common agreement is that pyc generation is pretty slow.
Citation needed. My tests on my machine give a compilation speed of 6.08 MB/s. I tested it by installing Django with Python 3.6 in a venv and doing this:

# To avoid measuring the dir crawling
# find . -name "*.py" > LIST
# time python -m compileall -f -qq -i LIST
real    0m1.406s
user    0m1.257s
sys     0m0.148s
# du -hsb
44812156        .
# find . -name "__pycache__" -exec rm -rf {} \;
# du -hsb
35888321        .

(44812156 - 35888321) / 1.4 ~= 6.08 MB/s
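If somebody wants to reproduce the number without the shell juggling, this is a rough equivalent in Python. The site-packages path is only a placeholder for the Django venv above, and unlike the LIST trick it times the directory walk too, so the result will be slightly pessimistic:

import compileall
import os
import time

def pyc_bytes(root):
    # Sum the sizes of all .pyc files below root.
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".pyc"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

# Placeholder path: point it at the venv's site-packages used above.
root = "venv/lib/python3.6/site-packages"

start = time.time()
compileall.compile_dir(root, force=True, quiet=2)   # same as compileall -f -qq
elapsed = time.time() - start

print("%.2f MB/s" % (pyc_bytes(root) / elapsed / (1024 * 1024)))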
But let's put some perspective behind that and look at data rather than taking common beliefs as facts.
On a t2.micro instance in AWS, running the stock SUSE SLES 15 BYOS image, the instance was booted (first boot), then the cloud-init cache was cleared with
# cloud-init clean
then shutdown -r now, i.e. a soft reboot of the VM.
# systemd-analyze blame | grep cloud
  6.505s cloud-init-local.service
  1.013s cloud-config.service
   982ms cloud-init.service
   665ms cloud-final.service
All these services are part of cloud-init.
Clear the cloud-init cache so it will re-run:
# cloud-init clean
Clear out all Python artifacts:
# cd /
# find . -name '__pycache__' | xargs rm -rf
# find . -name '*.pyc' | xargs rm
# find . -name '*.pyo' | xargs rm
This should reasonably approximate the state you are proposing, I think. Reboot:
# systemd-analyze blame | grep cloud
  7.469s cloud-init-local.service
  1.070s cloud-init.service
   976ms cloud-config.service
   671ms cloud-final.service
So a 13% increase in the runtime of the cloud-init-local service. And this is just a quick and dirty test with a soft reboot of the VM. Numbers would probably be worse with a stop-start cycle. I'll leave that to be disproven by those interested.
This is a very nice contribution to the discussion. I tested it in engcloud and I see a 9.3% overhead during boot. It spends 0.205s creating the initial pyc files needed for cloud-init:

* With pyc in place
# systemd-analyze blame | grep cloud
  1.985s cloud-init-local.service
  1.176s cloud-init.service
   609ms cloud-config.service
   531ms cloud-final.service

* Without pyc in place
# systemd-analyze blame | grep cloud
  2.190s cloud-init-local.service
  1.165s cloud-init.service
   844ms cloud-config.service
   528ms cloud-final.service

The sad thing is that the __real__ first boot is a bit worse:

* First boot, with pyc in place
# systemd-analyze blame | grep cloud
 36.494s cloud-init.service
  2.673s cloud-init-local.service
  1.420s cloud-config.service
   730ms cloud-final.service

Compared to this real first boot, the pyc generation cost represents 0.54% for cloud-init (not in relation to the total boot time). We can ignore it, as I guess that the images used for EC2 have some tweaks to avoid the file system resize, or some other magic that makes the first boot more similar to the second boot.

Once the pycs are generated they will be reused, so the 0.205s penalty is amortized over the second and subsequent boots. We still store the pyc in /var/cache. In any case, 0.205s is not that big compared to the 15.187s total boot time that this instance has for each new reboot, as the boot time is dominated by other factors such as wicked and other services.

The image is still in engcloud; it is an SLE 12 SP3 under the name 'aplanas-test'. Feel free to access it (send me your public key to have ssh access there) to double check my data.
Well it is not just the install. We would be penalizing every user with a start up time penalty to save 91M; sorry, that appears to me as an optimization for a corner case at the expense of the most common path.
I do not see the penalization, sorry.
Well I'd say the penalty is shown above, 13% in one particular example. This or worse would hit our users every time they start a new instance in AWS, GCE, Azure, OpenStack, ...
The cost is amortized, and the corner case, IMHO, is more yours than mine. Your case is a fresh boot of a just-installed EC2 VM. I agree that there is a penalty of ~10% (or 0.54% in my SLE12 SP3 OpenStack case), but this is only for that first boot. And booting a just-created VM is hardly the normal use case.
The proposal is not to wipe out pyc
The way I read your proposal was to eliminate py{c,o} from the packages, i.e. we have to byte-compile when any Python module is used.
and use -B when calling the Python code; it is about moving the pyc generation to /var/cache (or some other cache place) and delaying it until the first boot.
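To make the mechanism concrete, this is roughly what the backported pycache prefix does. A minimal sketch that needs a 3.8 interpreter (or one with the patch applied); /var/cache/pycache is only an example location, and the exact cache tag in the comments depends on the interpreter version:

import importlib.util
import sys

src = "/usr/lib/python3.6/site-packages/django/apps/config.py"

# Without a prefix the bytecode lands next to the source, doubling the tree:
print(importlib.util.cache_from_source(src))
# -> .../django/apps/__pycache__/config.cpython-XY.pyc

# With the pycache prefix set, the bytecode is redirected to a separate
# cache tree that mirrors the source layout:
sys.pycache_prefix = "/var/cache/pycache"   # example location
print(importlib.util.cache_from_source(src))
# -> /var/cache/pycache/usr/lib/python3.6/site-packages/django/apps/config.cpython-XY.pyc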
OK, this part of the statement seems in line with my understanding of your proposal.
You say "first-boot" are you implying a process that does a system-wide byte compilation of all installed Python code?
No, the first time that the Python code runs. This strategy has good results, as not all the Python code is loaded when a service is running. For example, in the Django venv scenario from this email, the initial size of the venv was 57MB; after removing all the pyc from site-packages I had a venv of 45MB. If I create a new Django application (with database access, models and views) I have a venv of 47MB, so only 2MB of pyc are generated during run time, as I propose. You still save 10MB of space without sacrificing run-time speed.
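A quick way to see this "only what is imported gets compiled" effect, again assuming a 3.8 or patched interpreter; the cache directory and the json module are only stand-ins:

import importlib
import pathlib
import sys

# Point the bytecode cache at an empty directory (stand-in for /var/cache).
sys.pycache_prefix = "/tmp/pycache-demo"

def pyc_count():
    # Count the .pyc files that exist below the cache prefix.
    prefix = pathlib.Path(sys.pycache_prefix)
    return sum(1 for _ in prefix.rglob("*.pyc")) if prefix.exists() else 0

print("pyc files before import:", pyc_count())   # nothing compiled yet
importlib.import_module("json")                  # compiles only json and its deps
print("pyc files after import: ", pyc_count())   # a handful, not the whole stack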
Or do you mean "module load" when you say "first boot", i.e. the byte-compilation takes place when a Python module is loaded for the first time? The effect of this is shown in the above example. I have an issue with a 13% drop in performance for every user on initial start up.
This is penalizing the majority to cover one specific use case. Sorry, it is hard for me to see this any other way.
Again, booting fresh VMs is hardly the majority case here. [...]
What do you think if I follow this path?
I oppose this path. We'd be penalizing every start up of every instance of EC2. We have feature requests to improve our boot performance and this is counteracting our efforts.
Not true, as the cache will be populated after the first boot.
How is my statement not true?
How is maintaining a /var/cache going to penalize every start up of every instance of EC2? That is not true, again. You are populating /var/cache with the modules used the first time; subsequent boots will not be penalized. This is an amortization case, so in the end there is no penalty.
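The amortization is easy to check from the command line. A rough sketch, again assuming PYTHONPYCACHEPREFIX is available (3.8 or the patched interpreter), with json as a stand-in for whatever modules a service imports:

import os
import subprocess
import sys
import tempfile
import time

# Point an interpreter at an empty bytecode cache, as a first boot would see it.
cache = tempfile.mkdtemp(prefix="pycache-")
env = dict(os.environ, PYTHONPYCACHEPREFIX=cache)

def timed_import(module):
    # Run "python -c 'import <module>'" in a fresh process and time it.
    start = time.time()
    subprocess.run([sys.executable, "-c", "import " + module], env=env, check=True)
    return time.time() - start

print("cold import: %.3fs" % timed_import("json"))   # pays the byte-compilation once
print("warm import: %.3fs" % timed_import("json"))   # reuses the pyc from the cache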
And again, by far the slowest path is not the pyc generation. But I agree that I need to deliver the numbers.
The execution-time numbers in my crude test above show that pyc generation is not the slowest part for cloud-init, but a >10% slowdown is not insignificant.
I do not want to impose any sub-optimal technical solution that hurts openSUSE and the Python developers, but this argument of yours is not complete. For the normal qcow2 image generated by kiwi, the penalty on SLE12 SP3 is ~0.5% for cloud-init, not of the total boot time. Fixing the resize issue turns the same 0.205s invested in generating the pyc into ~10% of the time that cloud-init needs to start the service. In this case the cloud-init time is around 2s, while the total boot time is about 15.2s.
I think fiddling at the Python package level is not the best approach to solve the problem, unless of course making the Python packages smaller is your primary goal.
I agree that this needs to be addressed too.