Hello all,
For use in LibreOffice, Chromium and others I've created a macro that
should allow you to limit build jobs based on constraints you can set
later on in the spec file, to avoid OOM crashes.
The usage is pretty straightforward (once it is accepted in
Tumbleweed):
===
BuildRequires: memory-constraints
%build
# require 2GB mem per thread
%limit_build -m 2000
make %{?_smp_mflags}
===
Here the %_smp_mflags value on an 8GB machine would be 4, while the
default is the number of cores (let's say 16)...
Both the %jobs and %_smp_mflags macros are overridden, so the
integration should be really painless if you need to do something like
this.
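I don't know the macro's internals, so purely as an illustration, the job limit it computes presumably looks something like this shell sketch (the variable names and exact logic are my guesses, not the actual macro):

```shell
# Hypothetical sketch of what %limit_build -m 2000 might compute:
# one job per 2000MB of RAM, capped at the core count, at least 1.
mem_per_job_mb=2000
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
cores=$(nproc)
jobs=$(( total_mb / mem_per_job_mb ))
if [ "$jobs" -gt "$cores" ]; then jobs=$cores; fi
if [ "$jobs" -lt 1 ]; then jobs=1; fi
echo "-j$jobs"
```

On an 8GB, 16-core machine this prints -j4, matching the example above.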
Tom
Hi!
RPM/DPKG uses a different version ordering than Python. In particular,
Python has a different logic for pre-releases. While DPKG/RPM uses the
tilde for lowering version numbers meaning that RC versions have to be
constructed as something like "1.0.0~rc2", Python uses "1.0.0rc2".
This means that, for Python, "1.0.0rc2" is a lower version than "1.0.0",
while for DPKG/RPM, "1.0.0rc2" is actually higher than "1.0.0".
Since lots of Python packages with the rc-suffix exist, I assume there
is a consensus in openSUSE on how to map Python RC versions to RPM RC
versions, isn't there?
What's the suggested strategy? Are there any example packages?
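I don't know of a blessed openSUSE helper for this, but the usual mapping is mechanical: rewrite the PEP 440 pre-release suffix into RPM's tilde form at packaging time. A sketch (the function name is mine):

```python
import re

def pep440_to_rpm(version: str) -> str:
    """Rewrite a PEP 440 pre-release suffix (a/b/rc) into RPM tilde form.

    "1.0.0rc2" sorts higher than "1.0.0" for RPM, but "1.0.0~rc2" sorts
    lower, which matches the PEP 440 ordering intent.
    """
    return re.sub(r"(a|b|rc)(\d+)$", r"~\1\2", version)

print(pep440_to_rpm("1.0.0rc2"))  # -> 1.0.0~rc2
print(pep440_to_rpm("1.0.0"))     # -> 1.0.0 (unchanged)
```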
Adrian
> [1] https://www.python.org/dev/peps/pep-0440/#pre-releases
--
To unsubscribe, e-mail: opensuse-packaging+unsubscribe(a)opensuse.org
To contact the owner, e-mail: opensuse-packaging+owner(a)opensuse.org
Hi,
I have a python package (GooCalendar) that needs python-goocanvas. For python2
everything is fine.
For python3, the 'Requires' in GooCalendar asks for python-goocanvas, but
during install it wants python3-goocanvas, which does not exist.
In fact, python-goocanvas provides a binary 'goocanvasmodule.so' in
/usr/lib64/python2.7/site-packages, but no further Python files, so it is
basically Python-version independent.
How could we best make it usable for py2 and py3? (Singlespec does not
cover this as far as I could see.)
Thanks
Axel
Hi,
I tried to build a python3 package, which installs a desktop file and menu
entry.
The build went technically fine, but during the trial-installation for the
postun check it failed with
[ 38s] /var/tmp/rpm-tmp.cUzmQ7: line 3: fg: no job control
I was told that macros like %desktop_database_postun [1] are deprecated.
If I comment %post and %postun out, it builds fine.
Can anyone confirm the deprecation of the desktop macros?
Thanks
Axel
[1] https://en.opensuse.org/openSUSE:Packaging_Conventions_RPM_Macros#.25desktop_database_post_.2F_.25desktop_database_postun
Hi!
I am currently packaging a small application which is provided by the German
government for use with the new German ID card, which provides online
functionality. The project can be found at [1]. I used the "keepassx"
package as a template.
Currently, the build fails with:
[ 544s] ERROR: Icon file not installed: /home/abuild/rpmbuild/BUILDROOT/AusweisApp2-1.14.3-2.1.x86_64//usr/share/applications/AusweisApp2.desktop (AusweisApp2)
[ 544s] WARNING: Empty GenericName: /home/abuild/rpmbuild/BUILDROOT/AusweisApp2-1.14.3-2.1.x86_64//usr/share/applications/AusweisApp2.desktop
[ 544s] Errors in installed desktop file detected. Please refer to http://en.opensuse.org/SUSE_Package_Conventions/RPM_Macros
[ 544s] error: Bad exit status from /var/tmp/rpm-tmp.qw1ZMJ (%install)
Which indicates that there is a problem with the desktop category and icon file, the
desktop file template being:
[Desktop Entry]
Version=1.0
Type=Application
Exec=@CMAKE_INSTALL_PREFIX@/bin/AusweisApp2
Icon=AusweisApp2
StartupNotify=true
Terminal=false
Categories=Network;Utility
Keywords=nPA,eID,eAT,Personalausweis,Aufenthaltstitel,Identity,Card
Name=AusweisApp2
Looking at the guidelines in [2], it's obvious that the category "Network;Utility" is
not allowed. However, I don't really know what other category from the list in [2]
would fit for an application for online ID verification.
As for the icon, it seems to be missing from the upstream tarball. I will fix
that later.
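For what it's worth, a sketch of a corrected entry might look like this. This is my guess at a fitting category, not a verdict; if I read the guidelines right, they want one main category, optionally followed by additional ones, with the list terminated by a semicolon, and a non-empty GenericName silences the warning:

```ini
[Desktop Entry]
Type=Application
Name=AusweisApp2
GenericName=Online ID verification
Exec=@CMAKE_INSTALL_PREFIX@/bin/AusweisApp2
Icon=AusweisApp2
StartupNotify=true
Terminal=false
Categories=Utility;Security;
```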
Adrian
> [1] https://build.opensuse.org/package/show/home:glaubitz:branches:security/Aus…
> [2] https://en.opensuse.org/openSUSE:Packaging_desktop_menu_categories
Hi,
due to lack of time, I'd like to give away my maintainership of
two packages: git and mercurial. More precisely, it's the
"Bugowner" field that needs to be re-assigned.
Since both are commonly used programs, I hope someone will take them over
so I can avoid submitting a deletereq for them :)
Thanks!
Takashi
Hi,
it seems that KDE:Qt:5.11 has no users set, especially no bugowner ...
which makes the "report bug" link on OBS go away.
I'm pretty sure that is not on purpose, so can someone please set proper
maintainers and bugowners for all those KDE subprojects?
cheers
MH
On 10/9/18 4:04 AM, Alberto Planas Dominguez wrote:
> On Monday, October 8, 2018 7:17:08 PM CEST Robert Schweikert wrote:
>> On 10/8/18 12:07 PM, Alberto Planas Dominguez wrote:
>>> [Dropping a very unproductive content]
>>
<snip>
>>> It is my understanding that CPU cost is cheap relative to network
>>> transfer and storage. Can we measure the savings of network and storage
>>> here?
>> Not in the Public Cloud, there is no data. Network data into the
>> framework is always free and the size of the root volumes in our images
>> is already 10GB (30GB in Azure) and thus offers up ample space for the
>> pyc files. Meaning there is no gain if the actual disk space used by the
>> packages we install is smaller and we have more empty space in the 10GB
>> (30GB in Azure) image.
>
> So uploading ISO / qcow2 images and storing them for a long period is free?
No, that is not free. However, I think there is some intermingling of
concepts occurring. Let me try and explain by example.
A user starts an instance from a SUSE provided image. The size of the
image is 10GB in EC2 and GCE. The customer has no cost for storage of
the root volume or any network data transferred into the instance during
update, i.e. "zypper up" does not generate any cost for the customer
even if the customer pulls packages from OBS or SCC. The cost is for the
instance, i.e. CPU and memory cost. Also given that Public Cloud
frameworks generally have a really good network connection and most
instances have a decent network speed the size of the rpm that is being
pulled in is not much of a concern.
This is the case I had in mind when I provided the example calculation
of how an "on first use" pyc generation strategy can cost a user money.
In this second example the user generates their own image and uploads
it. In this case the user is responsible for the cost of the root volume
storage. In AWS for ssd based storage this amounts to $0.1 per GB per
month. So if a user can save 1 GB of storage because there are no pyc
files in their image and they make their image exactly the size they
need, they stand to save 10 cents a month, or $1.20 over
the course of a year.
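In numbers, a trivial check of the figures above:

```python
price_per_gb_month = 0.10  # USD per GB-month for ssd (gp2) storage, as quoted
saved_gb = 1.0             # space saved by dropping the pyc files

monthly_saving = saved_gb * price_per_gb_month
yearly_saving = monthly_saving * 12
print(round(monthly_saving, 2))  # 0.1 -> 10 cents a month
print(round(yearly_saving, 2))   # 1.2 -> $1.20 a year
```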
If I go back to the example with 2000 test instances, from my point of
view, it is apparent that the cost savings potential by optimizing for
size are significantly smaller when compared to the cost increase
potential due to increased CPU use.
So this scenario, IMHO, would also favor the use case where we can, in
some way, shape or form, have images that contain the pyc files.
>
> [...]
>>> Or better, the user can add a `python -m compileall` in the kiwi
>>> config.sh,
>>> that will populate /var/cache for the cloud images only.
>>
>> Well, I could turn around and state that this is a "hack" and "...we
>> don't want any of your proposed hacks.." in our image creation process.
>> Hopefully sounds familiar.... ;)
>
> Indeed, the hack is breaking the content of the RPM by removing files from
> the places where zypper put them (/usr). There is no hack in making
> /var/cache hot before the service runs :-)
>
>> Anyway, on a more serious note, if we can resolve the security concerns
>> and properly handle the upgrade mechanism while not generating multiple
>> packages I am not categorically opposed to such an addition in our
>> Public Cloud image builds.
>
> Thanks!
>
> I see the security problem, indeed. I will work to provide an answer to this
> problem and propose it here.
>
> Meanwhile the image from [1] is providing some of the pieces that I talked
> about in the first email. This is the image that I am using for another
> project related to Salt, but as of today:
>
> * Python 3.7 is installed (in TW we still have 3.6)
> * Python 3.7 contains the patch from 3.8 to enable storage of pycache in a
> different file system
> * I added two shim loaders (written in shell), that replace python3.7 and
> python3.7m, to enable the 3.8 feature
> * As a hack, I removed all the pycache from the ISO (I know, I know ...), so
> everything is generated on demand under /var/cache
>
> [1] https://build.opensuse.org/project/monitor/home:aplanas:Images?arch_x86_64=1&defaults=0&repo_images=1&succeeded=1
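For reference, the 3.8 feature in question is driven by the PYTHONPYCACHEPREFIX environment variable; with a patched (or >= 3.8) interpreter it can be exercised like this (the cache path here is just an example):

```shell
# Redirect all bytecode caches to a writable location outside /usr.
export PYTHONPYCACHEPREFIX=/var/cache/pycache
# Python >= 3.8 exposes the active prefix as sys.pycache_prefix.
python3 -c 'import sys; print(sys.pycache_prefix)'  # prints /var/cache/pycache
```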
>
Over the next couple of days I will try to find the time to generate a
SLES 15 image and clear the py{c,o} and __pycache__ and get it uploaded
to EC2 so we can get a number for "cold start" for cloud-init.
Later,
Robert
--
Robert Schweikert MAY THE SOURCE BE WITH YOU
Distinguished Architect LINUX
Team Lead Public Cloud
rjschwei(a)suse.com
IRC: robjo
On 10/8/18 12:07 PM, Alberto Planas Dominguez wrote:
> On Monday, October 8, 2018 4:55:33 PM CEST Robert Schweikert wrote:
>> On 10/8/18 7:35 AM, Alberto Planas Dominguez wrote:
>>> On Saturday, October 6, 2018 11:24:46 AM CEST Robert Schweikert wrote:
>>>> On 10/5/18 4:23 AM, Alberto Planas Dominguez wrote:
>
> [Dropping a very unproductive content]
I am really trying hard to leave out "color commentary" and adjectives,
it would be nice to see the effort reciprocated.
>
>>> In any case let me be clear: my goal is to decrease the size of the Python
>>> stack, and my proposal is removing the pyc from the initial first install,
>>> backporting a feature from 3.8 to have the pyc in a different file system.
>>
>> OK, this is different from the original e-mail where it was implied that
>> "image size" was the primary target.
>
> That was my initial motivator, yes. But the goal, I hope, is clear now with my
> previous paragraph.
>
>>> My tests give in my machine a 6.08MB/s of compilation speed. I tested it
>>> installing django with python 3.6 in a venv and doing this:
>>>
>>> # To avoid measure the dir crawling
>>> # find . -name "*.py" > LIST
>>> # time python -m compileall -f -qq -i LIST
>>>
>>> real 0m1.406s
>>> user 0m1.257s
>>> sys 0m0.148s
>>>
>>> # du -hsb
>>> 44812156 .
>>>
>>> # find . -name "__pycache__" -exec rm -rf {} \;
>>> # du -hsb
>>> 35888321 .
>>>
>>> (44812156 - 35888321) / 1.4 ~= 6.08 MB/s
>>>
>>>> But let's put some perspective behind that and look at data rather than
>>>> taking common beliefs as facts.
>>>>
>>>> On a t2.micro instance in AWS, running the SUSE stock SLES 15 BYOS
>>>> image. The instance was booted (first boot), then the cloud-init cache
>>>> was cleared with
>>>>
>>>> # cloud-init clean
>>>>
>>>> then shutdown -r now, i.e. a soft reboot of the VM.
>>>>
>>>> # systemd-analyze blame | grep cloud
>>>>
>>>> 6.505s cloud-init-local.service
>>>> 1.013s cloud-config.service
>>>>
>>>> 982ms cloud-init.service
>>>> 665ms cloud-final.service
>>>>
>>>> All these services are part of cloud-init
>>>>
>>>> Clear the cloud-init cache so it will re-run
>>>> # cloud-init clean
>>>>
>>>> Clear out all Python artifacts:
>>>>
>>>> # cd /
>>>> # find . -name '__pycache__' | xargs rm -rf
>>>> # find . -name '*.pyc' | xargs rm
>>>> # find . -name '*.pyo' | xargs rm
>>>>
>>>> This should reasonably approximate the state you are proposing, I think.
>>>> Reboot:
>>>>
>>>> # systemd-analyze blame | grep cloud
>>>>
>>>> 7.469s cloud-init-local.service
>>>> 1.070s cloud-init.service
>>>>
>>>> 976ms cloud-config.service
>>>> 671ms cloud-final.service
>>>>
>>>> so a 13% increase for the runtime of the cloud-init-local service. And
>>>> this is just a quick and dirty test with a soft reboot of the VM. Numbers
>>>> would probably be worse with a stop-start cycle. I'll leave that to be
>>>> disproven by those interested.
>>>
>>> This is a very nice contribution to the discussion.
>>>
>>> I tested it in engcloud and I see a 9.3% overhead during the boot. It
>>> spent 0.205s to create the initial pycs needed for cloud-init:
>
>> That would not be sufficient, all pycs in the dependency tree would need
>> to be generated, you cannot just measure the creation of the cloud-init
>> pyc files. cloud-init is going to be one of, if not the first Python
>> processes running in the boot sequence, which implies that no pyc files
>> exist for the cloud-init dependencies.
>
> Of course it is enough for the argument. In fact it is a critical part of the
> discussion: you delegate the pyc creation to when they are needed, and once
> created they will be stored in the cache.
>
> When cloud-init is loaded, Python will read all the `import`s and the required
> subtree of pyc will be generated before the execution of _any_ Python code.
> You are not compiling only the pyc from cloud-init, but for all the
> dependencies that are required.
>
> Unless there is some lazy load in cloud-init based on something like
> stevedore (which I do not see), or it is full of `import`s inside functions
> and methods, the pyc generation of the required subtree will be the first
> thing that Python will do.
I think we are in agreement; the problem appears to be that we have a
different idea about what
"create the initials pyc needed for cloud-init:"
means. For me this arrived as what you stated, i.e. "pyc needed for
cloud-init", which says nothing about dependencies. For you this
statement appears to imply that the pyc files for the dependencies were
also generated in this test.
More explicit and concise communication would certainly help.
>
>>> * With pyc in place
>>>
>>> # systemd-analyze blame | grep cloud
>>>
>>> 1.985s cloud-init-local.service
>>> 1.176s cloud-init.service
>>>
>>> 609ms cloud-config.service
>>> 531ms cloud-final.service
>>>
>>> * Without pyc in place
>>>
>>> # systemd-analyze blame | grep cloud
>>>
>>> 2.190s cloud-init-local.service
>>> 1.165s cloud-init.service
>>>
>>> 844ms cloud-config.service
>>> 528ms cloud-final.service
>>>
>>> The sad thing is that the __real__ first boot is a bit worse:
>>>
>>> * First boot. with pyc in place
>>>
>>> # systemd-analyze blame | grep cloud
>>>
>>> 36.494s cloud-init.service
>>>
>>> 2.673s cloud-init-local.service
>>> 1.420s cloud-config.service
>>>
>>> 730ms cloud-final.service
>>>
>>> Comparing to this real first boot, the pyc cost generation represent the
>>> 0.54% for cloud-init (not in relation with the total boot time). We can
>>> ignore it, as I guess that the images used for EC2 will have some tweaks
>>> to avoid the file system resize, or some other magic that makes the boot
>>> more similar to the second boot.
>>
>> First boot execution of cloud-init is also significantly slower in the
>> Public Cloud. However, not as bad as in your example.
>
> If this is the case, this first boot is the one that will generate the pyc,
> not the later one.
>
>> In any case your
>> second comparison appears to be making a leap that I, at this point, do not
>> agree with. You are equating the generation of pyc code in a "hot
>> system" to the time it takes to load everything in a "cold system". A
>> calculation of percentage contribution of pyc creation in a "cold
>> system" would only be valid if that scenario were tested. Which we have
>> not done, but would certainly not be too difficult to test.
>
> I do not get the point. In the end we measured the proportion of the time
> Python spends generating the pycs for cloud-init and all the dependencies
> needed for the service, in relation to the overall time that cloud-init
> spends during the initialization of the service.
>
> I am not sure what you mean by hot and cold here, as I removed all the pyc
> from site-packages to have a measure of the relation of the generation of the
> pyc to the time that cloud-init uses to start the service.
OK, maybe I can explain this better. We agree that there is a
significant difference in the cloud-init execution time between initial
start up of the VM vs. a reboot, even if the cloud-init cache (not the
pyc files) is cleared. This implies that something is working behind the
scenes to our advantage and makes a VM reboot faster w.r.t. cloud-init
execution when compared to the start-a-new-instance scenario. Given that
we do not know what this "makes it work faster" part is, we should not
draw any conclusion that the pyc build will take equally as short/long
of a time on initial start up as it takes in a "reboot the VM" scenario.
This will have to be tested.
>
>>> The cost is amortized, and the corner case, IMHO, is more yours than mine.
>>> Your case is a fresh boot of a just installed EC2 VM. I agree that there
>>> is a penalty of ~10% (or a 0.54% in my SLE12 SP3 OpenStack case), but
>>> this is only for this first boot.
>>
>> Which is a problem for those users that start a lot of instances to
>> throw them away and start new instances the next time they are needed.
>> This would be a typical autoscaling use case or a typical test use case.
>
> Correct. The 0.205s will be added for each new fresh VM. Am I correct to
> assume that this is also a scenario where the resize in the initial boot is
> happening? If so, the overall impact is much less than the 10% that we are
> talking about, and closer to the 0.5% that I measured in OpenStack.
The data I presented as an example was generated with a 10GB image size
for the creation of an instance with a 10GB root volume size. So there
is a, what should be a negligible, contribution from the growpart
script, which is called by cloud-init and runs in a subprocess.
I say "negligible" in this case as growpart will exit very fast if
no resizing is required. It still takes time for process start-up etc.,
but again I consider this negligible.
But you are correct, by increasing the time it takes for other things
cloud-init calls, such as root volume resize, one can decrease the
percentage of time allocated to pyc creation.
If I would want to take this to an extreme I could start a process
during user data processing that runs for several minutes and thus I
could make an argument that pyc creation takes almost no time. However,
that would be misleading.
I think in an effort to arrive as close as reasonably possible at the
"real cost" of the pyc generation for the cloud-init example, we should
minimize the externally executed processes by cloud-init, such as
minimizing the runtime for growpart, which is done by not manipulating
the instance root volume size as compared to the image size.
The other thing that comes into play when comparing different frameworks
is that different modules get loaded by cloud-init. Also, if we do not
have the exact same config file, that would cause differences, as
cloud-init only loads the configuration modules that are needed for the
given configuration, a certain "lazy load" mechanism.
>
>> It is relatively easy to calculate a cost for this with some estimates.
>> If my test for my application needs 2000 (an arbitrary number I picked)
>> test instances, and every test instance takes 0.2 seconds longer to boot,
>> to use your number, then the total time penalty is ~6.7 minutes. If this
>> test uses an instance type that costs me $10 per hour the slowdown
>> costs me ~$1.1 every time I run my test. So if the test case runs once
>> a week it would amount to ~$57 per year.
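The arithmetic behind this estimate, spelled out (note that 2000 × 0.2 s is about 6.7 minutes of extra instance time per run, and a weekly run compounds over a year):

```python
instances = 2000             # arbitrary test-fleet size from the example
extra_s_per_instance = 0.2   # per-instance pyc-generation penalty
usd_per_hour = 10.0          # example instance price
runs_per_year = 52           # the test runs once a week

extra_total_s = instances * extra_s_per_instance        # 400 s
cost_per_run = extra_total_s / 3600 * usd_per_hour
print(round(extra_total_s / 60, 1))            # 6.7   minutes per run
print(round(cost_per_run, 2))                  # 1.11  USD per run
print(round(cost_per_run * runs_per_year, 2))  # 57.78 USD per year
```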
>
> Imagine the cost of the resize in the kiwi operation; it must be around some
> thousands of dollars.
>
> But you are right. If there is a weekly re-scaling of 2000 instances during
> the 52 weeks of a year, you can measure the cost of the pyc generation.
>
> It is my understanding that CPU cost is cheap relative to network
> transfer and storage. Can we measure the savings of network and storage here?
Not in the Public Cloud, there is no data. Network data into the
framework is always free and the size of the root volumes in our images
is already 10GB (30GB in Azure) and thus offers up ample space for the
pyc files. Meaning there is no gain if the actual disk space used by the
packages we install is smaller and we have more empty space in the 10GB
(30GB in Azure) image.
> I have the feeling that it is more than $57 per year.
Nope, the cost for this to the user in the Public Cloud is 0; see above.
> We are trading CPU
> for network and storage savings, which with those 2000 instances per week
> over a full year will also be measurable.
>
>>>> This is penalizing the majority to cover one specific use case. Sorry it
>>>> is hard for me to see this any other way.
>>>
>>> Again, booting fresh VMs is hardly the majority case here.
>>
>> That is an assumption on your part, from my perspective.
>
> Do you really think that TW and Leap are optimized for boot speed in a cloud
> scenario?
No, but whatever we put into TW inevitably ends up in SLE and there it
does matter.
> If the majority of users are launching VMs all day and night, I vote
> to optimize this use case before anything else.
>
>> If we pursue the approach of multiple packages, as suggested in one or
>> two messages in this thread, then we could build Public Cloud images
>> with pyc files included.
>
> Or better, the user can add a `python -m compileall` in the kiwi config.sh,
> that will populate /var/cache for the cloud images only.
Well, I could turn around and state that this is a "hack" and "...we
don't want any of your proposed hacks.." in our image creation process.
Hopefully sounds familiar.... ;)
Anyway, on a more serious note, if we can resolve the security concerns
and properly handle the upgrade mechanism while not generating multiple
packages I am not categorically opposed to such an addition in our
Public Cloud image builds.
>
> I think that we need a productive argumentation here.
And I thought we were having that for the most part. (My contribution to
color commentary)
> All engineering
> decisions are based on a trade-off. Sometimes the trade-off does not pay for
> itself, but I really think that this is not the case here. Or at least your
> arguments so far do not point in this direction.
Sorry, I am not following you here, are you dismissing the example of
paid CPU for large test cases as invalid?
If that is the case, then yes we are not having a productive discussion :( .
>
> If / when the use case is such that the trade-off between space and CPU is
> so critical (a scenario that is not the one that you described, I am sorry),
> the opportunities for optimization are in a different place. And we need to
> address those too.
>
> In a classical use case the savings in space in the different places are
> justified IMHO, and the amortization of the pyc generation will be justified
> in less than 2 seconds after the service is running for the first time.
>
> In a cloud use case the user can boot the image, prepare it (avoiding the
> penalty of the resize step and other details), update it and save it as a
> volume that can be cloned those 2000 times, completely avoiding any
> penalization for the pyc generation too.
>
This is not the predominant use case in the Public Cloud, based on
information we have from our partners. I would appreciate it if, rather
than proposing solutions about "how to change people's habits", which
almost never work, we'd discuss solutions that address the concerns
that have been raised along the way.
Later,
Robert
--
Robert Schweikert MAY THE SOURCE BE WITH YOU
Distinguished Architect LINUX
Team Lead Public Cloud
rjschwei(a)suse.com
IRC: robjo
On 10/8/18 7:35 AM, Alberto Planas Dominguez wrote:
> On Saturday, October 6, 2018 11:24:46 AM CEST Robert Schweikert wrote:
>> On 10/5/18 4:23 AM, Alberto Planas Dominguez wrote:
>
> [...]
>
>> Can you please formulate your goals concisely and stick to it? Are we
>> back to discussing side effects? This is confusing.
>
> Uhmm. Is it too hard to understand that the proposal is to remove the pycs,
> because they are doubling the size of the Python stack?
No this has nothing to do with the proposal. Paraphrasing:
You started out by stating that the problem you are trying to solve is
the size of images, namely JeOS and MicroOS, at least that is the way I
read the initial mail. Then you proposed that one way to do this is to
drop pyc from the packages.
And I am not denying that this may be one approach. However, this is not
the only way to get there, as this discussion has shown. Further this
discussion has shown that the relationship between rpm size and image
size is tangential.
Along the way you have drifted to pushing "rpm size" as an important
topic rather than sticking to the original problem definition of "image
size" as the problem to be solved.
> Doing this has some
> benefits, and also some drawbacks. The good parts are related to less size
> used in RPMs and on disk and the implications of this, and the bad parts are
> maybe related to security and a penalty the first time the Python code runs.
>
> If those side effects confuse you I am not sure how to achieve an informed
> decision without analyzing those.
The side effects are not confusing; I am trying to figure out which of
the problems, image size or rpm size, is the more important one for you
to solve.
>
> In any case let me be clear: my goal is to decrease the size of the Python
> stack, and my proposal is removing the pyc from the initial first install,
> backporting a feature from 3.8 to have the pyc in a different file system.
OK, this is different from the original e-mail where it was implied that
"image size" was the primary target.
Thus marking things as %artifact in RPM wouldn't really help with this
goal as the package size would remain the same or might grow slightly as
there would be more metadata in the package.
>
> The backported code is this one:
>
> https://build.opensuse.org/package/view_file/home:aplanas:Images/python3/
> bpo-33499_Add_PYTHONPYCACHEPREFIX_env_var_for_alt_bytecode.patch?expand=1
>
> I tested it and it works.
>
> [...]
>
>> If your goal is to reduce the size of the Python packages then we
>> probably need a different solution compared to a goal that produces a
>> smaller image size when Python is part of an image.
>
> I am open to reading about other alternatives to make the Python stack
> smaller. I can see only two: removing the pycs (and delegating their
> creation to a different file system during the first execution), and
> analyzing all the Requires to be sure that no unneeded extra subtrees
> are installed.
>
> IMHO both are needed, but my proposal was only about how to use a new feature
> from 3.8 to achieve a good compromise of speed / size when the pycs are
> removed from the RPM
>
>>>>> But we want to include a bit of Python in there, like salt-minion or
>>>>> cloud-init. And now the relative size of Python is evident.
>>>>
>>>> Well, especially for cloud-init at the last couple of get together
>>>> events of upstream contributors start up time for cloud-init was a big
>>>> discussion point. A lot of effort has gone into making cloud-init
>>>> faster. The results of this effort would be eliminated with such a move.
>>>
>>> I plan to measure this. The first boot can be slower, but I am still not
>>> able to have numbers here. This argument can be indeed relevant and make
>>> the proposal a bad one, but by far I do not think that the big chunk of
>>> time goes under the pyc generation in the cloud-init case, as there are
>>> more architectural problems in that.
>>
>> Well I think the common agreement is that pyc generation is pretty slow.
>
> Citation needed.
You are reading this thread, right? At least 2 other people that are
part of the discussion mentioned this as a concern.
But OK, I should have used "sentiment" rather than "agreement" in my
statement.
>
> My tests give in my machine a 6.08MB/s of compilation speed. I tested it
> installing django with python 3.6 in a venv and doing this:
>
> # To avoid measure the dir crawling
> # find . -name "*.py" > LIST
> # time python -m compileall -f -qq -i LIST
>
> real 0m1.406s
> user 0m1.257s
> sys 0m0.148s
>
> # du -hsb
> 44812156 .
>
> # find . -name "__pycache__" -exec rm -rf {} \;
> # du -hsb
> 35888321 .
>
> (44812156 - 35888321) / 1.4 ~= 6.08 MB/s
>
>> But let's put some perspective behind that and look at data rather than
>> taking common beliefs as facts.
>>
>> On a t2.micro instance in AWS, running the SUSE stock SLES 15 BYOS
>> image. The instance was booted (first boot), then the cloud-init cache
>> was cleared with
>>
>> # cloud-init clean
>>
>> then shutdown -r now, i.e. a soft reboot of the VM.
>>
>> # systemd-analyze blame | grep cloud
>> 6.505s cloud-init-local.service
>> 1.013s cloud-config.service
>> 982ms cloud-init.service
>> 665ms cloud-final.service
>>
>> All these services are part of cloud-init
>>
>> Clear the cloud-init cache so it will re-run
>> # cloud-init clean
>>
>> Clear out all Python artifacts:
>>
>> # cd /
>> # find . -name '__pycache__' | xargs rm -rf
>> # find . -name '*.pyc' | xargs rm
>> # find . -name '*.pyo' | xargs rm
>>
>> This should reasonably approximate the state you are proposing, I think.
>> Reboot:
>>
>> # systemd-analyze blame | grep cloud
>> 7.469s cloud-init-local.service
>> 1.070s cloud-init.service
>> 976ms cloud-config.service
>> 671ms cloud-final.service
>>
>> so a 13% increase for the runtime of the cloud-init-local service. And
>> this is just a quick and dirty test with a soft reboot of the VM. Numbers
>> would probably be worse with a stop-start cycle. I'll leave that to be
>> disproven by those interested.
>
> This is a very nice contribution to the discussion.
>
> I tested it in engcloud and I see a 9.3% overhead during the boot. It
> spent 0.205s to create the initial pycs needed for cloud-init:
That would not be sufficient, all pycs in the dependency tree would need
to be generated, you cannot just measure the creation of the cloud-init
pyc files. cloud-init is going to be one of, if not the first Python
processes running in the boot sequence, which implies that no pyc files
exist for the cloud-init dependencies.
>
> * With pyc in place
>
> # systemd-analyze blame | grep cloud
> 1.985s cloud-init-local.service
> 1.176s cloud-init.service
> 609ms cloud-config.service
> 531ms cloud-final.service
>
> * Without pyc in place
>
> # systemd-analyze blame | grep cloud
> 2.190s cloud-init-local.service
> 1.165s cloud-init.service
> 844ms cloud-config.service
> 528ms cloud-final.service
>
> The sad thing is that the __real__ first boot is a bit worse:
>
> * First boot. with pyc in place
>
> # systemd-analyze blame | grep cloud
> 36.494s cloud-init.service
> 2.673s cloud-init-local.service
> 1.420s cloud-config.service
> 730ms cloud-final.service
>
> Compared to this real first boot, the pyc generation cost represents 0.54%
> for cloud-init (not in relation to the total boot time). We can ignore it,
> as I guess that the images used for EC2 will have some tweaks to avoid the
> file system resize, or some other magic that makes the boot more similar to
> the second boot.
First boot execution of cloud-init is also significantly slower in the
Public Cloud, although not as bad as in your example. In any case, your
second comparison appears to be making a leap that I, at this point, do
not agree with. You are equating the generation of pyc code in a "hot
system" to the time it takes to load everything in a "cold system". A
calculation of the percentage contribution of pyc creation in a "cold
system" would only be valid if that scenario were tested, which we have
not done, but it would certainly not be too difficult to test.
>
> Once the pycs are generated they will be reused, so the 0.205s of penalty are
> amortized in the second and subsequent boots. We still store the pyc in /var/
> cache.
>
> In any case, 0.205s is not so big against the 15.187s total boot time that
> this instance has for each new reboot, as the boot time is dominated by
> other factors such as wicked and other services.
>
> The image is still in engcloud; it is an SLE 12 SP3 under the name
> 'aplanas-test'. Feel free to access it (send me your public key to get ssh
> access) to double-check my data.
>
>>>> Well it is not just the install. We would be penalizing every user with
>>>> a start up time penalty to save 91M, sorry that appears to me as an
>>>> optimization for the corner case at the expense of the most common path.
>>>
>>> I do not see the penalization, sorry.
>>
>> Well I'd say the penalty is shown above, 13% in one particular example.
>> This or worse would hit our users every time they start a new instance
>> in AWS, GCE, Azure, OpenStack,.....
>
> The cost is amortized, and the corner case, IMHO, is more yours than mine.
> Your case is a fresh boot of a just installed EC2 VM. I agree that there is a
> penalty of ~10% (or a 0.54% in my SLE12 SP3 OpenStack case), but this is only
> for this first boot.
Which is a problem for those users that start a lot of instances to
throw them away and start new instances the next time they are needed.
This would be a typical autoscaling use case or a typical test use case.
It is relatively easy to calculate a cost for this with some estimates.
If my test for my application needs 2000 (an arbitrary number I picked)
test instances, and every test instance takes 0.2 seconds longer to
boot, to use your number, then the total time penalty is ~400 seconds,
or about 6.7 minutes. If this test uses an instance type that costs me
$10 per hour, the slowdown costs me ~$1.1 every time I run my test. So
if the test case runs once a week, it amounts to ~$57 per year.
If we go with the 1 second penalty in my example it's going to be more
expensive.
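The estimate works out as follows; same numbers as above, just made explicit (all three inputs are the arbitrary assumptions already stated):

```python
instances = 2000        # test instances per run (arbitrary, as above)
extra_boot_s = 0.2      # extra boot time per instance, in seconds
rate_per_hour = 10.0    # instance cost in dollars per hour

# Total extra instance-time paid for per run.
extra_total_s = instances * extra_boot_s          # 400 s, about 6.7 minutes

# Dollar cost of that extra time, per run and per year of weekly runs.
cost_per_run = extra_total_s / 3600 * rate_per_hour   # ~$1.11
cost_per_year = cost_per_run * 52                     # ~$57.8

print(f"{extra_total_s:.0f}s extra, ${cost_per_run:.2f}/run, "
      f"${cost_per_year:.2f}/year")
```

Scaling any of the three inputs scales the yearly cost linearly, which is why the 1-second case is proportionally worse.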
>
> But booting a just-created VM is hardly the normal use case.
>
>>> The proposal is not to wipe out pyc
>>
>> The way I read your proposal was to eliminate py{c,o} from the packages,
>> i.e. we have to byte-compile when any python module is used
>>
>>> and
>>> use -B when calling the Python code, is about moving the pyc generation in
>>> / var/cache (or some other cache place) and delay the pyc generation
>>> until the first boot.
>>
>> OK, this part of the statement seems in line with my understanding of
>> your proposal.
>>
>> You say "first-boot" are you implying a process that does a system-wide
>> byte compilation of all installed Python code?
>
> No, the first time that the Python code runs.
>
> This strategy has good results, as not all the Python code is loaded when a
> service is running. For example, in the Django venv scenario from this
> email, the initial size of the venv was 57MB; after removing all the pyc
> files from site-packages I had a venv of 45MB. If I create a new Django
> application (with database access, models and views) I get a venv of 47MB,
> so only 2MB of pyc files are generated at run time, as I propose. You still
> save 10MB of space without sacrificing run-time speed.
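For what it's worth, the bytecode share of a venv is easy to reproduce on any tree. A small sketch; the demo directory and file names below are stand-ins, and in practice you would point the function at a real site-packages path:

```python
import pathlib
import tempfile

def pyc_bytes(root):
    """Total size of compiled bytecode (.pyc/.pyo) under `root`, in bytes."""
    return sum(p.stat().st_size for p in pathlib.Path(root).rglob("*.py[co]"))

# Demo on a throwaway tree; in practice pass a venv's site-packages path.
demo = pathlib.Path(tempfile.mkdtemp())
(demo / "__pycache__").mkdir()
(demo / "__pycache__" / "m.cpython-36.pyc").write_bytes(b"\x00" * 128)
print(pyc_bytes(demo))  # → 128
```

Comparing this number before and after exercising the application gives the "only 2MB regenerated" figure directly.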
>
>> Or do you mean "module load" when you say "first boot", i.e. the
>> byte-compilation takes place when a Python module is loaded for the
>> first time? The effect of this is shown in the above example. I have an
>> issue with a 13% drop in performance for every user on initial start up.
>>
>> This is penalizing the majority to cover one specific use case. Sorry it
>> is hard for me to see this any other way.
>
> Again, booting fresh VMs is hardly the majority case here.
That is an assumption on your part, from my perspective. The data I can
produce has no place on this list, sorry, so I will share it in private
e-mail when I have it.
>
> [...]
>
>>>>> What do you think if I follow this path?
>>>>
>>>> I oppose this path. We'd be penalizing every start up of every instance
>>>> of EC2. We have feature requests to improve our boot performance and
>>>> this is counter acting our efforts.
>>>
>>> Not true, as the cache will be populated after the first boot.
>>
>> How is my statement not true?
>
> How is maintaining a /var/cache going to penalize every start up of every
> EC2 instance?
Every time I start a new instance the cache has to be created as no pyc
files would be in the image.
If we pursue the approach of multiple packages, as suggested in one or
two messages in this thread, then we could build Public Cloud images
with pyc files included.
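As an aside on the /var/cache idea: CPython 3.8 added exactly this mechanism. Setting PYTHONPYCACHEPREFIX (or passing -X pycache_prefix) redirects all bytecode into a separate tree instead of writing __pycache__ next to the sources. A minimal sketch, with temp paths as placeholders for /var/cache and a package directory:

```python
import os
import pathlib
import subprocess
import sys
import tempfile

# Placeholder paths standing in for /var/cache/... and an installed package.
cache = tempfile.mkdtemp()
src = pathlib.Path(tempfile.mkdtemp())
(src / "mod.py").write_text("value = 42\n")

# Import the module in a child interpreter with the cache prefix set;
# the generated .pyc lands under `cache`, not next to mod.py.
subprocess.run(
    [sys.executable, "-c", "import mod; print(mod.value)"],
    env={**os.environ,
         "PYTHONPYCACHEPREFIX": cache,
         "PYTHONPATH": str(src)},
    check=True,
)

print(any(pathlib.Path(cache).rglob("*.pyc")))  # bytecode is in the cache tree
print(any(src.rglob("*.pyc")))                  # source tree stays clean
```

This does not settle the cold-start cost question, but it shows the relocation itself needs no packaging tricks on Python 3.8 or later.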
> That is not true, again. You are populating /var/cache with
> the modules used the first time.
Yes, and the cache is initially empty and has to be filled.
> Subsequent boots will not be penalized.
I am not talking about subsequent boots of a stopped instance.
> This
> is an amortization case, so at the end, there is no penalty.
No sorry, it is not free. If you have to pay a $5 parking ticket because
you forgot to put money in the meter it doesn't matter how many times
you park at the meter and put money in, you will always have paid the
$5, you're not getting the money back.
If cloud-init is 10% slower on first boot (an assumption, as we have no
data at this point for a cold start), it is 10% slower. We are not
getting that time back; it is time spent and lost.
Later,
Robert
--
Robert Schweikert MAY THE SOURCE BE WITH YOU
Distinguished Architect LINUX
Team Lead Public Cloud
rjschwei(a)suse.com
IRC: robjo