[opensuse-packaging] python .pyc packaging
Via https://en.opensuse.org/openSUSE:Reproducible_Builds I found that when we build Python packages like python-amqp or python-binplist, they contain a .pyc file for every .py file, and these .pyc files differ on every build because they contain the timestamp of the corresponding source file, which for some source files is the time of the build.
http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
I was wondering how best to get those to build bit-by-bit identical rpms.
I assume we want to keep the concept of .pyc files, since they provide some performance gain (e.g. I measured 'openstack --help' taking only 1.5 seconds with .pyc files versus 2.5 seconds without; on another machine it was 12 vs 13 seconds).
But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users. They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?). It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA-compressed version.
The less intrusive alternative would be to touch the .py files to a constant older date (e.g. $SOURCE_DATE_EPOCH if set) before generating the .pyc files.
Which way do you think we should go?
Ciao Bernhard M. -- To unsubscribe, e-mail: opensuse-packaging+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-packaging+owner@opensuse.org
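The timestamp problem and the touch-to-$SOURCE_DATE_EPOCH idea can be sketched in Python. This is a minimal sketch, assuming Python 3.8 or later: it relies on the PEP 552 header layout (4-byte magic number, 4-byte flags, 4-byte source mtime, 4-byte source size), which differs from the Python 2 header, and `build_pyc` and `header_mtime` are hypothetical helper names standing in for one package build. Newer Python 3 versions (3.7+) also make py_compile default to hash-based .pyc files when SOURCE_DATE_EPOCH is set; the sketch forces timestamp mode to reproduce the problem described above.

```python
import os
import py_compile
import struct
import tempfile
from pathlib import Path


def build_pyc(src: Path, mtime: int) -> bytes:
    """Simulate one build: stamp the source mtime, byte-compile it, return the .pyc bytes."""
    os.utime(src, (mtime, mtime))  # same effect as `touch -d @<mtime> src`
    cfile = src.with_suffix(".pyc")
    py_compile.compile(
        str(src),
        cfile=str(cfile),
        dfile=src.name,  # record a stable file name instead of the build directory
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    return cfile.read_bytes()


def header_mtime(pyc: bytes) -> int:
    """Return the source mtime stored in a timestamp-based PEP 552 header."""
    _magic, flags, mtime, _size = struct.unpack("<4sIII", pyc[:16])
    if flags != 0:
        raise ValueError("not a timestamp-based .pyc")
    return mtime


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mod.py"
    src.write_text("ANSWER = 42\n")

    # Two builds where the source ended up with different mtimes (e.g. generated at build time).
    first = build_pyc(src, 1_486_000_000)
    second = build_pyc(src, 1_486_000_600)

    # Touching the source to a fixed date before compiling makes the output stable.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1486000000"))
    stable_a = build_pyc(src, epoch)
    stable_b = build_pyc(src, epoch)

print(first == second)            # False: the header timestamps differ
print(first[16:] == second[16:])  # True: the marshalled code itself is identical
print(stable_a == stable_b)       # True
```

The equality only holds because both builds compile the same file name with the same interpreter; a .pyc also records the file name and is specific to the Python version that wrote it.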
On Friday 2017-02-17 06:33, Bernhard M. Wiedemann wrote:
I assume we want to keep the concept of .pyc files, since they provide some performance gain[...] But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users. They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?)
- It could prolong the installation time.
- rpm -qi's Size field is further away from the real installation size ("yast said it would take 1.2GB now it's 2.0.."-kind of thing)
- Creating them in %post, i.e. directly on the end-user system, in a way defeats the purpose of a precompiled distribution.
It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA-compressed version.
Feel free to take numbers on a 32-bit Raspberry :-p
On 02/17/2017 05:16 PM, Jan Engelhardt wrote:
On Friday 2017-02-17 06:33, Bernhard M. Wiedemann wrote:
I assume we want to keep the concept of .pyc files, since they provide some performance gain[...] But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users. They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?)
- It could prolong the installation time.
- rpm -qi's Size field is further away from the real installation size ("yast said it would take 1.2GB now it's 2.0.."-kind of thing)
- Creating them in %post, i.e. directly on the end-user system, in a way defeats the purpose of a precompiled distribution.
The other reason is that if you don't package them, you need to add specific code in %postun to check whether they were created (the user may install and never run them) and then remove them if they exist. Making sure this happens for each .pyc file is often a lot more effort than just packaging them, especially if subdirectories are involved.
It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA-compressed version.
Feel free to take numbers on a 32-bit Raspberry :-p
-- Simon Lees (Simotek) http://simotek.net Emergency Update Team keybase.io/simotek SUSE Linux Adelaide Australia, UTC+10:30 GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B
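The %postun bookkeeping described above can be made concrete. A minimal sketch, assuming Python 3 and its __pycache__ layout: given the .py files a package owns, it computes with importlib.util.cache_from_source the cache paths the interpreter may have written (plain, -O and -OO variants), deletes only those that exist, and removes an emptied __pycache__ directory. `remove_generated_pyc` is a hypothetical helper, not an existing packaging macro, and it only covers caches written by the Python version that runs it.

```python
import importlib.util
import os
from pathlib import Path
from typing import Iterable, List

# Cache variants CPython may write: '' = plain .pyc, 1 = -O (.opt-1.pyc), 2 = -OO (.opt-2.pyc).
OPT_LEVELS = ("", 1, 2)


def remove_generated_pyc(py_files: Iterable[str]) -> List[str]:
    """Delete bytecode caches generated for the given sources and return the removed paths."""
    removed = []
    for py in py_files:
        for opt in OPT_LEVELS:
            cache = importlib.util.cache_from_source(py, optimization=opt)
            if os.path.exists(cache):
                os.remove(cache)
                removed.append(cache)
        # Remove __pycache__ once it is empty, so no unowned directory is left behind.
        cache_dir = Path(importlib.util.cache_from_source(py, optimization="")).parent
        if cache_dir.is_dir() and not any(cache_dir.iterdir()):
            cache_dir.rmdir()
    return removed
```

The source files themselves are never touched; only paths derived from them are checked, so files the user created elsewhere are left alone.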
Hi, On Feb 17, 17 05:33:02 +0000, Bernhard M. Wiedemann wrote:
Via https://en.opensuse.org/openSUSE:Reproducible_Builds
I found that when we build python packages like python-amqp or python-binplist
it contains a .pyc file for every .py file and for every build these .pyc files differ, because they contain the timestamp of the corresponding source file and for some source files this is the time of build.
http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
I was wondering how to best get those to build bit-by-bit identical rpms.
I assume we want to keep the concept of .pyc files, since they provide some performance gain (e.g. I measured 'openstack --help' taking only 1.5 seconds with .pyc files versus 2.5 seconds without (on another machine it was 12 vs 13 seconds))
Hm, my understanding was that .pyc files only influence loading times, not the run time of Python programs. So I assume for something like 'openstack --help' it's significant, while for something that runs longer it may matter less.
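This matches how CPython uses the cache: a .pyc only saves the parse-and-compile step when a module is imported, and once loaded the same bytecode runs either way. A minimal sketch, assuming Python 3.7+ and its 16-byte header, that executes a module straight from its .pyc via marshal with no compile() call; `load_from_pyc` is a hypothetical helper, not a stdlib API.

```python
import importlib.util
import marshal
import py_compile
import tempfile
import types
from pathlib import Path

HEADER_SIZE = 16  # PEP 552 header (Python 3.7+): magic, flags and two 32-bit fields


def load_from_pyc(pyc_path: str, name: str) -> types.ModuleType:
    """Execute a module from its cached bytecode only; nothing is parsed or compiled here."""
    data = Path(pyc_path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError(f"{pyc_path} was written by a different Python version")
    code = marshal.loads(data[HEADER_SIZE:])  # the code object, as stored after the header
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "greet.py"
    src.write_text("def hello(who):\n    return 'hello ' + who\n")
    pyc = py_compile.compile(str(src), doraise=True)  # returns the path of the written cache
    greet = load_from_pyc(pyc, "greet")

print(greet.hello("world"))  # hello world
```

Everything after the import (the call to `hello`) runs the same bytecode whether it came from the cache or from a fresh compile, which is why long-running programs gain little.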
But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users.
I sometimes hear "disk space is cheap", but trying to do a rather small installation with Python packages is difficult at the moment. E.g. just adding salt-minion to a small installed system results in several dozen MB just for the .pyc files.
They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?) It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA-compressed version.
Frankly speaking, they could be provided in a separate package, if really needed. I would not generate them in the packages. But "I am not a Python expert, just a user" :), so maybe there is a significant reason to have them always available? Just having a shorter startup time for Python programs doesn't sound like a valid reason to have both .py and .pyc. On the other hand, why not remove the .py files and just keep the .pyc? The .py files could still be in the source rpms for those who need/want them. That would still give one the advantages of the .pyc without wasting space. ciao, Stefan -- Stefan Behlert, SUSE LINUX Maxfeldstr. 5, D-90409 Nuernberg, Germany Phone +49-911-74053-173 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
* Bernhard M. Wiedemann <bernhardout@lsmod.de> [Feb 17. 2017 06:33]:
But why do we have to ship .pyc files as part of our binary rpms?
I'd rather ask why we have to ship *source code* (.py files) as part of our binary rpms? They waste much more space, since .py files usually include full documentation. https://hackweek.suse.com/15/projects/1244 showed that e.g. for Salt and its dependencies, stripping the source almost halved(!) the package size, from ~33 to ~18 MB. Klaus -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
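Size comparisons like this one are easy to repeat on one's own system. A minimal sketch, with `size_by_suffix` as a hypothetical helper, that sums the file sizes in bytes (not disk blocks) of .py sources and .pyc caches below a directory. Optimized caches such as *.opt-1.pyc are counted as .pyc, and rpm package sizes include other files too, so the totals will not match package sizes exactly.

```python
import os
from collections import Counter
from pathlib import Path


def size_by_suffix(root: str, suffixes=(".py", ".pyc")) -> Counter:
    """Sum file sizes in bytes per suffix below ROOT, skipping symlinks."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            suffix = Path(name).suffix  # "x.cpython-36.opt-1.pyc" -> ".pyc"
            if suffix in suffixes:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    totals[suffix] += os.path.getsize(path)
    return totals
```

For example, `size_by_suffix(sysconfig.get_paths()["purelib"])` reports the totals for the current interpreter's site-packages directory.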
On Fri, 17 Feb 2017, Klaus Kaempf wrote:
* Bernhard M. Wiedemann <bernhardout@lsmod.de> [Feb 17. 2017 06:33]:
But why do we have to ship .pyc files as part of our binary rpms?
I'd rather ask why we have to ship *source code* (.py files) as part of our binary rpms?
They waste much more space since .py files usually include full documentation.
https://hackweek.suse.com/15/projects/1244 showed that e.g. for Salt and its dependencies, stripping the source almost halved(!) the package size. From ~33 to ~18 MB.
From looking at package rebuilds I do remember seeing changes to python triggering all -python packages to be rebuilt (when ideally, when just shipping .py files, no build would be involved).
OTOH I remember .pyc files are not 100% portable across Python versions. I really wonder why there's not some /var/cache/python-X.Y.Z where .pyc files are created and cached on demand (and that cache configurable to not exist). Debian compiles to .pyc at install time IIRC and eventually re-compiles them on python package upgrades. Richard. -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
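The portability concern holds for CPython: every .pyc starts with a magic number that changes with the bytecode format between Python releases, and since Python 3.2 (PEP 3147) cache files also carry an interpreter tag in their name, so caches for several versions can sit side by side in __pycache__. Python 3.8 later added PYTHONPYCACHEPREFIX, which moves all caches into a separate tree, close to the /var/cache idea. A minimal sketch of the magic-number check; both helper names are hypothetical, and the site-packages path below is a placeholder.

```python
import importlib.util
import sys
from pathlib import Path


def pyc_matches_interpreter(pyc_path: str) -> bool:
    """True if the .pyc's magic number matches the running interpreter's bytecode format."""
    with open(pyc_path, "rb") as fh:
        return fh.read(4) == importlib.util.MAGIC_NUMBER


def cache_name_for(py_path: str) -> str:
    """PEP 3147 cache file name; the tag (e.g. 'cpython-36') lets versions coexist."""
    return Path(importlib.util.cache_from_source(py_path)).name


print(sys.implementation.cache_tag, importlib.util.MAGIC_NUMBER.hex())
print(cache_name_for("/usr/lib/pythonX.Y/site-packages/example/util.py"))
```

A package that ships .pyc files is therefore tied to one interpreter version, which is what an ABI requirement as suggested in the reply below would express.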
On 17.02.2017 at 11:12, Richard Biener wrote:
From looking at package rebuilds I do remember seeing changing python triggering all -python packages to be rebuilt (when ideally when just shipping .py files no build would be involved).
Why would we prefer every user compiling over us building once? If the .pyc files are not compatible with all Python versions, we need to have a Requires on the Python ABI, so we know when to rebuild. Greetings, Stephan -- You've got to keep fighting, fight until you drop, even if the whole world has gone crazy, or precisely because of it.
On Fri, 17 Feb 2017, Stephan Kulow wrote:
On 17.02.2017 at 11:12, Richard Biener wrote:
From looking at package rebuilds I do remember seeing changing python triggering all -python packages to be rebuilt (when ideally when just shipping .py files no build would be involved).
Why would we prefer every user compiling over us building once? If the pyc files are not compatible with all python versions, we need to have a require on the python abi - so we know when to rebuild.
Sure. Just as was mentioned: if we ship .pyc, why do we ship .py files? Ultimately it's about choice, but then we can just as well split packages into python-X-py and python-X-pyc, both providing python-X (or some other way). The incompatibility thing was just something I remember from the past; it may no longer be true (python may just silently fall back to reading the .py file if the .pyc is incompatible). Richard.
Greetings, Stephan
-- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
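On the fallback question: current Python 3 does behave that way when the .py file is present. If a cached .pyc has the wrong magic number, the source loader ignores it, recompiles from the .py file and, when it is allowed to write, replaces the cache. This only helps when sources are installed. A minimal sketch, assuming Python 3 and a writable temporary directory; `import_despite_bad_cache` and `demo_stale_mod` are made-up names.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def import_despite_bad_cache(directory: Path, name: str):
    """Write NAME.py, give it a cache with a wrong magic number, then import it."""
    src = directory / f"{name}.py"
    src.write_text("VALUE = 42\n")
    cache = Path(py_compile.compile(str(src), doraise=True))
    data = bytearray(cache.read_bytes())
    data[:4] = b"\0\0\0\0"  # pretend another Python release wrote this cache
    cache.write_bytes(bytes(data))

    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()  # make the path finder notice the new files
        return importlib.import_module(name)
    finally:
        sys.path.remove(str(directory))
        sys.modules.pop(name, None)  # keep the demo repeatable


with tempfile.TemporaryDirectory() as tmp:
    mod = import_despite_bad_cache(Path(tmp), "demo_stale_mod")

print(mod.VALUE)  # 42, recompiled from the .py file
```

Without the .py file next to it, the same mismatched cache would simply make the module unimportable.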
On 17.02.2017 at 11:52, Richard Biener wrote:
On Fri, 17 Feb 2017, Stephan Kulow wrote:
On 17.02.2017 at 11:12, Richard Biener wrote:
From looking at package rebuilds I do remember seeing changing python triggering all -python packages to be rebuilt (when ideally when just shipping .py files no build would be involved).
Why would we prefer every user compiling over us building once? If the pyc files are not compatible with all python versions, we need to have a require on the python abi - so we know when to rebuild.
Sure. Just as was mentioned if we ship .pyc why do we ship .py files? Eventually it's about choice but then we can as well split packages into python-X-py and python-X-pyc both providing python-X (or some other way).
Then why stop at Python? Possibly we should let people choose whether they want to download binaries or compile their C code themselves? I'm sure you will find other distributions as reference :) Greetings, Stephan -- You've got to keep fighting, fight until you drop, even if the whole world has gone crazy, or precisely because of it.
On Friday 2017-02-17 12:01, Stephan Kulow wrote:
Why would we prefer every user compiling over us building once?
Sure. Just as was mentioned if we ship .pyc why do we ship .py files? Eventually it's about choice but then we can as well split packages into python-X-py and python-X-pyc both providing python-X (or some other way).
Then why stop at python? Possibly we should make a choice if people want to download binaries or compile their C code themselves? I'm sure you will find other distributions as reference :)
Tempting proposal. In fact, so tempting that we could just..
zypper() {
    if [ "$1" != "emerge" ]; then
        command zypper "$@"
        return $?
    fi
    shift
    for i in "$@"; do
        extra_optflags="-march=native" \
        bcond_withs=<(cat /etc/use_flags) \
        osc --dont-ask-for-username build "openSUSE:Leap:42.2/$i"
        mv /var/tmp/.../*rpm /tmp/collect/ -f
    done
    command zypper in /tmp/collect/*.rpm
}
It's trivialized here for brevity, but shows the picture (that, to me, also looks a lot easier than portage/emerge.)
On 2017-02-17 11:14, Stephan Kulow wrote:
Why would we prefer every user compiling over us building once?
a) it is not the users but their machines compiling in %post or such
b) it is very fast
I did a quick benchmark on a 2.1GHz CPU via https://gist.github.com/bmwiedemann/2db103cda98d9c750ff27e3f92f67e37 which found that compiling 400000 lines or 14MB worth of Python source files created 14MB of .pyc files within 1.6 seconds. Compressing those into a tar.xz of 2.7MB took 6.7s, and uncompressing took 0.2 seconds.
Now, you might think that users save 1.4 seconds when uncompressing the precompiled .pyc files, but they also have to download them first, which makes it faster only when they have download speeds of >2MByte/s per machine (=16MBit/s per machine), which unfortunately is not true everywhere, especially in Germany aka Internet-Neuland ("internet uncharted territory").
On slower CPUs the balance might tip towards precompiling, though.
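The break-even download speed follows from the benchmark's own numbers: compiling on the target costs 1.6 s, while shipping precompiled files costs 0.2 s of unpacking plus the time to download the 2.7 MB archive. Shipping .pyc therefore only wins when that download takes less than the 1.4 s it saves (ignoring the one-time compression cost on the build side and differences in CPU speed):

```latex
v_{\min} = \frac{2.7\,\mathrm{MB}}{1.6\,\mathrm{s} - 0.2\,\mathrm{s}}
         = \frac{2.7\,\mathrm{MB}}{1.4\,\mathrm{s}}
         \approx 1.93\,\mathrm{MB/s}
         \approx 15.4\,\mathrm{Mbit/s}
```

This rounds to the ">2MByte/s (=16MBit/s)" figure given in the message.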
Hi,
and now compare this to what Klaus suggested, shipping only .pyc files:
- no compiling
- smaller package => less download time as well.
Sounds like that would be the fastest option. Additionally, there is no trouble with cleaning up these files. Remember, all these locally compiled files would be "not owned by any package", or we would need to generate %ghost entries for all of them.
On Friday, 17 February 2017, 12:56:24, Bernhard M. Wiedemann wrote:
On 2017-02-17 11:14, Stephan Kulow wrote:
Why would we prefer every user compiling over us building once?
a) it is not the users but their machines compiling in %post or such
b) it is very fast
I did a quick benchmark on a 2.1GHz CPU via https://gist.github.com/bmwiedemann/2db103cda98d9c750ff27e3f92f67e37
which found that compiling 400000 lines or 14MB worth of python source files created 14MB .pyc files, within 1.6 seconds. Compressing those into a tar.xz of 2.7MB took 6.7s and uncompressing took 0.2 seconds.
Now, you might think that users save 1.4 seconds when uncompressing the precompiled .pyc files, but they also have to download them first, which makes it only be faster when they have download speeds of >2MByte/s per machine (=16MBit/s per machine) which unfortunately is not true everywhere, especially in Germany aka Internet-Neuland.
on slower CPUs the balance might be better for precompiling, though.
-- Regards Michael Calmer -------------------------------------------------------------------------- Michael Calmer SUSE LINUX GmbH, Maxfeldstr. 5, D-90409 Nuernberg T: +49 (0) 911 74053 0 F: +49 (0) 911 74053575 - e-mail: Michael.Calmer@suse.com -------------------------------------------------------------------------- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
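Generating %ghost entries could be automated from a package's file list. A minimal sketch, assuming Python 3 __pycache__ naming: `ghost_entries` (a hypothetical helper) emits one %ghost line per cache file the build's Python could create for each shipped .py file, for the plain, -O and -OO variants. It only covers the interpreter that runs it, prints literal paths rather than spec macros, and the __pycache__ directories themselves would still need to be owned; the file list below is made up.

```python
import importlib.util
from typing import Iterable, List


def ghost_entries(py_files: Iterable[str], opt_levels=("", 1, 2)) -> List[str]:
    """Return '%ghost <path>' lines for each bytecode cache the running Python may create."""
    lines = []
    for py in py_files:
        if not py.endswith(".py"):
            continue  # only Python sources get bytecode caches
        for opt in opt_levels:
            lines.append("%ghost " + importlib.util.cache_from_source(py, optimization=opt))
    return lines


# Hypothetical file list; a real package would take this from its %files section.
for line in ghost_entries(["/usr/lib/pythonX.Y/site-packages/amqp/__init__.py",
                           "/usr/share/doc/packages/python-amqp/README.rst"]):
    print(line)
```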
hello, replying to all comments so far
1. Reason for the existence of .pyc
Startup times, plain and simple. This does not matter in apps that run for a long time, but it matters a lot for command-line utilities. Importantly, .pyc (and __pycache__ directories) is considered a *cache*. The source code is the primary thing. The fact that Python can run purely off that cache is basically an implementation detail.
2. Dropping .pyc and compiling on installation
That's certainly possible (Debian does something of the sort) and even has some interesting advantages: it allows you to transparently support many different Python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic; we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of unowned files. The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages. This requires further discussion of the tradeoffs, probably at least a rudimentary install-time benchmark, but I'm not opposed to including something like this.
3. Dropping .pyc without replacement
That could be reasonable in some packages, if we know that they are long-running and the startup time difference is negligible to users... which is up to the individual maintainer, I suppose. It's probably a bad idea for libraries, which tend to be used by command-line tools, where startup time matters.
4. Reproducible builds
For now, where it matters, let's touch generated .py files with a set time (probably the mtime of the tarball?). I suspect this is not an issue in most packages, so automation doesn't really help? But I'd be happy to learn more, e.g. see packages that don't build reproducibly and check out why that happens. OTOH, doesn't rpm store mtime? Will a build with a regenerated file count as the same as the previous build if the contents of the file are unchanged but the metadata changed?
5. Dropping .py
That's a big NO. First of all, it's simply not worth it. On my work machine, which has a higher-than-usual amount of Pythonic packages, the grand total size of all "*.py" files is 208 MB, out of a 7.6GB /usr. Salt itself, which I installed for the purpose of this experiment, is 17 MB of that. Second, users would kill us, and I personally would be one of them. As I noted, .pyc is a *cache* for the primary source, which is the .py file. For Python (and other languages that run from source), the automatic presence of source code and the possibility of instant modification (for local patching, debugging, etc.) is a big advantage. Not installing .py files by default breaks all sorts of conventions and user expectations, goes against the spirit of open source, and is downright power-user-hostile, all in the name of saving space that's negligible on a typical system. We don't want to be That Distro (at least as far as I'm concerned).
For use cases where an additional 15 MB matters (such as salt-minion? I have no idea what kind of system is the target), this might be a reasonable step. But you should also go further and install the whole thing as a zip file (which Python can import transparently if added to sys.path); most of it would be symbol names, which are retained in .pyc, and here the zip compression helps a lot. Speaking of which: the whole of salt is 32 MB, zipped up it is 8.6 MB, zipped only *.py is 3.7 MB, zipped only *.pyc is 4.9 MB. Make your own tradeoff. You can go even further and vendor-include the dependent packages in the zip file. Make the build process take the packages from their installed locations, to pick up all updates and security fixes on rebuild. Sounds like a good hackweek project.
(also, if 15 MB matter to you, maybe don't base your software on a language with a 50MB stdlib of which you need maybe a third, if that... or on a language where the sizes of source code and compiled objects are comparable ;) ) regards m.
On 17.2.2017 06:33, Bernhard M. Wiedemann wrote:
Via https://en.opensuse.org/openSUSE:Reproducible_Builds
I found that when we build python packages like python-amqp or python-binplist
it contains a .pyc file for every .py file and for every build these .pyc files differ, because they contain the timestamp of the corresponding source file and for some source files this is the time of build.
http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
I was wondering how to best get those to build bit-by-bit identical rpms.
I assume we want to keep the concept of .pyc files, since they provide some performance gain (e.g. I measured 'openstack --help' taking only 1.5 seconds with .pyc files versus 2.5 seconds without (on another machine it was 12 vs 13 seconds))
But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users. They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?) It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA-compressed version.
The less intrusive alternative approach would be to touch .py files to a constant older date (e.g. $SOURCE_DATE_EPOCH if set) before generating the .pyc files.
Which way do you think we should go?
Ciao Bernhard M.
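The zip-file suggestion in point 5 relies on Python's built-in zipimport: a .zip archive on sys.path is searched like a directory. A minimal sketch, with a made-up package name `tinypkg` and a hypothetical helper `import_from_zip`, that bundles a two-module package into an archive and imports it from there:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def import_from_zip(archive: Path, files: dict, package: str):
    """Write FILES (archive name -> source text) into ARCHIVE and import PACKAGE from it."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, text in files.items():
            zf.writestr(arcname, text)
    sys.path.insert(0, str(archive))  # a zip on sys.path is searched like a directory
    try:
        importlib.invalidate_caches()
        return importlib.import_module(package)
    finally:
        sys.path.remove(str(archive))
        for name in [m for m in sys.modules if m == package or m.startswith(package + ".")]:
            del sys.modules[name]  # keep the demo repeatable


with tempfile.TemporaryDirectory() as tmp:
    pkg = import_from_zip(
        Path(tmp) / "tinypkg.zip",
        {
            "tinypkg/__init__.py": "from .core import double\n",
            "tinypkg/core.py": "def double(x):\n    return 2 * x\n",
        },
        "tinypkg",
    )

print(pkg.double(21))  # 42
print(pkg.__file__)    # a path inside tinypkg.zip
```

zipimport loads .pyc entries from the archive when present but never writes caches back into it, so an archive holding only .py files is recompiled on every interpreter start; that is the tradeoff behind the separate *.py and *.pyc zip sizes above.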
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work. I would strongly recommend against removing .py files from the packages. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronlund@suse.com
* Kristoffer Grönlund <kgronlund@suse.com> [Feb 17. 2017 16:59]:
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work.
Then you just install the python-X-source package. That could even be automated in zypper.
I would strongly recommend against removing .py files from the packages.
I would strongly recommend for shipping the kernel source inside the kernel-default package. SCNR ;-) Klaus -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
On Friday, 17 February 2017 20:08:53 CET, Klaus Kaempf wrote:
* Kristoffer Grönlund <kgronlund@suse.com> [Feb 17. 2017 16:59]:
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work.
Then you just install the python-X-source package. That could even be automated in zypper.
Maybe add Recommends: packageand(python-X, python-sources), so it's easy to install sources for all installed python packages and restore the current status quo. Option 5) actually said dropping sources, which is IMHO either badly worded or just a bad idea. Splitting out the sources (.py) from the bytecode (.pyc) would allow removing any redundant data while keeping fast startup times. Especially for containers and small devices (RPi and alike) this would be a clear win. Probably most installed systems would never need the sources, and even developers will likely only need the sources of a few packages. Hands up: who has debuginfo/debugsource installed for *all* their installed packages?
I would strongly recommend against removing .py files from the packages.
I would strongly recommend for shipping the kernel source inside the kernel-default package. SCNR ;-)
Don't forget to put the compiler and linker into the initrd, so we get a freshly compiled kernel every time the system boots. ;-) Kind regards, Stefan -- Stefan Brüns / Bergstraße 21 / 52062 Aachen home: +49 241 53809034 mobile: +49 151 50412019 work: +49 2405 49936-424
On Fri, Feb 17, 2017 at 3:00 PM, Stefan Bruens <stefan.bruens@rwth-aachen.de> wrote:
On Friday, 17 February 2017 20:08:53 CET, Klaus Kaempf wrote:
* Kristoffer Grönlund <kgronlund@suse.com> [Feb 17. 2017 16:59]:
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work.
Then you just install the python-X-source package. That could even be automated in zypper.
Maybe add Recommends: packageand(python-X, python-sources), so it's easy to install sources for all installed python-packages and restore the current status quo.
Option 5) actually said dropping sources, which is IMHO either badly worded or just a bad idea.
Splitting out the sources (.py) from the bytecode (.pyc) would allow to remove any redundant data and have fast startup times. Especially for containers and small devices (RPi and alike) this would be a clear win. Probably most installed systems would never need the sources, and even developers will likely only need the sources of a few packages. Hands up, who has debuginfo/ debugsource installed for *all* their installed packages?
As others have said, Python is an interpreted language, not a compiled language like C. The .py files are the code that is executed. The .pyc files are just an optional cache. In fact recent versions of python put them in a __pycache__ directory. More seriously, .pyc files are not intended for stand-alone use and Python is not designed to work this way. Python allows a lot of tinkering with its internals, so I would not count on the .pyc files even working reliably on their own, and the bugs that are introduced could be rare and hard to track down. This won't even work by default in recent versions of python. As I mentioned, these files go in the __pycache__ directory by default, and python does not look there for code to execute. So we would need to change python from the upstream default behavior just to make this proposal work at all.
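The lookup rule for __pycache__ can be checked directly. A minimal sketch, assuming Python 3, with made-up names `nosrc_demo` and `try_import`: a cache that exists only in __pycache__ is ignored once the .py file is gone, whereas a "sourceless" module is picked up if its .pyc sits in the legacy location where the .py was (the layout compileall's -b option writes). Even then, tools that need the source, such as inspect.getsource, fail.

```python
import importlib
import inspect
import py_compile
import sys
import tempfile
from pathlib import Path

SOURCE = "def f():\n    return 'ran from bytecode'\n"


def try_import(directory: Path, name: str):
    """Import NAME from DIRECTORY only; return the module, or None if it is not found."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None
    finally:
        sys.path.remove(str(directory))
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    src = d / "nosrc_demo.py"

    # 1. Cache only in __pycache__, source removed: the module is not found.
    src.write_text(SOURCE)
    py_compile.compile(str(src), doraise=True)
    src.unlink()
    in_pycache_only = try_import(d, "nosrc_demo")

    # 2. Legacy location next to where the .py was: importable without source.
    src.write_text(SOURCE)
    py_compile.compile(str(src), cfile=str(d / "nosrc_demo.pyc"), doraise=True)
    src.unlink()
    legacy = try_import(d, "nosrc_demo")

    print(in_pycache_only)  # None
    print(legacy.f())       # ran from bytecode
    try:
        inspect.getsource(legacy.f)
    except OSError as exc:  # no source file to read
        print("no source:", exc)
```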
On Friday 2017-02-17 21:55, Todd Rme wrote:
As others have said, Python is an interpreted language, not a compiled language like C.
That's nonsense. Who says Python *has* to be interpreted? Who says C *has* to be compiled? Python and C are each just a language (with a more or less large standard library behind it). I have yet to see a language that cannot be compiled; it most likely would be some esoteric one.
On Fri, Feb 17, 2017 at 5:30 PM, Jan Engelhardt <jengelh@inai.de> wrote:
On Friday 2017-02-17 21:55, Todd Rme wrote:
As others have said, Python is an interpreted language, not a compiled language like C.
That's nonsense. Who says Python *has* to be interpreted? Who says C *has* to be compiled? Python, like C, each is a language (with a more or less large standard library behind it).. I have yet to see a language that cannot be compiled - it most likely would be some esoteric one.
I guess in principle it *might* be possible to make a python ahead-of-time compiler, or even a generally-usable python JIT (existing JITs only work in special cases), but so far all attempts to do so have failed. Python is just too dynamic a language for this to be feasible.
On Sunday 2017-02-19 16:19, Todd Rme wrote:
On Fri, Feb 17, 2017 at 5:30 PM, Jan Engelhardt <jengelh@inai.de> wrote:
On Friday 2017-02-17 21:55, Todd Rme wrote:
As others have said, Python is an interpreted language, not a compiled language like C.
That's nonsense. Who says Python *has* to be interpreted? Who says C *has* to be compiled? Python, like C, each is a language (with a more or less large standard library behind it).. I have yet to see a language that cannot be compiled - it most likely would be some esoteric one.
I guess in principle it *might* be possible to make a python ahead-of-time compiler, or even a generally-usable python JIT (existing JITs only work in special cases), but so far all attempts to do so have failed. Python is just too dynamic a language for this to be feasible.
The very existence of Python JITs goes counter to your claims. It may not necessarily be fast, but it's possible: """Every language can be mapped to another language. If not, then the language cannot really run on computers. Thus every language can technically be compiled. And, since any compiled program can be written in the form "interpret the act of compiling the program, then interpret the result," every program can be interpreted as well.""" - http://softwareengineering.stackexchange.com/a/262286
On Sun, Feb 19, 2017 at 12:44 PM, Jan Engelhardt <jengelh@inai.de> wrote:
On Sunday 2017-02-19 16:19, Todd Rme wrote:
On Fri, Feb 17, 2017 at 5:30 PM, Jan Engelhardt <jengelh@inai.de> wrote:
On Friday 2017-02-17 21:55, Todd Rme wrote:
As others have said, Python is an interpreted language, not a compiled language like C.
That's nonsense. Who says Python *has* to be interpreted? Who says C *has* to be compiled? Python, like C, each is a language (with a more or less large standard library behind it).. I have yet to see a language that cannot be compiled - it most likely would be some esoteric one.
I guess in principle it *might* be possible to make a python ahead-of-time compiler, or even a generally-usable python JIT (existing JITs only work in special cases), but so far all attempts to do so have failed. Python is just too dynamic a language for this to be feasible.
The very existence of Python JITs goes counter to your claims.
As I said, "existing JITs only work in special cases". There is no existing JIT that can compile the full range of python's capabilities.
It may not necessarily be fast, but it's possible:
I didn't say it wasn't "possible", I said it wasn't "feasible."
"""Every language can be mapped to another language. If not, then the language cannot really run on computers. Thus every language can technically be compiled. And, since any compiled program can be written in the form "interpret the act of compiling the program, then interpret the result," every program can be interpreted as well.""" - http://softwareengineering.stackexchange.com/a/262286
If you read further down, the post is saying the exact same thing I am saying. This discussion is all academic, though. The normal and expected way to distribute Python code includes the source, the Python standard library supports directly accessing the source code, and Python packages are making use of that capability. The only reliable way to make sure that isn't happening is to manually check every .py file in the package (since they may not be directly calling the python standard library functions, or may be accessing the source directly). I hope that for special-purpose bytecode-only distributions they are willing to take the time to verify that the packages they are using don't rely on the source code, at least for the code paths they are making use of. But we don't even have the manpower to keep all of our existing packages up-to-date. It is simply infeasible for us to check that each update of each package isn't introducing the use of this feature. And anything less than that risks shipping horribly broken code.
On Friday, 17 February 2017 15:55:51 CET Todd Rme wrote:
On Fri, Feb 17, 2017 at 3:00 PM, Stefan Bruens
<stefan.bruens@rwth-aachen.de> wrote:
On Friday, 17 February 2017 20:08:53 CET Klaus Kaempf wrote:
* Kristoffer Grönlund <kgronlund@suse.com> [Feb 17. 2017 16:59]:
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work.
Then you just install the python-X-source package. That could even be automated in zypper.
Maybe add Recommends: packageand(python-X, python-sources), so it's easy to install sources for all installed python-packages and restore the current status quo.
Option 5) actually said dropping sources, which is IMHO either badly worded or just a bad idea.
Splitting out the sources (.py) from the bytecode (.pyc) would make it possible to remove redundant data and still have fast startup times. Especially for containers and small devices (RPi and alike) this would be a clear win. Most installed systems would probably never need the sources, and even developers will likely only need the sources of a few packages. Hands up, who has debuginfo/debugsource installed for *all* their installed packages?
As others have said, Python is an interpreted language, not a compiled language like C. The .py files are the code that is executed. The .pyc files are just an optional cache. In fact recent versions of python put them in a __pycache__ directory.
Sorry to destroy your simplistic world view, but there are no compiled or interpreted languages. Source code gets parsed, optimized, stored in various intermediate forms, and finally transformed into machine-specific bytecode (and later may be transformed again, e.g. by current x86 CPUs, or translated again, e.g. by qemu).
More seriously, .pyc files are not intended for stand-alone use and Python is not designed to work this way. Python allows a lot of tinkering with its internals, so I would not count on the .pyc files even working reliably on their own, and the bugs that are introduced could be rare and hard to track down.
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
The valid question now is: what happens if there is no source file to compare the timestamp against? This is completely valid from Python's perspective, called "sourceless" distribution, although the behaviour differs slightly between python2 and python3. Python2 just uses the python2 bytecode file from the source directory unconditionally. Python3 does not use the bytecode from the __pycache__ dir, but looks for a .pyc name file in the source directory. All this is btw. specified in PEP-3147: https://www.python.org/dev/peps/pep-3147/#case-4-legacy-pyc-files-and-source-less-imports
So, we actually would have:
- a sources package, shared by python2 and python3
- a python2 bytecode package (usable standalone)
- a python3 "sourceless" package, installing symlinks into the sources directory (conflicting with the python2 bytecode package)
- a python3 bytecode package (depending on the sources or the sourceless package, not exclusive)
This won't even work by default in recent versions of python. As I mentioned, these files go in the __pycache__ directory by default, and python does not look there for code to execute. So we would need to change python from the upstream default behavior just to make this proposal work at all.
Tried it, it *does* work, and is specified by the python PEP linked above. Kind regards, Stefan -- Stefan Brüns / Bergstraße 21 / 52062 Aachen home: +49 241 53809034 mobile: +49 151 50412019 work: +49 2405 49936-424
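The two lookup locations Stefan describes can be computed with the standard importlib.util helpers. The sketch below is not from the thread; it assumes Python 3 and the PEP 3147 layout, and the path /tmp/example_pkg/mod.py is purely hypothetical. It returns both the __pycache__ file the import system uses when the .py exists and the legacy mod.pyc next to where the .py would be, which Python 3 only imports when the source is absent.

```python
import importlib.util
import os
import sys


def bytecode_candidates(source_path):
    """Return (pep3147_cache_path, legacy_sourceless_path) for a .py file."""
    # Where the import system caches bytecode while the .py file exists,
    # e.g. pkg/__pycache__/mod.cpython-312.pyc (the tag depends on the interpreter).
    cache_path = importlib.util.cache_from_source(source_path, optimization="")
    # Where a "sourceless" .pyc must live for Python 3 to import it without
    # the .py (PEP 3147, case 4): directly next to the missing source file.
    legacy_path = os.path.splitext(source_path)[0] + ".pyc"
    return cache_path, legacy_path


if __name__ == "__main__":
    cache, legacy = bytecode_candidates("/tmp/example_pkg/mod.py")  # hypothetical path
    print("interpreter cache tag:      ", sys.implementation.cache_tag)
    print("used when the .py exists:   ", cache)
    print("used when the .py is absent:", legacy)
```

Note that neither file has to exist for the paths to be computed; the functions only encode the naming rules.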
On 17.02.2017 23:48, Stefan Bruens wrote:
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
"Executes it"
It has to be said that it only gets executed in the Python virtual machine, as it is not x86 machine code. And then you have to know which Python interpreter you use; there is more than one. At least I know CPython (the "default" one), Jython (uses the JVM), PyPy, and there are several other programs that use Python for scripting (some of them also load Python libraries from the system, so they would have to be patched to work with pyc code). But is there one good point for dropping the Python sources? Of course you can save some space, but that would be a few MiB (on my system 24MiB), and which system does not have enough space for that? Best regards, Ferdinand
On Saturday, 18 February 2017 00:04:36 CET Ferdinand Thiessen wrote:
On 17.02.2017 23:48, Stefan Bruens wrote:
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
"Executes it"
It has to be said that it only gets executed in the Python virtual machine, as it is not x86 machine code. And then you have to know which Python interpreter you use; there is more than one.
You *either* execute CPython 2 or 3, Jython or whatever by its name, depending on the $PATH, *or* you have the interpreter path specified in the toplevel script. At this point, the interpreter is fixed.
At least I know CPython (the "default" one), Jython (uses the JVM), PyPy, and there are several other programs that use Python for scripting (some of them also load Python libraries from the system, so they would have to be patched to work with pyc code).
Currently we ship bytecode for every supported Python interpreter (for Tumbleweed, that is CPython 2.7 and CPython 3.5), accompanied *each time* by the source code. *If* the bytecode is available, the sources are *not* read. Your concern applies to the current situation already - can you tell me which files will be sourced by the current trivial script?
---
#! /usr/bin/env python
from time import sleep
sleep(1)
---
But is there one good point for dropping the Python sources? Of course you can save some space, but that would be a few MiB (on my system 24MiB), and which system does not have enough space for that?
As I already said, there are small devices, like the Raspberry Pi, and there are containers. It also saves download bandwidth on both the mirrors and the user's machine. Regards, Stefan
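Stefan's question about which files a trivial script actually touches can be checked empirically: after a script runs, every entry in sys.modules carries a module spec recording where it was found. The helper below is an illustrative sketch; the name loaded_module_files is made up here. spec.origin is the location the import system found the module at (the .py path for source modules even when cached bytecode was used, or "built-in"/"frozen" for modules without a file), and spec.cached only says where the bytecode cache belongs, not that it exists or was read.

```python
import sys


def loaded_module_files():
    """Map each imported module name to (spec.origin, spec.cached)."""
    result = {}
    for name in sorted(list(sys.modules)):
        spec = getattr(sys.modules.get(name), "__spec__", None)
        if spec is not None:
            # origin: where the module was found; cached: where its .pyc belongs
            # (computed from origin, so it may point at a file that does not exist).
            result[name] = (getattr(spec, "origin", None), getattr(spec, "cached", None))
    return result


if __name__ == "__main__":
    from time import sleep  # the import from the trivial script above
    for name, (origin, cached) in loaded_module_files().items():
        print(f"{name:25} origin={origin} cached={cached}")
```

Running it with `python3 -X importtime` in addition gives per-module import timings, which is another way to see what a start-up actually loads.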
Hi
On Saturday, 18 February 2017, 01:03:00 CET Stefan Bruens wrote:
On Saturday, 18 February 2017 00:04:36 CET Ferdinand Thiessen wrote:
On 17.02.2017 23:48, Stefan Bruens wrote: [...] But is there one good point for dropping the Python sources? Of course you can save some space, but that would be a few MiB (on my system 24MiB), and which system does not have enough space for that?
As I already said, there are small devices, like the Raspberry Pi, and there are containers. It also saves download bandwidth on both the mirrors and the user's machine.
It is not only this. Please think of datacenters with thousands of virtual machines whose virtual hard disks are placed on shared storage. This storage has some RAID level, and admins are really fighting for every MB the OS does not need, simply because some MB multiplied by 1000 or 2000 makes a difference in what needs to be spent on storage. And if we talk about other architectures where storage is extra expensive, this hurts more. If I look at JeOS, I can see we do not even have the standard kernel installed, only kernel-default-base, just to save some MB. The next thing is containers, which have become really popular these days. Every container runs only one application but brings a Python stack if the software is written in Python. For desktop systems a few MB do not matter if you can choose between a 1TB or 2TB hard disk. But there are other systems. And if Python officially defines that using only .pyc is OK, I wonder why we should not split packages. For openSUSE we could install the .py files by default using the recommends or supplements trick. We do something similar with languages: you can select that you want German translations, and automatically all packages which include German translations are installed. -- Regards Michael Calmer -------------------------------------------------------------------------- Michael Calmer SUSE LINUX GmbH, Maxfeldstr. 5, D-90409 Nuernberg T: +49 (0) 911 74053 0 F: +49 (0) 911 74053575 - e-mail: Michael.Calmer@suse.com -------------------------------------------------------------------------- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Michael Calmer wrote:
It is not only this. Please think of datacenters with thousands of virtual machines whose virtual hard disks are placed on shared storage.
This storage has some RAID level, and admins are really fighting for every MB the OS does not need, simply because some MB multiplied by 1000 or 2000 makes a difference in what needs to be spent on storage. And if we talk about other architectures where storage is extra expensive, this hurts more.
Sorry, based on my experience with customers running mid-size data centers I don't buy this cost argument. And really large data centers (Google, Facebook etc.) will run their own Linux distribution with a partitioning scheme and file-system hierarchy that allows mounting read-only parts of the OS from a central location to save lots of storage space. Having said this: if you're really eager to optimize for small/reusable data storage for large data centers, there are more promising measures to reduce storage size.
If I look at JeOS, I can see we do not even have the standard kernel installed, only kernel-default-base, just to save some MB.
But IIRC this was done because of initrd size.
The next thing is containers, which have become really popular these days. Every container runs only one application but brings a Python stack if the software is written in Python.
The reason why applications bring their own module stack (pip install in a virtualenv, own devpi index) is that application programmers don't want to deal with the arbitrary changes made by OS packagers, especially since OS packagers usually do not test the Python modules they update. I will probably also take this route for my Æ-DIR, to avoid having to change my ansible code each time OS packagers make random package name changes etc. And this will also relieve me of packaging Python modules. Ciao, Michael.
On Sat, Feb 18, Michael Ströder wrote:
Having said this: If you're really eager to optimize for small/reusable data storage for large data centers there are more promising measures to reduce storage size.
Yes, get rid of all this old, stupid, obsolete, meanwhile wrong or superfluous Requires in spec files. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SLES & CaaSP SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Saturday 2017-02-18 14:11, Michael Ströder wrote:
Michael Calmer wrote:
Please think of datacenters with thousands of virtual machines whose virtual hard disks are placed on shared storage.
Sorry, based on my experience with customers running mid-size data centers I don't buy this cost argument. And really large data centers (Google, Facebook etc.)
Obviously, Google is not the target here (something smaller is).
If I look at JeOS, I can see we do not even have the standard kernel installed, only kernel-default-base, just to save some MB.
But IIRC this was done because of initrd size.
Not really; something like Xen DomUs will, in all practical scenarios, not have hardware for all the drivers there are. That's where -base comes in handy.
Jan Engelhardt wrote:
On Saturday 2017-02-18 14:11, Michael Ströder wrote:
Michael Calmer wrote:
Please think of datacenters with thousands of virtual machines whose virtual hard disks are placed on shared storage.
Sorry, based on my experience with customers running mid-size data centers I don't buy this cost argument. And really large data centers (Google, Facebook etc.)
Obviously, Google is not the target here (something smaller is).
So the storage costs are not high enough to justify bigger changes.
If I look at JeOS, I can see we do not even have the standard kernel installed, only kernel-default-base, just to save some MB.
But IIRC this was done because of initrd size.
Not really; something like Xen DomUs will, in all practical scenarios, not have hardware for all the drivers there are. That's where -base comes in handy.
Ok, so it does not serve as a real argument for the source split for Python packages. Ciao, Michael.
On 02/17/2017 07:03 PM, Stefan Bruens wrote: <snip>
But is there one good point for dropping the Python sources? Of course you can save some space, but that would be a few MiB (on my system 24MiB), and which system does not have enough space for that?
As I already said, there are small devices, like the Raspberry Pi, and there are containers. It also saves download bandwidth on both the mirrors and the user's machine.
Fair enough. But then we have to answer the question whether we want to optimize for small devices and implementations such as containers. The other option may be to document how to get to a sourceless Python system, and maybe provide a "clean_py_source" script that can be used by those who build for small targets. Without measuring, my gut feeling would be that it will be hard to show that container start-up is significantly improved if the .py files are not present in the container. There is a cost to the "optimization", and the question, IMHO, is whether the places, such as the Raspberry Pi, where such an "optimization" might make a small difference are sufficiently important in the overall picture for us to take on the cost. The whole system becomes more complex, and that brings with it an ongoing maintenance burden. Later, Robert -- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Architect LINUX Team Lead Public Cloud rjschwei@suse.com IRC: robjo
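A "clean_py_source" script as Robert suggests could look roughly like the sketch below. It is hypothetical: only the name comes from the thread, and the behaviour (write each .pyc to the legacy location next to its .py, then delete the .py) is an assumption based on the PEP 3147 sourceless-import rule discussed earlier in the thread. It has to run with the same interpreter version that will later import the code, because bytecode is version specific.

```python
import os
import py_compile


def clean_py_source(root, dry_run=False):
    """Hypothetical helper: replace every .py under root by a sourceless .pyc.

    Returns the .py files that were (or, with dry_run, would be) removed.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # __pycache__ entries are ignored by sourceless imports; do not descend.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            src = os.path.join(dirpath, name)
            if not dry_run:
                # Legacy location (PEP 3147 case 4): mod.py -> mod.pyc in the same
                # directory, which is where Python 3 looks when the .py is missing.
                py_compile.compile(src, cfile=src + "c", doraise=True)
                os.remove(src)
            removed.append(src)
    return removed
```

Leftover __pycache__ directories are not removed by this sketch; a real tool would also need to decide how such files are owned by the package manager.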
On Fri, Feb 17, 2017 at 7:03 PM, Stefan Bruens <stefan.bruens@rwth-aachen.de> wrote:
On Saturday, 18 February 2017 00:04:36 CET Ferdinand Thiessen wrote:
On 17.02.2017 23:48, Stefan Bruens wrote:
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
At least I know CPython (the "default" one), Jython (uses the JVM), PyPy, and there are several other programs that use Python for scripting (some of them also load Python libraries from the system, so they would have to be patched to work with pyc code).
Currently we ship bytecode for every supported Python interpreter (for Tumbleweed, that is CPython 2.7 and CPython 3.5), accompanied *each time* by the source code. *If* the bytecode is available, the sources are *not* read.
The sources are always read if the Python code calls for them to be read, such as with a bunch of functions in the "inspect" module.
But is there one good point for dropping the Python sources? Of course you can save some space, but that would be a few MiB (on my system 24MiB), and which system does not have enough space for that?
As I already said, there are small devices, like the Raspberry Pi, and there are containers. It also saves download bandwidth on both the mirrors and the user's machine.
Does Raspberry Pi upstream include source files or only .pyc files?
Stefan Bruens wrote:
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
Right. One more question: can .pyo files be generated with just the .pyc files present in the source directory? Nevertheless, I don't see much benefit in splitting .py and .pyc into different RPMs. Ciao, Michael.
On 02/17/2017 05:48 PM, Stefan Bruens wrote:
So, we actually would have:
- a sources package, shared by python2 and python3
- a python2 bytecode package (usable standalone)
- a python3 "sourceless" package, installing symlinks into the sources directory (conflicting with the python2 bytecode package)
- a python3 bytecode package (depending on the sources or the sourceless package, not exclusive)
So we would have four packages instead of two. Now we would have a couple of other questions:
- would these four packages take less space in the repo?
- how well will all these dependency solvers, zypper etc. work when the number of packages doubles?
Regards, Mikhail
On Fri, Feb 17, 2017 at 5:48 PM, Stefan Bruens <stefan.bruens@rwth-aachen.de> wrote:
On Friday, 17 February 2017 15:55:51 CET Todd Rme wrote:
On Fri, Feb 17, 2017 at 3:00 PM, Stefan Bruens
<stefan.bruens@rwth-aachen.de> wrote:
On Friday, 17 February 2017 20:08:53 CET Klaus Kaempf wrote:
* Kristoffer Grönlund <kgronlund@suse.com> [Feb 17. 2017 16:59]:
jan matejek <jmatejek@suse.com> writes:
5. dropping .py That's a big NO.
I just wanted to add my voice to this and agree wholeheartedly. As a developer, this would make the packaged python modules completely useless to me, as stepping into them with a debugger would no longer work.
Then you just install the python-X-source package. That could even be automated in zypper.
Maybe add Recommends: packageand(python-X, python-sources), so it's easy to install sources for all installed python-packages and restore the current status quo.
Option 5) actually said dropping sources, which is IMHO either badly worded or just a bad idea.
Splitting out the sources (.py) from the bytecode (.pyc) would make it possible to remove redundant data and still have fast startup times. Especially for containers and small devices (RPi and alike) this would be a clear win. Most installed systems would probably never need the sources, and even developers will likely only need the sources of a few packages. Hands up, who has debuginfo/debugsource installed for *all* their installed packages?
As others have said, Python is an interpreted language, not a compiled language like C. The .py files are the code that is executed. The .pyc files are just an optional cache. In fact recent versions of python put them in a __pycache__ directory.
Sorry to destroy your simplistic world view, but there are no compiled or interpreted languages. Source code gets parsed, optimized, stored in various intermediate forms, and finally transformed into machine-specific bytecode (and later may be transformed again, e.g. by current x86 CPUs, or translated again, e.g. by qemu).
No, that simply isn't true. Some languages are converted into machine code (which is not the same as bytecode) that is executed by the CPU, some aren't. At no point is CPython code converted into machine code. The CPython interpreter decides what pre-existing machine code to execute based on the Python bytecode, but the CPU never sees that bytecode. PyPy only compiles certain commonly-used code paths to machine code. Cython and Numba can only compile a restricted subset of Python functionality to machine code. No one has been able to make a Python machine code compiler that works with the full range of Python functionality.
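To make the bytecode/machine-code distinction concrete: a CPython .pyc file is a small header followed by a marshalled code object, and that object is run by the interpreter loop rather than handed to the CPU. The sketch below assumes CPython 3.7 or newer (16-byte header, PEP 552) and must run under the same interpreter that wrote the file, because neither the marshal format nor the bytecode is stable across versions; the module content is made up for the example.

```python
import marshal
import os
import py_compile
import tempfile


def run_pyc_without_source():
    """Compile a tiny module, delete its source, and execute the bytecode directly."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "m.py")
        with open(src, "w") as f:
            f.write("RESULT = sum(range(10))\n")
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "m.pyc"), doraise=True)
        os.remove(src)  # from here on only the bytecode exists
        with open(pyc, "rb") as f:
            data = f.read()
    # CPython 3.7+: 16-byte header (magic, flags, mtime/size or hash), followed by
    # a marshalled code object that only this interpreter version can load.
    code = marshal.loads(data[16:])
    namespace = {}
    exec(code, namespace)  # run by the bytecode interpreter loop, not by the CPU directly
    return type(code).__name__, namespace["RESULT"]
```

Calling `run_pyc_without_source()` returns `("code", 45)`: the object loaded from the file is a plain code object, and the module-level statement ran without any .py file present.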
More seriously, .pyc files are not intended for stand-alone use and Python is not designed to work this way. Python allows a lot of tinkering with its internals, so I would not count on the .pyc files even working reliably on their own, and the bugs that are introduced could be rare and hard to track down.
Ever wondered what happens if there *are* pyc bytecode files? Python *stats* the python source file, looks for a corresponding bytecode file (python2: in the source directory; python3: in __pycache__), reads the bytecode file header, compares the header timestamp with the modification time from the earlier stat call, and executes the byte code. The contents of the source file are *not* read at all.
Depends, you can tell Python never to use .pyc files.
The valid question now is: what happens if there is no source file to compare the timestamp against? This is completely valid from Python's perspective, called "sourceless" distribution, although the behaviour differs slightly between python2 and python3. Python2 just uses the python2 bytecode file from the source directory unconditionally. Python3 does not use the bytecode from the __pycache__ dir, but looks for a .pyc name file in the source directory.
Assuming the package doesn't use the "inspect" module or something like it to access the source code directly, which is completely valid in Python. I just checked on GitHub, and there are more than 24,500 uses of "inspect.getsource" alone.
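The inspect.getsource point is easy to demonstrate. The sketch below (Python 3.7+; the module name demo_sourceless_mod is made up for the example) builds a module that exists only as a legacy-location .pyc, imports it, and then asks inspect for the source of one of its functions. The function runs, but the source lookup fails with OSError because there is no .py left to read.

```python
import importlib
import inspect
import linecache
import os
import py_compile
import sys
import tempfile


def sourceless_getsource_demo():
    """Import a module shipped only as .pyc and try inspect.getsource on it."""
    name = "demo_sourceless_mod"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, name + ".py")
        with open(src, "w") as f:
            f.write("def answer():\n    return 42\n")
        # Legacy-location .pyc next to the source, as a sourceless package would ship it.
        py_compile.compile(src, cfile=src + "c", doraise=True)
        os.remove(src)  # keep only demo_sourceless_mod.pyc

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        linecache.clearcache()
        try:
            mod = importlib.import_module(name)
            result = mod.answer()  # the bytecode itself works fine...
            try:
                inspect.getsource(mod.answer)  # ...but there is no source to return
                source_found = True
            except OSError:
                source_found = False
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return result, source_found
```

Calling `sourceless_getsource_demo()` returns `(42, False)`; any package that relies on inspect.getsource (or on tracebacks showing source lines) would degrade in the same way.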
This won't even work by default in recent versions of python. As I mentioned, these files go in the __pycache__ directory by default, and python does not look there for code to execute. So we would need to change python from the upstream default behavior just to make this proposal work at all.
Tried it, it *does* work, and is specified by the python PEP linked above.
This contradicts what you said above: "Python3 does not use the bytecode from the __pycache__ dir, but looks for a .pyc name file in the source directory." So in order to make this work, we would need to deviate from the default Python behavior, either manually symlinking or changing the default python bytecode handling.
On 2017-02-17 16:00, jan matejek wrote:
2. dropping .pyc and compiling on installation That's certainly possible (Debian does something of the sort) and has even some interesting advantages: it allows you to transparently support many different python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic, we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of not-owned files. The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages.
yes, indeed.
This requires further discussion on the tradeoffs, probably at least a rudimentary install time benchmark, but I'm not opposed to including something like this.
I did some quick benchmarking today, which showed that ~10 MB of source can be compiled in 1-2 seconds (as long as you do not compile the files one by one, which adds some 10x overhead for starting Python each time). For the typical 200 MB of .py files, that would mean 20-40 seconds extra, minus the 10% that .pyc decompression would have taken, minus the saved transfer time.
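The "compile everything in one interpreter run" approach maps onto the standard compileall module. The sketch below is not the benchmark from this message: it generates a batch of small, made-up modules in a temporary directory and compiles them with a single compileall.compile_dir call, the kind of one-shot call a %post scriptlet or a "python-central"-style tool could make. Timings vary by machine and are deliberately not quoted.

```python
import compileall
import importlib.util
import os
import tempfile
import time


def batch_compile(n_modules=200):
    """Compile n generated modules in one process; return (success, pyc_count, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        sources = []
        for i in range(n_modules):
            path = os.path.join(tmp, f"mod_{i}.py")
            with open(path, "w") as f:
                f.write(f"def func_{i}(x):\n    return x + {i}\n")
            sources.append(path)

        start = time.perf_counter()
        # One interpreter, many files: the start-up cost is paid once, not per file.
        ok = compileall.compile_dir(tmp, quiet=1)
        elapsed = time.perf_counter() - start

        # Count the cache files where the import system will look for them.
        pyc_count = sum(os.path.exists(importlib.util.cache_from_source(p)) for p in sources)
    return bool(ok), pyc_count, elapsed
```

The per-file alternative would be one `python3 -m py_compile FILE` process per module, which is where the start-up overhead mentioned above comes from.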
4. reproducible builds For now, where it matters, let's touch generated .py files with a set time (probably mtime of tarball?). I suspect this is not an issue in most packages, so automation doesn't really help? But I'd be happy to learn more, e.g. see packages that don't build reproducibly and check out why that happens.
you can search http://rb.zq1.de/compare.factory/reproducible.json for 'unreproducible', and http://rb.zq1.de/compare.factory/ also has a few dozen build-compare diffs for those cases where more than just a file timestamp differed.
OTOH, doesn't rpm store mtime? Will a build with a regenerated file count as same as the previous build, if contents of the file are unchanged but metadata are?
it can be made constant with some recent patches linked in https://github.com/rpm-software-management/rpm/pull/144
That already allowed 70-76% of Factory to build reproducibly. Those patches are also in https://build.opensuse.org/package/show/home:bmwiedemann:reproducible/rpm
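The "touch the sources to a fixed time before compiling" approach (using $SOURCE_DATE_EPOCH where set, as proposed at the start of the thread) can be sketched as below. This is an illustration, not an openSUSE macro: the function names are made up, and falling back to epoch 0 when $SOURCE_DATE_EPOCH is unset is an arbitrary choice. Newer Python releases (3.7+) also offer hash-based .pyc files (PEP 552), which py_compile selects automatically when SOURCE_DATE_EPOCH is set; the header then holds a source hash instead of a timestamp, and the header comparison in the test still holds.

```python
import os
import py_compile


def touch_and_compile(root, epoch=None):
    """Set the mtime of every .py under root to a fixed epoch, then byte-compile it.

    Returns the list of written .pyc paths.
    """
    if epoch is None:
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d != "__pycache__")
        for name in sorted(filenames):
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                os.utime(path, (epoch, epoch))  # sets atime and mtime
                written.append(py_compile.compile(path, doraise=True))
    return written


def pyc_header(pyc_path, size=16):
    """Leading bytes of a .pyc: magic, flags and mtime/size (or hash) on Python 3.7+."""
    with open(pyc_path, "rb") as f:
        return f.read(size)
```

Compiling the same tree twice, with the files written at different times, then yields identical headers, because the stored timestamp comes from the fixed epoch rather than from the build time.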
5. dropping .py That's a big NO. First of all, it's simply not worth it. On my work machine, which has a higher-than-usual amount of pythonic packages, the grand total size of all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I installed for the purpose of this experiment, is 17 MB of that.
Second, users would kill us and I personally would be one of them. As I noted, .pyc is a *cache* for the primary source, which is the .py file. For python (and other languages that run from source), the automatic presence of source code and possibility of instant modification (for local patching, debugging, etc.) is a big advantage. Not installing .py files by default breaks all sorts of conventions, user expectations, goes against the spirit of open source, and is downright power-user-hostile, all in the name of saving space that's negligible on a typical system. We don't want to be That Distro (at least as far as I'm concerned)
I strongly agree there. It would be the same loss we got with systemd vs. sysvinit scripts (e.g. the ones that handled filesystem mounts and crypto containers, which are now handled by compiled C code).
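Jan's size figures (208 MB of *.py out of 7.6 GB in /usr) are easy to reproduce on any system. The helper below is a simple sketch, not the command Jan used: it sums regular files matching a suffix, skips symbolic links, and can be pointed at .pyc files as well to compare source and bytecode sizes.

```python
import os


def total_size(root, suffix):
    """Sum the sizes of files under root whose names end with suffix (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


if __name__ == "__main__":
    for suffix in (".py", ".pyc"):
        print(f"{suffix:5} {total_size('/usr', suffix) / 2**20:8.1f} MiB")
```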
I'm replying to Jan's message inline because I think it raised important points I want to reinforce with my voice. On 02/17/2017 08:00 AM, jan matejek wrote: ...
2. dropping .pyc and compiling on installation That's certainly possible (Debian does something of the sort) and has even some interesting advantages: it allows you to transparently support many different python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic, we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of not-owned files. The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages.
This requires further discussion on the tradeoffs, probably at least a rudimentary install time benchmark, but I'm not opposed to including something like this.
I guess this could make sense to some subset of users/packages/use cases etc., but it seems like an awful lot of work to solve a problem that I am not sure is even there.
3. dropping .pyc without replacement That could be reasonable in some packages, if we know that they are long-running and the startup time difference is negligible to users... which is up to the individual maintainer, i suppose. It's probably a bad idea for libraries, which tend to be used by command-line tools, where startup time matters.
If I understand what you are saying this is a non-starter for me. Python will produce the .pyc files when the .py files get run, unless the user were to always supply a flag on the Python command line, which is not even a possibility when using many Python command line tools. So if only .py files are distributed, then there will be a lot of .pyc cruft left on the system after a package is uninstalled.
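The point that merely running the code leaves bytecode behind can be checked directly: importing a module from a directory the user can write to creates its __pycache__ entry unless bytecode writing is disabled via sys.dont_write_bytecode (which the -B flag and PYTHONDONTWRITEBYTECODE set). The sketch assumes Python 3 and uses a throwaway module with a made-up name in a temporary directory. Only imported modules are cached this way; the top-level script itself is not.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_leaves_pyc(dont_write_bytecode):
    """Import a throwaway module and report whether a __pycache__ file was written."""
    name = "demo_cache_mod_" + ("off" if dont_write_bytecode else "on")
    saved_flag = sys.dont_write_bytecode
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, name + ".py")
        with open(src, "w") as f:
            f.write("VALUE = 1\n")
        sys.path.insert(0, tmp)
        sys.dont_write_bytecode = dont_write_bytecode  # what -B / PYTHONDONTWRITEBYTECODE set
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
            # Where the import system would have cached the bytecode for this source.
            return os.path.exists(importlib.util.cache_from_source(src))
        finally:
            sys.dont_write_bytecode = saved_flag
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```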
4. reproducible builds For now, where it matters, let's touch generated .py files with a set time (probably mtime of tarball?). I suspect this is not an issue in most packages, so automation doesn't really help? But I'd be happy to learn more, e.g. see packages that don't build reproducibly and check out why that happens.
This seems a reasonable strategy to help solve the original question.
5. dropping .py That's a big NO. First of all, it's simply not worth it. On my work machine, which has a higher-than-usual amount of pythonic packages, the grand total size of all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I installed for the purpose of this experiment, is 17 MB of that.
With this I wholeheartedly agree. We cannot drop the .py files. Python software *is* the .py files. Python is an interpreted language, and the fact that the .pyc concept exists doesn't change that. Also, does any other distro do this? There should be a good reason behind doing something no one else is doing. ...
(also, if 15 MB matter to you, maybe don't base your software on a language with a 50MB stdlib of which you need maybe a third, if that... or on a language where the sizes of source code and compiled objects are comparable ;) )
This is the key point IMO. I don't think a distribution should be trying to "solve" the "problems" it sees in a particular programming language. When a developer chooses a language they are also choosing its particular strengths and limitations, whether the developer understands that or not. If a particular user/use case can't abide with something on the magnitude of tens to hundreds of MB (this population has to be the tiniest sliver of the distro's potential user base), then they should look at alternative software. It shouldn't be the job of a general purpose Linux distribution to "solve" the tradeoffs implicit in the developer's choices when the vast majority of users don't need such a "solution". -- Jason Craig
On Fri, 17 Feb 2017, jan matejek wrote:
hello,
replying to all comments so far
1. reason for existence of .pyc Startup times, plain and simple. This does not matter in apps that run for a long time, but it matters a lot for command line utilities.
Importantly, .pyc (and __pycache__ directories) is considered a *cache*. The source code is the primary thing. The fact that python can run purely off that cache is basically an implementation detail.
2. dropping .pyc and compiling on installation: That's certainly possible (Debian does something of the sort) and has even some interesting advantages: it allows you to transparently support many different python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic, we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of not-owned files. The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages.
This requires further discussion on the tradeoffs, probably at least a rudimentary install time benchmark, but I'm not opposed to including something like this.
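A rough sketch of the "tool that you run against the filelist in %post" idea, assuming Python 3 and the standard py_compile module. The helper name compile_filelist is hypothetical; it is not part of any existing openSUSE macro or tool. It byte-compiles only the .py entries of a package's file list and returns the cache paths it wrote, which are the files a package would need to list as %ghost.

```python
import os
import py_compile
import tempfile


def compile_filelist(paths):
    """Byte-compile every .py file in `paths` (e.g. a package's file list).

    Hypothetical helper: roughly what a central %post tool might run against
    the files a package owns. Returns the list of .pyc paths written.
    """
    written = []
    for path in paths:
        if not path.endswith(".py"):
            continue  # skip data files, docs, directories, etc.
        # With cfile left as None, py_compile picks the standard cache
        # location (normally __pycache__/), the same place an import looks.
        pyc = py_compile.compile(path, doraise=True)
        written.append(pyc)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        module = os.path.join(root, "example_mod.py")
        with open(module, "w") as f:
            f.write("ANSWER = 42\n")
        for pyc in compile_filelist([module]):
            print(pyc)
```

For a whole directory tree, the standard library's compileall module (compileall.compile_dir) does the same job in a single call.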
I wonder whether python can place .pyc files in a cache location like /var/cache/python-$VER which we could then even prune off old unused entries. More specifically whether python knows that .pyc "depends" on .py and thus when .py is modified it will re-build the .pyc cache file? I don't think the dance with %post/%preun and %ghost is worth the effort. The pruning (aka tmp-reaper) can easily take care of .pyc files with no longer installed .py files, no? Richard.
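On the cache-location question: CPython 3.8, released well after this thread, added this knob as the PYTHONPYCACHEPREFIX environment variable (also exposed as sys.pycache_prefix), which mirrors .pyc files into a separate directory tree instead of __pycache__ directories next to the source. The sketch below only computes where the .pyc would go, using importlib.util.cache_from_source. cached_location is a hypothetical helper, and /var/cache/python is just an example path.

```python
import importlib.util
import sys


def cached_location(source_path, prefix=None):
    """Return where CPython (3.8+) would look for the .pyc of `source_path`.

    With prefix=None this is the usual __pycache__ directory next to the
    source. With a prefix, the .pyc is mirrored under that directory, as
    PYTHONPYCACHEPREFIX would do. Hypothetical helper for illustration.
    """
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source_path)
    finally:
        sys.pycache_prefix = saved  # don't leak the setting to later imports


if __name__ == "__main__":
    src = "/usr/lib/python3/site-packages/foo.py"
    print(cached_location(src))
    print(cached_location(src, "/var/cache/python"))
```

The writability problem raised later in the thread still applies: if the prefix directory is not writable by the current user, Python silently skips writing the cache.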
3. dropping .pyc without replacement: That could be reasonable in some packages, if we know that they are long-running and the startup time difference is negligible to users... which is up to the individual maintainer, I suppose. It's probably a bad idea for libraries, which tend to be used by command-line tools, where startup time matters.
4. reproducible builds: For now, where it matters, let's touch generated .py files with a set time (probably mtime of tarball?). I suspect this is not an issue in most packages, so automation doesn't really help? But I'd be happy to learn more, e.g. see packages that don't build reproducibly and check out why that happens.
OTOH, doesn't rpm store mtime? Will a build with a regenerated file count as the same as the previous build if the file's contents are unchanged but its metadata differ?
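A minimal sketch of the "touch the files to a fixed time before compiling" approach, assuming Python 3.8+ and an integer epoch such as $SOURCE_DATE_EPOCH; clamp_and_compile and pyc_source_mtime are hypothetical helpers. Only files newer than the epoch (typically ones generated during the build) are touched, so the source mtime written into each .pyc header no longer depends on when the build ran. For what it's worth, Python 3.7+ also supports hash-based .pyc files (PEP 552), and py_compile defaults to them when SOURCE_DATE_EPOCH is set, which avoids embedding the mtime at all; the explicit TIMESTAMP mode below overrides that default (honoured on 3.8+) so the effect of clamping is visible.

```python
import os
import py_compile
import struct


def clamp_and_compile(paths, epoch):
    """Clamp source mtimes to `epoch`, then write timestamp-based .pyc files.

    Hypothetical helper. Files older than `epoch` keep their mtime; newer
    ones (e.g. generated at build time) are set back to exactly `epoch`.
    """
    written = []
    for path in paths:
        if os.stat(path).st_mtime > epoch:
            os.utime(path, (epoch, epoch))  # (atime, mtime)
        written.append(py_compile.compile(
            path,
            doraise=True,
            # Classic mtime-based header, so the clamped mtime ends up in it.
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        ))
    return written


def pyc_source_mtime(pyc_path):
    """Read the source mtime recorded in a Python 3.7+ .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Header layout (PEP 552): magic(4) | flags(4) | mtime(4) | source size(4)
    _flags, mtime, _size = struct.unpack("<III", header[4:16])
    return mtime
```

In a spec file this would run over the generated files in %build or %install, with the epoch taken from $SOURCE_DATE_EPOCH or, as suggested above, from the tarball's mtime.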
5. dropping .py: That's a big NO. First of all, it's simply not worth it. On my work machine, which has a higher-than-usual amount of pythonic packages, the grand total size of all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I installed for the purpose of this experiment, is 17 MB of that.
Second, users would kill us and I personally would be one of them. As I noted, .pyc is a *cache* for the primary source, which is the .py file. For python (and other languages that run from source), the automatic presence of source code and possibility of instant modification (for local patching, debugging, etc.) is a big advantage. Not installing .py files by default breaks all sorts of conventions, user expectations, goes against the spirit of open source, and is downright power-user-hostile, all in the name of saving space that's negligible on a typical system. We don't want to be That Distro (at least as far as I'm concerned).
For use cases where an additional 15 MB matter (such as salt-minion? I have no idea what kind of system is the target), this might be a reasonable step. But you should also go further and install the whole thing as a zip file (which python can import transparently if added to sys.path); most of it would be symbol names, which are retained in .pyc, and here the zip compression helps a lot. Speaking of: whole salt is 32 MB, zipped up is 8.6 MB, zipped only *.py is 3.7 MB, zipped only *.pyc is 4.9 MB. Make your own tradeoff.
You can go even further and vendor-include the dependent packages in the zip file. Make the build process take the packages from their installed locations, to pick up all updates and security fixes on rebuild. Sounds like a good hackweek project.
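To illustrate the zip idea, a self-contained sketch using only the standard library; build_zip is a hypothetical helper and zipdemo_pkg a made-up package name. Once the archive is on sys.path, the built-in zipimport machinery imports from it as if it were a directory. One caveat relevant to the size numbers above: zipimport can read .pyc files stored in the archive but never writes new ones into it, so a zip containing only *.py is recompiled on every interpreter start.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(zip_path, files):
    """Write {archive_name: source_text} into a deflate-compressed zip."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    bundle = os.path.join(tmp, "bundle.zip")
    build_zip(bundle, {
        "zipdemo_pkg/__init__.py": "GREETING = 'imported from a zip'\n",
    })
    sys.path.insert(0, bundle)  # zip files on sys.path are handled by zipimport
    importlib.invalidate_caches()
    import zipdemo_pkg
    print(zipdemo_pkg.GREETING, zipdemo_pkg.__file__)
```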
(also, if 15 MB matter to you, maybe don't base your software on a language with a 50MB stdlib of which you need maybe a third, if that... or on a language where the sizes of source code and compiled objects are comparable ;) )
regards m.
On 17.2.2017 06:33, Bernhard M. Wiedemann wrote:
Via https://en.opensuse.org/openSUSE:Reproducible_Builds I found that when we build python packages like python-amqp or python-binplist it contains a .pyc file for every .py file and for every build these .pyc files differ, because they contain the timestamp of the corresponding source file and for some source files this is the time of build.
http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
I was wondering how to best get those to build bit-by-bit identical rpms.
I assume we want to keep the concept of .pyc files, since they provide some performance gain (e.g. I measured 'openstack --help' taking only 1.5 seconds with .pyc files versus 2.5 seconds without; on another machine it was 12 vs. 13 seconds).
But why do we have to ship .pyc files as part of our binary rpms? They waste disk space and bandwidth for our mirrors and users. They could be created in a %post or %posttrans hook when installing the rpm (or do they need special build deps?) It might even be that compiling them on the destination is faster than transferring and unpacking the LZMA compressed version.
The less intrusive alternative approach would be to touch .py files to a constant older date (e.g. $SOURCE_DATE_EPOCH if set) before generating the .pyc files.
What do you think, which way should we go?
Ciao Bernhard M.
-- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Hi, On 02/20/2017 08:26 AM, Richard Biener wrote:
On Fri, 17 Feb 2017, jan matejek wrote:
2. dropping .pyc and compiling on installation: That's certainly possible (Debian does something of the sort) and has even some interesting advantages: it allows you to transparently support many different python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic, we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of not-owned files. The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages.
This requires further discussion on the tradeoffs, probably at least a rudimentary install time benchmark, but I'm not opposed to including something like this.
I wonder whether python can place .pyc files in a cache location like /var/cache/python-$VER which we could then even prune off old unused entries.
If this place is not writeable by the user, there's of course no way to do that. Thus the bytecode won't be cached.
More specifically whether python knows that .pyc "depends" on .py and thus when .py is modified it will re-build the .pyc cache file?
Simply by comparing timestamps, as already explained in others' mails.
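This answer can be made concrete. For a classic (timestamp-based) .pyc, CPython records the source file's mtime and size in the .pyc header and recompiles whenever either no longer matches the .py file. The sketch below re-implements that check by hand for illustration only; it assumes the 16-byte header layout of Python 3.7+ (PEP 552), and pyc_is_stale is a hypothetical helper, not an importlib API.

```python
import importlib.util
import os
import struct


def pyc_is_stale(source_path):
    """Rough re-implementation of CPython's timestamp-based .pyc check.

    Assumes a Python 3.7+ header. Hash-based pycs are simply reported as
    stale here to keep the sketch short.
    """
    pyc_path = importlib.util.cache_from_source(source_path)
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(16)
    except FileNotFoundError:
        return True  # no cache written yet
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return True  # truncated, or written by a different Python version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        return True  # hash-based pyc: not handled in this sketch
    st = os.stat(source_path)
    # Both values are stored as unsigned 32-bit integers in the header.
    return (mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or size != (st.st_size & 0xFFFFFFFF))
```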
Sebastian -- python programming - mail server - photo - video - https://sebix.at cryptographic key at https://sebix.at/DC9B463B.asc and on public keyservers
Hi, On Mon, 20 Feb 2017, Sebastian wrote:
If this place is not writeable by the user, there's of course no way to do that. Thus the bytecode won't be cached.
That can be solved in the same way as the man page cache is (mostly unused in recent times, but the method is there). Ciao, Michael.
Michael Matz wrote:
On Mon, 20 Feb 2017, Sebastian wrote:
If this place is not writeable by the user, there's of course no way to do that. Thus the bytecode won't be cached.
That can be solved in the same way as the man page cache is (mostly unused in recent times, but the method is there).
The man page cache is done via cron job. Let's kill that one please :-) Nevertheless, some central post-transaction hook could generate the python cache, so individual packages don't have to care. An even more crazy idea would be to have some socket-activated daemon that python can trigger to cache specific files on demand. cu Ludwig -- Ludwig Nussel http://www.suse.com/ SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Hi, On Tue, 21 Feb 2017, Ludwig Nussel wrote:
That can be solved in the same way as the man page cache is (mostly unused in recent times, but the method is there).
The man page cache is done via cron job.
Err, no. The question was how to create cached .pyc files in a shared directory as a user on demand. That's similar to pre-rendered man pages (cat files) from old times, which were created exactly the same way. Of course for this it's necessary that the creation process (python) is at least setgid, so it can write into the shared cache directory no matter which user called it. I'm not saying this is a good idea, merely that the problem of cached "compilation" results shared between all users has been solved. Ciao, Michael.
participants (21)
- Bernhard M. Wiedemann
- Ferdinand Thiessen
- Jan Engelhardt
- jan matejek
- Jason Craig
- Klaus Kaempf
- Kristoffer Grönlund
- Ludwig Nussel
- Michael Calmer
- Michael Matz
- Michael Ströder
- Mikhail Terekhov
- Richard Biener
- Robert Schweikert
- Sebastian
- Simon Lees
- Stefan Behlert
- Stefan Bruens
- Stephan Kulow
- Thorsten Kukuk
- Todd Rme