hello,

replying to all comments so far:

1. reason for existence of .pyc

Startup times, plain and simple. This does not matter in apps that run for a long time, but it matters a lot for command-line utilities. Importantly, .pyc files (and __pycache__ directories) are considered a *cache*. The source code is the primary thing. The fact that python can run purely off that cache is basically an implementation detail.

2. dropping .pyc and compiling on installation

That's certainly possible (Debian does something of the sort) and even has some interesting advantages: it allows you to transparently support many different python versions simultaneously with just one package install. Of course, it would be a lot of work. Doing this manually in every package is not realistic; we'd need some sort of automation that does this for a package in a single macro, or maybe a "python-central" tool that you run against the filelist in %post, or something... You would also have to %ghost all the .pyc files, otherwise all you get is a big ugly mess of unowned files (there's a rough spec sketch below, after point 5). The singlespec macro set could help with this, possibly even creating the %post/%preun scriptlets and %ghost entries automatically in packages. This requires further discussion of the tradeoffs, and probably at least a rudimentary install-time benchmark, but I'm not opposed to including something like this.

3. dropping .pyc without replacement

That could be reasonable in some packages, if we know that they are long-running and the startup time difference is negligible to users... which is up to the individual maintainer, I suppose. It's probably a bad idea for libraries, though, since they tend to be used by command-line tools, where startup time matters.

4. reproducible builds

For now, where it matters, let's touch generated .py files with a set time (probably the mtime of the tarball?) before byte-compilation; there's a sketch of that below as well. I suspect this is not an issue in most packages, so automation doesn't really help? But I'd be happy to learn more, e.g. see packages that don't build reproducibly and check out why that happens. OTOH, doesn't rpm store mtime? Will a build with a regenerated file count as the same as the previous build if the file's contents are unchanged but its metadata differ?

5. dropping .py

That's a big NO.

First of all, it's simply not worth it. On my work machine, which has a higher-than-usual number of pythonic packages, the grand total size of all "*.py" files is 208 MB. Out of a 7.6 GB /usr. Salt itself, which I installed for the purpose of this experiment, is 17 MB of that.

Second, users would kill us, and I personally would be one of them. As I noted, .pyc is a *cache* for the primary source, which is the .py file. For python (and other languages that run from source), the automatic presence of source code and the possibility of instant modification (for local patching, debugging, etc.) is a big advantage. Not installing .py files by default breaks all sorts of conventions and user expectations, goes against the spirit of open source, and is downright power-user-hostile, all in the name of saving space that's negligible on a typical system. We don't want to be That Distro (at least as far as I'm concerned).

For use cases where an additional 15 MB matters (such as salt-minion? I have no idea what kind of system the target is), this might be a reasonable step. But then you should go further and install the whole thing as a zip file, which python can import transparently if it's added to sys.path; most of the remaining size is symbol names, which are retained even in .pyc, and zip compression helps a lot there.
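For point 2, a rough sketch of what a manually converted package could look like (the module name "foo" is made up; in reality the automation would generate this):

    # sketch only: ship the .py source, declare the .pyc as %ghost,
    # and byte-compile on the target system instead of at build time
    %install
    install -D -m 644 foo.py %{buildroot}%{python_sitelib}/foo.py
    # empty placeholder so the %ghost entry has a file in the buildroot
    touch %{buildroot}%{python_sitelib}/foo.pyc

    %post
    %{_bindir}/python -m compileall -q %{python_sitelib}/foo.py || :

    %preun
    rm -f %{python_sitelib}/foo.pyc

    %files
    %{python_sitelib}/foo.py
    %ghost %{python_sitelib}/foo.pyc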
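For point 4, the "set time" idea could be as small as this at the end of %install, before the automatic byte-compilation runs (assuming $SOURCE_DATE_EPOCH is exported in the build environment; the tarball mtime would work the same way):

    # normalize source mtimes so the timestamp embedded in each .pyc
    # is stable across rebuilds; fall back to epoch 0 if unset
    find %{buildroot}%{python_sitelib} -name '*.py' \
        -exec touch -d "@${SOURCE_DATE_EPOCH:-0}" {} +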
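And to illustrate the zip import for point 5 (paths made up; any zip file on sys.path, including one added via PYTHONPATH, is searched like a directory):

    # pack the installed sources and import straight from the archive
    cd /usr/lib/python2.7/site-packages
    zip -qr /tmp/salt.zip salt
    PYTHONPATH=/tmp/salt.zip python -c 'import salt; print(salt.__file__)'
    # -> /tmp/salt.zip/salt/__init__.py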
Speaking of: the whole of salt is 32 MB; zipped up, it's 8.6 MB; only *.py zipped is 3.7 MB; only *.pyc zipped is 4.9 MB. Make your own tradeoff.

You can go even further and vendor the dependent packages into the zip file. Make the build process take the packages from their installed locations, so that a rebuild picks up all updates and security fixes. Sounds like a good hackweek project.

(also, if 15 MB matters to you, maybe don't base your software on a language with a 50 MB stdlib of which you need maybe a third, if that... or on a language where the sizes of source code and compiled objects are comparable ;) )

regards
m.

On 17.2.2017 06:33, Bernhard M. Wiedemann wrote:
> Via https://en.opensuse.org/openSUSE:Reproducible_Builds
> I found that when we build python packages like python-amqp or
> python-binplist, the result contains a .pyc file for every .py file,
> and these .pyc files differ between builds because they embed the
> timestamp of the corresponding source file, which for some source
> files is the time of the build.
> http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
> http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
> I was wondering how to best get those to build bit-by-bit identical rpms.
> I assume we want to keep the concept of .pyc files, since they provide
> some performance gain (e.g. I measured 'openstack --help' taking only
> 1.5 seconds with .pyc files versus 2.5 seconds without; on another
> machine it was 12 vs 13 seconds).
> But why do we have to ship .pyc files as part of our binary rpms? They
> waste disk space and bandwidth for our mirrors and users. They could be
> created in a %post or %posttrans hook when installing the rpm (or do
> they need special build deps?). It might even be that compiling them on
> the destination machine is faster than transferring and unpacking the
> LZMA-compressed version.
> The less intrusive alternative would be to touch the .py files to a
> constant older date (e.g. $SOURCE_DATE_EPOCH, if set) before generating
> the .pyc files.
> Which way do you think we should go?
> Ciao
> Bernhard M.