When I started building progfiguration, I settled on Python, mostly because I know it very well and it has a batteries-included standard library. However, there were a few things I wanted that Python doesn’t have by default[1]:
- Single binary deployment without unpacking
- Very fast package builds
After some research, I discovered that Python has supported a simple solution (with some caveats) for this since version 2.6: the zip application archive format.
Single binary deployment
Python programs are typically distributed as pip packages, and installing them is a little slow. They are unpacked into special filesystem locations (/usr/lib/python3.xx/site-packages/..., venv/lib/python3.xx/site-packages/..., etc.) before they can be run, and you cannot run a pip package without this unpacking step. I’d like to be able to just copy and run my program like any regular executable.
And unfortunately, building pip packages is even slower, especially when taking the recommended approach of building in an isolated environment (the default when running a simple python -m build command; see the build docs).
A zip application solves these problems.
This is really exciting because it gets the distribution benefits of a compiled binary. You can distribute a package that anyone with a Python interpreter already on their system can run. They don’t have to worry about virtual environments, pip packages, etc. It works on Windows and Unix. All you have to do is zip up your project directory and add a __main__.py. It’s really cool!
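Here is a minimal sketch of that idea. The hello project and its contents are made up for illustration; the point is only that a zip file with a __main__.py at its root can be handed straight to the interpreter.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipfile


def build_and_run_hello() -> str:
    """Zip a tiny hypothetical project and run the archive with Python."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        project = tmp_path / "hello"
        project.mkdir()
        # __main__.py at the archive root is what Python executes
        (project / "__main__.py").write_text('print("hello from a zip")\n')

        archive = tmp_path / "hello.pyz"
        with zipfile.ZipFile(archive, "w") as z:
            for child in project.rglob("*"):
                z.write(child, child.relative_to(project).as_posix())

        # Equivalent to running `python3 hello.pyz` from a shell
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(build_and_run_hello(), end="")
```

Nothing is installed or unpacked along the way; the interpreter reads __main__.py directly out of the archive.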
Zip application caveats
- Running the code from inside a pyz file is a bit slower. I suspect this is not significant for the average case.
- Only pure Python code can be executed this way. Python packages that use compiled extensions will not work. While we can include our dependencies as well (see below), they must not require compiled extensions, which rules out important libraries like cryptography, and important packages which depend on them like requests.
- Unlike pip packages, zip applications have no concept of their own version.
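The compiled extension caveat is easy to trip over, so it can help to check a dependency before bundling it. The sketch below is an assumption of mine rather than anything progfiguration does: find_compiled_extensions is a hypothetical helper that looks for files whose names end in one of the extension suffixes the running interpreter recognises. It only detects extensions built for the current platform.

```python
import importlib.machinery
import pathlib


def find_compiled_extensions(package_dir: pathlib.Path) -> list:
    """Return files under package_dir that look like compiled extension modules."""
    # EXTENSION_SUFFIXES lists the filename suffixes this interpreter
    # treats as extension modules (platform specific)
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        path
        for path in package_dir.rglob("*")
        if path.is_file() and path.name.endswith(suffixes)
    )
```

An empty result suggests the package is a candidate for inclusion in a zip application; a non-empty one means it will not import from inside the archive.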
Static includes
This gets us most of the way there, but what if we require dependencies? We can make them available to our program as well just by copying them into the zip archive. Once you do, you can import them and use them in your program the same as if they were installed by pip. The previously linked documentation describes this approach as well.
This reminds me of static linking, which is possible in programming environments like C, and the default for code written in Go. Python programs don’t have a linking step, so the name doesn’t apply exactly, and I’ve taken to thinking of them as static includes[2].
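To make the static include idea concrete, here is a small sketch under made-up names: it creates a stand-in pure-Python dependency called somedependency, copies it into a staging directory next to a yourpackage package, and builds the archive from that staging directory. In a real project the dependency would come from wherever you obtain it (the zipapp documentation suggests pip install with --target); the file contents below are purely illustrative.

```python
import pathlib
import shutil
import subprocess
import sys
import zipapp


def build_with_static_include(workdir: pathlib.Path) -> pathlib.Path:
    """Build app.pyz from a staging directory holding an app and one dependency."""
    # Stand-in for a pure-Python dependency that already exists on disk
    dep_src = workdir / "src" / "somedependency"
    dep_src.mkdir(parents=True)
    (dep_src / "__init__.py").write_text(
        "def greet():\n    return 'hi from somedependency'\n"
    )

    # The staging directory becomes the root of the archive
    staging = workdir / "staging"
    app = staging / "yourpackage"
    app.mkdir(parents=True)
    (app / "__init__.py").write_text(
        "import somedependency\n"
        "def main():\n"
        "    print(somedependency.greet())\n"
        "    print(somedependency.__file__)\n"
    )

    # The static include: copy the dependency next to the application package
    shutil.copytree(dep_src, staging / "somedependency")

    target = workdir / "app.pyz"
    zipapp.create_archive(staging, target=target, main="yourpackage:main")
    return target


def run_archive(pyz: pathlib.Path) -> str:
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(pyz)], capture_output=True, text=True, check=True
    )
    return result.stdout
```

The second line the program prints is the dependency's __file__, which points inside app.pyz rather than at site-packages, showing that the import was satisfied from the archive itself.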
Examples
Python includes a module called zipapp for creating these archives. You can use its command-line interface:
# See help
python3 -m zipapp --help
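# Create a pyz for a package with an existing main function.
# Note: the contents of the source directory become the archive root,
# so whatever --main imports must be importable from that directory.
python3 -m zipapp /path/to/yourpackage --output=yourpackage.pyz --main="yourpackage.cmd:main"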
You can use its Python API instead:
import zipapp

zipapp.create_archive(
    "/path/to/yourpackage",
    target="yourpackage.pyz",
    main="yourpackage:main",
)
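create_archive also accepts a few optional parameters that are handy for this use case. The sketch below uses interpreter, filter, and compressed (the latter two exist since Python 3.7); the skip_bytecode and build_archive names and the directory layout are assumptions for illustration. Here source is a directory that contains the yourpackage/ package directory.

```python
import pathlib
import zipapp


def skip_bytecode(path: pathlib.Path) -> bool:
    """Filter callback: return True to include a path (relative to the source dir)."""
    return "__pycache__" not in path.parts and path.suffix != ".pyc"


def build_archive(source: pathlib.Path, target: pathlib.Path) -> None:
    """Build target from source, which must contain the yourpackage/ directory."""
    zipapp.create_archive(
        source,
        target=target,
        main="yourpackage:main",
        # Written as the shebang line at the start of the archive
        interpreter="/usr/bin/env python3",
        # Leave cached bytecode out of the archive
        filter=skip_bytecode,
        # Store files without compression (this is also the default)
        compressed=False,
    )
```

Passing an interpreter also makes the resulting file executable on POSIX systems, so it can be run as ./yourpackage.pyz.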
But any tool that creates a zip file will work; the only requirement is a __main__.py that contains code to execute.
Here’s an example adapted from progfiguration’s zipapp function. It uses the regular zipfile library, without zipapp at all.
import pathlib
import stat
import zipfile

yourpackage_path = pathlib.Path("/path/to/yourpackage")
somedep_path = pathlib.Path("/path/to/somedependency")
pyz_path = pathlib.Path("yourpackage.pyz")
main_py = "import yourpackage; yourpackage.main()"

with open(pyz_path, "wb") as fp:
    # Writing a shebang like this is optional in zipapp,
    # but there's no reason not to since it's a valid zip file either way.
    fp.write(b"#!/usr/bin/env python3\n")
    # Note that we cannot combine the zipfile context manager with the open() context manager
    # in a single with statement, because the shebang has to be written to the file
    # before ZipFile starts writing zip data.
    with zipfile.ZipFile(fp, "w") as z:
        # Copy your package into the zipfile
        for child in yourpackage_path.rglob("*"):
            child_relname = child.relative_to(yourpackage_path)
            z.write(child, "yourpackage/" + child_relname.as_posix())
        # Copy the dependency package into the zipfile
        for child in somedep_path.rglob("*"):
            child_relname = child.relative_to(somedep_path)
            z.write(child, "somedependency/" + child_relname.as_posix())
        # Add the __main__.py file to the zipfile root, which is required for zipapps
        z.writestr("__main__.py", main_py.encode("utf-8"))

# Make the zipapp executable
pyz_path.chmod(pyz_path.stat().st_mode | stat.S_IEXEC)
How fast is this?
I haven’t measured hard numbers here, but I’m quite happy with the result in progfiguration.
Building zip applications is extremely fast for programs of a few thousand lines across a few dozen files, especially with compression disabled in the zip archive. It beats pip by a mile.
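If you want numbers for your own project, a rough timing harness is easy to write. The sketch below is not a benchmark of progfiguration: it generates a few dozen synthetic modules in a temporary directory and times building an archive with compression disabled (ZIP_STORED) and enabled (ZIP_DEFLATED). The function names and the synthetic file contents are assumptions for illustration, and results will vary by machine.

```python
import pathlib
import tempfile
import time
import zipfile


def timed_build(source: pathlib.Path, target: pathlib.Path, compression: int) -> float:
    """Zip everything under source into target; return elapsed seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(target, "w", compression=compression) as z:
        for child in sorted(source.rglob("*")):
            z.write(child, child.relative_to(source).as_posix())
    return time.perf_counter() - start


def compare_compression(file_count: int = 50, lines_per_file: int = 100) -> dict:
    """Return {name: (seconds, archive_size_in_bytes)} for both compression modes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        source = tmp_path / "yourpackage"
        source.mkdir()
        # Synthetic stand-ins for real modules
        for i in range(file_count):
            (source / f"mod{i}.py").write_text(f"x = {i}\n" * lines_per_file)

        results = {}
        modes = [("stored", zipfile.ZIP_STORED), ("deflated", zipfile.ZIP_DEFLATED)]
        for name, method in modes:
            target = tmp_path / f"{name}.pyz"
            seconds = timed_build(source, target, method)
            results[name] = (seconds, target.stat().st_size)
        return results


if __name__ == "__main__":
    for name, (seconds, size) in compare_compression().items():
        print(f"{name}: {seconds:.4f}s, {size} bytes")
```

The tradeoff it exposes is the usual one: the stored archive is larger on disk, while the deflated one costs extra CPU time at build time and again when modules are read.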
Executing them is fast enough that I don’t notice a slowdown compared to doing things like running external commands. For the kind of code I’m running, reading from the zip archive is not a bottleneck, so pulling files out of it feels fast. YMMV.
[1] Aside from those packaging and deployment items, I also wanted a more convenient way to run system commands, and ended up writing a magicrun() function to make system commands a bit more ergonomic.
[2] I think it’s worth differentiating between vendoring and static inclusion. Vendoring implies copying third party code into your own repository. Static inclusion implies copying third party code into the package you are distributing. You might statically include code that isn’t vendored by pulling it down at build time or copying it from your operating system, and you might vendor code that isn’t statically included if you need it during development but you don’t want to distribute it.