Lighter Python images using multi-stage Dockerfiles

A very interesting feature coming with Docker 17.05 is multi-stage builds in the Dockerfile. This feature allows to implement a simple pipeline during the build phase and carry artifacts between stages. Previously, if you wanted to deliver lightweight images, you would have had to implement the builder pattern.

First, you would build an image with all the development dependencies, build the project and generate your artifacts, export them using docker cp or anything similar and then add those files to a new image, with a lighter base. This was a common patter, for example, for generating lightweight images of statically-compiled Golang applications.

With the multi-stage builds, it is possible to specify this workflow inside the Dockerfile. Let’s assume that we want a Python image with the pycrypto package installed, as described in the file requirements.txt:

$ cat requirements.txt

pycrypto

We could use this Dockerfile:

FROM python:3 as python-base
COPY requirements.txt .
RUN pip install -r requirements.txt

FROM python:3-alpine
COPY --from=python-base /root/.cache /root/.cache
COPY --from=python-base requirements.txt .
RUN pip install -r requirements.txt && rm -rf /root/.cache

There are a few points to note here:

  • The syntax of the Dockerfile allows several FROM directives, one per image/stage, that can be tagged using the as keyword.
  • The COPY directive accepts a --from= option to point to the artifacts and files from a previous image/stage.
  • I have chosen pycrypto as it is a library that require compilation at installation time. Thus, we are using a fat image named python:3 that ships with GCC and libraries.
  • The Python wheels that are downloaded or generated as a result of the compilation are stored in the ~/.cache/pip/wheels folder, so passing this folder as an artifact between stages prevents a succesive download and compilation of the packages, allowing us to use a lighter image (python:3-alpine).

The resulting image has a reduced surface attack, less dependencies installed and it is definitely lighter. The python:3 image with pycrypto installed weighs 697 MB. The alpine-based, only 96.4 MB.