
How to write great container images

Containers are surrounded by many misconceptions, leading to disappointing misuse of what is a truly remarkable underlying technology.

One very popular misuse of Docker is in how images are built.

Here I’ll lay out the principles of what I consider “Dockerfile best practices”, and simultaneously walk you through them with a real example: we are going to write a Redis image!

Note: there are already official Dockerfile best practices, but it is my humble opinion that we can do better than that.

Compile your application during build time

Why?

  1. Dockerfiles are nothing else but machine-readable lists of instructions. If you compile your application during build time, you can refer users to the Dockerfile when asked about build instructions.

  2. You will need to write a list of all your build dependencies, which is useful as documentation.

  3. This will remove one link in the trust chain by removing the need for software repositories (I mean, repositories don’t get hijacked often, but it has happened before).

  4. And finally, this will let you fine-tune your build flags for containerized environments, if need be.

How?

First ADD the source into the image, and verify it (with a checksum, for example). Then script your application’s build in a single RUN block.

Only one RUN block, you ask? Yes, to optimize build caching, and to minimize the image’s number of layers.

Warning: don’t forget to enable the binary exploit mitigations usually applied to packages in public software repositories. More information on Debian's "Hardening" page.

Tip: it is up to you whether to strip (or not) the symbols out of your binary. If you do, you will produce a smaller binary. If you don’t, you will be able to stack trace your application.

Example

This is how our Dockerfile looks right now:
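Something along these lines (a sketch: the Redis version, checksum and build details below are illustrative placeholders, not the real image's exact values):

    FROM alpine:3.18

    # Build-time-only variables, hence ARG rather than ENV (see the tip below)
    ARG REDIS_VERSION=7.2.4
    # Placeholder: pin the real tarball's checksum here
    ARG REDIS_SHA256=<sha256-of-the-release-tarball>

    # Fetch the source...
    ADD https://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz /redis.tar.gz

    # ...then verify it and build it, all in a single RUN block
    RUN echo "${REDIS_SHA256}  /redis.tar.gz" | sha256sum -c - && \
        apk add --no-cache build-base linux-headers && \
        tar -C / -xzf /redis.tar.gz && \
        make -C /redis-${REDIS_VERSION}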

Tip: ARGs and ENVs differ in that ENVs stick around at runtime. These are variables we only need at build time, so the “correct” clause to use is ARG.

Package your image FROM scratch

Why?

  1. You will be forced to explicitly COPY all of your application’s dependencies (shared libraries, other binaries, configuration files, assets, etc.), resulting in an exhaustive list of your application’s runtime dependencies, which is very useful as documentation.

  2. If you only include those files required to run, your image will inherit no bloat from base images, like shells, UNIX tools, package managers, unused shared libraries, etc., which bring no benefit and could be leveraged by attackers should your container get compromised.

  3. A nice by-product of this is that you will produce the tiniest possible image for your application, lowering pull times and cold-start latency.

How?

Use multi-stage builds.

First, create an AS build stage to compile your binary with whatever operating system base image you like, then create a second FROM scratch stage, and COPY --from=build your binary and all its runtime dependencies.

Tip: use tools like ldd to figure out which dynamic libraries your binary requires, and find to locate them in the build stage’s filesystem.

Note: disappointingly, multi-stage builds are not supported for official Docker Hub images.

Example

We throw multi-stage into our Dockerfile:
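Something like this (again a sketch; the exact libraries to COPY depend on what ldd reports for your binary):

    FROM alpine:3.18 AS build

    ARG REDIS_VERSION=7.2.4
    # Placeholder: pin the real tarball's checksum here
    ARG REDIS_SHA256=<sha256-of-the-release-tarball>

    ADD https://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz /redis.tar.gz

    RUN echo "${REDIS_SHA256}  /redis.tar.gz" | sha256sum -c - && \
        apk add --no-cache build-base linux-headers && \
        tar -C / -xzf /redis.tar.gz && \
        make -C /redis-${REDIS_VERSION} && \
        cp /redis-${REDIS_VERSION}/src/redis-server /redis-server

    FROM scratch

    # The binary, plus the shared libraries ldd says it needs (musl, in Alpine's case)
    COPY --from=build /redis-server /redis-server
    COPY --from=build /lib/ld-musl-x86_64.so.1 /lib/ld-musl-x86_64.so.1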

Don’t use root

This one is confusing.

It is well-known that running network-exposed applications as root is not a great idea.

But Docker has this thing called user namespaces, essentially meaning that a container’s root and the host’s root are not the same. Therefore, one could argue you can run everything as root in Docker and be cool about it.

But this is dangerous! Turns out user namespaces are not enabled by default, and you should never expect your users to change defaults.

Why?

Security.

If your application is vulnerable and your daemon were to have an unpatched container escape vulnerability, chances are that exploiting it requires root privileges inside the container.

Running your application as a user with a UID other than 0 simply acts as another layer of protection.

How?

This depends on your application.

Packages like NGINX or HAProxy need to be initialized by root, and then they switch to whatever user you specify in their configuration.

Other packages simply run and do no process management. If that’s your case, make sure to include a USER clause at the end of your Dockerfile, as the default value is, in fact, root.

If you are building the image FROM scratch, you will need to create your user. To do so, you will need to write two files: /etc/passwd and /etc/group.

I like to write them off-image, and put them in a rootfs/etc folder in the root of my Docker build context, then COPY it into my image’s / directory:
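    COPY rootfs /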

Example

Create the redis user and group:

rootfs/etc/passwd:
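A minimal one-line entry (the home directory and shell fields here are assumptions):

    redis:x:100:100:redis:/data:/sbin/nologin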

rootfs/etc/group:
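With its matching one-line entry:

    redis:x:100: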

And add the rootfs directory as well as the USER clause to the Dockerfile:
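A sketch of those additions (--chown being one way to get the ownership described below):

    # Ships /etc/passwd, /etc/group and the empty /data directory,
    # all owned by redis (UID 100, GID 100)
    COPY --chown=100:100 rootfs /

    USER redis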

To avoid issues with permissions, I created an empty rootfs/data directory so that it gets chown'ed during COPY to UID 100 and GID 100, redis's UID and GID. You will see what that directory is for in a minute.

Use the EXPOSE clause

Why?

  1. It will make it easier for the administrator to check which ports your application exposes.

  2. It serves as a quick reference if you forget (happens more often than one would expect from oneself).

How?

Add an EXPOSE clause with the port your application exposes and its protocol.

Example

Redis listens for connections at 6379/TCP, so:
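    EXPOSE 6379/tcp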

Use the VOLUME clause

Why?

Storage is hard. Using the VOLUME clause for data volumes just makes volume management easier.

How?

Add the VOLUME clause followed by the path to your data volume.

Example

Redis stores data in the /data directory, so:
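    VOLUME /data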

Configure your STOPSIGNAL

Why?

If your application handles open connections or some other form of state, it is nice to let your process exit gracefully. If the mechanism to tell your process to finish is a signal, then chances are you need to specify one explicitly.

How?

To let Docker know which signal triggers graceful shutdown, use the STOPSIGNAL clause.

The default is SIGTERM, so if that’s your signal you can skip this step. HAProxy, for example, uses SIGUSR1.
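For an HAProxy image, that would be:

    STOPSIGNAL SIGUSR1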

Example

Redis stops gracefully upon receiving a SIGTERM, so there’s nothing for us to change here.

Properly set ENTRYPOINT/CMD

Why?

Properly configuring our image’s ENTRYPOINT and CMD gives versatility in that you can pass your own arguments as if you were running the binary locally.

How?

Think of ENTRYPOINT as the binary and of CMD as argv.

So if you want your container to run /bin/ping -c 3 google.com, you would put the following in your Dockerfile:
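    ENTRYPOINT ["/bin/ping"]
    CMD ["-c", "3", "google.com"]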

This way, if I download your ping image but want to ping my own website, I would do:
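Assuming the image is tagged yourname/ping (a hypothetical tag):

    docker run yourname/ping -c 3 blog.bejarano.io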

Example

Our Redis binary is named /redis-server, so:
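    ENTRYPOINT ["/redis-server"]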

We pass no arguments by default, so we add no CMD clause.

Keep your assembly line open

If you are offering your image to the general public, please have an openly auditable CI pipeline so that users can verify the integrity of the images they download from the registry.

Why?

Open-source etiquette.

It is a major red flag to me when an image says it was “recently updated” but the “Builds” tab in Docker Hub shows the last build was “2 years ago”. This means the image maintainers are pushing the images from somewhere else, so you can’t verify the integrity of the image’s binaries.

How?

Set up automated builds in Docker Hub or Quay that automatically pull from your application’s repository and build your image when there are changes.

Tip #1: set up three build rules: one for the master branch, appropriately labeled :master; and two that trigger upon new tags, one labeled :latest and the other one with the tag itself. This way, the default image your users pull (:latest) will represent “the latest tag”, not “the current state of master”.

Tip #2: set up email notifications for failed builds, so you find out as soon as something goes wrong.

Keep your images in two registries simultaneously

Why?

  1. You never know when a registry will go down. The Docker Hub recently went read-only for 10 hours straight.

  2. You never know when your service will be disrupted: this startup had its DigitalOcean account deactivated for 12 hours. And a provider might simply stop offering the service altogether.

How?

A list of popular container registries: Docker Hub, Quay, GitLab Container Registry, Amazon ECR, Google Container Registry and Azure Container Registry.

Choose any two and configure automatic builds on both.

Example

In our Redis example, the image is on both Docker Hub and Quay. They both pull from the same GitHub repository.

That’s all!

The above are the quality standards I look for in an image, and I always try to follow them in my builds.

The Redis example above is an actual image. You can find its Dockerfile together with other images I’ve built following the aforementioned principles here:

These are all actively kept up-to-date. Feel free to use/fork them.

Feel free to reach out to me if you want me to write an image or review yours.