How to write great container images
The container ecosystem is packed with misconceptions, leading to disappointing misuse of what is a truly remarkable underlying technology.
One very popular misuse of Docker shows in how images are made.
Here I’ll lay out the principles of what I consider “Dockerfile best practices”, and simultaneously walk you through them with a real example: we are going to write a Redis image!
Note: there are already official Dockerfile best practices, but it is my humble opinion that we can do better than that.
Compile your application during build time
Dockerfiles are nothing else but machine-readable lists of instructions. If you compile your application during build time, you can refer users to the Dockerfile when asked about build instructions.
You will need to write a list of all your build dependencies, which is useful as documentation.
This will remove one link from the trust chain by removing the need for software repositories (I mean, repositories don’t get hijacked often, but it has happened before).
And finally, this will let you fine tune your build flags for containerized environments, if need be.
ADD the source into the image and verify it (with a checksum, for example), then script your application’s build in a single RUN block. Why a single RUN block, you ask? To optimize build caching, and to minimize the image’s number of layers.
Warning: don’t forget to enable the binary exploit mitigations usually applied by public software repositories. More information on Debian's "Hardening" page.
Tip: it is up to you whether to strip (or not) the symbols out of your binary. If you do, you will produce a smaller binary. If you don’t, you will be able to stack trace your application.
This is how our Dockerfile looks right now:
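A sketch of what that first stage could look like. The base image, the download URL, and the version and checksum values below are placeholders you would pin yourself; the build-dependency list is illustrative:

```dockerfile
# Build stage: any operating system base image works here
FROM debian:bookworm AS build

# Build-time variables only, hence ARG and not ENV
# (placeholder values: pin a real version and its checksum)
ARG VERSION="x.y.z"
ARG SHA256="<sha256 of the release tarball>"

# List your build dependencies explicitly
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libc6-dev make

# ADD the source into the image...
ADD https://download.redis.io/releases/redis-$VERSION.tar.gz /redis.tar.gz

# ...verify it, then script the build in a single RUN block
RUN echo "$SHA256  /redis.tar.gz" | sha256sum -c - && \
    tar -xzf /redis.tar.gz && \
    mv "redis-$VERSION" /redis && \
    make -C /redis
```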
Note: ARGs differ from ENVs in that ENVs stick with us during runtime. These are variables we only need at build time, so the “correct” clause to use here is ARG, not ENV.
Package your image
You will be forced to explicitly COPY all of your application’s dependencies (shared libraries, other binaries, configuration files, assets, etc.), resulting in an exhaustive list of your application’s runtime dependencies, which is very useful as documentation.
If you only include those files required to run, your image will inherit no bloat from base images: shells, UNIX tools, package managers, unused shared libraries, and so on. These bring no benefit, and could be leveraged by attackers, should your container get compromised.
A nice by-product of this is that you will produce the tiniest possible image for your application, lowering pull times and cold-start latency.
Use multi-stage builds. First, create a build stage to compile your binary with whatever operating system base image you like; then create a second FROM scratch stage, and COPY --from=build your binary and all its runtime dependencies.
Tip: use tools like ldd to figure out which dynamic libraries your binary requires, and find to locate them in the build stage’s filesystem.
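Sketched out, the hunt might look like this inside the build stage (the binary path is illustrative, and the library name will vary with your toolchain):

```shell
# Which dynamic libraries does the binary link against?
ldd /redis/src/redis-server

# Where does one of them live in this filesystem?
find / -name 'libc.so*' 2>/dev/null
```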
Note: disappointingly, multi-stage builds are not supported for official Docker Hub images.
We throw multi-stage into our Dockerfile:
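A sketch of the second stage, assuming the paths of a typical glibc build on x86-64; the exact libraries to copy are whatever ldd reports for your binary:

```dockerfile
# Final stage: start from an empty filesystem
FROM scratch

# Copy the binary and its runtime dependencies from the build stage
# (library paths are illustrative)
COPY --from=build /redis/src/redis-server /redis-server
COPY --from=build /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/
COPY --from=build /lib64/ld-linux-x86-64.so.2 /lib64/
```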
Run as a non-root user
This one is confusing.
It is well known that running network-exposed applications as root is not a great idea.
But Docker has this thing called user namespaces, which essentially means that a container’s root and the host’s root are not the same. Therefore, one could argue that you can run everything as root in Docker and be cool about it.
But this is dangerous! It turns out user namespaces are not enabled by default, and you should never expect your users to change defaults.
If your application were compromised, and your Docker daemon had an unpatched container-escape vulnerability, chances are that exploiting it requires root privileges inside the container.
Running your application as a user with a UID other than 0 simply adds another layer of protection.
This depends on your application.
Packages like NGINX or HAProxy need to be started by root, and then they switch to whatever user you specify in their configuration.
Other packages simply run and do no process management. If that’s your case, make sure to include a USER clause at the end of your Dockerfile, as the default value is, in fact, root.
If you are building the image FROM scratch, you will need to create your user yourself. To do so, you will need to write two files: /etc/passwd and /etc/group.
I like to write them off-image and put them in a rootfs/etc folder in the root of my Docker build context, then COPY them into the image. Here is how they define the redis user and group:
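The contents of both files might be as minimal as the following, assuming we give the redis user UID 100 and GID 100, with /data as its home directory (any unprivileged IDs work; adjust to taste). First, rootfs/etc/passwd:

```
redis:x:100:100:redis:/data:/sbin/nologin
```

And rootfs/etc/group:

```
redis:x:100:
```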
And add the rootfs directory as well as the USER clause to the Dockerfile:
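A sketch of those two additions, assuming the rootfs directory sits at the root of the build context:

```dockerfile
# Copy /etc/passwd, /etc/group and the data directory into the image
COPY rootfs /

# Drop privileges for everything that runs from here on
USER redis
```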
To avoid issues with permissions, I created an empty rootfs/data directory so that it gets UID 100 and the matching GID. You will see what that directory is for in a minute.
EXPOSE your ports
It will make it easier for administrators to check which ports your application exposes.
It serves as quick reference if you forget (happens more often than one would expect from oneself).
Add an EXPOSE clause with the port your application exposes and its protocol. Redis listens for connections on TCP port 6379, so:
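In Dockerfile terms, that is a single clause:

```dockerfile
EXPOSE 6379/tcp
```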
Declare your VOLUMEs
Storage is hard; using the VOLUME clause for data volumes just makes volume management easier. Add a VOLUME clause followed by the path to your data volume.
Redis stores data in the /data directory, so:
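Which gives us:

```dockerfile
VOLUME /data
```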
Declare your STOPSIGNAL
If your application handles open connections or some other form of state, it is nice to let your process exit gracefully. If the mechanism to tell your process to finish is a signal, chances are you need to specify it explicitly.
To let Docker know which signal triggers a graceful shutdown, use the STOPSIGNAL clause.
The default is SIGTERM, so if that’s your signal you can skip this step. An alternative example is HAProxy, which uses SIGUSR1 for graceful shutdown.
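For example, an image for an application that shuts down gracefully on SIGUSR1 (as HAProxy does) would declare:

```dockerfile
STOPSIGNAL SIGUSR1
```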
Redis stops gracefully upon receiving a SIGTERM, so there’s nothing for us to change here.
Configure your ENTRYPOINT and CMD
Properly configuring your image’s ENTRYPOINT and CMD gives versatility: users can pass their own arguments as if they were running the binary locally. Think of the ENTRYPOINT as the binary, and of the CMD as its default arguments.
So if you want your container to run /bin/ping -c 3 google.com by default, you would put the following in your Dockerfile:
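That breaks down into these two clauses:

```dockerfile
ENTRYPOINT ["/bin/ping"]
CMD ["-c", "3", "google.com"]
```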
This way, if I download your ping image but want to ping my own website, I would do:
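Assuming the image is tagged ping (a hypothetical name) and mywebsite.com stands in for my own site, the arguments after the image name replace the default CMD:

```shell
docker run ping -c 3 mywebsite.com
```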
Our Redis binary is named redis-server, so that becomes our ENTRYPOINT. We pass no arguments by default, so we add no CMD clause.
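In our case that boils down to a single clause; the path assumes the binary was copied to the image root in the final stage:

```dockerfile
ENTRYPOINT ["/redis-server"]
```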
Keep your assembly line open
If you are offering your image to the general public, please have an openly auditable CI pipeline so that users can verify the integrity of the images they download from the registry.
It is a major red flag to me when an image says it was “recently updated” but the “Builds” tab in Docker Hub shows the last build was “2 years ago”. This means the image maintainers are pushing the images from somewhere else, so you can’t verify the integrity of the image’s binaries.
Set up automated builds in Docker Hub or Quay that automatically pull from your application’s repository and build your image when there are changes.
Tip #1: set up three build rules: one for the master branch, appropriately labeled :master; and two that trigger upon new tags, one labeled :latest and the other one with the tag itself. This way, the default image your users pull (:latest) will represent “the latest tag”, not “the current state of master”.
Tip #2: set up email notifications when your build fails so you get notified when something has gone wrong.
Keep your images in two registries simultaneously
You never know when a registry will go down. The Docker Hub recently went read-only for 10 hours straight.
You never know when your service will be disrupted: this startup had its DigitalOcean account deactivated for 12 hours. You never know when a service provider might simply stop providing said service.
A list of popular container registries:
- Docker Hub, by Docker
- Quay, by Red Hat
- Elastic Container Registry, by Amazon Web Services
- Container Registry, by Google
- Container Registry, by Microsoft Azure
Choose any two and configure automatic builds on both.
In our Redis example, the image is in the Docker Hub and Quay. They both pull from the same GitHub repository.
The above are the quality standards I look for in an image, and I always try to follow them in my builds.
The Redis example above is an actual image. You can find its Dockerfile together with other images I’ve built following the aforementioned principles here:
- Kubernetes’ kube-state-metrics
- Kubernetes’ metrics-server
- Prometheus’ alertmanager
- Prometheus’ blackbox_exporter
- Prometheus’ node_exporter
- Prometheus’ snmp_exporter
- Redis (the example above)
These are all actively kept up-to-date. Feel free to use/fork them.
Feel free to reach out to me if you want me to write an image or review yours.