Containers offer a compelling vision for software management. From the release of the Docker container management platform to the later release of Kubernetes as a mechanism to orchestrate production workloads, containers have been embraced by the operations community faster than many other technologies, and are in use at an enormous number of companies today.
However, what makes a “good” container? What guidelines can we provide to the dev teams to reduce the burden of operating somewhat arbitrary application deployments?
What follows is my current thinking, and it will be subject to periodic review. I’m writing it down to explain the design decisions I like, as well as the justification for them.
Small, but standard
Containers allow creating and storing immutable blobs that are deployable across any number of machines. That makes them extremely attractive as an application deployment tool.
However, after working with containers for a while, one area that becomes particularly painful is pushing and pulling images across office networks. While container images aren’t nearly as large as machine images, it’s still tedious to pull down a 1GB image that contains everything including the kitchen sink.
There are various guides on reducing the size of images, and the major operating systems publish “slim” builds that omit packages critical to virtual machine management but largely superfluous to containers.
Additionally, with statically compiled languages like Go it’s even possible to build a container containing only the relevant application binaries. Such containers are usually a pleasure to work with.
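As a sketch, a multi-stage Docker build gets us there; the package path and binary name below are hypothetical:

```dockerfile
# Build stage: compile a static Go binary (CGO disabled so it has
# no dependency on a system libc).
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /myapp ./cmd/myapp   # hypothetical package path

# Final stage: only the binary, nothing else.
FROM scratch
COPY --from=build /myapp /myapp
ENTRYPOINT ["/myapp"]
```

The resulting image is a few megabytes rather than a gigabyte, which makes pushing and pulling across the office network far less tedious.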
Containers do not place any restriction on the number of processes that can be run within them. Indeed, container-like technologies are one mechanism by which a machine may be “rooted”, essentially running an entire machine inside a container!
However, practically speaking, running a simple process tree inside our containers gives us a number of benefits:
It’s simple to tell whether the container is “healthy”: if the init process is dead, everything is dead, and we restart it.
We do not have to manage an init system inside our containers
We do not have to worry about shutting down various applications; only one
When applications are unhealthy due to overload or other issues, we have only a single process to debug rather than several.
So, while it’s possible to run many processes in a container, deployments are much simpler with a nice, simple process tree. A minimal init process, one that is super slim but still valid, is useful here.
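For example, Docker’s exec-form ENTRYPOINT keeps the process tree this simple: the application becomes PID 1 directly, with no shell wrapped around it, so it receives SIGTERM cleanly on shutdown. The binary name here is hypothetical:

```dockerfile
FROM debian:bookworm-slim
COPY myapp /usr/local/bin/myapp
# Exec form (JSON array): myapp runs as PID 1 and receives signals
# directly. The shell form, ENTRYPOINT myapp, would wrap it in
# /bin/sh and swallow SIGTERM on "docker stop".
ENTRYPOINT ["/usr/local/bin/myapp"]
```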
Configuration defaults that work, but are designed to be overridden
One of the more interesting questions containers raise is: where do we put configuration? There are several ways it can be managed:
Packing it into the container at build time
Using environment variables in an orchestration tool
Mounting the configuration in as a volume
In my mind, it’s appropriate to pack enough configuration into the container that it passes some smoke tests, but that’s all. Configuration is not like code; it’s an inherently mutable structure that’s designed to be modified quickly. It’s reasonable to roll out a configuration change across a deployment of 100 services extremely quickly, but perhaps not as reasonable to run the build process for that same application.
Additionally, it’s a common pattern to run various pools of an application with different configuration (such as for A/B testing or feature flagging).
One of the more interesting patterns I’ve noticed when attempting to inject an application into a container is a series of “init” scripts that modify the container or application in some way to make it more suitable or “containery”. Some examples I’ve seen are:
Running git clone once the “base container” is run
Running mkdir -p to create directories required for the application runtime inside the mounted volume
In such cases I find that while it’s possible to follow the author’s assumptions and make the container work (fairly) reliably, it at least stalls the booting of the application and can make it difficult to debug the step at which containers get marked unhealthy and restarted.
Additionally, it’s difficult to smoke-test these bootstrap steps outside the production environment.
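A sketch of the alternative: move that “init script” work into the image build, where it runs once and is smoke-tested with everything else. The paths and entrypoint here are hypothetical:

```dockerfile
FROM debian:bookworm-slim
# Instead of running "git clone" when the base container boots,
# bake the application code into the image at build time:
COPY . /app
# Instead of "mkdir -p" at container start, create the required
# directories in the image itself:
RUN mkdir -p /app/cache /app/uploads
WORKDIR /app
ENTRYPOINT ["/app/run"]   # hypothetical entrypoint
```

The container then starts with no bootstrap phase to stall or debug.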
Exposing a health check/readiness endpoint
In a similar vein, some applications do have an inherently complex boot cycle. They may depend on other services that are themselves slow to start, or be JVM applications that take a while to warm up. Or the container may simply be broken or misconfigured.
Without some clear indication of whether the application is healthy, orchestration tooling will likely roll out a broken update over top of a working application, taking a service down. However, that same orchestration tooling can usually be configured to query a given HTTP endpoint to determine whether the application is “healthy” or “ready” (designated by a 200 status code).
Given this information, the orchestration tooling can roll out the application only replacing each node as the previous is ready, or even blocking a failed rollout entirely.
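In Kubernetes terms this is a readiness probe on the pod spec; the path and port below are assumptions:

```yaml
# Part of a Deployment's pod spec: the rollout replaces pods one by
# one, proceeding only once /healthz on each new pod returns a 200.
readinessProbe:
  httpGet:
    path: /healthz        # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5  # allow for a slow boot (e.g. JVM warmup)
  periodSeconds: 10
  failureThreshold: 3     # after 3 failures the pod stays unready
```

A pod that never becomes ready blocks the rollout rather than replacing a working one.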
Updated, tested daily
One of the more interesting side effects of containers is that they do not have a process for self-maintenance. Given a Linux machine, we can enable automatic updates, set a few things up for monitoring, allow it to reboot, and be fairly comfortable it will continue to run correctly for years at a time.
However, containers are frozen, immutable state. Unless they’re explicitly updated, they remain in that same state. Over time, security issues are discovered, packages are updated and other things change, none of which this immutable container will receive by default.
Accordingly, it makes sense to rebuild the containers on a nightly basis, deploying either daily or every few days. However, if we’re deploying so regularly, we need additional assurance that the image we build isn’t going to suffer one of those nefarious “oh, Ubuntu’s keyserver went down” types of issues.
Accordingly, as part of this build process the container should be smoke-tested: a minimal set of tests to ensure that the build passed and the software packaged is working reasonably well.
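As a sketch, a scheduled CI job can do the nightly rebuild and smoke test; this uses GitHub Actions syntax, and the image name and health endpoint are assumptions:

```yaml
name: nightly-rebuild
on:
  schedule:
    - cron: "0 2 * * *"           # rebuild every night at 02:00
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the image
        run: docker build -t myapp:nightly .   # hypothetical image name
      - name: Smoke test
        run: |
          docker run -d --name smoke -p 8080:8080 myapp:nightly
          sleep 5
          curl --fail http://localhost:8080/healthz  # assumed endpoint
```

If the build or the smoke test fails, nothing is pushed, so a broken upstream repository can’t reach production.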
To determine what’s going on in our newly containerised environments, we need some idea of what’s happening in the application itself. The easiest way to do this is to have the application express its state numerically. We can then collect this information, store it in a time-series database, and look it up later when we want to understand what’s happening.
So, applications should expose their statistical information by default. They can expose it over the loopback interface (127.0.0.1) if security is a concern, and orchestration tools such as Kubernetes can run a separate pod in that same network namespace to query the endpoint.
One common way of expressing application state is the Prometheus format. Orchestration systems such as Kubernetes have excellent support for it, allowing things like horizontal scaling based on custom application metrics.
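For illustration, a Prometheus-format response might look like this (the metric name and labels are invented):

```
# HELP http_requests_total Total HTTP requests served, by status code.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
```

Each line is a counter sample; a collector scrapes this endpoint periodically and stores the values as a time series.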
Exposing logs via STDOUT in JSON
Given the above simple process tree, containers allow us to take a bit of a shortcut with logs. Instead of logging them to disk and attempting to have another application read that bespoke application log back from disk, the application can simply dump the logs to STDOUT.
Docker will then pick up these logs and store them in an appropriate location on disk. Log aggregation tooling can then collect the logs of all containers running on the machine at once, attach some metadata about each container, and ship those logs off to the nearest log aggregator or ELK stack.
Additionally, because log aggregation becomes an essentially trivial exercise, it then makes sense to structure the logs in a way that allows querying the log database for a specific property about the logs, such as the status code or a given file name.
The most common and widely understood structured data format for logs is JSON. By exposing the logs as JSON, users can still read them fairly easily with tools like jq but, more importantly, the logs can be easily queried, correlated and understood with filtering in production.
The above are some things I’ve found make managing containers simpler. It’s not an exhaustive list, but maybe it helps you shape your own container development with a view to how it’ll be used and abused in a production environment.