I spent a while last week porting livegrep.com from running directly on AWS to running on Kubernetes on Google’s Cloud Platform (specifically, Google Container Engine, which provisions and manages the cluster for me).
I left this experience profoundly enthusiastic about the future of Kubernetes. I think that if Google can execute properly, it’s clearly the future for how we build distributed applications. That said, it also feels like it has a ways to go yet. This post is some of my thoughts and experiences from this project.
The Good
The Right Abstractions
Kubernetes has a very strong set of abstractions for deploying applications to a cloud environment. With a few lines of yaml, I can create a “deployment” that ensures some code is running in a cluster, and then a “service” that handles routing traffic to that code within the cluster. These primitives are well thought out and robustly implemented; “deployments” natively support incremental no-downtime deployments, including healthchecking of the underlying service and so on. And since the abstraction level is so high (I’m really describing the logical units my application is built on, rather than the physical details), the implementation will hopefully continue to improve and evolve without requiring extensive modification by users.
The abstractions are also all very well-integrated with each other, based on a rich, shared schema; healthchecks are (rightly) a property of a container, but the “deployment” (ensures K copies of a container are always present), “service” (within-kubernetes routing), and “ingress” (external routing) primitives all automatically know to look at that healthcheck definition and take it into account for their own purposes.
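To make the shape of these objects concrete, here is a minimal sketch, not the actual livegrep configuration. It builds a deployment and a service as plain Python dictionaries and serializes them as JSON, which kubectl accepts alongside yaml. The app name, image tag, port, and /healthz path are all hypothetical, and the apiVersion values are the ones used by current Kubernetes releases, which may differ from what was available when this was written.

```python
import json

# All names below (app label, image tag, port, /healthz path) are hypothetical.
APP = "livegrep-frontend"
LABELS = {"app": APP}

container = {
    "name": APP,
    "image": "example/livegrep-frontend:abc1234-1",
    "ports": [{"containerPort": 8910}],
    # The healthcheck is declared once, on the container itself.
    "readinessProbe": {
        "httpGet": {"path": "/healthz", "port": 8910},
        "periodSeconds": 10,
    },
}

# The deployment keeps `replicas` copies of the container running.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {"containers": [container]},
        },
    },
}

# The service routes in-cluster traffic to pods carrying the same labels.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        "selector": LABELS,
        "ports": [{"port": 80, "targetPort": 8910}],
    },
}


def render(*objects):
    """Wrap the objects in a v1 List and serialize them as JSON."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": list(objects)}, indent=2
    )


if __name__ == "__main__":
    print(render(deployment, service))
```

Saved as a script (say, a hypothetical manifests.py), the output could be piped into kubectl create -f - to create both objects. Note that the probe appears only on the container; nothing in the service definition repeats it.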
This is a much higher level of abstraction than my previous deployment on raw AWS, which had to manage everything from firewall rules to user accounts up to Amazon autoscaling groups. And it shows in size, too: My kubernetes configuration comes in under 500 lines, compared with 600+ lines of ansible and 700+ lines of terraform for the ec2 deployment.
And the new version is better in many important ways: I have automated zero-downtime deploys with a single command, and fully-automated letsencrypt certificates, both of which I never fully built out on ec2.
Having the right abstractions is a strong start, but Kubernetes' potential would be constrained if you could only use it for cases that had been thought of and supported by Google’s engineers. One of the key features of configuration management systems like Puppet and Terraform is the creation of new custom resource types, which can then be reused, either within a team or organization, or shared and used more broadly.
Kubernetes also supports user-defined resource types, known as third-party resources. While still beta and somewhat experimental, they provide a tantalizing vision of what kubernetes' extensibility will look like once mature.
Enabling kube-cert-manager was fairly easy, although it involved a few manual steps: I created a GCP volume and service account, added the service account to a kubernetes secret, and then grabbed the configuration files from their repository, edited one slightly, and ran kubectl create -f to create the configuration in my kubernetes cluster.
Once I’d done so, things got almost magically simple. I created a certificate object describing the certificate I wanted, waited a few seconds for the certificate manager to notice it and negotiate with the letsencrypt ACME server, and then I had a valid TLS certificate available as a Kubernetes secret:
    $ kubectl describe secret beta.livegrep.com
    Name:         beta.livegrep.com
    Namespace:    default
    Labels:       domain=beta.livegrep.com
    Annotations:  stable.k8s.psg.io/kcm=true
    Type:         kubernetes.io/tls
    Data
    ====
    tls.crt:  3448 bytes
    tls.key:  1675 bytes
But in fact, I didn’t even need to create the certificate object myself: kube-cert-manager integrates with the existing kubernetes ingress type. I added a few lines of “annotations” to my ingress definition, and the certificate controller automatically provisioned a certificate, which was automatically picked up by the ingress controller, registered with google’s cloud load balancer, and made available to serve https traffic into my application.
Third-party extensions in general, and kube-cert-manager in particular, are new, beta, and somewhat experimental, and I ran into a few rough edges that I’ll mention later. Even so, they worked impressively well, and give a tantalizing vision of a future in which kubernetes will have a rich, extensible ecosystem of third-party resources for controlling not only your kubernetes cluster but any external dependencies you might want, in a uniform way.
The Bad
Kubernetes, by default [1], runs code in the form of docker containers distributed through a docker image repository.
While I’m incredibly enthusiastic about the future of container images as a deployment format, docker continues to feel like a hot mess every time I interact with it, and it’s entirely unclear to me how I’m “supposed” to build docker images for my application; I ended up stitching a lot of pieces together by hand in a way that felt like I was working against the grain at every step. Here are a few of the concrete issues and open questions I ran into:
Tagging images
Every time I try to understand docker repositories, I’m left a little baffled by how I’m supposed to use “tags”. The standard convention seems to be to rely heavily on mutable tags, tagging based on major releases or similar.
However, as an operator, this feels like a nightmare: it makes tracking the provenance of my images, or building reproducible images, extremely difficult. I’d much rather tag builds with a hyper-specific version number, so I have a more-or-less complete and readable provenance chain, and then bump those dependencies explicitly as needed. Similarly, kubernetes appears to implicitly expect immutable tags:
The IfNotPresent image pull policy will cache a given image locally ~forever.
All the doc examples for deploying a new version assume you do so by switching over to a new image tag (which will automatically trigger an incremental rollout); I experimented initially with always using the latest tag, and I was unable to find a way to force kubernetes to do a “no-op” redeploy just to get it to re-pull a tag.
However, none of the Docker tooling seems to support immutable per-version tags well. For example, I can’t parameterize the FROM line, so if I want to depend on specific versions of an upstream image, I’m forced to rewrite my Dockerfile every time I want to update. And docker hub’s automatic builds seem to only support constant tags for a given branch, and not a distinct tag per version.
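The “rewrite my Dockerfile every time” step can at least be mechanized. The following is a rough sketch under stated assumptions, not a tool described in this post: the function name pin_base_image and the example image and tag values are hypothetical, and it only handles simple FROM lines.

```python
import re


def pin_base_image(dockerfile_text, image, tag):
    """Return dockerfile_text with `FROM image[:old-tag]` rewritten to `FROM image:tag`.

    Raises ValueError if no FROM line for `image` is present.
    """
    pattern = re.compile(
        # FROM, the exact image name, an optional existing tag, then
        # whitespace or end of line (so "image-extra" is not matched).
        r"^(FROM[ \t]+)" + re.escape(image) + r"(?::[^\s]+)?(\s|$)",
        flags=re.MULTILINE,
    )
    pinned, count = pattern.subn(
        lambda m: f"{m.group(1)}{image}:{tag}{m.group(2)}", dockerfile_text
    )
    if count == 0:
        raise ValueError(f"no FROM line for {image!r} found")
    return pinned
```

A build script could call this on the Dockerfile text before each build, so the pinned version lives in one place rather than being edited by hand.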
Building images
Livegrep is divided into two services: a C++ backend that manages the index and processes the actual queries, and a Go web tier. In addition, I run an nginx frontend to serve static files. All three components (the C++ binary, the Go binary, and the nginx frontend) come from the same source repository.
I have no idea how I’m “supposed” to write Dockerfiles for this. I could build a single image that installs both livegrep and nginx, along with the configuration for all three components, and invoke it in different ways; that’d be easiest, but it feels kludgy and wouldn’t scale past a certain point [2]. What I wanted to do was build four images: a base image containing the livegrep build artifacts, and then three specialized images, one for each service:
    ubuntu <-- livegrep/base <-- livegrep/backend
                             ^--- livegrep/frontend
                             ^--- livegrep/nginx
However, I have no idea how to build such trees of images using Dockerfiles. I could write four Dockerfiles and run docker build four times, but how do I specify the appropriate base image in the child images? In particular, it’s very important that a given build reference a specific version of the base image, since I want to produce three leaf images from the same repository. But as described above, Docker seems to want me to use fixed tags, which makes keeping track of this tree challenging; if my leaf images start with FROM livegrep/base:latest, how do I ensure that they actually build against the right version?
I ended up building out a moderately complex build system of my own to handle the builds, based on a Debian-esque version number ([livegrep git sha]-[packaging revision]). It works and I’m quite happy with it, but I wish there had been an obvious approach.
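As an illustration of the versioning idea only (the actual build system is not shown here), a sketch might compute one [git sha]-[packaging revision] tag and use it for the base image and all three leaf images, so the tree stays consistent. The Dockerfile paths under docker/ and the function names are assumptions; the image names come from the tree above, and docker build -t and -f are the standard flags.

```python
# Leaf images from the tree above; Dockerfile locations are hypothetical.
LEAVES = ["backend", "frontend", "nginx"]


def version_tag(git_sha, packaging_revision):
    """Build a Debian-style '[git sha]-[packaging revision]' tag."""
    return f"{git_sha}-{packaging_revision}"


def leaf_from_line(tag):
    """The FROM line every leaf Dockerfile must carry for this build."""
    return f"FROM livegrep/base:{tag}"


def build_plan(git_sha, packaging_revision):
    """Return docker build commands, base image first, all sharing one tag."""
    tag = version_tag(git_sha, packaging_revision)
    plan = [
        ["docker", "build", "-t", f"livegrep/base:{tag}",
         "-f", "docker/base/Dockerfile", "."]
    ]
    for leaf in LEAVES:
        plan.append(
            ["docker", "build", "-t", f"livegrep/{leaf}:{tag}",
             "-f", f"docker/{leaf}/Dockerfile", "."]
        )
    return plan


if __name__ == "__main__":
    for command in build_plan("abc1234", 1):
        print(" ".join(command))
```

The plan only names the images; each leaf Dockerfile still has to be generated or rewritten so that its first line matches leaf_from_line(tag), which is exactly the step Docker gives no help with.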
Deploying configuration
The final question I don’t have a satisfying answer for is how to deploy configuration data into my docker images. I ended up baking hardcoded configuration into my images, which feels very unsatisfying.
To pick a trivial example, my nginx.conf hardcodes the “livegrep.com” domain. This works fine for me, but means that no one else can reuse this setup to deploy their own internal livegrep instance. But I’m not at all clear what the “right” option is. It seems plausible I could store the configuration in a kubernetes ConfigMap, but how do I get it from there into the nginx.conf? Do I need to templatize my configuration at container startup? It seems that today that’s something I would need to build myself.
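One shape such a startup step could take is sketched below. It assumes the domain reaches the container as an environment variable (which a ConfigMap can supply); the variable name LIVEGREP_DOMAIN, the function name, and the nginx fragment are all hypothetical, and the fragment is not a complete nginx.conf.

```python
import os
from string import Template

# Hypothetical fragment: ${LIVEGREP_DOMAIN} is filled in at startup, while
# nginx's own variables such as $uri must pass through untouched.
NGINX_TEMPLATE = """\
server {
    listen 80;
    server_name ${LIVEGREP_DOMAIN};
    location / {
        try_files $uri =404;
    }
}
"""


def render_config(template_text, environ):
    """Substitute the domain from `environ`, defaulting to livegrep.com."""
    values = {"LIVEGREP_DOMAIN": environ.get("LIVEGREP_DOMAIN", "livegrep.com")}
    # safe_substitute leaves unknown placeholders (nginx variables) as-is.
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    print(render_config(NGINX_TEMPLATE, os.environ), end="")
```

An entrypoint script would write the rendered text to the nginx configuration path and then start nginx. The use of safe_substitute matters here, because nginx’s own $-prefixed variables would otherwise raise a KeyError.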
The declarative/convergence model
Kubernetes follows a declarative resource model, in which an operator configures a resource by creating the object definition, and then a Kubernetes “controller” asynchronously comes by and attempts to update reality to conform to the described state.
As in all such systems, problems can arise when reality and the desired state are irreconcilable without destructive action, or when the change requires a transition that the controller doesn’t know how to make. Concretely, what this tends to mean is that certain properties can only be set at initial object creation time, and can’t be changed later. However, because the controller acts asynchronously from the Kubernetes API, such changes result in no visible errors, just silent divergence.
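The failure mode is easy to reproduce in a toy model. The sketch below is not Kubernetes code: the field names, the set of immutable fields, and both functions are invented purely to show how an accepted write and an asynchronous controller can disagree without any error reaching the operator.

```python
# Toy model only: field names are hypothetical and this is not Kubernetes code.
IMMUTABLE_FIELDS = {"disk_type"}


def api_update(store, changes):
    """The API layer just records the desired state; it never rejects a change."""
    store["desired"].update(changes)


def reconcile(store):
    """One controller pass: converge what it can, skip what it cannot change in place."""
    skipped = []
    for field, want in store["desired"].items():
        have = store["actual"].get(field)
        if have == want:
            continue
        if field in IMMUTABLE_FIELDS and have is not None:
            # Set at creation time only; no error surfaces to the operator.
            skipped.append(field)
            continue
        store["actual"][field] = want
    return skipped


if __name__ == "__main__":
    store = {"desired": {}, "actual": {}}
    api_update(store, {"replicas": 2, "disk_type": "hdd"})
    reconcile(store)  # initial creation: everything converges
    api_update(store, {"replicas": 3, "disk_type": "ssd"})
    reconcile(store)  # replicas converges, disk_type silently does not
    print(store)
```

In this model the second update succeeds from the caller’s point of view, yet the actual disk_type never changes; the only fix is to delete the object and create it again.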
I ran into a few such cases while working on livegrep, where I had to destroy and re-create a resource in order for a change to take effect. Sometimes, once I dug, this behavior was documented somewhere; sometimes I couldn’t find any mention of it.
This was mostly a minor annoyance, because I was developing a new deployment with no SLA, and so as soon as I realized what was going on I could just delete everything and start over. If I were building an application with more severe uptime requirements, I could imagine getting very frustrated trying to figure out how to orchestrate complex operational changes in a zero-downtime manner.
It’s very young
Kubernetes definitely feels very young and still a work-in-progress. Many of the features I relied on, like the third-party extensions for letsencrypt, and the GCP ingress controller that automatically configured Google’s Cloud Load Balancer from within Kubernetes, are marked as beta and subject to change. Kubernetes is changing rapidly, so documentation from a month ago may already be out of date, and there are very few public examples of nontrivial setups to learn from.
The third-party resource ecosystem is still very new and evolving; I had to switch from Kelsey Hightower’s kube-cert-manager to the new PalmStoneGames implementation, and that one is still missing features that I could really use. I’m completely sold on the potential of this ecosystem, but it remains mostly potential at this point.
Security is another area where it seems like there’s work to be done. Kubernetes secrets are stored in plaintext in etcd, which is fine for many applications but would probably scare me at a certain scale. Kubernetes supports fine-grained access control to the kubernetes API (essential for running a truly multi-tenant cluster), but still defaults to “always allow” and has three(!) different access-control mechanisms, and I’m unclear which one is considered “the future”.
Assuming Google plays their cards right, and community enthusiasm continues to build and mindshare continues to center, this will all obviously sort itself out in time. However, based on this experience, while I remain very excited for Kubernetes' potential, I’d be wary of adopting it or recommending it for a site with strict availability, security, or stability requirements.
[1] Kubernetes nominally supports other container backends, like rkt; my impression is that docker is currently very much the “blessed” runtime. It’s also the only one supported by GKE, which was how I deployed my cluster.
[2] In retrospect, for my application, this would almost certainly have been the right solution, but I wanted to do the more-general thing out of stubborn curiosity. And even a single image wouldn’t have obviously answered my question of how to version images.