Running CI tasks on your Kubernetes infrastructure is relatively easy nowadays. The integrations for GitHub Actions [1] and GitLab CI can be set up in mere minutes. And if neither suits your needs, there is always Jenkins, which now even has a Kubernetes-native Operator!
Many pipelines require building containers. In many environments, build jobs come with a Docker installation; if your jobs are containerised, you are probably using Docker-in-Docker. If your Kubernetes installation runs on Docker, the setup is as easy as exposing the Docker socket inside the job's container. You might be tempted to go that way, but there is a problem with it.
Since the default Docker installation runs as root, this often means giving jobs more privileges than they need or should have. That is usually not an issue on public CI infrastructure, because those systems mostly (all?) run on one-off virtual machines. On your own Kubernetes infrastructure, however, you might not be prepared to give CI jobs that level of access, and you shouldn't! Fortunately, this isn't necessary in most cases. Think about it: why should compiling some software and packaging it into an archive require superuser privileges?
You might have heard about Docker's rootless mode, which is still experimental at this point. A better solution might be to not use Docker at all to package your container images. After all, your job is already running inside a container, so you don't need another container execution environment, right? This is where the excellent moby/buildkit comes into play.
The BuildKit daemon can be deployed without any privileges in a simple container. You can find explanations and example manifests in the examples folder of BuildKit's repository. Following those examples deploys BuildKit as a shared service, accessible via a Kubernetes Service on your cluster.
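As a rough sketch of what that shared setup boils down to (the names, labels, and port below are my own assumptions; the manifests in BuildKit's repository are the authoritative reference), it is essentially a rootless Deployment listening on TCP, fronted by a Service:

```yaml
# Hypothetical minimal sketch of a shared rootless BuildKit deployment.
# Names, labels and the port are assumptions; see the examples folder in
# BuildKit's repository for the maintained manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: buildkitd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
    spec:
      containers:
      - name: buildkitd
        image: moby/buildkit:v0.9.0-rootless
        args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: buildkitd
spec:
  selector:
    app: buildkitd
  ports:
  - port: 1234
    protocol: TCP
```

Depending on your cluster's defaults, the rootless worker may additionally need relaxed seccomp/AppArmor profiles; the upstream examples cover those annotations.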
This solution is efficient, but it shares some resources between CI jobs. If you can live with that, your setup is already done at this point. A command to build an image in a job could look like this:
buildctl --addr tcp://buildkitd:1234 \
  build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,\"name=myreg/org/example-image:latest,myreg/org/example-image:stable\",push=true \
  --export-cache type=local,dest=/mnt/buildkit-cache \
  --import-cache type=local,src=/mnt/buildkit-cache
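The `type=local` cache above only pays off if `/mnt/buildkit-cache` outlives individual jobs. One way to arrange that (my assumption, not something the shared-service examples prescribe) is a PersistentVolumeClaim mounted into the job pod at that path:

```yaml
# Hypothetical PVC backing the local BuildKit cache used above.
# The name and size are assumptions; mount it at /mnt/buildkit-cache in
# the job pod so --export-cache/--import-cache can reuse layers.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildkit-cache
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Note that a ReadWriteOnce claim limits concurrent cache access to jobs on one node; a ReadWriteMany-capable storage class would lift that restriction if your cluster offers one.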
If you're like me and prefer more separation between your job environments, you can even deploy the daemon as a sidecar container and use one-off runners. Be aware, though, that this consumes more resources, slows down job startup, and comes with some drawbacks regarding shared caches. The following is an example extract from my Kubernetes deployment spec:
containers:
- name: runner
  [...]
  volumeMounts:
  - mountPath: /mnt/buildkit
    name: buildkit
- name: buildkit
  args:
  - --addr
  - unix:///run/user/1000/buildkit/buildkitd.sock
  - --addr
  - unix:///mnt/buildkit/buildkitd.sock
  - --oci-worker-no-process-sandbox
  readinessProbe:
    exec:
      command:
      - buildctl
      - debug
      - workers
    initialDelaySeconds: 5
    periodSeconds: 30
  livenessProbe:
    exec:
      command:
      - buildctl
      - debug
      - workers
    initialDelaySeconds: 5
    periodSeconds: 30
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
  image: moby/buildkit:v0.9.0-rootless
  imagePullPolicy: IfNotPresent
  resources: {}
  volumeMounts:
  - mountPath: /mnt/buildkit
    name: buildkit
volumes:
- emptyDir: {}
  name: buildkit
Just make sure that you share the Unix socket between the BuildKit and job containers.
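Since the sidecar may still be starting when the job begins, it can help to poll the daemon before the first build. A small sketch of such a guard (the helper name is my own; the socket path matches the emptyDir mount from the spec above, and `buildctl debug workers` is the same health check the probes use):

```shell
# wait_for: poll a command until it succeeds or the attempts run out.
# Usage: wait_for "<command>" [attempts]
wait_for() {
  cmd="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Run the command quietly; success means the daemon is ready.
    if $cmd >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# In the job, before the first build:
# wait_for "buildctl --addr unix:///mnt/buildkit/buildkitd.sock debug workers"
```

This avoids flaky first builds without adding any dependency beyond the `buildctl` binary already present in the runner image.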
Finally, use a command like the following to build your containers (this example is from GitHub Actions):
buildctl --addr unix:///mnt/buildkit/buildkitd.sock \
  build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,\"name=myreg/org/example-image:latest,myreg/org/example-image:stable\",push=true \
  --export-cache type=inline \
  --import-cache type=registry,ref=myreg/org/example-image:cache-tag
This example uses the registry to cache all build layers, because the BuildKit container has no persistence layer. This is not the most efficient solution, but it provides the best separation between builders. If you care a lot about resource efficiency, or especially about build speed, you might want to look at the other available caching options.
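One such variant (a sketch with placeholder image names, not taken from the article's setup) pushes a dedicated cache image instead of embedding inline cache metadata, which lets you cache intermediate stages of multi-stage builds as well:

```shell
# Hypothetical variant: push the build cache to its own registry tag.
# mode=max also exports layers of intermediate build stages.
buildctl --addr unix:///mnt/buildkit/buildkitd.sock \
  build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --export-cache type=registry,ref=myreg/org/example-image:buildcache,mode=max \
  --import-cache type=registry,ref=myreg/org/example-image:buildcache
```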
[1] If you are looking for a cloud-native way to orchestrate GitHub runners, you might want to have a look at evryfs/github-actions-runner-operator.