Persistent Storage in Kubernetes

We have established that Kubernetes is built around containers. The stateless nature of these containers allows us to scale and load balance with ease. But what happens when you want to keep some state? After all, how would you deploy a database server like Postgres or MySQL to Kubernetes if you lose all your data the next time the pod is recreated?

In comes the idea of persistent storage. In vanilla Docker, you would create a volume and mount it in the running container. It's pretty much the same in Kubernetes, just with the additional hoops to jump through that we're familiar with.

Persistent Volumes

In Kubernetes, persistent volumes (PVs) are cluster-scoped resources, meaning they do not belong to any namespace. As the Kubernetes docs put it: "It is a resource in the cluster just like a node is a cluster resource." PVs are the bottom layer for persistent storage, but a running pod cannot consume a PV directly; it has to go through a claim.
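
To make this concrete, here is a minimal sketch of what a manually created PV could look like. The name, size, storage class and host path are all placeholders I made up for illustration, and hostPath storage is only really suitable for a single-node test cluster.

```yaml
# Hypothetical PersistentVolume for illustration only.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv            # placeholder name; note there is no namespace field
spec:
  capacity:
    storage: 5Gi              # placeholder size
  accessModes:
    - ReadWriteOnce           # mountable read-write by a single node
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  storageClassName: manual    # placeholder class, used to match claims to this PV
  hostPath:
    path: /mnt/data           # placeholder directory on the node
```

In practice you will rarely write PVs by hand: a storage class with a provisioner usually creates them for you when a claim asks for storage.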

Persistent Volume Claims

In the same way pods consume the resources offered by nodes, persistent volume claims (PVCs) consume the resources offered by PVs. They also provide the "interface" to mount persistent storage into pods. PVCs, unlike PVs, are namespace specific. When creating a deployment, you reference the PVC by name in the pod's volumes list, then mount that volume into the container with a volumeMount.

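volumeMounts:
    # Must match the name of an entry in the pod's volumes list.
    # Volume names must be lowercase (letters, digits and dashes).
    - name: this-is-the-name-of-my-volume
      mountPath: /var/www/wherever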

Now, everything stored in this directory will be written to the persistent volume, and will survive the pod being destroyed and recreated.
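
For the full picture, here is a sketch of how the pieces could fit together: a PVC, plus a deployment whose pod references that claim in its volumes list and mounts it. The names, namespace, image, size and storage class are assumptions for illustration, not something your cluster will have out of the box.

```yaml
# Hypothetical claim; it binds to a PV that satisfies the request.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc           # placeholder name
  namespace: default          # PVCs live in a namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual    # placeholder; must match an available PV or storage class
  resources:
    requests:
      storage: 5Gi            # placeholder size
---
# Hypothetical deployment that mounts the claim above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
  namespace: default          # must be the same namespace as the PVC
spec:
  replicas: 1                 # ReadWriteOnce volumes attach to one node at a time
  selector:
    matchLabels:
      app: example-web
  template:
    metadata:
      labels:
        app: example-web
    spec:
      containers:
        - name: web
          image: nginx:stable # placeholder image
          volumeMounts:
            - name: this-is-the-name-of-my-volume
              mountPath: /var/www/wherever
      volumes:
        - name: this-is-the-name-of-my-volume   # same name as in volumeMounts
          persistentVolumeClaim:
            claimName: example-pvc              # the PVC defined above
```

The volume name is only a label local to the pod; the link to the actual storage is the claimName.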

Snapshots

A really cool feature offered by Kubernetes with respect to PV(C)s is the ability to snapshot volumes, provided the underlying CSI storage driver supports it. A snapshot is a point-in-time copy of a volume's contents, taken without actually creating a new PV(C). Why is this cool? Because this provides a true cloud-native means of backing up (and restoring) data in your volumes. For example, before making changes to my database (or just out of good practice), I would make a snapshot I could revert to in case anything goes sideways.

There are two resource types involved in snapshots, which you can pretty much map to PVs and PVCs. VolumeSnapshotContent (VSC) and VolumeSnapshot (VS) are the snapshot equivalents of PVs and PVCs respectively. A VSC is the actual snapshot taken from a volume in the cluster, and a VS is a user's "request" for one.
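
As a rough sketch, taking a snapshot of a claim and later restoring it into a fresh PVC could look like the following. This assumes the snapshot CRDs and snapshot controller are installed and that your CSI driver supports snapshots; every name and class below is a placeholder.

```yaml
# Hypothetical request for a snapshot of an existing PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-snapshot
  namespace: default                            # same namespace as the source PVC
spec:
  volumeSnapshotClassName: example-snapclass    # placeholder VolumeSnapshotClass
  source:
    persistentVolumeClaimName: example-pvc      # placeholder PVC to snapshot
---
# Hypothetical new PVC pre-populated from that snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc-restored
  namespace: default
spec:
  storageClassName: example-csi-storageclass    # placeholder CSI-backed storage class
  dataSource:
    name: example-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                              # at least the size of the snapshotted volume
```

You only write the VS yourself; the matching VSC is created for you once the snapshot has been taken.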

Longhorn

My objective with these chapters isn't to shill specific technologies (other than Kubernetes hehe) but I will add a mention of Longhorn. It's the storage solution I use on my personal cluster at home. Originally developed by Rancher but nonetheless free and open source, Longhorn is a CNCF incubating project promising (and delivering) robust persistent block storage.

An alternative to Longhorn is running Rook/Ceph in Kubernetes, and while I would totally recommend that devops/sysadmin teams use this combo over Longhorn, I don't see the benefit of running it on a small cluster given the complexity overhead. Longhorn promises to be simple and definitely achieves this.

Rather than having non-portable, expensive storage solutions sitting outside the Kubernetes cluster, Longhorn promises the same (if not better) high availability, along with functionality including incremental snapshots and scheduled backups to S3-compatible storage.
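
If you do go with Longhorn, claims are pointed at it through a storage class. Below is a sketch of a custom one, assuming Longhorn is already installed in the cluster. The class name and values are made up, and the provisioner and parameter names are how I understand them from the Longhorn docs, so check them against the version you are running.

```yaml
# Hypothetical Longhorn-backed StorageClass; verify the parameters against your Longhorn version.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-two-replicas     # placeholder name
provisioner: driver.longhorn.io   # Longhorn's CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"           # copies of each volume kept across nodes
  staleReplicaTimeout: "30"       # minutes before a failed replica is cleaned up
```

Any PVC that sets storageClassName to this class then gets a replicated Longhorn volume provisioned for it automatically, with no hand-written PV required.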

Until Next Time

To avoid making this chapter too heavy and bulky, I will end it here. Persistent storage is a real rabbit hole in Kubernetes. This chapter should leave you well equipped to deploy databases and other services requiring persistent storage, particularly on your hobby cluster. However, I would, as usual, recommend reading the docs, which elaborate on many more techniques than I can cover here.