After dipping my toes into containers a while ago, I've been experimenting with scaling them out into a low-fuss, highly-available system.

I originally tried Docker Swarm, but while it has a very low barrier to entry, it was problematic enough in other ways that I ended up dumping that experiment.

After reading a few different blog posts (mostly by @shanselman), skimming through documentation and mostly just following @jessfraz, I thought I'd give Kubernetes a shot.

Kubernetes is an orchestrator that manages containers in a cloud or cloud-y environment. The name itself comes from the Greek word for 'helmsman', which, on top of 'containers', sets up a world of bad puns in the Kubernetes ecosystem, including 'Helm', 'Tiller' and 'Charts'.

Creating a Cluster From Scratch

About a month ago, I organised parts for a NUC to run my various experiments and learning exercises on a dedicated machine at home. One machine barely makes for a highly-available cluster, however.

The idea here was to run a handful of VMs, and those would be my little cluster. After a lot of further reading and experimenting, I settled upon:

  • Host Operating System: Ubuntu Server 17.10.1
  • Hypervisor: VirtualBox
  • Guest OS: Ubuntu Server 17.10.1

Unfortunately, I can't find a lot of documentation on setting up your own Kubernetes cluster. Almost all of the existing documentation assumes you're using Amazon Web Services, Google Compute Engine, Microsoft Azure, or one of a handful of lesser-known cloud services.

Fortunately, kubeadm exists: a small tool to set up and manage Kubernetes clusters. It's still in beta and not production-ready yet, but I don't need production-ready to test and learn.

Setting up a Kubernetes cluster is fairly simple, assuming that you actually follow the instructions. Debugging a partially-working installation is hell, but doable if you can find the necessary log files to dig through. On Ubuntu, journalctl will help you here - you will need to grep through the logs for the kubelet service.
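
For example, the kubelet runs as a systemd unit, so the following two commands are usually enough to find the culprit - the first follows the log live while you retry whatever failed, the second searches the last hour of logs for errors (prefix with sudo if your user can't read the journal):

journalctl -u kubelet -f
journalctl -u kubelet --since "1 hour ago" | grep -i error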

If you just run kubeadm init, you'll get a Kubernetes master, but you still need a "pod network" add-on, and those often require specific command-line arguments to have been passed to kubeadm init. In my case, kubeadm reset did not fully clear everything out, and I had to completely rebuild my master VM to try again.

In my case, I'm using Flannel as the networking layer, because that's what the tutorial I was following used. I have no idea how the various networking components differ, and I can't find a good overview of them anywhere.

In any case, to install a Kubernetes master and use Flannel, you need to run:

kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

This sets up the master and initialises the networking layer.
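
Note that kubeadm init also prints a few follow-up commands. The important ones copy the admin kubeconfig into your home directory so that kubectl works for your regular user - do this before applying the Flannel manifest (the paths here are the kubeadm defaults):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config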

Joining Nodes to the Cluster

The output from kubeadm init will include a full command-line for kubeadm join with instructions.

Simply run this on the nodes (once they have kubeadm installed), and they will join the cluster.
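
If your nodes don't have kubeadm yet, the install steps from the kubeadm documentation of this era boil down to roughly the following on Ubuntu (you also need Docker or another container runtime, and the package repository details do change over time, so double-check against the current docs):

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubelet kubeadm kubectl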

If you lose the command, you can rebuild it in pieces:

  1. You need a discovery token. You can create one of these with kubeadm token create
  2. You need the SHA-256 hash of the master's CA certificate. You can build this with openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
  3. Join the new node with kubeadm join --token <discovery-token> --discovery-token-ca-cert-hash sha256:<cert-hash> <master-node-name>:6443
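
Alternatively, if your version of kubeadm supports it, you can regenerate the whole join command in one step:

kubeadm token create --print-join-command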

Using kubectl

Once you've set up your nodes, you can see them with kubectl get nodes.

kubectl takes a verb as its first argument, in a similar fashion to git and other such tools.

get <resource> gives you a list of those resources.
describe <resource> <name> gives you detailed information about a particular resource.

For example, kubectl get nodes shows me a list of nodes.

yaakov@k8s-yavin4:~$ kubectl get nodes
NAME                              STATUS    ROLES     AGE       VERSION
k8s-crait.lab.yaakov.online       Ready     <none>    6d        v1.9.2
k8s-dantooine.lab.yaakov.online   Ready     <none>    2m        v1.9.2
k8s-hoth.lab.yaakov.online        Ready     <none>    6d        v1.9.2
k8s-takodana.lab.yaakov.online    Ready     <none>    6d        v1.9.2
k8s-yavin4.lab.yaakov.online      Ready     master    6d        v1.9.2

Similarly, kubectl describe node k8s-crait.lab.yaakov.online shows information about that particular node.

yaakov@k8s-yavin4:~$ kubectl describe node k8s-crait.lab.yaakov.online
Name:               k8s-crait.lab.yaakov.online
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-crait.lab.yaakov.online
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"f6:77:10:18:45:3e"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=192.168.0.201
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Sun, 28 Jan 2018 19:46:19 +1100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:29 +1100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.0.201
  Hostname:    k8s-crait.lab.yaakov.online
Capacity:
 cpu:     1
 memory:  2041580Ki
 pods:    110
Allocatable:
 cpu:     1
 memory:  1939180Ki
 pods:    110
System Info:
 Machine ID:                 fe04710066644669b4918298eccb6e81
 System UUID:                DB0CC9DC-B612-4778-88B1-032FFFA0EA55
 Boot ID:                    2e949d31-7c76-4de0-bc1f-4d8064ef58ac
 Kernel Version:             4.13.0-32-generic
 OS Image:                   Ubuntu 17.10
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.12.0-ce
 Kubelet Version:            v1.9.2
 Kube-Proxy Version:         v1.9.2
PodCIDR:                     10.244.1.0/24
ExternalID:                  k8s-crait.lab.yaakov.online
Non-terminated Pods:         (2 in total)
  Namespace                  Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                     ------------  ----------  ---------------  -------------
  kube-system                kube-flannel-ds-wgxwg    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-proxy-x4g8k         0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Events:         <none>

Running Something on the Cluster

To run something, you need three things:

  1. A Pod is a set of containers working together to do something. In Linux terms, they share kernel namespaces (most importantly the network namespace), so containers in the same pod can talk to each other over localhost and share resources such as volumes.
  2. A Deployment manages a set of pods that make up a service, controlling how many replicas there are, how rolling upgrades happen, and so on.
  3. A Service is the frontend for a Deployment - it gives it a name in DNS, and an addressable IP/Port so that you can actually connect to the pods.

Almost every tutorial, like configuring Flannel above, just goes "hey run this YAML and bang here is your service". If you then run kubectl apply -f http://some-yaml-file, you get a magic service.

This is not a very good way to learn what's going on and how things work. This is a great way to build a pile of goop where you have no idea what's going on.

For my experiment, I set up a little PostgreSQL server with persistent storage via NFS to my home NAS. NFS is the easiest option for an at-home setup; if you're in AWS/Azure/GCE, there are better options available.

Here's my YAML configuration for PostgreSQL. Don't run away screaming if you can't understand it - just scroll past it; we'll break it down piece by piece afterwards.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-nfs
spec:
  capacity:
    storage: 6Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: scarif.lab.yaakov.online
    path: /volume1/ClusterData/kube/postgres

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 6Gi

---

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    service: postgres
  name: postgres
spec:
  ports:
    - name: "5432"
      port: 5432
      targetPort: 5432
      nodePort: 30432
  selector:
    service: postgres
  type: NodePort
status:
  loadBalancer: { }

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: postgres
spec:
  replicas: 1
  strategy: { }
  template:
    metadata:
      creationTimestamp: null
      labels:
        service: postgres
    spec:
      containers:
        - name: postgres
          image: postgres
          resources: { }
          ports:
            - containerPort: 5432
              hostPort: 5432
              protocol: TCP
          volumeMounts:
            - name: postgres-nfs
              mountPath: "/var/lib/postgresql/data"
          env:
            - name: POSTGRES_USER
              value: dbuser
            - name: POSTGRES_PASSWORD
              value: dbpass
            - name: POSTGRES_DB
              value: db
            - name: POSTGRES_SCHEMA
              value: dbschema 
      restartPolicy: Always
      volumes:
        - name: postgres-nfs
          persistentVolumeClaim:
            claimName: postgres-nfs

The easiest way to understand these files is to split them up by the --- separators. These separators demarcate different Kubernetes objects, and each section could in theory be its own YAML file, or a single command-line invocation to set it up.

Let's run through these pieces.

PersistentVolume

A persistent volume is exactly that - it gives you a space where data can be stored that outlives the lifetime of the container or pod that uses it.

apiVersion: v1  
kind: PersistentVolume  
metadata:  
  name: postgres-nfs
spec:  
  capacity:
    storage: 6Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: scarif.lab.yaakov.online
    path: /volume1/ClusterData/kube/postgres

In this example, I've created a PersistentVolume with the following properties:

  • Storage: 6Gi of capacity. I don't yet know why this is needed or used for NFS; I can understand why you'd use it for elastic block storage or similar, but I need to do further research here.
  • Access Modes: ReadWriteMany allows multiple nodes to mount this volume as read-write. The other available options are ReadWriteOnce (one node can mount read-write) and ReadOnlyMany (many nodes can mount read-only). Not every volume plugin supports every mode, so check the documentation.
  • Reclaim Policy: This sets what happens when the volume is no longer needed. Retain keeps the data around and it has to be manually cleaned up, should you so desire. Other options are Recycle which keeps the volume around but deletes all the contents, and Delete which deletes the entire volume.
  • NFS Options: This tells the NFS plugin to mount the shared volume scarif.lab.yaakov.online:/volume1/ClusterData/kube/postgres. You have to create this directory ahead of time, as Kubernetes will not create it for you, even if the parent directory already exists.
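
On that note, a minimal prerequisite sketch for my setup (the paths are the ones from the configuration above, and the nodes are the Ubuntu guests described earlier): the export directory has to be created on the NAS ahead of time, and every node needs the NFS client utilities installed so that the kubelet can actually mount the share.

# On the NAS: create the export directory ahead of time
mkdir -p /volume1/ClusterData/kube/postgres

# On every Kubernetes node: install the NFS client tools
sudo apt install nfs-common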

PersistentVolumeClaim

A persistent volume claim is a request by a pod to access a persistent volume. I don't quite understand yet why this exists, because all of the options I've had to specify are duplicates of what's in the persistent volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 6Gi

Based on the persistent volume options, you should be able to tell what this one does.
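
Once both the volume and the claim have been applied, a quick way to confirm they found each other is:

kubectl get pv,pvc

Both should show a STATUS of Bound; if the claim sits at Pending, the requested size or access mode probably doesn't match the volume.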

Deployment

This is where the fun stuff happens.

A deployment configuration defines not only the options for the deployment itself, but also all of the details for how it creates a pod.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: postgres
spec:
  replicas: 1
  strategy: { }
  template:
    metadata:
      creationTimestamp: null
      labels:
        service: postgres
    spec:
      containers:
        - name: postgres
          image: postgres
          resources: { }
          ports:
            - containerPort: 5432
              hostPort: 5432
              protocol: TCP
          volumeMounts:
            - name: postgres-nfs
              mountPath: "/var/lib/postgresql/data"
          env:
            - name: POSTGRES_USER
              value: dbuser
            - name: POSTGRES_PASSWORD
              value: dbpass
            - name: POSTGRES_DB
              value: db
            - name: POSTGRES_SCHEMA
              value: dbschema 
      restartPolicy: Always
      volumes:
        - name: postgres-nfs
          persistentVolumeClaim:
            claimName: postgres-nfs

In this example, this creates a deployment with the following properties:

  • Replicas: Only one replica is created. For any more, I'd need to look into PostgreSQL multi-master or standby read-only replicas. If this were a stateless application, I should be able to turn this all the way up without issue.

The pods created have the following properties:

  • Containers: The pod has only one container, based on the standard postgres image.
  • Ports: The pod exposes TCP port 5432 on the host, mapped to port 5432 in the container - the standard port for PostgreSQL.
  • Volume Mounts: This tells it where to mount the NFS volume that we defined above. In this case, we want it mounted to /var/lib/postgresql/data.
  • Environment Variables: This defines a set of environment variables which are used by the PostgreSQL initialisation script to set up the database server the first time.
  • Restart Policy: This tells Kubernetes to always restart this container if it fails. Restarting uses exponential back-off, capped at a maximum of five minutes. Always is the only policy allowed for a Deployment, so I probably could have skipped it.
  • Volumes: This links it to the PersistentVolumeClaim above, so that our volume that stores the SQL data files is our NFS folder.
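
Putting it together: assuming everything above is saved in a single file (I'll call it postgres.yaml purely as an example), applying it and watching the pod come up looks something like this:

kubectl apply -f postgres.yaml
kubectl get deployments
kubectl get pods -l service=postgres

# tail the PostgreSQL logs once you know the generated pod name
kubectl logs -f <postgres-pod-name>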

Service

The service exposes the deployment so that I can access it from a network outside of the internal Kubernetes network.

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    service: postgres
  name: postgres
spec:
  ports:
    - name: "5432"
      port: 5432
      targetPort: 5432
      nodePort: 30432
  selector:
    service: postgres
  type: NodePort
status:
  loadBalancer: { }

This defines the following properties:

  • Ports: This defines port 5432 on the service to map to port 5432 on the pod. It also defines port 30432 on the node itself to map to port 5432 on the pod, so that I can connect to PostgreSQL from my regular LAN (there's an example of this after this list). If I don't specify this, Kubernetes will pick a random port from its defined port range.
  • Type: NodePort means that the port is exposed directly on all Kubernetes nodes. Other options available include:
    • ClusterIP, which sets it up on an IP address that is internal to the cluster only. This is useful for services that do not need to be publicly accessed, such as database servers.
    • LoadBalancer, which sets up a load balancer for the service with its own public IP. This is only available on cloud services such as AWS/Azure/GCE.
    • ExternalName, which sets up a DNS record somehow. I'll need to play around with this one to fully understand it.
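
As a quick sanity check of the NodePort from my regular LAN: the first command below shows the node port Kubernetes assigned (30432 in this case), and the second connects through any node's IP address, using the credentials from the Deployment's environment variables (this assumes the psql client is installed on your workstation):

kubectl get service postgres
psql -h 192.168.0.201 -p 30432 -U dbuser db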

Summary

  • Setting up Kubernetes is pretty easy, once you wrap your head around the basic concepts.
  • You don't need AWS, Azure or GCE, but they do make things easier. If you're only doing short-term learning experiments, they're probably cheaper, too.
  • To understand Kubernetes YAML configuration files, split them up into their individual pieces. They're a lot more manageable and understandable that way, even if they're easier to install with kubectl apply as a single file.