After dipping my toes into containers a while ago, I've been trying to experiment with scaling containers out for a low-fuss highly-available system.
I originally tried Docker Swarm, but while it has a very low barrier to entry, it was problematic enough in other ways that I ended up dumping that experiment.
After reading a few different blog posts (mostly by @shanselman), skimming through documentation and generally just following @jessfraz, I thought I'd give Kubernetes a shot.
Kubernetes is an orchestrator that manages containers in a cloud or cloud-y environment. The name itself comes from the Greek word for 'helmsman', which, on top of 'containers', sets up a world of bad puns in the Kubernetes ecosystem, including 'Helm', 'tiller' and 'Charts'.
Creating a Cluster From Scratch
About a month ago, I organised parts for a NUC to run my various experiments and learning exercises on a dedicated machine at home. One machine barely makes for a highly-available cluster, however.
The idea here was to run a handful of VMs, and those would be my little cluster. After a lot of further reading and experimenting, I settled upon:
- Host Operating System: Ubuntu Server 17.10.1
- Hypervisor: VirtualBox
- Guest OS: Ubuntu Server 17.10.1
Unfortunately, I can't find a lot of documentation on setting up your own Kubernetes cluster. Almost all of the existing documentation assumes you're using Amazon Web Services, Google Cloud Engine, Microsoft Azure, or one of a handful of lesser-known cloud services.
Fortunately, kubeadm exists, and although it's still in beta and not production-ready yet, I don't need production-ready to test and learn. kubeadm is a small tool to set up and manage Kubernetes clusters.
Setting up a Kubernetes cluster is fairly simple, assuming that you actually follow the instructions. Debugging a partially-working installation is hell, but do-able if you can find the necessary log files to dig through. On Ubuntu, journalctl will help you here - you will need to grep through the logs for the kubelet service.
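For example, something along these lines (a rough sketch - on a kubeadm-based install the systemd unit is called kubelet) will surface kubelet errors:

# follow the kubelet logs live
sudo journalctl -u kubelet -f
# or search the recent logs for errors
sudo journalctl -u kubelet --no-pager | grep -i error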
If you just run kubeadm init, you will install a Kubernetes master, but then you need a "pod network" add-on, and those often require certain command-line arguments to kubeadm init. In my case, kubeadm reset did not fully clear out everything and I had to completely rebuild my master VM in order to try again.
In my case, I'm using Flannel as the networking layer, because that's what the tutorial I was following used. I have no idea how the various networking components differ, and I couldn't find a good overview of the networking layers and their differences anywhere.
In any case, to install a Kubernetes master and use Flannel, you need to run:
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
This sets up the master and initialises the networking layer.
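One thing worth noting: kubeadm init also prints instructions for pointing kubectl at the new cluster, which you need before the kubectl apply step above will work. From memory, it's roughly this, run as your normal user on the master:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config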
Joining Nodes to the Cluster
The output from kubeadm init will include a full command line for kubeadm join, with instructions. Simply run this on the nodes (once they have kubeadm installed), and they will join the cluster.
If you lose the command, you can rebuild it in pieces:
- You need a discovery token. You can create one of these with:
kubeadm token create
- You need the SHA-256 hash of the master's CA certificate. You can build this with:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
- Join the new node with:
kubeadm join --token <discovery-token> --discovery-token-ca-cert-hash sha256:<cert-hash> <master-node-name>:6443
Using kubectl
Once you've set up your nodes, you can see them with kubectl get nodes.
kubectl takes different verbs as its first argument, in a similar fashion to git and other such tools. get <resource> gives you a list of those resources, while describe <resource> <name> gives you detailed information about a particular resource.
For example, kubectl get nodes shows me a list of nodes.
yaakov@k8s-yavin4:~$ kubectl get nodes
NAME                              STATUS    ROLES     AGE       VERSION
k8s-crait.lab.yaakov.online       Ready     <none>    6d        v1.9.2
k8s-dantooine.lab.yaakov.online   Ready     <none>    2m        v1.9.2
k8s-hoth.lab.yaakov.online        Ready     <none>    6d        v1.9.2
k8s-takodana.lab.yaakov.online    Ready     <none>    6d        v1.9.2
k8s-yavin4.lab.yaakov.online      Ready     master    6d        v1.9.2
Similarly, kubectl describe node k8s-crait.lab.yaakov.online shows information about that particular node.
yaakov@k8s-yavin4:~$ kubectl describe node k8s-crait.lab.yaakov.online
Name:               k8s-crait.lab.yaakov.online
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=k8s-crait.lab.yaakov.online
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"f6:77:10:18:45:3e"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=192.168.0.201
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Sun, 28 Jan 2018 19:46:19 +1100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:19 +1100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Sun, 04 Feb 2018 13:29:00 +1100   Sun, 28 Jan 2018 19:46:29 +1100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.0.201
  Hostname:    k8s-crait.lab.yaakov.online
Capacity:
  cpu:     1
  memory:  2041580Ki
  pods:    110
Allocatable:
  cpu:     1
  memory:  1939180Ki
  pods:    110
System Info:
  Machine ID:                 fe04710066644669b4918298eccb6e81
  System UUID:                DB0CC9DC-B612-4778-88B1-032FFFA0EA55
  Boot ID:                    2e949d31-7c76-4de0-bc1f-4d8064ef58ac
  Kernel Version:             4.13.0-32-generic
  OS Image:                   Ubuntu 17.10
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.12.0-ce
  Kubelet Version:            v1.9.2
  Kube-Proxy Version:         v1.9.2
PodCIDR:                      10.244.1.0/24
ExternalID:                   k8s-crait.lab.yaakov.online
Non-terminated Pods:          (2 in total)
  Namespace    Name                    CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                    ------------  ----------  ---------------  -------------
  kube-system  kube-flannel-ds-wgxwg   0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-x4g8k        0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Events:         <none>
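The same verbs work for every resource type. For example, to see the system pods (flannel, kube-proxy, DNS and so on) that kubeadm and the networking add-on created, and which nodes they landed on:

kubectl get pods --all-namespaces -o wide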
Running Something on the Cluster
To run something, you need three things:
- A Pod is a set of containers working together to do something. In Linux terminology, they share kernel namespaces - most importantly the network namespace - so they have the same IP address, can talk to each other over localhost, and can share volumes.
- A Deployment manages a set of pods that make up a service, and controls how many replicas there are, rolling upgrades, and so on.
- A Service is the frontend for a Deployment - it gives it a name in DNS and an addressable IP and port, so that you can actually connect to the pods.
Almost every tutorial, like the Flannel configuration above, just goes "hey, run this YAML and bang, here is your service". If you then run kubectl apply -f http://some-yaml-file, you get a magic service.
This is not a very good way to learn what's going on and how things work. It is, however, a great way to build a pile of goop where you have no idea what's happening.
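If you do want to start from someone else's YAML, it's worth at least pulling the file down and reading it before applying it - for example, with the Flannel manifest from earlier:

# download the manifest, read it, then apply the local copy
curl -L -o kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
less kube-flannel.yml
kubectl apply -f kube-flannel.yml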
For my experiment, I set up a little PostgreSQL server with persistent storage via NFS to my home NAS. NFS is the easiest option for an at-home setup; if you're in AWS/Azure/GCE, however, there are better options available.
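One gotcha: the kubelet on each node performs the NFS mount itself, so every node needs an NFS client installed or pods using the volume will fail to start. On Ubuntu that's the nfs-common package:

# install the NFS client tools on every node
sudo apt-get install -y nfs-common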
Here's my YAML configuration for PostgreSQL. Don't run away screaming if you can't understand it, just scroll past it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-nfs
spec:
  capacity:
    storage: 6Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: scarif.lab.yaakov.online
    path: /volume1/ClusterData/kube/postgres
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 6Gi
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    service: postgres
  name: postgres
spec:
  ports:
    - name: "5432"
      port: 5432
      targetPort: 5432
      nodePort: 30432
  selector:
    service: postgres
  type: NodePort
status:
  loadBalancer: { }
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: postgres
spec:
  replicas: 1
  strategy: { }
  template:
    metadata:
      creationTimestamp: null
      labels:
        service: postgres
    spec:
      containers:
        - name: postgres
          image: postgres
          resources: { }
          ports:
            - containerPort: 5432
              hostPort: 5432
              protocol: TCP
          volumeMounts:
            - name: postgres-nfs
              mountPath: "/var/lib/postgresql/data"
          env:
            - name: POSTGRES_USER
              value: dbuser
            - name: POSTGRES_PASSWORD
              value: dbpass
            - name: POSTGRES_DB
              value: db
            - name: POSTGRES_SCHEMA
              value: dbschema
      restartPolicy: Always
      volumes:
        - name: postgres-nfs
          persistentVolumeClaim:
            claimName: postgres-nfs
The easiest way to understand these files is to split them up by the --- separators. These separators demarcate different Kubernetes objects, and each section could in theory be its own YAML file, or its own command-line invocation to set it up.
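Assuming the whole thing is saved as one file - I'll call it postgres.yaml here - you can create everything in a single step:

# postgres.yaml is whatever you saved the combined manifests above as
kubectl apply -f postgres.yaml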
Let's run through these pieces.
PersistentVolume
A persistent volume is exactly that - it gives you a space where data can be stored that outlives the lifetime of the container or pod that uses it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-nfs
spec:
  capacity:
    storage: 6Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: scarif.lab.yaakov.online
    path: /volume1/ClusterData/kube/postgres
In this example, I've created a PersistentVolume with the following properties:
- Storage: 6GB of capacity. I don't yet know why this is needed or used for NFS; I can understand why you'd use it for elastic block storage or similar. I need to do further research here.
- Access Modes: ReadWriteMany allows multiple nodes to mount this volume as read-write. The other available options are ReadWriteOnce (one node can mount read-write) and ReadOnlyMany (many nodes can mount read-only). Not every volume plugin supports every mode, so check the documentation.
- Reclaim Policy: This sets what happens when the volume is no longer needed. Retain keeps the data around, and it has to be manually cleaned up, should you so desire. Other options are Recycle, which keeps the volume around but deletes all the contents, and Delete, which deletes the entire volume.
- NFS Options: This tells the NFS plugin to mount the shared volume scarif.lab.yaakov.online:/volume1/ClusterData/kube/postgres. You have to create this directory ahead of time, as Kubernetes will not create it for you, even if the parent directory already exists.
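Once the volume has been created, the usual kubectl verbs let you check on it - get for a one-line summary of its capacity, access modes, reclaim policy and status, and describe for the full details:

kubectl get pv
kubectl describe pv postgres-nfs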
PersistentVolumeClaim
A persistent volume claim is a request by a pod to use a persistent volume. I don't quite understand yet why this exists, because all of the options I've had to specify are duplicates of the ones on the persistent volume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 6Gi
Based on the persistent volume options, you should be able to tell what this one does.
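One useful check: once the claim has matched the persistent volume above, kubectl should report its status as Bound, along with the volume it bound to:

kubectl get pvc postgres-nfs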
Deployment
This is where the fun stuff happens.
A deployment configuration defines not only the options for the deployment itself, but also all of the details for how it creates a pod.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: postgres
spec:
  replicas: 1
  strategy: { }
  template:
    metadata:
      creationTimestamp: null
      labels:
        service: postgres
    spec:
      containers:
        - name: postgres
          image: postgres
          resources: { }
          ports:
            - containerPort: 5432
              hostPort: 5432
              protocol: TCP
          volumeMounts:
            - name: postgres-nfs
              mountPath: "/var/lib/postgresql/data"
          env:
            - name: POSTGRES_USER
              value: dbuser
            - name: POSTGRES_PASSWORD
              value: dbpass
            - name: POSTGRES_DB
              value: db
            - name: POSTGRES_SCHEMA
              value: dbschema
      restartPolicy: Always
      volumes:
        - name: postgres-nfs
          persistentVolumeClaim:
            claimName: postgres-nfs
In this example, this creates a deployment with the following properties:
- Replicas: Only one replica is created. For any more I'd need to look into PostgreSQL multi-master or have standby read-only replicas. If this was a stateless application, I should be able to turn this all the way up without issue.
The pods created have the following properties:
- Containers: The pod has only one container, based on the standard postgres image.
- Ports: The pod exposes TCP port 5432 externally, which maps to 5432 internally, which is the standard port for PostgreSQL.
- Volume Mounts: This tells it where to mount the NFS volume that we defined above. In this case, we want it mounted to /var/lib/postgresql/data.
- Environment Variables: This defines a set of environment variables which are used by the PostgreSQL initialisation script to set up the database server the first time.
- Restart Policy: This tells Kubernetes to always restart this container if it fails. Restarting uses exponential back-off, capped at a maximum of five minutes. Always is the only policy allowed for a Deployment, so I probably could have skipped it.
- Volumes: This links it to the PersistentVolumeClaim above, so that our volume that stores the SQL data files is our NFS folder.
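After applying this, you can verify that the deployment actually created a pod and that PostgreSQL started up cleanly - roughly along these lines:

# check the deployment and see which node the pod landed on
kubectl get deployments
kubectl get pods -o wide
# tail the PostgreSQL startup logs (the pod name will have a generated suffix)
kubectl logs -f <postgres-pod-name>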
Service
The service exposes the deployment so that I can access it from a network outside of the internal Kubernetes network.
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    service: postgres
  name: postgres
spec:
  ports:
    - name: "5432"
      port: 5432
      targetPort: 5432
      nodePort: 30432
  selector:
    service: postgres
  type: NodePort
status:
  loadBalancer: { }
This defines the following properties:
- Ports: This defines port 5432 on the service to map to port 5432 on the pod. It also defines port 30432 on the node itself to map to port 5432 on the pod, so that I can connect to PostgreSQL from my regular LAN. If I don't specify this, Kubernetes will pick a random port from its defined port range.
- Type: NodePort means that the port is exposed directly on all Kubernetes nodes. Other options available include:
  - ClusterIP, which sets it up on an IP address that is internal to the cluster only. This is useful for services that do not need to be publicly accessed, such as database servers.
  - LoadBalancer, which sets up a load balancer for the service with its own public IP. This is only available on cloud services such as AWS/Azure/GCE.
  - ExternalName, which sets up a DNS record somehow. I'll need to play around with this one to fully understand it.
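Since this is a NodePort service, any node's address will do when connecting from the LAN. Here's a quick check that the port mapping is what I expect, followed by connecting with psql using the credentials from the deployment's environment variables:

kubectl get service postgres
# any node's hostname or IP will work for a NodePort service
psql -h k8s-yavin4.lab.yaakov.online -p 30432 -U dbuser db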
Summary
- Setting up Kubernetes is pretty easy, once you wrap your head around the basic concepts
- You don't need AWS, Azure or GCE, but using one of them does make things easier. If you're only doing short-term learning experiments, they're probably cheaper, too.
- To understand Kubernetes YAML configuration files, split them up into their individual pieces. They're a lot more manageable and understandable that way, even if they're easier to install with kubectl apply as a single file.