This Kubernetes cluster deployment mode consists of a single control-plane node and one or more worker nodes running the workload. In this topology, the control-plane (which manages the cluster) is the single point of failure. If a worker node fails partially or completely, the control-plane reschedules the workload onto the remaining worker nodes. If the control-plane is down, however, workloads cannot be rescheduled. The CX solution continues to work as long as no worker node fails at the same time.
If the control-plane (master) node fails, configuration management will not work. The solution will still work for a short period (5 minutes) until the control-plane returns.
See the What if scenarios below to learn more about the behaviour of this cluster.
A POD is down
Some features of the application may fail to work.
A component/service running inside the POD is down
This will affect the interworking of the application; it might not be able to query or save data.
A component/service running inside the POD is up
The application will start working normally.
A worker-node is down
This will cause about 5 minutes of downtime for some features within the application while the affected pods move over to another working node.
A worker-node is restored
The control-plane node is down
No configuration changes can be made. You will need to wait for the control-plane node to come back up before making any system changes.
The control-plane node is restored
Configuration changes can now be made.
A worker-node is down while the control-plane is down
Application may fail to work.
This guide covers steps to install a single control-plane multi-worker deployment of an RKE2 cluster.
Step-1 Install RKE2 Control-plane
Install the RKE2 control-plane. See RKE2 Control-plane Deployment.
Step-2 Install Kube-VIP
A multi-worker cluster requires a floating IP shared among all workers, which Kube-VIP provides. This keeps the workload available during a temporary absence of the control plane.
Kube-VIP is not needed if you are creating a High Availability deployment using DNS and just adding worker nodes.
ARP – When using ARP or Layer 2, Kube-VIP uses leader election. Other modes such as BGP, Routing Table and WireGuard can also be used. To understand how Kube-VIP works, refer to the Kube-vip architecture.
All worker nodes have the same interface names. The interface names can be viewed by running 'ip a s' in a terminal.
All worker nodes and the VIP must be on the same subnet for VIP configuration.
ARP is allowed on the worker nodes' subnet.
The VIP has an FQDN assigned (used for CX).
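The same-subnet prerequisite can be checked before a VIP is chosen. The following is a minimal sketch in POSIX shell, not part of Kube-VIP or RKE2: the helper names `ip_to_int` and `in_subnet` and the example addresses are hypothetical, and it assumes IPv4 addresses written without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if address $1 lies inside the CIDR block $2 (e.g. 192.168.1.0/24).
in_subnet() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  # Build the netmask from the prefix length, truncated to 32 bits.
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
)
```

For example, `in_subnet 192.168.1.50 192.168.1.0/24` succeeds, while `in_subnet 192.168.2.50 192.168.1.0/24` fails. The CIDR of a worker's interface can be read from the output of 'ip a s'.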
Decide the IP and the interface on all nodes for Kube-VIP and set these up as environment variables. This step must be completed before deploying any additional nodes in the cluster (both CP and workers).
export VIP=<Virtual-IP>
export INTERFACE=<Interface>
Import the RBAC manifest for Kube-VIP.
curl https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml
Fetch the kube-vip image.
/var/lib/rancher/rke2/bin/crictl -r "unix:///run/k3s/containerd/containerd.sock" pull ghcr.io/kube-vip/kube-vip:latest
Deploy Kube-VIP.
CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock ctr -n k8s.io run \
  --rm \
  --net-host \
  ghcr.io/kube-vip/kube-vip:latest vip /kube-vip manifest daemonset --arp --interface $INTERFACE --address $VIP --controlplane --leaderElection --services --inCluster | tee /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
Wait for kube-vip to complete bootstrapping.
kubectl rollout status daemonset kube-vip-ds -n kube-system --timeout=650s
Once the condition is met, verify that the kube-vip daemonset is running 1 pod.
kubectl get ds -n kube-system kube-vip-ds
Once the cluster has more control-plane nodes added, the count will be equal to the total number of CP nodes.
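The pod-count check can also be scripted. The sketch below is one possible approach, not an official Kube-VIP tool: the helper name `ds_is_ready` is hypothetical, and it assumes the desired and ready counts are passed on standard input as two space-separated numbers.

```shell
# Succeed only when the "desired ready" pair read from stdin is
# non-zero and equal, i.e. every scheduled kube-vip pod is ready.
ds_is_ready() {
  # jsonpath output has no trailing newline, so read may hit EOF;
  # the variables are still filled in that case.
  read -r desired ready || true
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}

# Usage on the control-plane node (not executed here):
# kubectl get ds kube-vip-ds -n kube-system \
#   -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}' | ds_is_ready
```

With a single control-plane node the expected pair is `1 1`; after more CP nodes join, both numbers should rise together.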
Step-3 Get Control-plane token
On the control-plane node, run the following command to get the control-plane token to join worker(s) with this control-plane.
# It will display the node-token as something like the following
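The command itself does not appear in this step. As a hedged sketch only: on a default RKE2 server install the join token is normally stored in `/var/lib/rancher/rke2/server/node-token`. That path is an assumption about your installation, and the helper name `read_node_token` is hypothetical.

```shell
# Print the join token from the given file.
# Default path is the assumed RKE2 server location; pass another path to override.
read_node_token() {
  token_file=${1:-/var/lib/rancher/rke2/server/node-token}
  if [ ! -r "$token_file" ]; then
    echo "cannot read $token_file" >&2
    return 1
  fi
  # Strip whitespace and newlines so the value can be pasted into config.yaml.
  tr -d '[:space:]' < "$token_file"
  echo
}
```

Running `read_node_token` as root on the control-plane node should print a single line that can be used as the token value in Step-4.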
Step-4 Add Worker(s)
On each worker node,
Run the following command to install the RKE2 agent on the worker.
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
Enable the rke2-agent service by using the following command.
systemctl enable rke2-agent.service
Create the configuration directory by running the following command.
mkdir -p /etc/rancher/rke2/
Create or edit the file /etc/rancher/rke2/config.yaml and update the following fields.
<Control-Plane-IP> This is the IP of the control-plane node.
<Control-Plane-TOKEN> This is the token obtained from the first control-plane node in Step-3.
server: https://<Control-Plane-IP>:9345
token: <Control-Plane-TOKEN>
tls-san:
  - <FQDN>
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
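When several workers are added, filling in the placeholders by hand is error-prone. The sketch below shows one way to generate the same file from three values. The function name `render_agent_config` and the example arguments are hypothetical; the keys are exactly those listed above.

```shell
# Print an agent config.yaml to stdout.
# $1 = control-plane IP, $2 = control-plane token, $3 = FQDN of the VIP
render_agent_config() {
  cat <<EOF
server: https://$1:9345
token: $2
tls-san:
  - $3
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
EOF
}

# Usage on a worker node (not executed here; values are placeholders):
# render_agent_config 192.168.1.10 "$TOKEN" cx.example.com > /etc/rancher/rke2/config.yaml
```

Writing to stdout rather than directly to the file makes it easy to review the result before it replaces an existing configuration.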
Start the service by using the following command.
systemctl start rke2-agent.service
Step-5 Verify
On the control-plane node run the following command to verify that the worker(s) have been added.
kubectl get nodes -o wide
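To spot nodes that have joined but are not yet healthy, the node list can be filtered. This is a minimal sketch: the helper name `not_ready_nodes` is hypothetical, and it assumes the usual `kubectl get nodes --no-headers` layout with the node name in the first column and the status in the second.

```shell
# Read "kubectl get nodes --no-headers" output on stdin and print the
# names of nodes whose STATUS column does not start with "Ready".
# "Ready,SchedulingDisabled" counts as ready; "NotReady" does not.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage on the control-plane node (not executed here):
# kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every listed node reports Ready.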
Use cloud-native storage for a Worker HA setup. For available storage options, see Storage Solution - Getting Started.
For multi-node (Worker HA) setups you can use local storage with node affinity. However, this restricts a workload to being provisioned on the same worker node where it was initially set up.
Setup CX on Kubernetes
To deploy Expertflow CX on this node, see CX Deployment on Kubernetes