Multi-Node Cluster Failover Scenarios
Introduction
This document describes the procedures for handling failover scenarios on a Kubernetes multi-node cluster with 1 master and 2 worker nodes.
Drain Node
You can use kubectl drain to safely evict all of your pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.). Safe evictions allow the pod's containers to gracefully terminate and will respect the PodDisruptionBudgets you have specified.
Use the following command to drain the node:
kubectl drain <node-name>
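If the node runs DaemonSet-managed pods or pods using emptyDir volumes, the drain may be refused. In that case the following flags are typically needed; the node name is a placeholder:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data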
Cordon Node
Cordon the node to mark it unschedulable, so that no new pods are scheduled on the faulty node while it is drained and serviced:
kubectl cordon <node-name>
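A cordoned node is reported with the SchedulingDisabled status, which can be verified with:
kubectl get nodes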
Distribute the workload after node recovery
While a node is down, its workload migrates to the remaining available worker nodes. However, when the faulted worker node rejoins the cluster, the solution keeps running on the previously available nodes and only new workload is scheduled on the newly recovered worker node. To distribute the work evenly across all worker nodes, the workloads must be restarted according to their deployment models. EF-CX uses StatefulSets and Deployments only. Follow the procedures below to distribute the workloads. These procedures require an outage.
EF-CX Components
Restart all the deployments in the expertflow namespace so that the pods are distributed evenly:
kubectl get deploy -n expertflow|awk '!/NAME/ { print $1 }'|xargs kubectl -n expertflow rollout restart deploy
Repeat the above step for other components if required.
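As an optional check, the following command lists the pods in the expertflow namespace along with the node each one is running on, to confirm that the workload is now spread across all worker nodes:
kubectl get pods -n expertflow -o wide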
StatefulSets (STS) and Deployments
External components are deployed using the StatefulSet (STS) and Deployment models. If it is deemed necessary to distribute the workload across all worker nodes and move any of the StatefulSets/Deployments in the ef-external namespace, do so by first scaling the number of replicas down to 0 and then back up, so that the pods are recreated and can be rescheduled onto the newly recovered node. This process involves downtime; execute it only after an outage has been announced.
kubectl -n ef-external scale sts mongo --replicas=0
kubectl -n ef-external scale sts mongo --replicas=1
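To wait until the mongo pod is running again before moving on to the next component, the rollout status can be watched:
kubectl -n ef-external rollout status sts mongo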
The above steps can also be repeated for other StatefulSets such as Redis, PostgreSQL, and ActiveMQ. However, MinIO and Grafana are configured to run as Deployments (as opposed to STS) and require a rollout restart instead, for example to restart the MinIO deployment:
kubectl -n ef-external rollout restart deploy minio
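Grafana can be restarted the same way; this example assumes its Deployment in the ef-external namespace is named grafana:
kubectl -n ef-external rollout restart deploy grafana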
Master Node Scenarios
Control Plane is Down
While the master node is unavailable, the cluster might still continue to run existing workloads as long as the worker nodes are healthy and the control plane's absence doesn't impact their operation.
Impact
Any changes or new deployments requiring control plane interaction will be affected.
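One way to check whether the control plane is reachable is to query the API server readiness endpoint; if the command times out or fails, the control plane is unavailable even though existing pods on healthy worker nodes keep running:
kubectl get --raw='/readyz?verbose'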
Recommendations
In a production environment, it's recommended to design Kubernetes clusters with high availability in mind, which includes having multiple master nodes and implementing measures like etcd clustering or load balancing to ensure the reliability of the control plane.
Worker Node Scenarios
One of the Worker Nodes is Down
Pods running on the faulted worker node will stop servicing requests.
Impact
The service components running on that node will cease to work until they are relocated; the default relocation time in Kubernetes ranges between 5 and 10 minutes (to avoid overly frequent pod hopping).
Recommendations
If the service components are not relocated automatically to the available pool of worker nodes within a reasonable period of time, follow the drain procedure described above.
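To identify the pods that are still bound to the failed node before draining it, the pods can be filtered by node name (the node name is a placeholder):
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide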
Node Recovery Scenario
When a faulted node recovers from maintenance, it automatically rejoins the cluster and reports its availability. However, existing resources are not relocated to it automatically; this has to be managed manually if workload distribution is required. Review the workload distribution and follow the procedure described above.
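If the node was cordoned before maintenance, it remains unschedulable after rejoining until it is uncordoned; the node name below is a placeholder:
kubectl uncordon <node-name>
kubectl get nodes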
To replace the faulted worker node with a completely new worker node, the new node must be deployed and taken through all the steps that were followed when the cluster was deployed on day 0.