RKE2 High Availability Pre-Deployment & Installation Guide
On-prem Deployment Challenges
When deploying on-prem using any of the above distributions, the major concerns are:
- fault tolerance
- self-sustainability
This makes it difficult to leverage High Availability across multiple zones or data-center locations when deploying on-prem solutions. Extended (stretched) clusters spanning multiple zones or data centers are not a viable option, as they require massive scalability at both the infrastructure and cluster level, and they require commercial support for setups with higher maintenance costs. A highly available cluster essentially means a cluster that is able to provide some of the above mentioned capabilities.
In practice, a Kubernetes cluster can range from a single node to multiple master nodes, each offering different advantages in different contexts:
- a single-node cluster can provide self-healing and self-sustainability, but offers no application availability if the node fails, since it is the only node.
- a multi-node cluster with a single master and multiple worker nodes can tolerate worker node failures and keep applications running, but it fails when the master goes down.
- a multi-master cluster can provide almost all levels of availability (master failures up to the loss of quorum, worker node failures, application scaling to meet load spikes, etc.); for example, a three-master cluster keeps its etcd quorum with one master down. It cannot, however, fail over to another cluster in another data center in case of a DR incident.
The following cluster topologies can be deployed with the above mentioned technologies in mind:
Cluster Topology
- HA setup using HAProxy, Nginx or any other external load balancer
- HA setup using Kube-VIP
- HA setup using DNS based routing
Storage Topology
- Local Storage with application-level replication
- NFS based Storage
- Cloud-Native Storage
Layered Networks Cluster
The cluster is provisioned with separate networks for:
- Cluster traffic
- Pod traffic
- Storage traffic
Add-Ons for Kubernetes Cluster
To further enhance these clusters for stability and resilience, we can also use:
- IPVS mode, which gives better performance under high workloads
- NodeLocal DNS for higher workloads and better service availability
- Multus as a meta CNI plugin for multi-homed pods, for example to use a separate path for storage-related traffic
Cluster Topology Options
Storage Topology Options
Layered Networks Cluster
Please note that this topology requires expertise to manage and maintain the cluster smoothly. An inefficient setup may result in poor performance and longer outages if something goes wrong.
Add-Ons for Kubernetes Cluster (Recommended for Higher Workloads)
IPVS Mode
Using IPVS mode for kube-proxy gives an edge over the built-in iptables mode for handling pod traffic. However, using IPVS requires more advanced management of the cluster, and some additional tuning may also be required. In this mode, every service is registered with IPVS, which then routes the traffic to the destination pods.
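As a minimal sketch of how this could be enabled, assuming the standard RKE2 configuration file location and a kube-proxy version that accepts these flags, IPVS mode can be turned on by passing extra kube-proxy arguments in `/etc/rancher/rke2/config.yaml`:

```yaml
# /etc/rancher/rke2/config.yaml (example values; adjust for your environment)
# Switches kube-proxy from the default iptables mode to IPVS.
kube-proxy-arg:
  - "proxy-mode=ipvs"      # use IPVS for service load balancing
  - "ipvs-scheduler=rr"    # round-robin scheduling; other IPVS schedulers can be chosen
```

Note that the nodes need the IPVS kernel modules (`ip_vs`, `ip_vs_rr`, etc.) and the `ipset`/`ipvsadm` packages available, and the `rke2-server`/`rke2-agent` services must be restarted for the change to take effect.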
NodeLocal DNS
Like many applications in a containerised architecture, CoreDNS or kube-dns runs in a distributed fashion. In certain circumstances, DNS reliability and latency can be impacted with this approach. The causes of this relate notably to conntrack race conditions or exhaustion, cloud provider limits, and the unreliable nature of the UDP protocol.
A number of workarounds exist; however, long-term mitigation of these and other issues has resulted in a redesign of the Kubernetes DNS architecture, the result being the NodeLocal DNS cache project.
Requirements
- A Kubernetes cluster of v1.15 or greater created by Rancher v2.x or RKE2
- A Linux cluster; Windows is currently not supported
- Access to the cluster
RKE2: Using any RKE2 Kubernetes version
Update the default HelmChart for CoreDNS: setting the `nodelocal.enabled: true` value will install node-local-dns in the cluster. Please see the documentation for more details.
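A minimal sketch of such an override, assuming the bundled CoreDNS chart is named `rke2-coredns` in the `kube-system` namespace and exposes the `nodelocal.enabled` value mentioned above:

```yaml
# Place this file in /var/lib/rancher/rke2/server/manifests/ on a server node;
# RKE2 applies HelmChartConfig manifests from that directory automatically.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    nodelocal:
      enabled: true
```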
Further reading on NodeLocal DNS is available at https://www.suse.com/support/kb/doc/?id=000020174
Using Multus as a meta CNI plugin
Multus CNI is a CNI plugin that enables attaching multiple network interfaces to pods. Multus does not replace CNI plugins; instead it acts as a CNI plugin multiplexer. Multus is useful in certain use cases, especially when pods are network intensive and require extra network interfaces that support dataplane acceleration techniques such as SR-IOV, or separate storage traffic for NFS and Longhorn.
Multus cannot be deployed standalone. It always requires at least one conventional CNI plugin that fulfills the Kubernetes cluster network requirements. That CNI plugin becomes the default for Multus and will be used to provide the primary interface for all pods.
For details, please check https://docs.rke2.io/install/network_options#using-multus
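As a minimal sketch, based on the page linked above, Multus is enabled by listing it ahead of the primary CNI (Canal in this example) in `/etc/rancher/rke2/config.yaml` on the server nodes:

```yaml
# /etc/rancher/rke2/config.yaml
# Multus is deployed as the meta plugin; Canal stays the default CNI
# and provides the primary interface for all pods.
cni:
  - multus
  - canal
```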
Using Whereabouts for IPAM with the Multus meta CNI
Whereabouts is an IP Address Management (IPAM) CNI plugin that assigns IP addresses cluster-wide. Starting with version 1.22, RKE2 includes the option to use Whereabouts with Multus to manage the IP addresses of the additional interfaces created through Multus. To do this, you need to use a HelmChartConfig to configure the Multus CNI to use Whereabouts.
Additional information can be found at https://docs.rke2.io/install/network_options#using-multus-with-the-whereabouts-cni
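A minimal sketch of the two pieces involved, assuming the bundled Multus chart is named `rke2-multus` and exposes an `rke2-whereabouts.enabled` value (check the linked documentation for the exact key in your RKE2 version):

```yaml
# Place in /var/lib/rancher/rke2/server/manifests/ to enable Whereabouts alongside Multus.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-multus
  namespace: kube-system
spec:
  valuesContent: |-
    rke2-whereabouts:
      enabled: true
```

An additional interface can then be defined with a NetworkAttachmentDefinition that uses Whereabouts for IPAM; the attachment name, host interface and address range below are hypothetical:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: storage-net          # hypothetical secondary network, e.g. for NFS/Longhorn traffic
  namespace: default
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.20.0/24"
      }
    }
```

Pods reference this network through the `k8s.v1.cni.cncf.io/networks: storage-net` annotation and receive a second interface with an IP from the Whereabouts-managed range.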
Choose an Installation
Once you have gone through the information above, you can select the mode of installation that fits your requirements. The steps are explained in each of these guides:
RKE2 Deployment in High Availability with DNS
RKE2 Deployment in High Availability With Kube-VIP
RKE2 Deployment in High Availability With Nginx/HAProxy