RKE2 Deployment in High Availability With Nginx/HAProxy
Purpose
The purpose of this document is to describe steps to deploy the RKE2 Kubernetes Distribution in High Availability with Nginx/HAProxy.
Pre-requisites
The prerequisites and cluster topologies are described in the RKE2 High Availability Pre-Deployment & Installation Guide. Please review that document before proceeding with installation in High Availability mode.
Preparing for Deployment
Decide what level of availability is needed and prepare your nodes according to the structure given below for the solution.
CIM Prerequisites
Load-Balancer Node
A Load Balancer node is required at the top of the cluster to route and load-balance traffic to the appropriate nodes (control-plane traffic for cluster administration, and HTTP/HTTPS for service availability).
The Load-Balancer node can be:
- Nginx reverse proxy
- HAProxy
- An existing load balancer can also be used
Type | RAM (GB) | CPU | DISK | Scalability | Network Ports | Minimum Nodes |
---|---|---|---|---|---|---|
Load-Balancer | 4-8 | 4 | 100 GiB | Single-Node | 6443, 9345, 80, 443 to all Control-Plane/Worker nodes | 1 |
A load balancer without HA is a single point of failure in the cluster setup; customers are required to set up either of the above options in a failover configuration.
RKE2 Cluster Topologies
A production cluster can be run with a mixture of workload options.
- For a lightly loaded cluster, Control-Plane nodes can also carry workload alongside the Worker nodes.
- For a heavily loaded cluster, it is recommended to off-load the Control-Plane nodes from the workload so that they are not affected by heavy usage of cluster resources, and to use only Worker nodes for processing the business logic.
Control-Plane Nodes (without Workload)
Type | RAM (GB) | CPU | DISK | Scalability | Network Ports | Minimum Nodes in HA |
---|---|---|---|---|---|---|
RKE2 | 4 | 4 | 150 GiB (preferred on /var with SSD/NVMe disks) | high | | 3 |
If workload is enabled on the Control-Plane nodes, increase these figures to the maximum available, for example 16 GiB RAM, 8 vCPUs, and 250 GiB of storage.
Worker Nodes
Worker node requirements are higher, since these nodes carry the workload.
Type | RAM (GB) | CPU | DISK | Scalability | Network Ports | Minimum Nodes in HA |
---|---|---|---|---|---|---|
RKE2 | 16 | 8 | 250 GiB (preferred on /var with SSD/NVMe disks) | high | | 3+ |
RASA-X Prerequisites
In a Multi-node cluster, Rasa-X can be deployed in different tiers:
- In a multi-node HA cluster, add another node and deploy Rasa-X using node-affinity in such a way that one worker node is allocated to RASA-X only. This method is preferred in an HA cluster. Read more at Node Affinity and Node Selector (a brief example follows the table below).
- However, if this is not feasible, RKE2 for Single-Node deployment should be used for standalone RASA-X, and the CIM should then be configured accordingly.
Type | RAM (GB) | CPU | DISK | Scalability | Network Ports | Minimum Nodes in HA |
---|---|---|---|---|---|---|
RKE2 | 12 | 8 | 250 GiB (preferred on /var with SSD/NVMe disks) | high | | 1 |
Superset Prerequisites
For BI Reporting, Superset must be deployed separately from the main CIM Solution.
In a Multi-node cluster, Superset can be deployed in different tiers:
- In a multi-node HA cluster, add another node and deploy Superset using node-affinity in such a way that one worker node is allocated specifically to Superset only. This method is preferred in an HA cluster. Read Node Affinity and Node Selector.
- However, if this is not feasible, RKE2 for Single-Node deployment should be used for standalone Superset, and the CIM should then be configured accordingly.
Type | RAM (GB) | CPU | DISK | Scalability | Network Ports | Minimum Nodes in HA |
---|---|---|---|---|---|---|
RKE2 | 8 | 8 | 250 GiB (preferred on /var with SSD/NVMe disks) | high | | 1 |
1 Detailed RKE2 requirements are also available here.
2 Detailed Kubernetes requirements can be seen here.
This deployment model requires that your connection to the system is stable and consistent. You can use a terminal multiplexer such as 'screen' or 'tmux', which gives you the ability to resume your session even if the network gets disconnected.
FQDN
An FQDN must be mapped to an IP address
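For example, if DNS is not yet available, the mapping can be added to /etc/hosts on every node. The FQDNs and IPs below are the sample values used later in this guide; replace them with your own, and prefer proper DNS records in production:
cat <<EOF >> /etc/hosts
10.192.168.67 devops67.ef.com
10.192.168.61 devops61.ef.com
10.192.168.62 devops62.ef.com
10.192.168.63 devops63.ef.com
EOF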
Iptables
If you are running iptables in nftables mode instead of legacy you might encounter issues. We recommend utilizing newer iptables (such as 1.6.1+) to avoid issues.
Additionally, versions 1.8.0-1.8.4 have known issues that can cause RKE2 to fail. See Additional OS Preparations for workarounds.
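The iptables version and mode in use can be checked with the command below; the output (for example "iptables v1.8.4 (nf_tables)") shows both the version and whether nftables or legacy mode is active.
iptables --version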
Prepare all the Nodes in the cluster
Disable Services
Disable firewall and nm-cloud-setup on all nodes.
Environment Preparation
Before starting the RKE2 installation, go through the following steps and checklist to make sure the environment is prepared for the installation:
Linux-Based OS Instructions
Run the following commands before starting the installation of RKE2.
Step 1: Disable the firewall, AppArmor, and nm-cloud-setup services on RHEL and Ubuntu
systemctl disable apparmor.service
systemctl disable firewalld.service
systemctl stop apparmor.service
systemctl stop firewalld.service
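The commands above cover firewalld and AppArmor. On RHEL cloud images, the nm-cloud-setup service and timer (required to be disabled per the checklist below) can be disabled as follows; the unit names assume the NetworkManager-cloud-setup package is present, and a reboot afterwards is recommended:
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
systemctl stop nm-cloud-setup.service nm-cloud-setup.timer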
Step 2: Lock the Red Hat release to version 8.7 only (MANDATORY)
To lock the release of RHEL to 8.7, which is the latest release supported by Longhorn, execute these commands:
subscription-manager release --set=8.7 ;
yum clean all;
subscription-manager release --show;
rm -rf /var/cache/dnf
Step 3: Disable Swap
For RHEL and Ubuntu both:
systemctl disable swap.target
swapoff -a
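To keep swap disabled across reboots, the swap entry in /etc/fstab can also be commented out. The one-liner below is only a sketch; review /etc/fstab before and after running it:
sed -i '/\sswap\s/ s/^/#/' /etc/fstab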
Step 4: Update the RHEL packages for the 8.7 release
yum update -y
Checklist
Before proceeding with the deployment of HA cluster for RKE2, go through the checklist:
Object | Required |
---|---|
Internet access | Internet access is needed on all nodes to fetch and run RKE2 |
Cluster nodes | 3 Control-Plane nodes + 2 Worker nodes (for High Availability) |
Operating system | RHEL-8.7 or Ubuntu-20.04 |
Services | Firewall and nm-cloud-setup must be disabled |
RHEL release | RHEL-8.7 is the only supported RHEL release |
VIP | An IP from the same range as the Control-Plane nodes is needed for VIP fail-over |
iscsid.service | iscsid.service must be running on all nodes before deploying the Longhorn Storage Manager; check with systemctl status iscsid.service and confirm it is enabled 2 (example commands follow this checklist) |
NTP | NTP must be enabled on all nodes |
Pod/Service IP ranges | The Pod and Service IP ranges must not overlap with already existing IP ranges |
Network interfaces | Kube-VIP needs consistent interface names across all Control-Plane nodes for fail-over; ip addr \| grep -E ':\s.*?:' \| cut -d ":" -f 2 \| tr -d " " can be used to list interfaces |
- Air-gapped deployment is also possible; check the RKE2 website for more details on the air-gapped installation of RKE2.
- If any node is not running iscsid.service, stateful workloads will fail and may result in data loss.
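As a sketch for the iscsid and NTP checklist items above, the services can be installed and enabled as follows. The package names are assumptions and differ per distribution (iscsi-initiator-utils and chrony on RHEL, open-iscsi and chrony on Ubuntu):
For RHEL:
yum install -y iscsi-initiator-utils chrony
systemctl enable --now iscsid chronyd
For Ubuntu:
apt install -y open-iscsi chrony
systemctl enable --now iscsid chrony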
Corporate HTTP/S Proxy Requirement
If the environment sits behind a strict HTTP or HTTPS proxy, the cluster's internal traffic must be excluded from the proxy controls.
The NO_PROXY variable must include your cluster pod and service IP ranges.
HTTP_PROXY=http://your-proxy.example.com:8888
HTTPS_PROXY=http://your-proxy.example.com:8888
NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
If you want to configure the proxy settings for containerd without affecting RKE2 and the Kubelet, you can prefix the variables with CONTAINERD_:
CONTAINERD_HTTP_PROXY=http://your-proxy.example.com:8888
CONTAINERD_HTTPS_PROXY=http://your-proxy.example.com:8888
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
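RKE2 does not pick these variables up from an interactive shell; they are typically placed in the service environment file before starting the service, usually /etc/default/rke2-server (or rke2-agent on worker nodes) and on some distributions /etc/sysconfig/rke2-server; verify the correct location for your RKE2 version. A sketch for a control-plane node:
cat <<EOF >> /etc/default/rke2-server
HTTP_PROXY=http://your-proxy.example.com:8888
HTTPS_PROXY=http://your-proxy.example.com:8888
NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
EOF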
Customize the RKE2 Deployment for your Environment (EXTRA OPTION)
Environment Customization Steps
The options given below can also be used to customize the environment setup:
Option | Switch | Default | Description |
---|---|---|---|
Default Deployment Directory of RKE2 | --data-dir value, -d value | /var/lib/rancher/rke2 | Folder to hold state |
Default POD IP Assignment Range | --cluster-cidr value | "10.42.0.0/16" | IPv4/IPv6 network CIDRs to use for pod IPs |
Default Service IP Assignment Range | --service-cidr value | "10.43.0.0/16" | IPv4/IPv6 network CIDRs to use for service IPs |
If any of the above options are required, add them in the next step.
cluster-cidr and service-cidr are independently evaluated. Decide carefully well before cluster deployment; these options are not configurable once the cluster is deployed and workload is running.
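As a sketch, the switches above map to keys of the same name (without the leading dashes) in /etc/rancher/rke2/config.yaml. The values shown are the defaults from the table and must be changed before the first start of rke2-server if customization is needed:
data-dir: /var/lib/rancher/rke2
cluster-cidr: "10.42.0.0/16"
service-cidr: "10.43.0.0/16"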
Installation Steps
Step 1: Load Balancer Deployment
1. Set up the Load Balancer
You will also need to set up a load balancer to direct traffic to the Expertflow CX replica on all nodes. That will prevent an outage of any single node from taking down communications to the Expertflow CX management server.
When Kubernetes gets set up in a later step, the RKE2 tool will deploy an Nginx Ingress controller. This controller will listen on ports 80 and 443 of the worker nodes, answering traffic destined for specific hostnames.
For your implementation, consider if you want or need to use a Layer-4 or Layer-7 load balancer:
- A layer-4 load balancer is the simpler of the two choices, in which you are forwarding TCP traffic to your nodes. We recommend configuring your load balancer as a Layer 4 balancer, forwarding traffic to ports TCP/80 and TCP/443 to the ef_cx management cluster nodes. The Ingress controller on the cluster will redirect HTTP traffic to HTTPS and terminate SSL/TLS on port TCP/443. The Ingress controller will forward traffic to port TCP/80 to the Ingress pod in the ef_cx deployment.
- A layer-7 load balancer is a bit more complicated but can offer features that you may want. For instance, a layer-7 load balancer is capable of handling TLS termination at the load balancer, as opposed to Nginx Ingress doing TLS termination itself. This can be beneficial if you want to centralize your TLS termination in your infrastructure. Layer-7 load balancing also offers the capability for your load balancer to make decisions based on HTTP attributes such as cookies, etc. that a layer-4 load balancer is not able to concern itself with. If you decide to terminate the SSL/TLS traffic on a layer-7 load balancer, you will need to use the
--set tls=external
option when installing Rancher in a later step. For more information, refer to the Rancher Helm chart options.
Once you have set up your load balancer, you will need to create a DNS record to send traffic to this load balancer.
Depending on your environment, this may be an A record pointing to the load balancer IP, or it may be a CNAME pointing to the load balancer hostname. In either case, make sure this record is the hostname that you intend Cluster to respond on.
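For example, a hypothetical zone-file entry for the sample load balancer used later in this guide would look like:
devops67.ef.com.    IN    A    10.192.168.67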
Load balancer can be of the following types:
- Nginx
- HAProxy
- Existing Load Balancer
Running Nginx or HAProxy as a load balancer is mandatory to load-balance traffic to the cluster. A sample configuration for each is given below; deployment should be done at the system level rather than as a Docker-based deployment.
2. Decide which load balancer to use and follow the appropriate tab below:
Install Nginx using the steps given below:
Nginx Deployment
For RHEL
yum install nginx -y
For Ubuntu
apt install nginx -y
Rename the existing nginx.conf file and use the one from the next step.
sudo mv /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak
sudo vim /etc/nginx/nginx.conf
Create a new nginx.conf file with the lines below, replacing values where required.
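The following is a minimal Layer-4 (stream) sketch, assuming the three control-plane nodes 10.192.168.61-63 from the config.yaml example later in this document. Adjust the IP addresses to your own nodes, point the 80/443 backends at whichever nodes run the ingress controller (worker nodes, or control-plane nodes if they also carry workload), and note that the stream module may need to be installed separately (for example the nginx-mod-stream package on RHEL):
worker_processes 4;
worker_rlimit_nofile 40000;
# If the stream module is built as a dynamic module, load it here; the path is distribution-specific, e.g.:
# load_module /usr/lib64/nginx/modules/ngx_stream_module.so;

events {
    worker_connections 8192;
}

stream {
    # Kubernetes API server
    upstream rke2_api {
        least_conn;
        server 10.192.168.61:6443 max_fails=3 fail_timeout=5s;
        server 10.192.168.62:6443 max_fails=3 fail_timeout=5s;
        server 10.192.168.63:6443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 6443;
        proxy_pass rke2_api;
    }

    # RKE2 supervisor/registration port
    upstream rke2_register {
        least_conn;
        server 10.192.168.61:9345 max_fails=3 fail_timeout=5s;
        server 10.192.168.62:9345 max_fails=3 fail_timeout=5s;
        server 10.192.168.63:9345 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 9345;
        proxy_pass rke2_register;
    }

    # Ingress HTTP
    upstream ingress_http {
        server 10.192.168.61:80 max_fails=3 fail_timeout=5s;
        server 10.192.168.62:80 max_fails=3 fail_timeout=5s;
        server 10.192.168.63:80 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 80;
        proxy_pass ingress_http;
    }

    # Ingress HTTPS (TLS passthrough)
    upstream ingress_https {
        server 10.192.168.61:443 max_fails=3 fail_timeout=5s;
        server 10.192.168.62:443 max_fails=3 fail_timeout=5s;
        server 10.192.168.63:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 443;
        proxy_pass ingress_https;
    }
}
After saving the file, the configuration can be validated and applied with nginx -t and systemctl restart nginx.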
Install HAProxy
Ubuntu
apt update
apt install -y haproxy
systemctl enable haproxy
systemctl start haproxy
CentOS / RedHat
yum update
yum install haproxy -y
systemctl enable haproxy
systemctl start haproxy
Edit haproxy.cfg and update it from the appropriate tab below for the HAProxy configuration (with or without SSL termination).
vi /etc/haproxy/haproxy.cfg
Use the appropriate configuration below for your LB setup; a sample without SSL termination is sketched after this step.
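The following is a minimal haproxy.cfg sketch without SSL termination (plain TCP passthrough), again assuming control-plane nodes 10.192.168.61-63 as backends; adjust the server names, addresses, timeouts, and the nodes behind ports 80/443 to your environment:
global
    log /dev/log local0
    maxconn 4096

defaults
    log     global
    mode    tcp
    option  tcplog
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend rke2_api
    bind *:6443
    default_backend rke2_api_servers

backend rke2_api_servers
    balance roundrobin
    server cp1 10.192.168.61:6443 check
    server cp2 10.192.168.62:6443 check
    server cp3 10.192.168.63:6443 check

frontend rke2_register
    bind *:9345
    default_backend rke2_register_servers

backend rke2_register_servers
    balance roundrobin
    server cp1 10.192.168.61:9345 check
    server cp2 10.192.168.62:9345 check
    server cp3 10.192.168.63:9345 check

frontend ingress_http
    bind *:80
    default_backend ingress_http_servers

backend ingress_http_servers
    balance roundrobin
    server cp1 10.192.168.61:80 check
    server cp2 10.192.168.62:80 check
    server cp3 10.192.168.63:80 check

frontend ingress_https
    bind *:443
    default_backend ingress_https_servers

backend ingress_https_servers
    balance roundrobin
    server cp1 10.192.168.61:443 check
    server cp2 10.192.168.62:443 check
    server cp3 10.192.168.63:443 check
After editing, the configuration can be validated and applied with haproxy -c -f /etc/haproxy/haproxy.cfg and systemctl restart haproxy.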
Step 2: Prepare First Control Plane
1. Run the below commands to create required directories for RKE2 configurations.
mkdir -p /etc/rancher/rke2/
mkdir -p /var/lib/rancher/rke2/server/manifests/
2. Create a deployment manifest called config.yaml for the RKE2 cluster and replace the IP addresses and corresponding FQDNs accordingly. (Add any other fields from the Extra Options section to config.yaml at this point.)
cat<<EOF|tee /etc/rancher/rke2/config.yaml
tls-san:
- devops67.ef.com
- 10.192.168.67
- devops61.ef.com
- 10.192.168.61
- devops62.ef.com
- 10.192.168.62
- devops63.ef.com
- 10.192.168.63
write-kubeconfig-mode: "0600"
etcd-expose-metrics: true
cni:
- canal
EOF
In the above template manifest,
- 10.192.168.67 is the Load Balancer IP
- devops67.ef.com is the Load Balancer FQDN
- remaining IPs and FQDN are for all 3 Control Planes
Step 3: Ingress-Nginx config for RKE2
1. By default, the RKE2-based ingress controller doesn't allow additional snippet information in ingress manifests. Create this config before starting the deployment of RKE2.
cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: rke2-ingress-nginx
namespace: kube-system
spec:
valuesContent: |-
controller:
metrics:
service:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "10254"
config:
use-forwarded-headers: "true"
allowSnippetAnnotations: "true"
EOF
Step 4: First Control Plane Deployment
1. Begin the RKE2 Deployment
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
2. Start the RKE2 service. Starting the service will take approximately 10-15 minutes, depending on the network connection.
systemctl start rke2-server
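Progress can be monitored from another terminal while the service starts, for example:
journalctl -u rke2-server -f
systemctl status rke2-server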
Step 5: Enable the RKE2 Service
1. Enable RKE2 service.
systemctl enable rke2-server
2. By default, RKE2 deploys all binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH for the kubectl utility to work appropriately.
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
3. Also, append these lines to the current user's .bashrc file.
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc
4. Get the token for joining other Control-Plane Nodes.
cat /var/lib/rancher/rke2/server/node-token
Step 6: Remaining Control-Plane Nodes
Perform the following steps on remaining control-plane nodes.
1. Create required directories for RKE2 configurations.
mkdir -p /etc/rancher/rke2/
mkdir -p /var/lib/rancher/rke2/server/manifests/
2. Create a deployment manifest called config.yaml for the RKE2 cluster and replace the IP addresses and corresponding FQDNs accordingly, using the token obtained from the first Control-Plane node. (Add any other fields from the Extra Options section to config.yaml at this point.)
cat<<EOF|tee /etc/rancher/rke2/config.yaml
server: https://10.192.168.67:9345
token: [token from /var/lib/rancher/rke2/server/node-token on server node 1]
tls-san:
- devops67.ef.com
- 10.192.168.67
- devops61.ef.com
- 10.192.168.61
- devops62.ef.com
- 10.192.168.62
- devops63.ef.com
- 10.192.168.63
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
cni:
- canal
EOF
In the above template manifest,
- 10.192.168.67 is the Load Balancer IP
- devops67.ef.com is the Load Balancer FQDN
- remaining IPs and FQDN are for all 3 Control Planes
Ingress-Nginx config for RKE2
By default, the RKE2-based ingress controller doesn't allow additional snippet information in ingress manifests. Create this config before starting the deployment of RKE2.
cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: rke2-ingress-nginx
namespace: kube-system
spec:
valuesContent: |-
controller:
metrics:
service:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "10254"
config:
use-forwarded-headers: "true"
allowSnippetAnnotations: "true"
EOF
Step 7: Begin the deployment on all other Control-Plane Nodes
1. Begin the RKE2 Deployment
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
2. Start the RKE2 service. Starting the service will take approximately 10-15 minutes, depending on the network connection.
systemctl start rke2-server
3. Enable the RKE2 Service
systemctl enable rke2-server
4. By default, RKE2 deploys all binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH for the kubectl utility to work appropriately.
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
5. Also, append these lines to the current user's .bashrc file.
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc
Step 8: Deploy Worker Nodes
1. Repeat these steps for all worker nodes. Create the rke2 directory
mkdir -p /etc/rancher/rke2/
2. Create the config.yaml
cat<<EOF|tee /etc/rancher/rke2/config.yaml
server: https://10.192.168.67:9345
token: [token from /var/lib/rancher/rke2/server/node-token on server node 1]
write-kubeconfig-mode: "0644"
EOF
3. Initiate the deployment of RKE2
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
4. Start the RKE2 Agent Service
systemctl start rke2-agent.service
5. Enable the RKE2 Agent Service to start at the boot time
systemctl enable rke2-agent.service
Step 9: Bash Completion for kubectl (Control Plane Nodes only)
1. Install bash-completion package
yum install bash-completion -y
2. Set up autocomplete in bash for the current shell; the bash-completion package must be installed first.
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
3. Also, add an alias for the short notation of kubectl.
echo "alias k=kubectl" >> ~/.bashrc
echo "complete -o default -F __start_kubectl k" >> ~/.bashrc
4. Source your ~/.bashrc
source ~/.bashrc
Step 10: Install Helm
1. Helm is a handy tool for deploying external components. To install Helm on the cluster, execute the following command.
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3|bash
2. Enable bash completion for Helm
helm completion bash > /etc/bash_completion.d/helm
3. Re-login to enable bash completion, or run `su -` if running as the `root` user.
4. List the cluster nodes' details. You can get the details of all nodes using the following command:
kubectl get nodes -o wide
Helm needs to be installed on only one of the control-plane nodes.
Storage
RKE2 Kubernetes requires that at least one storage class is available for storing data on the cluster. This is a mandatory step, and the operator must decide on a storage option well before deploying the production workload. The details provided are self-explanatory and should be considered according to the cluster usage.
Longhorn for Replicated Storage
The Longhorn deployment is described in the Longhorn Deployment Guide. This deployment model is for lighter-scale cluster workloads and should be used with the caution that Longhorn requires additional hardware for a production cluster. If this is the only option, consider deploying Longhorn on dedicated nodes in the cluster using node-affinity.
OpenEBS for Local Storage
Deploying OpenEBS enables localhost storage as target devices and can only be used in the scenarios given below.
- Deployment of StatefulSets using nodeSelectors. In this deployment model, each StatefulSet is confined to a particular node so that it always runs on the same node. However, this reduces the High Availability of the StatefulSet services: when one worker node goes down, its services will not be available until the node recovers.
- Deploy StatefulSets with a higher replication level and use local disks on each node. This deployment model gives the flexibility of having at least 3 nodes available with complete services.
Details on OpenEBS can be read here.
Expertflow CX Deployment on Kubernetes
Please follow the steps in the document, Expertflow CX Deployment on Kubernetes to deploy Expertflow CX Solution.