High Availability with Kube-VIP

The purpose of this document is to describe steps to deploy an RKE2 Kubernetes cluster in High Availability using Kube-VIP.

Prerequisites

The prerequisites and cluster topologies are described in the Single Node Deployment document. Please review that document before proceeding with installation in High Availability mode.

	Node Required	vCPU	vRAM	vDisk (GiB)	Comments
RKE2	3 Control Plane nodes	2	4	50	See RKE2 installation requirements for hardware sizing, the underlying operating system, and the networking requirements.
CX-Core	2 Worker nodes	2	4	250	If cloud-native storage is not available, 2 additional worker nodes are required: 1 on site-A and 1 on site-B.
Superset	1 Worker node	2	8	250	For reporting

Preparing for Deployment

Kube-VIP Requirements

A VIP is a virtual IP address that stays available by moving seamlessly between the control-plane nodes; at any given time, exactly one control-plane node holds it on behalf of Kube-VIP. Kube-VIP works much like keepalived, but offers additional flexibility to suit the environment. For example, Kube-VIP can work using:

ARP – When using ARP (Layer 2), it uses leader election.

Other modes, such as BGP, Routing Table and WireGuard, can also be used.

In ARP mode, the VIP must be in the same subnet as all the control-plane nodes.
Kube-VIP deployment depends on at least one working RKE2 control-plane node being available before any other nodes (both CP and workers) are deployed.
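
The same-subnet requirement for ARP mode can be checked before settling on a VIP. The following is a minimal sketch and not part of Kube-VIP itself: the helper names `ip_to_int` and `same_subnet` are hypothetical, only IPv4 is handled, and the addresses in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Kube-VIP or RKE2), IPv4 only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) when the VIP and the node address share the same network.
# $1 = candidate VIP, $2 = node IP, $3 = prefix length of the node subnet
same_subnet() {
  local vip node prefix mask
  vip=$(ip_to_int "$1")
  node=$(ip_to_int "$2")
  prefix=$3
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( vip & mask )) -eq $(( node & mask )) ]
}

# Usage (placeholder addresses):
#   same_subnet 192.168.1.100 192.168.1.11 24 && echo "VIP is usable in ARP mode"
```

Run the check once per control-plane node, using that node's own address and prefix length.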

Installation Steps

Step 1: Prepare First Control Plane

<FQDN> is the Kube-VIP FQDN

This step is required for the Nginx Ingress controller to allow customized configurations.

Step 1. Create Manifests

Create necessary directories for RKE2 deployment

Bash

mkdir -p /etc/rancher/rke2/
mkdir -p  /var/lib/rancher/rke2/server/manifests/

Generate the ingress-nginx controller config file so that the RKE2 server bootstraps it accordingly.

Bash

cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      extraInitContainers:
        - name: ef-set-sysctl
          image: busybox
          securityContext:
            privileged: true
          command:
          - sh
          - -c
          - |
            sysctl -w net.core.somaxconn=65535
            sysctl -w net.ipv4.ip_local_port_range="1024 65535"
      metrics:
        # Do not enable at cluster install; enable once monitoring is deployed.
        enabled: false   
        service:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "10254"
        serviceMonitor:
          # Do not enable at cluster install; enable once monitoring is deployed.
          enabled: false
      config:
        use-forwarded-headers: "true"
        keep-alive-requests: "10000"
        upstream-keepalive-requests: "1000"
        worker-processes: "auto"
        max-worker-connections: "65535"
        use-gzip: "true"
      allowSnippetAnnotations: "true"
EOF

Create a deployment manifest called config.yaml for the RKE2 cluster and replace the IP addresses and corresponding FQDNs accordingly (add any other fields from the Extra Options sections to config.yaml at this point). The child entries of tls-san can include the FQDNs and IP addresses of control-plane nodes and worker nodes as well. If you are deploying worker HA, uncomment the disable section to disable the RKE2 ingress.

Bash

cat<<EOF|tee /etc/rancher/rke2/config.yaml
# Uncomment tls-san and its child entry <FQDN> for Control-Plane HA
#tls-san:
#  - <FQDN>
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
etcd-snapshot-schedule-cron: "0 */6 * * *"
# Keep 56 etcd snapshots (equals 2 weeks at 4 snapshots a day, one every 6 hours)
etcd-snapshot-retention: 56
cni:
  - canal
# Uncomment for Worker HA deployment
#disable:
#  - rke2-ingress-nginx
# Uncomment the following to retain logs for any component without integrating with the ELK stack
#kubelet-arg:
#  - "container-log-max-files=5"
#  - "container-log-max-size=10Mi"
EOF

In the template manifest above, <FQDN> must point to the first control-plane node.
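
The template schedules an etcd snapshot every 6 hours and retains 56 of them. If you change the schedule or the retention window, the retention count can be recomputed as shown in this small sketch; the function name `snapshot_retention` is hypothetical and the calculation assumes evenly spaced snapshots.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of snapshots needed to cover a retention window.
# $1 = days to retain, $2 = hours between snapshots (6 for "0 */6 * * *")
snapshot_retention() {
  local days=$1 interval_hours=$2
  echo $(( days * 24 / interval_hours ))
}

# Usage:
#   snapshot_retention 14 6    # two weeks at one snapshot every 6 hours -> 56
```

Use the result as the value of etcd-snapshot-retention in config.yaml.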

Step 2. Download the RKE2 binaries and start Installation

RKE2 uses the following defaults during installation. You can change them as needed by specifying the switches listed below.

	Switch	Default	Description
To change the default deployment directory of RKE2	`--data-dir value`, `-d value`	`/var/lib/rancher/rke2` or `${HOME}/.rancher/rke2` if not root	Important Note: Moving the default destination folder to another location is not recommended. However, if containers need to be stored on a different partition, it is recommended to deploy containerd separately and point it at the partition with available space using the `--root` flag in the containerd.server manifest, and then add the `#container-runtime-endpoint: "/path/to/containerd.sock"` switch to the RKE2 config.yaml file.
Default pod IP assignment range	`--cluster-cidr value`	`"10.42.0.0/16"`	IPv4/IPv6 network CIDRs to use for pod IPs
Default service IP assignment range	`--service-cidr value`	`"10.43.0.0/16"`	IPv4/IPv6 network CIDRs to use for service IPs

cluster-cidr and service-cidr are evaluated independently. Choose both ranges carefully well before cluster deployment; they cannot be changed once the cluster is deployed and workloads are running.
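
Since these ranges cannot be changed later, it may help to confirm up front that the pod range, the service range and the node network do not overlap. The following is a minimal IPv4-only sketch; `ip_to_int` and `cidr_overlap` are hypothetical helper names, and the node network in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a pre-install sanity check, IPv4 only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) when two CIDR ranges overlap.
# Two ranges overlap exactly when they match under the shorter prefix.
cidr_overlap() {
  local ip1=${1%/*} p1=${1#*/} ip2=${2%/*} p2=${2#*/}
  local p=$(( p1 < p2 ? p1 : p2 ))
  local mask=$(( (0xFFFFFFFF << (32 - p)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip1") & mask )) -eq $(( $(ip_to_int "$ip2") & mask )) ]
}

# Usage (192.168.1.0/24 is a placeholder node network):
#   cidr_overlap 10.42.0.0/16 10.43.0.0/16   || echo "pod and service ranges are disjoint"
#   cidr_overlap 10.42.0.0/16 192.168.1.0/24 || echo "pod range does not collide with nodes"
```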

Run the following command to install RKE2.

Bash

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -

Enable the rke2-server service

Bash

systemctl enable rke2-server.service

Start the service

Bash

systemctl start rke2-server.service

The RKE2 server requires at least 10-15 minutes to bootstrap completely. You can check the status of the RKE2 server using systemctl status rke2-server. Only proceed once everything is up and running; otherwise, configuration issues might occur that require redoing all the installation steps.
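
Instead of checking the status by hand, the wait can be scripted. The following is a minimal sketch of a generic polling helper; the name `wait_until` and the timeout and interval values in the usage comment are assumptions, not RKE2 defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# $1 = timeout in seconds, $2 = seconds between attempts, rest = command to run
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@" > /dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1   # gave up: the command never succeeded within the timeout
    fi
    sleep "$interval"
  done
  return 0
}

# Usage (15 minute timeout, one attempt every 15 seconds):
#   wait_until 900 15 systemctl is-active --quiet rke2-server.service \
#     || echo "rke2-server did not become active in time"
```

Note that an active service does not by itself prove that every cluster component is ready, so still review the node and pod status before continuing.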

Step 3. Kubectl Profile setup

By default, RKE2 deploys all the binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH so that the kubectl utility works properly.

Bash

# Single quotes keep $PATH unexpanded so it is evaluated at each login
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc
source ~/.bashrc
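
Re-running the commands above appends the same lines to .bashrc again. If the steps may be repeated, an idempotent append avoids the duplicates. This is a minimal sketch; `append_once` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if it is not already there.
# $1 = exact line to add, $2 = target file
append_once() {
  local line=$1 file=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   append_once 'export PATH=$PATH:/var/lib/rancher/rke2/bin' "$HOME/.bashrc"
#   append_once 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' "$HOME/.bashrc"
```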

Step 4. Bash Completion for kubectl

Install bash-completion package

For Ubuntu:

Bash

apt install bash-completion -y

For RHEL:

Bash

yum install bash-completion -y

Set up autocompletion for bash in the current shell, and add an alias as a short notation for kubectl.

Bash

kubectl completion bash > /etc/bash_completion.d/kubectl
echo "alias k=kubectl"  >> ~/.bashrc 
echo "complete -o default -F __start_kubectl k"  >> ~/.bashrc 
source ~/.bashrc

Step 5. Install helm

Helm is a package manager used to deploy external components. To install Helm on the cluster, execute the following command:

Bash

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3|bash

If the above command does not work, use the following commands instead:

For Ubuntu:

Bash

curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
apt-get update
apt-get install helm

For RHEL:

Bash

curl -L https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64 -o /usr/local/bin/helm
chmod +x /usr/local/bin/helm
helm version

Step 6. Enable bash completion for helm

Generate the script for helm bash completion.

Bash

helm completion bash > /etc/bash_completion.d/helm

Create a link for crictl to work properly.

Bash

ln -s /var/lib/rancher/rke2/agent/etc/crictl.yaml /etc/crictl.yaml

Step 4: Deploy Kube-VIP

1. Decide the IP and the interface on all nodes for Kube-VIP and set these up as environment variables. This step must be completed before deploying any other node in the cluster (both CP and workers).

Bash

export VIP=<FQDN>
export INTERFACE=<Interface>

2. Import the RBAC manifest for Kube-VIP

Bash

curl https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml

3. Fetch the kube-vip image

Bash

/var/lib/rancher/rke2/bin/crictl -r "unix:///run/k3s/containerd/containerd.sock"  pull ghcr.io/kube-vip/kube-vip:latest

4. Deploy the Kube-VIP

Bash

CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock  ctr -n k8s.io run \
--rm \
--net-host \
ghcr.io/kube-vip/kube-vip:latest vip /kube-vip manifest daemonset --arp --interface $INTERFACE --address $VIP --controlplane  --leaderElection --taint --services --inCluster | tee /var/lib/rancher/rke2/server/manifests/kube-vip.yaml

5. Wait for kube-vip to complete bootstrapping

Bash

kubectl rollout status daemonset kube-vip-ds -n kube-system --timeout=650s

6. Once the rollout completes, verify that the kube-vip daemonset is running 1 pod

Bash

kubectl get ds -n kube-system kube-vip-ds

Once the cluster has more control-plane nodes added, the count will be equal to the total number of CP nodes.
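
Two small checks can support the steps above: finding the interface that carries the default route (a common candidate for INTERFACE) and confirming that the VIP is actually bound on the current leader. The following is a minimal sketch; the helper names are hypothetical, both read `ip` command output from standard input, and `vip_is_bound` assumes you pass the VIP as an IPv4 address rather than an FQDN.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; both parse iproute2 output supplied on stdin.

# Print the interface name of the default route.
parse_default_iface() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Succeed (exit 0) when the given IPv4 address appears in "ip addr" output.
# $1 = VIP as an IPv4 address (assumption: not an FQDN)
vip_is_bound() {
  grep -qF "inet ${1}/"
}

# Usage (192.0.2.10 is a placeholder VIP):
#   export INTERFACE=$(ip -4 route show default | parse_default_iface)
#   ip -4 -o addr show dev "$INTERFACE" | vip_is_bound 192.0.2.10 \
#     && echo "this node currently holds the VIP"
```

Only the node that currently holds the VIP reports it as bound, so a negative result on the other control-plane nodes is expected.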

Step 5: Remaining Control-Plane Nodes

Perform these steps on remaining control-plane nodes.

1. Create required directories for RKE2 configurations.

Bash

mkdir -p /etc/rancher/rke2/
mkdir -p  /var/lib/rancher/rke2/server/manifests/

2. Create a deployment manifest called config.yaml for the RKE2 cluster and replace the IP addresses and corresponding FQDNs accordingly (add any other fields from the Extra Options sections to config.yaml at this point).

Bash

cat<<EOF|tee /etc/rancher/rke2/config.yaml
server: https://<FQDN>:9345
token: [token from /var/lib/rancher/rke2/server/node-token on server node 1]
tls-san:
  - <FQDN>
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
cni:
  - canal

EOF

In the template manifest above, <FQDN> is the Kube-VIP FQDN.
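
When several control-plane nodes are joined, filling in the template by hand is error-prone. The following is a minimal sketch that renders the same fields from two arguments; `render_join_config` is a hypothetical helper name and `vip.example.internal` in the usage comment is a placeholder FQDN.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a join config for an additional control-plane node.
# $1 = Kube-VIP FQDN, $2 = token from the first control-plane node
render_join_config() {
  local fqdn=$1 token=$2
  cat <<EOF
server: https://${fqdn}:9345
token: ${token}
tls-san:
  - ${fqdn}
write-kubeconfig-mode: "0644"
etcd-expose-metrics: true
cni:
  - canal
EOF
}

# Usage (token.txt holds the contents of node-token copied from the first node):
#   render_join_config vip.example.internal "$(cat token.txt)" \
#     > /etc/rancher/rke2/config.yaml
```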

Ingress-Nginx config for RKE2

By default, the RKE2 ingress controller doesn't allow additional snippet annotations in ingress manifests. Create this config before starting the deployment of RKE2.

Bash

cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      metrics:
        service:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "10254"
      config:
        use-forwarded-headers: "true"
      allowSnippetAnnotations: "true"
EOF

Step 6: Begin the RKE2 Deployment

1. Install the RKE2 server

Bash

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -

2. Start the RKE2 service. Starting the service takes approximately 10-15 minutes, depending on the network connection.

Bash

systemctl start rke2-server

3. Enable the RKE2 Service

Bash

systemctl enable rke2-server

4. By default, RKE2 deploys all the binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH so that the kubectl utility works properly.

Bash

export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

5. Also, append these lines to the current user's .bashrc file

Bash

echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc

Step 7: Deploy Worker Nodes

On each worker node, run the following command to install the RKE2 agent.

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -

Enable the rke2-agent service by using the following command.
```
systemctl enable rke2-agent.service
```
Create the configuration directory by running the following command.
```
mkdir -p /etc/rancher/rke2/
```
Add or edit /etc/rancher/rke2/config.yaml and update the following fields.
1. <Control-Plane-IP> This is the IP of the control-plane node.
2. <Control-Plane-TOKEN> This is the token, which can be extracted from the first control-plane node by running cat /var/lib/rancher/rke2/server/node-token
  server: https://<Control-Plane-IP>:9345
  token: <Control-Plane-TOKEN>
Start the service by using the following command.
```
systemctl start rke2-agent.service
```
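
After the agents start, it is worth confirming from a control-plane node that every node has joined and reports Ready. The following is a minimal sketch that counts nodes in any other state; `count_not_ready` is a hypothetical helper name and it expects the output of `kubectl get nodes --no-headers` on standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count nodes whose STATUS column is not exactly "Ready".
# Reads "kubectl get nodes --no-headers" output from stdin.
count_not_ready() {
  awk 'NF && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage (run on a control-plane node):
#   kubectl get nodes --no-headers | count_not_ready
```

A cordoned node shows a status such as Ready,SchedulingDisabled and is therefore counted as not ready by this sketch.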

Choose storage - See Storage Solution - Getting Started
CX-Core deployment on Kubernetes