RKE2 Deployment in High Availability with DNS

Purpose

The purpose of this document is to describe steps to deploy the RKE2 Kubernetes Distribution in High Availability with DNS.

Pre-requisites 

The prerequisites and cluster topologies are described in the RKE2 High Availability Pre-Deployment & Installation Guide. Please review that document before proceeding with installation in High Availability mode.


Quick Links



    Preparing for Deployment

    Decide what level of availability is needed and prepare your nodes according to the structure for the solution as given below.

    CIM Prerequisites

    DNS Setup

    This setup is considered experimental and should be used with caution, as it depends on Enterprise DNS services already running in the customer's infrastructure.

    Steps to configure the FQDN-to-IP-address mapping depend on the DNS setup in use. Generally, the virtual FQDN is mapped to the IP addresses of all Control-Plane nodes in a round-robin manner, with the following requirements:

    • The DNS server should be able to health-check the Control-Plane nodes' availability on ports 6443, 9345, 80, and 443. If this is not an option, the record of a failed Control-Plane node has to be removed from DNS manually.

    • The DNS server should itself be set up in an HA manner so that it is not a single point of failure for the cluster.

    • The DNS server should be able to serve a high volume of recursive queries for the cluster endpoints.
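    As an illustration, a BIND-style zone fragment that maps a hypothetical virtual FQDN to all three Control-Plane IPs; most DNS servers will then answer A-record queries in round-robin order. All names and addresses below are placeholders:

```text
; zone fragment -- placeholder names and IPs
rke2-api    IN  A   192.168.1.11   ; control-plane node 1
rke2-api    IN  A   192.168.1.12   ; control-plane node 2
rke2-api    IN  A   192.168.1.13   ; control-plane node 3
```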

    RKE2 Cluster Workload Topologies

    A production cluster can be run with a mixture of workload options.

    1. For a lightly loaded cluster, Control-Plane nodes can also carry workload alongside the Worker nodes.

    2. For a busy workload, it is recommended to off-load the Control-Plane nodes from the workload so that they are not affected by heavy usage of cluster resources; only Worker nodes then process the business logic.

    Control-Plane Nodes (without Workload)

    Type: RKE2
    RAM (GB): 4
    CPU: 4
    DISK: 150 GiB (preferably on /var, with SSD/NVMe disks)
    Scalability: high
    Network Ports:
    • 6443/TCP accessible by all nodes
    • 8472/UDP for CNI
    • 10250/TCP for metrics-server
    • 2379-2380/TCP for cluster HA
    Minimum Nodes in HA: 3


    If workload is enabled on Control-Plane nodes, increase these figures to the maximum available, e.g. 16 GiB RAM, 8 vCPUs, and 250 GiB of storage.

    Worker Nodes

    Worker node requirements are higher to accommodate the workload.

    Type: RKE2
    RAM (GB): 16
    CPU: 8
    DISK: 250 GiB (preferably on /var, with SSD/NVMe disks)
    Scalability: high
    Network Ports:
    • 6443/TCP accessible by all nodes
    • 8472/UDP for CNI
    • 10250/TCP for metrics-server
    • 2379-2380/TCP for cluster HA
    Minimum Nodes in HA: 3+

    RASA-X Prerequisites

    In a Multi-node cluster, Rasa-X can be deployed in different tiers:

    1. In a multi-node HA cluster, add another node and deploy Rasa-X using node affinity so that one worker node is allocated to Rasa-X only. This method is preferred in an HA cluster. Read more at Node Affinity and Node Selector.

    2. However, if this is not feasible, use RKE2 for Single-Node deployment for a standalone Rasa-X, and then configure the CIM accordingly.
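    As a sketch of the first option, pods can be pinned to a dedicated worker with a nodeSelector. The label key/value and node name below are hypothetical:

```yaml
# first, label the dedicated worker node:
#   kubectl label node worker-4 dedicated=rasa-x
# then, in the Rasa-X Deployment/StatefulSet pod template:
spec:
  template:
    spec:
      nodeSelector:
        dedicated: rasa-x
```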

    Type: RKE2
    RAM (GB): 12
    CPU: 8
    DISK: 250 GiB (preferably on /var, with SSD/NVMe disks)
    Scalability: high
    Network Ports:
    • 6443/TCP accessible by all nodes
    • 8472/UDP for CNI
    • 10250/TCP for metrics-server
    • 2379-2380/TCP for cluster HA
    Minimum Nodes in HA: 1

    Superset Prerequisites

    For BI Reporting, Superset must be deployed separately from the main CIM Solution.

    In a multi-node cluster, Superset can be deployed in different tiers:

    1. In a multi-node HA cluster, add another node and deploy Superset using node affinity so that one worker node is allocated specifically to Superset only. This method is preferred in an HA cluster. Read Node Affinity and Node Selector.

    2. However, if this is not feasible, use RKE2 for Single-Node deployment for a standalone Superset, and then configure the CIM accordingly.

    Type: RKE2
    RAM (GB): 8
    CPU: 8
    DISK: 250 GiB (preferably on /var, with SSD/NVMe disks)
    Scalability: high
    Network Ports:
    • 6443/TCP accessible by all nodes
    • 8472/UDP for CNI
    • 10250/TCP for metrics-server
    • 2379-2380/TCP for cluster HA
    Minimum Nodes in HA: 1

    1 RKE2 detailed requirements are also available here.

    2 Kubernetes detailed requirements can be seen here.

    This deployment model requires a stable and consistent connection to the system. You can use a virtual terminal such as 'screen' or 'tmux', which lets you resume your session even if the network gets disconnected.

    FQDN 

    An FQDN must be mapped to an IP address.

    Iptables

    If you are running iptables in nftables mode instead of legacy mode, you might encounter issues. We recommend using a newer iptables (such as 1.6.1+) to avoid them.

    Additionally, versions 1.8.0-1.8.4 have known issues that can cause RKE2 to fail. See Additional OS Preparations for workarounds.
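    To see which mode is in use, `iptables --version` prints `(nf_tables)` or `(legacy)` after the version number. A small helper to classify that string (the function name is ours, for illustration):

```shell
#!/usr/bin/env bash
# iptables_mode: classify an `iptables --version` string as nf_tables, legacy, or unknown
iptables_mode() {
  case "$1" in
    *"(nf_tables)"*) echo nf_tables ;;
    *"(legacy)"*)    echo legacy ;;
    *)               echo unknown ;;
  esac
}

# on a real node:
#   iptables_mode "$(iptables --version)"
iptables_mode "iptables v1.8.4 (nf_tables)"   # prints nf_tables
```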

    Prepare all the Nodes in the cluster

    Disable Services

    Disable firewall and nm-cloud-setup on all nodes.

    Environment Preparation

    Before starting the RKE2 installation, the following are the optional steps and the checklist to make sure the environment is prepared for the installation:


    Linux-Based OS Instructions


    RHEL and Debian/Ubuntu Commands

    We must run the following commands on RHEL OS before starting the installation of RKE2.

    Step 1: Disable the firewall and nm-cloud-setup services on RHEL and Ubuntu

    Bash
    systemctl disable apparmor.service
    systemctl disable firewalld.service
    systemctl stop apparmor.service
    systemctl stop firewalld.service
    # disable nm-cloud-setup if present (cloud images only; harmless to skip otherwise)
    systemctl disable nm-cloud-setup.service nm-cloud-setup.timer || true
    

    Step 2: Lock the RedHat release to version 8.7 only (mandatory)

    To lock the release of RHEL to 8.7, which is the latest supported release by Longhorn, please execute these commands:

    Bash
    subscription-manager release --set=8.7
    yum clean all
    subscription-manager release --show
    rm -rf /var/cache/dnf
    

    Step 3: Disable Swap 

    For both RHEL and Ubuntu:

    Bash
    systemctl disable swap.target
    swapoff -a
    # also remove or comment any swap entries in /etc/fstab so this persists across reboots
    

    Step 4: Update the RHEL packages for the 8.7 release

    Bash
    yum update -y
    



    Checklist

    Before proceeding with the deployment of a multi-node HA cluster for RKE2, go through this checklist:

    • Internet access available on all nodes (1): needed on every node to fetch and run RKE2.

    • Minimum number of nodes: 3 Control-Plane nodes + 2 Worker nodes (for High Availability).

    • All nodes running a verified OS release: RHEL-8.7 or Ubuntu-20.04.

    • Firewall service disabled on all nodes: both firewall and nm-cloud-setup must be disabled.

    • In case of RHEL, release locked to 8.7: only RHEL-8.7 is supported.

    • Virtual IP obtained: an IP from the same range as the Control-Plane nodes is needed for VIP fail-over.

    • If Longhorn is to be deployed, iscsid.service enabled and started on all nodes (2): check with systemctl status iscsid.service and confirm it is enabled; iscsid.service must be running before deploying the Longhorn storage manager.

    • NTP available (preferred): NTP should be enabled on all nodes.

    • Pod + service IP ranges decided: the pod and service ranges must not overlap with any existing IP range.

    • All cluster nodes have identical network interface names: Kube-VIP needs consistent interface names across all Control-Plane nodes to fail over. ( ip addr | grep -E ':\s.*?:' | cut -d ":" -f 2 | tr -d " " ) can be used to list interfaces.

    1. An air-gapped deployment is also possible; check the RKE2 web-site for more details on air-gapped installs.

    2. If any node is not running iscsid.service, stateful workloads will fail and may result in data loss.
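    The interface-name check from the list above can also be wrapped in a small helper that parses `ip -o link` output; a sketch (the function name is ours):

```shell
#!/usr/bin/env bash
# iface_names: extract interface names from `ip -o link` style output on stdin,
# stripping any "@parent" suffix (e.g. eth0.10@eth0 -> eth0.10)
iface_names() {
  awk -F': ' '{print $2}' | cut -d'@' -f1
}

# on a real node: ip -o link | iface_names
printf '1: lo: <LOOPBACK,UP>\n2: eth0: <BROADCAST,UP>\n' | iface_names
```

    Run this on every node and compare the lists; they must match for Kube-VIP fail-over.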

    Corporate HTTP/S Proxy Requirement

    If the environment has a strict HTTP or HTTPS proxy set, we must exclude the cluster traffic from the proxy controls.

    The NO_PROXY variable must include your  cluster pod and service IP ranges.

    Bash
    HTTP_PROXY=http://your-proxy.example.com:8888
    HTTPS_PROXY=http://your-proxy.example.com:8888
    NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
    

    If you want to configure the proxy settings for containerd without affecting RKE2 and the Kubelet, you can prefix the variables with CONTAINERD_:

    Bash
    CONTAINERD_HTTP_PROXY=http://your-proxy.example.com:8888
    CONTAINERD_HTTPS_PROXY=http://your-proxy.example.com:8888
    CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
    


    Customize the RKE2 Deployment for your Environment (Extra Options)

    Environment Customization Steps



    The options given below can also be used for a customized environment setup:

    Option: Default deployment directory of RKE2
    Switch: --data-dir value, -d value
    Default: /var/lib/rancher/rke2
    Description: Folder to hold state

    Option: Default pod IP assignment range
    Switch: --cluster-cidr value
    Default: "10.42.0.0/16"
    Description: IPv4/IPv6 network CIDRs to use for pod IPs

    Option: Default service IP assignment range
    Switch: --service-cidr value
    Default: "10.43.0.0/16"
    Description: IPv4/IPv6 network CIDRs to use for service IPs

    If any of the above options is required, add it in the next step.

    cluster-cidr and service-cidr are evaluated independently and must not overlap with each other or with existing networks. Decide well before cluster deployment; these options are not configurable once the cluster is deployed and workload is running.
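    As an illustration of the overlap constraint, a rough pure-bash IPv4 check (helper names are ours): two CIDRs overlap exactly when their network addresses agree under the shorter of the two prefix masks.

```shell
#!/usr/bin/env bash
# pure-bash IPv4 CIDR overlap check -- illustrative sketch only

# ip2int: dotted quad -> 32-bit integer
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_overlap A.B.C.D/p W.X.Y.Z/q -> exit status 0 if the two ranges overlap
cidr_overlap() {
  local n1=${1%/*} p1=${1#*/} n2=${2%/*} p2=${2#*/}
  local i1 i2 p m
  i1=$(ip2int "$n1")
  i2=$(ip2int "$n2")
  # compare both networks under the shorter prefix mask
  p=$(( p1 < p2 ? p1 : p2 ))
  m=$(( (0xFFFFFFFF << (32 - p)) & 0xFFFFFFFF ))
  [ $(( i1 & m )) -eq $(( i2 & m )) ]
}

cidr_overlap 10.42.0.0/16 10.43.0.0/16 && echo overlap || echo ok   # prints ok
```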



    Installation Steps

    Step 1: Prepare First Control Plane

    1. Run the commands below to create the required directories for the RKE2 configuration.

    Bash
    mkdir -p /etc/rancher/rke2/
    mkdir -p  /var/lib/rancher/rke2/server/manifests/
    

    2. Create a deployment manifest called config.yaml for the RKE2 cluster, replacing the IP addresses and corresponding FQDNs accordingly (add any other fields from the Extra Options section in config.yaml at this point).

    Bash
    cat<<EOF|tee /etc/rancher/rke2/config.yaml
    tls-san:
      - devops67.ef.com
      - 10.192.168.67
      - devops61.ef.com
      - 10.192.168.61
      - devops62.ef.com
      - 10.192.168.62
      - devops63.ef.com
      - 10.192.168.63
    write-kubeconfig-mode: "0600"
    etcd-expose-metrics: true
    cni:
      - canal
    
    EOF
    


    In the above template manifest:

    • 10.192.168.67 is the IP mapped to the virtual FQDN

    • devops67.ef.com is the virtual FQDN

    • the remaining IPs and FQDNs belong to the 3 Control-Plane nodes


    Step 2: Ingress-Nginx config for RKE2

    1. By default, the RKE2-based ingress controller doesn't allow additional snippet information in ingress manifests. Create this config before starting the deployment of RKE2.

    Bash
    cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
    ---
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          metrics:
            service:
              annotations:
                prometheus.io/scrape: "true"
                prometheus.io/port: "10254"
          config:
            use-forwarded-headers: "true"
          allowSnippetAnnotations: "true"
    EOF
    

    2. Begin the RKE2 Deployment

    Bash
    curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
    

    3. Start the RKE2 service. Starting the service will take approx. 10-15 minutes, depending on the network connection.

    Bash
    systemctl start rke2-server
    


    Step 3: Enable the RKE2 Service

    1. Enable RKE2 service.

    Bash
    systemctl enable rke2-server
    

    2. By default, RKE2 deploys all binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH for the kubectl utility to work properly.

    Bash
    export PATH=$PATH:/var/lib/rancher/rke2/bin
    export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
    

    3. Also, append these lines into current user's .bashrc  file.

    Bash
    # single quotes keep $PATH from expanding now; it expands when .bashrc is sourced
    echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
    echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc
    

    4. Get the token for joining other Control-Plane Nodes.

    Bash
    cat /var/lib/rancher/rke2/server/node-token
    


    Step 4: Remaining Control-Plane Nodes

    Perform these steps on remaining control-plane nodes.

    1. Create required directories for RKE2 configurations.

    Bash
    mkdir -p /etc/rancher/rke2/
    mkdir -p  /var/lib/rancher/rke2/server/manifests/
    

    2. Create a deployment manifest called config.yaml for the RKE2 cluster, replacing the IP addresses and corresponding FQDNs accordingly (add any other fields from the Extra Options section in config.yaml at this point).

    Bash
    cat<<EOF|tee /etc/rancher/rke2/config.yaml
    server: https://10.192.168.67:9345
    token: [token from /var/lib/rancher/rke2/server/node-token on server node 1]
    write-kubeconfig-mode: "0644"
    tls-san:
      - devops67.ef.com
      - 10.192.168.67
      - devops61.ef.com
      - 10.192.168.61
      - devops62.ef.com
      - 10.192.168.62
      - devops63.ef.com
      - 10.192.168.63
    etcd-expose-metrics: true
    cni:
      - canal
    EOF
    


    In the above template manifest:

    • 10.192.168.67 is the IP mapped to the virtual FQDN

    • devops67.ef.com is the virtual FQDN

    • the remaining IPs and FQDNs belong to the 3 Control-Plane nodes

    Ingress-Nginx config for RKE2

    By default, the RKE2-based ingress controller doesn't allow additional snippet information in ingress manifests; create this config before starting the deployment of RKE2.

    Bash
    cat<<EOF| tee /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
    ---
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          metrics:
            service:
              annotations:
                prometheus.io/scrape: "true"
                prometheus.io/port: "10254"
          config:
            use-forwarded-headers: "true"
          allowSnippetAnnotations: "true"
    EOF
    

    Step 5: Install RKE2 HA with DNS

    1. Begin the RKE2 deployment. Starting the service (next step) will take approx. 10-15 minutes, depending on the network connection.

    Bash
    curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
    

    2. Start the RKE2 service

    Bash
    systemctl start rke2-server
    

    3. Enable the RKE2 Service

    Bash
    systemctl enable rke2-server
    

    4. By default, RKE2 deploys all binaries in the /var/lib/rancher/rke2/bin path. Add this path to the system's default PATH for the kubectl utility to work properly.

    Bash
    export PATH=$PATH:/var/lib/rancher/rke2/bin
    export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
    

    5. Append these lines into current user's .bashrc  file

    Bash
    # single quotes keep $PATH from expanding now; it expands when .bashrc is sourced
    echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> $HOME/.bashrc
    echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> $HOME/.bashrc
    


    Step 6: Deploy Worker Nodes

    1. Create the RKE2 directory.

    Bash
    mkdir -p /etc/rancher/rke2/
    

    2. Create the config.yaml.

    Bash
    cat<<EOF|tee /etc/rancher/rke2/config.yaml
    server: https://10.192.168.67:9345
    token: [token from /var/lib/rancher/rke2/server/node-token on server node 1]
    write-kubeconfig-mode: "0644"
    EOF
    
    
    

    3. Initiate the deployment of RKE2.

    Bash
    curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
    

    4. Start the RKE2 Agent Service

    Bash
    systemctl start rke2-agent.service
    

    5. Enable the RKE2 Agent Service to start at the boot time

    Bash
    systemctl enable  rke2-agent.service
    

    6. Repeat these steps for all worker nodes

    Step 7: Bash Completion for kubectl

    1. Install bash-completion package

    Bash
    yum install bash-completion -y
    

    2. Set up autocompletion for the current shell (the bash-completion package must be installed first).

    Bash
    source <(kubectl completion bash) 
    echo "source <(kubectl completion bash)" >> ~/.bashrc 
    

    3. Also add an alias for the short notation of kubectl.

    Bash
    echo "alias k=kubectl"  >> ~/.bashrc 
    echo "complete -o default -F __start_kubectl k"  >> ~/.bashrc 
    

    4. Source your ~/.bashrc  

    Bash
    source ~/.bashrc
    

    Step 8: Install Helm

    1. Helm is a handy tool for deploying external components. To install Helm on the cluster, execute the following command:

    Bash
    curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3|bash
    

    2. Enable bash completion for Helm

    Bash
    helm completion bash > /etc/bash_completion.d/helm
    

    3. Re-login to enable the bash completion, or run `su -` if running as the `root` user.

    4. List the cluster nodes' details using the following command:

    Bash
    kubectl get nodes -o wide
    


    Helm needs to be installed on only one of the master nodes.

    Storage

    RKE2 Kubernetes requires that at least one storage class is available for storing data on the cluster. This is a mandatory step, and the operator must decide on it well before deploying the production workload. The details provided are self-explanatory and should be weighed against the expected cluster usage.

    Longhorn for Replicated Storage

    The Longhorn deployment is described in the Longhorn Deployment Guide. This deployment model is for lighter-scale cluster workloads and should be used with caution, as Longhorn requires additional hardware for a production cluster. If this is the only option, consider deploying Longhorn on dedicated nodes only, using node affinity.

    OpenEBS for Local Storage

    Deploying OpenEBS enables local host storage as target devices and can only be used in the scenarios given below.

    • Deployment of StatefulSets using nodeSelectors. In this deployment model, each StatefulSet is confined to a particular node so that it always runs on the same node. However, this reduces the high availability of the StatefulSet services: when a worker node goes down, its services are unavailable until the node recovers.

    • Deploy StatefulSets with a high replication level and use local disks on each node. This deployment model gives the flexibility of having at least 3 nodes available with complete services.
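    For the second scenario, one replica per node can be sketched with podAntiAffinity so replicas land on distinct nodes (all names below are illustrative):

```yaml
# StatefulSet fragment: spread 3 replicas across distinct nodes
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-stateful-app
              topologyKey: kubernetes.io/hostname
```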

    Details on OpenEBS can be read here.

    Expertflow CX Deployment on Kubernetes

    Please follow the steps in the document, Expertflow CX Deployment on Kubernetes to deploy Expertflow CX Solution.




