
Geographical Deployment of Expertflow CX with Redundancy

This document describes the steps for a geo-cluster (multi-region) deployment of the Expertflow CX solution. The goal is to provide a redundant site for disaster recovery (DR) alongside the primary site.

Current Deployment Scenario

We have tested the Geo-cluster solution deployment on a collection of 3 master nodes with 3 worker nodes.

  • 3 Control Plane nodes

  • 3 Worker nodes

  • cStor for replicated storage on worker nodes, based on block storage

System Requirements

The system requirements for Geo-cluster solution using Kubernetes RKE2 distribution are:

RAM (GB): 16

CPU: 8

Disk: 150 GB per node, plus 100 GB of additional unformatted block storage for each worker node

Minimum nodes:

  • 3 master nodes (each master is deployed in a different region)

  • 3 worker nodes (each with 100 GB of additional raw, unformatted block storage)

Storage Setup - cStor

cStor is the recommended resilient storage for the geo-cluster solution.

cStor uses raw (unformatted) block devices attached to the Kubernetes worker nodes to create cStor pools. The devices can be either direct-attached devices (SSD/HDD) or cloud volumes (GPD, EBS).

Deployment Steps for cStor

Deploy cStor on the first control plane node.

  1. In our scenario, cStor is deployed using Helm, which manages the Kubernetes application. See the Helm documentation for details. Helm deploys the components on all the added nodes automatically.

CODE
helm repo add openebs https://openebs.github.io/charts
helm repo update
helm uninstall rke2-snapshot-controller rke2-snapshot-controller-crd -n kube-system
helm install openebs --namespace openebs openebs/openebs --set cstor.enabled=true --create-namespace
  1. To verify that pods are up and running, use the following command:

CODE
kubectl get pod -n openebs
  1. Next, raw block devices are needed on each of the worker nodes (the attached drive must not contain a file system). To verify the presence of available block storage, use the following command:

CODE
kubectl get bd -n openebs

Sample output:

CODE
NAME                                          NODENAME         SIZE         CLAIMSTATE  STATUS   AGE
blockdevice-01afcdbe3a9c9e3b281c7133b2af1b68  worker-node-3    21474836480   Unclaimed   Active   2m10s
blockdevice-10ad9f484c299597ed1e126d7b857967  worker-node-1    21474836480   Unclaimed   Active   2m17s
blockdevice-3ec130dc1aa932eb4c5af1db4d73ea1b  worker-node-2    21474836480   Unclaimed   Active   2m12s
  1. The above command shows the node name and its block device.  Make sure that all the worker nodes have block devices present as this is what will be used when deploying a replicated storage pool.
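The check described above can be scripted. The following is a minimal sketch, assuming the column layout shown in the sample output (NAME, NODENAME, SIZE, CLAIMSTATE, STATUS, AGE); the function name list_unclaimed_bd is hypothetical and not part of OpenEBS.

```shell
# Hypothetical helper: reads `kubectl get bd -n openebs` output on stdin and
# prints "<node> <blockdevice>" for every device that is Unclaimed and Active,
# sorted by node name.
list_unclaimed_bd() {
  awk 'NR > 1 && $4 == "Unclaimed" && $5 == "Active" { print $2, $1 }' | LC_ALL=C sort
}

# Example usage (requires access to the cluster):
#   kubectl get bd -n openebs | list_unclaimed_bd
```

Every worker node that should host a pool is expected to appear at least once in the result.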

Creation of Storage Pool

  1. Use the above block devices to create a storage pool. Create a new file called cspc.yaml with the following content:

CODE
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
 name: cstor-disk-pool
 namespace: openebs
spec:
 pools:
   - nodeSelector:
       kubernetes.io/hostname: "worker-node-1"
     dataRaidGroups:
       - blockDevices:
           - blockDeviceName: "blockdevice-10ad9f484c299597ed1e126d7b857967"
     poolConfig:
       dataRaidGroupType: "stripe"

   - nodeSelector:
       kubernetes.io/hostname: "worker-node-2"
     dataRaidGroups:
       - blockDevices:
           - blockDeviceName: "blockdevice-3ec130dc1aa932eb4c5af1db4d73ea1b"
     poolConfig:
       dataRaidGroupType: "stripe"

   - nodeSelector:
       kubernetes.io/hostname: "worker-node-3"
     dataRaidGroups:
       - blockDevices:
           - blockDeviceName: "blockdevice-01afcdbe3a9c9e3b281c7133b2af1b68"
     poolConfig:
       dataRaidGroupType: "stripe"

In the above manifest, each blockDeviceName must belong to the node named in the same pool entry. In nodeSelector, set kubernetes.io/hostname to the hostname of that node.
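Writing one pool entry per node by hand is error-prone, so the manifest can also be generated. The sketch below assumes the same pool name, namespace, and stripe layout as the manifest above; the function name render_cspc and its "<node> <blockdevice>" input format are hypothetical.

```shell
# Hypothetical helper: reads "<node> <blockdevice>" pairs on stdin and prints
# a CStorPoolCluster manifest with one striped pool per pair.
render_cspc() {
  cat <<'EOF'
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
 name: cstor-disk-pool
 namespace: openebs
spec:
 pools:
EOF
  while read -r node bd; do
    # Skip empty lines in the input.
    [ -n "$node" ] || continue
    cat <<EOF
   - nodeSelector:
       kubernetes.io/hostname: "$node"
     dataRaidGroups:
       - blockDevices:
           - blockDeviceName: "$bd"
     poolConfig:
       dataRaidGroupType: "stripe"
EOF
  done
}

# Example usage:
#   printf '%s\n' 'worker-node-1 blockdevice-10ad9f484c299597ed1e126d7b857967' | render_cspc > cspc.yaml
```

Review the generated file before applying it, since a wrong node and device pairing prevents the pool from being provisioned.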

  1. To get the nodeSelector value for each host, run the following command:

CODE
kubectl get node --show-labels
  1. In the above cspc.yaml file, edit the hostname and the block device name for each node. Once done, run the following command:

CODE
kubectl apply -f cspc.yaml
  1. To verify that all the block devices are part of the storage pool, run the following command. This usually takes around 3-5 minutes.

CODE
kubectl get cspc -n openebs

Sample output:

CODE
NAME                   HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES     AGE
cstor-disk-pool        3                  3                      3                    2m2s
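Because the pools can take a few minutes to come up, the verification can be automated. This is a sketch only: it assumes the column order of the sample output above (HEALTHYINSTANCES in the second column, DESIREDINSTANCES in the fourth) and that every column is populated; cspc_all_healthy is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl get cspc -n openebs` output on stdin.
# Succeeds only if at least one pool cluster is listed and every row has
# HEALTHYINSTANCES equal to DESIREDINSTANCES.
cspc_all_healthy() {
  awk 'NR > 1 { rows++; if ($2 != $4) bad = 1 }
       END { exit (bad || rows == 0) ? 1 : 0 }'
}

# Example usage (polls every 15 seconds until the pools are healthy):
#   until kubectl get cspc -n openebs | cspc_all_healthy; do sleep 15; done
```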
  1. Now verify each block device has its pool online with the following command:

CODE
kubectl get cspi -n openebs

Sample output:

CODE
NAME                  HOSTNAME             ALLOCATED   FREE    CAPACITY   STATUS   AGE
cstor-disk-pool-vn92  worker-node-1        60k         9900M    9900M     ONLINE   2m17s
cstor-disk-pool-al65  worker-node-2        60k         9900M    9900M     ONLINE   2m17s
cstor-disk-pool-y7pn  worker-node-3        60k         9900M    9900M     ONLINE   2m17s
  1. Once the above is verified, create a storage class. Create a file named cstor-csi-disk.yaml and paste the contents below into it.

CODE
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cstor-csi-disk
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  # cstorPoolCluster should have the name of the CSPC
  cstorPoolCluster: cstor-disk-pool
  # replicaCount should be <= no. of CSPI created in the selected CSPC
  replicaCount: "3"
  1. After copying the above contents into cstor-csi-disk.yaml, apply it using the following command:

CODE
kubectl apply -f cstor-csi-disk.yaml
  1. Following the above command, you will have a cStor storage class. You can verify it by using the following command:

CODE
kubectl get sc

The output of the command shows the three possible storage classes - one for cStor and the other two for the local provisioner:

CODE
cstor-csi-disk               cstor.csi.openebs.io   Delete          Immediate              true                   34h
openebs-device               openebs.io/local       Delete          WaitForFirstConsumer   false                  34h
openebs-hostpath (default)   openebs.io/local       Delete          WaitForFirstConsumer   false                  34h

OpenEBS for Local Storage

Deploying OpenEBS enables localhost storage as target devices. We have deployed all the components using OpenEBS local path storage.

Make the storage class the default. The command below uses openebs-hostpath; replace it with your storage class name if it differs:

CODE
kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
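After patching, it is worth confirming which storage class is marked as default. The sketch below relies on the "(default)" marker that kubectl get sc prints after the class name, as in the sample output earlier; the function name default_storage_classes is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get sc` output on stdin and prints the
# name of every storage class marked "(default)".
default_storage_classes() {
  awk '$2 == "(default)" { print $1 }'
}

# Example usage (requires access to the cluster):
#   kubectl get sc | default_storage_classes
```

Exactly one name is expected in the result; if several classes are listed, the default is ambiguous and the annotation should be removed from the extra ones.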

Clone the Expertflow CX Repository

  1. Start with cloning the repository from GitLab.

BASH
git clone -b <branch-name> https://efcx:RecRpsuH34yqp56YRFUb@gitlab.expertflow.com/cim/cim-solution.git

Replace <branch-name> with your desired release branch name.

Node Affinity-Based Deployment 

To ensure that the pods of the CX solution components stay on site A as long as it is available, we have applied node affinity with an assigned weight to the CX components, which ensures that pods are spun up on site A when they are first deployed. If site A becomes unavailable, the pods are shifted from site A to site B.

For the replicas of the StatefulSet components, we have applied pod anti-affinity, which ensures that the replicas are spun up on site B and that a primary pod and its replica pod never run on the same node while both sites are available (i.e. MongoDB's replica pod will not be in site A if site A is also hosting MongoDB's primary pod).

Tainting Control Plane Nodes

By default, a control plane node can manage application workloads as well. This is okay for a lighter workload (~50 concurrent conversations) and CX Single Node Deployment. But, for a higher workload or a multi-cluster setup, all control plane nodes should be tainted to schedule control-plane pods only.

 

First, get the nodes to identify which are control-plane/master nodes.

CODE
kubectl get nodes

Then to taint the master nodes, use the following command for each master node.

CODE
kubectl taint nodes <nodename> node-role.kubernetes.io/master:NoSchedule

Once done, allow the RKE2 ingress controller to run on the control plane nodes as well.

CODE
kubectl patch ds rke2-ingress-nginx-controller -n kube-system --type json -p='[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"}]}]'

Expertflow CX Internal Components

Step 1: Change the directory

  1. Change to the directory to locate all the deployment yaml files.

BASH
cd ../..

Step 2: Blueprint for Node Affinity on CX components 

We use node affinity to keep the workload on the primary site initially; pods shift to another site only in case of downtime.

To apply Node Affinity, we first need to label our worker nodes.

  1. First, list the worker nodes by using the following command.

CODE
kubectl get nodes --show-labels

The output will be similar to this:

CODE
NAME       STATUS   ROLES                       AGE   VERSION        LABELS
vm3     Ready    control-plane,etcd,master   37d   v1.24.7+k3s1   ..node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io...
vm05   Ready    <none>                      37d   v1.24.7+k3s1   ..egress.k3s.io/cluster=true,env=cti,kubernetes.io/arch=amd64..
vm1    Ready    control-plane,etcd,master   37d   v1.24.7+k3s1   ..egress.k3s.io/cluster=true,env=cim,kubernetes.io/arch=amd64..

Now label the primary site worker nodes with the site=primary label:

CODE
kubectl label nodes <node name> site=primary 

Example:-

CODE
kubectl label nodes vm05 site=primary

Use the above command to label all the primary site worker nodes. You can label your secondary site worker nodes in the same way if needed.

Step 3: Blueprint for applying Affinity on the Deployments.

Once the primary site worker nodes are labeled, verify that the affinity block is present in the deployment YAML files so that pods are scheduled on the primary site first.

Yaml files can be found in the following directory.

CODE
cd cim/Deployments

Open any of the YAML files. Each one should contain the node affinity block that is shown commented out at the end of the following example:

CODE
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    ef.service: ef-agent-manager
    ef: expertflow
  name: ef-agent-manager
  namespace: expertflow

spec:
  replicas: 1
  selector:
    matchLabels:
      ef.service: ef-agent-manager

  strategy: {}
  template:
    metadata:
      labels:
        ef.service: ef-agent-manager
        ef: expertflow
    spec:
      imagePullSecrets:
      - name: expertflow-reg-cred
#      affinity:
#        nodeAffinity:
#          preferredDuringSchedulingIgnoredDuringExecution:
#          - weight: 50
#            preference:
#              matchExpressions:
#              - key: site
#                operator: In
#                values:
#                - worker-a

These are the exact lines that need to be uncommented in every deployment YAML file for node affinity to work. Change the entry under "values" to the label value you assigned to the worker nodes in Step 2. In the above example, change "worker-a" to "primary" or to whatever label value you have assigned to your worker nodes.

CODE
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: site
                operator: In
                values:
                - primary
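Since the same edit has to be repeated in every deployment file, it can be scripted. The following is a rough sketch, assuming each file contains the commented block exactly as shown in Step 3 (starting at "#      affinity:" and ending at "- worker-a"); the function name enable_node_affinity is hypothetical, and files that deviate from that layout need manual editing.

```shell
# Hypothetical helper: prints the given deployment YAML with the commented
# node affinity block uncommented and "worker-a" replaced by the site label.
#   $1: deployment YAML file
#   $2: value of the "site" node label (for example, primary)
enable_node_affinity() {
  sed -e '/^# *affinity:/,/^# *- worker-a$/s/^#//' \
      -e "s/^\( *- \)worker-a\$/\1$2/" "$1"
}

# Example usage (writes to a new file so the result can be reviewed first):
#   enable_node_affinity ef-agent-manager-deployment.yaml primary > /tmp/ef-agent-manager-deployment.yaml
```

The file name in the usage example is illustrative. Comparing the output with the original using diff before replacing it is a cheap safeguard.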

Step 4: Disabling Init Containers in Deployment files.

Init containers also need to be disabled in the deployment files in the same directory:

CODE
cd cim/Deployments

First, we will disable init containers for ef-conversation controller yaml.

CODE
vi ef-conversation-controller-deployment.yaml 

Make sure that all of the lines below, which define the init container, are commented out:

CODE
##      initContainers:
##      - name: wait-for
##        image: ghcr.io/patrickdappollonio/wait-for:latest
##        imagePullPolicy: IfNotPresent
##        env:
##        - name: MONGO_HOST
##          value: "mongo-mongodb.ef-external.svc.cluster.local:27017"
##        - name: REDIS_HOST
##          value: "redis-master.ef-external.svc.cluster.local:6379"
##        command:
##          - /wait-for
##        args:
##          - --host="$(MONGO_HOST)"
##          - --host="$(REDIS_HOST)"
##          - --verbose

Once this is done, do the same for the routing engine deployment YAML:

CODE
vi ef-routing-engine-deployment.yaml

Again, comment out the following lines:

CODE
##      initContainers:
##      - name: wait-for
##        image: ghcr.io/patrickdappollonio/wait-for:latest
##        imagePullPolicy: IfNotPresent
##        env:
##        - name: MONGO_HOST
##          value: "mongo-mongodb.ef-external.svc.cluster.local:27017"
##        - name: REDIS_HOST
##          value: "redis-master.ef-external.svc.cluster.local:6379"
##        command:
##          - /wait-for
##        args:
##          - --host="$(MONGO_HOST)"
##          - --host="$(REDIS_HOST)"
##          - --verbose

The table below shows which scheduling mechanism (node affinity, pod anti-affinity, or nodeSelector) is applied to each component.

Components

Node Affinity

Pod anti affinity

NodeSelector

CIM Solution Components (CX Components)

Postgres (master)

Postgres (replica)

Mongodb (replica and master)

Redis (master)

✓ 

Redis (replica)

 ✓

Keycloak

  ✓

Step 5: Install Rancher (Optional)

Rancher is a web UI for managing Kubernetes clusters.

  1. To deploy the Rancher Web-UI, add the Helm repository.

  1. Install cert-manager, which is required by Rancher. After installation, wait at least 30 seconds for cert-manager to start.

BASH
helm upgrade --install=true cert-manager \
--wait=true \
--timeout=10m0s \
 --debug \
--namespace cert-manager \
--create-namespace \
--version v1.10.0 \
--values=external/cert-manager/values.yaml \
external/cert-manager
  1. Use the following command to see if all cert-manager pods are up and running.

BASH
kubectl get pods -n cert-manager
  1. Deploy Rancher using the Helm chart.

BASH
helm upgrade --install=true --wait=true --timeout=10m0s --debug rancher --namespace cattle-system --create-namespace --values=external/rancher/values.yaml external/rancher
  1. By default, Rancher is not accessible from outside the cluster. To make it accessible, change the service type from ClusterIP to NodePort:

BASH
kubectl -n cattle-system patch svc rancher -p '{"spec": {"type": "NodePort"}}'
  1. Get the Rancher Service port by using the following command:

BASH
kubectl -n cattle-system get svc rancher -o go-template='{{(index .spec.ports 1).nodePort}}';echo;
  1. Now you can access the Rancher web UI at any-node-ip-of-cluster:PORT-from-above-command.

The default username/password is admin/ExpertflowRNCR

Step 6: Create Namespace

All Expertflow components are deployed in a separate Kubernetes namespace called 'expertflow'.

  1. Create the namespace by running the following command on the master node.

BASH
kubectl create namespace expertflow
  1. All external components will be deployed in the ef-external namespace. Run the following command on the master node.

BASH
kubectl create namespace ef-external

Step 7: Image Pull Secret

  1. For expertflow namespace, use the following command:

BASH
kubectl apply -f pre-deployment/registryCredits/ef-imagePullSecret-expertflow.yaml
  1. Run the following command for ef-external namespace:

BASH
kubectl apply -f pre-deployment/registryCredits/ef-imagePullSecret-ef-external.yaml

Step 8: Update the FQDN

  1. Decide the FQDN to be used in your solution and change the <FQDN> to your actual FQDN as given in the following command:

BASH
sed -i 's/devops[0-9]*\.ef\.com/<FQDN>/g' cim/ConfigMaps/* pre-deployment/grafana/* pre-deployment/keycloak/* cim/Ingresses/traefik/*

Replace <FQDN> with the FQDN of your master node when deploying the solution on a single control plane node. For a multi-control-plane setup, use the VIP or the FQDN associated with the VIP.
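The substitution can be wrapped in a small function so that the FQDN is passed once as an argument. This is a minimal sketch, assuming GNU sed (as the -i usage above implies); the function name replace_fqdn and the example FQDN are hypothetical. The dots in the pattern are escaped so that they match literal dots only.

```shell
# Hypothetical helper: replaces devops<digits>.ef.com with the given FQDN,
# in place, in every file passed after the first argument.
#   $1: the FQDN to insert; remaining arguments: files to edit
replace_fqdn() {
  fqdn=$1
  shift
  sed -i "s/devops[0-9]*\.ef\.com/$fqdn/g" "$@"
}

# Example usage:
#   replace_fqdn cx.example.com cim/ConfigMaps/* pre-deployment/grafana/* pre-deployment/keycloak/* cim/Ingresses/traefik/*
```

Running grep -rn 'devops[0-9]*\.ef\.com' over the same paths beforehand shows which lines will change.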

Expertflow CX External Components 

Following are the required external components that need to be deployed with Expertflow CX:

Blueprint for Pod anti-affinity on Replica Pods For Helm Based Deployments (Optional)

These methods apply to Helm-based deployments, i.e. all the external components, so that their replica pods do not run on the same node when the chart does not already ensure this. The changes can be made in any Helm values file that already contains the values below.

1. To apply pod anti-affinity, we first need to label pods so they can be segregated based on their labeling. To label pods, the following should be added to the value file:

CODE
  labels:
    app: store
  1. Once the pod has been assigned a label, set pod anti-affinity by changing the following values in the Helm values file:

CODE
  affinity:
     podAntiAffinity:
       requiredDuringSchedulingIgnoredDuringExecution:
       - labelSelector:
           matchExpressions:
           - key: app
             operator: In
             values:
             - store
         topologyKey: "kubernetes.io/hostname"


In the above code, the topology key refers to the node label that defines the anti-affinity domain. With kubernetes.io/hostname, no two pods labeled app=store are scheduled on the same node.

Deploying External Components:

1. PostgreSQL 

To deploy PostgreSQL in high availability, we use Pgpool, which provides automated failover if the master pod or its node is affected. This deployment can be done directly from the Helm chart; the only change needed is the number of replica pods to deploy. This value is set under both postgresql and pgpool.

To change the number of replicas, edit the following value in the values.yaml file, which can be opened using the command below.

CODE
vi external/bitnami/postgresql-ha/values.yaml

CODE
postgresql:

  image:
    registry: docker.io
    repository: bitnami/postgresql-repmgr
    tag: 15.3.0-debian-11-r2
    digest: ""
    ## Specify a imagePullPolicy. Defaults to 'Always' if image tag is 'latest', else set to 'IfNotPresent'
    ## ref: https://kubernetes.io/docs/user-guide/images/#pre-pulling-images
    ##
    pullPolicy: IfNotPresent
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ## Example:
    ## pullSecrets:
    ##   - myRegistryKeySecretName
    ##
    pullSecrets: []
    ## Set to true if you would like to see extra information on logs
    ##
    debug: false
  ## @param postgresql.labels Labels to add to the StatefulSet. Evaluated as template
  ##
  labels: {}
  replicaCount: 3

--------------------------------------------------------------------------------------------------------------------------------------------------------

pgpool:
  ## Bitnami Pgpool image
  ## ref: https://hub.docker.com/r/bitnami/pgpool/tags/
  ## @param pgpool.image.registry Pgpool image registry
  ## @param pgpool.image.repository Pgpool image repository
  ## @param pgpool.image.tag Pgpool image tag
  ## @param pgpool.image.digest Pgpool image digest in the way sha256:aa.... Please note this parameter, if set, will override the tag
  ## @param pgpool.image.pullPolicy Pgpool image pull policy
  ## @param pgpool.image.pullSecrets Specify docker-registry secret names as an array
  ## @param pgpool.image.debug Specify if debug logs should be enabled
  ##
  image:
    registry: docker.io
    repository: bitnami/pgpool
    tag: 4.4.2-debian-11-r33
    digest: ""
    ## Specify a imagePullPolicy. Defaults to 'Always' if image tag is 'latest', else set to 'IfNotPresent'
    ## ref: https://kubernetes.io/docs/user-guide/images/#pre-pulling-images

  replicaCount: 3

By default the value is set to 3, based on the number of nodes. It can be changed as needed.

PostgreSQL is deployed as a central datastore for both LicenseManager and Keycloak. 

  1. Create configmap for PostgreSQL to load the LicenseManager database and create keycloak_db:

CODE
kubectl -n ef-external create configmap ef-postgresql-license-manager-cm --from-file=./pre-deployment/licensemanager/licensemanager.sql
  1. The Helm command for PostgreSQL on clusters is given below:

CODE
helm upgrade --install=true --wait=true --timeout=10m0s --debug --namespace=ef-external --values=external/bitnami/postgresql-ha/values.yaml ef-postgresql external/bitnami/postgresql-ha

2. Keycloak

Since Keycloak does not offer high availability by itself, we provide it with the external PostgreSQL deployed above, replacing its internal database with the externally deployed one.

The following changes need to be made in Keycloak's Helm values-ha.yaml file, which can be found in the following location:

CODE
cd external/bitnami/keycloak/

CODE
postgresql:
  enabled: false
  auth:
    username: bn_keycloak
    password: ""
    database: bitnami_keycloak
    existingSecret: ""
  architecture: standalone
## External PostgreSQL configuration
## All of these values are only used when postgresql.enabled is set to false
## @param externalDatabase.host Database host
## @param externalDatabase.port Database port number
## @param externalDatabase.user Non-root username for Keycloak
## @param externalDatabase.password Password for the non-root username for Keycloak
## @param externalDatabase.database Keycloak database name
## @param externalDatabase.existingSecret Name of an existing secret resource containing the database credentials
## @param externalDatabase.existingSecretPasswordKey Name of an existing secret key containing the database credentials
## EXPERTFLOW
externalDatabase:
  host: "ef-postgresql-postgresql-ha-pgpool.ef-external.svc.cluster.local"
  port: 5432
  user: sa
  database: keycloak_db
  password: "Expertflow123"
  existingSecret: ""
  existingSecretPasswordKey: ""


 
  1. On the master node, create a global configmap for Keycloak. Change the hostname and other parameters in the ef-keycloak-configmap.yaml file before applying this command:

CODE
kubectl apply -f pre-deployment/keycloak/ef-keycloak-configmap.yaml
  1. The Helm command for Keycloak is given below:

CODE
helm upgrade --install=true --wait=true --timeout=10m0s --debug --namespace=ef-external --values=external/bitnami/keycloak/values-ha.yaml keycloak external/bitnami/keycloak/

3. Mongo DB

To enable high availability for MongoDB, the following changes need to be made in MongoDB's Helm values file. The arbiter must be enabled (set to true), and affinity must be applied as shown below.

replicaCount must be set according to the available worker nodes, and replicaSetHostnames must be enabled, with replicaSetName set as shown below.

The Helm values file values-ha.yaml is located at:

CODE
cd external/bitnami/mongodb/

CODE
arbiter:
  affinity: {}
  annotations: {}
  args: []
  command: []
  configuration: ""
  containerPorts:
    mongodb: 27017
  containerSecurityContext:
    enabled: true
    runAsNonRoot: true
    runAsUser: 1001
  customLivenessProbe: {}
  customReadinessProbe: {}
  customStartupProbe: {}
  enabled: true
-------------------------------------------------------------------------------------
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - store
      topologyKey: kubernetes.io/hostname
annotations: {}
--------------------------------------------------------------------------------------

replicaCount: 3
replicaSetConfigurationSettings:
  configuration:
    catchUpTimeoutMillis: 30000
    chainingAllowed: false
    electionTimeoutMillis: 10000
    heartbeatIntervalMillis: 2000
    heartbeatTimeoutSecs: 20
  enabled: true
replicaSetHostnames: true
replicaSetName: expertflow

  1. The Helm command for MongoDB is given below:

CODE
helm upgrade --install=true --wait=true --timeout=10m0s --debug --namespace=ef-external --values=external/bitnami/mongodb/values-ha.yaml mongo external/bitnami/mongodb/

4. MinIO

To deploy MinIO in high availability, make the following changes to the Helm values file for MinIO. The mode must be set to distributed. replicaCount can be set as needed, but it should be an even number greater than or equal to 4. zones should be 1, and drivesPerNode should be 1 as well. Apply affinity as shown below: first set the pod label, then use that value in the affinity block.

The Helm values file values-ha.yaml is located at:

CODE
cd external/bitnami/minio/

CODE
clientImage:
  registry: docker.io
  repository: bitnami/minio-client
  tag: 2022.12.13-debian-11-r0
  digest: ""
## @param mode MinIO® server mode (`standalone` or `distributed`)
## ref: https://docs.minio.io/docs/distributed-minio-quickstart-guide
##
mode: distributed


--------------------------------------------------------------------------------------

statefulset:
  ## @param statefulset.updateStrategy.type StatefulSet strategy type
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies
  ## e.g:
  ## updateStrategy:
  ##  type: RollingUpdate
  ##  rollingUpdate:
  ##    maxSurge: 25%
  ##    maxUnavailable: 25%
  ##
  updateStrategy:
    type: RollingUpdate
  ## @param statefulset.podManagementPolicy StatefulSet controller supports relax its ordering guarantees while preserving its uniqueness and identity guarantees. There are two valid pod management policies: OrderedReady and Parallel
  ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy
  ##
  podManagementPolicy: Parallel
  ## @param statefulset.replicaCount Number of pods per zone (only for MinIO® distributed mode). Should be even and `>= 4`
  ##
  replicaCount: 4
  zones: 1
  ## @param statefulset.drivesPerNode Number of drives attached to every node (only for MinIO® distributed mode)
  ##
  drivesPerNode: 1
-------------------------------------------------------------------------------------------------------------------------
podLabels:
  app: minio
nodeAffinityPreset:
  ## @param nodeAffinityPreset.type Node affinity preset type. Ignored if `affinity` is set. Allowed values: `soft` or `hard`
  ##
  type: ""
  ## @param nodeAffinityPreset.key Node label key to match. Ignored if `affinity` is set.
  ## E.g.
  ## key: "kubernetes.io/e2e-az-name"
  ##
  key: ""
  ## @param nodeAffinityPreset.values Node label values to match. Ignored if `affinity` is set.
  ## E.g.
  ## values:
  ##   - e2e-az1
  ##   - e2e-az2
  ##
  values: []
## @param affinity Affinity for pod assignment. Evaluated as a template.
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
## Note: podAffinityPreset, podAntiAffinityPreset, and nodeAffinityPreset will be ignored when it's set
##

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - minio
      topologyKey: "kubernetes.io/hostname"

CODE
helm upgrade --install=true --wait=true --timeout=10m0s --debug --namespace=ef-external --values=external/bitnami/minio/values-ha.yaml minio external/bitnami/minio/
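The replica count rule stated above (even and at least 4) is easy to get wrong when adapting the values file, so it can be checked before running Helm. A minimal sketch follows; the function name valid_minio_replica_count is hypothetical.

```shell
# Hypothetical helper: succeeds only if the given replica count is an even
# number greater than or equal to 4, as required for distributed mode.
#   $1: intended value of statefulset.replicaCount
valid_minio_replica_count() {
  [ "$1" -ge 4 ] && [ $(( $1 % 2 )) -eq 0 ]
}

# Example usage:
#   valid_minio_replica_count 4 && echo "replica count is acceptable"
```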

5. Redis Sentinel

To provide high availability for Redis, we have opted to deploy Redis Sentinel, a high-availability solution designed to enhance the reliability and fault tolerance of Redis. At its core, Redis Sentinel enables a robust Redis deployment consisting of multiple Redis instances and Sentinel nodes. The Sentinel nodes constantly monitor the health of the Redis instances and automatically detect any failures or performance degradation. Upon detecting an issue, Sentinel orchestrates the failover process, promoting a standby Redis instance to become the new master.

Enabling Sentinel and Setting the Number of Replicas

To enable Sentinel, edit the Helm values file for Redis with the following changes. The Helm values file values-ha.yaml is located at:

CODE
cd external/bitnami/redis/

CODE
sentinel:
  ## @param sentinel.enabled Use Redis® Sentinel on Redis® pods.
  ## IMPORTANT: this will disable the master and replicas services and
  ## create a single Redis® service exposing both the Redis and Sentinel ports
  ##
  enabled: true
  ## Bitnami Redis® Sentinel image version

Set the enabled flag to true. This allows a replica to become the master if one of the pods on a node is affected.

To set the number of replicas, change the following value in the Redis Helm values file.

CODE
replica:
  ## @param replica.replicaCount Number of Redis® replicas to deploy
  ##
  replicaCount: 3
  ## @param replica.configuration Configuration for Redis® replicas nodes
  ## ref: https://redis.io/topics/config
  ##


Set the number of replicas as needed, based on the number of nodes.

CODE
helm upgrade --install=true --wait=true --timeout=10m0s --debug --namespace=ef-external --values=external/bitnami/redis/values-ha.yaml redis-ha external/bitnami/redis/

Setup Realtime Reports

Expertflow CX uses Grafana for business and solution monitoring. Business monitoring dashboards are embedded inside AgentDesk that provide real-time statistics for both agents and supervisors.

See Setup Grafana for embedded dashboards for details.

Setup Historical Reports

Expertflow CX uses Apache Superset for historical reports.

  1. Install Superset

  2. Setup Reporting Connector

Deploying Stateful Components 

To address ActiveMQ's availability, we provide it with replicated storage for its data, which solves the high availability challenge. We use the cStor storage deployed above to provide the replicated storage, and update ActiveMQ's connection strings if they are not already updated. The changes made to its deployment YAML are listed below.

Changes in Spec: Env 

The file to be edited ef-amq-statefulset-ha.yaml is located at cim/StatefulSet/

CODE
vi cim/StatefulSet/ef-amq-statefulset-ha.yaml

Here we will be providing it with connection details for redis as well as postgres.

CODE
         env:
           - name: REDIS_HOST
             value: redis-master.ef-external.svc.cluster.local
           - name: REDIS_PORT
             value: "6379"
           - name: REDIS_PASSWORD
             value: Expertflow123
           - name: REDIS_SSL_ENABLED
             value: "false"
           - name: REDIS_MAX_ACTIVE
             value: "100"
           - name: REDIS_MAX_IDLE
             value: "100"
           - name: REDIS_MAX_WAIT
             value: "-1"
           - name: REDIS_MIN_IDLE
             value: "50"
           - name: REDIS_TIMEOUT
             value: "2000"
           - name: REDIS_SENTINEL_ENABLE
             value: "true"
           - name: REDIS_SENTINEL_MASTER
             value: "expertflow"
           - name: REDIS_SENTINEL_NODES
             value: "redis-ha-node-0.redis-ha-headless.ef-external.svc.cluster.local:26379,redis-ha-node-1.redis-ha-headless.ef-external.svc.cluster.local:26379,redis-ha-node-2.redis-ha-headless.ef-external.svc.cluster.local:26379"
           - name: REDIS_SENTINEL_PASSWORD
             value: "Expertflow123"
           - name: DB_URL
             value: ef-postgresql-postgresql-ha-pgpool.ef-external.svc
           - name: DB_USER
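The REDIS_SENTINEL_NODES value above follows a regular pattern, one endpoint per Sentinel pod. As a minimal sketch, assuming the pod prefix, headless service domain, and port shown in the env block, the list could be generated rather than typed by hand. The function name sentinel_nodes is illustrative and not part of the Expertflow scripts.

```shell
#!/bin/sh
# Sketch: build the comma-separated Sentinel endpoint list.
# Defaults mirror the values in the env block above; adjust them if your
# Redis HA release uses different names (assumption).
sentinel_nodes() {
  replicas="${1:-3}"
  prefix="${2:-redis-ha-node}"
  domain="${3:-redis-ha-headless.ef-external.svc.cluster.local}"
  port="${4:-26379}"
  nodes=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Append a comma only when the list is already non-empty.
    nodes="${nodes:+$nodes,}${prefix}-${i}.${domain}:${port}"
    i=$((i + 1))
  done
  printf '%s\n' "$nodes"
}

# Print the list for a three-pod Sentinel setup.
sentinel_nodes 3
```

The printed value can be pasted into the REDIS_SENTINEL_NODES entry of the StatefulSet manifest.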

StatefulSet 

ActiveMQ should be deployed before all other solution components. To deploy ActiveMQ as a StatefulSet, run:

CODE
kubectl apply -f cim/StatefulSet/ef-amq-statefulset-ha.yaml

Wait for the AMQ StatefulSet pod to become ready:

CODE
kubectl wait pods ef-amq-0 -n ef-external --for condition=Ready --timeout=600s

Deploying CX Components

ConfigMaps

Conversation Manager ConfigMaps

If you need to change the default training, please update the corresponding files.

CODE
kubectl -n expertflow create configmap ef-conversation-controller-actions-cm --from-file=pre-deployment/conversation-Controller/actions
kubectl -n expertflow create configmap ef-conversation-controller-actions-pycache-cm --from-file=pre-deployment/conversation-Controller/__pycache__
kubectl -n expertflow create configmap ef-conversation-controller-actions-utils-cm --from-file=pre-deployment/conversation-Controller/utils

Unified Agent ConfigMaps

Translations for the unified agent are applicable in HC-4.1 and later releases.

CODE
kubectl -n expertflow  create configmap ef-app-translations-cm --from-file=pre-deployment/app-translations/unified-agent/i18n

Some ConfigMap values need to be uncommented to enable HA: (1) Redis Sentinel and (2) MongoDB.

Edit the connection environment ConfigMap file in cim/ConfigMaps:

CODE
vi cim/ConfigMaps/ef-connection-env-configmap.yaml

In this file, enable the Redis Sentinel flag, comment out the single-host MongoDB entry, and uncomment the multi-host MongoDB entry, as shown in the example below.

CODE
  ##MONGODB_HOST: mongodb://mongo-mongodb.ef-external.svc.cluster.local
  MONGODB_HOST: mongodb://mongo-mongodb-0.mongo-mongodb-headless.ef-external.svc.cluster.local:27017,mongo-mongodb-1.mongo-mongodb-headless.ef-external.svc.cluster.local:27017,mongo-mongodb-2.mongo-mongodb-headless.ef-external.svc.cluster.local:27017/?replicaSet=expertflow&tls=false&ssl=false&retrywrites=true&w=majority
-------------------------------------------------------------------------------------------
REDIS_SENTINEL_ENABLE: "true"
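Before applying the ConfigMap, it can help to sanity-check the multi-host value. The following is a minimal sketch, not part of the Expertflow tooling: the hypothetical helper check_mongo_uri counts the hosts in a MONGODB_HOST value and confirms that a replicaSet option is present. The expected host count of 3 is taken from the example above.

```shell
#!/bin/sh
# Sketch: validate a MONGODB_HOST value before applying the ConfigMap.
# check_mongo_uri is a hypothetical helper name.
check_mongo_uri() {
  uri="$1"
  expected="$2"
  hosts="${uri#mongodb://}"   # drop the scheme
  hosts="${hosts%%/*}"        # drop everything from the first "/" onward
  count="$(printf '%s\n' "$hosts" | tr ',' '\n' | grep -c .)"
  case "$uri" in
    *replicaSet=*) ;;
    *) echo "MONGODB_HOST has no replicaSet option" >&2; return 1 ;;
  esac
  if [ "$count" -ne "$expected" ]; then
    echo "expected $expected hosts, found $count" >&2
    return 1
  fi
  echo "ok: $count hosts, replicaSet option present"
}

# Check the multi-host value from the example above.
check_mongo_uri "mongodb://mongo-mongodb-0.mongo-mongodb-headless.ef-external.svc.cluster.local:27017,mongo-mongodb-1.mongo-mongodb-headless.ef-external.svc.cluster.local:27017,mongo-mongodb-2.mongo-mongodb-headless.ef-external.svc.cluster.local:27017/?replicaSet=expertflow&tls=false&ssl=false&retrywrites=true&w=majority" 3
```

The check fails for the single-host value, which has neither three hosts nor a replicaSet option, so it also catches the case where the wrong line was left uncommented.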

Next, edit the License Manager ConfigMap:

CODE
vi cim/ConfigMaps/ef-license-manager-configmap.yaml 

Uncomment the postgres-ha DB_URL as shown below and comment out the single-instance PostgreSQL DB_URL.

CODE
   #DB_URL: jdbc:postgresql://ef-postgresql.ef-external.svc.cluster.local:5432/licenseManager
   DB_URL: jdbc:postgresql://ef-postgresql-postgresql-ha-pgpool.ef-external.svc.cluster.local:5432/licenseManager  

Apply all the ConfigMaps in the ConfigMaps folder:

CODE
kubectl apply -f cim/ConfigMaps/

Services

Create services for all deployed EF components:

CODE
kubectl apply -f cim/Services/

Services must be created before Deployments

Deployments

Apply all the Deployment manifests:

CODE
kubectl apply -f cim/Deployments/

Team Announcement CronJob

 Team announcement cron job is applicable in HC-4.2 and later releases.

CODE
kubectl apply -f pre-deployment/team-announcement/

Import your own certificates

Now generate a secret from the certificate files. The server.key and server.crt files must be available on the machine in the pre-deployment/certificates/ directory.

For the expertflow namespace:

CODE
kubectl -n expertflow create secret tls ef-ingress-tls-secret \
--key pre-deployment/certificates/server.key \
--cert pre-deployment/certificates/server.crt

And for the ef-external namespace:

CODE
kubectl -n ef-external create secret tls ef-ingress-tls-secret \
--key pre-deployment/certificates/server.key \
--cert pre-deployment/certificates/server.crt
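Because both commands fail if a certificate file is missing, a short pre-flight check can save a failed run. This is a minimal sketch under the assumption that the files are named server.key and server.crt, as in the commands above. The function name check_tls_files is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that the key and certificate exist and are non-empty
# before creating the TLS secrets. check_tls_files is a hypothetical helper.
check_tls_files() {
  dir="${1:-pre-deployment/certificates}"
  for f in server.key server.crt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "TLS files present in $dir"
}

# Example usage (run from the repository root):
#   check_tls_files && kubectl -n expertflow create secret tls ef-ingress-tls-secret \
#     --key pre-deployment/certificates/server.key \
#     --cert pre-deployment/certificates/server.crt
```

The check only verifies that the files exist. It does not confirm that the key matches the certificate.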

Import your own certificates for RKE 

Generate a self-signed certificate and key with the following command.

Replace <FQDN> with your actual FQDN before running it.

CODE
openssl req -x509 \
-newkey rsa:4096 \
-sha256 \
-days 3650 \
-nodes \
-keyout <FQDN>.key \
-out <FQDN>.crt \
-subj "/CN=<FQDN>" \
-addext "subjectAltName=DNS:www.<FQDN>,DNS:<FQDN>"

Then create the secrets from the generated files. For the expertflow namespace:

CODE
kubectl -n expertflow create secret tls ef-ingress-tls-secret --key <FQDN>.key --cert <FQDN>.crt

And for the ef-external namespace:

CODE
kubectl -n ef-external create secret tls ef-ingress-tls-secret --key <FQDN>.key --cert <FQDN>.crt

Ingress

For K3s-based deployments using the Traefik Ingress Controller

Apply the Ingress Routes.

CODE
kubectl apply -f cim/Ingresses/traefik/

For RKE2-based Ingresses using Ingress-Nginx Controller

Decide on the FQDN to use in your solution and replace <FQDN> in the command below with your actual FQDN.

CODE
sed -i 's/devops[0-9]*\.ef\.com/<FQDN>/g' cim/Ingresses/nginx/*
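Since sed -i rewrites the manifests in place, it can be useful to preview the substitution first. The sketch below wraps the same pattern in a hypothetical function, replace_fqdn, that writes to standard output instead of editing files. The host name cx.example.com and the sample line are placeholders, not values from the Expertflow repository.

```shell
#!/bin/sh
# Sketch: preview the FQDN substitution without modifying any file.
# replace_fqdn is a hypothetical helper. The dots are escaped so that
# they match literal dots only.
replace_fqdn() {
  fqdn="$1"
  shift
  sed "s/devops[0-9]*\.ef\.com/${fqdn}/g" "$@"
}

# Demonstrate on a scratch file containing a placeholder host line.
demo="$(mktemp)"
printf '  - host: devops243.ef.com\n' > "$demo"
replace_fqdn cx.example.com "$demo"
rm -f "$demo"
```

To preview the real manifests, pass the ingress files as arguments, for example replace_fqdn cx.example.com cim/Ingresses/nginx/*, and review the output before running the in-place command.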

Apply the Ingress Routes.

BASH
kubectl apply -f cim/Ingresses/nginx/

Channel Manager Icons Bootstrapping

Once all Expertflow service pods are up and running, execute these steps so that the media channel icons render correctly.

Run the minio-helper pod:

CODE
kubectl apply -f scripts/minio-helper.yaml

Wait for the pod to start before copying the media icons from the external folder into the helper pod.

CODE
kubectl -n ef-external --timeout=90s wait --for=condition=ready pod minio-helper

Wait until the command responds with pod/minio-helper condition met.

Copy the files to the minio-helper pod.

CODE
kubectl -n ef-external cp post-deployment/data/minio/bucket/default minio-helper:/tmp/

Copy the icon-helper.sh script into the minio-helper pod:

CODE
 kubectl -n ef-external cp scripts/icon-helper.sh minio-helper:/tmp/

Execute icon-helper.sh:

CODE
kubectl -n ef-external exec -it minio-helper -- /bin/sh /tmp/icon-helper.sh

Delete the minio-helper pod:

CODE
kubectl delete -f scripts/minio-helper.yaml

Configurations

  1. Import the default Keycloak realm for essential Keycloak resources, permissions, and authentication configurations.

  2. If you intend to use Apache Superset for reporting, follow Configure and import historical report templates to configure the Reporting solution.

  3. For customer channel configuration, see customer channels.

  4. For CX-Voice component deployment, see this guide.

Chat Initiation URL

To set up the customer widget, follow this link:
https://expertflow-docs.atlassian.net/wiki/x/TgE4CQ

{FQDN} → FQDN of the Kubernetes deployment

Once all the deployments have been applied successfully, access the components to configure the solution. Keycloak is accessible at http://{FQDN}/auth, unified-admin at http://{FQDN}/unified-admin, and so on.

HA Testing Results/Remarks

Failover Testing

Strategy

Results / Changes Observed

Remarks

Node Failure

To achieve this, we manually forced the node to shut down.

After a node goes down, Kubernetes starts moving its pods to other nodes after a 5-minute wait window. This is the default behaviour of Kubernetes. Previous node's pod

Node Failure CX Components

After the 5-minute window, these pods were moved to the DR site, and startup issues were noticed in the Routing Engine init and Conversation Controller init containers. As a workaround, we have disabled the init containers for now.

New init containers will be designed that accept multiple endpoints for Redis and MongoDB, so that if one pod goes down they can communicate with the others.

Node Failure Mongodb

If the primary pod is affected, one of the two replicas becomes primary and resumes normal operation; this was tested successfully. An issue was noticed with the arbiter running out of memory, and MongoDB tries to bring the original primary up on another node after a few hours. A further fix was to remove the arbiter from the connection string so that components do not try to reach it.

Node Failure Redis

If the primary pod is affected, one of the two replicas becomes primary and resumes normal operation; this was tested successfully.

Node Failure Minio

Tested successfully.

Node Failure Postgres

If the primary pod is affected, one of the two replicas becomes primary and resumes normal operation; this was tested successfully. PostgreSQL is running in async mode.

PostgreSQL performance tweaks include turning on async mode and disabling limits.

Node Failure Keycloak

After the 5-minute window, a new pod is created and connects to the HA PostgreSQL. If the pod is stuck in scheduling, use the kubectl delete pod --force command.

The kubectl delete pod --force command is needed if the pod is stuck.

Node Failure ActiveMQ

After the 5-minute window, a new pod is created and takes over the replicated storage from cStor. If the new pod gets stuck in scheduling, the previous pod must be terminated manually with the kubectl delete pod --force command. We have moved ActiveMQ storage from local storage to PostgreSQL.

The kubectl delete pod --force command is needed if the pod is stuck. Storage has been moved from local storage to PostgreSQL.

OpenEBS cStor

If the node goes down, the virtual raw disk comes back up with a new disk identifier, which affects the replica pool. Physical storage is preferred over a virtual disk.
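The 5-minute wait window seen in the results above corresponds to the default 300-second tolerations that Kubernetes adds to pods for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints. As a hedged sketch only, the function below prints a patch that shortens that window for a workload. This was not part of the tested setup. The function name toleration_patch, the 60-second value, and the deployment placeholder are all assumptions.

```shell
#!/bin/sh
# Sketch: print a patch that sets explicit not-ready/unreachable tolerations.
# toleration_patch is a hypothetical helper; 60 seconds is an illustrative value.
toleration_patch() {
  secs="${1:-60}"
  cat <<EOF
spec:
  template:
    spec:
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: ${secs}
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: ${secs}
EOF
}

# Print the patch for review.
toleration_patch 60

# Hypothetical usage (the deployment name is a placeholder):
#   kubectl -n expertflow patch deployment <deployment-name> --patch "$(toleration_patch 60)"
```

A shorter window speeds up failover but also reschedules pods during brief network interruptions, so any value should be validated in a test environment first. Note that a patch like this is expected to replace any tolerations already defined on the workload.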

HA Open Issues
