
Failover Scenarios - High Availability


Failover scenarios

  1. System Failover
  2. Component Failover


System Failover

Normally, the High Availability (HA) setup forwards all user traffic to the Primary system, which serves it. HAProxy, serving as a load balancer and aggregator in front of both the Primary and Secondary systems, handles failover: if the Primary system goes down, it automatically shifts the traffic load to the Secondary system, and no manual effort is needed for the traffic switch itself. By design, however, several services (MRE, Chat Server, and Communication Server) are explicitly stopped on the Secondary system, so when the Primary location fails, the Secondary system must first be prepared for traffic. This is accomplished by running a set of procedures on the Secondary.
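The routing rule described above can be sketched as follows. This is only an illustration of the "Primary first, Secondary as fallback" decision, not HAProxy's implementation; the names `Backend` and `select_backend` are hypothetical and do not appear in the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    healthy: bool  # result of the most recent health check


def select_backend(primary: Backend, secondary: Backend) -> Optional[str]:
    """Return the name of the system that should receive traffic."""
    if primary.healthy:
        return primary.name      # normal operation: Primary serves everything
    if secondary.healthy:
        return secondary.name    # failover: traffic shifts to the Secondary
    return None                  # neither system can serve traffic


print(select_backend(Backend("primary", False), Backend("secondary", True)))
```

Note that the sketch covers traffic routing only. The services stopped on the Secondary still have to be started before it can actually serve the redirected traffic.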


All the HA-related scripts mentioned below are located in the <deployment-path>/HAScripts folder.

  • Renounce scripts are to be run on the machine that you want to shut down.
  • Takeover scripts are to be run on the machine that you want to be active.


Scenario: Primary Down
Expected Actions: All traffic should be routed to and served by the Secondary system.

Execute these scripts on the Secondary system to make it active as the Primary.

1) takeover.sh
2) takeover_reporting.sh

Scenario: Primary Recovers
Expected Actions: The Secondary system should be renounced in favor of the Primary.

After the Primary system recovers, follow these procedures to make it active.

On Secondary:

1) renounce.sh
2) renounce_reporting.sh


On Primary, either restart the complete solution using

1) efutils all up

or run

1) takeover.sh
2) takeover_reporting.sh
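The two scenarios above can be written down as an ordered runbook. The sketch below is a hedged illustration, not a shipped tool: `plan`, `run_steps`, and the event names `primary_down` / `primary_recovered` are hypothetical, and `scripts_dir` must be set by the caller to the real `<deployment-path>/HAScripts` location. Only the script names and the `efutils all up` command come from this page.

```python
import subprocess
from typing import List, Tuple

TAKEOVER = ["takeover.sh", "takeover_reporting.sh"]
RENOUNCE = ["renounce.sh", "renounce_reporting.sh"]

Step = Tuple[str, List[str]]  # (host role, argv to run on that host)


def plan(event: str, scripts_dir: str, full_restart: bool = False) -> List[Step]:
    """Return the ordered steps for a failover event (event names are assumptions)."""
    def script(name: str) -> List[str]:
        return [scripts_dir.rstrip("/") + "/" + name]

    if event == "primary_down":
        return [("secondary", script(s)) for s in TAKEOVER]
    if event == "primary_recovered":
        steps = [("secondary", script(s)) for s in RENOUNCE]
        if full_restart:
            steps.append(("primary", ["efutils", "all", "up"]))
        else:
            steps += [("primary", script(s)) for s in TAKEOVER]
        return steps
    raise ValueError(f"unknown event: {event}")


def run_steps(steps: List[Step], this_host: str, dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run, only list) the steps that belong to this host."""
    mine = [argv for host, argv in steps if host == this_host]
    for argv in mine:
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
    return mine
```

With `dry_run=True` (the default) nothing is executed, so the plan can be reviewed on each machine before the scripts are run for real.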



Component Failover

MongoDB

The scenarios below assume that MongoDB is set up as a 3-node replica set, with the Arbiter running on a separate VM.


  1. If: Primary Mongo instance is down
     Expected Result: Failover occurs, with R/W operations served from the Secondary (now elected as Primary).

  2. If: Arbiter is down
     Expected Result: The election is halted. The Primary continues R/W operations. The Secondary serves READONLY.

  3. If: Secondary Mongo instance is down
     Expected Result: R/W operations continue from the Primary. Once the Secondary recovers, it synchronizes data from the Primary.

  4. If: Both Primary and Arbiter Mongo instances are down
     Expected Result: The Secondary serves in READONLY mode.

  5. If: Network link of Secondary is down
     Expected Result: The Primary continues R/W operations with the Secondary disconnected. The Secondary will be in READONLY mode.

     If: Network link of Secondary is restored
     Expected Result: Upon link restoration, the Secondary automatically synchronizes data from the Primary.

  6. If: Network link of Primary is down
     Expected Result: The Secondary is elected as Primary.

     If: Network link of Primary is restored
     Expected Result: When the link is restored, the PRIMARY with id '0' is given precedence in the election, and the Secondary steps down.

  8. Election time: 10 seconds
     Expected Result: A new election takes place within 10 seconds.
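Most of the outcomes listed above follow from one rule: a MongoDB replica set can only have a writable Primary while a majority of its voting members can reach each other. The sketch below models that rule for the 3-member layout assumed here (Primary, Secondary, Arbiter). It is a simplified illustration with hypothetical function names, and it assumes that all members in `reachable` can see one another.

```python
from typing import Optional, Set

VOTERS = {"primary", "secondary", "arbiter"}


def has_majority(reachable: Set[str]) -> bool:
    """True when more than half of the voting members are reachable."""
    return len(reachable & VOTERS) > len(VOTERS) // 2


def writable_member(reachable: Set[str]) -> Optional[str]:
    """Return the data-bearing member serving R/W, or None if only reads are possible."""
    if not has_majority(reachable):
        return None                 # no majority: remaining members are READONLY
    if "primary" in reachable:
        return "primary"            # original Primary keeps (or regains) the role
    if "secondary" in reachable:
        return "secondary"          # Secondary is elected as the new Primary
    return None


print(writable_member({"secondary", "arbiter"}))  # Primary instance is down
print(writable_member({"secondary"}))             # Primary and Arbiter are down
```

The first call corresponds to the case where the Primary instance is down and the Secondary takes over; the second corresponds to the case where both the Primary and the Arbiter are down and the Secondary can only serve reads.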

If any of the following Hybrid Chat components fails on the Primary, the complete VM should be stopped:

  • Customer Channel Manager
  • Communication Server
  • Media Routing Engine
  • Chat-Server 

Active MQ Failover Scenarios

  1. Scenario: AMQ-1 is down while SITE-A is active
     Behavior: AMQ-2 takes over, and all client requests are processed by the same SITE-A instance because of its higher consumer priority.

  2. Scenario: Both AMQ-1 and SITE-A are down
     Behavior: AMQ-2 and SITE-B start receiving requests. SITE-B acquires all agents' XMPP subscriptions and starts processing requests.

  3. Scenario: SITE-A is restored
     Behavior: Connector-2 continues to process requests until connectivity between the Client application and Connector-2 is lost or Connector-2 goes down.

  4. Scenario: The link between Connector-1 and Connector-2 is down
     Behavior: Both connector instances serve requests independently.

  5. Scenario: AMQ-1 is restored while SITE-A is still down and AMQ-2 is also down
     Behavior: The client sends its request to AMQ-1. AMQ-1 sends the request to SITE-B because SITE-A is down.
     Request Flow: Client-App-1 → AMQ-1 → SITE-B
     Response Flow: SITE-B → AMQ-1 → Client-App-1

  6. Scenario: SITE-A is down while both AMQ instances are active
     Behavior: The request flow is the same as in no. 5.

  7. Scenario: SITE-B is down while both AMQ instances are active
     Behavior: AMQ-2 requests are redirected to AMQ-1, and GC-1 handles all requests.
     Request Flow: Client-App-2 → AMQ-2 → AMQ-1 → SITE-A
     Response Flow: SITE-A → AMQ-1 → AMQ-2 → Client-App-2
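Several of the behaviors above rely on SITE-A having a higher consumer priority than SITE-B. The sketch below illustrates only that selection rule: among the consumers that are currently available, the one with the highest priority receives the request. It does not model the broker-to-broker hop between AMQ-1 and AMQ-2, and the class name, function name, and priority values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Consumer:
    name: str
    priority: int    # higher value wins; the values used below are illustrative
    available: bool


def pick_consumer(consumers: List[Consumer]) -> Optional[str]:
    """Return the available consumer with the highest priority, if any."""
    live = [c for c in consumers if c.available]
    if not live:
        return None
    return max(live, key=lambda c: c.priority).name


sites = [Consumer("SITE-A", 10, True), Consumer("SITE-B", 1, True)]
print(pick_consumer(sites))   # SITE-A is preferred while it is available
sites[0].available = False
print(pick_consumer(sites))   # requests fall through to SITE-B
```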

Load Balancer

HAProxy is deployed as a single instance and is therefore a single point of failure. You may use a load balancer other than HAProxy. However, if High Availability is required at the load balancer level, a cluster of HAProxy-based load balancers should be configured manually.

Rasa Bot Failover

Rasa Bot failover is beyond the scope of HC HA support.

SQL Server Failover

SQL Server failover is beyond the scope of HC HA support. The customer needs to handle SQL Server failover.


Facebook Connector Failover Scenarios

Scenario: CCM is unreachable for Facebook
Expected Behaviour: When CCM is not accessible to Facebook for any reason, Facebook retries a finite number of times and marks it as dead afterward.

Scenario: Restore link between Facebook and CCM
Expected Behaviour: When the link is restored, the deployment engineer must re-register the CCM web-hook in Facebook to receive messages from Facebook. CCM can, however, send enqueued messages to Facebook even without re-registration of the web-hook.

Scenario: Facebook is unreachable for CCM
Expected Behaviour: Facebook messages will continue to arrive via the CCM web-hook. However, for messages that CCM sends to Facebook, CCM retries indefinitely until the message is delivered.

Viber Connector Failover Scenarios

Scenario: CCM is unreachable for Viber
Expected Behaviour: When CCM is not accessible to Viber for any reason, Viber retries a finite number of times and marks it as dead afterward.

Scenario: Restore link between Viber and CCM
Expected Behaviour: When the link is restored, the deployment engineer must re-register the CCM web-hook in Viber to receive messages from Viber. CCM can, however, send enqueued messages to Viber even without re-registration of the web-hook.

Scenario: Viber is unreachable for CCM
Expected Behaviour: Viber messages will continue to arrive via the CCM web-hook. However, for messages that CCM sends to Viber, CCM retries indefinitely until the message is delivered.
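The "retry until delivered" behavior described for CCM in the connector scenarios can be pictured with the sketch below. This is not CCM's actual code: the function name, the exponential back-off between attempts, and the delay values are assumptions added for illustration, since this page only states that retries continue indefinitely. The optional `max_attempts` parameter exists only so the loop can be bounded in tests.

```python
import time
from typing import Callable, Optional


def deliver_with_retry(
    send: Callable[[str], None],
    message: str,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,   # None means retry indefinitely
) -> int:
    """Call send(message) until it succeeds; return the number of attempts made."""
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            send(message)
            return attempt
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)                        # wait before the next attempt
            delay = min(delay * 2, max_delay)   # back off, capped at max_delay
```

Passing a custom `sleep` callable makes it possible to exercise the loop without real waiting, for example by recording the requested delays in a list.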



