Data Migration On-boarding on Expertflow Data Platform
Introduction
With every new CX release, the data in the source MongoDB may need to be upgraded, which has historically involved creating JavaScript files to be executed in the Mongo shell. As we transition from running data migrations directly in the Mongo shell to using the Expertflow Data Platform, teams now need the ability to configure and use the platform to perform their respective operations. This guide outlines the essential steps required to set up the data platform for each data migration activity.
Configurations
Below is the data_migration_config.yaml file, from which the migration activity is controlled:
source:
  type: "mongodb"
  ## Connection string: mongodb://{username}:{password}@{host}:{port}/?authSource=admin
  connection_string: "mongodb://root:Expertflow123@192.168.2.202:31545/?authSource=admin"
  ## For batch processing, use this template
  batch_processing:
    conversation-manager: # migration to run
      js_file: "Updated_4.4-4.5_upgrade_Bulk_Repeat.js"
      start_date: "2025-01-01" # Should be updated according to data
      end_date: "2025-01-30" # Should be updated according to data
      interval: "720" ## minute-wise interval (0.5 day = 720)
  ## For non-batch processing, use this template
  non_batch_processing:
    RE_adminPanel:
      js_file: "Updated_4.4-4.5_upgrade - RE and Admin.js"
  tls: true # Set to false if you don't want to use TLS
  tls_ca_file: "/transflux/certificates/mongo_certs/mongodb-ca-cert"
  tls_cert_key_file: "/transflux/certificates/mongo_certs/client-pem" # Includes both client certificate and private key
The data migration is performed either in batches or in a single run (non-batch). A batch migration is subdivided into timelines and intervals. Each variable is described below.

connection_string: the connection string of the source MongoDB.

batch_processing: configures a batched pipeline from the provided information. Within batch_processing:

conversation-manager: the name assigned to a migration; it is used when creating a pipeline (DAG).
js_file: name of the JS file to execute for the respective migration.
start_date: start date of the data to process (consult the source MongoDB).
end_date: end date of the data to process (consult the source MongoDB).
interval: minute-wise interval that splits the data to be processed into batches within the start/end timeline (720 minutes = half a day).

non_batch_processing: configures a non-batch pipeline from the provided information. Within non_batch_processing:

RE_adminPanel: the name assigned to a migration; it is used when creating a pipeline (DAG).
js_file: name of the JS file to execute for the respective migration.

tls: TLS flag that determines whether the Mongo database accepts only TLS-verified connections.
tls_ca_file: path to the mongodb-ca-cert file.
tls_cert_key_file: path to the client-pem file (contains both the client certificate and the private key).
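To make the batch arithmetic concrete, here is a minimal sketch (illustrative only, not the platform's actual scheduling code) of how start_date, end_date, and interval slice the data window into batches; it runs in mongosh or Node:

// Illustrative sketch: how start_date, end_date, and interval define batches.
// Not the platform's actual scheduler code.
const start = new Date("2025-01-01T00:00:00Z");
const end = new Date("2025-01-30T00:00:00Z");
const intervalMs = 720 * 60 * 1000; // 720 minutes = half a day

const windows = [];
for (let t = start.getTime(); t < end.getTime(); t += intervalMs) {
  windows.push({
    start: new Date(t),
    end: new Date(Math.min(t + intervalMs, end.getTime())),
  });
}
console.log(`${windows.length} batches`); // 58 batches for this example window

With the example values above, the pipeline would process the January data in 58 half-day slices rather than in one large query.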
Pre-requisites
Before you get started
Make sure the desired JavaScript file has been tested on a test MongoDB server to avoid conflicts when running the activity through the data platform (one way to run the file against a test server is sketched after this list).
Make sure to create a feature branch from develop as per the respective ticket. The naming convention for the feature branch is <current-release>_f-<jira_ticket_id>, e.g. 4.8_f-BI-185.
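Assuming you have a mongosh session open against the test server, the migration file can be executed directly with load(); the path shown is illustrative:

// In a mongosh session connected to the *test* MongoDB server, run the
// migration file directly. The path is resolved relative to the directory
// where mongosh was started (illustrative path shown).
load("mongo_migration_scripts/Updated_4.4-4.5_upgrade_Bulk_Repeat.js");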
How to proceed
Paste your .js file into the mongo_migration_scripts folder. Edit it and make sure that the connection to databases is handled by the db.getSiblingDB() method, i.e.

conversationManagerDb = db.getSiblingDB("conversation-manager_db")

// When using multiple databases in a single JS file
reDb = db.getSiblingDB("routing-engine_db");
adminPanel = db.getSiblingDB("adminPanel");
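For a non-batch migration, these getSiblingDB() handles are all the connection setup a script needs. A minimal sketch of a non-batch script body follows; the collection names ("rules", "settings") and the update operations are hypothetical:

// Minimal non-batch migration sketch. The database names come from this
// guide; the collections and the updates are hypothetical examples.
const reDb = db.getSiblingDB("routing-engine_db");
const adminPanel = db.getSiblingDB("adminPanel");

// A non-batch script runs once over all matching documents; no date filter.
reDb.rules.updateMany({}, { $rename: { oldField: "newField" } });
adminPanel.settings.updateMany({}, { $set: { migrated: true } });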
For batch processing pipelines, edit the .js file and place the following date filter at the beginning (these values change dynamically, per the start and end time declared in the config file, while the pipeline is running):

const input_start_dateTime = new Date("2025-01-01T00:00:00Z"); // Replace with your start date-time
const input_end_dateTime = new Date("2025-01-31T23:59:59Z"); // Replace with your end date-time
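Putting the pieces together, a batch script body might look like the following sketch; the collection name ("conversations"), the date field ("createdAt"), and the $set payload are hypothetical:

// Sketch of a batch migration script. The platform rewrites the two date
// constants for every batch window while the pipeline runs.
const input_start_dateTime = new Date("2025-01-01T00:00:00Z"); // rewritten per batch
const input_end_dateTime = new Date("2025-01-31T23:59:59Z");   // rewritten per batch

const conversationManagerDb = db.getSiblingDB("conversation-manager_db");

// Only touch documents inside this batch's time slice. Collection name,
// date field, and the update payload are hypothetical.
conversationManagerDb.conversations.updateMany(
  { createdAt: { $gte: input_start_dateTime, $lte: input_end_dateTime } },
  { $set: { schemaVersion: "4.5" } }
);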
Edit the file config/data_migration_config.yaml and add your respective information in either batch_processing or non_batch_processing, depending on the execution pattern of the .js file.

For batch_processing:

batch_processing:
  <migration_to_run>: # migration to run
    js_file: <js_file_name>
    start_date: <start_date> # Should be updated according to data
    end_date: <end_date> # Should be updated according to data
    interval: 720 ## minute-wise interval (0.5 day = 720)

For non_batch_processing:

non_batch_processing:
  <migration_to_run>:
    js_file: <js_file_name>
Create a new DAG file in the dags folder. The naming convention for the file name is <migration_to_run>_migration_<batch_or_non_batch>_pipeline_dag.py. Paste and adjust the following contents as required.
For a batch processing pipeline:

import datetime
from airflow import DAG
from transflux.src.dag_factory.data_migration_batch_callable import create_dag_migration_batch

DAG_ID = <dag_id>  ## The name of the DAG pipeline shown in the Airflow UI
migration_to_run = <migration_to_run>  ## As set in data_migration_config.yaml

dag = create_dag_migration_batch(
    dag_id=DAG_ID,
    migration_to_run=migration_to_run
)

# Register the DAG
globals()[DAG_ID] = dag
For a non-batch processing pipeline:

import datetime
from airflow import DAG
from transflux.src.dag_factory.data_migration_non_batch_callable import create_dag_migration_non_batch

DAG_ID = <dag_id>  ## The name of the DAG pipeline shown in the Airflow UI
migration_to_run = <migration_to_run>  ## As set in data_migration_config.yaml

dag = create_dag_migration_non_batch(
    dag_id=DAG_ID,
    migration_to_run=migration_to_run
)

# Register the DAG
globals()[DAG_ID] = dag
Deploy the solution on the machine. The pipeline will appear in the Airflow UI under the <dag_id> set in the DAG file created above.

For a batch processing pipeline, turning the pipeline ON (toggle shown to the left of the pipeline name) will start the pipeline operation.

For a non-batch processing pipeline, in addition to turning the pipeline ON, the pipeline must also be triggered (play button shown to the right of the pipeline name).

For further details on operating migration pipelines from the Airflow UI, follow Migration Activity ( CX-4.7 to CX-4.8 ).