Data Migration On-boarding on Expertflow Data Platform
Introduction
With every new CX release, the data in the source MongoDB may need to be upgraded, which has historically involved creating JavaScript files to be executed in the Mongo shell. As we transition from running data migrations directly in the Mongo shell to using the Expertflow Data Platform, teams now need the ability to configure and use the platform to perform their respective operations. This guide outlines the essential steps required to set up the data platform for each data migration activity.
Configurations
Below is the data_migration_config.yaml file, from which the migration activity is controlled:
source:
  type: "mongodb"
  ## Connection string: mongodb://{username}:{password}@{host}:{port}/?authSource=admin
  connection_string: "mongodb://root:Expertflow123@192.168.2.202:31545/?authSource=admin"
  ## For batch processing, use this template
  batch_processing:
    conversation-manager: # migration to run
      js_file: "Updated_4.4-4.5_upgrade_Bulk_Repeat.js"
      start_date: "2025-01-01" # Should be updated according to data
      end_date: "2025-01-30" # Should be updated according to data
      interval: "720" ## minute-wise interval (0.5 day = 720)
  ## For non-batch processing, use this template
  non_batch_processing:
    RE_adminPanel:
      js_file: "Updated_4.4-4.5_upgrade - RE and Admin.js"
  tls: true # Set to false if you don't want to use TLS
  tls_ca_file: "/transflux/certificates/mongo_certs/mongodb-ca-cert"
  tls_cert_key_file: "/transflux/certificates/mongo_certs/client-pem" # Includes both client certificate and private key
The data migration is performed either in batches or in a single run (non-batch). A batch migration is subdivided into timelines and intervals. Each variable is described below.

connection_string: the connection string of the source MongoDB.

batch_processing: configures a batched pipeline from the provided information. Within batch_processing:

conversation-manager: the name assigned to a migration; it is used when creating a pipeline (DAG).
js_file: name of the JS file to execute for the respective migration.
start_date: start date of the data to process (consult the source MongoDB).
end_date: end date of the data to process (consult the source MongoDB).
interval: minute-wise interval that splits the data to be processed into batches within the start/end timeline (720 minutes = half a day).

non_batch_processing: configures a non-batch pipeline from the provided information. Within non_batch_processing:

RE_adminPanel: the name assigned to a migration; it is used when creating a pipeline (DAG).
js_file: name of the JS file to execute for the respective migration.

tls: TLS flag that determines whether the Mongo database accepts only TLS-verified connections.
tls_ca_file: path to the mongodb-ca-cert file.
tls_cert_key_file: path to the client-pem file (contains both the client certificate and the private key).
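To make the batch arithmetic concrete, here is a minimal sketch (illustrative only, not the platform's actual scheduling code) of how start_date, end_date, and interval slice the data window into batches; it runs in mongosh or Node:

// Illustrative sketch: how start_date, end_date, and interval define batches.
// Not the platform's actual scheduler code.
const start = new Date("2025-01-01T00:00:00Z");
const end = new Date("2025-01-30T00:00:00Z");
const intervalMs = 720 * 60 * 1000; // 720 minutes = half a day

const windows = [];
for (let t = start.getTime(); t < end.getTime(); t += intervalMs) {
  windows.push({
    start: new Date(t),
    end: new Date(Math.min(t + intervalMs, end.getTime())),
  });
}
console.log(`${windows.length} batches`); // 58 batches for this example window

With the example values above, the pipeline would process the January data in 58 half-day slices rather than in one large query.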
Pre-requisites
Before you get started
Make sure the desired JavaScript file has been tested on a test MongoDB server to avoid conflicts when running the activity through the data platform (one way to run the file against a test server is sketched after this list).
Make sure to create a feature branch from develop as per the respective ticket. The naming convention for the feature branch is <current-release>_f-<jira_ticket_id>, e.g. 4.8_f-BI-185.
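Assuming you have a mongosh session open against the test server, the migration file can be executed directly with load(); the path shown is illustrative:

// In a mongosh session connected to the *test* MongoDB server, run the
// migration file directly. The path is resolved relative to the directory
// where mongosh was started (illustrative path shown).
load("mongo_migration_scripts/Updated_4.4-4.5_upgrade_Bulk_Repeat.js");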
How to proceed
Paste your .js file into the mongo_migration_scripts folder. Edit it and make sure that the connection to databases is handled by the db.getSiblingDB() method, i.e.

conversationManagerDb = db.getSiblingDB("conversation-manager_db")

// When using multiple databases in a single JS file
reDb = db.getSiblingDB("routing-engine_db");
adminPanel = db.getSiblingDB("adminPanel");
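For a non-batch migration, these getSiblingDB() handles are all the connection setup a script needs. A minimal sketch of a non-batch script body follows; the collection names ("rules", "settings") and the update operations are hypothetical:

// Minimal non-batch migration sketch. The database names come from this
// guide; the collections and the updates are hypothetical examples.
const reDb = db.getSiblingDB("routing-engine_db");
const adminPanel = db.getSiblingDB("adminPanel");

// A non-batch script runs once over all matching documents; no date filter.
reDb.rules.updateMany({}, { $rename: { oldField: "newField" } });
adminPanel.settings.updateMany({}, { $set: { migrated: true } });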
For batch processing pipelines, edit the .js file and place the following date filter at the beginning (these values change dynamically, per the start and end time declared in the config file, while the pipeline is running):

const input_start_dateTime = new Date("2025-01-01T00:00:00Z"); // Replace with your start date-time
const input_end_dateTime = new Date("2025-01-31T23:59:59Z"); // Replace with your end date-time
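Putting the pieces together, a batch script body might look like the following sketch; the collection name ("conversations"), the date field ("createdAt"), and the $set payload are hypothetical:

// Sketch of a batch migration script. The platform rewrites the two date
// constants for every batch window while the pipeline runs.
const input_start_dateTime = new Date("2025-01-01T00:00:00Z"); // rewritten per batch
const input_end_dateTime = new Date("2025-01-31T23:59:59Z");   // rewritten per batch

const conversationManagerDb = db.getSiblingDB("conversation-manager_db");

// Only touch documents inside this batch's time slice. Collection name,
// date field, and the update payload are hypothetical.
conversationManagerDb.conversations.updateMany(
  { createdAt: { $gte: input_start_dateTime, $lte: input_end_dateTime } },
  { $set: { schemaVersion: "4.5" } }
);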
Edit the file config/data_migration_config.yaml and add your respective information in either batch_processing or non_batch_processing, depending on the execution pattern of the .js file.

For batch_processing:

batch_processing:
  <migration_to_run>: # migration to run
    js_file: <js_file_name>
    start_date: <start_date> # Should be updated according to data
    end_date: <end_date> # Should be updated according to data
    interval: 720 ## minute-wise interval (0.5 day = 720)

For non_batch_processing:

non_batch_processing:
  <migration_to_run>:
    js_file: <js_file_name>
Create a new DAG file in the dags folder. The naming convention for the file name is <migration_to_run>_migration_<batch_or_non_batch>_pipeline_dag.py. Paste and adjust the following contents as required.
For a batch processing pipeline:

import datetime
from airflow import DAG
from transflux.src.dag_factory.data_migration_batch_callable import create_dag_migration_batch

DAG_ID = <dag_id>  ## The name of the DAG pipeline shown in the Airflow UI
migration_to_run = <migration_to_run>  ## As set in data_migration_config.yaml

dag = create_dag_migration_batch(
    dag_id=DAG_ID,
    migration_to_run=migration_to_run
)

# Register the DAG
globals()[DAG_ID] = dag
For a non-batch processing pipeline:

import datetime
from airflow import DAG
from transflux.src.dag_factory.data_migration_non_batch_callable import create_dag_migration_non_batch

DAG_ID = <dag_id>  ## The name of the DAG pipeline shown in the Airflow UI
migration_to_run = <migration_to_run>  ## As set in data_migration_config.yaml

dag = create_dag_migration_non_batch(
    dag_id=DAG_ID,
    migration_to_run=migration_to_run
)

# Register the DAG
globals()[DAG_ID] = dag
Deploy the solution on the machine. The pipeline will appear in the Airflow UI under the <dag_id> set in the DAG file created above.

For a batch processing pipeline, turning the pipeline ON (toggle shown to the left of the pipeline name) will start the pipeline operation.

For a non-batch processing pipeline, in addition to turning the pipeline ON, the pipeline must also be triggered (play button shown to the right of the pipeline name).

For further details on operating migration pipelines from the Airflow UI, follow Migration Activity ( CX-4.7 to CX-4.8 ).