Data Migration On-boarding on Expertflow Data Platform

Introduction

With every new CX release, there might come a need to upgrade the data in the source MongoDB, which involves creating JavaScript files to be executed in the Mongo shell. As we transition from running data migrations directly in the Mongo shell to using the Expertflow Data Platform, teams now need the ability to configure and utilize the platform to perform their respective operations. This guide outlines all the essential steps required to set up the data platform for each data migration activity.

Configurations

Below is the data_migration_config.yaml file that controls the migration activity:

CODE
 source:
  type: "mongodb"
  ## Connection string: mongodb://{username}:{password}@{host}:{port}/?authSource=admin
  connection_string: "mongodb://root:Expertflow123@192.168.2.202:31545/?authSource=admin"

  ## For batch processing, use this template
  batch_processing:
    conversation-manager:     # migration to run 
      js_file: "Updated_4.4-4.5_upgrade_Bulk_Repeat.js"
      start_date: "2025-01-01" # Should be updated according to data
      end_date: "2025-01-30" # Should be updated according to data
      interval: "720" ## minute-wise interval (0.5 day = 720)

  ## For non-batch processing, use this template
  non_batch_processing:
    RE_adminPanel:
      js_file: "Updated_4.4-4.5_upgrade - RE and Admin.js"

  tls: true  # Set to false if you don't want to use TLS
  tls_ca_file: "/transflux/certificates/mongo_certs/mongodb-ca-cert"
  tls_cert_key_file: "/transflux/certificates/mongo_certs/client-pem"  # Includes both client certificate and private key

The data migration is performed either in batches or in a single run (non-batch). A batch migration is sub-divided by timeline and interval. Each variable is described below:

  • connection_string: the connection string of the source MongoDB

  • batch_processing: configures a batched pipeline from the provided information

    • Within batch_processing:

      • conversation-manager: the name assigned to a migration; it is used when creating a pipeline (DAG).

        • js_file: name of the JS file to execute for the respective migration

        • start_date: start date of the data to process (consult the source MongoDB)

        • end_date: end date up to which the data is processed (consult the source MongoDB)

        • interval: interval in minutes that splits the data into batch windows within the overall timeline (see the sketch after this list)

  • non_batch_processing: configures a non-batch pipeline from the provided information

    • Within non_batch_processing:

      • RE_adminPanel: the name assigned to a migration; it is used when creating a pipeline (DAG).

        • js_file: name of the JS file to execute for the respective migration

  • tls: flag that determines whether the MongoDB instance accepts only TLS-verified connections

    • tls_ca_file: path to the mongodb-ca-cert file

    • tls_cert_key_file: path to the client-pem file (contains both the client certificate and private key)
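
To make the batch settings concrete, here is an illustrative sketch (not part of the platform code) of how start_date, end_date and a 720-minute interval translate into the per-batch time windows that the pipeline processes one by one; the dates below match the sample configuration above.

CODE
// Illustrative only: how start_date, end_date and interval (in minutes)
// split the overall range into batch windows. The actual splitting is
// performed by the data platform at runtime.
const startDate = new Date("2025-01-01T00:00:00Z"); // start_date
const endDate = new Date("2025-01-30T00:00:00Z");   // end_date
const intervalMs = 720 * 60 * 1000;                 // 720 minutes = 0.5 day per batch

const windows = [];
for (let t = startDate.getTime(); t < endDate.getTime(); t += intervalMs) {
  windows.push({
    start: new Date(t),
    end: new Date(Math.min(t + intervalMs, endDate.getTime()))
  });
}
// 29 days / 0.5 day per batch => 58 windows, each executed as one run of the JS file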

Pre-requisites

Before you get started

  • Make sure the desired JavaScript file has been tested on a test MongoDB server to avoid conflicts when running the activity through the data platform.

  • Make sure to create a feature branch from develop as per the respective ticket. The naming convention for the feature branch is <current-release>_f-<jira_ticket_id>, e.g. 4.8_f-BI-185

How to proceed

  1. Paste your .js file into the mongo_migration_scripts folder

  2. Edit it and make sure that connections to databases are handled through the db.getSiblingDB() method, i.e.

    CODE
    conversationManagerDb = db.getSiblingDB("conversation-manager_db");
    CODE
    // When using multiple databases in a single JS file
    reDb = db.getSiblingDB("routing-engine_db");
    adminPanel = db.getSiblingDB("adminPanel");
    1. For batch processing pipelines, edit the .js file and place the following date filter at the beginning (these values are replaced dynamically with the start and end times declared in the config file while the pipeline is running; a usage example follows the snippet)

      CODE
      const input_start_dateTime = new Date("2025-01-01T00:00:00Z"); // Replace with your start date-time
      const input_end_dateTime = new Date("2025-01-31T23:59:59Z");   // Replace with your end date-time
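
      The constants above are then used inside the script to restrict the migration to the current batch window. Below is a minimal, hypothetical usage: the collection name "conversations" and the "createdAt" field are assumptions for illustration only and should be replaced with whatever collection and timestamp field your migration actually targets.

      CODE
      // Hypothetical example only: limit processing to documents whose (assumed)
      // createdAt timestamp falls inside the current batch window.
      const conversationManagerDb = db.getSiblingDB("conversation-manager_db");

      conversationManagerDb.getCollection("conversations").find({
        createdAt: { $gte: input_start_dateTime, $lte: input_end_dateTime }
      }).forEach(function (doc) {
        // ...apply the per-document update logic of the migration here...
      });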
  3. Edit the file config/data_migration_config.yaml and add your respective information under either batch_processing or non_batch_processing, depending on the execution pattern required by the .js file

    1. for batch_processing

      CODE
      batch_processing:
          <migration_to_run>:     # migration to run 
            js_file: <js_file_name>
            start_date: <start_date> # Should be updated according to data
            end_date: <end_date> # Should be updated according to data
            interval: 720 ## minute-wise interval (0.5 day = 720)
    2. for non_batch_processing

      CODE
      non_batch_processing:
          <migration_to_run>:
            js_file: <js_file_name>
  4. Create a new DAG file in the dags folder. The naming convention for the file name is <migration_to_run>_migration_<batch_or_non_batch>_pipeline_dag.py, e.g. conversation-manager_migration_batch_pipeline_dag.py for the batch example above

  5. Paste and adjust the following contents as required

    1. for batch processing pipeline

      CODE
      import datetime
      
      from airflow import DAG
      from transflux.src.dag_factory.data_migration_batch_callable import create_dag_migration_batch
      
      DAG_ID = "<dag_id>"       ## The name of the DAG pipeline shown in the Airflow UI
      migration_to_run = "<migration_to_run>"     ## The migration name as set in data_migration_config.yaml
      
      dag = create_dag_migration_batch(
          dag_id=DAG_ID,
          migration_to_run=migration_to_run
      )
      
      # Register the DAG
      globals()[DAG_ID] = dag
    2. for non-batch processing pipeline

      CODE
      import datetime
      
      from airflow import DAG
      from transflux.src.dag_factory.data_migration_non_batch_callable import create_dag_migration_non_batch
      
      DAG_ID = "<dag_id>"       ## The name of the DAG pipeline shown in the Airflow UI
      migration_to_run = "<migration_to_run>"     ## The migration name as set in data_migration_config.yaml
      
      dag = create_dag_migration_non_batch(
          dag_id=DAG_ID,
          migration_to_run=migration_to_run
      )
      
      # Register the DAG
      globals()[DAG_ID] = dag
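
    In both templates, assigning the created DAG object to globals()[DAG_ID] is what makes the pipeline discoverable: Airflow scans the module-level variables of files in the dags folder for DAG objects. The task logic itself is built by the platform's DAG-factory callables, so the file only has to supply the DAG id and the migration name configured in data_migration_config.yaml.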
  6. Deploy the solution on the machine. The pipeline will appear in the Airflow UI under the <dag_id> set in the …. migration_dag.py file

  7. For a batch processing pipeline, turning the pipeline ON (the toggle shown to the left of the pipeline name) will start the pipeline operation

  8. For a non-batch processing pipeline, in addition to turning the pipeline ON, it must also be triggered manually (the play button shown to the right of the pipeline name)

For further details on operating migration pipelines from the Airflow UI, follow Migration Activity (CX-4.7 to CX-4.8)
