Campaign Scheduler Data Pipeline

The Campaign Scheduler Data Pipeline is responsible for processing all the data generated by the campaign scheduler in CX. It extracts data from the source (MongoDB), applies the transformations needed to align with the schema requirements, and loads the data into the target database. The target database schema for this pipeline is given in the following document: https://expertflow-docs.atlassian.net/wiki/spaces/CX/pages/2526317/Reporting+Database+Schema#campaign_scheduler . Key tasks for this data pipeline include:

Validating data fields for accuracy and ensuring required fields are present.
Mapping data structure fields to the target schema.
Handling data updates and upserts to avoid data duplication.
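
The three tasks above can be pictured with a minimal Python sketch. It is not the pipeline's actual code: the field names (`id`, `activity_id`, `record_creation_time`), the helper functions, and the in-memory dictionary standing in for the target table are all assumptions made for illustration. The real column names are defined in the schema document linked above.

```python
from typing import Any

# Hypothetical field names, for illustration only; the real schema is in the linked document.
REQUIRED_FIELDS = ("id", "recordCreationTime")
FIELD_MAP = {"id": "activity_id", "recordCreationTime": "record_creation_time"}


def validate(record: dict[str, Any]) -> bool:
    """Return True when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def map_to_target(record: dict[str, Any]) -> dict[str, Any]:
    """Rename source fields to their (assumed) target column names."""
    return {target: record[source] for source, target in FIELD_MAP.items()}


def upsert(table: dict[str, dict[str, Any]], row: dict[str, Any]) -> None:
    """Insert the row, or overwrite the existing row that has the same key."""
    table[row["activity_id"]] = row


def run(records: list[dict[str, Any]], table: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Validate, map, and upsert each record; invalid records are skipped."""
    for record in records:
        if validate(record):
            upsert(table, map_to_target(record))
    return table
```

Because rows are keyed on a unique identifier, loading the same record twice updates the existing row rather than creating a duplicate.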

Changes required in the `campaigns_data_pipeline_config.yaml` file:

Query

The query entry in the YAML file is a dictionary containing query configurations for the different pipelines. For the campaign scheduler data pipeline, the query is as follows:

campaign_scheduler: Configurations for the campaign scheduler pipeline.
- database: Name of the MongoDB database from which activity data is extracted. Example: "conversation-manager_db".
- collection_name: Name of the MongoDB collection. Example: "ConversationActivities".
- filter: Query filter applied to fetch data. Example: {"activity.eventEmitter.senderName": "CAMPAIGN_SCHEDULER"}
- replication_key: Field used to track updates. Example: "recordCreationTime".
- transformation: Transformation function name. Example: "transform_campaign_scheduler".
- num_batches: Number of data batches. Example: 50.
- query_keys: Reserved for gold queries (used to load data into the gold table, if needed).
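
For reference, the settings above can be written out as the Python dictionary that loading the YAML entry would roughly produce, using the example values from the list. The empty `query_keys` value and the `build_incremental_filter` helper are assumptions added for illustration. This document does not describe how the pipeline applies `replication_key`, so the `$gt` bound shown here is only one plausible way to fetch records newer than the last run.

```python
from typing import Any

# Values are the examples given in the list above; a real deployment may differ.
campaign_scheduler: dict[str, Any] = {
    "database": "conversation-manager_db",
    "collection_name": "ConversationActivities",
    "filter": {"activity.eventEmitter.senderName": "CAMPAIGN_SCHEDULER"},
    "replication_key": "recordCreationTime",
    "transformation": "transform_campaign_scheduler",
    "num_batches": 50,
    "query_keys": [],  # assumption: left empty when no gold queries are needed
}


def build_incremental_filter(query: dict[str, Any], last_value: Any) -> dict[str, Any]:
    """Combine the configured filter with a replication-key bound (hypothetical helper)."""
    combined = dict(query["filter"])  # copy so the configuration itself is not mutated
    if last_value is not None:
        # Standard MongoDB comparison operator: only records newer than the last run.
        combined[query["replication_key"]] = {"$gt": last_value}
    return combined
```

Passing `None` as `last_value` returns the configured filter unchanged, which would correspond to a first full load.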
