CX Customer Upload Enhancement: Prevent Duplication

This is an enhancement of the Bulk Customer Upload feature (available from CX4.10 onwards). The enhancement is included in CX5.1 and later.

1. Overview

The CX Bulk Customer Upload feature allows administrators to upload customer data into CX Customer using CSV files. The current implementation does not validate for duplicates, which results in multiple records for the same customer and fragmented customer data.

This enhancement introduces a deduplication and merge mechanism during bulk uploads to ensure that:

  • Duplicate customer records are prevented

  • Existing customer records are intelligently updated

  • Customer attributes remain consistent and accurate

  • Users have control over how duplicate records are handled

The system will detect duplicates based on user-selected deduplication keys and apply a configurable conflict resolution strategy.


2. Problem Statement

The current bulk upload mechanism has the following issues:

2.1 Duplicate Customer Creation

  • The system does not check for existing customer records during bulk upload.

  • Uploading the same customer multiple times results in duplicate records.

2.2 Fragmented Customer Profiles

  • Customer information becomes scattered across multiple records.

  • Different uploads may contain partial information for the same customer.

Example:

Upload      Data
Upload 1    Name + Phone
Upload 2    Name + Email

Result:
Two different customer records instead of one unified profile.


2.3 Data Integrity Issues

Duplicate records cause issues in:

  • Customer analytics

  • Reporting

  • Contact routing

  • CRM integrations

  • Campaigns


3. Objectives

This enhancement aims to:

  1. Prevent duplicate customer creation.

  2. Allow flexible deduplication based on user-defined keys.

  3. Provide configurable actions for duplicate handling.

  4. Enable attribute merging for improved customer profiles.

  5. Maintain upload transparency through detailed results.


4. High-Level Solution

The solution introduces a deduplication framework integrated into the bulk upload pipeline.

Key capabilities:

  1. User selects Deduplication Key(s) during upload.

  2. User selects Duplicate Handling Behavior or Action.

  3. The cx-data-platform processes the CSV in batches.

  4. Each batch is sent to the CIM Customer microservice.

  5. The microservice:

    • Detects duplicates

    • Categorizes customers

    • Executes the selected action

  6. The system returns a summary of operations performed.


5. Feature Workflow

Step 1 — Upload CSV

The user uploads a CSV file containing customer records via the Bulk Import UI on Unified Admin.

Example fields:

firstName,lastName,email,phone,city
John,Doe,john@example.com,123456,New York
Jane,Doe,jane@example.com,654321,Boston


Step 2 — Select Deduplication Key(s) — UI Enhancement

The user must select at least one deduplication key.

[Figure: Select Deduplication Keys]

Possible keys:

  • Email

  • Phone Number

  • Facebook

  • All supported EFCX Channel Identifiers

  • First Name (default fallback)

These keys are used to detect duplicate records.

Example selection:

Deduplication Keys:
☑ Email
☑ Phone

Step 3 — Select Duplicate Handling Behavior

If duplicates are found, the user selects how the system should handle them.

[Figure: Action against the duplicate customer if found]

Available options:

1. Ignore

Duplicate records are skipped.

  • Existing record remains unchanged.

  • Ignored contacts are returned in the output CSV with a reason.

Example:

Reason: Ignored

2. Merge

Only attributes with empty values in the existing record will be updated.

Example:

Existing Customer

Name    Email             Phone
John    (empty)           123

CSV Record

Name    Email             Phone
John    john@email.com    123

Result

Name    Email             Phone
John    john@email.com    123
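
The Merge rule above can be sketched as a small function. This is a minimal illustration, not the microservice's actual code: the function names are hypothetical, and treating None, empty strings, and empty lists as "empty" is an assumption.

```python
def is_empty(value):
    """Treat None, empty strings, and empty lists as 'empty' (an assumption)."""
    return value is None or value == "" or value == []


def merge_fill_empty(existing, incoming):
    """Return a copy of `existing` with only its empty attributes filled from `incoming`."""
    merged = dict(existing)
    for field, value in incoming.items():
        # Only touch attributes that are empty in the stored record.
        if is_empty(merged.get(field)) and not is_empty(value):
            merged[field] = value
    return merged


existing = {"name": "John", "email": "", "phone": "123"}
csv_record = {"name": "John", "email": "john@email.com", "phone": "123"}
result = merge_fill_empty(existing, csv_record)
print(result)
```

With the records from the example, only the empty email is filled in. A populated attribute in the existing record is never overwritten.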


3. Replace

The entire existing record is replaced with the CSV record.

Example:

Existing Record

Name    Email            Phone
John    old@email.com    123

CSV Record

Name    Email            Phone
John    new@email.com    999

Result

Name    Email            Phone
John    new@email.com    999


4. Append

New values from the CSV are appended to the existing record.

Rules:

  • Multi-valued fields → append

  • Single-valued fields → replace

Example:

Existing

phone = [111]

CSV

phone = [222]

Result

phone = [111,222]
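
A minimal sketch of the Append rules follows. It makes several assumptions that the specification does not state: a field counts as multi-valued when its value is a list, values already present are not appended again, and empty CSV values never overwrite stored data. The function names are hypothetical.

```python
def as_list(value):
    """Wrap a single value in a list; map empty values to an empty list."""
    if value is None or value == "":
        return []
    return list(value) if isinstance(value, list) else [value]


def append_record(existing, incoming):
    """Append multi-valued fields and replace single-valued ones."""
    result = dict(existing)
    for field, value in incoming.items():
        if value is None or value == "" or value == []:
            continue  # assumption: empty CSV cells never overwrite stored data
        if isinstance(result.get(field), list) or isinstance(value, list):
            combined = as_list(result.get(field))
            for item in as_list(value):
                if item not in combined:  # assumption: skip values already present
                    combined.append(item)
            result[field] = combined
        else:
            result[field] = value  # single-valued field: replace
    return result


existing = {"phone": ["111"], "city": "Boston"}
csv_record = {"phone": ["222"], "city": "New York"}
result = append_record(existing, csv_record)
print(result)
```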


6. System Architecture Flow

Frontend

  1. User uploads CSV

  2. User selects:

    • Deduplication Key(s)

    • Duplicate handling action

  3. The frontend uploads the file and the deduplication details, which are saved in the database with status Unprocessed.


Data Platform

Responsibilities:

  • Parse unprocessed CSV

  • Convert records into JSON

  • Split records into batches

Example batch payload:

{
  "deduplicationKeys": ["email","phone"],
  "action": "append",
  "customers": [
    { "firstName":"John","email":"john@example.com"},
    { "firstName":"Jane","email":"jane@example.com"}
  ]
}

The Data Platform sends batches sequentially to the CIM Customer Microservice.
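
The Data Platform responsibilities (parse, convert, split) could look roughly like the sketch below. It uses only the Python standard library. The function name `csv_to_batches`, the `batch_size` parameter, and the choice to drop empty cells are assumptions; the payload field names follow the example batch payload.

```python
import csv
import io
import json


def csv_to_batches(csv_text, dedup_keys, action, batch_size):
    """Parse CSV text and yield batch payloads shaped like the example above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    customers = [
        # Assumption: empty cells are omitted rather than sent as empty strings.
        {key: value for key, value in row.items() if key is not None and value}
        for row in reader
    ]
    for start in range(0, len(customers), batch_size):
        yield {
            "deduplicationKeys": list(dedup_keys),
            "action": action,
            "customers": customers[start:start + batch_size],
        }


csv_text = (
    "firstName,lastName,email,phone,city\n"
    "John,Doe,john@example.com,123456,New York\n"
    "Jane,Doe,jane@example.com,654321,Boston\n"
)
batches = list(csv_to_batches(csv_text, ["email", "phone"], "append", batch_size=1))
for batch in batches:
    print(json.dumps(batch))
```

Yielding batches one at a time fits the sequential sending described above, because each payload can be sent before the next one is built.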


CIM Customer Microservice

For each batch:

  1. Iterate over customers

  2. Check duplicates using dedupe keys

  3. Categorize customers into:

duplicateCustomers[]
uniqueCustomers[]
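
The categorization step can be illustrated with an in-memory sketch. It rests on assumptions the specification does not state: a match on any one selected key marks a record as a duplicate, values are compared case-insensitively after trimming, and the existing customers are passed in as a list. A real implementation would query the database instead.

```python
def normalize(value):
    """Return a set of comparable values for a single- or multi-valued attribute."""
    if value is None or value == "":
        return set()
    values = value if isinstance(value, list) else [value]
    return {str(v).strip().lower() for v in values}


def categorize(customers, existing, dedup_keys):
    """Split incoming customers into (duplicate_customers, unique_customers)."""
    duplicate_customers, unique_customers = [], []
    for customer in customers:
        match = next(
            (
                stored
                for stored in existing
                # Assumption: sharing a value on ANY selected key is a duplicate.
                if any(normalize(customer.get(k)) & normalize(stored.get(k)) for k in dedup_keys)
            ),
            None,
        )
        if match is not None:
            duplicate_customers.append((customer, match))
        else:
            unique_customers.append(customer)
    return duplicate_customers, unique_customers


existing = [{"firstName": "John", "email": ["john@example.com"]}]
customers = [
    {"firstName": "John", "email": "JOHN@example.com"},
    {"firstName": "Jane", "email": "jane@example.com"},
]
duplicate_customers, unique_customers = categorize(customers, existing, ["email", "phone"])
print(len(duplicate_customers), len(unique_customers))
```

Each duplicate is paired with the stored record it matched, so the selected action can be applied to that pair later.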

Processing Logic

Unique Customers

Bulk inserted into database.

bulkInsert(uniqueCustomers)

Duplicate Customers

Based on selected action:

Action     Operation
Ignore     Skip record
Merge      Update only empty fields
Replace    Replace full record
Append     Append or replace depending on field type

These operations are executed as bulk database operations.
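
One way to organize the action handling is a lookup from action name to handler, with the resulting updates collected per batch so they can be issued as a single bulk operation. The sketch below is an assumption about structure, not the service's actual design. Only Ignore and Replace are spelled out to keep it short; Merge and Append handlers would be registered in the same mapping. All names are hypothetical.

```python
def ignore(existing, incoming):
    """Ignore: keep the stored record; returning None means no update is issued."""
    return None


def replace(existing, incoming):
    """Replace: the CSV record becomes the full stored record."""
    return dict(incoming)


HANDLERS = {"ignore": ignore, "replace": replace}


def plan_updates(duplicate_pairs, action, handlers=HANDLERS):
    """Collect updates for one batch so they can be sent as one bulk operation."""
    if action not in handlers:
        raise ValueError(f"Unsupported action: {action!r}")
    updates, skipped = [], []
    for existing, incoming in duplicate_pairs:
        updated = handlers[action](existing, incoming)
        if updated is None:
            skipped.append(incoming)  # reported later, for example as "Ignored"
        else:
            updates.append(updated)
    return updates, skipped


pairs = [(
    {"name": "John", "email": "old@email.com", "phone": "123"},
    {"name": "John", "email": "new@email.com", "phone": "999"},
)]
print(plan_updates(pairs, "replace"))
print(plan_updates(pairs, "ignore"))
```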


7. API Enhancement

Bulk Upload API

Request

POST /customers/bulkCustomers

Payload:

{
  "action": "append",
  "deduplicationKeys": [
    "firstName",
    "web",
    "instagram"
  ],
  "customers": [
    {
      "firstName": "Aliceee",
      "phoneNumber": ["+000000"],
      "isAnonymous": false,
      "email": ["alice@example.com"],
      "labels": "premium",
      "web": ["abc.com", "nice.com"]
    },
    {
      "firstName": "Bob",
      "web": "bobsite.com",
      "isAnonymous": false,
      "facebook": ["bob.njice"],
      "telegram": ["bob_telegram"]
    },
    {
      "isAnonymous": false,
      "voice": ["+1987654321"],
      "linkedin": ["charlie_in"]
    },
    {
      "firstName": "Diana",
      "isAnonymous": false,
      "viber": ["diana_viber"],
      "youtube": ["dianaYT"]
    },
    {
      "firstName": "Ethan",
      "isAnonymous": false,
      "instagram": ["ethan.ig"],
      "twitter": ["@ethan_tw"],
      "email": ["ethan@mail.com"]
    },
    {
      "firstName": "Fiona",
      "isAnonymous": false,
      "labels": "newsletter",
      "phoneNumber": ["+1122334455"],
      "telegram": ["fiona_tg"]
    }
  ]
}

Response

{
    "insertedCount": 5,
    "rejectedCustomers": [],
    "appendedCount": 0,
    "rejectedCount": 0
}
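
Before a payload is sent, a client could run a basic check like the one sketched below. The rules are partly assumptions: the requirement for at least one deduplication key comes from Step 2, the lowercase action names are inferred from the "append" value in the example payload, and the non-empty `customers` rule is assumed. The function name is hypothetical and not part of the API.

```python
ALLOWED_ACTIONS = {"ignore", "merge", "replace", "append"}


def validate_bulk_request(payload):
    """Return a list of problems found in a bulk upload payload (empty if none)."""
    errors = []
    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        errors.append("action must be one of: " + ", ".join(sorted(ALLOWED_ACTIONS)))
    keys = payload.get("deduplicationKeys")
    if not isinstance(keys, list) or not keys:
        errors.append("deduplicationKeys must contain at least one key")
    customers = payload.get("customers")
    if not isinstance(customers, list) or not customers:
        errors.append("customers must be a non-empty list")  # assumed rule
    return errors


valid = {
    "action": "append",
    "deduplicationKeys": ["firstName", "web", "instagram"],
    "customers": [{"firstName": "Bob", "isAnonymous": False}],
}
print(validate_bulk_request(valid))
print(validate_bulk_request({"action": "delete", "deduplicationKeys": []}))
```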

8. Output CSV for Failed / Ignored Records

The system provides a downloadable CSV containing:

  • Ignored contacts

  • Rejected records

  • Error reason

Example CSV:

Customer          Reason
john@email.com    Ignored
jane@email.com    Invalid email format
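
Producing such a file could be as simple as the sketch below, which uses Python's standard `csv` module. The column names follow the example. The function name and the (customer, reason) tuple input are assumptions.

```python
import csv
import io


def build_result_csv(rows):
    """Render (customer, reason) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Customer", "Reason"])
    for customer, reason in rows:
        writer.writerow([customer, reason])
    return buffer.getvalue()


output = build_result_csv([
    ("john@email.com", "Ignored"),
    ("jane@email.com", "Invalid email format"),
])
print(output)
```

Using `csv.writer` instead of string concatenation keeps reasons that contain commas or quotes correctly escaped.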


9. Benefits of the Enhancement

Improved Data Quality

Ensures a single source of truth for customers.

Duplicate Prevention

Prevents multiple records for the same customer.

Flexible Merge Options

Allows users to choose how data should be merged.

Better Customer Profiles

Enables gradual enrichment of customer attributes.

Performance Optimized

Uses batch processing with bulk database operations.