
Keycloak Users Pipeline

The Keycloak Users Pipeline manages and processes Keycloak user data. It extracts data from the source (the Keycloak API), applies the transformations needed to align it with the target schema, and loads it into the target database (MySQL). Key tasks include:

  • Validating data fields for completeness and accuracy.

  • Mapping source fields to the target schema.

  • Handling data updates and upserts to maintain data integrity.
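
The validation and upsert tasks listed above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the function names, the table name, and the choice of required fields are assumptions made for the example. The statement uses MySQL's INSERT ... ON DUPLICATE KEY UPDATE, which only acts as an upsert when the key columns are covered by a PRIMARY or UNIQUE key.

```python
def missing_fields(record, required):
    """Return the required fields that are absent or empty in a record.

    `required` is a hypothetical list chosen by the caller; the pipeline's
    real validation rules are not described in this document.
    """
    return [field for field in required if record.get(field) in (None, "")]


def build_upsert_sql(table, columns, key_columns):
    """Build a parameterized MySQL upsert statement (pyformat placeholders).

    Non-key columns are overwritten when a row with the same key exists.
    VALUES() in the UPDATE clause is deprecated in recent MySQL 8.0
    releases but is still accepted.
    """
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    updates = ", ".join(
        f"`{c}` = VALUES(`{c}`)" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
```

A record would typically be checked with missing_fields before being passed, as a dictionary of parameters, to a statement produced by build_upsert_sql.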

Configurations

Configurations for the Keycloak Users Pipeline are provided in the keycloak_users_data_pipeline_config.yaml file. These configurations are designed for typical use cases and should generally be used as-is.

CODE
source:
  type: "api"

  # The API endpoint to fetch token against Keycloak
  # endpoint_token: "{KEYCLOAK_URL}/auth/realms/{REALM}/protocol/openid-connect/token"
  # The API endpoint to fetch the users assigned to a role from Keycloak
  # endpoint_token: "{KEYCLOAK_URL}/auth/admin/realms/{REALM}/roles/{role}/users"

  KEYCLOAK_URL: "https://studio-01.expertflow.com"
  REALM: "expertflow"
  ADMIN_USERNAME: "admin"
  ADMIN_PASSWORD: "admin"
  CLIENT_ID: "cim"
  CLIENT_SECRET: "ef61df80-061c-4c29-b9ac-387e6bf67052"

  queries:
    users:
      database:
      collection_name:
      filter: ["agent","evaluator","quality-manager","supervisor"]
      replication_key: "createdTimestamp"
      transformation: "transform_users_data"
      num_batches: 50
      query_keys:

batch_size: 10000 # Max results fetched in API call

target:
  type: "mysql"
  db_url: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db"
  enable_ssl: true  # Enable or disable SSL connections
  ssl_ca: "/transflux/certificates/mysql_certs/ca.pem"
  ssl_cert: "/transflux/certificates/mysql_certs/client-cert.pem"
  ssl_key: "/transflux/certificates/mysql_certs/client-key.pem"
configdb:
  type: "mysql"
  db_url: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db"
  enable_ssl: true  # Enable or disable SSL connections
  ssl_ca: "/transflux/certificates/mysql_certs/ca.pem"
  ssl_cert: "/transflux/certificates/mysql_certs/client-cert.pem"
  ssl_key: "/transflux/certificates/mysql_certs/client-key.pem"
transformation:
schedule_interval: "*/15 * * * *"

Source Configuration

This section defines the API source settings for data extraction.

KEYCLOAK_URL is the only element that varies between Expertflow Data Platform deployments; it corresponds to the specific EFCX instance. The other configuration details remain consistent, are managed during EFCX deployment, and are accessible only through the Keycloak admin panel.

  • type: Specifies the data source type.
    Example: "api" indicates that API is the source.

  • KEYCLOAK_URL:
    The base URL (scheme and FQDN) where Keycloak is deployed.
    Example: "https://studio-01.expertflow.com"

  • REALM:
    The Keycloak realm dedicated to the deployment.
    Example: "expertflow"

  • The following fields are required to authenticate and generate the access token used to send the API requests that fetch data.

    • ADMIN_USERNAME:
      Example: "admin"

    • ADMIN_PASSWORD:
      Example: "admin"

    • CLIENT_ID:
      Example: "cim"

    • CLIENT_SECRET:
      Example: "ef61df80-061c-4c29-b9ac-387e6bf67052"

  • queries:
    A dictionary containing query configurations for different pipelines.

    • users: Configurations for the users pipeline.

      • filter: List of roles used to filter the fetched data. Example: ["agent","evaluator","quality-manager","supervisor"] (fetches all users whose role is in this list).

      • replication_key: Field used to track updates. Example: "createdTimestamp".

      • transformation: Transformation mapping function name. Example: "transform_users_data".

      • num_batches: Number of data batches. Example: 50.

      • query_keys: Reserved for gold queries (for loading data into the gold table if needed).
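
To show how the source settings above fit together, here is a minimal sketch that builds the token request and the per-role user URLs from the two endpoint templates given in the configuration comments. The function names are hypothetical, and the use of the OAuth2 password grant is an assumption based on the presence of the admin username, password, client ID, and client secret; the pipeline's real request logic is not shown in this document. Nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def token_request(cfg):
    """Return (url, form_body) for a token request against Keycloak.

    Assumes the password grant; `cfg` is the `source` section of the
    configuration as a dictionary.
    """
    url = (
        f"{cfg['KEYCLOAK_URL']}/auth/realms/{cfg['REALM']}"
        "/protocol/openid-connect/token"
    )
    body = urlencode(
        {
            "grant_type": "password",
            "client_id": cfg["CLIENT_ID"],
            "client_secret": cfg["CLIENT_SECRET"],
            "username": cfg["ADMIN_USERNAME"],
            "password": cfg["ADMIN_PASSWORD"],
        }
    )
    return url, body


def role_user_urls(cfg, roles):
    """Return one users-by-role URL for each role in the `filter` list."""
    base = f"{cfg['KEYCLOAK_URL']}/auth/admin/realms/{cfg['REALM']}/roles"
    return [f"{base}/{quote(role, safe='')}/users" for role in roles]


def newer_than(users, replication_key, last_value):
    """Keep only users whose replication key is greater than the last seen value."""
    return [u for u in users if u.get(replication_key, 0) > last_value]
```

With the example configuration, role_user_urls yields four URLs, one per role in the filter, and newer_than illustrates one way a replication key such as createdTimestamp could restrict a run to users created since the previous run.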

Batch Size

  • batch_size:
    The maximum number of results returned by a single API call.
    Example: 10000.
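
The role of batch_size can be illustrated with a small paging loop. This is a sketch under assumptions: fetch_page is a hypothetical callable standing in for the real API request (Keycloak's admin REST API accepts first and max query parameters for paging user listings), and treating num_batches as an upper bound on the number of pages is a guess, since the document does not define how the two settings interact.

```python
def fetch_all(fetch_page, batch_size, max_batches):
    """Collect results page by page until a short page or the batch cap.

    fetch_page(first, max_results) must return a list of at most
    max_results items starting at offset `first`.
    """
    results = []
    for batch in range(max_batches):
        page = fetch_page(batch * batch_size, batch_size)
        results.extend(page)
        # A page shorter than batch_size means there is nothing left to fetch.
        if len(page) < batch_size:
            break
    return results
```

In a test, fetch_page can be replaced by a function that slices an in-memory list, which makes the paging behavior easy to verify without a running Keycloak instance.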

Target Configuration

This section defines the target MySQL database settings for data loading.

  • type: Specifies the target database type.
    Example: "mysql".

  • db_url: Connection string for the target MySQL database.
    Format: "mysql+pymysql://<username>:<password>@<host>:<port>/<database>".
    Example: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db".

  • enable_ssl: Enables SSL communication with the MySQL database.
    Example: true.

  • SSL Configuration:

    • ssl_ca: Path to the CA certificate. Example: "/transflux/certificates/mysql_certs/ca.pem".

    • ssl_cert: Path to the client certificate. Example: "/transflux/certificates/mysql_certs/client-cert.pem".

    • ssl_key: Path to the client private key. Example: "/transflux/certificates/mysql_certs/client-key.pem".
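
The target settings above could be turned into connection arguments along these lines. This is a sketch, not the platform's own code: the function names are hypothetical. It relies on PyMySQL accepting an ssl dictionary with ca, cert, and key entries, which SQLAlchemy forwards to the driver through connect_args.

```python
from urllib.parse import urlsplit


def ssl_connect_args(target):
    """Build driver connection arguments from the `target` section.

    Returns an empty dict when enable_ssl is false or missing.
    """
    if not target.get("enable_ssl"):
        return {}
    return {
        "ssl": {
            "ca": target["ssl_ca"],
            "cert": target["ssl_cert"],
            "key": target["ssl_key"],
        }
    }


def describe_db_url(db_url):
    """Split a db_url into its parts, leaving the password out of the result."""
    parts = urlsplit(db_url)
    return {
        "driver": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# With SQLAlchemy installed, the engine could then be created as:
#   create_engine(target["db_url"], connect_args=ssl_connect_args(target))
```

Because the configdb section uses the same fields, the same two helpers would apply to it unchanged.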

Config Database

The configuration database (configdb) stores metadata and operational settings for the Data Platform.

  • Fields are identical to the target configuration.

Schedule Interval

  • schedule_interval:
    Cron expression defining the pipeline's schedule in Data Platform.
    Example: "*/15 * * * *" (runs every 15 minutes).
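
To make the cron example concrete, the sketch below expands the minute field of a five-field cron expression into the minutes of the hour at which it fires. It is a deliberately small illustration that handles only "*", "*/n", and comma-separated numbers; it is not a full cron parser, and the function names are hypothetical.

```python
def minute_field(expression):
    """Return the minute field of a standard five-field cron expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    return fields[0]


def minutes_for(field):
    """Expand a minute field ('*', '*/n', or 'a,b,c') into matching minutes."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        return list(range(0, 60, int(field[2:])))
    return sorted(int(m) for m in field.split(","))
```

For the example schedule, minutes_for(minute_field("*/15 * * * *")) gives 0, 15, 30, and 45, so the pipeline would start four times per hour.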
