Keycloak Users Pipeline
The Keycloak Users Pipeline manages and processes Keycloak user data. It extracts data from the source (API), applies the transformations needed to align the data with schema requirements, and loads it into the target database (MySQL). Key tasks include:
Validating data fields for completeness and accuracy.
Mapping user fields to the target schema.
Handling data updates and upserts to maintain data integrity.
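The three tasks above can be illustrated with a minimal sketch. It assumes records shaped like a Keycloak user representation (id, username, createdTimestamp, and so on). The required-field list, the target column names, and the helper names are hypothetical and are not the pipeline's actual schema or code.

```python
from typing import Any

# Hypothetical required fields and source-to-target column mapping.
REQUIRED_FIELDS = ("id", "username", "createdTimestamp")
FIELD_MAP = {
    "id": "user_id",
    "username": "username",
    "firstName": "first_name",
    "lastName": "last_name",
    "createdTimestamp": "created_timestamp",
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return the required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def map_record(record: dict[str, Any]) -> dict[str, Any]:
    """Rename source fields to target columns; absent fields become None."""
    return {target: record.get(source) for source, target in FIELD_MAP.items()}


def build_upsert(table: str, columns: list[str], key: str) -> str:
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement.

    Uses named placeholders in the pyformat style accepted by PyMySQL.
    """
    cols = ", ".join(columns)
    params = ", ".join(f"%({c})s" for c in columns)
    updates = ", ".join(f"{c} = VALUES({c})" for c in columns if c != key)
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({params}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
```

The upsert relies on a primary or unique key on the target table (here the hypothetical user_id column), so re-running the pipeline updates existing rows instead of duplicating them.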
Configurations
Configurations for the Keycloak Users Pipeline are provided in the keycloak_users_data_pipeline_config.yaml file, in YAML format, to ensure flexibility and adaptability. These configurations are designed for typical use cases, and using them as-is is recommended.
source:
  type: "api"
  # The API endpoint to fetch a token from Keycloak
  # endpoint_token: "{KEYCLOAK_URL}/auth/realms/{REALM}/protocol/openid-connect/token"
  # The API endpoint to fetch the users of a role from Keycloak
  # endpoint_token: "{KEYCLOAK_URL}/auth/admin/realms/{REALM}/roles/{role}/users"
  KEYCLOAK_URL: "https://studio-01.expertflow.com"
  REALM: "expertflow"
  ADMIN_USERNAME: "admin"
  ADMIN_PASSWORD: "admin"
  CLIENT_ID: "cim"
  CLIENT_SECRET: "ef61df80-061c-4c29-b9ac-387e6bf67052"
  queries:
    users:
      database:
      collection_name:
      filter: ["agent","evaluator","quality-manager","supervisor"]
      replication_key: "createdTimestamp"
      transformation: "transform_users_data"
      num_batches: 50
      query_keys:
batch_size: 10000 # Max results fetched in API call
target:
  type: "mysql"
  db_url: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db"
  enable_ssl: true # Enable or disable SSL connections
  ssl_ca: "/transflux/certificates/mysql_certs/ca.pem"
  ssl_cert: "/transflux/certificates/mysql_certs/client-cert.pem"
  ssl_key: "/transflux/certificates/mysql_certs/client-key.pem"
configdb:
  type: "mysql"
  db_url: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db"
  enable_ssl: true # Enable or disable SSL connections
  ssl_ca: "/transflux/certificates/mysql_certs/ca.pem"
  ssl_cert: "/transflux/certificates/mysql_certs/client-cert.pem"
  ssl_key: "/transflux/certificates/mysql_certs/client-key.pem"
transformation:
schedule_interval: "*/15 * * * *"
Source Configuration
This section defines the API source settings for data extraction.
KEYCLOAK_URL is the only element that varies with each Expertflow Data Platform deployment, because it corresponds to the specific EFCX instance. The other configuration details remain consistent, are managed during EFCX deployment, and are accessible only via the Keycloak admin panel.
type: Specifies the data source type. Example: "api" indicates that an API is the source.
KEYCLOAK_URL: The FQDN where Keycloak is deployed. Example: "https://studio-01.expertflow.com"
REALM: Realm of the dedicated Keycloak instance. Example: "expertflow"
The following fields are required to authenticate and to generate the access token used when sending API requests to fetch data.
ADMIN_USERNAME: Keycloak admin username. Example: "admin"
ADMIN_PASSWORD: Keycloak admin password. Example: "admin"
CLIENT_ID: Keycloak client ID. Example: "cim"
CLIENT_SECRET: Secret of the Keycloak client. Example: "ef61df80-061c-4c29-b9ac-387e6bf67052"
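As an illustration of how these fields could combine into a token request, here is a minimal sketch. The endpoint template is the one shown commented out in the configuration. The use of the OAuth2 password grant and the function name build_token_request are assumptions, since the pipeline's actual authentication code is not shown here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint template taken from the commented-out line in the configuration.
TOKEN_PATH = "{KEYCLOAK_URL}/auth/realms/{REALM}/protocol/openid-connect/token"


def build_token_request(source: dict) -> Request:
    """Build (but do not send) a POST request for an access token."""
    url = TOKEN_PATH.format(
        KEYCLOAK_URL=source["KEYCLOAK_URL"].rstrip("/"),
        REALM=source["REALM"],
    )
    # Assumption: OAuth2 password grant with a confidential client.
    form = {
        "grant_type": "password",
        "client_id": source["CLIENT_ID"],
        "client_secret": source["CLIENT_SECRET"],
        "username": source["ADMIN_USERNAME"],
        "password": source["ADMIN_PASSWORD"],
    }
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and reading the access_token field of the JSON response would then yield the bearer token for subsequent API calls.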
queries: A dictionary containing query configurations for different pipelines.
  users: Configurations for the users pipeline.
    filter: Query filter applied when fetching data. Example: ["agent","evaluator","quality-manager","supervisor"] (fetches all users whose role is in this list).
    replication_key: Field used to track updates. Example: "createdTimestamp".
    transformation: Name of the transformation mapping function. Example: "transform_users_data".
    num_batches: Number of data batches. Example: 50.
    query_keys: Reserved for gold queries (for loading data into the gold table if needed).
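To show how filter and replication_key might work together, the sketch below merges the users fetched for each role, removes duplicates (one user can hold several roles), and keeps only records newer than the last processed value. Treating replication_key as a high-water mark is an assumption about the pipeline's behavior, and both function names are hypothetical.

```python
def select_new_users(
    users_by_role: dict[str, list[dict]],
    replication_key: str,
    last_value: int,
) -> list[dict]:
    """Merge per-role results, drop duplicates, keep records newer than last_value."""
    unique: dict[str, dict] = {}
    for users in users_by_role.values():
        for user in users:
            value = user.get(replication_key)
            # Records without the replication key cannot be ordered; skip them.
            if value is not None and value > last_value:
                unique.setdefault(user["id"], user)
    return sorted(unique.values(), key=lambda u: u[replication_key])


def next_bookmark(users: list[dict], replication_key: str, last_value: int) -> int:
    """Return the highest replication-key value seen, or last_value if none."""
    return max((u[replication_key] for u in users), default=last_value)
```

Storing the value returned by next_bookmark after each successful run lets the following run pass it back as last_value and skip users that were already loaded.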
Batch Size
batch_size: Corresponds to the maximum number of results returned from an API call. Example: 10000.
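A paged fetch driven by batch_size could look like the following sketch. It builds the role-users URL from the endpoint template shown commented out in the configuration and pages through results with the first and max query parameters of the Keycloak Admin REST API. Mapping batch_size to max is inferred from the comment in the configuration, and fetch_page is a hypothetical callable that performs the authenticated GET and returns the decoded JSON list.

```python
from typing import Callable, Iterator
from urllib.parse import quote, urlencode

# Endpoint template taken from the commented-out line in the configuration.
USERS_PATH = "{KEYCLOAK_URL}/auth/admin/realms/{REALM}/roles/{role}/users"


def role_users_url(source: dict, role: str) -> str:
    """Build the URL that lists the users holding the given role."""
    return USERS_PATH.format(
        KEYCLOAK_URL=source["KEYCLOAK_URL"].rstrip("/"),
        REALM=quote(source["REALM"], safe=""),
        role=quote(role, safe=""),
    )


def iter_role_users(
    fetch_page: Callable[[str], list[dict]],
    base_url: str,
    batch_size: int,
) -> Iterator[dict]:
    """Yield every user of a role, requesting batch_size results per call."""
    first = 0
    while True:
        query = urlencode({"first": first, "max": batch_size})
        page = fetch_page(f"{base_url}?{query}")
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < batch_size:
            break
        first += batch_size
```

Injecting fetch_page keeps the paging logic independent of the HTTP client and makes it easy to exercise without a live Keycloak instance.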
Target Configuration
This section defines the target MySQL database settings for data loading.
type: Specifies the target database type. Example: "mysql".
db_url: Connection string for the target MySQL database. Format: "mysql+pymysql://<username>:<password>@<host>:<port>/<database>". Example: "mysql+pymysql://elonmusk:68i3nj7t@192.168.1.182:3306/forms_db".
enable_ssl: Enables SSL communication with the MySQL database. Example: true.
SSL Configuration:
ssl_ca: Path to the CA certificate. Example: "/transflux/certificates/mysql_certs/ca.pem".
ssl_cert: Path to the client certificate. Example: "/transflux/certificates/mysql_certs/client-cert.pem".
ssl_key: Path to the client private key. Example: "/transflux/certificates/mysql_certs/client-key.pem".
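The target settings above could be turned into connection parameters roughly as follows. This is a sketch using only the standard library. The helper names are hypothetical, and the ssl dictionary layout (ca, cert, key) follows the form that SQLAlchemy documents for PyMySQL connections rather than anything confirmed about the pipeline's own code.

```python
from urllib.parse import urlsplit


def describe_db_url(db_url: str) -> dict:
    """Split a db_url into its parts (password omitted) for logging or checks."""
    parts = urlsplit(db_url)
    return {
        "driver": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def build_connect_args(target: dict) -> dict:
    """Return driver connect arguments; empty when SSL is disabled."""
    if not target.get("enable_ssl"):
        return {}
    return {
        "ssl": {
            "ca": target["ssl_ca"],
            "cert": target["ssl_cert"],
            "key": target["ssl_key"],
        }
    }
```

With SQLAlchemy installed, the result can be passed as create_engine(target["db_url"], connect_args=build_connect_args(target)). Special characters in the password must be URL-encoded inside db_url.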
Config Database
The configuration database (configdb) stores metadata and operational settings for the Data Platform.
Its fields are identical to those of the target configuration.
Schedule Interval
schedule_interval: Cron expression defining the pipeline's schedule in the Data Platform. Example: "*/15 * * * *" (runs every 15 minutes).
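To make the cron example concrete, the sketch below expands the minute field of a five-field cron expression into the minutes of each hour at which a run would start. It handles only "*", "*/N", and comma-separated numbers, which is a deliberately small subset of cron syntax, and both function names are hypothetical.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field such as "*/15", "0", or "5,35" into minutes."""
    minutes: set[int] = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            minutes.update(range(0, 60, int(part[2:])))
        else:
            minutes.add(int(part))
    return sorted(minutes)


def run_minutes(schedule_interval: str) -> list[int]:
    """Return the minutes of the hour at which the schedule fires."""
    fields = schedule_interval.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    return expand_minute_field(fields[0])
```

For the example value "*/15 * * * *", run_minutes returns [0, 15, 30, 45], matching the description of a run every 15 minutes.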