Setting up Daily Pipeline Execution
You must have downloaded and installed the relevant package containing the pipeline you want to use before setting up daily pipeline execution.
1. Create the daily gen directory
Create a new directory to contain all of the files needed to set up daily pipeline execution for data generation.
mkdir daily-gen
Copy the files needed to set up the cronjob into your daily-gen directory. There are three YAML files that configure the cronjob, namely gen-inspect-pod.yaml, gen-pvc.yaml and generation-cronjob.yaml, along with a pipeline code file. The pipeline code file depends on the pipeline you are setting up daily execution for. All of these files are in the accelerators package you downloaded: the three YAML files are in its config directory and the pipeline code file is in its pipeline-spec directory. For bar generation the files are:
- bargeneration.q
- gen-inspect-pod.yaml
- gen-pvc.yaml
- generation-cronjob.yaml
For equity analytics generation the files are:
- eqeagentca.q
- gen-inspect-pod.yaml
- gen-pvc.yaml
- generation-cronjob.yaml
For the bar generation example, the command is:
cp IceFixedIncome/pipeline-spec/bargeneration.q IceFixedIncome/config/gen-inspect-pod.yaml IceFixedIncome/config/gen-pvc.yaml IceFixedIncome/config/generation-cronjob.yaml daily-gen
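If you want to confirm the copy before moving on, the sketch below checks that each expected file is present in daily-gen. check_gen_files is a hypothetical helper written for this guide, not part of the accelerators package.

```shell
# Hypothetical helper: report any expected file that is missing from a directory.
# Returns 0 if every file is present, 1 otherwise.
check_gen_files() {
  local dir="$1"
  shift
  local f missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage for bar generation (use eqeagentca.q for equity analytics generation):
# check_gen_files daily-gen bargeneration.q gen-inspect-pod.yaml gen-pvc.yaml generation-cronjob.yaml
```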
Edit the pipeline code file
Edit the assemblyName variable in your copy of the pipeline code file in daily-gen (e.g. bargeneration.q, eqeagentca.q) so that it holds the name of your already running assembly. For the bar generation pipeline, for example, this is the fsi-app-ice-fi assembly.
// TODO Update the name of your assembly below within the quotation marks instead of the placeholder `$"ENTER_YOUR_ASSEMBLY_NAME_HERE"
assemblyName:`$"ENTER_YOUR_ASSEMBLY_NAME_HERE"
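Instead of editing the file by hand, you can substitute the placeholder with sed. This is a minimal sketch: set_assembly_name is a hypothetical helper, it only changes the line containing assemblyName:, and it assumes the assembly name contains no / characters (Kubernetes resource names cannot contain them).

```shell
# Hypothetical helper: replace the assembly-name placeholder on the assemblyName: line.
# Writes to a temporary file and moves it back, so it does not depend on GNU or BSD sed -i.
set_assembly_name() {
  local file="$1" name="$2"
  sed "/assemblyName:/s/ENTER_YOUR_ASSEMBLY_NAME_HERE/${name}/" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}

# Example for the bar generation pipeline:
# set_assembly_name daily-gen/bargeneration.q fsi-app-ice-fi
```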
Note: you may need the pipeline code file again if you set up manual pipeline execution.
2. Create run-generation-cronjob.sh
Create a bash script called run-generation-cronjob.sh, which runs inside the cronjob process. The script tears down any existing pipeline of the same name and then submits a REST POST request that brings up a new pipeline from the pipeline code file.
vi run-generation-cronjob.sh
Insert the code below into the script, replacing <your-assembly> with the name of your already running assembly (for the bar generation pipeline this is the fsi-app-ice-fi assembly) and <your-file> with the name of your pipeline code file (e.g. bargeneration.q, eqeagentca.q).
#!/bin/bash
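## Environment variables read by this script: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
## CLIENT_NAME, CLIENT_SECRET and INSIGHTS_HOSTNAME (set on the cronjob pod, see step 4)
## Fall back to CLIENT_ID if the client name is exposed under that variable instead
CLIENT_NAME="${CLIENT_NAME:-$CLIENT_ID}"
## TODO: change ASM_NAME variable to the name of your assembly
ASM_NAME="<your-assembly>"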
## Install the libraries needed to run the POST request
apt-get update -o=dir::cache=/mnt/gen/packages/
mkdir -p /mnt/gen/packages/archives/partial
apt-get install -f -y -o=dir::cache=/mnt/gen/packages/ git vim jq curl
## Optional alias for an AWS CLI kept on the volume (this script does not call aws itself)
echo "alias aws='/mnt/gen/packages/bin/aws'" >> ~/.bashrc
cd /mnt/gen/
################ FUNCTIONS ######################
logMsg(){
if [ -z "$*" ]; then
echo "No message provided"
return 1
fi
echo "$(date -u +'%Y-%m-%dT%H:%M:%S.%3NZ') ## $*"
}
renewToken(){
logMsg "Renewing keycloak token"
curl -s --header "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials&client_id=$CLIENT_NAME&client_secret=$CLIENT_SECRET" \
"$INSIGHTS_HOSTNAME/auth/realms/insights/protocol/openid-connect/token" \
| jq -r .access_token > token
}
getPipelineStatus(){
## arg1: pipeline name without the insights- prefix, e.g. generation
renewToken
curl -s -S -X GET -H "Authorization: Bearer $(cat token)" "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/status/insights-$1"
}
teardown(){
## arg1: pipeline name without the insights- prefix, e.g. generation
renewToken
curl -s -S -X POST -H "Authorization: Bearer $(cat token)" "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/teardown/insights-$1?clearCheckpoints=true"
}
runPipeline(){
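export PIPELINE_NAME="generation"
## TODO: replace <your-file> with the name of your pipeline code file
export SPEC_FILE="/mnt/gen/<your-file>"
logMsg "Creating pipeline $PIPELINE_NAME from $SPEC_FILE"
# The token needs to be renewed before running the pipeline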
renewToken
## Teardown pipeline if it already exists
teardown $PIPELINE_NAME
logMsg "Waiting for pipeline to teardown"
sleep 10
# run request
curl -s -S -X POST "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/create" \
-H "Authorization: Bearer $(cat token)" \
-d "$(jq -n --arg spec "$(cat "$SPEC_FILE")" --arg aws_access_key_id "$AWS_ACCESS_KEY_ID" \
--arg aws_secret_access_key "$AWS_SECRET_ACCESS_KEY" \
--arg pipeline_name "$PIPELINE_NAME" \
--arg asm_name "$ASM_NAME" \
--arg configmap_name "$ASM_NAME-assembly-configmap" \
'{
name : $pipeline_name,
type : "spec",
config : { content: $spec },
settings : {
minWorkers: "1",
maxWorkers: "1"
},
env : {
KXI_SP_BETA_FEATURES: "true",
ASM_NAME: $asm_name,
AWS_REGION: "eu-west-1",
AWS_ACCESS_KEY_ID: $aws_access_key_id,
AWS_SECRET_ACCESS_KEY: $aws_secret_access_key,
KX_KURL_DEBUG_LOG: "1",
KXI_SP_DIRECT_WRITE_ASSEMBLY: $asm_name,
KX_TRACE_S3: "1"
},
kubeConfig : {
configMaps: $configmap_name
}
}' | jq -asR .)"
}
runPipeline
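Before copying the script to the volume, a quick pre-flight check can catch a forgotten placeholder or a syntax error. The sketch below uses a hypothetical helper, preflight_check, that greps for the <your-assembly> and <your-file> placeholders and then runs bash -n, which parses a script without executing it.

```shell
# Hypothetical pre-flight check for run-generation-cronjob.sh.
# Fails if a <your-...> placeholder is left in the file or if bash cannot parse it.
preflight_check() {
  local script="$1"
  # grep -n prints any offending lines together with their line numbers
  if grep -nE '<your-(assembly|file)>' "$script"; then
    echo "placeholder still present in $script" >&2
    return 1
  fi
  # bash -n reads and parses the script without running any commands
  bash -n "$script"
}

# Usage:
# preflight_check run-generation-cronjob.sh
```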
3. Create a PVC to mount your volume and copy files to it
Create a PersistentVolumeClaim (PVC) that will be mounted by your generation process:
kubectl create -f daily-gen/gen-pvc.yaml
Create a pod that mounts the volume, so that you can copy files into the mount path:
kubectl apply -f daily-gen/gen-inspect-pod.yaml
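kubectl apply returns before the pod is running, so a kubectl cp issued immediately afterwards can fail. One way to avoid this is kubectl wait with the Ready condition, sketched below in a hypothetical wrapper function; the 120s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the inspection pod reports Ready.
# $1 = namespace, $2 = optional timeout (defaults to 120s)
wait_for_inspect_pod() {
  local ns="$1" timeout="${2:-120s}"
  kubectl wait --for=condition=Ready pod/pod-inspect-gen-pvc -n "$ns" --timeout="$timeout"
}

# Usage:
# wait_for_inspect_pod {your-namespace}
```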
Then copy the run-generation-cronjob.sh script and the pipeline code file (e.g. bargeneration.q, eqeagentca.q) to your volume, which should be mounted at /mnt/gen. Replace {your-namespace} with the name of your namespace within your Kubernetes cluster and {your-file} with the name of your pipeline code file.
kubectl cp run-generation-cronjob.sh {your-namespace}/pod-inspect-gen-pvc:/mnt/gen/run-generation-cronjob.sh
kubectl cp daily-gen/{your-file} {your-namespace}/pod-inspect-gen-pvc:/mnt/gen/{your-file}
Note: you may have to update the permissions of the run-generation-cronjob.sh script once it is on the volume so that it is executable. You can do this by running chmod in the inspection pod using kubectl exec. For example:
kubectl exec -it -n {your-namespace} pod-inspect-gen-pvc -- chmod 755 /mnt/gen/run-generation-cronjob.sh
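To confirm that both files reached the volume and that the script is executable, you can run a couple of checks through kubectl exec before deleting the pod. This sketch assumes the inspection pod's image provides ls and test; verify_gen_volume is a hypothetical name. kubectl exec returns the exit code of the remote command, which is what makes the checks usable.

```shell
# Hypothetical helper: list /mnt/gen and check the expected files are in place.
# $1 = namespace, $2 = pipeline code file name (e.g. bargeneration.q)
verify_gen_volume() {
  local ns="$1" spec_file="$2"
  kubectl exec -n "$ns" pod-inspect-gen-pvc -- ls -l /mnt/gen
  # test -x succeeds only if the script exists and is executable;
  # test -f succeeds only if the pipeline code file exists
  kubectl exec -n "$ns" pod-inspect-gen-pvc -- test -x /mnt/gen/run-generation-cronjob.sh \
    && kubectl exec -n "$ns" pod-inspect-gen-pvc -- test -f "/mnt/gen/$spec_file"
}

# Usage:
# verify_gen_volume {your-namespace} bargeneration.q
```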
Once you have copied your files over, delete the pod you used to mount your PVC, because your cronjob does not run while another process has the volume mounted.
kubectl delete pod pod-inspect-gen-pvc -n {your-namespace}
4. Set up the cronjob to run generation daily
Review the cronjob file generation-cronjob.yaml. The time it runs is set by the schedule field in the spec section of the file.
By default it is defined as schedule: "00 02 * * *", which means it runs at 02:00 every day. This is 2am UTC provided your cluster interprets CronJob schedules in UTC: Kubernetes uses the kube-controller-manager's local time zone unless spec.timeZone is set. See https://crontab.guru/ for how to configure it to run at different times of the day.
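If you prefer to script the change rather than edit the YAML by hand, the sketch below rewrites the schedule line with sed before you apply the file. set_cron_schedule is a hypothetical helper that assumes the manifest has exactly one schedule: key; it uses | as the sed delimiter because cron steps such as */30 contain slashes.

```shell
# Hypothetical helper: replace the value of the schedule: key in a CronJob manifest,
# keeping the original indentation. Writes via a temporary file for portability.
set_cron_schedule() {
  local file="$1" schedule="$2"
  sed -E "s|^([[:space:]]*schedule:[[:space:]]*).*|\1\"${schedule}\"|" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}

# Example: run at 05:30 instead of 02:00
# set_cron_schedule daily-gen/generation-cronjob.yaml "30 05 * * *"
```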
Review the secrets that are used within the generation-cronjob.yaml file. The fields that you will need are:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- CLIENT_ID
- INSIGHTS_HOSTNAME
- CLIENT_SECRET
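The generation-cronjob.yaml file reads these values from Kubernetes secrets. Adjust the secrets referred to in this file to match the secrets in your namespace and cluster. Note that run-generation-cronjob.sh uses the client ID through the CLIENT_NAME environment variable, so check how generation-cronjob.yaml maps the CLIENT_ID field into the pod's environment.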
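If the secrets do not exist yet, one option is a single generic secret created from an env file of KEY=VALUE lines. In this sketch the secret name gen-secrets and the env file are assumptions: the secret name and key names must match what generation-cronjob.yaml actually references, so adjust either the env file or the YAML accordingly. Using --from-env-file also keeps the values out of your shell history.

```shell
# Hypothetical helpers: validate an env file, then create a secret from it.
# The key names mirror the fields listed above; change them if your
# generation-cronjob.yaml references different keys.
REQUIRED_GEN_KEYS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY CLIENT_ID INSIGHTS_HOSTNAME CLIENT_SECRET"

check_env_file() {
  local env_file="$1" key missing=0
  for key in $REQUIRED_GEN_KEYS; do
    # Each key must appear at the start of a line as KEY=...
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# $1 = namespace, $2 = env file, $3 = secret name (defaults to the hypothetical gen-secrets)
create_gen_secret() {
  local ns="$1" env_file="$2" name="${3:-gen-secrets}"
  check_env_file "$env_file" || return 1
  kubectl create secret generic "$name" -n "$ns" --from-env-file="$env_file"
}

# Usage:
# create_gen_secret {your-namespace} gen-secrets.env
```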
To set up the cronjob, run:
kubectl apply -f daily-gen/generation-cronjob.yaml
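To test the setup without waiting for the next scheduled run, you can start a one-off Job from the CronJob and follow its logs. The CronJob name comes from metadata.name in generation-cronjob.yaml, which this guide does not show, so it is passed in as an argument; run_generation_now and the gen-manual- job prefix are hypothetical.

```shell
# Hypothetical helper: create a one-off Job from the CronJob and stream its logs.
# $1 = namespace, $2 = CronJob name (metadata.name in generation-cronjob.yaml)
run_generation_now() {
  local ns="$1" cronjob="$2" job
  # A timestamp suffix keeps repeated manual runs from colliding
  job="gen-manual-$(date +%s)"
  kubectl create job "$job" --from="cronjob/$cronjob" -n "$ns" \
    && kubectl logs -f "job/$job" -n "$ns" --pod-running-timeout=2m
}

# Usage:
# run_generation_now {your-namespace} <your-cronjob-name>
```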
This runs the run-generation-cronjob.sh script within a pod on the configured schedule. The script installs the libraries it needs (git, vim, jq and curl) and then sends the POST request that brings up the pipeline.
The pipeline name is prefixed with insights-, e.g. insights-generation.
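After a run, you can check the pipeline from your own machine using the same token and status endpoints that run-generation-cronjob.sh calls. This sketch assumes curl and jq are installed and that INSIGHTS_HOSTNAME, CLIENT_NAME and CLIENT_SECRET are set in your shell; pipeline_status is a hypothetical helper that defaults to the generation pipeline name.

```shell
# Hypothetical helper: fetch a Keycloak token and query the pipeline status,
# mirroring renewToken and getPipelineStatus in run-generation-cronjob.sh.
# $1 = pipeline name without the insights- prefix (defaults to generation)
pipeline_status() {
  local name="${1:-generation}" token
  token=$(curl -s --header "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=client_credentials&client_id=$CLIENT_NAME&client_secret=$CLIENT_SECRET" \
    "$INSIGHTS_HOSTNAME/auth/realms/insights/protocol/openid-connect/token" \
    | jq -r .access_token)
  # jq -r prints "null" when the response has no access_token field
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "could not obtain a token" >&2
    return 1
  fi
  curl -s -S -H "Authorization: Bearer $token" \
    "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/status/insights-$name"
}

# Usage:
# pipeline_status
```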