
Setting up Daily Pipeline Execution

Before setting up daily pipeline execution, download and install the package that contains the pipeline you want to use.

1. Create the daily gen directory

Create a new directory to contain all of the files needed to set up daily pipeline execution for data generation.

mkdir daily-gen

Move the files needed to set up the cronjob into your daily-gen directory. There are three YAML files that configure the cronjob (gen-inspect-pod.yaml, gen-pvc.yaml and generation-cronjob.yaml) and one pipeline code file, which depends on the pipeline you are setting up daily execution for. All of these files are in the accelerators package you downloaded: the three YAML files in the config directory and the pipeline code file in the pipeline-spec directory.

For bar generation the files are:

  • bargeneration.q
  • gen-inspect-pod.yaml
  • gen-pvc.yaml
  • generation-cronjob.yaml

For equity analytics generation the files are:

  • eqeagentca.q
  • gen-inspect-pod.yaml
  • gen-pvc.yaml
  • generation-cronjob.yaml

For the bar generation example, the command to copy the files is:

cp IceFixedIncome/pipeline-spec/bargeneration.q IceFixedIncome/config/gen-inspect-pod.yaml IceFixedIncome/config/gen-pvc.yaml IceFixedIncome/config/generation-cronjob.yaml daily-gen
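Before moving on, it can help to confirm that all four files actually landed in daily-gen. The sketch below is only an illustration: the function name missingGenFiles is hypothetical and not part of the accelerators package, and the file names are the ones listed above.

```shell
# Print the name of every expected file that is missing from the directory.
# Usage: missingGenFiles <dir> <pipeline-code-file>
missingGenFiles(){
    local dir="$1" pipelineFile="$2" f
    for f in "$pipelineFile" gen-inspect-pod.yaml gen-pvc.yaml generation-cronjob.yaml; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example for bar generation (prints nothing when all four files are present):
# missingGenFiles daily-gen bargeneration.q
```

For equity analytics generation, pass eqeagentca.q as the second argument instead.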

Edit the pipeline code file

Edit the assemblyName variable in the pipeline code file (e.g. bargeneration.q or eqeagentca.q) so that it holds the name of your already running assembly. For the bargeneration pipeline, for example, this is the fsi-app-ice-fi assembly.

// TODO Update the name of your assembly below within the quotation marks instead of the placeholder `$"ENTER_YOUR_ASSEMBLY_NAME_HERE"
assemblyName:`$"ENTER_YOUR_ASSEMBLY_NAME_HERE"
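If you prefer to script this edit rather than make it by hand, something like the following could work. It is a sketch under assumptions: setAssemblyName is a hypothetical helper, and it assumes the assignment sits on a single line that starts with assemblyName:, as in the snippet above. Only that line is rewritten, so the TODO comment is left untouched.

```shell
# Rewrite the assemblyName line of a pipeline code file in place.
# Usage: setAssemblyName <file> <assembly-name>
setAssemblyName(){
    local file="$1" name="$2" tmp
    tmp="$(mktemp)" || return 1
    # Replace only the line that starts with "assemblyName:"; copy all others unchanged.
    awk -v name="$name" '
        /^assemblyName:/ { print "assemblyName:`$\"" name "\""; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return $rc
}

# Example:
# setAssemblyName daily-gen/bargeneration.q fsi-app-ice-fi
```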

Note: keep the pipeline code file, as you may need it again if you set up manual pipeline execution.

2. Create run-generation-cronjob.sh

Create a bash script called run-generation-cronjob.sh that will run within the cronjob process. The script submits a REST POST request that brings up a pipeline from the pipeline code file.

vi run-generation-cronjob.sh

Insert the code below into the script, replacing <your-assembly> with the name of your already running assembly (for the bargeneration pipeline, for example, this is the fsi-app-ice-fi assembly) and <your-file> with the name of your pipeline code file (e.g. bargeneration.q or eqeagentca.q).

#!/bin/bash

## Env variables needed before running the script
## TODO: change ASM_NAME variable to the name of your assembly
ASM_NAME="<your-assembly>"

## Install the libraries needed to run the POST request
apt-get update -o=dir::cache=/mnt/gen/packages/
mkdir -p /mnt/gen/packages/archives/partial
apt-get install -f -y -o=dir::cache=/mnt/gen/packages/ git vim jq curl
echo "alias aws='/mnt/gen/packages/bin/aws'" >> ~/.bashrc
cd /mnt/gen/

################ FUNCTIONS ######################

logMsg(){
  if [ -z "$*" ]; then
    echo "No message provided"
    return 1
  fi
  echo "$(date -u +'%Y-%m-%dT%H:%M:%S.%3NZ') ## $*"
}

renewToken(){
    logMsg "Renewing keycloak token"
    curl -s --header "Content-Type: application/x-www-form-urlencoded" \
         -d "grant_type=client_credentials&client_id=$CLIENT_NAME&client_secret=$CLIENT_SECRET" \
         "$INSIGHTS_HOSTNAME/auth/realms/insights/protocol/openid-connect/token" \
         | jq -r .access_token > token
}

getPipelineStatus(){
    ## arg1 pipeline name without the insights- prefix (defaults to generation)
    renewToken
    curl -s -S -X GET -H "Authorization: Bearer $(cat token)" "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/status/insights-${1:-generation}"
}

teardown(){
    ## arg1 pipeline name without the insights- prefix (defaults to generation)
    renewToken
    curl -s -S -X POST -H "Authorization: Bearer $(cat token)" "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/teardown/insights-${1:-generation}?clearCheckpoints=true"
}

runPipeline(){
    export PIPELINE_NAME="generation"
    export SPEC_FILE="/mnt/gen/<your-file>"

    echo "$SPEC_FILE" "$PIPELINE_NAME"

    # Token needs to be renewed before running the pipeline
    renewToken

    ## Teardown pipeline if it already exists
    teardown "$PIPELINE_NAME"
    logMsg "Waiting for pipeline to teardown"
    sleep 10

    # run request (variables are quoted so that values containing spaces or special characters survive)
    curl -s -S -X POST "$INSIGHTS_HOSTNAME/streamprocessor/pipeline/create" \
        -H "Authorization: Bearer $(cat token)" \
        -d "$(jq -n  --arg spec "$(cat "$SPEC_FILE")" --arg aws_access_key_id "$AWS_ACCESS_KEY_ID" \
        --arg aws_secret_access_key "$AWS_SECRET_ACCESS_KEY" \
        --arg pipeline_name "$PIPELINE_NAME" \
        --arg asm_name "$ASM_NAME" \
        --arg configmap_name "$ASM_NAME-assembly-configmap" \
        '{
            name     : $pipeline_name,
            type     : "spec",
            config   : { content: $spec },
            settings : {
                minWorkers: "1",
                maxWorkers: "1"
            },
            env      : {
                KXI_SP_BETA_FEATURES: "true",
                ASM_NAME: $asm_name,
                AWS_REGION: "eu-west-1",
                AWS_ACCESS_KEY_ID: $aws_access_key_id,
                AWS_SECRET_ACCESS_KEY: $aws_secret_access_key,
                KX_KURL_DEBUG_LOG: "1",
                KXI_SP_DIRECT_WRITE_ASSEMBLY: $asm_name,
                KX_TRACE_S3: "1"
            },
            kubeConfig  : {
                configMaps: $configmap_name
            }
        }' | jq -asR .)"

}

runPipeline
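The script above carries on even if the token request fails, and it waits a fixed 10 seconds for the teardown. Two small helpers could make it more defensive. This is a sketch, not part of the shipped script: tokenLooksValid and retryUntil are hypothetical names. The first relies on the fact that jq -r .access_token prints null when the response has no access_token field. The second is a generic retry loop. The format of the status endpoint's response is not described here, so the command that decides whether the teardown has finished is left for you to supply.

```shell
# Succeed only if the token file holds something other than an empty string or "null".
# Usage: tokenLooksValid [token-file]   (defaults to ./token, as written by renewToken)
tokenLooksValid(){
    local file="${1:-token}" value
    [ -s "$file" ] || return 1
    value="$(cat "$file")"
    [ -n "$value" ] && [ "$value" != "null" ]
}

# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retryUntil <attempts> <delay-seconds> <command> [args...]
retryUntil(){
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Inside runPipeline, for instance, renewToken could be followed by tokenLooksValid || { logMsg "Could not obtain a token"; exit 1; } so that the job fails early instead of sending requests with an unusable token.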

3. Create PVC to mount your volume and move files to the volume

Create a PVC that you will mount to your generation process.

kubectl create -f daily-gen/gen-pvc.yaml

Create a pod that mounts the volume so that you can copy files to the mount path.

kubectl apply -f daily-gen/gen-inspect-pod.yaml

Then copy the run-generation-cronjob.sh script and the pipeline code file (e.g. bargeneration.q or eqeagentca.q) to your volume, which is mounted at /mnt/gen. Replace {your-namespace} with the name of your namespace in your Kubernetes cluster and {your-file} with the name of your pipeline code file.

kubectl cp run-generation-cronjob.sh {your-namespace}/pod-inspect-gen-pvc:/mnt/gen
kubectl cp daily-gen/{your-file} {your-namespace}/pod-inspect-gen-pvc:/mnt/gen

Note: you may have to update the permissions of the run-generation-cronjob.sh script once it is on the volume. You can do this by running chmod on the PVC pod with kubectl exec. For example:

kubectl exec -it pod-inspect-gen-pvc -- chmod 777 /mnt/gen/run-generation-cronjob.sh

Once you have copied your files over, delete the pod that you used to mount your PVC. Your cronjob does not run if the volume is mounted by another process.

kubectl delete pod pod-inspect-gen-pvc
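The copy, chmod and delete steps above can be collected into one function. The sketch below makes a few assumptions: stageFiles and the KUBECTL override are hypothetical conveniences, the pod name pod-inspect-gen-pvc and the /mnt/gen path are taken from the commands above, and it is run from the directory that holds run-generation-cronjob.sh and daily-gen. It also adds a kubectl wait so that the copy does not start before the pod is ready, and passes the namespace explicitly to every command.

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands without running them.
KUBECTL="${KUBECTL:-kubectl}"
INSPECT_POD="pod-inspect-gen-pvc"

# Usage: stageFiles <namespace> <pipeline-code-file>
stageFiles(){
    local ns="$1" pipelineFile="$2"
    # Wait until the inspect pod is ready before copying anything to it.
    $KUBECTL wait --for=condition=Ready "pod/$INSPECT_POD" -n "$ns" --timeout=120s || return 1
    $KUBECTL cp run-generation-cronjob.sh "$ns/$INSPECT_POD:/mnt/gen" || return 1
    $KUBECTL cp "daily-gen/$pipelineFile" "$ns/$INSPECT_POD:/mnt/gen" || return 1
    $KUBECTL exec -n "$ns" "$INSPECT_POD" -- chmod 777 /mnt/gen/run-generation-cronjob.sh || return 1
    # Release the volume so that the cronjob can mount it.
    $KUBECTL delete pod "$INSPECT_POD" -n "$ns"
}

# Example:
# stageFiles my-namespace bargeneration.q
```

Running it once with KUBECTL=echo prints the five commands, which is a cheap way to check the namespace and file name before touching the cluster.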

4. Set up cronjob to have generation run daily

Review the cronjob file generation-cronjob.yaml. The time it is due to run can be found within the schedule section of the spec section of the file.

By default it is defined as schedule: "00 02 * * *", which means it runs at 02:00 UTC every day. Use https://crontab.guru/ to work out the expression for a different time of day.
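A cron schedule has five whitespace-separated fields: minute, hour, day of month, month and day of week. The short sketch below labels each field of an expression, which can be a useful check before editing generation-cronjob.yaml. The function name describeSchedule is hypothetical, and it only splits and labels the fields; it does not validate their values.

```shell
# Label the five fields of a cron schedule.
# Usage: describeSchedule "<minute> <hour> <day-of-month> <month> <day-of-week>"
describeSchedule(){
    local minute hour dom month dow extra
    # read splits on whitespace and does not expand "*" as a filename pattern.
    read -r minute hour dom month dow extra <<EOF
$1
EOF
    if [ -z "$dow" ] || [ -n "$extra" ]; then
        echo "expected exactly five fields" >&2
        return 1
    fi
    echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
}

describeSchedule "00 02 * * *"     # the default: 02:00 every day
describeSchedule "30 18 * * 1-5"   # 18:30, Monday to Friday
```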

Review the secrets that are used within the generation-cronjob.yaml file. The fields that you will need are:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • CLIENT_ID
  • INSIGHTS_HOSTNAME
  • CLIENT_SECRET

The generation-cronjob.yaml file reads these values from Kubernetes secrets. Adjust the secret names referenced in the file so that they match the secrets in your namespace and cluster.
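Because a missing secret only shows up when a request fails, a guard at the top of run-generation-cronjob.sh could report unset variables up front. This is a sketch: requireEnv is a hypothetical helper, and the variable names are passed in by the caller. Note that renewToken in the script reads $CLIENT_NAME while the list above names CLIENT_ID, so check which name your generation-cronjob.yaml actually exports and pass that one.

```shell
# Print the name of every listed variable that is unset or empty; fail if any are.
# Usage: requireEnv VAR_NAME...
requireEnv(){
    local name value missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names come from the script author).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, placed before runPipeline is called:
# requireEnv INSIGHTS_HOSTNAME CLIENT_SECRET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || exit 1
```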

To schedule the cronjob, run:

kubectl apply -f daily-gen/generation-cronjob.yaml

At the scheduled time, the cronjob runs the run-generation-cronjob.sh script within a pod. The script installs the libraries it needs and then sends the POST request that brings up the pipeline.

The pipeline name is prefixed with insights-, e.g. insights-generation.
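To check the setup without waiting for the scheduled time, a one-off job can be created from the cronjob. The sketch below assumes the cronjob's name, which is the metadata.name value in generation-cronjob.yaml and is not shown here, so it is passed as an argument. runCronjobNow and the KUBECTL override are hypothetical conveniences.

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command without running it.
KUBECTL="${KUBECTL:-kubectl}"

# Create a one-off job from an existing cronjob.
# Usage: runCronjobNow <namespace> <cronjob-name> [job-name]
runCronjobNow(){
    local ns="$1" cronjob="$2" job="${3:-$2-manual}"
    $KUBECTL create job "$job" --from="cronjob/$cronjob" -n "$ns"
}

# Example (replace the placeholders with your own values):
# runCronjobNow my-namespace my-cronjob
```

kubectl get cronjob -n {your-namespace} lists the cronjobs in the namespace if you need to look up the name, and kubectl logs -n {your-namespace} job/{job-name} shows the script's output once the job has started.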