Running RT using Kubernetes

Introduction

This section provides a guide to how Reliable Transport (RT) can be brought up in a Kubernetes cluster using a Helm chart and an accompanying Docker image.

The default deployment starts a 3-node RT cluster with a node affinity of hard, meaning the 3 pods are started on distinct nodes. This maximizes fault tolerance and ensures that if one of the 3 nodes goes down for any reason, RT can continue to operate.

It also includes the default settings for the required:

  • Volumes
  • Environment variables
  • Resources (memory and CPU)

For more information on how RT works see here.

A useful tool for inspecting and navigating the Kubernetes cluster is k9s.

Docker registry login

To be able to pull down the relevant image kxi-rt, you need to log into a docker registry.

docker login registry.dl.kx.com -u username -p password

Provide a license

A license for kdb+ Insights is required to run RT. Instructions on how to make use of a kdb+ Cloud Edition license within a Kubernetes cluster are documented below.

Download the chart

The chart can be found on an external registry. Assuming the appropriate access has been granted, the chart will be available for download.

  1. Ensure you have access to the appropriate repo for the charts.

    $ helm repo ls
    NAME            URL
    kx-insights     https://nexus.dl.kx.com/repository/kx-insights-charts
    
  2. If the appropriate repo is not available, you can obtain access as follows:

    $ helm repo add kx-insights https://nexus.dl.kx.com/repository/kx-insights-charts --username **** --password ****
    
    "kx-insights" has been added to your repositories
    
  3. You can now search for the chart and determine the appropriate chart and app version:

    $ helm search repo kx-insights/kxi-rt
    NAME                            CHART VERSION   APP VERSION     DESCRIPTION
    kx-insights/kxi-rt              1.2.3           1.2.3           A Helm chart for Kubernetes
    
  4. To download the chart and untar it into your local session, run the following:

    helm fetch kx-insights/kxi-rt --version 1.2.3 --untar
    

Configuration

The values file allows for custom configuration to be defined.

You can edit the top level fields inside of the values.yaml file as follows:

Application

logging:
  logLevel:             INFO
  qulogLevel:           INFO
  qulogLeader:          "1"

stream:               mystream

raft:
  heartbeat:          1000

Information on the significance of the stream name can be found here.

With .logging.logLevel and .logging.qulogLevel, you can control the level of logging generated by RT. RT is made up of several components; the latter variable, .logging.qulogLevel, controls the level of Raft logging. Whether or not the Raft leader is logged is controlled by .logging.qulogLeader.

With .raft.heartbeat, you can control, in milliseconds, the heartbeat interval between the RT pods in your cluster. A follower will time out and call an election if it doesn't receive a heartbeat from the leader for between 2*$RAFT_HEARTBEAT and 4*$RAFT_HEARTBEAT. The follower randomly chooses its heartbeat timeout between those lower and upper bounds.

Typically an election will take place in the following scenarios:

  • Upon an install of kxi-rt, once there are 2 nodes up and running
  • During network instability, if the gap between heartbeats exceeds the $RAFT_HEARTBEAT bounds above

Reducing this value makes elections faster but more frequent; that is, elections become more sensitive to network blips.
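With the default raft.heartbeat of 1000, the election timeout window works out as follows (a minimal shell sketch of the arithmetic above):

```shell
# Election timeout bounds derived from the configured heartbeat interval (ms)
RAFT_HEARTBEAT=1000                 # matches raft.heartbeat in values.yaml
LOWER=$((2 * RAFT_HEARTBEAT))       # earliest a follower may call an election
UPPER=$((4 * RAFT_HEARTBEAT))       # latest a follower may call an election
echo "election timeout window: ${LOWER}ms to ${UPPER}ms"
```

A follower picks its actual timeout at random inside this window, so a smaller heartbeat narrows the window and makes elections trigger sooner.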

Resources

resources:
  requests:
    memory: "1Gi"
    cpu: "1000m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

affinity: hard

persistence:
  capacity: 18Gi
  storageClass: ""

  • resources: you can control the amount of CPU and memory that your pods will consume. The values chosen should reflect the amount of data expected to be ingested. For estimates of these values, reach out to your KX sales representative, who can assist.
  • affinity: you can control how a pod is launched relative to other pods. The Kubernetes scheduler can place a pod either on a group of nodes or relative to the placement of other pods. To maximise fault tolerance, RT pods should be run on distinct nodes; therefore, if capacity is available, an affinity of hard should be configured.
  • persistence: you can control the type and size of the PVC that is provisioned in this section. When storageClass is left empty, as has been done above, the cloud provider's default storage class is used.

Archiver

archiver:
  time:         60
  disk:         90
  limit:        "5Gi"
  awsBackup:
    # Settings to backup log files to AWS S3
    enabled: "0"
    bucket: "mybucket"
    region: "us-east-2"
    keyPrefix: "mystream/"
    logLevel: INFO
    numThreads: 8
    parallelFiles: 2
  azureBackup:
    # Settings to backup log files to Azure blob storage
    enabled: "0"
    container: "mystream"
    logLevel: INFO
    threadsPerFile: 4
    parallelFiles: 2

Information on the RT archiver and the garbage collection policies can be found here.

AWS backup

The RT archiver supports backing up RT merged log files to AWS S3. In order to use this, AWS credentials must be provided with read and write access to the bucket. Typically this is done by creating an IAM role and adding this as a service account in the values.yaml:

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::0123456789012:role/my-role-irsa
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: my-service-account
  #
  # Specifies whether to auto-mount a service account
  #
  autoMount: true
  #

The awsBackup settings can then be configured as follows:

  • enabled: set to "1" to enable AWS backup. Default disabled.
  • bucket: the S3 bucket that the log files should be written to.
  • region: the AWS region where the bucket is hosted. Default "aws-global"
  • keyPrefix: The object key prefix in the bucket under which to backup the log files. If set this must end in a / such that all the log files are placed under a directory in the S3 bucket. For example with bucket: "mybucket" and keyPrefix: "mystream/" the log files would be backed up as s3://mybucket/mystream/log.0.0, s3://mybucket/mystream/log.0.1, etc. Default "".
  • logLevel: S3 backup logging level, one of NONE, FATAL, ERROR, WARN, INFO, DEBUG or TRACE. Default INFO.
  • numThreads: number of threads that the AWS backup service should use. Default 8.
  • parallelFiles: number of log files to back up in parallel. Default 2.
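As a quick sanity check of the key layout described above, the bucket and keyPrefix combine into object names as follows (values are the same illustrative ones used in the text):

```shell
# Object names produced when backing up merged log files to S3
bucket="mybucket"
keyPrefix="mystream/"   # must end in "/" so files land under a directory
for logfile in log.0.0 log.0.1; do
  echo "s3://${bucket}/${keyPrefix}${logfile}"
done
```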

Azure backup

The RT archiver supports backing up RT merged log files to Azure Blob Storage. In order to use this, Azure credentials must be provided with read and write access to the container. This can be done by setting the standard Azure environment variables in the values.yaml:

env:
  AZURE_STORAGE_CONNECTION_STRING: "<REDACTED>"
  AZURE_STORAGE_SERVICE_ENDPOINT: "<REDACTED>"
  AZURE_STORAGE_ACCOUNT: "<REDACTED>"
  AZURE_STORAGE_KEY: "<REDACTED>"
  AZURE_STORAGE_SAS_TOKEN: "<REDACTED>"

Credentials are determined in this order:

  1. AZURE_STORAGE_CONNECTION_STRING or

  2. (AZURE_STORAGE_SERVICE_ENDPOINT or AZURE_STORAGE_ACCOUNT) and (AZURE_STORAGE_KEY or AZURE_STORAGE_SAS_TOKEN)
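The resolution order above can be sketched as a shell check. This is a non-authoritative illustration (the function name resolve_azure_creds is ours, not part of RT); only the environment variable names come from the values file:

```shell
# Mirrors the documented Azure credential resolution order
resolve_azure_creds() {
  if [ -n "${AZURE_STORAGE_CONNECTION_STRING:-}" ]; then
    echo "connection-string"
  elif { [ -n "${AZURE_STORAGE_SERVICE_ENDPOINT:-}" ] || [ -n "${AZURE_STORAGE_ACCOUNT:-}" ]; } \
    && { [ -n "${AZURE_STORAGE_KEY:-}" ] || [ -n "${AZURE_STORAGE_SAS_TOKEN:-}" ]; }; then
    echo "account-plus-key-or-sas"
  else
    echo "none"
  fi
}

AZURE_STORAGE_CONNECTION_STRING="UseDevelopmentStorage=true"
resolve_azure_creds   # prints: connection-string
```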

The azureBackup settings can then be configured as follows:

  • enabled: set to "1" to enable Azure backup. Default disabled.
  • container: the Azure storage container that the log files should be written to.
  • logLevel: Azure backup logging level, one of NONE, FATAL, ERROR, WARN, INFO, DEBUG or TRACE. Default INFO.
  • threadsPerFile: number of threads that each file upload operation should use. Default 8.
  • parallelFiles: number of log files to back up in parallel. Default 2.

Deployment

Installing

When installing the helm chart, there are 2 inputs:

  • release name (for example kx)
  • chart name; in the example below this is kxi-rt, a static value that corresponds to the name of the chart

To install many instances of the chart, distinct release names should be used:

helm install <release.name> <chart.name> -n <namespace>
helm install kx kxi-rt -n <namespace>

You might find it useful to have a global settings file to configure entities such as the kdb+ license file.

The details below show how a Kubernetes secret can be created and added to the relevant namespace, before subsequently being referenced in the helm chart deploy.

Create secret and reference secret in global settings file:

base64 -w0 <path to kc.lic> > license.txt
kubectl create secret generic kxi-license --from-file=license=./license.txt
$ cat global_settings.yaml
global:
  license:
    secretName: kxi-license
    asFile: false
    onDemand: true
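Before installing, you can sanity-check the encoding locally; this sketch assumes your license file is at ./kc.lic (substitute your actual path):

```shell
# Round-trip the license through base64 to confirm the encoded copy is intact
base64 -w0 ./kc.lic > license.txt     # same encoding step as above
base64 -d license.txt > decoded.lic
cmp ./kc.lic decoded.lic && echo "license encoding OK"
```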

These global settings can then be used when installing the chart by using the -f argument:

helm install <release.name> <chart.name> -n <namespace>
helm install kx kxi-rt -n <namespace> -f global_settings.yaml

Upon installing the helm chart with the configuration values above, there will be 3 pods launched on distinct nodes. There will also be 3 distinct PVCs:

$ kubectl get pods -n <namespace>
NAME          READY   STATUS    RESTARTS   AGE
kx-mystream-0   1/1     Running   0          29m
kx-mystream-1   1/1     Running   0          29m
kx-mystream-2   1/1     Running   0          29m

$ kubectl get pvc -n <namespace>
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
kx-store--data-kx-mystream-0      Bound    pvc-1d67c2bb-38cb-4d32-92ce-791923e35e37   12Gi       RWO            gp2            3h43m
kx-store--data-kx-mystream-1      Bound    pvc-bd5ca1b9-6769-4235-ba4b-2344952ad7e6   12Gi       RWO            gp2            3h43m
kx-store--data-kx-mystream-2      Bound    pvc-37843bb4-deb4-4066-8e02-dc7a84ba24ce   12Gi       RWO            gp2            3h42m

Uninstalling

To uninstall the helm chart, run the following:

helm uninstall <release.name> -n <namespace>
helm uninstall kx -n <namespace>

Upon uninstalling, the 3 RT pods will be taken down. However, note that the PVCs will be retained. These can be manually deleted if required.

$ kubectl get pods -n <namespace>
NAME          READY   STATUS    RESTARTS   AGE

$ kubectl get pvc -n <namespace>
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
kx-store--data-kx-mystream-0      Bound    pvc-1d67c2bb-38cb-4d32-92ce-791923e35e37   12Gi       RWO            gp2            3h43m
kx-store--data-kx-mystream-1      Bound    pvc-bd5ca1b9-6769-4235-ba4b-2344952ad7e6   12Gi       RWO            gp2            3h43m
kx-store--data-kx-mystream-2      Bound    pvc-37843bb4-deb4-4066-8e02-dc7a84ba24ce   12Gi       RWO            gp2            3h42m

Steps to bring up RT chart with support for external SDKs and SSL

The RT external SDKs (C and Java) were designed to connect to kdb Insights Enterprise via an Information Service, which provides the RT external endpoints and the associated SSL ca/cert/key for a client that has already been enrolled with Keycloak.

Managing service discovery and authentication with a standalone RT is application specific and therefore outside the scope of this document, but their role can be mocked and the process demonstrated.

It is necessary to perform some additional steps when bringing up the RT helm chart to support these external SDKs. These steps must be performed in the correct order:

  1. Run the make_certs.sh script, which generates client and server ca/cert/key in the certs/ subdirectory. A Kubernetes secret is created from the certs/server directory and mounted into the /cert directory of the RT pods, where the server ca/cert/key is used to start the external replicators:

    sh make_certs.sh <namespace> <streamid>

  2. Having generated the certs, bring up the RT chart as described above:

    helm install kx kxi-rt -n <namespace> -f global_settings.yaml

  3. With the chart up, run the enrol_json.sh script. This uses kubectl to look up the load balancer endpoints for the external replicators, and reads the client ca/cert/key from certs/client:

    sh enrol_json.sh <namespace> <streamid>

It then uses this information to generate a client.json which conforms to the same structure as would be returned by the Information Service:

cat client.json | jq
{
  "name": "client-name",
  "topics": {
    "insert": "mystream",
    "query": "requests"
  },
  "ca": "<REDACTED>",
  "cert": "<REDACTED>",
  "key": "<REDACTED>",
  "insert": {
    "insert": [
      ":k8s-nealalph-kximystr-f69ea8c8bd-097e6615d0e2d36f.elb.eu-west-1.amazonaws.com:5000",
      ":k8s-nealalph-kximystr-2c3b427ac6-0f9c2c6d1783dab7.elb.eu-west-1.amazonaws.com:5000",
      ":k8s-nealalph-kximystr-cff7c9dea0-55bc821329d7c3cd.elb.eu-west-1.amazonaws.com:5000"
    ],
    "query": []
  },
  "query": []
}
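To pull just the insert endpoints back out of the generated file, the same jq tool shown above can be used with a filter (the filter is one possible way to read the structure; it is not part of the RT tooling):

```shell
# List the external insert endpoints recorded in client.json
jq -r '.insert.insert[]' client.json
```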

The external SDKs can now be started by pointing them to this file rather than the Information Service endpoint.

Java

export RT_REP_DIR=<REPLICATOR_LOCATION>
export RT_LOG_PATH=<RT_LOG_PATH>
export KXI_CONFIG_FILE=./client.json
java -jar ./rtdemo-<VERSION>-all.jar --runCsvLoadDemo=<CSV_FILE>

C

DSN="DRIVER=/usr/local/lib/kodbc/libkodbc.so;CONFIG_FILE=./client.json"
Schema="sensorID:int,captureTS:ts,readTS:ts,valFloat:float,qual:byte,alarm:byte"
Table="trace"
./csvupload -c "$DSN" -t "$Table" -s "$Schema" < sample.csv