S3 Historical Data

This example deploys the service gateway with Historical Databases (HDBs) that mount S3 buckets.

You will need:

  • An existing cloud storage bucket
  • An existing Kubernetes cluster or permission to create one
  • A local copy of the sym file and a par.txt file for the cloud storage bucket
  • Kubernetes secrets for a Kx license and image pull secrets
  • A Kubernetes service account with S3 read access, named kx-s3-read-access

Note

The image pull secret and Kx license secret are named kx-repo-access and kx-license-info respectively. If your existing secrets use different names, you will need to update these references.
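
If your secrets already exist under different names, you can point the deployment file used in this example (s3Deployment.yml, introduced below) at them with a quick find-and-replace; the name your-repo-secret below is a placeholder.

# Example: reference an existing image pull secret instead of kx-repo-access (your-repo-secret is a placeholder)
sed -i 's|kx-repo-access|your-repo-secret|g' s3Deployment.yml    # Linux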

For reference on starting a new cluster, see:

Introducing fine-grained IAM roles for service accounts

Uploading your database to S3

Skip this section if you have already uploaded your database to S3.

To upload a database directory to S3, use aws s3 cp with the --recursive flag.

aws s3 cp "/path/to/db" s3://kxinsights-marketplace-data/ --recursive

For reference, if you want to try out this example and have no database in mind, a simple database of n rows per date can be created with:

// generate data for today and the last few days
n:1000000;             / rows per date
d:asc .z.d - til 3;    / today and the previous two days
// write an enumerated, partitioned trade table for each date under data/
{[d;n]sv[`;.Q.par[`:data/;d;`trade],`]set .Q.en[`:data/;([]sym:`$'n?.Q.A;time:("p"$d)+til n;price:n?100f;size:n?50f)];}[;n] each d;

Note the sym file at the top of the database directory. You will need this file later. Do not confuse this with the sym column inside the trade table.

data
├── 2022.05.02
│   └── trade
│       ├── price
│       ├── size
│       ├── sym
│       └── time
├── 2022.05.03
│   └── trade
│       ├── price
│       ├── size
│       ├── sym
│       └── time
├── 2022.05.04
│   └── trade
│       ├── price
│       ├── size
│       ├── sym
│       └── time
└── sym
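
Once generated, the data directory can be uploaded with the same aws s3 cp command shown above; the bucket path below is a placeholder for your own bucket.

# upload the freshly generated database (bucket path is a placeholder)
aws s3 cp data/ s3://kxinsights-marketplace-data/db/ --recursive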

Creating a cluster

An example Kubernetes configuration file is available here.

This example describes the canonical trade and quote schemas used in many KX proofs of concept. If you are not using trade or quote data, you will need to modify the assembly section within s3Deployment.yml to correct the schema.

If you do not already have one, create a service account named kx-s3-read-access that allows read access to Amazon S3. This can be done using eksctl.

eksctl create iamserviceaccount --name kx-s3-read-access\
    --namespace <your namespace>\
    --region <your region>\
    --cluster <your cluster>\
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\
    --approve
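
You can confirm that the service account was created and annotated with the IAM role (the eks.amazonaws.com/role-arn annotation should be present) using kubectl:

# verify the service account and its IAM role annotation
kubectl get serviceaccount kx-s3-read-access -n <your namespace> -o yaml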

If not using the default namespace, find and replace the string default within s3Deployment.yml.

# Examples of Find and Replace using sed
sed -i 's|default|yourNamespace|g' s3Deployment.yml    # Linux
sed -i "" 's|default|yourNamespace|g' s3Deployment.yml # Mac OSX

Set up license and repository secrets

If this is your first time using a Kx Insights example, you will need to create the license and repository secrets:

kubectl create secret docker-registry kx-repo-access \
    --docker-username=${NEXUS_USER} \
    --docker-password=${NEXUS_PASSWORD} \
    --docker-server=registry.dl.kx.com
kubectl create secret generic kx-license-info \
  --from-literal=license=$(base64 -w 0 < $QLIC/kc.lic)

If you already have existing secrets, take care to update the names within the reference file to use your names.
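
Either way, you can confirm that the secrets exist in your target namespace (shown here with the default names):

# list the two secrets used by this example
kubectl get secrets kx-repo-access kx-license-info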

Reference the license secret with container env variables:

          env:
            - name: KDB_LICENSE_B64
              valueFrom:
                secretKeyRef:
                  name: kx-license-info
                  key: license

Upload par.txt and sym as a ConfigMap

Create a ConfigMap with par.txt and your bucket's sym file.

If you do not already have a par.txt, you can create one. It should be a single-line text file containing the S3 bucket location of your database.

The sym file is located with the partitioned database and would have been generated when you first created it.
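
If your only local copy of the sym file is missing, you can pull it down from the bucket with aws s3 cp; the bucket path below matches the example par.txt shown next and is a placeholder for your own.

# download the sym file from the root of the database in S3 (path is a placeholder)
aws s3 cp s3://kxinsights-marketplace-data/zd1726/db/sym ./sym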

An example par.txt looks like:

s3://kxinsights-marketplace-data/zd1726/db

Upload both the par.txt and the sym file as a ConfigMap using kubectl:

kubectl create configmap kxinsights-s3-configmap\
    --from-file=sym=/path/to/sym\
    --from-file=par.txt=/path/to/par.txt
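
To check that both keys landed in the ConfigMap:

# confirm the ConfigMap contains the sym and par.txt keys
kubectl describe configmap kxinsights-s3-configmap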

Set volumeMounts and mounts

Within the reference file, modify the volumeMounts and mounts sections to reference the sym and par.txt files from the ConfigMap above.

Ensure that the paths within volumeMounts reference the kxinsights-s3-configmap you created above.

Ensure that the mountPath(s) have the same values as elements.dap.instances.HDB.sym and elements.dap.instances.HDB.par.

          volumeMounts:
            - name: s3config
              mountPath: /opt/kx/data/hdb/par.txt
              subPath: par.txt
            - name: s3config
              mountPath: /opt/kx/data/hdb/sym
              subPath: sym
...
    elements:
      dap:
        instances:
          HDB:
            mountName: hdb
            sym: /opt/kx/data/hdb/sym
            par: /opt/kx/data/hdb/par.txt

Ensure that the mount is of type object, and that the baseURI is the mount location into which sym and par.txt will be placed.

    mounts:
      hdb:
        type: object
        baseURI: file:///opt/kx/data/hdb
        partition: none

If the baseURI is set to a folder that does not contain sym and par.txt, or if you place sym and par.txt in different folders, they will be copied into the baseURI location. This requires that the pod has write access to that directory.

Set environment variables

Set AWS_REGION as an environment variable:

          env:
            - name: KX_TRACE_OBJSTR
              value: "1"
            - name: AWS_REGION
              value: "us-east-2"

Make sure that KXI_SG_RC_ADDR uses the same namespace you are using:

If your namespace is hello, your value should be:

    - name: KXI_SG_RC_ADDR
      value: kxinsights-resource-coordinator.hello.svc:5060

You may also want to create secrets for AWS credentials and expose them as environment variables if your bucket exists outside your cluster's reach.

kubectl create secret generic aws-access-secret\
    --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}\
    --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

You can reference those secrets as environment variables with:

          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-access-secret
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-access-secret
                  key: AWS_SECRET_ACCESS_KEY

Tip

You should not need to set credentials if kx-s3-read-access was used earlier.

Configure the schema

You will need to modify the schema in the assembly section of s3Deployment.yml to match the shape of your database.

Deploy the Assembly

Install resources and run the deployment with:

kubectl apply -f s3Deployment.yml

kubectl get pods can be used to view all of the running pods.

# kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
kxinsights-aggregator-786d9bc674-fz94d             1/1     Running   0          11s
kxinsights-aggregator-786d9bc674-p4k6f             1/1     Running   0          11s
kxinsights-aggregator-786d9bc674-rxg99             1/1     Running   0          11s
kxinsights-hdb-da-0                                1/1     Running   1          10s
kxinsights-hdb-da-1                                1/1     Running   1          6s
kxinsights-hdb-da-2                                1/1     Running   0          4s
kxinsights-resource-coordinator-5664d4f898-m6plb   1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-dwcnv             1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-jm8fc             1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-kqg4f             1/1     Running   0          11s

kubectl get services can be used to print the external IP address of the gateway LoadBalancer, shown below with the example IP of 192.0.2.127.

NAME                             TYPE           CLUSTER-IP    EXTERNAL-IP  PORT(S)                        AGE
kxinsights-aggregator            ClusterIP      10.0.0.111    <none>       5070/TCP                       2m59s
kxinsights-hdb-da                ClusterIP      None          <none>       5080/TCP                       2m59s
kxinsights-resource-coordinator  ClusterIP      10.0.0.112    <none>       5060/TCP                       2m59s
kxinsights-sg-gateway            LoadBalancer   10.0.0.113    192.0.2.127  8080:31881/TCP,5050:31943/TCP  2m59s
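
The external address can also be extracted directly with a jsonpath query; depending on the platform, the ingress entry is exposed as .ip (as in the example above) or .hostname, so adjust the field accordingly.

# print just the external address of the gateway LoadBalancer
kubectl get service kxinsights-sg-gateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'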

Query

Using the example IP of 192.0.2.127, we can submit queries over q-IPC or HTTP.

h:hopen `:192.0.2.127:5050
x:(`.kxi.getData;
    `table`region`startTS`endTS`filter!(`trade;`Canada;-0Wp;0Wp;"sym=`ODLI, size within 50 100");
    `callback;
    (0#`)!());
// Sync, callback not used
show h x;
// async
callback:{[x] show (`callback; x)};
neg[h] x

Note

By default, a classic Amazon Elastic Load Balancer has an idle timeout of 60 seconds. You may wish to increase this beyond 60 seconds if you will not be querying frequently. https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html
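
For a classic ELB, the timeout can be raised with the AWS CLI; the load-balancer name below is a placeholder for the one provisioned for kxinsights-sg-gateway.

# raise the idle timeout to 300 seconds (<your-elb-name> is a placeholder)
aws elb modify-load-balancer-attributes \
    --load-balancer-name <your-elb-name> \
    --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":300}}"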

Using HTTP, with curl:

curl -X POST\
    --header "Content-Type: application/json"\
    --header "Accepted: application/json"\
    --data '{ "table":  "trade", "startTS":"2021.08.31D00:00:00.000000000", "endTS":"2021.09.01D00:00:00.000000000", "region": "Canada"}'\
    "http://192.0.2.127:8080/kxi/getData"