
S3 Historical Data

This example deploys the service gateway with Historical Databases (HDBs) that mount S3 buckets.

You will need:

  • An existing cloud storage bucket containing a date-partitioned database
  • A Kubernetes cluster with read access to the above bucket
  • A local copy of the sym file and par.txt for the cloud storage buckets
  • Kubernetes secrets for a Kx license and for image pulls
  • A Kubernetes service account with S3 read access, named kx-s3-read-access

Note

In the example configuration, the image pull secret and the Kx license secret are named kx-repo-access and kx-license-info respectively. If your secrets have different names, update these references in the configuration file.
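Before deploying, a quick preflight check can confirm that these objects exist. The sketch below is an illustrative helper, not part of the example deployment: the function name check_prereqs is hypothetical, it assumes kubectl is already configured for the target cluster, and it checks the default object names used by the example configuration (adjust them if you renamed them). Run it once the service account from the Deployment steps has been created.

```shell
# Hypothetical preflight helper: confirm that the secrets and service
# account the example deployment expects exist in the given namespace.
# Assumes kubectl is configured for the target cluster.
check_prereqs() {
  ns="${1:-default}"
  missing=0
  for secret in kx-repo-access kx-license-info; do
    if ! kubectl get secret "$secret" --namespace "$ns" >/dev/null 2>&1; then
      echo "missing secret: $secret (namespace $ns)" >&2
      missing=1
    fi
  done
  if ! kubectl get serviceaccount kx-s3-read-access --namespace "$ns" >/dev/null 2>&1; then
    echo "missing service account: kx-s3-read-access (namespace $ns)" >&2
    missing=1
  fi
  # Non-zero when anything is missing, so it can gate a deploy script.
  return "$missing"
}
```

For example, check_prereqs yourNamespace && kubectl apply -f s3Deployment.yml only applies the configuration when all three objects are present.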

For reference on setting up a new cluster that uses IAM roles for service accounts, see:

Introducing fine-grained IAM roles for service accounts

Deployment

An example Kubernetes configuration file, s3Deployment.yml, is available here.

If you do not already have one, create a service account named kx-s3-read-access that grants read access to Amazon S3. You can do this with eksctl:

eksctl create iamserviceaccount --name kx-s3-read-access\
    --namespace <your namespace>\
    --region <your region>\
    --cluster <your cluster>\
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\
    --approve
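eksctl can only create an IAM service account if the cluster has an IAM OIDC provider associated with it; if it does not, eksctl utils associate-iam-oidc-provider --cluster <your cluster> --approve sets one up. To confirm that the service account is linked to an IAM role, you can read the eks.amazonaws.com/role-arn annotation that IAM roles for service accounts rely on. The two functions below are a hypothetical sketch, assuming kubectl is pointed at your cluster.

```shell
# Hypothetical helpers: print and check the IAM role ARN attached to the
# kx-s3-read-access service account. eksctl records the role in the
# eks.amazonaws.com/role-arn annotation; dots in the key are escaped for
# kubectl's JSONPath syntax.
sa_role_arn() {
  ns="${1:-default}"
  kubectl get serviceaccount kx-s3-read-access --namespace "$ns" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

check_sa_role() {
  arn="$(sa_role_arn "${1:-default}" 2>/dev/null || true)"
  case "$arn" in
    arn:aws:iam::*) echo "kx-s3-read-access uses role $arn" ;;
    *) echo "kx-s3-read-access has no IAM role annotation" >&2; return 1 ;;
  esac
}
```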

If you are not using the default namespace, replace every occurrence of the string default in s3Deployment.yml with your namespace.

# Examples of find and replace using sed
sed -i 's|default|yourNamespace|g' s3Deployment.yml    # Linux (GNU sed)
sed -i "" 's|default|yourNamespace|g' s3Deployment.yml # macOS (BSD sed)
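Alternatively, sed -i.bak is accepted by both GNU sed and BSD sed, so a single command works on Linux and macOS; the backup file is deleted afterwards. The function below is a small hypothetical wrapper around that approach. Kubernetes namespace names cannot contain the | character, so it is safe to use as the sed delimiter here.

```shell
# Hypothetical wrapper: replace "default" with the given namespace in
# place. -i.bak works with GNU and BSD sed; the backup is removed only
# if the edit succeeds.
set_namespace() {
  ns="${1:-}"
  file="${2:-s3Deployment.yml}"
  [ -n "$ns" ] || { echo "usage: set_namespace NAMESPACE [FILE]" >&2; return 1; }
  sed -i.bak "s|default|$ns|g" "$file" && rm -f "$file.bak"
}
```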

Create a ConfigMap containing par.txt and your bucket's sym file.

kubectl create configmap kxinsights-s3-configmap\
    --from-file=sym=/path/to/sym\
    --from-file=par.txt=/path/to/par.txt
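A mistyped segment path is easier to catch before the ConfigMap is created than after the HDB pods start. The check below is a hypothetical helper that assumes each segment in par.txt is written as an S3 URI of the form s3://bucket/path; it reports any non-empty line that does not match.

```shell
# Hypothetical check: every non-empty line of par.txt should be an
# s3://bucket/... URI (assumed format). Returns non-zero and prints the
# offending lines otherwise.
check_par_txt() {
  file="${1:-par.txt}"
  [ -s "$file" ] || { echo "$file is missing or empty" >&2; return 1; }
  # Drop valid s3:// lines, then blank lines; whatever is left is suspect.
  bad="$(grep -v -E '^s3://[^/]+' "$file" | grep -v -E '^[[:space:]]*$' || true)"
  if [ -n "$bad" ]; then
    printf 'not an s3:// path: %s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```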

The assembly configuration in the example is set up to describe the canonical trade and quote schemas used in many Kx proofs of concept. If your data does not use trade or quote tables, modify the assembly section of s3Deployment.yml to match your schema.

Install resources and run the deployment with:

kubectl apply -f s3Deployment.yml

kubectl get pods can be used to view all of the running pods.

# kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
kxinsights-aggregator-786d9bc674-fz94d             1/1     Running   0          11s
kxinsights-aggregator-786d9bc674-p4k6f             1/1     Running   0          11s
kxinsights-aggregator-786d9bc674-rxg99             1/1     Running   0          11s
kxinsights-hdb-da-0                                1/1     Running   1          10s
kxinsights-hdb-da-1                                1/1     Running   1          6s
kxinsights-hdb-da-2                                1/1     Running   0          4s
kxinsights-resource-coordinator-5664d4f898-m6plb   1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-dwcnv             1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-jm8fc             1/1     Running   0          11s
kxinsights-sg-gateway-54596c8fc7-kqg4f             1/1     Running   0          11s
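For scripted deployments, kubectl wait --for=condition=Ready pod --all --timeout=120s blocks until every pod reports Ready. If you only have the text output above, for example from a log, the hypothetical awk-based helper below lists any pod whose READY count is incomplete or whose STATUS is not Running.

```shell
# Hypothetical helper: read `kubectl get pods` output (with its header
# line) on stdin, print pods that are not fully ready or not Running,
# and return non-zero if any were found.
check_pods_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") { print $1; bad = 1 }
       }
       END { exit bad }'
}
```

For example, kubectl get pods | check_pods_ready prints nothing and succeeds for the output shown above.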

kubectl get services can be used to print the IP address of the Gateway LoadBalancer, shown below with the example IP of 192.0.2.127.

NAME                             TYPE           CLUSTER-IP    EXTERNAL-IP  PORT(S)                        AGE
kxinsights-aggregator            ClusterIP      10.0.0.111    <none>       5070/TCP                       2m59s
kxinsights-hdb-da                ClusterIP      None          <none>       5080/TCP                       2m59s
kxinsights-resource-coordinator  ClusterIP      10.0.0.112    <none>       5060/TCP                       2m59s
kxinsights-sg-gateway            LoadBalancer   10.0.0.113    192.0.2.127  8080:31881/TCP,5050:31943/TCP  2m59s
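On AWS, the load balancer usually publishes a DNS hostname rather than an IP address, in which case the EXTERNAL-IP column shows that hostname. The sketch below is a hypothetical helper that reads the address from the Service status with kubectl's JSONPath output, falling back from the ip field to the hostname field; the service name kxinsights-sg-gateway is taken from the output above.

```shell
# Hypothetical helper: print the external address of the gateway
# LoadBalancer, preferring an IP and falling back to a DNS hostname.
gateway_address() {
  ns="${1:-default}"
  addr="$(kubectl get service kxinsights-sg-gateway --namespace "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
  if [ -z "$addr" ]; then
    addr="$(kubectl get service kxinsights-sg-gateway --namespace "$ns" \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null || true)"
  fi
  # The address stays empty until the cloud provider has provisioned the LB.
  [ -n "$addr" ] || { echo "gateway has no external address yet" >&2; return 1; }
  echo "$addr"
}
```

For example, GATEWAY=$(gateway_address) stores the address for the q IPC (port 5050) and HTTP (port 8080) queries.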

Query

Using the example IP of 192.0.2.127, you can submit queries over q IPC (port 5050) or HTTP (port 8080). Over q IPC:

h:hopen `:192.0.2.127:5050
x:(`.kxi.getData;
    `table`region`startTS`endTS`filter!(`trade;`Canada;-0wp;0wp;"sym=`ODLI, size within 50 100");
    `callback;
    (0#`)!());
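// Synchronous call: the result is returned directly and the callback is not used
show h x;
// Asynchronous call: the result is delivered by calling the function named in the request (callback) in this process
callback:{[x] show (`callback; x)};
neg[h] x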

Note

By default, a classic Amazon Elastic Load Balancer has an idle timeout of 60 seconds. If your connections may stay idle for longer than that between queries, consider increasing the timeout: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html
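With the in-tree AWS cloud provider, the idle timeout of a classic ELB created for a Service can be set through the service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout annotation; classic ELBs accept values from 1 to 4000 seconds. The function below is a hypothetical sketch that validates the value and annotates the gateway Service. Whether an existing load balancer picks up the change depends on the controller managing it, so adding the same annotation to the Service metadata in s3Deployment.yml is the more durable option.

```shell
# Hypothetical helper: set the classic ELB idle timeout for the gateway
# Service via the AWS cloud provider annotation (1-4000 seconds).
set_gateway_idle_timeout() {
  seconds="${1:-}"
  ns="${2:-default}"
  case "$seconds" in
    ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
  esac
  if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 4000 ]; then
    echo "timeout must be between 1 and 4000 seconds" >&2
    return 1
  fi
  kubectl annotate service kxinsights-sg-gateway --namespace "$ns" --overwrite \
    "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout=$seconds"
}
```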

To query over HTTP, use curl:

curl -X POST\
    --header "Content-Type: application/json"\
    --header "Accept: application/json"\
    --data '{ "table":  "trade", "startTS":"2021.08.31D00:00:00.000000000", "endTS":"2021.09.01D00:00:00.000000000", "region": "Canada"}'\
    "http://192.0.2.127:8080/kxi/getData"

Note

When using HTTP, timestamps must be written in full, including all nine fractional (nanosecond) digits. Nulls and infinities are not yet supported.
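Because the HTTP interface needs fully specified timestamps, it can help to validate them before sending a request. The sketch below is hypothetical: it checks the YYYY.MM.DDDhh:mm:ss.nnnnnnnnn form used in the curl example above and builds the same JSON body. Values are inserted without JSON escaping, which is fine for plain table and region names such as trade and Canada.

```shell
# Hypothetical check: a timestamp is "full" if it has date, D separator,
# time, and all nine nanosecond digits, e.g. 2021.08.31D00:00:00.000000000.
is_full_timestamp() {
  printf '%s\n' "${1:-}" |
    grep -Eq '^[0-9]{4}\.[0-9]{2}\.[0-9]{2}D[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{9}$'
}

# Hypothetical builder for the getData request body (no JSON escaping).
get_data_body() {
  table="${1:-}"; start="${2:-}"; end="${3:-}"; region="${4:-}"
  for ts in "$start" "$end"; do
    is_full_timestamp "$ts" || { echo "incomplete timestamp: $ts" >&2; return 1; }
  done
  printf '{"table":"%s","startTS":"%s","endTS":"%s","region":"%s"}\n' \
    "$table" "$start" "$end" "$region"
}
```

The result can be passed straight to curl, for example --data "$(get_data_body trade 2021.08.31D00:00:00.000000000 2021.09.01D00:00:00.000000000 Canada)".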