
Object Storage Quickstart

This page explains how to set up a simple app using Docker, the object storage module, and the kxreaper cache-clearing application.

The Kurl module authenticates automatically with cloud credentials, allowing native access to cloud object storage.

Cloud Data

Setup

Public data has been provided for AWS, Microsoft Azure, and GCP.

Google Cloud Storage:

$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/

$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -n 5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/

AWS:

# set AWS_REGION=eu-west-1 before starting `q`
$ aws s3 ls s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33      42568 sym

$ aws s3 ls s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/ | head -n 5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/

Microsoft Azure:

# set AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
# set AZURE_STORAGE_SHARED_KEY to the value returned by:
# az storage account keys list --account-name kxinsightsmarketplace --resource-group kxinsightsmarketplace
$ az storage blob list --account-name kxinsightsmarketplace \
  --container-name data | jq -r '.[] | .name' | tail -n 5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym

Create a sample using your selected cloud provider.

Google Cloud Storage:

$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF

AWS:

$ mkdir ~/db
$ aws s3 cp s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/sym ~/db/
$ tee ~/db/par.txt << EOF
> s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db
> EOF

Microsoft Azure:

$ mkdir ~/db
$ az storage blob download --account-name kxinsightsmarketplace \
  --container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF

This creates a standard HDB root directory whose partitions reside in object storage. There must be no trailing / on the object-store location in par.txt.

$ tree ~/db/
/home/user/db/
├── par.txt
└── sym
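par.txt need not contain only a single location: it can list several partition roots, one per line, so local and object-storage partitions can be mixed. A hypothetical par.txt (paths invented for illustration), keeping recent data on local SSD and history in object storage:

```
/fastssd/db
gs://kxinsights-marketplace-data/db
```

q merges the partitions from all listed roots into one partitioned database.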

Querying data

Run q on this directory:

$ ls
par.txt  sym
$ q /home/user/db/
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX

q)

You should now be able to run queries:

q)tables[]
`s#`quote`trade
q)select count i by date from quote
date      | x
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966

Performance is limited by the speed of your network connection to the cloud storage, so caching on fast SSDs, local NVMe, or even shared memory can be desirable.

Setting the KX_OBJSTR_CACHE_PATH environment variable before starting q enables the cache. The following shows caching improving the speed of subsequent requests, measured with \t:

$ export KX_OBJSTR_CACHE_PATH=/dev/shm/cache/
$ ls
par.txt  sym
$ q /home/user/db
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX

q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0

Run the kxreaper application to prune the cache automatically when it fills. The following runs kxreaper to monitor the cache directory and limit the space used to 10000MB:

$ kxreaper "$KX_OBJSTR_CACHE_PATH" 10000 &

S3-Compatible Object Store

This guide demonstrates how data can be stored in, and queried from, any object store compatible with the S3 interface. The example uses Docker to create a local object store using MinIO, an S3-compatible server.

Prerequisites

The AWS CLI must be installed to work through this example.

Start a MinIO server using Docker or Podman.

docker run -it --rm -p 9000:9000 -p 9001:9001 -e "MINIO_ROOT_USER=<insert minio user>" -e "MINIO_ROOT_PASSWORD=<insert minio password>" minio/minio server /data --console-address ":9001"

Set the AWS credentials by executing aws configure. The AWS Access Key ID should be set to the MINIO_ROOT_USER and AWS Secret Access Key to MINIO_ROOT_PASSWORD (refer to the command above).
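aws configure writes these values to the default profile; the resulting ~/.aws/credentials file looks like this (placeholders kept from the docker command above):

```
[default]
aws_access_key_id = <insert minio user>
aws_secret_access_key = <insert minio password>
```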

Use kdb+ to create some test data in the directory ./test

q)d:2021.09.01+til 20  / 20 consecutive dates
q){[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d  / enumerate and write a 10,000-row trade splay per date
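Before uploading, the generated data can be sanity-checked from the same q session; each date partition should hold a 10,000-row trade table (paths as created by the commands above):

```q
q)count get `:test/db/2021.09.01/trade/time  / rows in the first partition
10000
```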

Create a bucket in MinIO called test

aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://test

Copy the generated test data to the test bucket

aws --endpoint-url http://127.0.0.1:9000 s3 sync test/db s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://test

Create an HDB root directory called dbroot that uses data from the test bucket

mkdir dbroot
cp test/sym dbroot/
echo "s3://test" > dbroot/par.txt

Start a q process as below, loading the dbroot directory that uses the test bucket. The environment variables point queries at the MinIO server.

export KX_S3_ENDPOINT=http://127.0.0.1:9000
export KX_S3_USE_PATH_REQUEST_STYLE=1
q dbroot -q

Query the data in the bucket; the required data is retrieved automatically from the S3-compatible object store (MinIO)

q)tables[]
,`trade
q)select count sym from trade
sym
------
200000
q)5#select count sym by date from trade
date      | sym
----------| -----
2021.09.01| 10000
2021.09.02| 10000
2021.09.03| 10000
2021.09.04| 10000
2021.09.05| 10000

To see which URLs are used in a query, export KX_TRACE_OBJSTR=1 and restart the q process.