
Object storage

Automatically authenticate with cloud credentials via the Kurl module and allow native access to cloud object storage.

This guide will help create a simple app using Docker, Qpacker, the object-storage module objstor.qpk, and the Kxreaper cache-clearing application.

Cloud Data

Public data has been provided for AWS, Microsoft Azure, and GCP.

GCP:

$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/

$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/

AWS:

$ aws s3 ls s3://kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33      42568 sym

$ aws s3 ls s3://kxinsights-marketplace-data/db/ | head -5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/

Azure:

$ az storage blob list --account-name kxinsightsmarketplace \
  --container-name data | jq -r '.[] | .name' | tail -5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym

Create a sample using your selected cloud provider:

GCP:

$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF

AWS:

$ mkdir ~/db
$ aws s3 cp s3://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> s3://kxinsights-marketplace-data/db
> EOF

Azure:

$ mkdir ~/db
$ az storage blob download --account-name kxinsightsmarketplace \
  --container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF

This creates a standard HDB root directory whose partitions reside on object storage. Note there must be no trailing / on the object-store location in par.txt.

$ tree ~/db/
/home/user/db/
├── par.txt
└── sym

Running locally

Run qce on this directory:

$ ls
par.txt  sym
$ qce .
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX

q)

You should now be able to run queries:

q)tables[]
`s#`quote`trade
q)select count i by date from quote
date      | x        
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966

Performance is limited by the speed of your network connection to the cloud storage, so creating a cache on fast SSD, local NVMe, or even shared memory can be desirable.

If you set these environment variables before starting qce, the cache will be enabled:

$ export KX_OBJSTR_CACHE_PATH=/dev/shm/cache/
$ export KX_OBJSTR_CACHE_SIZE=10000000
$ ls
par.txt  sym
$ qce .
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX

q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0

Run the kxreaper application to prune the cache automatically when it fills.

q)\kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &

S3 Compatible Object Store

This section demonstrates integration with the S3-compatible object store MinIO.

Prerequisites

The AWS CLI must be installed to work through this example.

Start a MinIO server using Docker or Podman:

docker run -d -p 9000:9000 -p 9001:9001 -e "MINIO_ROOT_USER=<insert minio user>" -e "MINIO_ROOT_PASSWORD=<insert minio password>" minio/minio server /data --console-address ":9001"

Set the AWS credentials using aws configure. The AWS Access Key ID should be set to the MINIO_ROOT_USER and the AWS Secret Access Key to the MINIO_ROOT_PASSWORD (see the command above).
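As a non-interactive alternative to aws configure, the credentials can be written to a dedicated file; AWS_SHARED_CREDENTIALS_FILE is the standard AWS CLI variable for pointing at an alternative credentials file, and the placeholder values are the ones passed to the MinIO container above:

```shell
# Point the AWS CLI at a dedicated credentials file for MinIO
export AWS_SHARED_CREDENTIALS_FILE=./minio-credentials
cat > "$AWS_SHARED_CREDENTIALS_FILE" << EOF
[default]
aws_access_key_id = <insert minio user>
aws_secret_access_key = <insert minio password>
EOF
cat "$AWS_SHARED_CREDENTIALS_FILE"
```

Remember to export AWS_SHARED_CREDENTIALS_FILE in any shell that runs the aws commands below.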

Create some test data

/ 20 consecutive dates starting 2021.09.01
d:2021.09.01+til 20
/ for each date, write an enumerated trade partition of 10,000 rows under test/db/
{[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d

Create a bucket in minio and copy in the test data

aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 sync test/db/. s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://test

Create a hdb root directory

mkdir dbroot
cp test/sym dbroot/
echo "s3://test" > dbroot/par.txt

Set the following environment variables as applicable to your provider:

AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace

For Azure, run the following:

az storage account keys list --resource-group DefaultResourceGroup-EUS --account-name kxinsightsmarketplace

and set AZURE_STORAGE_SHARED_KEY to the key value returned.

Start a qce process as follows:

export KX_S3_ENDPOINT=http://127.0.0.1:9000
export KX_S3_USE_PATH_REQUEST_STYLE=1
qce dbroot -q

Query the data in the bucket.

q)tables[]
,`trade
q)select count sym from trade
sym
------
200000
q)5#select count sym by date from trade
date      | sym
----------| -----
2021.09.01| 10000
2021.09.02| 10000
2021.09.03| 10000
2021.09.04| 10000
2021.09.05| 10000

Note

To see which URLs are used in a query, export KX_TRACE_OBJSTR=1 and restart the qce process.
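For example, before restarting qce (the echo simply confirms the setting is in place):

```shell
# Enable per-request URL tracing for object storage in the next qce session
export KX_TRACE_OBJSTR=1
echo "KX_TRACE_OBJSTR=$KX_TRACE_OBJSTR"   # prints KX_TRACE_OBJSTR=1
```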

Cloud HDB application

Create a historical database reader process using QPacker.

Make a new directory for your project and copy into it objstor.qpk from the kdb+ Cloud Edition pack.

Create qp.json.

{
    "hdb": {
        "ui": "console",
        "entry": [ "hdb.q" ]
    }
}

Create the entrypoint script hdb.q used by Docker, which will load the HDB directory. Replace <user> with your username.

hdb.q

system"c 20 200"
system"l /home/<user>/db"

Create a location to store cache data for later use, e.g. /fastssd/s3cache/<user>. Use this location to update the second Docker volume in the next step.

Create a docker-compose.yml file to run your application, updating <user> and the cache volume path.

version: "3.7"
services:
  hdb:
    image: "${hdb}"
    volumes:
      - /home/<user>/db:/home/<user>/db
      - /fastssd/s3cache/<user>:/fastssd/s3cache/<user>
    env_file:
      - .env
    command: -p 5010
    tty: true
    stdin_open: true

Build the project: qp build.

For AWS and Azure append to the qpbuild/.env file:

AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
KX_KURL_DISABLE_AUTO_REGISTER=1

Run the application by first copying the docker-compose.yml file into the newly created qpbuild folder. Move into the qpbuild folder and run docker-compose up. You should see something like:

$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1  | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1  | RUN [q startq.q -p 5010 -s 10]
hdb_1  | KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
hdb_1  | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1  |

In another terminal run docker attach qpbuild_hdb_1. This will attach you to the Docker container running the q code.

Press Return to see the q prompt. Run queries as you normally would against the kdb+ data on object storage.

q)tables[]
`s#`quote`trade

q)meta trade
c    | t f a
-----| -----
date | d
time | t
sym  | s   p
cond | c
ex   | c
price| e
size | i
stop | b
q)select count sym by date from quote where date in 2020.01.01 2020.01.02
date      | sym    
----------| -------
2020.01.01| 1890944
2020.01.02| 1890944

Exit the q process with \\ and run docker-compose down.

Cache clearing

Configure the Kxreaper cache-clearing app. This is a utility program to limit the amount of object storage data cached on local SSD by kdb+.

Cloud object storage such as AWS S3 is slow relative to local storage such as SSD. The performance of kdb+ when working with S3 can be improved by caching S3 data. Each query to S3 costs money; caching the resulting data helps reduce this cost.

Multiple kdb+ instances using the same HDB should use the same cache area, the base of which is stored in environment variable KX_OBJSTR_CACHE_PATH.

Append these lines to the Docker .env file in the qpbuild folder, using the cache path you created earlier.

KX_OBJSTR_CACHE_PATH=/fastssd/s3cache/<user>
KX_OBJSTR_CACHE_SIZE=673477140480
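Following the heredoc style used earlier, the append might look like this (run inside the qpbuild folder, with <user> and the size replaced by your own values):

```shell
# Append the cache settings to the existing Docker .env file
tee -a .env << EOF
KX_OBJSTR_CACHE_PATH=/fastssd/s3cache/<user>
KX_OBJSTR_CACHE_SIZE=673477140480
EOF
```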

Add this line to your hdb.q file so that kxreaper starts automatically.

\kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &

Rebuild the application with qp build then restart the application using docker-compose up.

Notice the kxreaper application has now been started within the container.

$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1  | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1  | RUN [q startq.q -p 5010 -s 10]
hdb_1  | KDB+cloud 4.0t 2021.02.02 Copyright (C) 1993-2021 Kx Systems
hdb_1  | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1  |
hdb_1  | kxreaper v1.0.0-a.1
hdb_1  | Watching cache dir /fastssd/s3cache/rtuser/objects
hdb_1  | Limiting to 83473735680 (MB)
hdb_1  | Reduced limit to available free disk space 22604 (MB)

Attach to the process and query.

Next you can try:

  • create a standard HDB partition using the local disk attached and incorporate it into your HDB database above
  • add secondary threads to the HDB q process and explore the performance benefits
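A sketch of the first suggestion, reusing the ~/db root from earlier: a segmented HDB lists one segment root per line of par.txt, so a local partition root can sit alongside the object-store location. The local path here is illustrative; replace <user> and populate the local segment with your own partitions.

```shell
# Add a local segment alongside the object-store segment (no trailing slashes)
mkdir -p ~/db/local
tee ~/db/par.txt << EOF
/home/<user>/db/local
s3://kxinsights-marketplace-data/db
EOF
```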
