Object storage

Authenticate automatically with cloud credentials via the Kurl module and access cloud object storage natively from kdb+.

This guide will help you create a simple app using Docker, QPacker, the object-storage module objstor.qpk, and the Kxreaper cache-clearing application.

Data

Public data is provided for both AWS and GCP. Instructions are also included for creating kdb+ data on Azure Blob Storage.

GCP:

$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/

$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/

AWS:

$ aws s3 ls s3://kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33      42568 sym

$ aws s3 ls s3://kxinsights-marketplace-data/db/ | head -5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/

Azure:

$ az storage blob list --account-name kxinsightsmarketplace \
  --container-name data | jq -r '.[] | .name' | tail -5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym
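
With the objstor module loaded in a kdb+ Cloud Edition session and credentials available, standard q file primitives such as key also accept object-store paths, so you can browse the bucket directly from q. A minimal sketch (output abbreviated and illustrative):

q)key`:gs://kxinsights-marketplace-data/db
`2020.01.01`2020.01.02`2020.01.03`2020.01.06`2020.01.07..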

Create a sample using your selected cloud provider:

GCP:

$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF

AWS:

$ mkdir ~/db
$ aws s3 cp s3://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> s3://kxinsights-marketplace-data/db
> EOF

Azure:

$ mkdir ~/db
$ # example commands; substitute your own storage account and container
$ az storage blob download --account-name kxinsightsmarketplace \
  --container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF

This creates a standard HDB root directory whose partitions reside on object storage. There must be no trailing / on the object-store location in par.txt.

/home/user/db/
├── par.txt
└── sym
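
To sanity-check the root, read par.txt back from q; read0 returns the file as a list of strings (the path shown is the GCP variant):

q)read0`:/home/user/db/par.txt
,"gs://kxinsights-marketplace-data/db"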

Running locally

Run qce on this directory:

$ ls
par.txt  sym
$ qce .
KDB+ 4.1t 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX

q)

You should now be able to run queries:

q)tables[]
`s#`quote`trade
q)select count i by date from quote
date      | x        
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966

Performance is limited by your network throughput to the cloud storage, so caching on fast SSD, local NVMe, or even shared memory can be desirable.

You can also try out the new SQL support:

q)s)select sym, avg(price) from trade where cast(date as date) = '2018-09-04' group by sym
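
For comparison, the equivalent query in qSQL:

q)select avg price by sym from trade where date=2018.09.04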

If you set these environment variables before starting qce, the cache will be enabled:

$ export KX_S3_CACHE_PATH=/dev/shm/cache/
$ export KX_S3_CACHE_SIZE=10000000
$ ls
par.txt  sym
$ qce .
KDB+ 4.1t 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXX KXCE XXXXXX

q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0
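
The second run is served entirely from the local cache. From within q you can confirm the settings were picked up; getenv returns each value as a string:

q)getenv`KX_S3_CACHE_PATH
"/dev/shm/cache/"
q)getenv`KX_S3_CACHE_SIZE
"10000000"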

Run the kxreaper application to prune the cache automatically when it fills.

q)\kxreaper "$KX_S3_CACHE_PATH" "$KX_S3_CACHE_SIZE" &
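
The leading backslash passes the command to the OS shell. From a q script you can achieve the same with the system keyword (a sketch; the shell expands the environment variables):

q)system"kxreaper \"$KX_S3_CACHE_PATH\" \"$KX_S3_CACHE_SIZE\" &"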

Cloud HDB application

Create a historical database reader process using QPacker.

Make a new directory for your project and copy into it objstor.qpk from the kdb+ Cloud Edition pack.

Create qp.json.

{
    "hdb": {
        "ui": "console",
        "entry": [ "hdb.q" ]
    }
}

Create the entrypoint script hdb.q used by Docker, which will load the HDB directory. Replace <user> with your username.

hdb.q

system"c 20 200"
system"l /home/<user>/db"

Create a location to store cache data for later use, e.g. /fastssd/s3cache/<user>. Use this location for the second Docker volume in the next step.

Create a docker-compose.yml file to run your application. Update the <user> and Docker volume.

version: "3.7"
services:
  hdb:
    image: "${hdb}"
    volumes:
      - /home/<user>/db:/home/<user>/db
      - /fastssd/s3cache/<user>:/fastssd/s3cache/<user>
    env_file:
      - .env
    command: -p 5010
    tty: true
    stdin_open: true

Build the project: qp build.

For AWS and Azure, append the following to the qpbuild/.env file:

AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
KX_KURL_DISABLE_AUTO_REGISTER=1

Run the application: first copy the docker-compose.yml file into the newly created qpbuild folder, then move into qpbuild and run docker-compose up. You should see something like this:

$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1  | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1  | RUN [q startq.q -p 5010 -s 10]
hdb_1  | KDB+cloud 4.0 2021.02.02 Copyright (C) 1993-2021 Kx Systems
hdb_1  | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1  |

In another terminal, run docker attach qpbuild_hdb_1 to attach to the container running the q code.

Press Return to see the q prompt. Run queries as you normally would against the kdb+ data on object storage.

q)tables[]
`s#`quote`trade

q)meta trade
c    | t f a
-----| -----
date | d
time | t
sym  | s   p
cond | c
ex   | c
price| e
size | i
stop | b

q)select count sym by date from quote where date in 2020.01.01 2020.01.02
date      | sym    
----------| -------
2020.01.01| 1890944
2020.01.02| 1890944

Exit the q process with \\ and run docker-compose down.

Cache clearing

Configure the Kxreaper cache-clearing app, a utility that limits the amount of object-storage data kdb+ caches on local SSD.

Cloud object storage such as AWS S3 is slow relative to local storage such as SSD. The performance of kdb+ when working with S3 can be improved by caching S3 data. Each query to S3 costs money; caching resulting data can help to reduce this cost.

Multiple kdb+ instances using the same HDB should use the same cache area, the base of which is stored in environment variable KX_S3_CACHE_PATH.

Append these lines to your Docker .env file in the qpbuild folder, checking the path matches the location you created earlier.

KX_S3_CACHE_PATH=/fastssd/s3cache/<user>
KX_S3_CACHE_SIZE=673477140480
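
Assuming KX_S3_CACHE_SIZE is in bytes, the value above allows roughly 627 GiB of cached data, as a quick q calculation shows:

q)673477140480 % prd 3#1024
627.2245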

Add this line to your hdb.q file so that kxreaper starts automatically.

\kxreaper "$KX_S3_CACHE_PATH" "$KX_S3_CACHE_SIZE" &

Rebuild the application with qp build, then restart it with docker-compose up.

Notice the kxreaper application has now been started within the container.

$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1  | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1  | RUN [q startq.q -p 5010 -s 10]
hdb_1  | KDB+cloud 4.0t 2021.02.02 Copyright (C) 1993-2021 Kx Systems
hdb_1  | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1  |
hdb_1  | kxreaper v1.0.0-a.1
hdb_1  | Watching cache dir /fastssd/s3cache/rtuser/objects
hdb_1  | Limiting to 83473735680 (MB)
hdb_1  | Reduced limit to available free disk space 22604 (MB)

Attach to the process and query.

Next you can try:

  • create a standard HDB partition on the attached local disk and incorporate it into the HDB above (a sketch follows this list)
  • add secondary threads to the HDB q process (the -s command-line option) and explore the performance benefits
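
A minimal sketch of the first item, assuming a hypothetical local root /localdisk/db and a new 2021.01.04 partition. Enumerating against the HDB root lets the new partition share the existing sym file:

/ illustrative rows only; the real trade schema also has cond, ex and stop columns
trade:([]time:3#09:30:00.000;sym:`AAPL`AAPL`MSFT;price:3?100e;size:3?1000i)

/ enumerate sym columns against the sym file at the HDB root,
/ then splay the table under the local root as a 2021.01.04 partition
`:/localdisk/db/2021.01.04/trade/ set .Q.en[`:/home/user/db]trade

/ finally add the local root to ~/db/par.txt (one root per line, no trailing /):
/   /localdisk/db
/   gs://kxinsights-marketplace-data/db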


See also: Object store documentation