Object storage quickstart
Automatically authenticate with cloud credentials via the Kurl module, allowing native access to cloud object storage.
This guide will help you create a simple app using Docker, Qpacker, the object-storage module objstor.qpk, and the Kxreaper cache-clearing application.
Cloud data
Public data is provided for AWS, Microsoft Azure, and GCP.
$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/
$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/
$ aws s3 ls s3://kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33 42568 sym
$ aws s3 ls s3://kxinsights-marketplace-data/db/ | head -5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/
$ az storage blob list --account-name kxinsightsmarketplace \
--container-name data | jq -r '.[] | .name' | tail -5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym
Create a sample using your selected cloud provider:
$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF
$ mkdir ~/db
$ aws s3 cp s3://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> s3://kxinsights-marketplace-data/db
> EOF
$ mkdir ~/db
$ az storage blob download --account-name kxinsightsmarketplace \
    --container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF
This creates a standard HDB root directory in which the partitioned data resides in object storage. There should be no trailing / on the object-store location in par.txt.
$ tree ~/db/
/home/user/db/
├── par.txt
└── sym
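The same layout can be scripted. A minimal sketch, using a temporary directory and the AWS bucket path from above (the /tmp/db-demo path is illustrative; substitute your provider's URI):

```shell
# Build a minimal HDB root whose sole partition location is in object storage.
mkdir -p /tmp/db-demo
# printf avoids an accidental trailing newline; note: no trailing / on the URI
printf 's3://kxinsights-marketplace-data/db' > /tmp/db-demo/par.txt
# sanity check: warn if the entry ends in a slash
grep -q '/$' /tmp/db-demo/par.txt && echo "remove the trailing slash" || echo "par.txt OK"
```

In a real deployment the sym file would sit alongside par.txt, as shown in the tree above.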
Running locally
Run q on this directory:
$ ls
par.txt sym
$ q
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX
q)
You should now be able to run queries:
q)tables[]
`s#`quote`trade
q)select count i by date from quote
date | x
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966
Performance is limited by the speed of your network connection to the cloud storage, so a cache on fast SSD, local NVMe, or even shared memory can be desirable. Setting these environment variables before starting q enables the cache:
$ export KX_OBJSTR_CACHE_PATH=/dev/shm/cache/
$ export KX_OBJSTR_CACHE_SIZE=10000000
$ ls
par.txt sym
$ q
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX
q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0
Run the kxreaper application to prune the cache automatically when it fills:
q)\kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &
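Equivalently from the shell, a hedged sketch that prepares the cache directory and starts kxreaper in the background only if the binary is installed (the /tmp path here is illustrative; the doc above uses /dev/shm/cache/):

```shell
export KX_OBJSTR_CACHE_PATH=/tmp/objstr-cache/
export KX_OBJSTR_CACHE_SIZE=10000000
mkdir -p "$KX_OBJSTR_CACHE_PATH"
# launch the reaper in the background only when the binary is available
if command -v kxreaper >/dev/null 2>&1; then
  kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &
else
  echo "kxreaper not on PATH; skipping"
fi
```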
S3-compatible object store
This section demonstrates integration with MinIO, an S3-compatible object store.
Prerequisites
The AWS CLI must be installed to work through this example.
Start the MinIO server using Docker or Podman:
docker run -d -p 9000:9000 -p 9001:9001 -e "MINIO_ROOT_USER=<insert minio user>" -e "MINIO_ROOT_PASSWORD=<insert minio password>" minio/minio server /data --console-address ":9001"
Set the AWS credentials using aws configure: set AWS Access Key ID to the MINIO_ROOT_USER value and AWS Secret Access Key to the MINIO_ROOT_PASSWORD value from the command above.
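If you prefer not to run the interactive aws configure, the AWS CLI also honours credential environment variables. A sketch, where minioadmin stands in for whatever MINIO_ROOT_USER and MINIO_ROOT_PASSWORD you chose when starting the container:

```shell
export AWS_ACCESS_KEY_ID=minioadmin        # placeholder: your MINIO_ROOT_USER
export AWS_SECRET_ACCESS_KEY=minioadmin    # placeholder: your MINIO_ROOT_PASSWORD
export AWS_DEFAULT_REGION=us-east-1        # the CLI wants a region even for MinIO
```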
Create some test data
/ twenty consecutive dates starting 2021.09.01
d:2021.09.01+til 20
/ for each date, enumerate a 10,000-row trade table against test/sym
/ and write it splayed as a date partition under test/db/
{[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d
Create a bucket in MinIO and copy in the test data:
aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 sync test/db/. s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://test
Create an HDB root directory:
mkdir dbroot
cp test/sym dbroot/
echo "s3://test" > dbroot/par.txt
Set the following environment variables as applicable for your cloud provider:
AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
For Azure, run the command below and set AZURE_STORAGE_SHARED_KEY to the key value it returns:
az storage account keys list --resource-group DefaultResourceGroup-EUS --account-name kxinsightsmarketplace
Start a q process as below:
export KX_S3_ENDPOINT=http://127.0.0.1:9000
export KX_S3_USE_PATH_REQUEST_STYLE=1
q dbroot -q
Query the data in the bucket.
tables[]
,`trade
select count sym from trade
sym
------
200000
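That total is consistent with the test data written above, 20 date partitions of 10,000 rows each:

```shell
echo $((20 * 10000))   # → 200000
```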
5#select count sym by date from trade
date | sym
----------| -----
2021.09.01| 10000
2021.09.02| 10000
2021.09.03| 10000
2021.09.04| 10000
2021.09.05| 10000
Note
To see which URLs are used in a query, export KX_TRACE_OBJSTR=1 and restart the q process.