Object storage
Automatically authenticate with cloud credentials via the Kurl module and allow native access to cloud object storage.
This guide will help you create a simple app using Docker, QPacker, the object-storage module objstor.qpk, and the Kxreaper cache-clearing application.
Cloud Data
Public data has been provided for AWS, MS Azure and GCP.
$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/
$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/
$ aws s3 ls s3://kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33 42568 sym
$ aws s3 ls s3://kxinsights-marketplace-data/db/ | head -5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/
$ az storage blob list --account-name kxinsightsmarketplace \
--container-name data | jq -r '.[] | .name' | tail -5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym
Create a sample HDB root using your selected cloud provider:
$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF
$ mkdir ~/db
$ aws s3 cp s3://kxinsights-marketplace-data/sym ~/db/.
$ tee ~/db/par.txt << EOF
> s3://kxinsights-marketplace-data/db
> EOF
$ mkdir ~/db
$ az storage blob download --account-name kxinsightsmarketplace \
--container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF
This creates a standard HDB root directory whose partitioned data resides in object storage. There should be no trailing / on the object-store location in par.txt.
$ tree ~/db/
/home/user/db/
├── par.txt
└── sym
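Once you start q in this directory (next section), a quick way to double-check the file is to read it back; it should contain a single line with no trailing /:
q)first read0`:par.txt   / run from the HDB root directory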
Running locally
Run qce on this directory:
$ ls
par.txt sym
$ qce .
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX
q)
You should now be able to run queries:
q)tables[]
`s#`quote`trade
q)select count i by date from quote
date | x
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966
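The partitions listed above come straight from the object-store root named in par.txt; as with a local HDB, the partition values are available in the session:
q)date   / partition values discovered under the object-store root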
Performance is limited by the speed of your network connection to the cloud storage, so creating a cache on fast SSDs, on local NVMe, or even in shared memory can be desirable.
If you set these environment variables before starting qce, the cache will be enabled:
$ export KX_OBJSTR_CACHE_PATH=/dev/shm/cache/
$ export KX_OBJSTR_CACHE_SIZE=10000000
$ ls
par.txt sym
$ qce .
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX
q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0
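If you want to confirm which cache settings the running process picked up, they can be read back from within q; this is just a sanity check, not a required step:
q)getenv`KX_OBJSTR_CACHE_PATH   / cache directory in use
q)getenv`KX_OBJSTR_CACHE_SIZE   / configured size limit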
You will want to run the kxreaper application to prune the cache automatically if it gets full.
q)\kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &
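To keep an eye on how much space the cache is actually using, you can shell out from the same q session (an optional check; assumes a Linux host with du available):
q)system"du -sh ",getenv`KX_OBJSTR_CACHE_PATH   / total size of the cache directory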
S3 Compatible Object Store
This section demonstrates integration with MinIO, an S3-compatible object store.
Prerequisites
The AWS CLI must be installed to work through this example.
Start a MinIO server using Docker or Podman:
docker run -d -p 9000:9000 -p 9001:9001 -e "MINIO_ROOT_USER=<insert minio user>" -e "MINIO_ROOT_PASSWORD=<insert minio password>" minio/minio server /data --console-address ":9001"
Set the AWS credentials using aws configure. The AWS Access Key ID should be set to MINIO_ROOT_USER and the AWS Secret Access Key to MINIO_ROOT_PASSWORD (see the docker run command above).
Create some test data
/ 20 consecutive dates starting 2021.09.01
d:2021.09.01+til 20
/ for each date write an enumerated splayed trade table of 10,000 rows under test/db/<date>/trade/
{[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d
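Before uploading, you can optionally confirm what was written; the paths and counts below assume the snippet above ran as shown:
key`:test/db                              / expect 20 date directories
count get`:test/db/2021.09.01/trade/time  / expect 10,000 rows in one partition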
Create a bucket in MinIO and copy in the test data:
aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 sync test/db/. s3://test
aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://test
Create an HDB root directory:
mkdir dbroot
cp test/sym dbroot/
echo "s3://test" > dbroot/par.txt
Add the environment variables below as applicable.
AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
For Azure, run the command below and set AZURE_STORAGE_SHARED_KEY to the key value that is returned.
az storage account keys list --resource-group DefaultResourceGroup-EUS --account-name kxinsightsmarketplace
Start a qce process as below. KX_S3_ENDPOINT points kdb+ at the local MinIO server, and KX_S3_USE_PATH_REQUEST_STYLE=1 selects path-style addressing (endpoint/bucket/key), which MinIO expects.
export KX_S3_ENDPOINT=http://127.0.0.1:9000
export KX_S3_USE_PATH_REQUEST_STYLE=1
qce dbroot -q
Query the data in the bucket.
tables[]
,`trade
select count sym from trade
sym
------
200000
5#select count sym by date from trade
date | sym
----------| -----
2021.09.01| 10000
2021.09.02| 10000
2021.09.03| 10000
2021.09.04| 10000
2021.09.05| 10000
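Because trade is partitioned by date, constraining on date in the where clause limits which objects are fetched from the bucket; for example (output omitted):
select count i by sym from trade where date within 2021.09.01 2021.09.05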
Note
To see which URLs are used in the query, export KX_TRACE_OBJSTR=1 and restart the qce process.
Cloud HDB application
Create a historical database reader process using QPacker
Make a new directory for your project and copy into it objstor.qpk from the kdb+ Cloud Edition pack.
Create qp.json.
{
"hdb": {
"ui": "console",
"entry": [ "hdb.q" ]
}
}
Create the entrypoint script hdb.q used by Docker, which will load the HDB directory. Replace <user> with your username.
hdb.q
system"c 20 200"
system"l /home/<user>/db"
Create a location in which to store cache data for later use, e.g. /fastssd/s3cache/<user>. Use this location for the cache volume in docker-compose.yml in the next step.
Create a docker-compose.yml file to run your application. Update <user> and the Docker volume paths.
version: "3.7"
services:
hdb:
image: "${hdb}"
volumes:
- /home/<user>/db:/home/<user>/db
- /fastssd/s3cache/<user>:/fastssd/s3cache/<user>
env_file:
- .env
command: -p 5010
tty: true
stdin_open: true
Build the project: qp build.
For AWS and Azure, append to the qpbuild/.env file:
AWS_REGION=us-east-2
AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
KX_KURL_DISABLE_AUTO_REGISTER=1
Run the application by first copying the docker-compose.yml file into the newly created qpbuild folder. Move into the qpbuild folder and run docker-compose up. You should see output much like the following:
$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1 | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1 | RUN [q startq.q -p 5010 -s 10]
hdb_1 | KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
hdb_1 | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1 |
In another terminal run docker attach qpbuild_hdb_1. This will attach you to the Docker container running the q code.
Press Return to see the q prompt. Run queries as you normally would against the kdb+ data on object storage.
q)tables[]
`s#`quote`trade
q)meta trade
c | t f a
-----| -----
date | d
time | t
sym | s p
cond | c
ex | c
price| e
size | i
stop | b
q)select count sym by date from quote where date in 2020.01.01 2020.01.02
date | sym
----------| -------
2020.01.01| 1890944
2020.01.02| 1890944
Exit the q process with \\ and run docker-compose down.
Cache clearing
Configure the Kxreaper cache-clearing app. This is a utility program to limit the amount of object storage data cached on local SSD by kdb+.
Cloud object storage such as AWS S3 is slow relative to local storage such as SSD. The performance of kdb+ when working with S3 can be improved by caching S3 data. Each query to S3 costs money; caching resulting data can help to reduce this cost.
Multiple kdb+ instances using the same HDB should use the same cache area, the base of which is stored in the environment variable KX_OBJSTR_CACHE_PATH.
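For example, a second reader added under services: in the earlier docker-compose.yml would simply mount the same host cache path; the service name and port below are illustrative:
  hdb2:
    image: "${hdb}"
    volumes:
      - /home/<user>/db:/home/<user>/db
      - /fastssd/s3cache/<user>:/fastssd/s3cache/<user>   # same cache area as hdb
    env_file:
      - .env
    command: -p 5011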
Append these lines to your Docker .env file in the qpbuild folder. Check that the path matches the location you created earlier.
KX_OBJSTR_CACHE_PATH=/fastssd/s3cache/<user>
KX_OBJSTR_CACHE_SIZE=673477140480
Add this line to your hdb.q file so that kxreaper starts automatically.
\kxreaper "$KX_OBJSTR_CACHE_PATH" "$KX_OBJSTR_CACHE_SIZE" &
Rebuild the application with qp build, then restart the application using docker-compose up.
Notice the kxreaper application has now been started within the container.
$ docker-compose up
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Starting qpbuild_hdb_1 ... done
Attaching to qpbuild_hdb_1
hdb_1 | INFO: Using existing license file [/opt/kx/lic/kc.lic]
hdb_1 | RUN [q startq.q -p 5010 -s 10]
hdb_1 | KDB+cloud 4.0t 2021.02.02 Copyright (C) 1993-2021 Kx Systems
hdb_1 | l64/ 2()core 7960MB root 44c9ff69a925 172.26.0.2 EXPIRE 2022.01.22 user@kx.com KXCE #????????
hdb_1 |
hdb_1 | kxreaper v1.0.0-a.1
hdb_1 | Watching cache dir /fastssd/s3cache/rtuser/objects
hdb_1 | Limiting to 83473735680 (MB)
hdb_1 | Reduced limit to available free disk space 22604 (MB)
Attach to the process and query.
Next you can try:
- create a standard HDB partition on the attached local disk and incorporate it into your HDB database above (see the par.txt sketch below)
- add secondary threads to the HDB q process and explore the performance benefits
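For the first suggestion, par.txt can list a local partition root alongside the object-store root. This is a sketch, assuming your kdb+ Cloud Edition build supports mixing local and object-store segments in par.txt; the local path is illustrative:
/fastssd/localhdb/<user>/db
gs://kxinsights-marketplace-data/db
For the second, secondary threads are enabled with -s on the q command line, e.g. command: -p 5010 -s 8 in docker-compose.yml; \s inside the session reports the current setting.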