Object Storage Quickstart
This page explains how to get started with the object storage module in KDB-X.
Learn about:
- Authentication
- File handle format
- Module load
- Reading data
- Accessing cloud data
- Using an S3-compatible object store
- Performance considerations
Authentication
Authenticate with cloud credentials using Kurl in order to get native access to cloud object storage.
File handle format
Files on cloud storage are identified by a provider-specific file prefix:
Provider  Prefix
AWS       `:s3://
GCP       `:gs://
Azure     `:ms://
Objects are addressed with file handles of the form [prefix][bucket]/[key-name], for example `:s3://mybucket/data/2025.02.02/tbl/b.
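The mapping from provider to file handle can be sketched as a small shell function. This is only an illustration: objstor_handle is a hypothetical helper name, not part of KDB-X, and it simply joins the prefix, bucket, and key in the way described above.

```shell
#!/bin/sh
# objstor_handle is a hypothetical helper, not part of KDB-X.
# It builds a q file handle from a provider name, a bucket, and a key.
objstor_handle() {
  provider=$1 bucket=$2 key=$3
  case $provider in
    aws)   scheme=s3 ;;
    gcp)   scheme=gs ;;
    azure) scheme=ms ;;
    *)     echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  # The leading backtick and colon make the result a q file symbol.
  printf '`:%s://%s/%s\n' "$scheme" "$bucket" "$key"
}

objstor_handle aws mybucket data/2025.02.02/tbl/b
# prints `:s3://mybucket/data/2025.02.02/tbl/b
```

Keeping the provider-to-scheme mapping in one place avoids mixing up the gs and ms prefixes when scripts generate handles for more than one cloud.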
Module load
As a prerequisite, you must have the appropriate environment variables or credentials defined for your cloud provider.
# AWS
export AWS_ACCESS_KEY_ID="… "
export AWS_SECRET_ACCESS_KEY="…"
export AWS_SESSION_TOKEN="… "
# Azure
export AZURE_STORAGE_SHARED_KEY=".."
export AZURE_STORAGE_ACCOUNT=".."
# GCP
gcloud init
export GCP_TOKEN=$(gcloud auth print-access-token)
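Before starting q, it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal sketch, assuming a hypothetical helper named check_objstor_env. It treats AWS_SESSION_TOKEN as optional, which is an assumption based on session tokens normally being needed only for temporary credentials.

```shell
#!/bin/sh
# check_objstor_env is a hypothetical pre-flight helper, not part of KDB-X.
# It reports which of the credential variables are unset for a provider.
# Assumption: AWS_SESSION_TOKEN is optional and therefore not checked.
check_objstor_env() {
  case $1 in
    aws)   vars="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    azure) vars="AZURE_STORAGE_ACCOUNT AZURE_STORAGE_SHARED_KEY" ;;
    gcp)   vars="GCP_TOKEN" ;;
    *)     echo "unknown provider: $1" >&2; return 2 ;;
  esac
  missing=0
  for name in $vars; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return $missing
}

# Example: only start q when the AWS variables are present.
# check_objstor_env aws && q
```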
Initialize
To load and initialize the object storage module, run the following commands:
.objstor:use`kx.objstor
.objstor.init[]
Sample request
For example, to run hcount on the object with key data/2025.02.02/tbl/b in the S3 bucket mybucket:
hcount `:s3://mybucket/data/2025.02.02/tbl/b
Reading data
- Start KDB-X with the object store module loaded.
- Delve deeper into the buckets:
AWS_REGION must be set to eu-west-1
# AWS
# set AWS_REGION=eu-west-1 before starting `q`
q)key`:s3://
`s#`kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data
q)key`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/"
`s#`_inventory`db`par.txt`sym`symlinks
q)key`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/"
`s#`2020.01.01`2020.01.02`2020.01.03`2020.01.06`2020.01.07...
q)key`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/2020.01.01/trade/"
`s#`.d`cond`ex`price`size`stop`sym`time
q)get`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/2020.01.01/trade/.d"
`sym`time`price`size`stop`cond`ex
# Azure
# set AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
# set AZURE_STORAGE_SHARED_KEY to value returned by below
# az storage account keys list --account-name kxinsightsmarketplace --resource-group kxinsightsmarketplace
q)key`:ms://
,`data
q)key`$":ms://data/"
`s#`_inventory`db`par.txt`sym
q)key`$":ms://data/db/"
`s#`2020.01.01`2020.01.02`2020.01.03`2020.01.06`2020.01.07...
q)key`$":ms://data/db/2020.01.01/trade/"
`s#`.d`cond`ex`price`size`stop`sym`time
q)get`$":ms://data/db/2020.01.01/trade/.d"
`sym`time`price`size`stop`cond`ex
# GCP
q)key`:gs://
`s#`kxinsights-marketplace-data
q)key`$":gs://kxinsights-marketplace-data/"
`s#`_inventory`db`par.txt`sym
q)key`$":gs://kxinsights-marketplace-data/db"
`s#`2020.01.01`2020.01.02`2020.01.03`2020.01.06`2020.01.07`2020.01.08`2020.01..
q)key`$":gs://kxinsights-marketplace-data/db/2020.01.01/trade/"
`s#`.d`cond`ex`price`size`stop`sym`time
q)get`$":gs://kxinsights-marketplace-data/db/2020.01.01/trade/.d"
`sym`time`price`size`stop`cond`ex
- Other read operations work as if the file were on block storage.
```q
q)hcount `$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/2020.01.01/trade/sym"
2955832
q)-21!`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/2020.01.01/trade/sym"
compressedLength  | 69520
uncompressedLength| 2955832
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 6i
q)read1`$":s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/2020.01.01/trade/.d"
0xff010b000000000073796d0074696d650070726963650073697a650073746f7000636f6e640..
```
```q
q)hcount `$":ms://data/db/2020.01.01/trade/sym"
2955832
q)-21!`$":ms://data/db/2020.01.01/trade/sym"
compressedLength  | 69520
uncompressedLength| 2955832
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 6i
q)read1`$":ms://data/db/2020.01.01/trade/.d"
0xff010b000000000073796d0074696d650070726963650073697a650073746f7000636f6e640..
```
```q
q)hcount `$":gs://kxinsights-marketplace-data/db/2020.01.01/trade/sym"
2955832
q)-21!`$":gs://kxinsights-marketplace-data/db/2020.01.01/trade/sym"
compressedLength  | 69520
uncompressedLength| 2955832
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 6i
q)read1`$":gs://kxinsights-marketplace-data/db/2020.01.01/trade/.d"
0xff010b000000000073796d0074696d650070726963650073697a650073746f7000636f6e640..
```
Cloud data
In this section, learn how to set up cloud access and query cloud data.
Setup
Public datasets are available for AWS, Azure, and GCP.
Choose the provider that matches your environment.
# GCP
$ gsutil ls gs://kxinsights-marketplace-data/
gs://kxinsights-marketplace-data/sym
gs://kxinsights-marketplace-data/db/
$ gsutil ls gs://kxinsights-marketplace-data/db/ | head -n 5
gs://kxinsights-marketplace-data/db/2020.01.01/
gs://kxinsights-marketplace-data/db/2020.01.02/
gs://kxinsights-marketplace-data/db/2020.01.03/
gs://kxinsights-marketplace-data/db/2020.01.06/
gs://kxinsights-marketplace-data/db/2020.01.07/
# AWS
# set AWS_REGION=eu-west-1 before starting `q`
$ aws s3 ls s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/
PRE db/
2021-03-10 21:19:33 42568 sym
$ aws s3 ls s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db/ | head -n 5
PRE 2020.01.01/
PRE 2020.01.02/
PRE 2020.01.03/
PRE 2020.01.06/
PRE 2020.01.07/
# Azure
# set AZURE_STORAGE_ACCOUNT=kxinsightsmarketplace
# set AZURE_STORAGE_SHARED_KEY to value returned by below
# az storage account keys list --account-name kxinsightsmarketplace --resource-group kxinsightsmarketplace
$ az storage blob list --account-name kxinsightsmarketplace \
  --container-name data | jq -r '.[] | .name' | tail -n 5
db/2020.12.30/trade/size
db/2020.12.30/trade/stop
db/2020.12.30/trade/sym
db/2020.12.30/trade/time
sym
Create a sample database directory using your selected cloud provider:
# GCP
$ mkdir ~/db
$ gsutil cp gs://kxinsights-marketplace-data/sym ~/db/
$ tee ~/db/par.txt << EOF
> gs://kxinsights-marketplace-data/db
> EOF
# AWS
$ mkdir ~/db
$ aws s3 cp s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/sym ~/db/
$ tee ~/db/par.txt << EOF
> s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/db
> EOF
# Azure
$ mkdir ~/db
$ # example commands below:
$ az storage blob download --account-name kxinsightsmarketplace \
  --container-name data --name sym --file ~/db/sym
$ tee ~/db/par.txt << EOF
> ms://data/db
> EOF
This creates a standard HDB root directory whose partitions reside in object storage.
Do not add a trailing / to the object store location in par.txt.
$ tree ~/db/
/home/user/db/
├── par.txt
└── sym
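The trailing-slash rule for par.txt is easy to break when the location is pasted from a listing command. As a rough sketch, a hypothetical helper (write_par_txt, not part of KDB-X) could strip any trailing slashes before writing the file:

```shell
#!/bin/sh
# write_par_txt is a hypothetical helper, not part of KDB-X.
# Usage: write_par_txt DIR LOCATION
# Writes DIR/par.txt containing LOCATION with trailing slashes removed.
write_par_txt() {
  dir=$1 location=$2
  # Strip trailing "/" characters one at a time.
  while [ "${location%/}" != "$location" ]; do
    location=${location%/}
  done
  mkdir -p "$dir" && printf '%s\n' "$location" > "$dir/par.txt"
}

# Example (writes ~/db/par.txt):
# write_par_txt ~/db gs://kxinsights-marketplace-data/db/
```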
Querying data
Run q on this directory:
$ ls
par.txt sym
$ q
q).objstor:use`kx.objstor
q).objstor.init[]
q)\l /home/user/db/
You should now be able to run queries:
q)tables[]
`s#`quote`trade
q)select count i by date from quote
date | x
----------| ---------
2018.09.04| 692639728
2018.09.05| 762152767
2018.09.06| 788482304
2018.09.07| 801891891
2018.09.10| 635192966
Network speed to cloud storage can limit performance, so it's helpful to create a cache using fast SSDs, local NVMe drives, or even shared memory.
Setting the cache environment variables, such as KX_OBJSTR_CACHE_PATH, before starting q enables the cache.
The example below uses \t to show how caching speeds up subsequent requests:
$ export KX_OBJSTR_CACHE_PATH=/dev/shm/cache/
$ ls
par.txt sym
$ q
KDB+ 4.0 2021.06.12 Copyright (C) 1993-2021 Kx Systems
l64/ 20()core 64227MB XXXXXX XXXXXXXXXX 127.0.1.1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX KXCE XXXXXXXX
q).objstor:use`kx.objstor
q).objstor.init[]
q)\l /home/user/db
q)\t select count i by date from quote
4785
q)\t select count i by date from quote
0
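The cache setup from the session above could be wrapped in a small function for use in start-up scripts. This is a sketch under stated assumptions: enable_objstor_cache is a hypothetical helper, and creating the directory before q starts is assumed to be necessary rather than confirmed by this page.

```shell
#!/bin/sh
# enable_objstor_cache is a hypothetical helper, not part of KDB-X.
# It creates the cache directory (assumed to be needed before q starts),
# checks that it is writable, and exports KX_OBJSTR_CACHE_PATH.
enable_objstor_cache() {
  cache_dir=${1:-/dev/shm/cache/}
  mkdir -p "$cache_dir" || return 1
  [ -w "$cache_dir" ] || { echo "not writable: $cache_dir" >&2; return 1; }
  export KX_OBJSTR_CACHE_PATH="$cache_dir"
}

# Example: enable the cache on shared memory, then start q.
# enable_objstor_cache /dev/shm/cache/ && q
```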
Run the kxreaper application to prune the cache automatically when it fills up. The following example shows how to run kxreaper to monitor the cache directory and limit the space used to 10000MB:
$ kxreaper "$KX_OBJSTR_CACHE_PATH" 10000 &
Using an S3-compatible object store
In addition to AWS, Azure, and GCP, the object storage module works with S3-compatible services such as MinIO.
This is useful if you want to test locally or deploy against a private object store.
At minimum, you need to set two environment variables so that KDB-X can connect:
export KX_S3_ENDPOINT=http://localhost:9000
export KX_S3_USE_PATH_REQUEST_STYLE=1
Once set, you can reference the bucket using the :s3:// prefix in par.txt or directly in queries, just like native S3.
Refer to the examples for a full walkthrough with MinIO.
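The two settings can be grouped in a helper so that local test scripts set them together. The sketch below assumes a hypothetical function name, use_s3_endpoint, and assumes the endpoint must carry an http:// or https:// scheme as in the example above.

```shell
#!/bin/sh
# use_s3_endpoint is a hypothetical helper, not part of KDB-X.
# It exports the two variables shown above after a basic sanity check.
# Assumption: the endpoint must start with http:// or https://.
use_s3_endpoint() {
  case $1 in
    http://*|https://*) ;;
    *) echo "endpoint must start with http:// or https://" >&2; return 1 ;;
  esac
  export KX_S3_ENDPOINT="$1"
  export KX_S3_USE_PATH_REQUEST_STYLE=1
}

# Example: point KDB-X at a local MinIO server.
# use_s3_endpoint http://localhost:9000
```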
Performance considerations
Cloud storage has higher latency. Use caching, secondary threads, compression, and inventory files to improve performance.
Caching
Because of the high latency of cloud storage, KDB-X can be configured to cache request results locally on high-performance disk.
Cloud vendors charge for object storage as a combination of volume stored, per retrieval request, and volume egress. Using the built-in compression and the cache can help to reduce these costs.
Secondary threads
Secondary threads, enabled with the command-line option -s, are the way to achieve concurrency for these high-latency queries. For object storage, performance is expected to improve as the number of secondary threads increases, irrespective of CPU core count. Conversely, cached data appears to perform better when the secondary-thread count matches the CPU core count, so a balance must be found. We expect to improve thread usage for these requests in future.
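One way to experiment with this balance is to derive the -s value from the core count and a multiplier. The helper below is only a sketch: secondary_threads and its factor argument are hypothetical, and suitable values have to be found by measurement.

```shell
#!/bin/sh
# secondary_threads is a hypothetical helper, not part of KDB-X.
# Usage: secondary_threads [FACTOR] [CORES]
# Prints CORES * FACTOR as a starting value for the -s option.
# FACTOR is an illustrative tuning knob; CORES defaults to the online CPUs.
secondary_threads() {
  factor=${1:-1}
  cores=${2:-$(getconf _NPROCESSORS_ONLN)}
  echo $((cores * factor))
}

# Example: four secondary threads per core for uncached object storage reads.
# q -s "$(secondary_threads 4)"
```

A factor of 1 matches the core count, which the text above suggests for cached data, while larger factors favor uncached reads.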
Compression
Because of storage costs, possible egress costs, high latency, and low bandwidth, we recommend compressing data stored on cloud object storage.
Metadata load times
Metadata load times for an HDB process can be improved by adding an inventory file to the storage account.
The file must be gzipped JSON, as an array of {Key:string,Size:int} objects. For example:
[
{
"Key": "db/2020.12.30/trade/size",
"Size": 563829
},
{
"Key": "db/2020.12.30/trade/stop",
"Size": 49731
},
{
"Key": "db/2020.12.30/trade/sym",
"Size": 69520
},
{
"Key": "db/2020.12.30/trade/time",
"Size": 1099583
}
]
The inventory file can be created and uploaded to the storage account using the following commands:
# AWS
aws --output json s3api list-objects --bucket kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data --prefix 'db/' --query 'Contents[].{Key: Key, Size: Size}' | gzip -9 -c > aws.json.gz
aws s3 cp aws.json.gz s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/_inventory/all.json.gz
# Azure
az storage blob list --account-name kxinsightsmarketplace --container-name data | jq '.[] | {Key: .name , Size: .properties.contentLength }' | jq -s '.' | gzip -9 -c > azure.json.gz
az storage blob upload --account-name kxinsightsmarketplace \
  --container-name data --name _inventory/all.json.gz --file azure.json.gz
# GCP
gsutil ls -lr gs://kxinsights-marketplace-data/db/*/*/* | awk '{printf "{ \"Key\": \"%s\" , \"Size\": %s }\n", $3, $1}' | head -n -1 | jq -s '.' | sed 's/gs:\/\/kxinsights-marketplace-data\/db\///g' | gzip -9 -c > gcp.json.gz
gsutil cp gcp.json.gz gs://kxinsights-marketplace-data/_inventory/all.json.gz
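For an object store whose listing tool is not covered above, the same Key and Size array can be produced from plain "SIZE KEY" lines. The following is a minimal sketch using awk. inventory_json is a hypothetical helper, and it assumes keys contain no whitespace, double quotes, or backslashes.

```shell
#!/bin/sh
# inventory_json is a hypothetical helper, not part of KDB-X.
# It reads "SIZE KEY" lines on stdin and prints a JSON array of
# {"Key": ..., "Size": ...} objects on stdout.
# Assumption: keys contain no whitespace, double quotes, or backslashes.
# Lines that do not match "SIZE KEY" with a numeric SIZE are skipped.
inventory_json() {
  awk 'BEGIN { printf "[" }
       NF == 2 && $1 ~ /^[0-9]+$/ {
         printf "%s{\"Key\": \"%s\", \"Size\": %s}", sep, $2, $1
         sep = ","
       }
       END { print "]" }'
}

# Example: build a gzipped inventory from a listing file.
# inventory_json < listing.txt | gzip -9 -c > all.json.gz
```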
You can control which file is used as the inventory through an environment variable:
export KX_OBJSTR_INVENTORY_FILE=_inventory/all.json.gz
Reads of the inventory file bypass the cache and, to avoid cache invalidation issues, the file cannot be read explicitly.
Symbolic links
Symbolic links can be simulated through an optional "Path" field in an entry, which represents the realpath that the Key should resolve to. The Path must reside in the same bucket, and the Size field must represent that of the Path. For example:
[
{
"Key": "db/2020.12.30/trade/size",
"Size": 563829,
"Path": "newdb/2020.12.30/trade/size"
},
{
"Key": "db/2020.12.30/trade/stop",
"Size": 49731,
"Path": "newdb/2020.12.30/trade/stop"
},
{
"Key": "db/2020.12.30/trade/sym",
"Size": 69520,
"Path": "newdb/2020.12.30/trade/sym"
},
{
"Key": "db/2020.12.30/trade/time",
"Size": 1099583,
"Path": "newdb/2020.12.30/trade/time"
}
]
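Entries like these can be generated from a listing of the real objects. The sketch below assumes a hypothetical helper, inventory_links, that reads "SIZE PATH" lines for objects under the real prefix and emits entries whose Key uses the link prefix. As in the rule above, Size is taken from the real Path. Keys are assumed to contain no whitespace, double quotes, or backslashes.

```shell
#!/bin/sh
# inventory_links is a hypothetical helper, not part of KDB-X.
# Usage: inventory_links LINK_PREFIX REAL_PREFIX < listing
# Reads "SIZE PATH" lines for real objects under REAL_PREFIX and prints a
# JSON array where Key = LINK_PREFIX + rest of path and Path = real path.
# Lines outside REAL_PREFIX or without a numeric SIZE are skipped.
inventory_links() {
  awk -v link="$1" -v real="$2" '
    BEGIN { printf "[" }
    NF == 2 && $1 ~ /^[0-9]+$/ && index($2, real) == 1 {
      key = link substr($2, length(real) + 1)
      printf "%s{\"Key\": \"%s\", \"Size\": %s, \"Path\": \"%s\"}", sep, key, $1, $2
      sep = ","
    }
    END { print "]" }'
}

# Example: expose objects stored under newdb/ as if they lived under db/.
# inventory_links db/ newdb/ < listing.txt | gzip -9 -c > all.json.gz
```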