Object Storage and KDB-X Examples

This page provides example code snippets to help you use the object storage module on KDB-X.

As a prerequisite, you must have your cloud vendor CLI installed to run some of the commands.

Creating data on Cloud Storage

In order for data to be migrated to the cloud, it must first be staged locally on a POSIX filesystem. This is because KDB-X does not support writing to cloud storage using the traditional set and other write functions.

To demonstrate the migration, first create a sample database in KDB-X that will then be copied to a cloud storage account:

q)/ generate 20 consecutive dates starting at 2021.09.01
q)d:2021.09.01+til 20
q)/ for each date, write a 10,000-row trade partition under test/db, enumerating symbols against test/sym
q){[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d

This creates the following structure:

test/.
├── db
│   ├── 2021.09.01
│   ├── 2021.09.02
│   ├── 2021.09.03
│   ├── 2021.09.04
│   ├── 2021.09.05
│   ├── 2021.09.06
│   ├── 2021.09.07
│   ├── 2021.09.08
│   ├── 2021.09.09
│   ├── 2021.09.10
│   ├── 2021.09.11
│   ├── 2021.09.12
│   ├── 2021.09.13
│   ├── 2021.09.14
│   ├── 2021.09.15
│   ├── 2021.09.16
│   ├── 2021.09.17
│   ├── 2021.09.18
│   ├── 2021.09.19
│   └── 2021.09.20
└── sym

The following commands can be used to create a storage account or bucket and copy the database to it.

AWS S3: refer to the AWS CLI documentation for full details.

For example:

## create bucket
aws s3 mb s3://mybucket --region us-west-1

## copy database to bucket
aws s3 cp test/ s3://mybucket/ --recursive

Azure Blob Storage: refer to the Azure CLI documentation for full details.

For example:

## create storage account and container
az group create --name <resource-group> --location <location>
az storage account create \
    --name <storage-account> \
    --resource-group <resource-group> \
    --location <location> \
    --sku Standard_ZRS \
    --encryption-services blob

az ad signed-in-user show --query objectId -o tsv | az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee @- \
    --scope "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"

az storage container create \
    --account-name <storage-account> \
    --name <container> \
    --auth-mode login

## copy database to container
az storage blob upload-batch \
    --account-name <storage-account> \
    --destination <container> \
    --source test \
    --auth-mode login

Google Cloud Storage: refer to the Google Cloud (gsutil) documentation for full details.

For example:

## create bucket
gsutil mb -p PROJECT_ID -c STORAGE_CLASS -l BUCKET_LOCATION -b on gs://BUCKET_NAME

## copy database to bucket
gsutil cp -r OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/

Deleting data from Cloud Storage

Deleting data on Cloud Storage should be a rare occurrence, but if such a change is needed, follow the steps below:

  • Take offline any HDB reader processes that are currently using the storage account

  • Remove any caches created by the kxreaper application

  • Delete the data from the storage account using the cloud vendor's CLI, as shown in the example below
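
For example, a minimal sketch using the AWS CLI, assuming the s3://mybucket bucket created in the AWS example above:

## delete a single partition from the bucket
aws s3 rm s3://mybucket/db/2021.09.01 --recursive

## or delete the entire database
aws s3 rm s3://mybucket/db --recursive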

Changing data on Cloud Storage

Altering data (e.g. changing types or adding columns) requires the same steps as deleting data. Once the reader processes have been taken offline, the changes can be made safely. Bear in mind that, in order to change data, it must first be copied from the storage account, amended locally, and then copied back to the appropriate path using a cloud CLI copy command.
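
For example, a rough sketch using the AWS CLI, again assuming the s3://mybucket bucket from the AWS example above (the local staging directory name is illustrative only):

## copy the partition to be amended from the bucket to a local staging area
aws s3 cp s3://mybucket/db/2021.09.01/ staging/2021.09.01/ --recursive

## ... amend the partition locally in q ...

## copy the amended partition back to the same path in the bucket
aws s3 cp staging/2021.09.01/ s3://mybucket/db/2021.09.01/ --recursive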

Creating inventory JSON

Refer to the inventory file documentation for instructions on how to create and use the inventory file.

Combining Cloud and Local Storage in a single HDB

The addition of the object store library allows clients to extend their tiering strategies to cloud storage. In some instances, it is necessary to query data that has some partitions on a local POSIX filesystem and other partitions on cloud storage. To give a KDB-X process access to both datasets, the par.txt can be set as follows:

AWS:

s3://mybucket/db
/path/to/local/partitions

Note: if multiple storage accounts are added, they must be in the same AWS region.

Azure:

ms://mybucket/db
/path/to/local/partitions

Google Cloud:

gs://mybucket/db
/path/to/local/partitions

Note that multiple local filesystems and storage accounts can be added to par.txt.
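
As a minimal sketch, assuming the object storage module is installed and the HDB root directory (here the illustrative path /path/to/hdbroot) contains the sym file and a par.txt as above, the combined database is loaded in the same way as a purely local HDB:

q).objstor:use`kx.objstor
q).objstor.init[]
q)\l /path/to/hdbroot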

S3 compatible object store example

This example demonstrates how data can be stored and queried from any object store that's compatible with the S3 interface.

The example uses Docker to create a local object store using the MinIO implementation of an S3-compatible server.

Prerequisites

The AWS CLI must be installed to work through this example.

  1. Start the MinIO server using Docker or Podman.

    docker run -it --rm -p 9000:9000 -p 9001:9001 -e "MINIO_ROOT_USER=<insert minio user>" -e "MINIO_ROOT_PASSWORD=<insert minio password>" minio/minio server /data --console-address ":9001"
    
  2. Set the AWS credentials by executing aws configure. The AWS Access Key ID should be set to the MINIO_ROOT_USER and AWS Secret Access Key to MINIO_ROOT_PASSWORD (refer to the command above).

  3. Create some test data in the directory ./test using KDB-X

    q)d:2021.09.01+til 20
    q){[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d
    
  4. Create a bucket in MinIO called test

    aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://test
    
  5. Copy the generated test data to the test bucket

    aws --endpoint-url http://127.0.0.1:9000 s3 sync test/db s3://test
    aws --endpoint-url http://127.0.0.1:9000 s3 ls s3://test
    
  6. Create an HDB root directory called dbroot that uses data from the test bucket

    mkdir dbroot
    cp test/sym dbroot/
    echo "s3://test" > dbroot/par.txt
    
  7. Start a q process as shown below, loading the dbroot directory that uses the test bucket. Environment variables are set so that object storage requests are directed to the MinIO server.

    export KX_S3_ENDPOINT=http://127.0.0.1:9000
    export KX_S3_USE_PATH_REQUEST_STYLE=1
    q
    
  8. Load the dbroot directory and query the data in the bucket, which automatically retrieves the required data from the S3-compatible object store (MinIO)

    q).objstor:use`kx.objstor
    q).objstor.init[]
    q)\l dbroot/
    q)tables[]
    ,`trade
    q)select count sym from trade
    sym
    ------
    200000
    q)5#select count sym by date from trade
    date      | sym
    ----------| -----
    2021.09.01| 10000
    2021.09.02| 10000
    2021.09.03| 10000
    2021.09.04| 10000
    2021.09.05| 10000
    

To see which URLs are used in the query, export KX_TRACE_OBJSTR=1 and restart the q process.
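
For example:

export KX_TRACE_OBJSTR=1
q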