Examples

Code snippets to help you use kdb+ with cloud storage.

To run some of the commands below, you will need your cloud vendor's CLI installed.

Creating data on Cloud Storage

In order for data to be migrated to the cloud, it must first be staged locally on a POSIX filesystem. This is because KX Insights Core does not support writing to cloud storage using the traditional set and other write functions.

The example below migrates a sample database to a cloud storage account. First, create the sample database:

/ 20 consecutive dates starting 2021.09.01
d:2021.09.01+til 20
/ write a 10,000-row enumerated trade partition for each date under test/db
{[d;n]sv[`;.Q.par[`:test/db/;d;`trade],`]set .Q.en[`:test/;([]sym:`$'n?.Q.A;time:.z.P+til n;price:n?100f;size:n?50)];}[;10000]each d

This will create the below structure

test/.
├── db
│   ├── 2021.09.01
│   ├── 2021.09.02
│   ├── 2021.09.03
│   ├── 2021.09.04
│   ├── 2021.09.05
│   ├── 2021.09.06
│   ├── 2021.09.07
│   ├── 2021.09.08
│   ├── 2021.09.09
│   ├── 2021.09.10
│   ├── 2021.09.11
│   ├── 2021.09.12
│   ├── 2021.09.13
│   ├── 2021.09.14
│   ├── 2021.09.15
│   ├── 2021.09.16
│   ├── 2021.09.17
│   ├── 2021.09.18
│   ├── 2021.09.19
│   └── 2021.09.20
└── sym
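
Before copying to cloud storage, the staged database can be checked on the local filesystem; a minimal check of the structure created above:

key `:test/db                      / lists the 20 date partitions
cols `:test/db/2021.09.01/trade    / sym, time, price and size columns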

The below commands can be used to create a storage bucket or account and copy the database to it.

AWS

Documentation provided here.

For example:

## create bucket
aws s3 mb s3://mybucket --region us-west-1

## copy database to bucket
aws s3 cp test/ s3://mybucket/ --recursive
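
To confirm the upload, the copied objects can be listed; a quick check using the bucket above:

## list the uploaded database objects
aws s3 ls s3://mybucket/ --recursive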

Azure

Documentation provided here.

For example:

## create resource group, storage account and container
az group create --name <resource-group> --location <location>
az storage account create \
    --name <storage-account> \
    --resource-group <resource-group> \
    --location <location> \
    --sku Standard_ZRS \
    --encryption-services blob

az ad signed-in-user show --query objectId -o tsv | az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee @- \
    --scope "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"

az storage container create \
    --account-name <storage-account> \
    --name <container> \
    --auth-mode login

## copy database to container
az storage blob upload-batch \
    --account-name <storage-account> \
    --destination <container> \
    --source test \
    --auth-mode login
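
The uploaded blobs can then be listed to confirm the copy, using the placeholders above:

## list the uploaded database blobs
az storage blob list \
    --account-name <storage-account> \
    --container-name <container> \
    --auth-mode login \
    --output table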

Google Cloud

Documentation provided here.

For example:

## create bucket
gsutil mb -p PROJECT_ID -c STORAGE_CLASS -l BUCKET_LOCATION -b on gs://BUCKET_NAME

## copy database to bucket
gsutil cp -r OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/
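
As with the other vendors, the copied objects can be listed to confirm the upload:

## list the uploaded database objects
gsutil ls -r gs://DESTINATION_BUCKET_NAME/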

Deleting data from Cloud Storage

Deleting data on Cloud Storage should be a rare occurrence, but in the event that such a change is needed, the below steps should be followed:

  • Take offline any HDB reader processes that are currently using the storage account

  • Remove any caches created by the kxreaper application

  • Delete the data from the storage account using your cloud vendor's CLI (a sketch follows this list)

  • Recreate the inventory file (if used)

  • Bring the reader processes back online, making sure they are reloaded to pick up the new inventory file and that any metadata caches are dropped using the drop command
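
A minimal sketch of the cache-removal and delete steps, assuming the AWS bucket created above and that KX_OBJSTR_CACHE_PATH points at the kxreaper cache directory; use the az or gsutil equivalents for other vendors.

## clear locally cached objects (cache directory set by KX_OBJSTR_CACHE_PATH)
rm -rf "$KX_OBJSTR_CACHE_PATH"/*

## delete a partition from the storage account
aws s3 rm s3://mybucket/db/2021.09.01/ --recursive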

Changing data on Cloud Storage

Altering data, e.g. changing types or adding columns, requires the same steps as deleting data. Once the reader processes have been taken offline, the changes can be made safely. Bear in mind that to change data it must first be copied from the storage account to local storage, amended there, and then copied back to the appropriate path using a cloud CLI copy command.
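
For example, amending a single partition on AWS might look like the below sketch; the local staging path is illustrative.

## copy the partition down from the bucket
aws s3 cp s3://mybucket/db/2021.09.01/ staging/db/2021.09.01/ --recursive

## amend the partition locally in kdb+ using the usual write functions, then copy it back
aws s3 cp staging/db/2021.09.01/ s3://mybucket/db/2021.09.01/ --recursive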

Creating inventory JSON

Instructions to create and use the inventory file can be found here

Combining Cloud and Local Storage in a single HDB

The addition of the object store library allows clients to extend their tiering strategies to cloud storage. In some instances it will be necessary to query data that has some partitions on a local POSIX filesystem and other partitions on cloud storage. To give a kdb+ process access to both datasets, par.txt can be set as below.

AWS

s3://mybucket/db
/path/to/local/partitions

Note: if multiple storage accounts are added they must be in the same AWS region.

Azure

ms://mybucket/db
/path/to/local/partitions

Google Cloud

gs://mybucket/db
/path/to/local/partitions

Note that multiple local filesystems and storage accounts can be added to par.txt.
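
A minimal sketch of loading such a mixed HDB, assuming the object storage library is available to the q process, the relevant credentials and region environment variables are set, and the sym file sits alongside par.txt in the HDB root directory:

/ load the HDB root directory containing par.txt and the sym file
\l /path/to/hdbroot
/ queries resolve partitions across cloud storage and the local filesystem
select count i by date from trade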

Multiple HDB processes and caching

In many kdb+ architectures, multiple HDB processes are used to handle load and scale horizontally. All instances of an HDB that use the same storage account can also use the same cache directory by setting the KX_OBJSTR_CACHE_PATH environment variable in each process. A single reaper process should then be run to control the amount of data contained in the cache.

Note: if using NAS, the reaper process should run on the same machine as the HDB reader processes, and for this reason NAS is not a recommended setup. For optimal performance the cache should be located on locally attached storage.
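
For example, two HDB readers might share one cache as below. This is a minimal sketch: the paths and ports are illustrative, and the kxreaper invocation is left to its own documentation.

## all HDB processes point at the same locally attached cache directory
export KX_OBJSTR_CACHE_PATH=/fast/local/objcache

## start two HDB reader processes that share the cache
q /path/to/hdbroot -p 5010 &
q /path/to/hdbroot -p 5011 &

## run a single kxreaper process against the same cache directory
## to keep the cache within the configured size limit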