kdb+ Initial Import Quickstart

This page explains how to import an existing kdb+ database into kdb Insights.

If you have an existing large kdb+ database, you should use initial import to add your data to a kdb Insights database. This is generally faster than using a batch ingest.

It is important to draw a clear distinction between an initial import and a batch ingest.

Initial Import: can only be used with a database that has never been initialized before and consequently does not contain any data.
Batch Ingest: can be used to add data to an already initialized and running database that may also contain data.

Before importing any data, ensure you have read the prerequisites to determine that your database is suitable for an initial import ingest.

Quickstart steps

Create temporary structure

First, create a temporary structure.

TEMP=`mktemp -d`
export TEMP # root directory to store data and config for example
export DATA=$TEMP/data/ # directory where test data will be created
export SCHEMA=$TEMP/schema/ # directory where schema will be stored
export INSIGHTS_HOSTNAME=insights.example.com  # kdb+ Insights URL
mkdir -pv $DATA $SCHEMA

If needed, create some sample data. The following sample creates a kdb+ database containing trade and quote tables.

dbroot:hsym`$getenv`DATA;
d:2021.09.01+til 20;
{[d;n]sv[`;.Q.par[dbroot;d;`trade],`]set .Q.en[dbroot;([]sym:`$'n?.Q.A;time:d+.z.N+til n;price:n?100f;size:n?50)];}[;10000]each d;
{[d;n]sv[`;.Q.par[dbroot;d;`quote],`]set .Q.en[dbroot;([]sym:`$'n?.Q.A;time:d+.z.N+til n;ask:n?100f;asize:n?50;bid:n?100f;bsize:n?50)];}[;10000]each d;

This creates the following structure.

/tmp/test/.
├── db
│   ├── 2021.09.01
│   ├── 2021.09.02
│   ├── 2021.09.03
│   ├── 2021.09.04
│   ├── 2021.09.05
│   ├── 2021.09.06
│   ├── 2021.09.07
│   ├── 2021.09.08
│   ├── 2021.09.09
│   ├── 2021.09.10
│   ├── 2021.09.11
│   ├── 2021.09.12
│   ├── 2021.09.13
│   ├── 2021.09.14
│   ├── 2021.09.15
│   ├── 2021.09.16
│   ├── 2021.09.17
│   ├── 2021.09.18
│   ├── 2021.09.19
│   └── 2021.09.20
└── sym

Copy the data

Next, copy the data using the following steps. Create a database tier(s) storage to store the data.

You may need to change the storage class for the PVC depending on what is available on your cluster.

YAML to create a PVC:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $KX_DATABASE_NAME-hdb
spec:
  storageClassName: rook-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
EOF

Create a pod, which mounts the local storage tier.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: $KX_DATABASE_NAME-copy-pod
spec:
  securityContext:
    fsGroup: 65534
    fsGroupChangePolicy: OnRootMismatch
    runAsNonRoot: true
    runAsUser: 65534
  containers:
    - name: $KX_DATABASE_NAME-copy-container
      image: amazonlinux:2
      stdin: true
      command: [ "/bin/bash", "-c" ]
      args:
        - |
          yum install -y rsync
          mkdir /data/db/hdb/data/
          while true; do sleep 30; done;
      volumeMounts:
        - mountPath: /data/db/hdb
          name: $KX_DATABASE_NAME-storage
  volumes:
  - name: $KX_DATABASE_NAME-storage
    persistentVolumeClaim:
      claimName: $KX_DATABASE_NAME-hdb
EOF

Copy kdb+ data to the tiers created.

bash krsync.sh $DATA $KX_DATABASE_NAME-copy-pod /data/db/hdb/data/

Deploy the database

Now the data is imported, you can deploy your database.

Create a kdb+ Insights schema for the database to be imported to.
```
q create-schema.q -db $DATA -fmt package -out $SCHEMA
```
Edit the schema of the trade and quote tables to set the prtnCol.

Create a kdb+ Insights database with the new database configured.

The following example shows a database configuration with a schema:

Database Assembly

name: Trade data
labels:
  region: amer

dbSettings:
  encryption:
    encryptAll: false

tables:
  quote: 
    type: "partitioned"
    columns: 
      - name: "sym"
        type: "symbol"
      - name: "time"
        type: "timestamp"
      - name: "ask"
        type: "float"
      - name: "asize"
        type: "long"
      - name: "bid"
        type: "float"
      - name: "bsize"
        type: "long"
    prtnCol: "time"
  trade: 
    type: "partitioned"
    columns: 
      - name: "sym"
        type: "symbol"
      - name: "time"
        type: "timestamp"
      - name: "price"
        type: "float"
      - name: "size"
        type: "long"
    prtnCol: "time"

mounts:
  rdb:
    type: stream
    partition: none
    baseURI: none
  idb:
    type: local
    partition: ordinal
    baseURI: file:///data/db/idb
  hdb:
    type: local
    partition: date
    baseURI: file:///data/db/hdb

elements:
  dap:
    instances:
      rdb:
        mountName: rdb
      idb:
        mountName: idb
      hdb:
        mountName: hdb

  sm:
    source: stream
    tiers:
      - name: rdb
        mount: rdb
      - name: idb
        mount: idb
        schedule:
          freq: 0D00:10:00 # every 10 minutes
      - name: hdb
        mount: hdb
        schedule:
          freq: 1D00:00:00 # every day
          snap:   01:35:00 # at 1:35 AM
        retain:
          time: 2 days

Deploy the database

The deployment mechanism depends on which SDK microservices are being used. Refer to these examples for more details.

Inspect status API and logs to verify that the import worked.

kxi --realm insights --hostname $INSIGHTS_HOSTNAME auth login
export INSIGHTS_TOKEN=`kxi auth print-token` && curl -L "https://${INSIGHTS_HOSTNAME}/servicegateway/api/v1/database/$KX_DATABASE_NAME/status" -H "Authorization: Bearer $INSIGHTS_TOKEN"

{
  "state": "normal",
  "encryption": "encrypted",
  "progress": {
    "cmdCurrent": "",
    "cmdIndex": null,
    "cmdTotal": null,
    "subCurrent": "",
    "subIndex": null,
    "subTotal": null,
    "completeETA": ""
  },
  "memory": {
    "usage": 151252992,
    "limit": 2147483648
  }
}

curl -X POST --header "Content-Type: application/json"\
    --header "Accepted: application/json"\
    --header "Authorization: Bearer $INSIGHTS_TOKEN"\
    --data "{\"table\":\"trade\",\"startTS\":\"2021-09-01T00:00:00.000000000\",\"endTS\":\"2024-11-12T13:27:00.000000000\",\"agg\":[[\"cnt\",\"count\",\"time\"]]}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData"

{
  "header": {
    "auditID": "e13b1e27-a82a-41e3-b6c9-cc08d4678367",
    "rcvTS": "2024-11-12T20:00:15.588000000",
    "corr": "aff5c6a1-ac83-4895-9df5-221b66c6e64d",
    "logCorr": "e13b1e27-a82a-41e3-b6c9-cc08d4678367",
    "http": "json",
    "api": ".kxi.getData",
    "agg": ":10.6.142.183:5070",
    "refVintage": -9223372036854776000,
    "rc": 0,
    "ac": 0,
    "ai": "",
    "limitApplied": false
  },
  "payload": [
    {
      "cnt": 200000
    }
  ]
}

Next steps

Now you have tried this process with the quickstart, you can attempt following the initial import process. If you experience any issues, refer to our troubleshooting page for help.