REST and qIPC Query

This page compares REST and qIPC query interfaces in kdb Insights, highlighting their performance characteristics and suitable use cases.

kdb Insights includes services for persisting and accessing data.

The Service Gateway offers an authenticated, secure and OpenAPI compatible API to retrieve data from the system.

An operator dynamically provisions data access processes and storage manager nodes.

Deployment

To query data, ensure a database is configured and data publishers are deployed.

To configure and deploy a database, refer to the configuration and deployment guides in the kdb Insights documentation.

SQL Usage

To use SQL, you must augment the database to set queryEnvironment. For more information, refer to the SQL documentation.

Querying data

Every Data Access (DA) process includes an API for simple data retrieval, called .kxi.getData.

To query data using this API, you can make a REST API call to servicegateway/kxi/getData.

A query minimally includes the name of the table, start timestamp, end timestamp, and one or more user-defined labels.

For example, with a user-defined label assetClass:

START=$(date "+%Y.%m.%dD00:00:00.000000000")
END=$(date "+%Y.%m.%dD23:59:59.999999999")

curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/json"\
    --data "{\"table\":\"trades\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"assetClass\": \"manufacturing\"}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData"

The getData API supports additional parameters for reducing the columns returned and for basic filtering.

For more details, refer to the getData API page.
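The request body shown in the curl example above can also be assembled programmatically, which avoids hand-escaping JSON in the shell. The following is a minimal sketch in Node.js; the helper names toKdbTimestamp and buildGetDataBody are hypothetical, and the sketch assumes timestamps are sent in UTC. JavaScript dates carry only millisecond precision, so the end of the range is 23:59:59.999000000 rather than 23:59:59.999999999.

```javascript
// Format a JavaScript Date as a kdb+ timestamp string in UTC,
// for example 2024.01.02D03:04:05.006000000.
// Date has millisecond precision, so the last six digits are always zero.
function toKdbTimestamp(date) {
    const pad = (n, width) => String(n).padStart(width, '0');
    const day = [
        date.getUTCFullYear(),
        pad(date.getUTCMonth() + 1, 2),
        pad(date.getUTCDate(), 2)
    ].join('.');
    const time = [
        pad(date.getUTCHours(), 2),
        pad(date.getUTCMinutes(), 2),
        pad(date.getUTCSeconds(), 2)
    ].join(':');
    return `${day}D${time}.${pad(date.getUTCMilliseconds(), 3)}000000`;
}

// Build the getData request body: table, time range, and user-defined labels.
// Label keys and values are passed through unchanged because they are case sensitive.
function buildGetDataBody(table, start, end, labels = {}) {
    return JSON.stringify({
        table,
        startTS: toKdbTimestamp(start),
        endTS: toKdbTimestamp(end),
        ...labels
    });
}

const dayStart = new Date(Date.UTC(2024, 0, 2));
const dayEnd = new Date(Date.UTC(2024, 0, 2, 23, 59, 59, 999));
console.log(buildGetDataBody('trades', dayStart, dayEnd, { assetClass: 'manufacturing' }));
```

The resulting string can be passed as the --data argument to curl or written to the request in any HTTP client.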

Warning

Labels are case sensitive.

Ensure that the label key/value pairs provided match the labels assigned when the database was applied.

For an overview of the REST API, refer to the REST API page.

Using qIPC responses

By including the HTTP Accept header "application/octet-stream", you can get query results as a serialized qIPC byte array.

This header significantly reduces overhead and gives faster response times, at the cost of some minor complexity when handling the results.

The Accept header "application/struct-text" returns the data as JSON, in a structured text format.

By using any of the kdb+ client interfaces, you can deserialize the responses and then process them as normal.

Added Bonus

Using this strategy has the additional benefit of preserving type information. JSON responses have the disadvantage of converting all numbers to floats, and may truncate the precision of timestamps.
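The precision loss described above can be reproduced without any kdb Insights deployment. The following sketch uses made-up values (the field name orderId and both numbers are illustrative, not taken from a real response) to show what happens when a 64-bit integer or a nanosecond timestamp passes through a JSON number.

```javascript
// A 64-bit long that is not exactly representable as a double (2^53 + 1).
const jsonText = '{"orderId": 9007199254740993}';
const viaJson = JSON.parse(jsonText).orderId;

// JSON.parse stores the value as a double, so the final digit is rounded away.
console.log(viaJson === 9007199254740992); // true

// An illustrative nanosecond count of the size used for kdb+ timestamps.
const nanosExact = 765432101123456789n;
const nanosAsFloat = Number(nanosExact);

// Converting to a double and back does not return the original value.
console.log(BigInt(nanosAsFloat) === nanosExact); // false
```

A binary qIPC response avoids this because the values are never converted to JSON numbers.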

Each of the following examples assumes that INSIGHTS_HOSTNAME is defined in your environment.

The examples appear in this order: curl and kdb Insights with octet-stream, curl and kdb Insights with struct-text, the kdb Insights REST client, and JavaScript.

# Save results to results.dat
curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/octet-stream"\
    -o results.dat\
    --data "{\"table\":\"trades\"}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData"

Start q and deserialize the response:

-9!read1`:results.dat

# Save results to results.dat
curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/struct-text"\
    -o results.dat\
    --data "{\"table\":\"trades\"}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData"

Start q and deserialize the response:

.j.k "c"$read1`:results.dat

URL:"https://",getenv[`INSIGHTS_HOSTNAME],"/servicegateway/kxi/getData";
headers:("Accept";"Content-Type")!(
    "application/octet-stream";
    "application/json");

body:.j.j enlist[`table]!enlist "trades";
resp:.kurl.sync (URL; `POST; `binary`headers`body!(1b;headers;body));
if[200 <> first resp; 'last resp];
show -9!last resp

Ensure your copy of c.js has decompression support. Look for the comment "// 2021.04.05 added decompress support" in the file.

const https = require('https');
const c = require('./c');

// Request body: the table to query
const body = { table: 'trades' };

const options = {
    host    : process.env.INSIGHTS_HOSTNAME,
    path    : '/servicegateway/kxi/getData',
    method  : 'POST',
    headers : {
        'Accept'       : 'application/octet-stream',
        'Content-Type' : 'application/json'
    }
};

const request = https.request(options, (res) => {
    if (res.statusCode !== 200) {
        console.error(`Non 200 error code ${res.statusCode}`);
        res.resume();
        return;
    }
    // Collect the raw response bytes as they arrive
    const chunks = [];
    res.on('data', (chunk) => {
        chunks.push(chunk);
    });
    res.on('end', () => {
        // Join the chunks and deserialize the qIPC payload
        const b = Buffer.concat(chunks);
        console.log(c.deserialize(b));
    });
});
request.on('error', (err) => {
    console.error(`Encountered an error trying to make a request: ${err.message}`);
});
request.write(JSON.stringify(body));
request.end();
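Whether a response needs the decompression support mentioned above can be checked from the 8-byte header that precedes every serialized qIPC message. The sketch below reads that header in Node.js; the function name readQipcHeader is hypothetical, and the sample bytes are the serialized form of a single long atom, used here as a stand-in for a saved response such as results.dat.

```javascript
// Read the 8-byte header at the start of a serialized qIPC message.
// Byte 0: endianness (1 = little endian), byte 1: message type,
// byte 2: compression flag, byte 3: unused, bytes 4-7: total message length.
function readQipcHeader(buf) {
    if (buf.length < 8) {
        throw new Error('A qIPC message is at least 8 bytes long');
    }
    const littleEndian = buf[0] === 1;
    return {
        littleEndian,
        messageType: buf[1], // 0 = async, 1 = sync, 2 = response
        compressed: buf[2] === 1,
        length: littleEndian ? buf.readUInt32LE(4) : buf.readUInt32BE(4)
    };
}

// Stand-in for a saved response: header (8 bytes), type byte for a long atom,
// then the 8-byte value 1. Total length is 17 bytes.
const sample = Buffer.from([1, 0, 0, 0, 17, 0, 0, 0, 0xf9, 1, 0, 0, 0, 0, 0, 0, 0]);
console.log(readQipcHeader(sample));
```

To inspect a real response, replace the sample buffer with the result of fs.readFileSync on the saved file.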

User Defined Analytics

User Defined Analytics (UDAs) enable you to define new APIs that are callable through the Service Gateway (SG). UDAs augment the standard set of APIs available in the kdb Insights system with application logic specific to your business needs.

For instructions, refer to Installing UDAs in the kdb Insights documentation.

Calling UDAs

UDAs are callable through the servicegateway. For configuration details, refer to the Using UDAs documentation.

If UDAs are configured, they are included in a getMeta request.

To call a UDA named example/api using REST, use the following format:

# Example that uses UDA on data within the current hour
startTS=$(date -u '+%Y.%m.%dD%H:00:00')
endTS=$(date -u '+%Y.%m.%dD%H:%M:%S')
curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/json"\
    --data "{\"table\": \"trades\", \"columns\":[\"sym\",\"price\"], \"startTS\": \"$startTS\", \"endTS\": \"$endTS\"}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/example/api"

Note

If you use scope.tier to specify a single DAP as the target of a request, no aggregation is performed.

By default, when more than one DAP is targeted, the aggregation method is raze, unless the UDA includes an aggregation function. You can override the aggregation by defining a custom aggFn function as part of the request.
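The difference between the default raze and a custom aggregation can be illustrated outside kdb Insights. The sketch below is conceptual only: it models partial results from two DAPs as JavaScript arrays with made-up rows, and the avPrice function here is a hypothetical stand-in written in JavaScript, not the aggregation function that a deployment would register.

```javascript
// Partial results as they might come back from two DAPs (illustrative data only).
const partials = [
    [{ sym: 'AAA', price: 10 }, { sym: 'BBB', price: 20 }],
    [{ sym: 'AAA', price: 30 }]
];

// Default behaviour: raze joins the partial results end to end.
const raze = (parts) => parts.flat();

// Hypothetical custom aggregation: average price per sym across all partials.
function avPrice(parts) {
    const totals = new Map();
    for (const { sym, price } of raze(parts)) {
        const t = totals.get(sym) || { sum: 0, count: 0 };
        totals.set(sym, { sum: t.sum + price, count: t.count + 1 });
    }
    return [...totals].map(([sym, { sum, count }]) => ({ sym, avgPrice: sum / count }));
}

console.log(raze(partials));    // every row from both partials, unchanged
console.log(avPrice(partials)); // one row per sym with the combined average
```

With raze, the caller receives all rows and must combine them itself; with a custom aggregation, the combining happens before the response is returned.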

# Example that uses custom aggregation API on `getData` within the current hour
startTS=$(date -u '+%Y.%m.%dD%H:00:00')
endTS=$(date -u '+%Y.%m.%dD%H:%M:%S')
curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/json"\
    --data "{\"table\": \"trades\", \"columns\":[\"sym\",\"price\"],  \"startTS\": \"$startTS\", \"endTS\": \"$endTS\", \"opts\": {\"aggFn\":\"avPrice\"}}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/example/api"

Note

If the scope parameter is mandatory, you must set it to the name of the assembly that the UDA is defined in. This ensures the appropriate aggregator is used when running the query.
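Because scope is a nested object, the request body is easy to get wrong when it is escaped by hand. A minimal sketch, assuming Node.js and a hypothetical helper named buildScopedBody, lets JSON.stringify handle the quoting of the assembly name:

```javascript
// Build a request body that targets a specific assembly through the scope parameter.
// JSON.stringify quotes the assembly name and closes the nested object correctly.
function buildScopedBody(table, startTS, endTS, assembly) {
    return JSON.stringify({ table, startTS, endTS, scope: { assembly } });
}

console.log(buildScopedBody(
    'table',
    '2024.01.02D00:00:00.000000000',
    '2024.01.02D23:59:59.999999999',
    'my-assembly'
));
```

The timestamps here are placeholder values; substitute the range you want to query.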

START=$(date "+%Y.%m.%dD00:00:00.000000000")
END=$(date "+%Y.%m.%dD23:59:59.999999999")
ASSEMBLY="my-assembly"
curl -X POST --header "Content-Type: application/json"\
    --header "Accept: application/json"\
    --data "{\"table\":\"table\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"scope\":{\"assembly\":\"$ASSEMBLY\"}}"\
    "https://${INSIGHTS_HOSTNAME}/servicegateway/namespace/name"

Data tiers and life-cycle

Databases in kdb Insights are distributed across tiers. Data migrates across tiers as it ages.

Data tiers are configured in the database specification, including mounts and data retention lifecycle settings.

Newly received data can be made available in-memory for a number of days, before being migrated to on-disk storage or cloud storage. This enables a faster response time for recent data.

The following example mount description keeps the IDB and HDB in a Rook CephFS partition, under the root /data/db.

  mounts:
    rdb:
      type: stream
      baseURI: none
      partition: none
    idb:
      type: local
      baseURI: file:///data/db/idb
      partition: ordinal
      volume:
        storageClass: "rook-cephfs"
        accessModes:
          - ReadWriteMany
    hdb:
      type: local
      baseURI: file:///data/db/hdb
      partition: date
      dependency:
      - idb
      volume:
        storageClass: "rook-cephfs"
        accessModes:
          - ReadWriteMany

The following example shows the corresponding data tiering configuration, saved under the Storage Manager elements.

Intra-day data migrates from memory to disk every ten minutes, migrates again every midnight, and is retained for 3 months.

  elements:
    sm:
      source: south
      tiers:
        - name: streaming
          mount: rdb
        - name: interval
          mount: idb
          schedule:
            freq: 00:10:00
        - name: recent
          mount: hdb
          schedule:
            freq: 1D00:00:00
            snap:   01:35:00
          retain:
            time: 3 Months
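The freq values in the configuration above are written as timespans. Converting them to seconds makes the migration cadence explicit. The following sketch assumes the values follow the pattern of an optional day count followed by HH:MM:SS, as in the example; the function name timespanToSeconds is hypothetical.

```javascript
// Convert a timespan such as 00:10:00 or 1D00:00:00 to a number of seconds.
function timespanToSeconds(span) {
    const match = /^(?:(\d+)D)?(\d{2}):(\d{2}):(\d{2})$/.exec(span);
    if (!match) {
        throw new Error(`Unrecognised timespan: ${span}`);
    }
    // The day count is optional, so it defaults to zero when absent.
    const [, days = '0', hours, minutes, seconds] = match;
    return Number(days) * 86400 + Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

console.log(timespanToSeconds('00:10:00'));   // 600 seconds, ten minutes
console.log(timespanToSeconds('1D00:00:00')); // 86400 seconds, one day
```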

For a full description of data tiering, including data compression, refer to the Elements section of the Database configuration page.

Querying is tier agnostic.

Do not specify a tier when accessing data. Instead, use labels to query data.

Troubleshooting

For troubleshooting information, refer to Troubleshooting.