Skip to content

GetData

The getData API is a generic data retrieval API that can be executed against an arbitrary database table. It is intended to run on a Data Access Process and retrieve records from a configured database table using a specified dictionary of parameters.

Parameters

Name Required Type Description
table yes symbol Name of table to retrieve data from
startTS yes timestamp Inclusive start time of period of interest
endTS yes timestamp Exclusive end time of period of interest
inputTZ no symbol Timezone of startTS and endTS, UTC if not provided
outputTZ no symbol Timezone of output timestamp columns, UTC if not provided
filter no list[] List of triadic lists of the form (function;column name;parameter)
groupBy no symbol[] List of columns to group aggregation result by
agg no symbol[] List of triples of aggregations or columns to select. e.g. Aggregation dict example:(`c1`avg`price;`c2`sum`size), basic select example `price`size
fill no symbol How to handle nulls in the data. Supported values are zero and forward. The zero option fills numeric types with zeroes. The forward option fills nulls with previous, non-null entry.
temporality no symbol Sets the range of data in view for each day within the query. Support two types of temporality: snapshot which takes a continuous range of the data, and slice which returns data within the startTS and endTS dates that is between the times defined in the slice argument
slice no times[] Sets the time range to grab between each date in the startTS and endTS range when using a temporality of slice
sortCols no symbol[] Columns to sort result data on

In addition to the above parameters any of the labels defined in the assembly file of the DAPs can be used in the args dictionary to only select data from DAPs in assemblies matching that label. So for example if the there are DAPs belonging to two assemblies, one with a region label of canada another with a region label of europe, then a dictionary with a region value of canada would return only the canada data. The assembly label arguments are always optional, and will hit all assemblies when not specified.

Specifying timestamps

For getting the start and end times using bash, you may use date -u to print formatted dates:

# Get data within the hour
startTS=$(date -u '+%Y.%m.%dD%H:00:00')
endTS=$(date -u '+%Y.%m.%dD%H:%M%:%S')

Times may be extended to contain a variable number of trailing digits to specify milliseconds and nanosecond precision.

Example of valid dates are:

Example Implication
2022.01.01 2022.01.01D00:00:00.000000000
2022.01.01D01 2022.01.01D01:00:00.000000000
2022.01.01D01:01 2022.01.01D01:01:00.000000000
2022.01.01D01:01:01 2022.01.01D01:01:01.000000000
2022.01.01D01:01:01.1 2022.01.01D01:01:01.100000000
2022.01.01D01:01:01.11 2022.01.01D01:01:01.110000000
2022.01.01D01:01:01.111 2022.01.01D01:01:01.111000000
2022.01.01D01:01:01.1111 2022.01.01D01:01:01.111100000
2022.01.01D01:01:01.11111 2022.01.01D01:01:01.111110000
2022.01.01D01:01:01.111111 2022.01.01D01:01:01.111111000
2022.01.01D01:01:01.1111111 2022.01.01D01:01:01.111111100
2022.01.01D01:01:01.11111111 2022.01.01D01:01:01.111111110
2022.01.01D01:01:01.111111111 2022.01.01D01:01:01.111111111

ISO 8601 times can be used, of the format: +%Y-%m-%dT%H:%M%:%S. For example 2000-01-01T00:00:00.

filter

The filter parameter is used for applying custom filtering to the query. It's specified via a list of triples where each triple has the form (operator; column name; values).

(("<=";`realFloat;100f);("within";`qual;2 3h))
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"filter\":[[\"<=\",\"valFloat\",100],[\"within\",\"qual\",[0,2]]]}" \
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"filter\":[[\"<=\",\"valFloat\",100],[\"within\",\"qual\",[0,2]]]}" \

The filters are applied in the order they are defined when selecting from the table.

Supported filter functions are:

in within < > like <> <= >= =

Nested filters are not currently supported.

groupBy

Specifies the columns which define the by-clause used in the selection of the data. Can be used in conjunction with agg parameter to perform aggregations across column values.

`sensorID`qual
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\",\"qual\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\",\"qual\"]}"

agg

The agg parameter specifies the columns and/or aggregations to return after selecting the data. Argument is specified as a list of symbols, or as a list of lists of symbols. Behaviour differs depending on whether the parameter is specified as a list or list of lists.

If specified as a list of symbols, then those columns are selected and returned from the table across DAPs.

`sensorID`readTS`qual`valFloat
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"sensorID\",\"readTS\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"sensorID\",\"readTS\"]}"

Alternatively the selection can be specified to aggregate the column with a new name before returning. This is specified as a list of lists where the first element is the new column name, the second element is the aggregation function to use, and the third is the column to aggregate.

Below does the equivalent aggregation to select c1:avg valFloat, c2:min qual from t. Where the c1, and c2 are the names of the returned columns are the average of valFloat and the minimum of qual column.

(`c1`avg`valFloat;`c2`min`qual)
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]]}"

Supported aggregation functions are:

count first last sum prd min max all any var
avg wsum wavg var dev cov cor svar sdev scov

!!! warn Note that in the case of using the first and last aggregation may return inconsistent results in the case of cross-assembly queries or when late data is enabled. With cross-assembly queries the first/last results returned is dependant on the how the assembly partial results are returned. In the late data case, the result can be dependant on ordering of late data result.

fill

The fill parameter can be used to specify how the to handle null values. It takes a single symbol which can be one of forward or zero.

If forward is specified then the query will replace null values with the previous non-null value.

If zero is specified, null values are replaced with a 0 value of the appropriate type.

temporality

There are two types of temporality currently supported, and each determines the way the startTS and endTS arguments are interpreted.

The default value is snapshot and when specified getData will return a continuous chunk of data between the specified startTS and endTS arguments.

When a slice temporality is specified, the API will grab data between the time casted values of the startTS and endTS parameters, for each date within the startTS and endTS timestamp.

For example, these paremeters

startTS:     2021.01.01D10:00:00
endTS:       2021.01.04D13:00:00
temporality: `slice

Would return data between the times of 10:00:00 and 13:00:00, for each day between 2021.01.01 and 2021.01.04.

sortCols

Specifies the columns to sort on before returning the response.

`sensorID`readTS
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]],\"sortCols\":[\"sensorID\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]],\"sortCols\":[\"sensorID\"]}"

Querying reference data

Reference data is specified in the YAML by leveraging the foreign and primaryKey fields of the assembly schema. For example, consider the schema snippets below:

  tables:
    trade:
      type: partitioned
      blockSize: 10000
      prtnCol: time
      sortColsOrd: [sym]
      sortColsDisk: [sym]
      columns:
        - name: sym
          description: trade symbol
          type: symbol
          attrMem: grouped
          attrDisk: parted
          attrOrd: parted
        - name: code
          type: symbol
          foreign: markets.code  # Foreign key relationship
          description: Code for the market the stock was exchanged on
        - name: price
          type: float
        - name: time
          description: timestamp
          type: timestamp
    markets:
       description: reference market data
       type: splayed             # Splayed table, rather than partitioned
       updTsCol: updateTS
       primaryKeys:
        - code                   # Primary key column
       columns:
        - name: code
          type: symbol
          description: Market code
        - name: opCode
          type: string
          description: Market operating (parent) code
        - name: site
          type: string
          description: Market website
        - name: updateTS
          description: Timestamp of last mutation
          type: timestamp

Within this schema, the trade partitioned time series table has a foreign key relationship to the markets splayed reference data.

Splayed and basic reference data is held in RAM

The current value for every key seen for the reference data is held in RAM of every DAP to allow for efficient joins.

The reference data table can be queried as usual with the getData API by omitting the startTS and endTS arguments.

Additionally, when querying the time series data (here, trade) with the getData API, anywhere a column can be specified, references into the reference data can be supplied by specifying the column as <tableName>.<column>. As an example, to query by the market opCode:

curl ... \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trade\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"markets.opCode\"],\"agg\":[[\"c1\",\"avg\",\"price\"]]}"

Or to include reference data columns in the result set:

curl ... \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trade\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"price\", \"code\", \"markets.opCode\"]}"

Note

Aggregations that return reference columns must include the foreign key as well.

Note

kdb+ syntax to use time cast operations e.g. realTime.second, realTime.minute is not supported.

Response

See the possible response codes for success or failure information.

The response payload is a list that includes columns to sort on, return columns, by-clause to apply in the aggregator, secondary aggregation, and the table data.

For issues with the query or responses checkout the troubleshooting section here.

Examples

Assume there is a partitioned table called trace in all data access processes with the following schema:

name type description
sensorID int Sensor identifier
readTS timestamp Timestamp of reading, and the column which determines HDB partition
valFloat float Reading measurement
qual short Quality of the reading
updateTS timestamp Update timestamp of reading

For these examples, assume that we have a handle opened to the gateway assigned to h, a callback function called cb set to {res::(x;y)}.

For these examples, assume that a kdb Insights request token has been set in the INSIGHTS_TOKEN env var.

In this example we do a basic get call with only the table and time range specified:

q)args
table  | `trace
startTS| -0Wp
endTS  | 0Wp

q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records

sensorID readTS                        valFloat qual updateTS
----------------------------------------------------------------------------------
0        2021.02.09D00:00:00.000000000 283.0992 1    2021.02.10D19:02:25.332557500
0        2021.02.09D01:00:00.000000000 906.6764 5    2021.02.10D19:02:25.332557500
0        2021.02.09D02:00:00.000000000 546.6978 4    2021.02.10D19:02:25.332557500
0        2021.02.09D03:00:00.000000000 164.0931 4    2021.02.10D19:02:25.332557500
0        2021.02.09D04:00:00.000000000 663.1649 4    2021.02.10D19:02:25.332557500
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\"}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\"}"

In this example we refine the time range by specifying a startTS:

q)args
table  | `trace
startTS| 2021.02.09D01:00:00.000000000
endTS  | 0Wp

q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records

sensorID readTS                        valFloat qual updateTS
----------------------------------------------------------------------------------
0        2021.02.09D01:00:00.000000000 906.6764 5    2021.02.10D19:02:25.332557500
0        2021.02.09D02:00:00.000000000 546.6978 4    2021.02.10D19:02:25.332557500
0        2021.02.09D03:00:00.000000000 164.0931 4    2021.02.10D19:02:25.332557500
0        2021.02.09D04:00:00.000000000 663.1649 4    2021.02.10D19:02:25.332557500
0        2021.02.09D05:00:00.000000000 537.0982 4    2021.02.10D19:02:25.332557500
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01.00.00.000000000\"}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01.00.00.000000000\"}"

Here we specify select only a subset of the columns:

q)args
table  | `trace
startTS| -0Wp
endTS  | 0Wp
agg    | `sensorID`readTS`valFloat

q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[2] / Display only the first 5 records

sensorID readTS                        valFloat
-----------------------------------------------
0        2021.05.12D00:00:00.000000000 278.1034
0        2021.05.12D01:00:00.000000000 256.3526
0        2021.05.12D02:00:00.000000000 73.38756
0        2021.05.12D03:00:00.000000000 911.9714
0        2021.05.12D04:00:00.000000000 171.8527
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"]}"

Example using the filter argument. Here we ask for sensorIDs equal to 10i, as well as filtering for qual values between 1 and 3h:

q)args
table  | `trace
startTS| -0Wp
endTS  | 0Wp
filter | (("=";`sensorID;1 3h);("within";`qual;1 3h))

q)res:neg[h] (`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records

sensorID readTS                        valFloat qual updateTS
----------------------------------------------------------------------------------
10       2021.05.12D03:00:00.000000000 781.8174 3    2021.05.13D14:01:56.802276100
10       2021.05.12D04:00:00.000000000 951.9392 2    2021.05.13D14:01:56.802276100
10       2021.05.12D07:00:00.000000000 971.0995 3    2021.05.13D14:01:56.802276100
10       2021.05.12D13:00:00.000000000 103.5567 2    2021.05.13D14:01:56.802276100
10       2021.05.12D14:00:00.000000000 297.4721 1    2021.05.13D14:01:56.802276100
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"],\"filter\":[[\in\",\"sensorID\",[1,3]],[\"within\",\"qual\",[0,2]]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"],\"filter\":[[\"in\",\"sensorID\",[1,3]],[\"within\",\"qual\",[0,2]]}"

Again we make use of the sortCols argument to return the result sorted by qual and valFloat:

q)args
table      | `trace
startTS    | 2021.05.13D14:05:00.000000000
endTS      | 0Wp
sortCols   | `qual`valFloat

q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records

sensorID readTS                        valFloat   qual updateTS
------------------------------------------------------------------------------------
99       2021.05.12D08:00:00.000000000 0.00825664 0    2021.05.13D14:11:56.802232800
39       2021.05.12D12:00:00.000000000 0.09356276 0    2021.05.13D14:12:36.802253700
70       2021.05.12D02:00:00.000000000 0.1157809  0    2021.05.13D14:17:56.802235900
1        2021.05.12D01:00:00.000000000 0.197002   0    2021.05.13D14:17:46.804990700
33       2021.05.12D12:00:00.000000000 0.2034092  0    2021.05.13D14:13:16.803428200
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"sortCols\":[\"qual\",\"valFloat\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --header "Authorization: Bearer $INSIGHTS_TOKEN" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"sortCols\":[\"qual\",\"valFloat\"]}"

Extended Options

For IPC calls, extended options like timeout are supplied as the last parameter: neg[h] (`.kxi.getData;args;callback;options).

For HTTP, a reserved request parameter called opts exists. To supply a timeout for a getData call, the following would be used:

curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
    --header "Content-Type: application/json" \
    --header "Accepted: application/json" \
    --data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"opts\":{"timeout": 10000}}"

For a full list of options, see the header documentation.