GetData
The getData
API is a generic data retrieval API that can be executed against an arbitrary database table. It is intended to run on a Data Access Process and retrieve records from a configured database table using a specified dictionary of parameters.
Parameters
Name | Required | Type | Description |
---|---|---|---|
table | yes | symbol | Name of table to retrieve data from |
startTS | yes | timestamp | Inclusive start time of period of interest |
endTS | yes | timestamp | Exclusive end time of period of interest |
inputTZ | no | symbol | Timezone of startTS and endTS, UTC if not provided |
outputTZ | no | symbol | Timezone of output timestamp columns, UTC if not provided |
filter | no | list[] | List of triadic lists of the form (function;column name;parameter) |
groupBy | no | symbol[] | List of columns to group aggregation result by |
agg | no | symbol[] | List of triples of aggregations or columns to select. e.g. Aggregation dict example:(`c1`avg`price;`c2`sum`size) , basic select example `price`size |
fill | no | symbol | How to handle nulls in the data. Supported values are zero and forward . The zero option fills numeric types with zeroes. The forward option fills nulls with previous, non-null entry. |
temporality | no | symbol | Sets the range of data in view for each day within the query. Support two types of temporality: snapshot which takes a continuous range of the data, and slice which returns data within the startTS and endTS dates that is between the times defined in the slice argument |
slice | no | times[] | Sets the time range to grab between each date in the startTS and endTS range when using a temporality of slice |
sortCols | no | symbol[] | Columns to sort result data on |
In addition to the above parameters any of the labels defined in the assembly file of the DAPs can be used in the args
dictionary to only select data from DAPs in assemblies matching that label. So for example if the there are DAPs belonging to two assemblies, one with a region
label of canada
another with a region label of europe
, then a dictionary with a region
value of canada
would return only the canada
data. The assembly label arguments are always optional, and will hit all assemblies when not specified.
Specifying timestamps
For getting the start and end times using bash
, you may use date -u
to print formatted dates:
# Get data within the hour
startTS=$(date -u '+%Y.%m.%dD%H:00:00')
endTS=$(date -u '+%Y.%m.%dD%H:%M%:%S')
Times may be extended to contain a variable number of trailing digits to specify milliseconds and nanosecond precision.
Example of valid dates are:
Example | Implication |
---|---|
2022.01.01 | 2022.01.01D00:00:00.000000000 |
2022.01.01D01 | 2022.01.01D01:00:00.000000000 |
2022.01.01D01:01 | 2022.01.01D01:01:00.000000000 |
2022.01.01D01:01:01 | 2022.01.01D01:01:01.000000000 |
2022.01.01D01:01:01.1 | 2022.01.01D01:01:01.100000000 |
2022.01.01D01:01:01.11 | 2022.01.01D01:01:01.110000000 |
2022.01.01D01:01:01.111 | 2022.01.01D01:01:01.111000000 |
2022.01.01D01:01:01.1111 | 2022.01.01D01:01:01.111100000 |
2022.01.01D01:01:01.11111 | 2022.01.01D01:01:01.111110000 |
2022.01.01D01:01:01.111111 | 2022.01.01D01:01:01.111111000 |
2022.01.01D01:01:01.1111111 | 2022.01.01D01:01:01.111111100 |
2022.01.01D01:01:01.11111111 | 2022.01.01D01:01:01.111111110 |
2022.01.01D01:01:01.111111111 | 2022.01.01D01:01:01.111111111 |
ISO 8601 times can be used, of the format: +%Y-%m-%dT%H:%M%:%S
. For example 2000-01-01T00:00:00
.
filter
The filter
parameter is used for applying custom filtering to the query. It's specified via a list of triples where each triple has the form (operator; column name; values).
(("<=";`realFloat;100f);("within";`qual;2 3h))
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"filter\":[[\"<=\",\"valFloat\",100],[\"within\",\"qual\",[0,2]]]}" \
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"filter\":[[\"<=\",\"valFloat\",100],[\"within\",\"qual\",[0,2]]]}" \
The filters are applied in the order they are defined when selecting from the table.
Supported filter functions are:
in within < > like <> <= >= =
Nested filters are not currently supported.
groupBy
Specifies the columns which define the by-clause used in the selection of the data. Can be used in conjunction with agg
parameter to perform aggregations across column values.
`sensorID`qual
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\",\"qual\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\",\"qual\"]}"
agg
The agg
parameter specifies the columns and/or aggregations to return after selecting the data. Argument is specified as a list of symbols, or as a list of lists of symbols. Behaviour differs depending on whether the parameter is specified as a list or list of lists.
If specified as a list of symbols, then those columns are selected and returned from the table across DAPs.
`sensorID`readTS`qual`valFloat
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"sensorID\",\"readTS\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"sensorID\",\"readTS\"]}"
Alternatively the selection can be specified to aggregate the column with a new name before returning. This is specified as a list of lists where the first element is the new column name, the second element is the aggregation function to use, and the third is the column to aggregate.
Below does the equivalent aggregation to select c1:avg valFloat, c2:min qual from t
. Where the c1
, and c2
are the names of the returned columns are the average of valFloat
and the minimum of qual
column.
(`c1`avg`valFloat;`c2`min`qual)
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]]}"
Supported aggregation functions are:
count first last sum prd min max all any var
avg wsum wavg var dev cov cor svar sdev scov
!!! warn Note that in the case of using the first
and last
aggregation may return inconsistent results in the case of cross-assembly queries or when late data is enabled. With cross-assembly queries the first/last results returned is dependant on the how the assembly partial results are returned. In the late data case, the result can be dependant on ordering of late data result.
fill
The fill
parameter can be used to specify how the to handle null values. It takes a single symbol which can be one of forward
or zero
.
If forward
is specified then the query will replace null values with the previous non-null value.
If zero
is specified, null values are replaced with a 0 value of the appropriate type.
temporality
There are two types of temporality
currently supported, and each determines the way the startTS
and endTS
arguments are interpreted.
The default value is snapshot
and when specified getData
will return a continuous chunk of data between the specified startTS
and endTS
arguments.
When a slice
temporality is specified, the API will grab data between the time
casted values of the startTS
and endTS
parameters, for each date within the startTS
and endTS
timestamp.
For example, these paremeters
startTS: 2021.01.01D10:00:00
endTS: 2021.01.04D13:00:00
temporality: `slice
Would return data between the times of 10:00:00 and 13:00:00, for each day between 2021.01.01 and 2021.01.04.
sortCols
Specifies the columns to sort on before returning the response.
`sensorID`readTS
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]],\"sortCols\":[\"sensorID\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"sensorID\"],\"agg\":[[\"c1\",\"avg\",\"valFloat\"]],\"sortCols\":[\"sensorID\"]}"
Querying reference data
Reference data is specified in the YAML by leveraging the foreign
and primaryKey
fields of the assembly schema. For example, consider the schema snippets below:
tables:
trade:
type: partitioned
blockSize: 10000
prtnCol: time
sortColsOrd: [sym]
sortColsDisk: [sym]
columns:
- name: sym
description: trade symbol
type: symbol
attrMem: grouped
attrDisk: parted
attrOrd: parted
- name: code
type: symbol
foreign: markets.code # Foreign key relationship
description: Code for the market the stock was exchanged on
- name: price
type: float
- name: time
description: timestamp
type: timestamp
markets:
description: reference market data
type: splayed # Splayed table, rather than partitioned
updTsCol: updateTS
primaryKeys:
- code # Primary key column
columns:
- name: code
type: symbol
description: Market code
- name: opCode
type: string
description: Market operating (parent) code
- name: site
type: string
description: Market website
- name: updateTS
description: Timestamp of last mutation
type: timestamp
Within this schema, the trade
partitioned time series table has a foreign key relationship to the markets
splayed reference data.
Splayed and basic reference data is held in RAM
The current value for every key seen for the reference data is held in RAM of every DAP to allow for efficient joins.
The reference data table can be queried as usual with the getData
API by omitting the startTS
and endTS
arguments.
Additionally, when querying the time series data (here, trade
) with the getData
API, anywhere a column can be specified, references into the reference data can be supplied by specifying the column as <tableName>.<column>
. As an example, to query by the market opCode
:
curl ... \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trade\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"groupBy\":[\"markets.opCode\"],\"agg\":[[\"c1\",\"avg\",\"price\"]]}"
Or to include reference data columns in the result set:
curl ... \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trade\",\"startTS\":\"$START\",\"endTS\":\"$END\",\"agg\":[\"price\", \"code\", \"markets.opCode\"]}"
Note
Aggregations that return reference columns must include the foreign key as well.
Note
kdb+ syntax to use time cast operations e.g. realTime.second
, realTime.minute
is not supported.
Response
See the possible response codes for success or failure information.
The response payload is a list that includes columns to sort on, return columns, by-clause to apply in the aggregator, secondary aggregation, and the table data.
For issues with the query or responses checkout the troubleshooting section here.
Examples
Assume there is a partitioned table called trace
in all data access processes with the following schema:
name | type | description |
---|---|---|
sensorID | int | Sensor identifier |
readTS | timestamp | Timestamp of reading, and the column which determines HDB partition |
valFloat | float | Reading measurement |
qual | short | Quality of the reading |
updateTS | timestamp | Update timestamp of reading |
For these examples, assume that we have a handle opened to the gateway assigned to h
, a callback function called cb
set to {res::(x;y)}
.
For these examples, assume that a kdb Insights request token has been set in the INSIGHTS_TOKEN
env var.
In this example we do a basic get call with only the table and time range specified:
q)args
table | `trace
startTS| -0Wp
endTS | 0Wp
q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records
sensorID readTS valFloat qual updateTS
----------------------------------------------------------------------------------
0 2021.02.09D00:00:00.000000000 283.0992 1 2021.02.10D19:02:25.332557500
0 2021.02.09D01:00:00.000000000 906.6764 5 2021.02.10D19:02:25.332557500
0 2021.02.09D02:00:00.000000000 546.6978 4 2021.02.10D19:02:25.332557500
0 2021.02.09D03:00:00.000000000 164.0931 4 2021.02.10D19:02:25.332557500
0 2021.02.09D04:00:00.000000000 663.1649 4 2021.02.10D19:02:25.332557500
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\"}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\"}"
In this example we refine the time range by specifying a startTS:
q)args
table | `trace
startTS| 2021.02.09D01:00:00.000000000
endTS | 0Wp
q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records
sensorID readTS valFloat qual updateTS
----------------------------------------------------------------------------------
0 2021.02.09D01:00:00.000000000 906.6764 5 2021.02.10D19:02:25.332557500
0 2021.02.09D02:00:00.000000000 546.6978 4 2021.02.10D19:02:25.332557500
0 2021.02.09D03:00:00.000000000 164.0931 4 2021.02.10D19:02:25.332557500
0 2021.02.09D04:00:00.000000000 663.1649 4 2021.02.10D19:02:25.332557500
0 2021.02.09D05:00:00.000000000 537.0982 4 2021.02.10D19:02:25.332557500
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01.00.00.000000000\"}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01.00.00.000000000\"}"
Here we specify select only a subset of the columns:
q)args
table | `trace
startTS| -0Wp
endTS | 0Wp
agg | `sensorID`readTS`valFloat
q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[2] / Display only the first 5 records
sensorID readTS valFloat
-----------------------------------------------
0 2021.05.12D00:00:00.000000000 278.1034
0 2021.05.12D01:00:00.000000000 256.3526
0 2021.05.12D02:00:00.000000000 73.38756
0 2021.05.12D03:00:00.000000000 911.9714
0 2021.05.12D04:00:00.000000000 171.8527
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"]}"
Example using the filter argument. Here we ask for sensorIDs equal to 10i, as well as filtering for qual values between 1 and 3h:
q)args
table | `trace
startTS| -0Wp
endTS | 0Wp
filter | (("=";`sensorID;1 3h);("within";`qual;1 3h))
q)res:neg[h] (`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records
sensorID readTS valFloat qual updateTS
----------------------------------------------------------------------------------
10 2021.05.12D03:00:00.000000000 781.8174 3 2021.05.13D14:01:56.802276100
10 2021.05.12D04:00:00.000000000 951.9392 2 2021.05.13D14:01:56.802276100
10 2021.05.12D07:00:00.000000000 971.0995 3 2021.05.13D14:01:56.802276100
10 2021.05.12D13:00:00.000000000 103.5567 2 2021.05.13D14:01:56.802276100
10 2021.05.12D14:00:00.000000000 297.4721 1 2021.05.13D14:01:56.802276100
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"],\"filter\":[[\in\",\"sensorID\",[1,3]],[\"within\",\"qual\",[0,2]]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"agg\":[\"sensorID\",\"readTS\",\"valFloat\"],\"filter\":[[\"in\",\"sensorID\",[1,3]],[\"within\",\"qual\",[0,2]]}"
Again we make use of the sortCols
argument to return the result sorted by qual and valFloat:
q)args
table | `trace
startTS | 2021.05.13D14:05:00.000000000
endTS | 0Wp
sortCols | `qual`valFloat
q)res:neg[h](`.kxi.getData;args;`cb;()!()) / Make the API call and store the result in "res"
q)5#res[1] / Display only the first 5 records
sensorID readTS valFloat qual updateTS
------------------------------------------------------------------------------------
99 2021.05.12D08:00:00.000000000 0.00825664 0 2021.05.13D14:11:56.802232800
39 2021.05.12D12:00:00.000000000 0.09356276 0 2021.05.13D14:12:36.802253700
70 2021.05.12D02:00:00.000000000 0.1157809 0 2021.05.13D14:17:56.802235900
1 2021.05.12D01:00:00.000000000 0.197002 0 2021.05.13D14:17:46.804990700
33 2021.05.12D12:00:00.000000000 0.2034092 0 2021.05.13D14:13:16.803428200
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"sortCols\":[\"qual\",\"valFloat\"]}"
curl -X POST "https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--header "Authorization: Bearer $INSIGHTS_TOKEN" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"sortCols\":[\"qual\",\"valFloat\"]}"
Extended Options
For IPC calls, extended options like timeout
are supplied as the last parameter: neg[h] (`.kxi.getData;args;callback;options)
.
For HTTP, a reserved request parameter called opts
exists. To supply a timeout for a getData call, the following would be used:
curl -X POST "http://${GATEWAY_HOSTNAME}/kxi/getData" \
--header "Content-Type: application/json" \
--header "Accepted: application/json" \
--data "{\"table\":\"trace\",\"startTS\":\"2021.02.09D01:00:00.000000000\",\"opts\":{"timeout": 10000}}"
For a full list of options, see the header documentation.