getStats

Analytics Services: Get results of analytics

Syntax: getStats parms

Where parms is a dictionary of parameters; getStats returns data from statistical analyses. The parameter dictionary is constructed from two lists, keys and values:

getStats `key1`key2`key3!(value1;value2;value3)

Parameters for getTicks are also parameters for getStats. Some (marked R below) are required: omitting a required parameter signals an error.
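Both forms build the same q dictionary. A minimal sketch (illustrative keys only, no service call):

```q
/ build the parameter dictionary directly from keys and values
d1:`granularity`granularityUnit!(1i;`minute)

/ or from (key;value) pairs: flip turns them into (keys;values),
/ and .[!] applies ! to that pair, as in the query example further down
d2:.[!]flip((`granularity;1i);(`granularityUnit;`minute))

d1~d2    / both forms yield the same dictionary
```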

analytics        R   Analytics
fill                 Racking and filling
granularity          Bucketing and bars
granularityUnit      Bucketing and bars
byCol                Group by column
doNotValidate        Disable parameter validation

Note that Analytics Services (getStats) execute against raw time-series data (tick data).

Analytics

Key       analytics
Status    required
Value     one or more analytics, as a symbol atom or vector, mixed list, or dictionary
Example   (avg;`price)
Default   `

An analytic is:

  • a q operator, keyword, or lambda that is an aggregate function, paired with the name of the result column to apply it to (as a symbol atom), e.g. (avg;`price), or
  • a named analytic

Note that by default in a core-only installation, there are no named analytics available.
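As an illustrative sketch (the required getTicks parameters such as idList, dates, and times are omitted for brevity), the analytics value can mix keywords, operators, and lambdas, each paired with a result column:

```q
getStats .[!]flip(
    / ... idList, dataType, dates, times, etc. as in the full example below ...
    (`analytics; ((avg;`price);(max;`price);({(max x)-min x};`price)))
  )
```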

Bucketing and bars (binning)

Key       granularity
Status    optional
Value     number of units per bucket, as an int atom
Example   30
Default   1

Key       granularityUnit
Status    optional
Value     granularity unit, as a symbol atom
Valid     `tick `millisecond `second `minute `hour `day
Example   `millisecond
Default   (none)

getStats supports customizable bar sizes, configurable per request: you can ask for one-minute bars, two-minute bars, and so on, per query.

You specify the bar size with the granularity and granularityUnit parameters, together with the time interval: granularity gives the number of units per bucket and granularityUnit the bucket time period. For example, granularity 3 with granularityUnit `hour gives 3-hour buckets within the time window.
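For instance, the bucketing fragment of a parameter dictionary for 3-hour bars might look like this (a sketch; the other parameters are omitted):

```q
/ 3-hour buckets: granularity counts the units, granularityUnit names them
`granularity`granularityUnit!(3i;`hour)
```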

Recommended maximum timespans for the granularity units:

tick            1 day
millisecond     1 day
second          1 week
minute          1 week
hour            1 month
day             1 month

If no granularity is set, an aggregation over the entire interval is returned.

With the granularity unit set to

day

Start and end times must not be set: getStats generates daily bars from tick data, from 00:00:00.000000000 to 23:59:59.999999999.

tick

granularity cannot be set. No aggregation is performed; tick-level analytics are returned instead (this is used, for example, to retrieve spread per tick), so the result is tick-level data.

For any other granularity unit, the result contains the time and sym columns, plus a column for each analytic requested in the API, e.g. VWAP or lastExchangeTime.
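A sketch of a tick-level request fragment; `spread here stands in for whatever named analytic your installation provides (none are available in a core-only installation):

```q
/ tick-level analytics: no granularity key, since it cannot be set with `tick
`granularityUnit`analytics!(`tick;enlist`spread)
```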

Inclusivity of times

The start and end times are inclusive: when requesting an end time of xx:xx:00, the results contain data for that last nanosecond. This is best explained with an example.

In the following query

getStats .[!]flip(
    (`idList        ; `7203.T);
    (`dataType       ; `trade);
    (`startDate      ; 2018.07.02);
    (`endDate        ; 2018.07.02);
    (`startTime      ; 00:00:00);
    (`endTime        ; 01:00:00);
    (`time           ; `exchangeTime);
    (`dataSource     ; `equity);
    (`granularity    ; 1i);
    (`granularityUnit; `minute);
    (`analytics      ; `VWAP`sumVolume`firstInsertTime`lastInsertTime`firstExchangeTime)
  )

getStats will return 61 bars:

  • sixty one-minute bars, one for each whole minute from 00:00:00.000000000 to 00:59:59.999999999, plus
  • an extra bar for the single nanosecond 01:00:00.000000000.

Racking and filling

Key       fill
Status    optional
Value     fill option, as a symbol atom
Valid     ` `zero `forward `null
Example   `forward
Default   `

The fill parameter specifies how to handle time bars with no data. By default, or if ` is set, the result contains only the bars that have values:

timestampbar                  sym   medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D05:00:00.000000000 1EDM6 0.0025    0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025    0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005     0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025    0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005     0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025    0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025    0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025    0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025    0.002508
2016.05.24D14:00:00.000000000 1EDM6 0.0025    0.002505

Note the missing bars between the first two rows: the 01:00 through 04:00 bars are absent. If the fill is `zero, the data will be racked and zero-filled:

timestampbar                  sym   medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D01:00:00.000000000 1EDM6 0.0       0.0
2016.05.24D02:00:00.000000000 1EDM6 0.0       0.0
2016.05.24D03:00:00.000000000 1EDM6 0.0       0.0
2016.05.24D04:00:00.000000000 1EDM6 0.0       0.0
2016.05.24D05:00:00.000000000 1EDM6 0.0025    0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025    0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005     0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025    0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005     0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025    0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025    0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025    0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025    0.002508
2016.05.24D14:00:00.000000000 1EDM6 0.0025    0.002505

If the fill is `null, the data will be racked and null-filled:

timestampbar                  sym   medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D01:00:00.000000000 1EDM6 null      null
2016.05.24D02:00:00.000000000 1EDM6 null      null
2016.05.24D03:00:00.000000000 1EDM6 null      null
2016.05.24D04:00:00.000000000 1EDM6 null      null
2016.05.24D05:00:00.000000000 1EDM6 0.0025    0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025    0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005     0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025    0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005     0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025    0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025    0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025    0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025    0.002508
2016.05.24D14:00:00.000000000 1EDM6 0.0025    0.002505

If the fill is `forward, the data will be racked and forward-filled:

timestampbar                  sym   medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D01:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D02:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D03:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D04:00:00.000000000 1EDM6 0.0025    0.003265
2016.05.24D05:00:00.000000000 1EDM6 0.0025    0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025    0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005     0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025    0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005     0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025    0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025    0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025    0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025    0.002508
2016.05.24D14:00:00.000000000 1EDM6 0.0025    0.002505
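To select one of these fills, add the fill key to the query dictionary; a sketch in the style of the earlier query, with the other parameters omitted:

```q
getStats .[!]flip(
    / ... idList, dates, times, granularity as in the earlier example ...
    (`fill; `forward)
  )
```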

Group by column

Key       byCol
Status    optional
Value     byCol option, as a symbol atom or vector
Valid     `column1
          `column1`column2
          `ReferenceTable.Column1 (reference-table foreign keys)
          `TableAColumn.TableBColumn (kdb+ foreign keys)
Example   `sym
Default   `

Before Refinery 5.6.1, there was no way other than granularityUnit to group the queried data by anything more than the symCol and timeCol; grouping had to be done manually afterwards, increasing computation time. With byCol, the grouping is done during the API's data-selection stage.

In addition to grouping by columns within your table, you can also group by columns in a referenced table (foreign key). You achieve this with dot notation between the table name and the column within that table:

`ReferenceTable.Column1

Besides reference-table foreign keys, you can also use kdb+ foreign keys, chaining as far as needed:

`TableAColumn.TableBColumn.<etc..>
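A sketch of a byCol fragment, mixing a local column with a foreign-key path (ReferenceTable.Column1 is the placeholder name used above; the other query parameters are omitted):

```q
/ group by sym and, via a foreign key, by a column of a referenced table
enlist[`byCol]!enlist`sym`ReferenceTable.Column1
```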

Do not validate

Key       doNotValidate
Status    optional
Value     parameter names, as a symbol vector
Example   `analytics
Default   (none)

Disables pre-processing parameter-validation checks for the specified parameters. For supported parameters, see Parameter Validation.
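For example, to skip validation of the analytics parameter only (a sketch; the fragment would be merged into the full query dictionary):

```q
/ disable validation for analytics alone; other parameters remain validated
enlist[`doNotValidate]!enlist enlist`analytics
```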