getStats¶
Analytics Services: Get results of analytics
Syntax: getStats parms
Where parms is a dictionary of parameters; getStats returns data from statistical analyses. The parameter dictionary is constructed from two lists, keys and values:

getStats `key1`key2`key3!(value1;value2;value3)
Parameters for getTicks are also parameters for getStats.
Some (R) are required: omitting a required parameter signals an error.
| parameter | required | section |
|---|---|---|
| analytics | R | Analytics |
| fill | | Racking and filling |
| granularity | | Bucketing and bars |
| granularityUnit | | Bucketing and bars |
| byCol | | Group by column |
| doNotValidate | | Do not validate |
Note that Analytic Services (getStats) execute against raw time series data (tick data).
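As a sketch, assuming a running service that exposes getStats (the key and value choices here are illustrative, drawn from the example later on this page), the parameter dictionary is built by zipping the key list with the value list using `!`:

```q
/ illustrative keys and values; ! zips the key list with the value list
parms:`idList`dataType`analytics!(`7203.T;`trade;(avg;`price))
getStats parms        / equivalent to passing the dictionary literal inline
```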
Analytics¶
Key: analytics (required)
Value: one or more analytics as a symbol atom or vector, mixed list, or dictionary
Example: (avg;`price)
Default: `
An analytic is:

- a q operator, keyword, or lambda that is an aggregate function, together with the name of the result column to apply it to (as a symbol atom), e.g. (avg;`price)
- a named analytic
Note that by default in a core-only installation, there are no named analytics available.
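A minimal sketch of the accepted forms, using named analytics that appear on this page (VWAP, sumVolume); whether any named analytic is available depends on your installation, and the lambda below is hypothetical:

```q
(avg;`price)                       / a q aggregate paired with a result column
`VWAP                              / a named analytic as a symbol atom
`VWAP`sumVolume                    / several named analytics as a symbol vector
(`VWAP;(max;`price);({last x};`price))   / a mixed list combining both forms
```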
Bucketing and bars (binning)¶
Key: granularity (optional)
Value: number of units per bucket as int atom
Example: 30
Default: 1

Key: granularityUnit (optional)
Value: granularity unit as a symbol atom
Valid: tick, millisecond, second, minute, hour, day
Example: `millisecond
Default: (none)
getStats supports customizable bar sizes, configurable per request: for example, you can request one-minute bars, two-minute bars, and so on. You specify the bar size with the granularity and granularityUnit parameters, together with the time interval.
granularity gives the number of units per bucket and granularityUnit the bucket time period. For example, granularity of 3 and granularityUnit of `hour gives 3-hour buckets within the time window.
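For instance, the 3-hour-bucket case above can be expressed as a parameter-dictionary fragment (other required parameters omitted for brevity):

```q
/ 3-hour buckets: granularity counts the units, granularityUnit names the unit
parms:`granularity`granularityUnit!(3i;`hour)
```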
Recommended maximum timespans for the granularity units:
| granularity unit | maximum timespan |
|---|---|
| tick | 1 day |
| millisecond | 1 day |
| second | 1 week |
| minute | 1 week |
| hour | 1 month |
| day | 1 month |
If no granularity is set, an aggregation over entire intervals is returned.
With the granularity unit set to

- day: start and end times must not be set; daily bars are generated from tick data spanning 00:00:00.000000000 to 23:59:59.999999999.
- tick: granularity cannot be set; there is no aggregation, and tick-level analytics are returned instead (used, for example, to retrieve the spread per tick).

Otherwise the time and sym columns are returned, as well as a column for each of the analytics requested in the API, e.g. VWAP or lastExchangeTime.
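A sketch of the day case described above, reusing the dates from the example below (startTime and endTime must be omitted):

```q
/ daily bars: start and end times must not be set
parms:`startDate`endDate`granularityUnit!(2018.07.02;2018.07.02;`day)
```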
Inclusivity of times¶
The start and end times are inclusive. So when requesting an end time of xx:xx:00, the results will contain data for that last nanosecond. This is best explained with an example.
In the following query
getStats .[!]flip(
(`idList ; `7203.T);
(`dataType ; `trade);
(`startDate ; 2018.07.02);
(`endDate ; 2018.07.02);
(`startTime ; 00:00:00);
(`endTime ; 01:00:00);
(`time ; `exchangeTime);
(`dataSource ; `equity);
(`granularity ; 1i);
(`granularityUnit; `minute);
(`analytics ; `VWAP`sumVolume`firstInsertTime`lastInsertTime`firstExchangeTime)
)
getStats returns 61 one-minute bars:

- sixty one-minute bars, for all whole minutes from 00:00:00.000000000 to 00:59:59.999999999, plus
- an extra bar for the single nanosecond 01:00:00.000000000.
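The bar count follows from the inclusive end time; the arithmetic in plain q:

```q
winLen:`long$01:00:00-00:00:00   / window length in seconds: 3600
barLen:`long$00:01:00            / bar length in seconds: 60
1+winLen div barLen              / 61: sixty whole bars plus one for the final instant
```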
Racking and filling¶
Key: fill (optional)
Value: fill option as a symbol atom
Valid: `, `zero, `forward, `null
Example: `forward
Default: `
The fill parameter specifies how to handle time bars that contain no data. By default, or if ` is set, only bars that contain data are returned:
timestampbar sym medSpread avgSpread
---------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D05:00:00.000000000 1EDM6 0.0025 0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025 0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005 0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025 0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005 0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025 0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025 0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025 0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025 0.002508
2016.05.24D14:00:00.000000000 1EDM6 0.0025 0.002505
Note the missing bars between rows 1 and 2 (00:00 to 05:00). If the fill is zero, the data will be racked and zero-filled:
timestampbar sym medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D01:00:00.000000000 1EDM6 0.0 0.0
2016.05.24D02:00:00.000000000 1EDM6 0.0 0.0
2016.05.24D03:00:00.000000000 1EDM6 0.0 0.0
2016.05.24D04:00:00.000000000 1EDM6 0.0 0.0
2016.05.24D05:00:00.000000000 1EDM6 0.0025 0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025 0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005 0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025 0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005 0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025 0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025 0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025 0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025 0.002508
If the fill is null, the data will be racked and null-filled:
timestampbar sym medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D01:00:00.000000000 1EDM6 null null
2016.05.24D02:00:00.000000000 1EDM6 null null
2016.05.24D03:00:00.000000000 1EDM6 null null
2016.05.24D04:00:00.000000000 1EDM6 null null
2016.05.24D05:00:00.000000000 1EDM6 0.0025 0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025 0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005 0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025 0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005 0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025 0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025 0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025 0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025 0.002508
If the fill is forward, the data will be racked and forward filled:
timestampbar sym medSpread avgSpread
-------------------------------------------------------
2016.05.24D00:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D01:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D02:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D03:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D04:00:00.000000000 1EDM6 0.0025 0.003265
2016.05.24D05:00:00.000000000 1EDM6 0.0025 0.003641
2016.05.24D06:00:00.000000000 1EDM6 0.0025 0.002642
2016.05.24D07:00:00.000000000 1EDM6 0.005 0.004087
2016.05.24D08:00:00.000000000 1EDM6 0.0025 0.00292
2016.05.24D09:00:00.000000000 1EDM6 0.005 0.004393
2016.05.24D10:00:00.000000000 1EDM6 0.0025 0.0025
2016.05.24D11:00:00.000000000 1EDM6 0.0025 0.002566
2016.05.24D12:00:00.000000000 1EDM6 0.0025 0.002565
2016.05.24D13:00:00.000000000 1EDM6 0.0025 0.002508
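The fill behaviors can be mimicked in plain q on a toy racked table, where empty bars hold nulls; q's fills keyword gives forward fill and the ^ operator fills with a constant (the table and values here are hypothetical):

```q
/ racked hourly bars; the middle two bars had no data
t:([]timestampbar:2016.05.24D00:00:00+0D01:00:00*til 4;avgSpread:0.003265 0n 0n 0.003641)
update fills avgSpread from t   / forward fill: nulls take the previous value
update 0f^avgSpread from t      / zero fill: nulls become 0.0
```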
Group by column¶
Key: byCol (optional)
Value: byCol option as a symbol atom or vector
Valid: `column1, `column1`column2, `ReferenceTable.Column1 (reference-table foreign keys), `TableAColumn.TableBColumn (kdb+ foreign keys)
Example: `sym
Default: `
Before Refinery 5.6.1, the only way to group queried data beyond the symCol and timeCol was via granularityUnit; any further grouping had to be done manually afterwards, at the cost of longer computation times. With byCol, this grouping is done during the API's data-selection stage.
In addition to being able to group by different columns within your table, you can also group by columns in a referenced table (foreign key). You achieve this by using dot notation between the table name and the column within that table.
`ReferenceTable.Column1
In addition to reference-table foreign keys, as above, you can also use kdb+ foreign keys:
`TableAColumn.TableBColumn.<etc..>
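As a sketch, extending an existing parameter dictionary (parms and the column names are hypothetical):

```q
parms[`byCol]:`sym                      / group by a column in the queried table
parms[`byCol]:`ReferenceTable.Column1   / or via a foreign key, using dot notation
```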
Do not validate¶
Key: doNotValidate (optional)
Value: parameter symbol vector
Example: `analytics
Default: (none)
Disables pre-processing parameter validation checks for the specified parameters. For the supported parameters, see Parameter Validation.
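A fragment showing the shape of the value, which is a symbol vector even when only one parameter is named (parms is a hypothetical existing parameter dictionary):

```q
/ skip validation of the analytics parameter only
parms[`doNotValidate]:enlist`analytics
```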