Pandas Like API for PyKX Tables¶
This page demonstrates the PyKX functionality that aligns with the Pandas API for DataFrame interactions.
Only operations on PyKX tables that adhere to Pandas API conventions are covered. The focus is on areas where PyKX/q can offer a performance advantage over Pandas, particularly in terms of memory footprint and execution time.
A full breakdown of the the available functionality and examples of its use can be found in the Pandas API section below.
Covered sections of the Pandas API¶
In this context, coverage refers to the functionality provided by the PyKX API for Tables, which matches the methods and attributes supported by the Pandas DataFrame API. It does not include functionality for interacting with Pandas Series objects or for reading/writing CSV/JSON files.
If there's any functionality you would like us to add to this library, please open an issue here or open a pull request here.
Property/metadata type information¶
DataFrame properties | PyKX supported? | PyKX API documentation link |
---|---|---|
columns | :material-check: | link |
dtypes | :material-check: | link |
empty | :material-check: | link |
ndim | :material-check: | link |
shape | :material-check: | link |
size | :material-check: | link |
Analytic functionality¶
DataFrame method | PyKX supported? | PyKX API documentation link |
---|---|---|
abs | :material-check: | link |
agg | :material-check: | link |
apply | :material-check: | link |
applymap | :material-check: | link |
groupby | :material-check: | link |
idxmax | :material-check: | link |
idxmin | :material-check: | link |
kurt | :material-check: | link |
max | :material-check: | link |
map | :material-check: | link |
mean | :material-check: | link |
median | :material-check: | link |
min | :material-check: | link |
mode | :material-check: | link |
sem | :material-check: | link |
sum | :material-check: | link |
skew | :material-check: | link |
std | :material-check: | link |
prod | :material-check: | link |
Querying and data interrogation¶
DataFrame method | PyKX supported? | PyKX API documentation link |
---|---|---|
all | :material-check: | link |
any | :material-check: | link |
at | :material-check: | link |
count | :material-check: | link |
get | :material-check: | link |
head | :material-check: | link |
iloc | :material-check: | link |
isna | :material-check: | link |
isnull | :material-check: | link |
loc | :material-check: | link |
notna | :material-check: | link |
notnull | :material-check: | link |
sample | :material-check: | link |
select_dtypes | :material-check: | link |
tail | :material-check: | link |
Data preprocessing¶
DataFrame method | PyKX supported? | PyKX API documentation link |
---|---|---|
add_prefix | :material-check: | link |
add_suffix | :material-check: | link |
astype | :material-check: | link |
drop | :material-check: | link |
drop_duplicates | :material-check: | link |
pop | :material-check: | link |
rename | :material-check: | link |
reset_index | :material-check: | link |
set_index | :material-check: | link |
Data joins/merge¶
DataFrame method | PyKX supported? | PyKX API documentation link |
---|---|---|
merge | :material-check: | link |
merge_asof | :material-check: | link |
Data sorting¶
DataFrame method | PyKX supported? | PyKX API documentation link |
---|---|---|
sort_values | :material-check: | link |
nlargest | :material-check: | link |
nsmallest | :material-check: | link |
Unsupported functionality¶
DataFrame methods | PyKX supported? | Additional information |
---|---|---|
*from* |
:material-close: | Functionality for the creation of PyKX Tables from alternative data sources is not supported at this time. |
*plot* |
:material-close: | Functionality for the plotting of columns/tables is not supported at this time. |
*sparse* |
:material-close: | Sparse data like interactions presently not supported. |
to_* |
:material-close: | Functionality for the conversion/persistence of PyKX Tables to other formats is not supported at this time. |
Pandas API¶
This notebook provides a demonstration of the capabilities of the pandas-like API for PyKX Table objects.
To follow along, download this notebook.
This demonstration outlines the following:
import pykx as kx
import numpy as np
import pandas as pd
kx.q.system.console_size = [10, 80]
Constructing Tables¶
Table¶
Create a table from a list of rows or by converting a Python dictionary object:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Union[list, array] | An array like object containing the contents of each row of the table. | None |
data | dict | A dictionary to be converted into a Table object. | None |
columns | list[str] | A list of column names to use when constructing from an array of rows. | None |
Returns:
Type | Description |
---|---|
Table | The newly constructed table object. |
Examples:
Create a table from a dictionary object.
kx.Table(data={'x': list(range(10)), 'y': [10 - x for x in range(10)]})
x | y | |
---|---|---|
0 | 0 | 10 |
1 | 1 | 9 |
2 | 2 | 8 |
3 | 3 | 7 |
4 | 4 | 6 |
5 | 5 | 5 |
6 | 6 | 4 |
7 | 7 | 3 |
8 | 8 | 2 |
9 | 9 | 1 |
Create a table from an array-like object.
kx.Table([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
x | x1 | |
---|---|---|
0 | 0 | 1 |
1 | 2 | 3 |
2 | 4 | 5 |
3 | 6 | 7 |
4 | 8 | 9 |
Create a table from an array-like object and provide names for the columns to use.
kx.Table([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])
x | y | z | |
---|---|---|---|
0 | 0 | 1 | 2 |
1 | 3 | 4 | 5 |
2 | 6 | 7 | 8 |
3 | 9 | 10 | 11 |
Keyed Table¶
Create a keyed table from a list of rows or by converting a Python dictionary object:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Union[list, array] | An array like object containing the contents of each row of the table. | None |
data | dict | A dictionary to be converted into a Table object. | None |
columns | list[str] | A list of column names to use when constructing from an array of rows. | None |
index | list[Any] | An array like object to use as the index column of the table. | None |
Returns:
Type | Description |
---|---|
KeyedTable | The newly constructed keyed table object. |
Examples:
Create a keyed table from a dictionary object.
kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))})
x | y | |
---|---|---|
idx | ||
0 | 0 | 10 |
1 | 1 | 9 |
2 | 2 | 8 |
3 | 3 | 7 |
4 | 4 | 6 |
5 | 5 | 5 |
6 | 6 | 4 |
7 | 7 | 3 |
8 | 8 | 2 |
9 | 9 | 1 |
Create a keyed table from a list of rows:
kx.KeyedTable([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
x | x1 | |
---|---|---|
idx | ||
0 | 0 | 1 |
1 | 2 | 3 |
2 | 4 | 5 |
3 | 6 | 7 |
4 | 8 | 9 |
Create a keyed table from a list of rows and provide names for the resulting columns:
kx.KeyedTable([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])
x | y | z | |
---|---|---|---|
idx | |||
0 | 0 | 1 | 2 |
1 | 3 | 4 | 5 |
2 | 6 | 7 | 8 |
3 | 9 | 10 | 11 |
Create a keyed table with a specified index column:
kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))}, index=[2 * x for x in range(10)])
x | y | |
---|---|---|
idx | ||
0 | 0 | 10 |
2 | 1 | 9 |
4 | 2 | 8 |
6 | 3 | 7 |
8 | 4 | 6 |
10 | 5 | 5 |
12 | 6 | 4 |
14 | 7 | 3 |
16 | 8 | 2 |
18 | 9 | 1 |
Metadata¶
N = 1000
tab = kx.Table(data = {
'x': kx.q.til(N),
'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'z': kx.random.random(N, 500.0),
'w': kx.random.random(N, 1000),
'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | AAPL | 454.7063 | 766 | 100 |
1 | 1 | AAPL | 149.4238 | 916 | 0 |
2 | 2 | MSFT | 227.0315 | 469 | 0 |
3 | 3 | GOOG | 78.47098 | 280 | 250 |
4 | 4 | MSFT | 23.49633 | 369 | 100 |
5 | 5 | GOOG | 479.5628 | 254 | 200 |
6 | 6 | MSFT | 398.3865 | 388 | 200 |
7 | 7 | MSFT | 132.0956 | 318 | 250 |
8 | 8 | GOOG | 157.248 | 372 | 50 |
... | ... | ... | ... | ... | ... |
999 | 999 | GOOG | 3.79704 | 246 | 0 |
1,000 rows × 5 columns
Table.columns¶
Get the name of each column in the table:
tab.columns
pykx.SymbolVector(pykx.q('`x`y`z`w`v'))
Table.dtypes¶
Get the datatypes of the table columns:
print(tab.dtypes)
columns datatypes ----------------------- x "kx.LongAtom" y "kx.SymbolAtom" z "kx.FloatAtom" w "kx.LongAtom" v "kx.LongAtom"
Table.empty¶
Returns True
if the table is empty otherwise returns False
.
tab.empty
pykx.BooleanAtom(pykx.q('0b'))
Table.ndim¶
Get the number of columns within the table:
tab.ndim
pykx.LongAtom(pykx.q('2'))
Table.shape¶
Get the shape of the table as a tuple (number of rows, number of columns):
tab.shape
(pykx.LongAtom(pykx.q('1000')), pykx.LongAtom(pykx.q('5')))
Table.size¶
Get the number of values in the table (rows * cols):
tab.size
pykx.LongAtom(pykx.q('5000'))
Querying and Data Interrogation¶
# The examples in this section will use this example table filled with random data
N = 1000
tab = kx.Table(data = {
'x': kx.q.til(N),
'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'z': kx.random.random(N, 500.0),
'w': kx.random.random(N, 1000),
'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
5 | 5 | GOOG | 429.4278 | 214 | 100 |
6 | 6 | AAPL | 470.0497 | 807 | 100 |
7 | 7 | GOOG | 409.6725 | 727 | 200 |
8 | 8 | AAPL | 280.2929 | 230 | 250 |
... | ... | ... | ... | ... | ... |
999 | 999 | AAPL | 142.8215 | 874 | 0 |
1,000 rows × 5 columns
Table.all()¶
Table.all(axis=0, bool_only=False, skipna=True)
Returns whether or not all values across the given axis have a truthy
value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis | int | The axis to calculate all across 0 is columns, 1 is rows. |
0 |
bool_only | bool | Only use columns of the table that are boolean types. | False |
skipna | bool | Ignore any null values along the axis. | True |
Returns:
Type | Description |
---|---|
Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling all on that column / row. |
tab.all()
x | 0b |
---|---|
y | 1b |
z | 1b |
w | 1b |
v | 0b |
Table.any()¶
Table.any(axis=0, bool_only=False, skipna=True)
Returns whether or not any values across the given axis have a truthy
value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis | int | The axis to calculate any across 0 is columns, 1 is rows. |
0 |
bool_only | bool | Only use columns of the table that are boolean types. | False |
skipna | bool | Ignore any null values along the axis. | True |
Returns:
Type | Description |
---|---|
Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling any on that column / row. |
tab.any()
x | 1b |
---|---|
y | 1b |
z | 1b |
w | 1b |
v | 1b |
Table.at[]¶
Table.at[row, col]
Access a single value for a row / column pair.
Similar to loc[]
, in that both provide label-based lookups. Use at
if you only need to get or set a single value.
The at
property can be used for both assignment and retrieval of values at a given row and column.
Examples:
Get the value of the z
column in the 997th row.
tab.at[997, 'z']
pykx.FloatAtom(pykx.q('20.61333'))
Reassign the value of the z
column in the 997th row to 3.14159
.
tab.at[997, 'z'] = 3.14159
tab.at[997, 'z']
pykx.FloatAtom(pykx.q('3.14159'))
Table.get()¶
Table.get(key, default=None)
Get a column or columns from a table by key, if the key does not exist return the default value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | Union[str, list[str]] | The column name or list of names to get from the table. | required |
default | int | The default value if the key is not found. | None |
Returns:
Type | Description |
---|---|
Union[Table, Any] | A table containing only the columns requested or the default value. |
Examples:
Get the y
column from the table.
tab.get(['y'])
y | |
---|---|
0 | GOOG |
1 | MSFT |
2 | GOOG |
3 | AAPL |
4 | MSFT |
5 | GOOG |
6 | AAPL |
7 | GOOG |
8 | AAPL |
... | ... |
999 | AAPL |
1,000 rows × 1 columns
Get the z
column from the table as a vector.
tab.get('z')
pykx.FloatVector(pykx.q('326.1157 4.083758 81.29854 256.6323 398.2529 429.4278 470.0497 409.6725 280.2..'))
Get the y
and z
columns from the table.
tab.get(['y', 'z'])
y | z | |
---|---|---|
0 | GOOG | 326.1157 |
1 | MSFT | 4.083758 |
2 | GOOG | 81.29854 |
3 | AAPL | 256.6323 |
4 | MSFT | 398.2529 |
5 | GOOG | 429.4278 |
6 | AAPL | 470.0497 |
7 | GOOG | 409.6725 |
8 | AAPL | 280.2929 |
... | ... | ... |
999 | AAPL | 142.8215 |
1,000 rows × 2 columns
Attempt to get the q
column from the table and receive the default
value not found
as that column does not exist. Will error if default
value is not set.
tab.get(['q'], 'not found')
'not found'
Table.head()¶
Table.head(n=5)
Get the first n rows from a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The number of rows to return. | 5 |
Returns:
Type | Description |
---|---|
Table | The first n rows of the table. |
Examples:
Return the first 5 rows of the table.
tab.head()
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
Return the first 10 rows of the table.
tab.head(10)
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
5 | 5 | GOOG | 429.4278 | 214 | 100 |
6 | 6 | AAPL | 470.0497 | 807 | 100 |
7 | 7 | GOOG | 409.6725 | 727 | 200 |
8 | 8 | AAPL | 280.2929 | 230 | 250 |
9 | 9 | AAPL | 223.9135 | 153 | 100 |
Table.isna()¶
Table.isna()
Detects null values in a Table object.
Parameters:
Returns:
Type | Description |
---|---|
Table | A Table with the same shape as the original but containing boolean values. 1b represents a null value present in a cell, 0b represents the opposite. |
tabDemo = kx.Table(data= {
'a': [1, 0, float('nan')],
'b': [1, 0, float('nan')],
'c': [float('nan'), 4, 0]
})
tabDemo.isna()
a | b | c | |
---|---|---|---|
0 | 0b | 0b | 1b |
1 | 0b | 0b | 0b |
2 | 1b | 1b | 0b |
Table.isnull()¶
Table.isnull()
Alias of Table.isna().
Detects null values in a Table object.
Parameters:
Returns:
Type | Description |
---|---|
Table | A Table with the same shape as the original but containing boolean values. 1b represents a null value present in a cell, 0b represents the opposite. |
tabDemo.isnull()
a | b | c | |
---|---|---|---|
0 | 0b | 0b | 1b |
1 | 0b | 0b | 0b |
2 | 1b | 1b | 0b |
Table.notna()¶
Table.notna()
Boolean inverse of Table.isna().
Detects non-null values on a Table object.
Parameters:
Returns:
Type | Description |
---|---|
Table | A Table with the same shape as the original but containing boolean values. 0b represents a null value present in a cell, 1b represents the opposite. |
tabDemo.notna()
a | b | c | |
---|---|---|---|
0 | 1b | 1b | 0b |
1 | 1b | 1b | 1b |
2 | 0b | 0b | 1b |
Table.notnull()¶
Table.notna()
Boolean inverse of Table.isnull(). Alias of Table.isna()
Detects non-null values on a Table object.
Parameters:
Returns:
Type | Description |
---|---|
Table | A Table with the same shape as the original but containing boolean values. 0b represents a null value present in a cell, 1b represents the opposite. |
tabDemo.notnull()
a | b | c | |
---|---|---|---|
0 | 1b | 1b | 0b |
1 | 1b | 1b | 1b |
2 | 0b | 0b | 1b |
Table.iloc[]¶
Table.iloc[:, :]
Purely integer-location based indexing for selection by position.
iloc
is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a BooleanVector.
Allowed inputs are:
- An integer, e.g. 5.
- A list or array of integers, e.g. [4, 3, 0].
- A slice object with ints, e.g. 1:7.
- A BooleanVector.
- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
- A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).
Returns:
Type | Description |
---|---|
Table | A table containing only the columns / rows requested. |
Examples:
Get the second row from a table.
tab.iloc[1]
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 1 | MSFT | 4.083758 | 511 | 250 |
Get the first 5 rows from a table.
tab.iloc[:5]
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
Get all rows of the table where the y
column is equal to AAPL
.
tab.iloc[tab['y'] == 'AAPL']
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 3 | AAPL | 256.6323 | 998 | 50 |
1 | 6 | AAPL | 470.0497 | 807 | 100 |
2 | 8 | AAPL | 280.2929 | 230 | 250 |
3 | 9 | AAPL | 223.9135 | 153 | 100 |
4 | 10 | AAPL | 293.3982 | 354 | 0 |
5 | 15 | AAPL | 444.315 | 286 | 50 |
6 | 17 | AAPL | 147.3942 | 895 | 50 |
7 | 18 | AAPL | 460f | 471 | 50 |
8 | 21 | AAPL | 315.1793 | 291 | 200 |
... | ... | ... | ... | ... | ... |
341 | 999 | AAPL | 142.8215 | 874 | 0 |
342 rows × 5 columns
Get all rows of the table where the y
column is equal to AAPL
, and only return the y
, z
and w
columns.
tab.iloc[tab['y'] == 'AAPL', ['y', 'z', 'w']]
y | z | w | |
---|---|---|---|
0 | AAPL | 256.6323 | 998 |
1 | AAPL | 470.0497 | 807 |
2 | AAPL | 280.2929 | 230 |
3 | AAPL | 223.9135 | 153 |
4 | AAPL | 293.3982 | 354 |
5 | AAPL | 444.315 | 286 |
6 | AAPL | 147.3942 | 895 |
7 | AAPL | 460f | 471 |
8 | AAPL | 315.1793 | 291 |
... | ... | ... | ... |
341 | AAPL | 142.8215 | 874 |
342 rows × 3 columns
Replace all null values in the column v
with the value -100
.
tab.iloc[tab['v'] == kx.q('0N'), 'v'] = -100
tab
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
5 | 5 | GOOG | 429.4278 | 214 | 100 |
6 | 6 | AAPL | 470.0497 | 807 | 100 |
7 | 7 | GOOG | 409.6725 | 727 | 200 |
8 | 8 | AAPL | 280.2929 | 230 | 250 |
... | ... | ... | ... | ... | ... |
999 | 999 | AAPL | 142.8215 | 874 | 0 |
1,000 rows × 5 columns
Table.loc[]¶
Table.loc[:, :]
Access a group of rows and columns by label or by BooleanVector.
loc
is a label based form of indexing, but may also be used with a boolean array.
Allowed inputs are:
- A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index)
- A list or array of labels, e.g. ['a', 'b', 'c']
- A slice object with labels, e.g. 'a':'f'
- Warning contrary to usual python slices, both the start and the stop are included
- A BooleanVector of the same length as the axis being sliced
- An alignable BooleanVector. The index of the key will be aligned before masking
- An alignable Index. The Index of the returned selection will be the input
- A callable function with one argument (the calling Table like object) and that returns valid output for indexing (e.g. one of the above)
Note:
When the Pandas API is enabled, using []
to index into a table will use Table.loc[]
Returns:
Type | Description |
---|---|
Table | A table containing only the columns / rows requested. |
Examples:
Get every row in the y
column.
tab[:, 'y']
y | |
---|---|
0 | GOOG |
1 | MSFT |
2 | GOOG |
3 | AAPL |
4 | MSFT |
5 | GOOG |
6 | AAPL |
7 | GOOG |
8 | AAPL |
... | ... |
999 | AAPL |
1,000 rows × 1 columns
Get all rows of the table where the value in the z
column is greater than 250.0
tab[tab['z'] > 250.0]
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 3 | AAPL | 256.6323 | 998 | 50 |
2 | 4 | MSFT | 398.2529 | 103 | 0 |
3 | 5 | GOOG | 429.4278 | 214 | 100 |
4 | 6 | AAPL | 470.0497 | 807 | 100 |
5 | 7 | GOOG | 409.6725 | 727 | 200 |
6 | 8 | AAPL | 280.2929 | 230 | 250 |
7 | 10 | AAPL | 293.3982 | 354 | 0 |
8 | 14 | GOOG | 385.7119 | 424 | 0 |
... | ... | ... | ... | ... | ... |
517 | 998 | AAPL | 383.3942 | 13 | 0 |
518 rows × 5 columns
Replace all null values in the column v
with the value -100
.
tab.loc[tab['v'] == kx.LongAtom.null, 'v'] = -100
tab
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
5 | 5 | GOOG | 429.4278 | 214 | 100 |
6 | 6 | AAPL | 470.0497 | 807 | 100 |
7 | 7 | GOOG | 409.6725 | 727 | 200 |
8 | 8 | AAPL | 280.2929 | 230 | 250 |
... | ... | ... | ... | ... | ... |
999 | 999 | AAPL | 142.8215 | 874 | 0 |
1,000 rows × 5 columns
Replace all locations in column v
where the value is -100
with a null.
tab[tab['v'] == -100, 'v'] = kx.LongAtom.null
tab
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | GOOG | 326.1157 | 458 | 0 |
1 | 1 | MSFT | 4.083758 | 511 | 250 |
2 | 2 | GOOG | 81.29854 | 96 | 0 |
3 | 3 | AAPL | 256.6323 | 998 | 50 |
4 | 4 | MSFT | 398.2529 | 103 | 0 |
5 | 5 | GOOG | 429.4278 | 214 | 100 |
6 | 6 | AAPL | 470.0497 | 807 | 100 |
7 | 7 | GOOG | 409.6725 | 727 | 200 |
8 | 8 | AAPL | 280.2929 | 230 | 250 |
... | ... | ... | ... | ... | ... |
999 | 999 | AAPL | 142.8215 | 874 | 0 |
1,000 rows × 5 columns
Usage of the loc
functionality under the hood additionally allows users to set columns within a table for single or multiple columns. Data passed for this can be q/Python.
tab['new_col'] = kx.random.random(1000, 1.0)
tab[['new_col1', 'new_col2']] = [20, kx.random.random(1000, kx.GUIDAtom.null)]
Table.sample()¶
Table.sample(n, frac, replace, weights, random_state, axis, ignore_index)
Sample random data from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of rows to return. Cannot be used with frac . Default is 1 if frac is None. |
None |
frac | float | Fraction of the rows to return. Cannot be used with n . |
None |
replace | bool | Whether or not it should be possible to sample the same row twice. | False |
weights | None | Not yet implemented. | None |
random_state | None | Not yet implemented. | None |
axis | None | Not yet implemented. | None |
ignore_index | bool | Not yet implemented. | False |
Returns:
Type | Description |
---|---|
Table | A table with the given column(s) renamed. |
# The examples in this section will use this example table filled with random data
N = 1000
tab = kx.Table(data = {
'x': kx.q.til(N),
'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'z': kx.random.random(N, 500.0),
'w': kx.random.random(N, 1000),
'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab.head()
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 0 | AAPL | 51.56208 | 1 | 200 |
1 | 1 | GOOG | 192.8475 | 238 | 200 |
2 | 2 | GOOG | 152.3489 | 774 | 0N |
3 | 3 | GOOG | 104.8755 | 913 | 0 |
4 | 4 | GOOG | 160.1738 | 506 | 100 |
Examples:
Sample 10 Rows.
tab.sample(n=10)
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 604 | AAPL | 276.1397 | 785 | 0N |
1 | 893 | MSFT | 353.1517 | 659 | 200 |
2 | 590 | AAPL | 30.3018 | 634 | 200 |
3 | 881 | MSFT | 384.4023 | 461 | 0N |
4 | 224 | MSFT | 64.81406 | 455 | 250 |
5 | 740 | GOOG | 184.8131 | 730 | 100 |
6 | 172 | MSFT | 460.7157 | 329 | 0 |
7 | 738 | MSFT | 425.1625 | 556 | 50 |
8 | 943 | AAPL | 254.8109 | 792 | 100 |
9 | 400 | MSFT | 105.4923 | 628 | 250 |
Sample 10% of the rows.
tab.sample(frac=0.1)
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 580 | AAPL | 218.6735 | 743 | 0N |
1 | 653 | MSFT | 458.9859 | 2 | 250 |
2 | 875 | MSFT | 30.793 | 273 | 0N |
3 | 641 | GOOG | 41.78246 | 973 | 50 |
4 | 436 | AAPL | 345.3793 | 103 | 250 |
5 | 261 | MSFT | 2.576664 | 990 | 100 |
6 | 780 | GOOG | 25.01553 | 7 | 0N |
7 | 851 | GOOG | 492.5621 | 688 | 0N |
8 | 285 | MSFT | 211.3792 | 982 | 0N |
... | ... | ... | ... | ... | ... |
99 | 107 | AAPL | 189.0718 | 667 | 0N |
100 rows × 5 columns
Sample 10% of the rows and allow the same row to be sampled twice.
tab.sample(frac=0.1, replace=True)
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 576 | MSFT | 32.12644 | 902 | 0 |
1 | 298 | MSFT | 203.5688 | 36 | 50 |
2 | 48 | MSFT | 477.9462 | 345 | 0N |
3 | 172 | MSFT | 460.7157 | 329 | 0 |
4 | 144 | MSFT | 149.4715 | 441 | 250 |
5 | 65 | AAPL | 347.9628 | 848 | 50 |
6 | 336 | GOOG | 31.34298 | 618 | 0 |
7 | 137 | GOOG | 476.6447 | 676 | 0N |
8 | 985 | AAPL | 391.4535 | 71 | 200 |
... | ... | ... | ... | ... | ... |
99 | 217 | GOOG | 358.6286 | 781 | 250 |
100 rows × 5 columns
Table.select_dtypes()¶
Table.select_dtypes(include=None, exclude=None)
Return a subset of the DataFrame’s columns based on the column dtypes.
Allowed inputs for include
/exclude
are:
- A single dtype or string.
- A list of dtypes or strings.
- Inputs given for
include
andexclude
cannot overlap. - If both
include
andexclude
are passed thenexclude
is ignored. - If no columns are to be returned then
pykx.Identity(pykx.q('::'))
will be returned. - For a
kx.KeyedTable
the key columns are all always returned, filtering only applies to value columns. - For a
kx.KeyedTable
if no value column is to be returned thenpykx.Identity(pykx.q('::'))
will be returned.
The dtype kx.CharVector
will return an error. Use kx.CharAtom
for a column of single chars.
Both kx.*Atom
and kx.*Vector
will be taken to mean a column containing a single item per row of type *
. kx.List
will include/exclude any columns containing mixed list data (including string columns).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include | Union[List, str] | A selection of dtypes or strings to be included. | None |
exclude | Union[List, str] | A selection of dtypes or strings to be excluded. | None |
At least one of these parameters must be supplied.
Returns:
Type | Description |
---|---|
Dataframe | The subset of the frame including the dtypes in include or excluding the dtypes in exclude . |
Examples:
The examples in the section will use the example table.
df = kx.Table(data = {
'c1': kx.SymbolVector(['a', 'b', 'c']),
'c2': kx.ShortVector([1, 2, 3]),
'c3': kx.LongVector([1, 2, 3]),
'c4': kx.IntVector([1, 2, 3])
})
Exclude columns containing symbols
df.select_dtypes(exclude = kx.SymbolVector)
c2 | c3 | c4 | |
---|---|---|---|
0 | 1h | 1 | 1i |
1 | 2h | 2 | 2i |
2 | 3h | 3 | 3i |
Include a list of column types
df.select_dtypes(include = [kx.ShortVector, kx.LongVector])
c2 | c3 | |
---|---|---|
0 | 1h | 1 |
1 | 2h | 2 |
2 | 3h | 3 |
If no columns are to be returned then pykx.Identity(pykx.q('::'))
is returned
df.select_dtypes(include = kx.FloatVector)
pykx.Identity(pykx.q('::'))
For a kx.KeyedTable
the key columns are all always returned, filtering only applies to value columns.
dfk = df.set_index('c1')
dfk.select_dtypes(include = kx.ShortVector)
c2 | |
---|---|
c1 | |
a | 1h |
b | 2h |
c | 3h |
For a kx.KeyedTable
if no value column is to be returned then pykx.Identity(pykx.q('::'))
will be returned.
dfk.select_dtypes(exclude=[kx.ShortAtom, kx.LongAtom, kx.IntAtom])
pykx.Identity(pykx.q('::'))
Table.tail()¶
Table.tail(n=5)
Get the last n rows from a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The number of rows to return. | 5 |
Returns:
Type | Description |
---|---|
Table | The last n rows of the table. |
Examples:
Return the last 5 rows of the table.
tab.tail()
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 995 | MSFT | 30.33 | 974 | 100 |
1 | 996 | GOOG | 417.5746 | 591 | 0 |
2 | 997 | GOOG | 237.1697 | 119 | 250 |
3 | 998 | MSFT | 75.62617 | 784 | 50 |
4 | 999 | AAPL | 324.1628 | 231 | 200 |
Return the last 10 rows of the table.
tab.tail(10)
x | y | z | w | v | |
---|---|---|---|---|---|
0 | 990 | MSFT | 251.8079 | 223 | 0N |
1 | 991 | MSFT | 316.3818 | 523 | 0N |
2 | 992 | GOOG | 236.4901 | 855 | 0N |
3 | 993 | MSFT | 415.9962 | 542 | 100 |
4 | 994 | GOOG | 468.8945 | 175 | 100 |
5 | 995 | MSFT | 30.33 | 974 | 100 |
6 | 996 | GOOG | 417.5746 | 591 | 0 |
7 | 997 | GOOG | 237.1697 | 119 | 250 |
8 | 998 | MSFT | 75.62617 | 784 | 50 |
9 | 999 | AAPL | 324.1628 | 231 | 200 |
Sorting¶
Table.sort_values()¶
Table.sort_values(by, ascending=True)
Sort Table objects based on the value of a selected column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by | str or list of str | The name of the column to sort by. | required |
ascending | bool | The order in which to sort the values, ascending is True and descending is False. | True |
Returns:
Type | Description |
---|---|
Table | The resulting table after the sort has been performed |
Examples:
tab = kx.Table(data={
'column_a': [20, 3, 100],
'column_b': [56, 15, 42],
'column_c': [45, 80, 8]})
tab
column_a | column_b | column_c | |
---|---|---|---|
0 | 20 | 56 | 45 |
1 | 3 | 15 | 80 |
2 | 100 | 42 | 8 |
Sort a Table by the second column
tab.sort_values(by='column_b')
column_a | column_b | column_c | |
---|---|---|---|
0 | 3 | 15 | 80 |
1 | 100 | 42 | 8 |
2 | 20 | 56 | 45 |
Sort a Table by the third column in descending order
tab.sort_values(by='column_c', ascending=False)
column_a | column_b | column_c | |
---|---|---|---|
0 | 3 | 15 | 80 |
1 | 20 | 56 | 45 |
2 | 100 | 42 | 8 |
Table.nsmallest()¶
Table.nsmallest(
n,
columns,
keep='first'
)
Return the first n rows of a Table ordered by columns in ascending order
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The number of rows to return | required |
columns | str or list of str | Column labels to order by | required |
keep | str | Can be 'first', 'last' or 'all'. Used in case of duplicate values | 'first' |
Returns
Type | Description |
---|---|
Table | The first n rows ordered by the given columns in ascending order |
Examples:
Sample table
tab = kx.Table(data={
'column_a': [2, 3, 2, 2, 1],
'column_b': [56, 15, 42, 102, 32],
'column_c': [45, 80, 8, 61, 87]})
tab
column_a | column_b | column_c | |
---|---|---|---|
0 | 2 | 56 | 45 |
1 | 3 | 15 | 80 |
2 | 2 | 42 | 8 |
3 | 2 | 102 | 61 |
4 | 1 | 32 | 87 |
Get the row where the first column is the smallest
tab.nsmallest(n=1, columns='column_a')
column_a | column_b | column_c | |
---|---|---|---|
0 | 1 | 32 | 87 |
Get the 4 rows where the first column is the smallest, then any equal values are sorted based on the second column
tab.nsmallest(n=4,columns=['column_a', 'column_b'])
column_a | column_b | column_c | |
---|---|---|---|
0 | 1 | 32 | 87 |
1 | 2 | 42 | 8 |
2 | 2 | 56 | 45 |
3 | 2 | 102 | 61 |
Get the 2 rows with the smallest values for the first column and in case of duplicates, take the last entry in the table
tab.nsmallest(n=2, columns=['column_a'], keep='last')
column_a | column_b | column_c | |
---|---|---|---|
0 | 1 | 32 | 87 |
1 | 2 | 102 | 61 |
Table.nlargest()¶
Table.nlargest(
n,
columns,
keep='first'
)
Return the first n rows of a Table ordered by columns in descending order
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The number of rows to return | required |
columns | str or list of str | Column labels to order by | required |
keep | str | Can be 'first', 'last' or 'all'. Used in case of duplicate values | 'first' |
Returns
Type | Description |
---|---|
Table | The first n rows ordered by the given columns in descending order |
Examples:
Sample table
tab = kx.Table(data={
'column_a': [2, 3, 2, 2, 1],
'column_b': [102, 15, 42, 56, 32],
'column_c': [45, 80, 8, 61, 87]})
tab
column_a | column_b | column_c | |
---|---|---|---|
0 | 2 | 102 | 45 |
1 | 3 | 15 | 80 |
2 | 2 | 42 | 8 |
3 | 2 | 56 | 61 |
4 | 1 | 32 | 87 |
Get the row with the largest value for the first column
tab.nlargest(n=1, columns='column_a')
column_a | column_b | column_c | |
---|---|---|---|
0 | 3 | 15 | 80 |
Get the 4 rows where the first column is the largest, then any equal values are sorted based on the third column
tab.nlargest(n=4,columns=['column_a', 'column_c'])
column_a | column_b | column_c | |
---|---|---|---|
0 | 3 | 15 | 80 |
1 | 2 | 56 | 61 |
2 | 2 | 102 | 45 |
3 | 2 | 42 | 8 |
Get the 2 rows with the smallest values for the first column and in case of duplicates, take all rows of the same value for that column
tab.nsmallest(n=2, columns=['column_a'], keep='all')
column_a | column_b | column_c | |
---|---|---|---|
0 | 1 | 32 | 87 |
1 | 2 | 102 | 45 |
2 | 2 | 42 | 8 |
3 | 2 | 56 | 61 |
Data Joins/Merging¶
Table.merge()¶
Table.merge(
right,
how='inner',
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=('_x', '_y'),
copy=True,
validate=None,
q_join=False
)
Merge Table or KeyedTable objects with a database-style join.
The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right | Union[Table/KeyedTable] | The object to merge with. | required |
how | str | The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}. | ‘inner’ |
on | str | The column name to join on. | None |
left_on | str | The column name in the left table to join on. | None |
right_on | str | The column name in the right table to join on. | None |
left_index | bool | Use the index of the left Table. | False |
right_index | bool | Use the index of the right Table. | False |
sort | bool | Sort the join keys of the resulting table. | False |
suffixes | Tuple(str, str) | The number of rows to return. | ('_x', '_y') |
copy | bool | If False avoid copies and modify the input table. | None |
validate | str | If specified checks if merge matches specified type. - “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets. - “one_to_many” or “1:m”: check if merge keys are unique in left dataset. - “many_to_one” or “m:1”: check if merge keys are unique in right dataset. - “many_to_many” or “m:m”: allowed, but does not result in checks. |
None |
q_join | bool | If True perform native q joins instead of the pandas SQL like joins. More documentation around these joins can be found here. | False |
Returns:
Type | Description |
---|---|
Table / KeyedTable | The resulting table-like object after the join has been preformed. |
Examples:
Merge tab1 and tab2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
tab1 = kx.Table(data={'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})
tab2 = kx.Table(data={'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8]})
tab1.merge(tab2, left_on='lkey', right_on='rkey')
lkey | value_x | rkey | value_y | |
---|---|---|---|---|
0 | foo | 1 | foo | 5 |
1 | foo | 1 | foo | 8 |
2 | foo | 5 | foo | 5 |
3 | foo | 5 | foo | 8 |
4 | bar | 2 | bar | 6 |
5 | baz | 3 | baz | 7 |
Merge tab1 and tab2 with specified left and right suffixes appended to any overlapping columns.
tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right'))
lkey | value_left | rkey | value_right | |
---|---|---|---|---|
0 | foo | 1 | foo | 5 |
1 | foo | 1 | foo | 8 |
2 | foo | 5 | foo | 5 |
3 | foo | 5 | foo | 8 |
4 | bar | 2 | bar | 6 |
5 | baz | 3 | baz | 7 |
Merge tab1 and tab2 but raise an exception if the Tables have any overlapping columns.
try:
tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=(False, False))
except BaseException as e:
print(f'Caught Error: {e}')
Caught Error: Columns overlap but no suffix specified: ['value']
tab1 = kx.Table(data={'a': ['foo', 'bar'], 'b': [1, 2]})
tab2 = kx.Table(data={'a': ['foo', 'baz'], 'c': [3, 4]})
Merge tab1 and tab2 on the a
column using an inner join.
tab1.merge(tab2, how='inner', on='a')
a | b | c | |
---|---|---|---|
0 | foo | 1 | 3 |
Merge tab1 and tab2 on the a
column using a left join.
tab1.merge(tab2, how='left', on='a')
a | b | c | |
---|---|---|---|
0 | foo | 1 | 3 |
1 | bar | 2 | 0N |
Merge tab1 and tab2 using a cross join.
tab1 = kx.Table(data={'left': ['foo', 'bar']})
tab2 = kx.Table(data={'right': [7, 8]})
tab1.merge(tab2, how='cross')
left | right | |
---|---|---|
0 | foo | 7 |
1 | foo | 8 |
2 | bar | 7 |
3 | bar | 8 |
Merge tab1 and tab2_keyed using a left join with q_join
set to True
. Inputs/Outputs will match q lj behaviour.
tab1 = kx.Table(data={'a': ['foo', 'bar', 'baz'], 'b': [1, 2, 3]})
tab2 = kx.Table(data={'a': ['foo', 'baz', 'baz'], 'c': [3, 4, 5]})
tab2_keyed = tab2.set_index(1)
tab1.merge(tab2_keyed, how='left', q_join=True)
a | b | c | |
---|---|---|---|
0 | foo | 1 | 3 |
1 | baz | 3 | 4 |
Inputs/Outputs will match q ij behaviour.
tab3 = kx.Table(data={'a': ['foo', 'bar'], 'd': [6, 7]})
tab3_keyed = tab3.set_index(1)
tab1.merge(tab3_keyed, how='inner', q_join=True)
a | b | d | |
---|---|---|---|
0 | foo | 1 | 6 |
1 | bar | 2 | 7 |
Merge using q_join
set to True
, and how
set to left
, will fail when tab2
is not a keyed table.
#Will error as Left Join requires a keyed column for the right dataset.
try:
tab1.merge(tab2, how='left', q_join=True)
except ValueError as e:
print(f'Caught Error: {e}')
Caught Error: Left Join requires a keyed table for the right dataset.
Table.merge_asof()¶
Table.merge_asof(
right,
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
by=None,
left_by=None,
right_by=None,
suffixes=('_x', '_y'),
tolerance=None,
allow_exact_matches=True,
direction='backward'
)
Merge Table or KeyedTable objects with a database-style join.
The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right | Union[Table/KeyedTable] | The object to merge with. | required |
how | str | The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}. | ‘inner’ |
on | str | The column name to join on. | None |
left_on | str | The column name in the left table to join on. | None |
right_on | str | The column name in the right table to join on. | None |
left_index | bool | Use the index of the left Table. | False |
right_index | bool | Use the index of the right Table. | False |
by | str | Not yet implemented. | None |
left_by | str | Field names to match on in the left table. | None |
right_by | str | Field names to match on in the right table. | None |
suffixes | Tuple(str, str) | The number of rows to return. | ('_x', '_y') |
tolerance | Any | Not yet implemented. | None |
allow_exact_matches | bool | Not yet implemented. | True |
direction | str | Not yet implemented. | 'backward' |
Returns:
Type | Description |
---|---|
Table / KeyedTable | The resulting table like object after the join has been preformed. |
Examples:
Perform a simple asof join on two tables.
left = kx.Table(data={"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = kx.Table(data={"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
left
a | left_val | |
---|---|---|
0 | 1 | a |
1 | 5 | b |
2 | 10 | c |
right
a | right_val | |
---|---|---|
0 | 1 | 1 |
1 | 2 | 2 |
2 | 3 | 3 |
3 | 6 | 6 |
4 | 7 | 7 |
left.merge_asof(right)
a | left_val | right_val | |
---|---|---|---|
0 | 1 | a | 1 |
1 | 5 | b | 3 |
2 | 10 | c | 7 |
Perform a asof join on two tables but first merge them on the by column.
trades = kx.Table(data={
"time": [
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.030"),
pd.Timestamp("2016-05-25 13:30:00.041"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.049"),
pd.Timestamp("2016-05-25 13:30:00.072"),
pd.Timestamp("2016-05-25 13:30:00.075")
],
"ticker": [
"GOOG",
"MSFT",
"MSFT",
"MSFT",
"GOOG",
"AAPL",
"GOOG",
"MSFT"
],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
"time": [
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.038"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
trades
time | ticker | bid | ask | |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 |
quotes
time | ticker | price | quantity | |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.038000000 | MSFT | 51.95 | 155 |
2 | 2016.05.25D13:30:00.048000000 | GOOG | 720.77 | 100 |
3 | 2016.05.25D13:30:00.048000000 | GOOG | 720.92 | 100 |
4 | 2016.05.25D13:30:00.048000000 | AAPL | 98f | 100 |
trades.merge_asof(quotes, on="time")
time | ticker_x | bid | ask | ticker_y | price | quantity | |
---|---|---|---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 | MSFT | 51.95 | 75 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 | MSFT | 51.95 | 75 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f | MSFT | 51.95 | 155 |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 | AAPL | 98f | 100 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 | AAPL | 98f | 100 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 | AAPL | 98f | 100 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 | AAPL | 98f | 100 |
Analytic functionality¶
# All the examples in this section will use this example table.
N = 100
kx.Table(data={
'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'price': 250 + kx.random.random(N, 500.0),
'traded': 100 - kx.random.random(N, 200),
'hold': kx.random.random(N, False)
})
tab
column_a | column_b | column_c | |
---|---|---|---|
0 | 2 | 102 | 45 |
1 | 3 | 15 | 80 |
2 | 2 | 42 | 8 |
3 | 2 | 56 | 61 |
4 | 1 | 32 | 87 |
Table.abs()¶
Table.abs(numeric_only=False)
Take the absolute value of each element in the table. This will raise an error if there are columns that contain data that have no absolute value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
numeric_only | bool | Only use columns of the table that can be converted to an absolute value. | False |
Returns:
Type | Description |
---|---|
Table / KeyedTable | The resulting table like object with only positive numerical values. |
tab.abs(numeric_only=True)
column_a | column_b | column_c | |
---|---|---|---|
0 | 2 | 102 | 45 |
1 | 3 | 15 | 80 |
2 | 2 | 42 | 8 |
3 | 2 | 56 | 61 |
4 | 1 | 32 | 87 |
Table.count()¶
Table.count(axis=0, numeric_only=False)
Returns the count of non null values across the given axis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis | int | The axis to count elements across 1 is columns, 0 is rows. | 0 |
numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |
Returns:
Type | Description |
---|---|
Dictionary | A dictionary where the key represent the column name / row number and the values are the result of calling count on that column / row. |
tab.count()
column_a | 5 |
---|---|
column_b | 5 |
column_c | 5 |
Table.max()¶
Table.max(axis=0, skipna=True, numeric_only=False)
Returns the maximum value across the given axis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis | int | The axis to calculate the maximum across 0 is columns, 1 is rows. | 0 |
skipna | bool | Ignore any null values along the axis. | True |
numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |
Returns:
Type | Description |
---|---|
Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling max on that column / row. |