Pandas API¶

This notebook demonstrates the capabilities of the Pandas-like API for PyKX Table objects.

To follow along please download this notebook using the following link.

This demonstration covers the following topics:

Constructing Tables
Metadata
Querying and Data Interrogation
Data Joins/Merging
Analytic Functionality
Data Preprocessing

In [2]:

import pykx as kx
import numpy as np
import pandas as pd
kx.q.system.console_size = [10, 80]

Constructing Tables¶

Table¶

Create a table from a list of rows or by converting a Python dictionary object

Parameters:

Name	Type	Description	Default
x	Union[list, array]	An array like object containing the contents of each row of the table.	None
data	dict	A dictionary to be converted into a Table object.	None
columns	list[str]	A list of column names to use when constructing from an array of rows.	None

Returns:

Type	Description
Table	The newly constructed table object.
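As a rough illustration of the row-oriented input, the plain-Python sketch below transposes a list of rows into named columns. It does not call PyKX: the helper `rows_to_columns` is hypothetical, and the default names `x`, `x1`, ... are an assumption inferred from the column headers shown for the unnamed example in this section.

```python
def rows_to_columns(rows, columns=None):
    """Transpose a list of equal-length rows into a {name: values} mapping."""
    width = len(rows[0]) if rows else 0
    if columns is None:
        # Assumed default naming, mirroring the x / x1 headers in this section.
        columns = ["x" if i == 0 else f"x{i}" for i in range(width)]
    if len(columns) != width:
        raise ValueError("one column name is required per row element")
    return {name: list(values) for name, values in zip(columns, zip(*rows))}


unnamed = rows_to_columns([[0, 1], [2, 3], [4, 5]])
named = rows_to_columns([[0, 1, 2], [3, 4, 5]], columns=["x", "y", "z"])
```

The same data could equally be supplied column-wise through the `data` dictionary, which is the form the sketch produces.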

Examples:

Create a table from a dictionary object.

In [3]:

kx.Table(data={'x': list(range(10)), 'y': [10 - x for x in range(10)]})

Out[3]:

	x	y

0	0	10
1	1	9
2	2	8
3	3	7
4	4	6
5	5	5
6	6	4
7	7	3
8	8	2

Create a Table from an array like object.

In [4]:

kx.Table([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])

Out[4]:

	x	x1

0	0	1
1	2	3
2	4	5
3	6	7
4	8	9

Create a Table from an array like object and provide names for the columns to use.

In [5]:

kx.Table([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])

Out[5]:

	x	y	z

0	0	1	2
1	3	4	5
2	6	7	8
3	9	10	11

Keyed Table¶

Create a keyed table from a list of rows or by converting a Python dictionary object

Parameters:

Name	Type	Description	Default
x	Union[list, array]	An array like object containing the contents of each row of the table.	None
data	dict	A dictionary to be converted into a Table object.	None
columns	list[str]	A list of column names to use when constructing from an array of rows.	None
index	list[Any]	An array like object to use as the index column of the table.	None

Returns:

Type	Description
KeyedTable	The newly constructed keyed table object.

Examples:

Create a keyed table from a dictionary object.

In [6]:

kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))})

Out[6]:

	x	y
idx
0	0	10
1	1	9
2	2	8
3	3	7
4	4	6
5	5	5
6	6	4
7	7	3
8	8	2

Create a keyed table from a list of rows.

In [7]:

kx.KeyedTable([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])

Out[7]:

	x	x1
idx
0	0	1
1	2	3
2	4	5
3	6	7
4	8	9

Create a keyed table from a list of rows and provide names for the resulting columns.

In [8]:

kx.KeyedTable([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])

Out[8]:

	x	y	z
idx
0	0	1	2
1	3	4	5
2	6	7	8
3	9	10	11

Create a keyed table with a specified index column.

In [9]:

kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))}, index=[2 * x for x in range(10)])

Out[9]:

	x	y
idx
0	0	10
2	1	9
4	2	8
6	3	7
8	4	6
10	5	5
12	6	4
14	7	3
16	8	2

Metadata¶

In [10]:

N = 1000
tab = kx.Table(data = {
    'x': kx.q.til(N),
    'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'z': kx.random.random(N, 500.0),
    'w': kx.random.random(N, 1000),
    'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab

Out[10]:

	x	y	z	w	v

0	0	AAPL	454.7063	766	100
1	1	AAPL	149.4238	916	0
2	2	MSFT	227.0315	469	0
3	3	GOOG	78.47098	280	250
4	4	MSFT	23.49633	369	100
5	5	GOOG	479.5628	254	200
6	6	MSFT	398.3865	388	200
7	7	MSFT	132.0956	318	250
...	...	...	...	...	...
999	999	GOOG	3.79704	246	0

1,000 rows × 5 columns

Table.columns¶

Get the name of each column in the table

In [11]:

tab.columns

Out[11]:

pykx.SymbolVector(pykx.q('`x`y`z`w`v'))

Table.dtypes¶

Get the datatypes of the table columns

In [12]:

tab.dtypes

Out[12]:

	columns	datatypes	type

0	x	"kx.LongAtom"	"kx.LongAtom"
1	y	"kx.SymbolAtom"	"kx.SymbolAtom"
2	z	"kx.FloatAtom"	"kx.FloatAtom"
3	w	"kx.LongAtom"	"kx.LongAtom"
4	v	"kx.LongAtom"	"kx.LongAtom"

Table.empty¶

Returns True if the table is empty otherwise returns False.

In [13]:

tab.empty

Out[13]:

pykx.BooleanAtom(pykx.q('0b'))

Table.ndim¶

Get the number of dimensions (axes) of the table. For a table this is always 2.

In [14]:

tab.ndim

Out[14]:

pykx.LongAtom(pykx.q('2'))

Table.shape¶

Get the shape of the table as a tuple (number of rows, number of columns).

In [15]:

tab.shape

Out[15]:

(pykx.LongAtom(pykx.q('1000')), pykx.LongAtom(pykx.q('5')))

Table.size¶

Get the number of values in the table (rows * cols).

In [16]:

tab.size

Out[16]:

pykx.LongAtom(pykx.q('5000'))
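The metadata properties above are related to one another in a simple way. The following plain-Python sketch, which uses a dictionary of columns as a stand-in for a table and does not call PyKX, shows how `shape`, `size`, `ndim` and `empty` could be derived from the row and column counts.

```python
# A dictionary of equal-length columns standing in for a table.
columns = {"x": [0, 1, 2], "y": ["a", "b", "c"]}

n_rows = len(next(iter(columns.values()), []))
n_cols = len(columns)

shape = (n_rows, n_cols)   # (number of rows, number of columns)
size = n_rows * n_cols     # total number of cells
ndim = len(shape)          # a table has two axes: rows and columns
empty = size == 0          # True only when there are no cells at all
```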

Querying and Data Interrogation¶

In [17]:

# The examples in this section will use this example table filled with random data
N = 1000
tab = kx.Table(data = {
    'x': kx.q.til(N),
    'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'z': kx.random.random(N, 500.0),
    'w': kx.random.random(N, 1000),
    'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab

Out[17]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0
5	5	GOOG	429.4278	214	100
6	6	AAPL	470.0497	807	100
7	7	GOOG	409.6725	727	200
...	...	...	...	...	...
999	999	AAPL	142.8215	874	0

1,000 rows × 5 columns

Table.all()¶

Table.all(axis=0, bool_only=False, skipna=True)

Returns whether or not all values across the given axis have a truthy value.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate `all` across: 0 is columns, 1 is rows.	0
bool_only	bool	Only use columns of the table that are boolean types.	False
skipna	bool	Ignore any null values along the axis.	True

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `all` on that column / row.

In [18]:

tab.all()

Out[18]:



x	0b
y	1b
z	1b
w	1b
v	0b

Table.any()¶

Table.any(axis=0, bool_only=False, skipna=True)

Returns whether or not any values across the given axis have a truthy value.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate `any` across: 0 is columns, 1 is rows.	0
bool_only	bool	Only use columns of the table that are boolean types.	False
skipna	bool	Ignore any null values along the axis.	True

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `any` on that column / row.

In [19]:

tab.any()

Out[19]:



x	1b
y	1b
z	1b
w	1b
v	1b
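To make the column-wise results above easier to read, here is a plain-Python sketch of the `axis=0`, `skipna=True` case. It is not PyKX code: `reduce_columns` is a hypothetical helper and `None` stands in for a q null. A column containing a zero fails `all`, while a single non-zero value is enough to satisfy `any`.

```python
table = {
    "x": [0, 1, 2],        # contains a zero, so all() is False
    "w": [5, 7, 9],        # every value is non-zero
    "v": [None, 0, 50],    # None stands in for a q null and is skipped
}


def reduce_columns(table, reducer):
    """Apply `all` or `any` to each column, ignoring nulls (skipna=True)."""
    return {
        name: reducer(bool(v) for v in values if v is not None)
        for name, values in table.items()
    }


all_result = reduce_columns(table, all)
any_result = reduce_columns(table, any)
```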

Table.at[]¶

Table.at[row, col]

Access a single value for a row / column pair.

Similar to loc[], in that both provide label-based lookups. Use at if you only need to get or set a single value.

The at property can be used for both assignment and retrieval of values at a given row and column.

Examples:

Get the value of the z column in the row with index 997.

In [20]:

tab.at[997, 'z']

Out[20]:

pykx.FloatAtom(pykx.q('20.61333'))

Reassign the value of the z column in the row with index 997 to 3.14159.

In [21]:

tab.at[997, 'z'] = 3.14159
tab.at[997, 'z']

Out[21]:

pykx.FloatAtom(pykx.q('3.14159'))

Table.get()¶

Table.get(key, default=None)

Get a column or columns from a table by key. If the key does not exist, return the default value.

Parameters:

Name	Type	Description	Default
key	Union[str, list[str]]	The column name or list of names to get from the table.	required
default	int	The default value if the key is not found.	None

Returns:

Type	Description
Union[Table, Any]	A table containing only the columns requested or the default value.
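The lookup-with-default behaviour resembles `dict.get`. The sketch below models it in plain Python on a dictionary of columns. It is an illustration only: `get_columns` is a hypothetical helper, and returning the default when any name in a list is missing is an assumption rather than documented PyKX behaviour.

```python
def get_columns(table, key, default=None):
    """Return the requested column(s), or `default` if a name is missing."""
    keys = [key] if isinstance(key, str) else list(key)
    if any(k not in table for k in keys):
        # Assumption: one missing name is enough to fall back to the default.
        return default
    return {k: table[k] for k in keys}


table = {"y": ["GOOG", "MSFT"], "z": [326.1, 4.08]}

single = get_columns(table, "y")
several = get_columns(table, ["y", "z"])
missing = get_columns(table, "q")
fallback = get_columns(table, "q", "not found")
```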

Examples:

Get the y column from the table.

In [22]:

tab.get('y')

/usr/local/lib/python3.10/site-packages/pykx/pandas_api/pandas_indexing.py:40: FutureWarning: 
	Single column retrieval using 'get' method will return a vector/list object in release 3.0+
	To access the vector/list directly use table['column_name']
  warnings.warn("\n\tSingle column retrieval using 'get' method will return a vector/list "

Out[22]:

	y

0	GOOG
1	MSFT
2	GOOG
3	AAPL
4	MSFT
5	GOOG
6	AAPL
7	GOOG
...	...
999	AAPL

1,000 rows × 1 columns

Get the y and z columns from the table.

In [23]:

tab.get(['y', 'z'])

Out[23]:

	y	z

0	GOOG	326.1157
1	MSFT	4.083758
2	GOOG	81.29854
3	AAPL	256.6323
4	MSFT	398.2529
5	GOOG	429.4278
6	AAPL	470.0497
7	GOOG	409.6725
...	...	...
999	AAPL	142.8215

1,000 rows × 2 columns

Attempt to get the q column from the table and receive None, as that column does not exist.

In [24]:

print(tab.get('q'))

None

/usr/local/lib/python3.10/site-packages/pykx/pandas_api/pandas_indexing.py:40: FutureWarning: 
	Single column retrieval using 'get' method will return a vector/list object in release 3.0+
	To access the vector/list directly use table['column_name']
  warnings.warn("\n\tSingle column retrieval using 'get' method will return a vector/list "

Attempt to get the q column from the table and receive the default value 'not found', as that column does not exist.

In [25]:

tab.get('q', 'not found')

Out[25]:

'not found'

Table.head()¶

Table.head(n=5)

Get the first n rows from a table.

Parameters:

Name	Type	Description	Default
n	int	The number of rows to return.	5

Returns:

Type	Description
Table	The first `n` rows of the table.

Examples:

Return the first 5 rows of the table.

In [26]:

tab.head()

Out[26]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0

Return the first 10 rows of the table.

In [27]:

tab.head(10)

Out[27]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0
5	5	GOOG	429.4278	214	100
6	6	AAPL	470.0497	807	100
7	7	GOOG	409.6725	727	200
8	8	AAPL	280.2929	230	250

Table.isna()¶

Table.isna()

Detects null values in a Table object.

Parameters:

Returns:

Type	Description
Table	A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite.

In [28]:

tabDemo = kx.Table(data= {
    'a': [1, 0, float('nan')],
    'b': [1, 0, float('nan')],
    'c': [float('nan'), 4, 0]
    })

In [29]:

tabDemo.isna()

Out[29]:

	a	b	c

0	0b	0b	1b
1	0b	0b	0b
2	1b	1b	0b

Table.isnull()¶

Table.isnull()

Alias of Table.isna().

Detects null values in a Table object.

Parameters:

Returns:

Type	Description
Table	A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite.

In [30]:

tabDemo.isnull()

Out[30]:

	a	b	c

0	0b	0b	1b
1	0b	0b	0b
2	1b	1b	0b

Table.notna()¶

Table.notna()

Boolean inverse of Table.isna().

Detects non-null values on a Table object.

Parameters:

Returns:

Type	Description
Table	A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite.

In [31]:

tabDemo.notna()

Out[31]:

	a	b	c

0	1b	1b	0b
1	1b	1b	1b
2	0b	0b	1b
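The relationship between `isna` and `notna` can be sketched in plain Python using the same demo values. This is not PyKX code: the helper `is_null` is hypothetical and treats `None` and floating-point NaN as null, which is an assumption about how nulls are represented on the Python side.

```python
import math


def is_null(value):
    """Treat None and float NaN as null values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


demo = {
    "a": [1, 0, float("nan")],
    "b": [1, 0, float("nan")],
    "c": [float("nan"), 4, 0],
}

# isna flags nulls cell by cell; notna is its element-wise inverse.
isna = {name: [is_null(v) for v in values] for name, values in demo.items()}
notna = {name: [not flag for flag in flags] for name, flags in isna.items()}
```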

Table.notnull()¶

Table.notnull()

Boolean inverse of Table.isnull(). Alias of Table.notna().

Detects non-null values on a Table object.

Parameters:

Returns:

Type	Description
Table	A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite.

In [32]:

tabDemo.notnull()

Out[32]:

	a	b	c

0	1b	1b	0b
1	1b	1b	1b
2	0b	0b	1b

Table.iloc[]¶

Table.iloc[:, :]

Purely integer-location based indexing for selection by position.

iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a BooleanVector.

Allowed inputs are:

An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A BooleanVector.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).

Returns:

Type	Description
Table	A table containing only the columns / rows requested.
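The row-selection inputs listed above can be pictured with a short plain-Python sketch over a list of rows. It does not use PyKX: `select_by_position` is a hypothetical helper that covers the integer, integer-list, slice and boolean-mask cases only.

```python
rows = ["r0", "r1", "r2", "r3", "r4"]


def select_by_position(rows, key):
    """Select rows by integer, slice, list of integers, or boolean mask."""
    if isinstance(key, int):
        return [rows[key]]                 # a single row, still returned as a list
    if isinstance(key, slice):
        return rows[key]                   # usual half-open Python slice
    key = list(key)
    if key and all(isinstance(k, bool) for k in key):
        return [row for row, keep in zip(rows, key) if keep]
    return [rows[i] for i in key]          # rows in the order requested


one = select_by_position(rows, 1)
first_two = select_by_position(rows, slice(None, 2))
picked = select_by_position(rows, [4, 3, 0])
masked = select_by_position(rows, [True, False, True, False, False])
```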

Examples:

Get the second row from a table.

In [33]:

tab.iloc[1]

Out[33]:

	x	y	z	w	v

0	1	MSFT	4.083758	511	250

Get the first 5 rows from a table.

In [34]:

tab.iloc[:5]

Out[34]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0

Get all rows of the table where the y column is equal to AAPL.

In [35]:

tab.iloc[tab['y'] == 'AAPL']

Out[35]:

	x	y	z	w	v

0	3	AAPL	256.6323	998	50
1	6	AAPL	470.0497	807	100
2	8	AAPL	280.2929	230	250
3	9	AAPL	223.9135	153	100
4	10	AAPL	293.3982	354	0
5	15	AAPL	444.315	286	50
6	17	AAPL	147.3942	895	50
7	18	AAPL	460f	471	50
...	...	...	...	...	...
341	999	AAPL	142.8215	874	0

342 rows × 5 columns

Get all rows of the table where the y column is equal to AAPL, and only return the y, z and w columns.

In [36]:

tab.iloc[tab['y'] == 'AAPL', ['y', 'z', 'w']]

Out[36]:

	y	z	w

0	AAPL	256.6323	998
1	AAPL	470.0497	807
2	AAPL	280.2929	230
3	AAPL	223.9135	153
4	AAPL	293.3982	354
5	AAPL	444.315	286
6	AAPL	147.3942	895
7	AAPL	460f	471
...	...	...	...
341	AAPL	142.8215	874

342 rows × 3 columns

Replace all null values in the column v with the value -100.

In [37]:

tab.iloc[tab['v'] == kx.q('0N'), 'v'] = -100
tab

Out[37]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0
5	5	GOOG	429.4278	214	100
6	6	AAPL	470.0497	807	100
7	7	GOOG	409.6725	727	200
...	...	...	...	...	...
999	999	AAPL	142.8215	874	0

1,000 rows × 5 columns

Table.loc[]¶

Table.loc[:, :]

Access a group of rows and columns by label or by BooleanVector.

loc is a label based form of indexing, but may also be used with a boolean array.

Allowed inputs are:

A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index)
A list or array of labels, e.g. ['a', 'b', 'c']
A slice object with labels, e.g. 'a':'f'
- Warning: contrary to usual Python slices, both the start and the stop are included.
A BooleanVector of the same length as the axis being sliced
An alignable BooleanVector. The index of the key will be aligned before masking
An alignable Index. The Index of the returned selection will be the input
A callable function with one argument (the calling Table like object) and that returns valid output for indexing (e.g. one of the above)

Note: When the Pandas API is enabled, using [] to index into a table will use Table.loc[]

Returns:

Type	Description
Table	A table containing only the columns / rows requested.
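The inclusive behaviour of label slices is the main difference from positional slicing. The plain-Python sketch below contrasts the two on a small labelled sequence. `label_slice` is a hypothetical helper written for this illustration and is not part of PyKX.

```python
labels = ["a", "b", "c", "d", "e", "f"]
values = [10, 20, 30, 40, 50, 60]


def label_slice(labels, values, start, stop):
    """Slice by label, including both the start and the stop label."""
    lo = labels.index(start)
    hi = labels.index(stop)
    return values[lo:hi + 1]


by_label = label_slice(labels, values, "b", "d")   # stop label 'd' is included
by_position = values[1:3]                          # stop position 3 is excluded
```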

Examples:

Get every row in the y column.

In [38]:

tab[:, 'y']

Out[38]:

	y

0	GOOG
1	MSFT
2	GOOG
3	AAPL
4	MSFT
5	GOOG
6	AAPL
7	GOOG
...	...
999	AAPL

1,000 rows × 1 columns

Get all rows of the table where the value in the z column is greater than 250.0

In [39]:

tab[tab['z'] > 250.0]

Out[39]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	3	AAPL	256.6323	998	50
2	4	MSFT	398.2529	103	0
3	5	GOOG	429.4278	214	100
4	6	AAPL	470.0497	807	100
5	7	GOOG	409.6725	727	200
6	8	AAPL	280.2929	230	250
7	10	AAPL	293.3982	354	0
...	...	...	...	...	...
517	998	AAPL	383.3942	13	0

518 rows × 5 columns

Replace all null values in the column v with the value -100.

In [40]:

tab.loc[tab['v'] == kx.LongAtom.null, 'v'] = -100
tab

Out[40]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0
5	5	GOOG	429.4278	214	100
6	6	AAPL	470.0497	807	100
7	7	GOOG	409.6725	727	200
...	...	...	...	...	...
999	999	AAPL	142.8215	874	0

1,000 rows × 5 columns

Replace all locations in column v where the value is -100 with a null.

In [41]:

tab[tab['v'] == -100, 'v'] = kx.LongAtom.null
tab

Out[41]:

	x	y	z	w	v

0	0	GOOG	326.1157	458	0
1	1	MSFT	4.083758	511	250
2	2	GOOG	81.29854	96	0
3	3	AAPL	256.6323	998	50
4	4	MSFT	398.2529	103	0
5	5	GOOG	429.4278	214	100
6	6	AAPL	470.0497	807	100
7	7	GOOG	409.6725	727	200
...	...	...	...	...	...
999	999	AAPL	142.8215	874	0

1,000 rows × 5 columns

Because [] indexing uses the loc functionality under the hood, it can also be used to set one or more columns within a table. The data assigned can be either q or Python objects.

In [42]:

tab['new_col'] = kx.random.random(1000, 1.0)

In [43]:

tab[['new_col1', 'new_col2']] = [20, kx.random.random(1000, kx.GUIDAtom.null)]

Table.sample()¶

Table.sample(n, frac, replace, weights, random_state, axis, ignore_index)

Sample random data from the table.

Parameters:

Name	Type	Description	Default
n	int	Number of rows to return. Cannot be used with `frac`. Default is 1 if `frac` is None.	None
frac	float	Fraction of the rows to return. Cannot be used with `n`.	None
replace	bool	Whether or not it should be possible to sample the same row twice.	False
weights	None	Not yet implemented.	None
random_state	None	Not yet implemented.	None
axis	None	Not yet implemented.	None
ignore_index	bool	Not yet implemented.	False

Returns:

Type	Description
Table	A table containing the randomly sampled rows.
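The interaction between `n`, `frac` and `replace` can be sketched with the standard library. The code below is an illustration over a plain list of rows, not the PyKX implementation: `sample_rows` is a hypothetical helper, and rounding `frac * len(rows)` to the nearest integer is an assumption.

```python
import random


def sample_rows(rows, n=None, frac=None, replace=False):
    """Pick rows at random, by count (`n`) or by fraction (`frac`)."""
    if n is not None and frac is not None:
        raise ValueError("pass either n or frac, not both")
    if frac is not None:
        n = round(frac * len(rows))      # assumed rounding rule
    elif n is None:
        n = 1                            # default when neither is supplied
    if replace:
        return random.choices(rows, k=n)  # the same row may appear twice
    return random.sample(rows, n)         # every row appears at most once


rows = list(range(1000))
without_replacement = sample_rows(rows, frac=0.1)
with_replacement = sample_rows(rows, frac=0.1, replace=True)
ten_rows = sample_rows(rows, n=10)
```

Sampling without replacement can never return more rows than the table holds, whereas sampling with replacement can.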

In [44]:

# The examples in this section will use this example table filled with random data
N = 1000
tab = kx.Table(data = {
    'x': kx.q.til(N),
    'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'z': kx.random.random(N, 500.0),
    'w': kx.random.random(N, 1000),
    'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab.head()

Out[44]:

	x	y	z	w	v

0	0	AAPL	51.56208	1	200
1	1	GOOG	192.8475	238	200
2	2	GOOG	152.3489	774	0N
3	3	GOOG	104.8755	913	0
4	4	GOOG	160.1738	506	100

Examples:

Sample 10 Rows.

In [45]:

tab.sample(n=10)

Out[45]:

	x	y	z	w	v

0	604	AAPL	276.1397	785	0N
1	893	MSFT	353.1517	659	200
2	590	AAPL	30.3018	634	200
3	881	MSFT	384.4023	461	0N
4	224	MSFT	64.81406	455	250
5	740	GOOG	184.8131	730	100
6	172	MSFT	460.7157	329	0
7	738	MSFT	425.1625	556	50
8	943	AAPL	254.8109	792	100

Sample 10% of the rows.

In [46]:

tab.sample(frac=0.1)

Out[46]:

	x	y	z	w	v

0	580	AAPL	218.6735	743	0N
1	653	MSFT	458.9859	2	250
2	875	MSFT	30.793	273	0N
3	641	GOOG	41.78246	973	50
4	436	AAPL	345.3793	103	250
5	261	MSFT	2.576664	990	100
6	780	GOOG	25.01553	7	0N
7	851	GOOG	492.5621	688	0N
...	...	...	...	...	...
99	107	AAPL	189.0718	667	0N

100 rows × 5 columns

Sample 10% of the rows and allow the same row to be sampled twice.

In [47]:

tab.sample(frac=0.1, replace=True)

Out[47]:

	x	y	z	w	v

0	576	MSFT	32.12644	902	0
1	298	MSFT	203.5688	36	50
2	48	MSFT	477.9462	345	0N
3	172	MSFT	460.7157	329	0
4	144	MSFT	149.4715	441	250
5	65	AAPL	347.9628	848	50
6	336	GOOG	31.34298	618	0
7	137	GOOG	476.6447	676	0N
...	...	...	...	...	...
99	217	GOOG	358.6286	781	250

100 rows × 5 columns

Table.select_dtypes()¶

Table.select_dtypes(include=None, exclude=None)

Return a subset of the table's columns based on the column dtypes.

Allowed inputs for include/exclude are:

A single dtype or string.
A list of dtypes or strings.
Inputs given for include and exclude cannot overlap.

The dtype kx.CharVector will return an error. Use kx.CharAtom for a column of single chars. Both kx.*Atom and kx.*Vector will be taken to mean a column containing a single item per row of type *. kx.List will include/exclude any columns containing mixed list data (including string columns).

Parameters:

Name	Type	Description	Default
include	Union[List, str]	A selection of dtypes or strings to be included.	None
exclude	Union[List, str]	A selection of dtypes or strings to be excluded.	None

At least one of these parameters must be supplied.

Returns:

Type	Description
Table	The subset of the table including the dtypes in `include` and excluding the dtypes in `exclude`.
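The include/exclude rules can be summarised with a small plain-Python sketch over a mapping of column names to type names. It does not call PyKX: `select_columns` is a hypothetical helper and the string type names are placeholders for the PyKX type objects.

```python
def _as_set(spec):
    """Accept None, a single type name, or a list of type names."""
    if spec is None:
        return set()
    return {spec} if isinstance(spec, str) else set(spec)


def select_columns(dtypes, include=None, exclude=None):
    """Return the column names whose type passes the include/exclude filters."""
    include, exclude = _as_set(include), _as_set(exclude)
    if not include and not exclude:
        raise ValueError("at least one of include or exclude must be supplied")
    if include & exclude:
        raise ValueError("include and exclude cannot overlap")
    return [
        name for name, dtype in dtypes.items()
        if (not include or dtype in include) and dtype not in exclude
    ]


dtypes = {"c1": "symbol", "c2": "short", "c3": "long", "c4": "int"}
no_symbols = select_columns(dtypes, exclude="symbol")
short_and_long = select_columns(dtypes, include=["short", "long"])
```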

Examples:

The examples in this section use the following example table.

In [48]:

df = kx.Table(data = {
  'c1': kx.SymbolVector(['a', 'b', 'c']),
  'c2': kx.ShortVector([1, 2, 3]),
  'c3': kx.LongVector([1, 2, 3]),
  'c4': kx.IntVector([1, 2, 3])
  })

Exclude columns containing symbols

In [49]:

df.select_dtypes(exclude = kx.SymbolVector)

Out[49]:

	c2	c3	c4

0	1h	1	1i
1	2h	2	2i
2	3h	3	3i

Include a list of column types

In [50]:

df.select_dtypes(include = [kx.ShortVector, kx.LongVector])

Out[50]:

	c2	c3

0	1h	1
1	2h	2
2	3h	3

Table.tail()¶

Table.tail(n=5)

Get the last n rows from a table.

Parameters:

Name	Type	Description	Default
n	int	The number of rows to return.	5

Returns:

Type	Description
Table	The last `n` rows of the table.

Examples:

Return the last 5 rows of the table.

In [51]:

tab.tail()

Out[51]:

	x	y	z	w	v

0	995	MSFT	30.33	974	100
1	996	GOOG	417.5746	591	0
2	997	GOOG	237.1697	119	250
3	998	MSFT	75.62617	784	50
4	999	AAPL	324.1628	231	200

Return the last 10 rows of the table.

In [52]:

tab.tail(10)

Out[52]:

	x	y	z	w	v

0	990	MSFT	251.8079	223	0N
1	991	MSFT	316.3818	523	0N
2	992	GOOG	236.4901	855	0N
3	993	MSFT	415.9962	542	100
4	994	GOOG	468.8945	175	100
5	995	MSFT	30.33	974	100
6	996	GOOG	417.5746	591	0
7	997	GOOG	237.1697	119	250
8	998	MSFT	75.62617	784	50

Sorting¶

Table.sort_values()¶

Table.sort_values(by, ascending=True)

Sort Table objects based on the value of a selected column.

Parameters:

Name	Type	Description	Default
by	str or list of str	The name of the column, or list of column names, to sort by.	required
ascending	bool	The order in which to sort the values, ascending is True and descending is False.	True

Returns:

Type	Description
Table	The resulting table after the sort has been performed.
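As a point of comparison, the same kind of ordering can be expressed with Python's built-in `sorted` on a list of row dictionaries. This is a sketch of the semantics only, using the values from the example table in this section: `sort_rows` is a hypothetical helper and not part of PyKX.

```python
rows = [
    {"column_a": 20, "column_b": 56, "column_c": 45},
    {"column_a": 3, "column_b": 15, "column_c": 80},
    {"column_a": 100, "column_b": 42, "column_c": 8},
]


def sort_rows(rows, by, ascending=True):
    """Sort rows by one column name or by a list of column names."""
    keys = [by] if isinstance(by, str) else list(by)
    return sorted(
        rows,
        key=lambda row: tuple(row[k] for k in keys),
        reverse=not ascending,
    )


by_b = sort_rows(rows, "column_b")
by_c_desc = sort_rows(rows, "column_c", ascending=False)
```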

Examples:

In [53]:

tab = kx.Table(data={
    'column_a': [20, 3, 100],
    'column_b': [56, 15, 42],
    'column_c': [45, 80, 8]})
tab

Out[53]:

	column_a	column_b	column_c

0	20	56	45
1	3	15	80
2	100	42	8

Sort a Table by the second column

In [54]:

tab.sort_values(by='column_b')

Out[54]:

	column_a	column_b	column_c

0	3	15	80
1	100	42	8
2	20	56	45

Sort a Table by the third column in descending order

In [55]:

tab.sort_values(by='column_c', ascending=False)

Out[55]:

	column_a	column_b	column_c

0	3	15	80
1	20	56	45
2	100	42	8

Table.nsmallest()¶

Table.nsmallest(
    n,
    columns,
    keep='first'
)

Return the first n rows of a Table ordered by columns in ascending order

Parameters:

Name	Type	Description	Default
n	int	The number of rows to return	required
columns	str or list of str	Column labels to order by	required
keep	str	Can be 'first', 'last' or 'all'. Used in case of duplicate values	'first'

Returns:

Type	Description
Table	The first n rows ordered by the given columns in ascending order
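The effect of `keep` on tied values is easiest to see in a small model. The following plain-Python sketch reproduces the three options on a list of row dictionaries, using the values from the sample table in this section. It is not PyKX code: `n_smallest` is a hypothetical helper, and the ordering of tied rows under `keep='last'` is an assumption modelled on pandas.

```python
rows = [
    {"column_a": 2, "column_b": 56, "column_c": 45},
    {"column_a": 3, "column_b": 15, "column_c": 80},
    {"column_a": 2, "column_b": 42, "column_c": 8},
    {"column_a": 2, "column_b": 102, "column_c": 61},
    {"column_a": 1, "column_b": 32, "column_c": 87},
]


def n_smallest(rows, n, columns, keep="first"):
    """Return the n rows with the smallest values in `columns`."""
    keys = [columns] if isinstance(columns, str) else list(columns)

    def value(row):
        return tuple(row[k] for k in keys)

    # Ties are broken by original position: earliest first, or latest first.
    sign = -1 if keep == "last" else 1
    ordered = sorted(enumerate(rows), key=lambda p: (value(p[1]), sign * p[0]))
    top = ordered[:n]
    if keep == "all" and top:
        # Extend the result with every row tied with the last selected value.
        cutoff = value(top[-1][1])
        top = [p for p in ordered if value(p[1]) <= cutoff]
    return [row for _, row in top]


first = n_smallest(rows, 2, "column_a")
last = n_smallest(rows, 2, "column_a", keep="last")
everything_tied = n_smallest(rows, 2, "column_a", keep="all")
two_keys = n_smallest(rows, 4, ["column_a", "column_b"])
```

With `keep='all'` the result may contain more than `n` rows, because every row tied with the final selected value is retained.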

Examples:

Sample table

In [56]:

tab = kx.Table(data={
    'column_a': [2, 3, 2, 2, 1],
    'column_b': [56, 15, 42, 102, 32],
    'column_c': [45, 80, 8, 61, 87]})
tab

Out[56]:

	column_a	column_b	column_c

0	2	56	45
1	3	15	80
2	2	42	8
3	2	102	61
4	1	32	87

Get the row where the first column is the smallest

In [57]:

tab.nsmallest(n=1, columns='column_a')

Out[57]:

	column_a	column_b	column_c

0	1	32	87

Get the 4 rows where the first column is the smallest, then any equal values are sorted based on the second column

In [58]:

tab.nsmallest(n=4,columns=['column_a', 'column_b'])

Out[58]:

	column_a	column_b	column_c

0	1	32	87
1	2	42	8
2	2	56	45
3	2	102	61

Get the 2 rows with the smallest values for the first column and in case of duplicates, take the last entry in the table

In [59]:

tab.nsmallest(n=2, columns=['column_a'], keep='last')

Out[59]:

	column_a	column_b	column_c

0	1	32	87
1	2	102	61

Table.nlargest()¶

Table.nlargest(
    n,
    columns,
    keep='first'
)

Return the first n rows of a Table ordered by columns in descending order

Parameters:

Name	Type	Description	Default
n	int	The number of rows to return	required
columns	str or list of str	Column labels to order by	required
keep	str	Can be 'first', 'last' or 'all'. Used in case of duplicate values	'first'

Returns:

Type	Description
Table	The first n rows ordered by the given columns in descending order

Examples:

Sample table

In [60]:

tab = kx.Table(data={
    'column_a': [2, 3, 2, 2, 1],
    'column_b': [102, 15, 42, 56, 32],
    'column_c': [45, 80, 8, 61, 87]})
tab

Out[60]:

	column_a	column_b	column_c

0	2	102	45
1	3	15	80
2	2	42	8
3	2	56	61
4	1	32	87

Get the row with the largest value for the first column

In [61]:

tab.nlargest(n=1, columns='column_a')

Out[61]:

	column_a	column_b	column_c

0	3	15	80

Get the 4 rows where the first column is the largest, then any equal values are sorted based on the third column

In [62]:

tab.nlargest(n=4,columns=['column_a', 'column_c'])

Out[62]:

	column_a	column_b	column_c

0	3	15	80
1	2	56	61
2	2	102	45
3	2	42	8

Get the 2 rows with the smallest values for the first column and in case of duplicates, take all rows of the same value for that column

In [63]:

tab.nsmallest(n=2, columns=['column_a'], keep='all')

Out[63]:

	column_a	column_b	column_c

0	1	32	87
1	2	102	45
2	2	42	8
3	2	56	61

Data Joins/Merging¶

Table.merge()¶

Table.merge(
    right,
    how='inner',
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=('_x', '_y'),
    copy=True,
    validate=None,
    q_join=False
)

Merge Table or KeyedTable objects with a database-style join.

The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Parameters:

Name	Type	Description	Default
right	Union[Table/KeyedTable]	The object to merge with.	required
how	str	The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.	‘inner’
on	str	The column name to join on.	None
left_on	str	The column name in the left table to join on.	None
right_on	str	The column name in the right table to join on.	None
left_index	bool	Use the index of the left Table.	False
right_index	bool	Use the index of the right Table.	False
sort	bool	Sort the join keys of the resulting table.	False
suffixes	Tuple(str, str)	The suffixes to append to overlapping column names from the left and right tables respectively.	('_x', '_y')
copy	bool	If False avoid copies and modify the input table.	True
validate	str	If specified checks if merge matches specified type. - “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets. - “one_to_many” or “1:m”: check if merge keys are unique in left dataset. - “many_to_one” or “m:1”: check if merge keys are unique in right dataset. - “many_to_many” or “m:m”: allowed, but does not result in checks.	None
q_join	bool	If True perform native q joins instead of the pandas SQL like joins. More documentation around these joins can be found here.	False

Returns:

Type	Description
Table / KeyedTable	The resulting table-like object after the join has been performed.
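The difference between the `inner`, `left` and `cross` values of `how` can be sketched on lists of row dictionaries. The code below is a plain-Python illustration of the join semantics, not the PyKX implementation: `merge_rows` is a hypothetical helper, it handles only these three join types on a single shared column, and `None` stands in for a q null.

```python
from itertools import product


def merge_rows(left, right, how="inner", on=None):
    """Join two lists of row dictionaries with inner, left or cross semantics."""
    if how == "cross":
        # Every left row is paired with every right row; no join column is used.
        return [{**l, **r} for l, r in product(left, right)]
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type in this sketch: {how}")
    right_only = [k for k in right[0] if k != on] if right else []
    out = []
    for l in left:
        matches = [r for r in right if r[on] == l[on]]
        if matches:
            out.extend({**l, **r} for r in matches)
        elif how == "left":
            # Unmatched left rows are kept, with nulls for the right columns.
            out.append({**l, **{k: None for k in right_only}})
    return out


left = [{"a": "foo", "b": 1}, {"a": "bar", "b": 2}]
right = [{"a": "foo", "c": 3}, {"a": "baz", "c": 4}]

inner = merge_rows(left, right, how="inner", on="a")
left_join = merge_rows(left, right, how="left", on="a")
cross = merge_rows([{"left": "foo"}, {"left": "bar"}],
                   [{"right": 7}, {"right": 8}], how="cross")
```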

Examples:

Merge tab1 and tab2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

In [64]:

tab1 = kx.Table(data={'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})
tab2 = kx.Table(data={'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8]})
tab1.merge(tab2, left_on='lkey', right_on='rkey')

Out[64]:

	lkey	value_x	rkey	value_y

0	foo	1	foo	5
1	foo	1	foo	8
2	foo	5	foo	5
3	foo	5	foo	8
4	bar	2	bar	6
5	baz	3	baz	7

Merge tab1 and tab2 with specified left and right suffixes appended to any overlapping columns.

In [65]:

tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right'))

Out[65]:

	lkey	value_left	rkey	value_right

0	foo	1	foo	5
1	foo	1	foo	8
2	foo	5	foo	5
3	foo	5	foo	8
4	bar	2	bar	6
5	baz	3	baz	7

Merge tab1 and tab2 but raise an exception if the Tables have any overlapping columns.

In [66]:

try:
    tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=(False, False))
except BaseException as e:
    print(f'Caught Error: {e}')

Caught Error: Columns overlap but no suffix specified: ['value']
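
The suffix behaviour in the three examples above can be sketched in plain Python. This is only an illustration of the naming rule that the outputs suggest, not PyKX's implementation: the helper name `resolve_columns` is hypothetical, and join-key handling is deliberately left out.

```python
def resolve_columns(left_cols, right_cols, suffixes=('_x', '_y')):
    """Return merged column names (hypothetical helper, not part of PyKX)."""
    overlap = [c for c in left_cols if c in right_cols]
    left_suffix, right_suffix = suffixes
    # A clash cannot be resolved when neither side supplies a suffix.
    if overlap and not left_suffix and not right_suffix:
        raise ValueError(f'Columns overlap but no suffix specified: {overlap}')
    # A falsy suffix leaves that side's column name unchanged.
    out_left = [c + (left_suffix or '') if c in overlap else c for c in left_cols]
    out_right = [c + (right_suffix or '') if c in overlap else c for c in right_cols]
    return out_left + out_right


default_names = resolve_columns(['lkey', 'value'], ['rkey', 'value'])
custom_names = resolve_columns(['lkey', 'value'], ['rkey', 'value'], ('_left', '_right'))

try:
    resolve_columns(['lkey', 'value'], ['rkey', 'value'], (False, False))
except ValueError as err:
    error_message = str(err)

print(default_names)
print(custom_names)
print(error_message)
```

The column order produced here mirrors the headers shown in Out[64] and Out[65].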

In [67]:

tab1 = kx.Table(data={'a': ['foo', 'bar'], 'b': [1, 2]})
tab2 = kx.Table(data={'a': ['foo', 'baz'], 'c': [3, 4]})

Merge tab1 and tab2 on the a column using an inner join.

In [68]:

tab1.merge(tab2, how='inner', on='a')

Out[68]:

	a	b	c

0	foo	1	3

Merge tab1 and tab2 on the a column using a left join.

In [69]:

tab1.merge(tab2, how='left', on='a')

Out[69]:

	a	b	c

0	foo	1	3
1	bar	2	0N
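
The difference between the inner and left joins above can be sketched without PyKX. The function below is a hypothetical, simplified model operating on lists of row dictionaries; `None` stands in for the q null (`0N`) that appears in the left-join output.

```python
def join_rows(left, right, key, how='inner'):
    """Join two lists of row dicts on `key` (illustrative only)."""
    right_only_cols = [c for c in right[0] if c != key]
    joined = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if matches:
            joined.extend({**lrow, **rrow} for rrow in matches)
        elif how == 'left':
            # Keep the unmatched left row and fill right-hand columns with a null.
            joined.append({**lrow, **dict.fromkeys(right_only_cols)})
    return joined


tab1_rows = [{'a': 'foo', 'b': 1}, {'a': 'bar', 'b': 2}]
tab2_rows = [{'a': 'foo', 'c': 3}, {'a': 'baz', 'c': 4}]

inner_rows = join_rows(tab1_rows, tab2_rows, 'a', how='inner')
left_rows = join_rows(tab1_rows, tab2_rows, 'a', how='left')
print(inner_rows)
print(left_rows)
```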

Merge tab1 and tab2 using a cross join.

In [70]:

tab1 = kx.Table(data={'left': ['foo', 'bar']})
tab2 = kx.Table(data={'right': [7, 8]})
tab1.merge(tab2, how='cross')

Out[70]:

	left	right

0	foo	7
1	foo	8
2	bar	7
3	bar	8
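
A cross join pairs every left row with every right row. As a rough pure-Python analogy (not how PyKX computes it), `itertools.product` yields the same pairs in the same order as the output above.

```python
from itertools import product

left_col = ['foo', 'bar']
right_col = [7, 8]

# Every combination of a left value with a right value, left-major order.
cross = [{'left': l, 'right': r} for l, r in product(left_col, right_col)]
print(cross)
```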

Merge tab1 and tab2_keyed using a left join with q_join set to True. Inputs/Outputs will match q lj behaviour.

In [71]:

tab1 = kx.Table(data={'a': ['foo', 'bar', 'baz'], 'b': [1, 2, 3]})
tab2 = kx.Table(data={'a': ['foo', 'baz', 'baz'], 'c': [3, 4, 5]})
tab2_keyed = tab2.set_index(1)
tab1.merge(tab2_keyed, how='left', q_join=True)

Out[71]:

	a	b	c

0	foo	1	3
1	baz	3	4

Merge tab1 and tab3_keyed using an inner join with q_join set to True. Inputs/Outputs will match q ij behaviour.

In [72]:

tab3 = kx.Table(data={'a': ['foo', 'bar'], 'd': [6, 7]})
tab3_keyed = tab3.set_index(1)
tab1.merge(tab3_keyed, how='inner', q_join=True)

Out[72]:

	a	b	d

0	foo	1	6
1	bar	2	7

Merge using q_join set to True, and how set to left, will fail when tab2 is not a keyed table.

In [73]:

#Will error as Left Join requires a keyed column for the right dataset.
try:
    tab1.merge(tab2, how='left', q_join=True)
except ValueError as e:
    print(f'Caught Error: {e}')

Caught Error: Left Join requires a keyed table for the right dataset.

Table.merge_asof()¶

Table.merge_asof(
    right,
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    by=None,
    left_by=None,
    right_by=None,
    suffixes=('_x', '_y'),
    tolerance=None,
    allow_exact_matches=True,
    direction='backward'
)

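Merge Table or KeyedTable objects with an asof join. Instead of requiring equal keys, each row of the left table is matched with the last row of the right table whose key is less than or equal to the left key.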

The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Parameters:

Name	Type	Description	Default
right	Union[Table/KeyedTable]	The object to merge with.	required
how	str	The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.	‘inner’
on	str	The column name to join on.	None
left_on	str	The column name in the left table to join on.	None
right_on	str	The column name in the right table to join on.	None
left_index	bool	Use the index of the left Table.	False
right_index	bool	Use the index of the right Table.	False
by	str	Not yet implemented.	None
left_by	str	Field names to match on in the left table.	None
right_by	str	Field names to match on in the right table.	None
suffixes	Tuple(str, str)	The suffixes appended to overlapping column names from the left and right tables respectively.	('_x', '_y')
tolerance	Any	Not yet implemented.	None
allow_exact_matches	bool	Not yet implemented.	True
direction	str	Not yet implemented.	'backward'

Returns:

Type	Description
Table / KeyedTable	The resulting table-like object after the join has been performed.

Examples:

Perform a simple asof join on two tables.

In [74]:

left  = kx.Table(data={"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = kx.Table(data={"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
left

Out[74]:

	a	left_val

0	1	a
1	5	b
2	10	c

In [75]:

right

Out[75]:

	a	right_val

0	1	1
1	2	2
2	3	3
3	6	6
4	7	7

In [76]:

left.merge_asof(right)

Out[76]:

	a	left_val	right_val

0	1	a	1
1	5	b	3
2	10	c	7
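
The matching rule behind this result can be sketched in plain Python. The helper `asof_backward` is hypothetical and assumes the right-hand keys are sorted in ascending order; it is a model of the behaviour seen in the output, not PyKX's implementation.

```python
from bisect import bisect_right


def asof_backward(left_keys, right_keys, right_vals):
    """For each left key, take the value of the last right row whose key <= left key."""
    matched = []
    for key in left_keys:
        pos = bisect_right(right_keys, key)  # number of right keys <= key
        matched.append(right_vals[pos - 1] if pos else None)
    return matched


right_val = asof_backward([1, 5, 10], [1, 2, 3, 6, 7], [1, 2, 3, 6, 7])
print(right_val)  # [1, 3, 7]
```

The left key 5 has no exact match, so it picks up the value from key 3; the left key 10 lies past the end of the right table, so it picks up the final row.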

Perform an asof join on two tables using the time column as the join key.

In [77]:

trades = kx.Table(data={
    "time": [
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.030"),
        pd.Timestamp("2016-05-25 13:30:00.041"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.049"),
        pd.Timestamp("2016-05-25 13:30:00.072"),
        pd.Timestamp("2016-05-25 13:30:00.075")
    ],
    "ticker": [
       "GOOG",
       "MSFT",
       "MSFT",
       "MSFT",
       "GOOG",
       "AAPL",
       "GOOG",
       "MSFT"
   ],
   "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
   "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
   "time": [
       pd.Timestamp("2016-05-25 13:30:00.023"),
       pd.Timestamp("2016-05-25 13:30:00.038"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048")
   ],
   "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
   "price": [51.95, 51.95, 720.77, 720.92, 98.0],
   "quantity": [75, 155, 100, 100, 100]
})
trades

Out[77]:

	time	ticker	bid	ask

0	2016.05.25D13:30:00.023000000	GOOG	720.5	720.93
1	2016.05.25D13:30:00.023000000	MSFT	51.95	51.96
2	2016.05.25D13:30:00.030000000	MSFT	51.97	51.98
3	2016.05.25D13:30:00.041000000	MSFT	51.99	52f
4	2016.05.25D13:30:00.048000000	GOOG	720.5	720.93
5	2016.05.25D13:30:00.049000000	AAPL	97.99	98.01
6	2016.05.25D13:30:00.072000000	GOOG	720.5	720.88
7	2016.05.25D13:30:00.075000000	MSFT	52.01	52.03

In [78]:

quotes

Out[78]:

	time	ticker	price	quantity

0	2016.05.25D13:30:00.023000000	MSFT	51.95	75
1	2016.05.25D13:30:00.038000000	MSFT	51.95	155
2	2016.05.25D13:30:00.048000000	GOOG	720.77	100
3	2016.05.25D13:30:00.048000000	GOOG	720.92	100
4	2016.05.25D13:30:00.048000000	AAPL	98f	100

In [79]:

trades.merge_asof(quotes, on="time")

Out[79]:

	time	ticker_x	bid	ask	ticker_y	price	quantity

0	2016.05.25D13:30:00.023000000	GOOG	720.5	720.93	MSFT	51.95	75
1	2016.05.25D13:30:00.023000000	MSFT	51.95	51.96	MSFT	51.95	75
2	2016.05.25D13:30:00.030000000	MSFT	51.97	51.98	MSFT	51.95	75
3	2016.05.25D13:30:00.041000000	MSFT	51.99	52f	MSFT	51.95	155
4	2016.05.25D13:30:00.048000000	GOOG	720.5	720.93	AAPL	98f	100
5	2016.05.25D13:30:00.049000000	AAPL	97.99	98.01	AAPL	98f	100
6	2016.05.25D13:30:00.072000000	GOOG	720.5	720.88	AAPL	98f	100
7	2016.05.25D13:30:00.075000000	MSFT	52.01	52.03	AAPL	98f	100
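
Because only `time` is used as the join key here, each trade is paired with the last quote at or before its timestamp regardless of ticker. That is why a GOOG trade at 13:30:00.048 ends up next to an AAPL quote. The sketch below reproduces the `ticker_y` column in plain Python, with times written as milliseconds past 13:30:00; it illustrates the rule and is not PyKX code.

```python
from bisect import bisect_right

# Milliseconds past 13:30:00, copied from the quotes and trades tables above.
quote_times = [23, 38, 48, 48, 48]
quote_tickers = ['MSFT', 'MSFT', 'GOOG', 'GOOG', 'AAPL']
trade_times = [23, 23, 30, 41, 48, 49, 72, 75]

# bisect_right lands after the last quote with time <= trade time, so when
# several quotes share a timestamp the final one is selected.
ticker_y = [quote_tickers[bisect_right(quote_times, t) - 1] for t in trade_times]
print(ticker_y)
```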

Analytic functionality¶

In [80]:

# All the examples in this section will use this example table.
N = 100
kx.Table(data={
    'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'price': 250 + kx.random.random(N, 500.0),
    'traded': 100 - kx.random.random(N, 200),
    'hold': kx.random.random(N, False)
    })
tab

Out[80]:

	column_a	column_b	column_c

0	2	102	45
1	3	15	80
2	2	42	8
3	2	56	61
4	1	32	87

Table.abs()¶

Table.abs(numeric_only=False)

Take the absolute value of each element in the table. This will raise an error if there are columns that contain data that have no absolute value.

Parameters:

Name	Type	Description	Default
numeric_only	bool	Only use columns of the table that can be converted to an absolute value.	False

Returns:

Type	Description
Table / KeyedTable	The resulting table like object with only positive numerical values.

In [81]:

tab.abs(numeric_only=True)

Out[81]:

	column_a	column_b	column_c

0	2	102	45
1	3	15	80
2	2	42	8
3	2	56	61
4	1	32	87

Table.count()¶

Table.count(axis=0, numeric_only=False)

Returns the count of non null values across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to count elements across. 0 is columns, 1 is rows.	0
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `count` on that column / row.

In [82]:

tab.count()

Out[82]:



column_a	5
column_b	5
column_c	5

Table.max()¶

Table.max(axis=0, skipna=True, numeric_only=False)

Returns the maximum value across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the maximum across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `max` on that column / row.

In [83]:

tab.max()

Out[83]:



column_a	3
column_b	102
column_c	87

Table.min()¶

Table.min(axis=0, skipna=True, numeric_only=False)

Returns the minimum value across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the minimum across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `min` on that column / row.

In [84]:

tab.min()

Out[84]:



column_a	1
column_b	15
column_c	8

Table.idxmax()¶

Table.idxmax(axis=0, skipna=True, numeric_only=False)

Return index of first occurrence of maximum over requested axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the idxmax across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `idxmax` on that column / row.

Examples:

Calculate the idxmax across the columns of a table

In [85]:

tab.idxmax()

Out[85]:



column_a	1
column_b	0
column_c	4

Calculate the idxmax across the rows of a table using only columns that are of a numeric data type

In [86]:

tab.idxmax(axis=1, numeric_only=True)

Out[86]:



0	column_b
1	column_c
2	column_b
3	column_c
4	column_c
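
To make the two axis settings concrete, the sketch below recomputes both idxmax results in plain Python from the values displayed for the example table. It is an illustration of the semantics only, with the table held as a dictionary of lists.

```python
data = {
    'column_a': [2, 3, 2, 2, 1],
    'column_b': [102, 15, 42, 56, 32],
    'column_c': [45, 80, 8, 61, 87],
}

# axis=0: for each column, the row position of the first maximum.
idxmax_cols = {name: col.index(max(col)) for name, col in data.items()}

# axis=1: for each row, the name of the column holding the first maximum.
n_rows = len(data['column_a'])
idxmax_rows = [max(data, key=lambda name: data[name][i]) for i in range(n_rows)]

print(idxmax_cols)
print(idxmax_rows)
```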

Table.idxmin()¶

Table.idxmin(axis=0, skipna=True, numeric_only=False)

Return index of first occurrence of minimum over requested axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the idxmin across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `idxmin` on that column / row.

Examples:

Calculate the idxmin across the columns of a table

In [87]:

tab.idxmin()

Out[87]:



column_a	4
column_b	1
column_c	2

Calculate the idxmin across the rows of a table using only columns that are of a numeric data type

In [88]:

tab.idxmin(axis=1, numeric_only=True)

Out[88]:



0	column_a
1	column_a
2	column_a
3	column_a
4	column_a

Table.sum()¶

Table.sum(axis=0, skipna=True, numeric_only=False, min_count=0)

Returns the sum of all values across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the sum across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False
min_count	int	If set to a value other than 0, a null value is returned when there are fewer than `min_count` values across the axis.	0

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `sum` on that column / row.

In [89]:

tab.sum()

Out[89]:



column_a	10
column_b	247
column_c	281
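
The `min_count` parameter is not exercised in the example above. The sketch below shows one reading of its description in plain Python, assuming it follows the pandas convention of counting only non-null values; the helper name is hypothetical and `None` stands in for a q null.

```python
def sum_with_min_count(values, min_count=0):
    """Sum non-null values, returning a null when too few are present (sketch)."""
    present = [v for v in values if v is not None]
    if len(present) < min_count:
        return None
    return sum(present)


total_b = sum_with_min_count([102, 15, 42, 56, 32])
too_sparse = sum_with_min_count([1, None, None], min_count=2)
print(total_b, too_sparse)  # 247 None
```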

Table.mean()¶

Table.mean(axis=0, numeric_only=False)

Get the mean of values across the requested axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the mean across. 0 is columns, 1 is rows.	0
numeric_only	bool	Include only columns / rows with numeric data.	False

Returns:

Type	Description
Dictionary	The mean across each row / column with the key corresponding to the row number or column name.

Examples:

Calculate the mean across the columns of a table

In [90]:

tab = kx.Table(data=
    {
        'a': [1, 2, 2, 4],
        'b': [1, 2, 6, 7],
        'c': [7, 8, 9, 10],
        'd': [7, 11, 14, 14]
    }
)
tab

Out[90]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [91]:

tab.mean()

Out[91]:



a	2.25
b	4f
c	8.5
d	11.5

Calculate the mean across the rows of a table

In [92]:

tab.mean(axis=1)

Out[92]:



0	4f
1	5.75
2	7.75
3	8.75
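
As a cross-check on the axis convention, both results can be recomputed in plain Python with the table held as a dictionary of column lists. This is only a sketch of the arithmetic, not of how PyKX evaluates it.

```python
data = {
    'a': [1, 2, 2, 4],
    'b': [1, 2, 6, 7],
    'c': [7, 8, 9, 10],
    'd': [7, 11, 14, 14],
}

# axis=0: one mean per column.
col_means = {name: sum(col) / len(col) for name, col in data.items()}

# axis=1: one mean per row; zip(*...) turns the columns into rows.
row_means = [sum(row) / len(row) for row in zip(*data.values())]

print(col_means)  # {'a': 2.25, 'b': 4.0, 'c': 8.5, 'd': 11.5}
print(row_means)  # [4.0, 5.75, 7.75, 8.75]
```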

Table.median()¶

Table.median(axis=0, numeric_only=False)

Get the median of values across the requested axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the median across. 0 is columns, 1 is rows.	0
numeric_only	bool	Include only columns / rows with numeric data.	False

Returns:

Type	Description
Dictionary	The median across each row / column with the key corresponding to the row number or column name.

Examples:

Calculate the median across the columns of a table

In [93]:

tab = kx.Table(data=
    {
        'a': [1, 2, 2, 4],
        'b': [1, 2, 6, 7],
        'c': [7, 8, 9, 10],
        'd': [7, 11, 14, 14]
    }
)
tab

Out[93]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [94]:

tab.median()

Out[94]:



a	2f
b	4f
c	8.5
d	12.5

Calculate the median across the rows of a table

In [95]:

tab.median(axis=1)

Out[95]:



0	4f
1	5f
2	7.5
3	8.5

Table.mode()¶

Table.mode(axis=0, numeric_only=False, dropna=True)

Get the mode of values across the requested axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the mode across. 0 is columns, 1 is rows.	0
numeric_only	bool	Include only columns / rows with numeric data.	False
dropna	bool	Remove null values from the data before calculating the mode.	True

Returns:

Type	Description
Table	The mode across each row / column with the column corresponding to the row number or column name.

Examples:

Calculate the mode across the columns of a table

In [96]:

tab = kx.Table(data=
    {
        'a': [1, 2, 2, 4],
        'b': [1, 2, 6, 7],
        'c': [7, 8, 9, 10],
        'd': [7, 11, 14, 14]
    }
)
tab

Out[96]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [97]:

tab.mode()

Out[97]:

	a	b	c	d

0	2	1	7	14
1	0N	2	8	0N
2	0N	6	9	0N
3	0N	7	10	0N
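
The shape of this result follows from the fact that a column can have several modes: columns `b` and `c` have four equally frequent values each, while `a` and `d` have one, so the shorter columns are padded with nulls. A plain-Python sketch of that idea, using `statistics.multimode` and `None` in place of `0N`:

```python
from statistics import multimode

data = {
    'a': [1, 2, 2, 4],
    'b': [1, 2, 6, 7],
    'c': [7, 8, 9, 10],
    'd': [7, 11, 14, 14],
}

# All most-frequent values per column, in ascending order.
modes = {name: sorted(multimode(col)) for name, col in data.items()}

# Pad every column to the length of the longest list of modes.
depth = max(len(m) for m in modes.values())
padded = {name: m + [None] * (depth - len(m)) for name, m in modes.items()}
print(padded)
```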

Calculate the mode across the rows of a table

In [98]:

tab.mode(axis=1)

Out[98]:

	idx	0	1	2	3

0	0	1	7	0N	0N
1	1	2	0N	0N	0N
2	2	2	6	9	14
3	3	4	7	10	14

Calculate the mode across columns and keep null values.

In [99]:

tab = kx.Table(data=
    {
        'x': [0, 1, 2, 3, 4, 5, 6, 7, np.nan, np.nan],
        'y': [10, 11, 12, 13, 14, 15, 16, 17, 18, np.nan],
        'z': ['a', 'b', 'c', 'd', 'd', 'e', 'e', 'f', 'g', 'h']
    }
)
tab

Out[99]:

	x	y	z

0	0	10	a
1	1	11	b
2	2	12	c
3	3	13	d
4	4	14	d
5	5	15	e
6	6	16	e
7	7	17	f
8	0n	18	g

In [100]:

tab.mode(dropna=False)

Out[100]:

	x	y	z

0	0n	10	d
1	0n	11	e
2	0n	12
3	0n	13
4	0n	14
5	0n	15
6	0n	16
7	0n	17
8	0n	18

Table.prod()¶

Table.prod(axis=0, skipna=True, numeric_only=False, min_count=0)

Returns the product of all values across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the product across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False
min_count	int	If set to a value other than 0, a null value is returned when there are fewer than `min_count` values across the axis.	0

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `prd` on that column / row.

In [101]:

# This example will use a smaller version of the above table
# as the result of calculating the product quickly goes over the integer limits.
N = 10
tab = kx.Table(data={
    'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'price': 2.5 - kx.random.random(N, 5.0),
    'traded': 10 - kx.random.random(N, 20),
    'hold': kx.random.random(N, False)
    })
tab[tab['traded'] == 0, 'traded'] = 1
tab[tab['price'] == 0, 'price'] = 1.0
tab

Out[101]:

	sym	price	traded	hold

0	GOOG	0.5558637	-7	0b
1	MSFT	0.2195904	7	1b
2	AAPL	-0.8458965	-6	0b
3	MSFT	-2.290325	-8	0b
4	GOOG	-0.8030193	-4	0b
5	GOOG	1.420972	7	1b
6	GOOG	0.1515956	3	1b
7	MSFT	-1.253654	7	1b
8	MSFT	-0.1387201	7	0b

In [102]:

tab.prod(numeric_only=True)

Out[102]:



price	-0.001379105
traded	-58084992
hold	0i

Table.kurt()¶

Table.kurt(axis=0, skipna=True, numeric_only=False)

Return unbiased kurtosis over requested axis. Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:

Name	Type	Description	Default
axis	int	Axis for the function to be applied on. 0 is columns, 1 is rows.	0
skipna	bool	Not yet implemented	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	Map of columns and their yielded kurtosis values

Examples:

Calculate the kurt across the columns of a table

In [103]:

tab = kx.Table(data=
    {
        'a': [1, 2, 2, 4],
        'b': [1, 2, 6, 7],
        'c': [7, 8, 9, 10],
        'd': [7, 11, 14, 14]
    }
)
tab

Out[103]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [104]:

tab.kurt()

Out[104]:



a	2.227147
b	-4.890533
c	-1.2
d	-0.04958678

Calculate the kurtosis across the rows of a table

In [105]:

tab.kurt(axis=1)

Out[105]:



0	-6f
1	-3.901235
2	-0.1014759
3	-0.6838056
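
The values above agree with the standard bias-corrected sample excess kurtosis, $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\,g_2 + 6\right]$ with $g_2 = m_4 / m_2^2 - 3$. The sketch below implements that formula in plain Python as a cross-check; it assumes at least four values and is not taken from the PyKX source.

```python
def kurt(values):
    """Bias-corrected excess kurtosis (Fisher definition); needs len(values) >= 4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3                          # biased excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


kurt_a = kurt([1, 2, 2, 4])    # column a
kurt_c = kurt([7, 8, 9, 10])   # column c
kurt_row0 = kurt([1, 1, 7, 7])  # first row
print(round(kurt_a, 6), round(kurt_c, 6), round(kurt_row0, 6))
```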

Table.sem()¶

Table.sem(axis=0, skipna=True, numeric_only=False, ddof=1)

Return unbiased standard error of the mean over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the sem across. 0 is columns, 1 is rows.	0
skipna	bool	Not yet implemented.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False
ddof	int	Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.	1

Returns:

Type	Description
Dictionary	The sem across each row / column with the key corresponding to the row number or column name.

Examples:

Calculate the sem across the columns of a table

In [106]:

tab = kx.Table(data=
        {
            'a': [1, 2, 2, 4],
            'b': [1, 2, 6, 7],
            'c': [7, 8, 9, 10],
            'd': [7, 11, 14, 14],
        }
    )
tab

Out[106]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [107]:

tab.sem()

Out[107]:



a	0.6291529
b	1.47196
c	0.6454972
d	1.658312

Calculate the sem across the rows of a table

In [108]:

tab.sem(axis=1)

Out[108]:



0	1.732051
1	2.25
2	2.528998
3	2.136001

Calculate sem across columns with ddof=0:

In [109]:

tab.sem(ddof=0)

Out[109]:



a	0.5448624
b	1.274755
c	0.559017
d	1.436141
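
The effect of `ddof` can be checked by hand: the standard error is the standard deviation (computed with divisor N - ddof) divided by the square root of N. A minimal plain-Python sketch, reproducing the values for column `a` with both settings:

```python
import math


def sem(values, ddof=1):
    """Standard error of the mean with a configurable delta degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return math.sqrt(variance) / math.sqrt(n)


sem_default = sem([1, 2, 2, 4])            # divisor N - 1
sem_population = sem([1, 2, 2, 4], ddof=0)  # divisor N
print(round(sem_default, 7), round(sem_population, 7))
```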

Table.skew()¶

Table.skew(axis=0, skipna=True, numeric_only=False)

Returns the skewness of all values across the given axis.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the skewness across. 0 is columns, 1 is rows.	0
skipna	bool	Ignore any null values along the axis.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False

Returns:

Type	Description
Dictionary	A dictionary where the key represents the column name / row number and the values are the result of calling `skew` on that column / row.

In [110]:

tab.skew(numeric_only=True)

Out[110]:



a	1.129338
b	0f
c	0f
d	-1.096405
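
The text does not say which skewness estimator is used. The figures above are consistent with the adjusted Fisher-Pearson coefficient, $G_1 = \frac{\sqrt{n(n-1)}}{n-2} \cdot \frac{m_3}{m_2^{3/2}}$, so the sketch below assumes that definition. It is a cross-check in plain Python, not PyKX code.

```python
import math


def skew(values):
    """Adjusted Fisher-Pearson sample skewness (assumed estimator); needs n >= 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # biased skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


skew_a = skew([1, 2, 2, 4])
skew_b = skew([1, 2, 6, 7])
skew_d = skew([7, 11, 14, 14])
print(round(skew_a, 6), round(skew_b, 6), round(skew_d, 6))
```

Columns `b` and `c` are symmetric about their means, which is why their skewness is exactly zero.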

Table.std()¶

Table.std(axis=0, skipna=True, numeric_only=False, ddof=1)

Return sample standard deviation over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:

Name	Type	Description	Default
axis	int	The axis to calculate the std across. 0 is columns, 1 is rows.	0
skipna	bool	Not yet implemented.	True
numeric_only	bool	Only use columns of the table that are of a numeric data type.	False
ddof	int	Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.	1

Returns:

Type	Description
Table	The std across each row / column with the key corresponding to the row number or column name.

Examples:

Calculate the std across the columns of a table

In [111]:

tab = kx.Table(data=
    {
        'a': [1, 2, 2, 4],
        'b': [1, 2, 6, 7],
        'c': [7, 8, 9, 10],
        'd': [7, 11, 14, 14]
    }
)
tab

Out[111]:

	a	b	c	d

0	1	1	7	7
1	2	2	8	11
2	2	6	9	14
3	4	7	10	14

In [112]:

tab.std()

Out[112]:



a	1.258306
b	2.94392
c	1.290994
d	3.316625

Calculate the std across the rows of a table

In [113]:

tab.std(axis=1)

Out[113]:



0	3.464102
1	4.5
2	5.057997
3	4.272002

Calculate std across columns with ddof=0:

In [114]:

tab.std(ddof=0)

Out[114]:



a	1.089725
b	2.54951
c	1.118034
d	2.872281

Group By¶

Table.groupby()¶

Table.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    observed=False,
    dropna=True
)

Group data based on like values within columns to easily apply operations on groups.

Parameters:

Name	Type	Description	Default
by	Union[Symbol/SymbolVector/int/list]	The column name(s) or column index(es) to group the data on.	None
axis	int	Not Yet Implemented.	0
level	Union[Symbol/SymbolVector/int/list]	The column name(s) or column index(es) to group the data on.	None
as_index	bool	Return the table with groups as the key column.	True
sort	bool	Sort the resulting table based off the key.	True
group_keys	bool	Not Yet Implemented.	True
observed	bool	Not Yet Implemented.	False
dropna	bool	Drop groups where the group is null.	True

Either by or level can be used to specify the columns to group on, using both will raise an error.

Using an integer or list of integers is only possible when calling groupby on a KeyedTable object.

Returns:

Type	Description
GroupbyTable	The resulting table after the grouping is done.

Examples:

Example Table.

In [115]:

tab = kx.Table(data={
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Max Speed': [380., 370., 24., 26.],
    'Max Altitude': [570., 555., 275., 300.]
})

tab

Out[115]:

	Animal	Max Speed	Max Altitude

0	Falcon	380f	570f
1	Falcon	370f	555f
2	Parrot	24f	275f
3	Parrot	26f	300f

Group on the Animal column and calculate the mean of the resulting Max Speed and Max Altitude columns.

In [116]:

tab.groupby(kx.SymbolVector(['Animal'])).mean()

Out[116]:

	Max Speed	Max Altitude
Animal
Falcon	375f	562.5
Parrot	25f	287.5
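
Conceptually, this groupby collects the rows that share an `Animal` value and then averages each remaining column within the group. A plain-Python sketch of the same computation, for illustration only:

```python
from collections import defaultdict

rows = [
    ('Falcon', 380.0, 570.0),
    ('Falcon', 370.0, 555.0),
    ('Parrot', 24.0, 275.0),
    ('Parrot', 26.0, 300.0),
]

# Collect (Max Speed, Max Altitude) pairs under each Animal key.
groups = defaultdict(list)
for animal, speed, altitude in rows:
    groups[animal].append((speed, altitude))

# Average each value column within its group; sorted() mirrors sort=True.
group_means = {
    animal: tuple(sum(col) / len(col) for col in zip(*members))
    for animal, members in sorted(groups.items())
}
print(group_means)  # {'Falcon': (375.0, 562.5), 'Parrot': (25.0, 287.5)}
```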

Example table with multiple columns to group on.

In [117]:

tab = kx.Table(
    data={
        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
        'Type': ['Captive', 'Wild', 'Captive', 'Wild', 'Wild'],
        'Max Speed': [390., 350., 30., 20., 25.]
    })
tab = tab.set_index(2)
tab

Out[117]:

		Max Speed
Animal	Type
Falcon	Captive	390f
Falcon	Wild	350f
Parrot	Captive	30f
	Wild	20f
	Wild	25f

Group on multiple columns using their indexes.

In [118]:

tab.groupby(level=[0, 1]).mean()

Out[118]:

		Max Speed
Animal	Type
Falcon	Captive	390f
Falcon	Wild	350f
Parrot	Captive	30f
Parrot	Wild	22.5

Example table with Nulls.

In [119]:

tab = kx.Table(
    [
        ["a", 12, 12],
        [kx.SymbolAtom.null, 12.3, 33.],
        ["b", 12.3, 123],
        ["a", 1, 1]
    ],
    columns=["a", "b", "c"]
)
tab

Out[119]:

	a	b	c

0	a	12	12
1		12.3	33f
2	b	12.3	123
3	a	1	1

Group on column a and keep null groups.

In [120]:

tab.groupby('a', dropna=False).sum()

Out[120]:

	b	c
a
	12.3	33f
a	13	13
b	12.3	123

Group on column a keeping null groups and not using the groups as an index column.

In [121]:

tab.groupby('a', dropna=False, as_index=False).sum()

Out[121]:

	a	b	c
idx
0		12.3	33f
1	a	13	13
2	b	12.3	123

Apply¶

Table.apply()¶

Table.apply(
    func,
    *args,
    axis=0,
    raw=None,
    result_type=None,
    **kwargs
)

Apply a function along an axis of the Table.

Objects passed to a function are passed as kx list objects.

Parameters:

Name	Type	Description	Default
func	function	Function to apply to each column or row.
`*args`	any	Positional arguments to pass to `func` in addition to the kx list.
axis	int	The axis along which the function is applied. `0` applies the function to each column, `1` applies the function to each row.	0
raw	bool	Not yet implemented.	None
result_type	str	Not yet implemented.	None
`**kwargs`	dict	Additional keyword arguments to pass as keywords to `func`, this argument is not implemented in the case `func` is a kx callable function.	None

Returns:

Type	Description
List, Dictionary or Table	Result of applying `func` along the given axis of the `kx.Table`.

Examples:

Example Table.

In [122]:

tab = kx.Table([[4, 9]] * 3, columns=['A', 'B'])

tab

Out[122]:

	A	B

0	4	9
1	4	9
2	4	9

Apply square root on each item within a column

In [123]:

tab.apply(kx.q.sqrt)

Out[123]:

	A	B

0	2f	3f
1	2f	3f
2	2f	3f

Apply a reducing function sum on either axis

In [124]:

tab.apply(kx.q.sum)

Out[124]:



A	12
B	27

In [125]:

tab.apply(lambda x: sum(x), axis=1)

Out[125]:

pykx.LongVector(pykx.q('13 13 13'))
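
The two reductions above differ only in what is handed to the function: with `axis=0` it receives each column in turn, with `axis=1` each row. A small plain-Python model of that dispatch (the helper `apply_along` is hypothetical and works on a dictionary of lists rather than a PyKX table):

```python
def apply_along(columns, func, axis=0):
    """Apply `func` to each column (axis=0) or each row (axis=1) of a dict of lists."""
    if axis == 0:
        return {name: func(col) for name, col in columns.items()}
    return [func(list(row)) for row in zip(*columns.values())]


columns = {'A': [4, 4, 4], 'B': [9, 9, 9]}
col_sums = apply_along(columns, sum)          # {'A': 12, 'B': 27}
row_sums = apply_along(columns, sum, axis=1)  # [13, 13, 13]
print(col_sums, row_sums)
```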

Aggregate¶

Table.agg()¶

Table.agg(
    func,
    axis=0,
    *args,
    **kwargs
)

Aggregate data using one or more operations over a specified axis.

Objects passed to a function are passed as kx vector/list objects.

Parameters:

Name	Type	Description	Default
func	function, str, list or dict	Function to use for aggregating the data. If a function, this must either work when passed a `Table` or when passed to `Table.apply`. Accepted combinations are: a function; a string function name; a list of functions and/or function names, e.g. `[kx.q.sum, 'mean']`; a dict of axis labels -> functions or function names.
`*args`	any	Positional arguments to pass to `func` in addition to the kx list.
axis	int	The axis along which the function is applied. `0` applies the function to each column; row-based application is not currently supported.	0
`**kwargs`	dict	Additional keyword arguments to pass as keywords to `func`, this argument is not implemented in the case `func` is a kx callable function.	None

Returns:

Type	Description
List, Dictionary or Table	Result of applying `func` along the given axis of the `kx.Table`.

Examples:

Example Table.

In [126]:

tab = kx.Table([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [kx.FloatAtom.null, kx.FloatAtom.null, kx.FloatAtom.null]],
                  columns=['A', 'B', 'C'])

tab

Out[126]:

	A	B	C

0	1	2	3
1	4	5	6
2	7	8	9
3	0n	0n	0n

Aggregate a list of functions over rows

In [127]:

tab.agg(['sum', 'min'])

Out[127]:

	A	B	C
function
sum	12	15	18
min	1	2	3

Perform an aggregation using a user specified function

In [128]:

import statistics
def mode(x):
    return statistics.mode(x)
tab.agg(mode)

Out[128]:



A	1
B	2
C	3

Apply an aggregation supplying column specification for supplied function

In [129]:

tab.agg({'A': 'max', 'B': mode})

Out[129]:

	A	B	C
function
max	7	()	()
mode	0N	2	()
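
The list form of `func` can be pictured as applying each function to every column and labelling the result by function name. The sketch below is plain Python rather than PyKX code; `agg_sketch` and the `NAMED` lookup are hypothetical, and `None` stands in for a q null on the assumption that nulls are skipped, as the sums and minimums in Out[127] suggest.

```python
# Plain-Python stand-in for the example table; None models a q null.
tab = {
    "A": [1, 4, 7, None],
    "B": [2, 5, 8, None],
    "C": [3, 6, 9, None],
}

# Hypothetical name -> function lookup used only by this sketch.
# Nulls are skipped before reducing.
NAMED = {
    "sum": lambda col: sum(v for v in col if v is not None),
    "min": lambda col: min(v for v in col if v is not None),
}


def agg_sketch(table, funcs):
    """Apply each function (or function name) to every column."""
    result = {}
    for f in funcs:
        name = f if isinstance(f, str) else f.__name__
        func = NAMED[f] if isinstance(f, str) else f
        result[name] = {col: func(values) for col, values in table.items()}
    return result


summary = agg_sketch(tab, ["sum", "min"])
print(summary["sum"])  # {'A': 12, 'B': 15, 'C': 18}
print(summary["min"])  # {'A': 1, 'B': 2, 'C': 3}
```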

Data Preprocessing¶

Table.add_prefix()¶

Table.add_prefix(prefix, axis=0)

Add a prefix to the column names of a table and return the resulting Table object.

Parameters:

Name	Type	Description	Default
prefix	str	The string that will be concatenated with the name of the columns	required
axis	int	Axis to add prefix on.	0

Returns:

Type	Description
Table	A table with the given column(s) renamed adding a prefix.

Examples:

The initial table whose columns will be given a prefix

In [130]:

tab.head()

Out[130]:

	A	B	C

0	1	2	3
1	4	5	6
2	7	8	9
3	0n	0n	0n
4	1	2	3

Add "col_" to table columns:

In [131]:

tab.add_prefix(prefix="col_").head()

Out[131]:

	col_A	col_B	col_C

0	1	2	3
1	4	5	6
2	7	8	9
3	0n	0n	0n
4	1	2	3

Table.add_suffix()¶

Table.add_suffix(suffix, axis=0)

Add a suffix to the column names of a table and return the resulting Table object.

Parameters:

Name	Type	Description	Default
suffix	str	The string that will be concatenated with the name of the columns	required
axis	int	Axis to add suffix on.	0

Returns:

Type	Description
Table	A table with the given column(s) renamed adding a suffix.

Examples:

The initial table whose columns will be given a suffix

In [132]:

tab.head()

Out[132]:

	A	B	C

0	1	2	3
1	4	5	6
2	7	8	9
3	0n	0n	0n
4	1	2	3

Add "_col" to table columns:

In [133]:

tab.add_suffix(suffix="_col").head()

Out[133]:

	A_col	B_col	C_col

0	1	2	3
1	4	5	6
2	7	8	9
3	0n	0n	0n
4	1	2	3
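
Prefix and suffix renaming both amount to rebuilding the column names while leaving the data alone. A minimal plain-Python sketch of that idea follows; the dict-of-columns table and the two `*_sketch` helpers are assumptions made for illustration and are not part of PyKX.

```python
def add_prefix_sketch(table, prefix):
    """Return a new table whose column names start with prefix."""
    return {f"{prefix}{name}": values for name, values in table.items()}


def add_suffix_sketch(table, suffix):
    """Return a new table whose column names end with suffix."""
    return {f"{name}{suffix}": values for name, values in table.items()}


tab = {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}

prefixed = add_prefix_sketch(tab, "col_")
suffixed = add_suffix_sketch(tab, "_col")

print(list(prefixed))  # ['col_A', 'col_B', 'col_C']
print(list(suffixed))  # ['A_col', 'B_col', 'C_col']
print(list(tab))       # ['A', 'B', 'C'] (the input is not modified)
```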

Table.astype()¶

Table.astype(dtype, copy=True, errors='raise')

Cast one or more columns of the Table to a specified dtype.

Parameters:

Name	Type	Description	Default
dtype	data type, or dict of column name -> data type	Use a PyKX wrapper data type or Python type to cast all columns to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a PyKX wrapper data type, to cast one or more of the table's columns to column-specific types.
copy	Boolean	Default of True, False not implemented	True
errors	{‘raise’, ‘ignore’}	If passed anything other than 'raise', it will return the dataframe	'raise'

Returns:

Type	Description
Dataframe	The dataframe with columns cast according to the passed dtypes

Examples:

The examples in this section use the following example table.

In [134]:

df = kx.Table(data = {
  'c1': kx.IntVector([1, 2, 3]),
  'c2': kx.LongVector([1, 2, 3]),
  'c3': kx.ShortVector([1, 2, 3]),
  'c4': kx.IntVector([1, 2, 3])
  })

Cast all columns to dtype LongVector

In [135]:

df.astype(kx.LongVector)

Out[135]:

	c1	c2	c3	c4

0	1	1	1	1
1	2	2	2	2
2	3	3	3	3

Cast columns according to a dictionary that maps column names to dtypes

In [136]:

df.astype({'c1':kx.LongVector, 'c2':'kx.ShortVector'})

Out[136]:

	c1	c2	c3	c4

0	1	1h	1h	1i
1	2	2h	2h	2i
2	3	3h	3h	3i

The next example will use this table

In [137]:

df = kx.Table(data={
    'c1': kx.TimestampAtom('now'),
    'c2': ['abc', 'def', 'ghi'],
    'c3': [1, 2, 3],
    'c4': [b'abc', b'def', b'ghi'],
    'c5': b'abc',
    'c6': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    })
df

Out[137]:

	c1	c2	c3	c4	c5	c6

0	2024.10.22D13:25:45.927484586	abc	1	"abc"	"a"	1 2 3
1	2024.10.22D13:25:45.927484586	def	2	"def"	"b"	4 5 6
2	2024.10.22D13:25:45.927484586	ghi	3	"ghi"	"c"	7 8 9

Casting char and string columns to symbol columns

In [138]:

df.astype({'c4':kx.SymbolVector, 'c5':kx.SymbolVector})

Out[138]:

	c1	c2	c3	c4	c5	c6

0	2024.10.22D13:25:45.927484586	abc	1	abc	a	1 2 3
1	2024.10.22D13:25:45.927484586	def	2	def	b	4 5 6
2	2024.10.22D13:25:45.927484586	ghi	3	ghi	c	7 8 9
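
The two accepted forms of `dtype` (a single type for every column, or a per-column dictionary) can be sketched in plain Python. This is only a model of the dispatch: Python built-in types stand in for the PyKX wrapper types, and `astype_sketch` is a hypothetical helper.

```python
def astype_sketch(table, dtype):
    """Cast every column with one type, or selected columns via a dict."""
    if isinstance(dtype, dict):
        casts = dtype
    else:
        # A single type is applied to all columns.
        casts = {name: dtype for name in table}
    return {
        name: [casts[name](v) for v in values] if name in casts else list(values)
        for name, values in table.items()
    }


df = {"c1": [1, 2, 3], "c2": [1, 2, 3], "c3": ["1", "2", "3"]}

all_float = astype_sketch(df, float)
some_cast = astype_sketch(df, {"c1": str, "c3": int})

print(all_float["c3"])  # [1.0, 2.0, 3.0]
print(some_cast)        # {'c1': ['1', '2', '3'], 'c2': [1, 2, 3], 'c3': [1, 2, 3]}
```

Columns that are not named in the dictionary are left as they are, which matches the behaviour of `c3` and `c4` in Out[136].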

Table.drop()¶

Table.drop(item, axis=0)

Remove either columns or rows from a table and return the resulting Table object.

Parameters:

Name	Type	Description	Default
item	Union[str, list[str]]	The column name(s) or row number(s) to drop from the table.	required
axis	int	The axis to drop from: `0` drops rows, `1` drops columns.	0

Returns:

Type	Description
Table	A table with the given column(s) / row(s) removed.

Examples:

Drop rows from a table.

In [139]:

# The examples in this section will use this example table filled with random data
N = 1000
tab = kx.Table(data = {
    'x': kx.q.til(N),
    'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'z': kx.random.random(N, 500.0),
    'w': kx.random.random(N, 1000),
    'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})
tab.head()

Out[139]:

	x	y	z	w	v

0	0	AAPL	199.1148	467	0
1	1	MSFT	123.2344	832	50
2	2	GOOG	301.4733	150	0
3	3	GOOG	36.14336	784	0N
4	4	MSFT	246.0589	802	100

In [140]:

tab.drop([0, 2, 4, 6, 8, 10]).head()

Out[140]:

	x	y	z	w	v

0	1	MSFT	123.2344	832	50
1	3	GOOG	36.14336	784	0N
2	5	MSFT	449.9769	447	50
3	7	GOOG	478.7092	437	250
4	9	GOOG	182.2163	25	0N

Drop columns from a table.

In [141]:

tab.drop('y', axis=1).head()

Out[141]:

	x	z	w	v

0	0	199.1148	467	0
1	1	123.2344	832	50
2	2	301.4733	150	0
3	3	36.14336	784	0N
4	4	246.0589	802	100
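
How `axis` changes the meaning of `item` can be shown with a small plain-Python model. The sketch assumes a dict-of-columns table and a hypothetical `drop_sketch` helper; it illustrates the semantics seen in the examples above rather than how PyKX performs the drop.

```python
def drop_sketch(table, item, axis=0):
    """axis=0 treats item as row numbers, axis=1 treats it as column names."""
    items = item if isinstance(item, list) else [item]
    if axis == 1:
        return {name: list(v) for name, v in table.items() if name not in items}
    skip = set(items)
    return {
        name: [x for i, x in enumerate(v) if i not in skip]
        for name, v in table.items()
    }


tab = {"x": [0, 1, 2, 3, 4], "y": ["AAPL", "MSFT", "GOOG", "GOOG", "MSFT"]}

rows_dropped = drop_sketch(tab, [0, 2, 4])
cols_dropped = drop_sketch(tab, "y", axis=1)

print(rows_dropped)  # {'x': [1, 3], 'y': ['MSFT', 'GOOG']}
print(cols_dropped)  # {'x': [0, 1, 2, 3, 4]}
```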

Table.drop_duplicates()¶

Table.drop_duplicates()

Remove duplicate rows from a table and return the resulting Table object.

Returns:

Type	Description
Table	A table with all duplicate rows removed.

Examples:

Create a table with duplicates for the example

In [142]:

N = 100
tab2 = kx.Table(data ={
    'x': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'x1': kx.random.random(N, 3)
    })
tab2

Out[142]:

	x	x1

0	MSFT	0
1	AAPL	1
2	AAPL	2
3	MSFT	0
4	MSFT	1
5	GOOG	1
6	GOOG	2
7	MSFT	0
...	...	...
99	AAPL	2

100 rows × 2 columns

Drop all duplicate rows from the table.

In [143]:

tab2.drop_duplicates()

Out[143]:

	x	x1

0	MSFT	0
1	AAPL	1
2	AAPL	2
3	MSFT	1
4	GOOG	1
5	GOOG	2
6	AAPL	0
7	MSFT	2
8	GOOG	0
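
The output above keeps each distinct row at the position of its first occurrence. A plain-Python sketch of that ordering rule follows, assuming a dict-of-columns table; `drop_duplicates_sketch` is a hypothetical helper, not a PyKX function.

```python
def drop_duplicates_sketch(table):
    """Keep the first occurrence of every distinct row, in original order."""
    names = list(table)
    seen = set()
    kept = []
    for row in zip(*table.values()):
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return {name: [row[i] for row in kept] for i, name in enumerate(names)}


# The first five rows of the example output in Out[142].
tab2 = {
    "x": ["MSFT", "AAPL", "AAPL", "MSFT", "MSFT"],
    "x1": [0, 1, 2, 0, 1],
}

unique = drop_duplicates_sketch(tab2)
print(unique)  # {'x': ['MSFT', 'AAPL', 'AAPL', 'MSFT'], 'x1': [0, 1, 2, 1]}
```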

Table.pop()¶

Table.pop(item)

Remove a column or columns from a table by column name and return the column after it has been removed.

Parameters:

Name	Type	Description	Default
item	Union[str, list[str]]	The column name or list of names to pop from the table.	required

Returns:

Type	Description
Table	A table containing only the columns removed from the input table.

Examples:

Remove the v column from the table and return it.

In [144]:

display(tab.head())
print('\n\nPop the `v` column out of the table')
display(tab.pop("v"))
print('\n\nUpdated Table')
display(tab.head())

	x	y	z	w	v

0	0	AAPL	199.1148	467	0
1	1	MSFT	123.2344	832	50
2	2	GOOG	301.4733	150	0
3	3	GOOG	36.14336	784	0N
4	4	MSFT	246.0589	802	100


Pop the `v` column out of the table

	v

0	0
1	50
2	0
3	0N
4	100
5	50
6	50
7	250
...	...
999	0N

1,000 rows × 1 columns


Updated Table

	x	y	z	w

0	0	AAPL	199.1148	467
1	1	MSFT	123.2344	832
2	2	GOOG	301.4733	150
3	3	GOOG	36.14336	784
4	4	MSFT	246.0589	802

Remove the z and w columns from the table and return them.

In [145]:

display(tab.head())
print('\n\nPop the `z` and `w` columns out of the table')
display(tab.pop(["z", "w"]).head())
print('\n\nUpdated Table')
display(tab.head())

	x	y	z	w

0	0	AAPL	199.1148	467
1	1	MSFT	123.2344	832
2	2	GOOG	301.4733	150
3	3	GOOG	36.14336	784
4	4	MSFT	246.0589	802


Pop the `z` and `w` columns out of the table

	z	w

0	199.1148	467
1	123.2344	832
2	301.4733	150
3	36.14336	784
4	246.0589	802


Updated Table

	x	y

0	0	AAPL
1	1	MSFT
2	2	GOOG
3	3	GOOG
4	4	MSFT
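
Unlike the other methods in this section, `pop` changes the table it is called on and returns the removed columns. The following plain-Python sketch models that behaviour on a dict of columns; `pop_sketch` is a hypothetical helper used only for illustration.

```python
def pop_sketch(table, item):
    """Remove the named column(s) from table in place and return them."""
    names = item if isinstance(item, list) else [item]
    return {name: table.pop(name) for name in names}


tab = {
    "x": [0, 1],
    "y": ["AAPL", "MSFT"],
    "z": [199.1, 123.2],
    "w": [467, 832],
}

removed = pop_sketch(tab, ["z", "w"])

print(removed)    # {'z': [199.1, 123.2], 'w': [467, 832]}
print(list(tab))  # ['x', 'y'] (the original table has lost both columns)
```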

Table.rename()¶

Table.rename(labels=None, index=None, columns=None, axis=None, copy=None, inplace=False, level=None, errors='ignore', mapper=None)

Rename columns in a table and return the resulting Table object.

Parameters:

Name	Type	Description	Default
labels	dict	Deprecated. Please use `mapper` keyword.	None
columns	dict	A dictionary of column name to new column name to use when renaming.	None
index	dict	A dictionary of index to new index name to use when renaming single key column keyed tables.	None
axis	{0 or 'index', 1 or 'columns'}	Designating the axis to be renamed by the mapper dictionary.	None
copy	None	Not yet implemented.	None
inplace	bool	Not yet implemented.	None
level	None	Not yet implemented.	None
errors	string	Not yet implemented.	None
mapper	dict	A dictionary of either new index or column names to new names to be used in conjunction with the axis parameter.	None

Returns:

Type	Description
Table	A table with the given columns or indices renamed.

Examples:

The initial table we will be renaming columns on and a keyed table to rename the index on.

In [146]:

tab.head()
key_tab = kx.KeyedTable(data=tab)

Rename column x to index and y to symbol using the columns keyword.

In [147]:

tab.rename(columns={'x': 'index', 'y': 'symbol'}).head()

Out[147]:

	index	symbol

0	0	AAPL
1	1	MSFT
2	2	GOOG
3	3	GOOG
4	4	MSFT

Rename column x to index and y to symbol by setting the axis keyword.

In [148]:

tab.rename({'x': 'index', 'y': 'symbol'}, axis = 1).head()

Out[148]:

	index	symbol

0	0	AAPL
1	1	MSFT
2	2	GOOG
3	3	GOOG
4	4	MSFT

Rename the index of a keyed table by passing the literal 'index' as the axis parameter.

In [149]:

key_tab.rename({0:"a", 1:"b"}, axis = 'index').head()

Out[149]:

	x	y
idx
`a	0	AAPL
`b	1	MSFT
2	2	GOOG
3	3	GOOG
4	4	MSFT
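
In both the column and the index case, only the entries named in the mapping change; everything else is carried over untouched. A minimal plain-Python sketch of that rule is given below. The helpers `rename_columns_sketch` and `rename_index_sketch` are hypothetical and operate on a dict of columns and a list of key values respectively.

```python
def rename_columns_sketch(table, columns):
    """Rename columns found in the mapping; leave the others unchanged."""
    return {columns.get(name, name): values for name, values in table.items()}


def rename_index_sketch(keys, mapper):
    """Rename index values found in the mapping; leave the others unchanged."""
    return [mapper.get(k, k) for k in keys]


tab = {"x": [0, 1, 2], "y": ["AAPL", "MSFT", "GOOG"]}

renamed = rename_columns_sketch(tab, {"x": "index", "y": "symbol"})
new_keys = rename_index_sketch([0, 1, 2, 3, 4], {0: "a", 1: "b"})

print(list(renamed))  # ['index', 'symbol']
print(new_keys)       # ['a', 'b', 2, 3, 4]
```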

Table.replace()¶

Table.replace(to_replace, value)

Replace all occurrences of a value in a table with another given value.

Parameters:

Name	Type	Description	Default
to_replace	any	Value of element in table you wish to replace.	None
value	any	New value to perform replace with.	None

Returns:

Type	Description
Table	A table with the given elements replaced with new value.

Examples:

Create an unkeyed Table and a KeyedTable with elements to be replaced.

In [150]:

tab = kx.q('([] a:2 2 3; b:4 2 6; c:(1b;0b;1b); d:(`a;`b;`c); e:(1;2;`a))')
ktab = kx.q('([a:2 2 3]b:4 2 6; c:(1b;0b;1b); d:(`a;`b;`c); e:(1;2;`a))')
ktab

Out[150]:

	b	c	d	e
a
2	4	1b	a	1
2	2	0b	b	2
3	6	1b	c	`a

Replace all instances of 2 in the KeyedTable with 123. Note the key column remains unchanged.

In [151]:

ktab.replace(2,123)

Out[151]:

	b	c	d	e
a
2	4	1b	a	1
2	123	0b	b	123
3	6	1b	c	`a

Replace all True values with a list of strings.

In [152]:

tab.replace(True, (b"one", b"two", b"three"))

Out[152]:

	a	b	c	d	e

0	2	4	("one";"two";"three")	a	1
1	2	2	0b	b	2
2	3	6	("one";"two";"three")	c	`a
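
Two details of the outputs above are worth modelling: key columns are left alone, and a boolean `True` does not match the integer `1`. The plain-Python sketch below reproduces both under stated assumptions; `replace_sketch` is a hypothetical helper, and the type check is needed in Python only because `True == 1` there.

```python
def replace_sketch(columns, to_replace, value):
    """Replace matching elements in the given columns with value."""

    def matches(x):
        # Compare type as well as value so that True does not match 1.
        return type(x) is type(to_replace) and x == to_replace

    return {
        name: [value if matches(x) else x for x in values]
        for name, values in columns.items()
    }


# Model of the keyed table: key columns and value columns held separately.
key_cols = {"a": [2, 2, 3]}
value_cols = {
    "b": [4, 2, 6],
    "c": [True, False, True],
    "d": ["a", "b", "c"],
    "e": [1, 2, "a"],
}

# Only the value columns are passed in, so the key column is untouched.
replaced = replace_sketch(value_cols, 2, 123)
flags = replace_sketch(value_cols, True, "yes")

print(replaced["b"], replaced["e"])  # [4, 123, 6] [1, 123, 'a']
print(flags["c"], flags["e"])        # ['yes', False, 'yes'] [1, 2, 'a']
```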

Table.reset_index()¶

Table.reset_index(levels, *,
                  drop=False, inplace=False,
                  col_level=0, col_fill='',
                  allow_duplicates=False,
                  names=None)

Reset the keys/index of a keyed PyKX table. This can be used to remove/unset one or more keys within a table.

Parameters:

Name	Type	Description	Default
level	int, str or list	The name/indices of the keys to be reset within the table.	None
drop	Boolean	Whether the key columns being reset are removed from the table instead of being added back as regular columns.	False
inplace	Boolean	Not Yet Implemented	False
col_level	int or str	Not Yet Implemented	0
col_fill	object	Not Yet Implemented	''
allow_duplicates	Boolean	Can duplicate columns be created	False
names	str or list	Not Yet Implemented	None

Returns:

Type	Description
Dataframe	The dataframe with table updated following index reset request

Examples:

Generate data to be used for index resetting

In [153]:

N = 1000
qtab = kx.Table(data = {
    'x0': kx.random.random(N, ['a', 'b', 'c']),
    'x1': kx.random.random(N, ['d', 'e', 'f']),
    'x2': kx.random.random(N, ['g', 'h', 'i']),
    'y0': kx.random.random(N, 10.0),
    'y1': kx.random.random(N, 10.0),
    'y2': kx.random.random(N, kx.GUIDAtom.null)
    }).set_index(['x0', 'x1', 'x2'])
qtab

Out[153]:

			y0	y1	y2
x0	x1	x2
a	f	i	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
	d	h	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
	d	g	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
c	d	i	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
c	f	h	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
a	d	h	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
b	d	g	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
c	e	h	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...
a	d	g	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Resetting the index of the table will result in original index columns being added to the table directly

In [154]:

qtab.reset_index()

Out[154]:

	x0	x1	x2	y0	y1	y2

0	a	f	i	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
1	a	d	h	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
2	a	d	g	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
3	c	d	i	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
4	c	f	h	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
5	a	d	h	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
6	b	d	g	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
7	c	e	h	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...	...
999	a	d	g	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Reset the index adding a specified named column to the table

In [155]:

qtab.reset_index('x0')

Out[155]:

		x0	y0	y1	y2
x1	x2
f	i	a	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
d	h	a	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
	g	a	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
	i	c	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
f	h	c	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
d	h	a	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
d	g	b	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
e	h	c	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...
d	g	a	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Reset the index using multiple named columns

In [156]:

qtab.reset_index(['x0', 'x1'])

Out[156]:

	x0	x1	y0	y1	y2
x2
i	a	f	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
h	a	d	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
g	a	d	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
i	c	d	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
h	c	f	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
h	a	d	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
g	b	d	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
h	c	e	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...
g	a	d	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Reset the index specifying the column number which is to be added to the table

In [157]:

qtab.reset_index(0)

Out[157]:

		x0	y0	y1	y2
x1	x2
f	i	a	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
d	h	a	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
	g	a	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
	i	c	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
f	h	c	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
d	h	a	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
d	g	b	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
e	h	c	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...
d	g	a	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Reset the index specifying multiple numbered columns

In [158]:

qtab.reset_index([0, 2])

Out[158]:

	x0	x2	y0	y1	y2
x1
f	a	i	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
d	a	h	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
d	a	g	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
d	c	i	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
f	c	h	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
d	a	h	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
d	b	g	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
e	c	h	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...	...
d	a	g	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 6 columns

Drop index columns from table

In [159]:

qtab.reset_index(drop=True)

Out[159]:

	y0	y1	y2

0	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
1	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
2	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
3	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
4	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
5	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
6	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
7	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...
999	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 3 columns

Drop specified key columns from table

In [160]:

qtab.reset_index('x0', drop=True)

Out[160]:

		y0	y1	y2
x1	x2
f	i	1.238681	6.299753	0248b2cb-974d-50d7-f79c-b7c27fac8446
d	h	2.194869	8.953976	ccccfb99-20f7-ba7f-3f32-265a9031a3a1
	g	1.155368	8.708738	cdeab52e-ac7b-6b53-3dee-3c810743aaf0
	i	9.413713	5.084035	9f1da09e-19ce-6297-f245-ad903b405afc
f	h	2.205184	2.474564	a6478672-84d9-50e7-ed24-6a20ce93f66d
d	h	3.715352	3.317283	848e2f8c-d8c9-4e2d-a44a-12c4b062f0a4
d	g	8.84107	7.208391	800b0a15-1bb3-259e-985d-0bb9b100b86e
e	h	2.894724	3.737715	05b7da51-c08f-df70-449e-a3e9bb3486c3
...	...	...	...	...
d	g	6.613845	2.998951	cae053d3-6598-5ec7-4d7b-c31cebbf72ec

1,000 rows × 5 columns
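
The examples above combine three choices: which keys to reset (all, by name, or by position) and whether to drop them. The plain-Python sketch below models a keyed table as a pair of dicts (key columns, value columns) to make those rules explicit. `reset_index_sketch` is a hypothetical helper that mirrors the observed outputs, not the PyKX implementation.

```python
def reset_index_sketch(keys, values, levels=None, drop=False):
    """Return (remaining key columns, value columns) after resetting keys."""
    names = list(keys)
    if levels is None:
        selected = names  # reset every key column
    else:
        levels = levels if isinstance(levels, list) else [levels]
        # Integers are positions among the key columns; strings are names.
        selected = [names[lv] if isinstance(lv, int) else lv for lv in levels]
    remaining = {n: v for n, v in keys.items() if n not in selected}
    # With drop=True the selected keys are discarded rather than moved.
    moved = {} if drop else {n: keys[n] for n in names if n in selected}
    return remaining, {**moved, **values}


keys = {"x0": ["a", "c"], "x1": ["f", "d"], "x2": ["i", "h"]}
values = {"y0": [1.2, 9.4]}

all_reset = reset_index_sketch(keys, values)
by_position = reset_index_sketch(keys, values, [0, 2])
dropped = reset_index_sketch(keys, values, "x0", drop=True)

print(list(all_reset[0]), list(all_reset[1]))      # [] ['x0', 'x1', 'x2', 'y0']
print(list(by_position[0]), list(by_position[1]))  # ['x1'] ['x0', 'x2', 'y0']
print(list(dropped[0]), list(dropped[1]))          # ['x1', 'x2'] ['y0']
```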

Table.set_index()¶

Table.set_index(
    keys,
    drop=True,
    append=False,
    inplace=False,
    verify_integrity=False,
)

Add index/indexes to a Table/KeyedTable.

Parameters:

Name	Type	Description	Default
keys	Union[Symbol/SymbolVector/Table]	The key(s) or data to key on	required
drop	bool	Not Yet Implemented	True
append	bool	Whether to append columns to existing index.	False
inplace	bool	Not Yet Implemented	False
verify_integrity	bool	Check the new index for duplicates	False

Returns:

Type	Description
KeyedTable	The resulting table after the index is applied

Examples:

Adding indexes:

In [161]:

N = 10
tab = kx.Table(data={
    'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
    'price': 2.5 - kx.random.random(N, 5.0),
    'traded': 10 - kx.random.random(N, 20),
    'hold': kx.random.random(N, False)
    })

In [162]:

#Setting a single index
tab.set_index('sym')

Out[162]:

	price	traded	hold
sym
GOOG	-0.04115503	0	0b
MSFT	2.057705	-7	1b
AAPL	0.5650117	9	0b
GOOG	-1.54609	-1	0b
GOOG	-0.9236442	7	1b
MSFT	-2.386905	-7	1b
GOOG	-1.61369	3	0b
GOOG	-1.561493	2	1b
MSFT	-1.895856	2	0b

In [163]:

#Setting multiple indexes
tab.set_index(['sym', 'traded'])

Out[163]:

		price	hold
sym	traded
GOOG	0	-0.04115503	0b
MSFT	-7	2.057705	1b
AAPL	9	0.5650117	0b
GOOG	-1	-1.54609	0b
GOOG	7	-0.9236442	1b
MSFT	-7	-2.386905	1b
GOOG	3	-1.61369	0b
GOOG	2	-1.561493	1b
MSFT	2	-1.895856	0b

In [164]:

#Pass a table as index (lengths must match)
status = kx.q('{select movement from ungroup select movement:`down`up 0<=deltas price by sym from x}',tab)
tab.set_index(status)

Out[164]:

	sym	price	traded	hold
movement
up	GOOG	-0.04115503	0	0b
down	MSFT	2.057705	-7	1b
down	AAPL	0.5650117	9	0b
up	GOOG	-1.54609	-1	0b
down	GOOG	-0.9236442	7	1b
up	MSFT	-2.386905	-7	1b
up	GOOG	-1.61369	3	0b
down	GOOG	-1.561493	2	1b
up	MSFT	-1.895856	2	0b

Appending:

In [165]:

#Default is false - previous index 'sym' deleted and replaced by 'hold'
tab.set_index('sym').set_index('hold')

Out[165]:

	price	traded
hold
0b	-0.04115503	0
1b	2.057705	-7
0b	0.5650117	9
0b	-1.54609	-1
1b	-0.9236442	7
1b	-2.386905	-7
0b	-1.61369	3
1b	-1.561493	2
0b	-1.895856	2

In [166]:

#append= True will retain 'sym' index and add 'hold' as second index
tab.set_index('sym').set_index('hold', append= True)

Out[166]:

		price	traded
sym	hold
GOOG	0b	-0.04115503	0
MSFT	1b	2.057705	-7
AAPL	0b	0.5650117	9
GOOG	0b	-1.54609	-1
GOOG	1b	-0.9236442	7
MSFT	1b	-2.386905	-7
GOOG	0b	-1.61369	3
GOOG	1b	-1.561493	2
MSFT	0b	-1.895856	2

Verify Integrity:

In [167]:

#Will allow duplicates in index:
tab.set_index('sym')

Out[167]:

	price	traded	hold
sym
GOOG	-0.04115503	0	0b
MSFT	2.057705	-7	1b
AAPL	0.5650117	9	0b
GOOG	-1.54609	-1	0b
GOOG	-0.9236442	7	1b
MSFT	-2.386905	-7	1b
GOOG	-1.61369	3	0b
GOOG	-1.561493	2	1b
MSFT	-1.895856	2	0b

In [168]:

#Will error as 'sym' has duplicates
try:
    tab.set_index('sym', verify_integrity= True)
except kx.QError as e:
    print(f'Caught Error: {e}')

Caught Error: Index has duplicate key(s)
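
The interaction of `append` and `verify_integrity` can be summarised with a plain-Python model. As before, a keyed table is represented as a pair of dicts (key columns, value columns), and `set_index_sketch` is a hypothetical helper. It raises `ValueError` where PyKX raises `kx.QError`, and it assumes the duplicate check applies to the complete resulting index.

```python
def set_index_sketch(keys, values, new_keys, append=False, verify_integrity=False):
    """Return (key columns, value columns) after keying on new_keys."""
    new_keys = new_keys if isinstance(new_keys, list) else [new_keys]
    # Without append the previous index is discarded, as in Out[165].
    index = dict(keys) if append else {}
    rest = dict(values)
    for name in new_keys:
        index[name] = rest.pop(name)
    if verify_integrity:
        rows = list(zip(*index.values()))
        if len(set(rows)) != len(rows):
            raise ValueError("Index has duplicate key(s)")
    return index, rest


tab = {
    "sym": ["GOOG", "MSFT", "GOOG"],
    "price": [-0.04, 2.06, -1.55],
    "hold": [False, True, False],
}

keys, rest = set_index_sketch({}, tab, "sym")
replaced = set_index_sketch(keys, rest, "hold")
appended = set_index_sketch(keys, rest, "hold", append=True)

print(list(replaced[0]), list(replaced[1]))  # ['hold'] ['price']
print(list(appended[0]), list(appended[1]))  # ['sym', 'hold'] ['price']

try:
    set_index_sketch({}, tab, "sym", verify_integrity=True)
    duplicate_error = None
except ValueError as e:
    duplicate_error = str(e)
print(duplicate_error)  # Index has duplicate key(s)
```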