{
"cells": [
{
"cell_type": "markdown",
"id": "d2a3ccf7",
"metadata": {},
"source": [
"# Pandas API\n",
"The purpose of this notebook is to provide a demonstration of the capabilities of the pandas like API for PyKX Table objects.\n",
"\n",
"To follow along please download this notebook using the following link.\n",
"\n",
"This demonstration will outline the following\n",
"\n",
"1. [Constructing Tables](#Constructing-Tables)\n",
"2. [Metadata](#Metadata)\n",
"3. [Querying and Data Interrogation](#Querying-and-Data-Interrogation)\n",
"4. [Data Joins/Merging](#Data-Joins/Merging)\n",
"5. [Analytic Functionality](#Analytic-functionality)\n",
"6. [Data Preprocessing](#Data-Preprocessing)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "13267c00",
"metadata": {
"tags": [
"hide_code"
]
},
"outputs": [],
"source": [
"import os\n",
"os.environ['IGNORE_QHOME'] = '1' # Ignore symlinking PyKX q libraries to QHOME \n",
"os.environ['PYKX_Q_LOADED_MARKER'] = '' # Only used here for running Notebook under mkdocs-jupyter during document generation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "44c90043",
"metadata": {},
"outputs": [],
"source": [
"import pykx as kx\n",
"import numpy as np\n",
"import pandas as pd\n",
"kx.q.system.console_size = [10, 80]"
]
},
{
"cell_type": "markdown",
"id": "06e3f624",
"metadata": {},
"source": [
"## Constructing Tables"
]
},
{
"cell_type": "markdown",
"id": "31561309",
"metadata": {},
"source": [
"### Table\n",
"\n",
"Create a table from a list of rows or by converting a Python dictionary object\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :----------------: | :--------------------------------------------------------------------- | :------: |\n",
"| x | Union[list, array] | An array like object containing the contents of each row of the table. | None |\n",
"| data | dict | A dictionary to be converted into a Table object. | None |\n",
"| columns | list[str] | A list of column names to use when constructing from an array of rows. | None |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :---------------------------------- |\n",
"| Table | The newly constructed table object. |\n",
"\n",
"**Examples:**\n",
"\n",
"Create a table from a dictionary object."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "170587aa",
"metadata": {},
"outputs": [],
"source": [
"kx.Table(data={'x': list(range(10)), 'y': [10 - x for x in range(10)]})"
]
},
{
"cell_type": "markdown",
"id": "273de502",
"metadata": {},
"source": [
"Create a Table from an array like object."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62b9f5c1",
"metadata": {},
"outputs": [],
"source": [
"kx.Table([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])"
]
},
{
"cell_type": "markdown",
"id": "51d82353",
"metadata": {},
"source": [
"Create a Table from an array like object and provide names for the columns to use."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9751924",
"metadata": {},
"outputs": [],
"source": [
"kx.Table([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])"
]
},
{
"cell_type": "markdown",
"id": "36edf1de",
"metadata": {},
"source": [
"### Keyed Table\n",
"\n",
"Create a keyed table from a list of rows or by converting a Python dictionary object\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :----------------: | :--------------------------------------------------------------------- | :------: |\n",
"| x | Union[list, array] | An array like object containing the contents of each row of the table. | None |\n",
"| data | dict | A dictionary to be converted into a Table object. | None |\n",
"| columns | list[str] | A list of column names to use when constructing from an array of rows. | None |\n",
"| index | list[Any] | An array like object to use as the index column of the table. | None |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :---------------------------------------- |\n",
"| KeyedTable | The newly constructed keyed table object. |\n",
"\n",
"**Examples:**\n",
"\n",
"Create a keyed table from a dictionary object."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ab1d288",
"metadata": {},
"outputs": [],
"source": [
"kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))})"
]
},
{
"cell_type": "markdown",
"id": "1a2f9b56",
"metadata": {},
"source": [
"Create a keyed table from a list of rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a0b5ce8",
"metadata": {},
"outputs": [],
"source": [
"kx.KeyedTable([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])"
]
},
{
"cell_type": "markdown",
"id": "804183ed",
"metadata": {},
"source": [
"Create a keyed table from a list of rows and provide names for the resulting columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21b018fe",
"metadata": {},
"outputs": [],
"source": [
"kx.KeyedTable([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], columns=['x', 'y', 'z'])"
]
},
{
"cell_type": "markdown",
"id": "b91e990b",
"metadata": {},
"source": [
"Create a keyed table with a specified index column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2631bad",
"metadata": {},
"outputs": [],
"source": [
"kx.KeyedTable(data={'x': list(range(10)), 'y': list(10 - x for x in range(10))}, index=[2 * x for x in range(10)])"
]
},
{
"cell_type": "markdown",
"id": "f1f43263",
"metadata": {},
"source": [
"## Metadata"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15b9c003",
"metadata": {},
"outputs": [],
"source": [
"N = 1000\n",
"tab = kx.Table(data = {\n",
" 'x': kx.q.til(N),\n",
" 'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'z': kx.random.random(N, 500.0),\n",
" 'w': kx.random.random(N, 1000),\n",
" 'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "c2122f58",
"metadata": {},
"source": [
"### Table.columns\n",
"\n",
"Get the name of each column in the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e35b1b4",
"metadata": {},
"outputs": [],
"source": [
"tab.columns"
]
},
{
"cell_type": "markdown",
"id": "fc006fd7",
"metadata": {},
"source": [
"### Table.dtypes\n",
"\n",
"Get the datatypes of the table columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8f571f6",
"metadata": {},
"outputs": [],
"source": [
"tab.dtypes"
]
},
{
"cell_type": "markdown",
"id": "5b4d25bf",
"metadata": {},
"source": [
"### Table.empty\n",
"\n",
"Returns True if the table is empty otherwise returns False."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b01c0791",
"metadata": {},
"outputs": [],
"source": [
"tab.empty"
]
},
{
"cell_type": "markdown",
"id": "550c1126",
"metadata": {},
"source": [
"### Table.ndim\n",
"\n",
"Get the nuber of columns within the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88affa6b",
"metadata": {},
"outputs": [],
"source": [
"tab.ndim"
]
},
{
"cell_type": "markdown",
"id": "f479bdcc",
"metadata": {},
"source": [
"### Table.shape\n",
"\n",
"Get the shape of the table as a tuple (number of rows, number of columns)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0609e97",
"metadata": {},
"outputs": [],
"source": [
"tab.shape"
]
},
{
"cell_type": "markdown",
"id": "42bc2bc3",
"metadata": {},
"source": [
"### Table.size\n",
"\n",
"Get the number of values in the table (rows * cols)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "886296f3",
"metadata": {},
"outputs": [],
"source": [
"tab.size"
]
},
{
"cell_type": "markdown",
"id": "1439bde3",
"metadata": {},
"source": [
"## Querying and Data Interrogation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "776b5725",
"metadata": {},
"outputs": [],
"source": [
"# The examples in this section will use this example table filled with random data\n",
"N = 1000\n",
"tab = kx.Table(data = {\n",
" 'x': kx.q.til(N),\n",
" 'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'z': kx.random.random(N, 500.0),\n",
" 'w': kx.random.random(N, 1000),\n",
" 'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "d356c82f",
"metadata": {},
"source": [
"### Table.all()\n",
"\n",
"```\n",
"Table.all(axis=0, bool_only=False, skipna=True)\n",
"```\n",
"\n",
"Returns whether or not all values across the given axis have a `truthy` value.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate `all` across 0 is columns, 1 is rows. | 0 |\n",
"| bool_only | bool | Only use columns of the table that are boolean types. | False |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `all` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c046de",
"metadata": {},
"outputs": [],
"source": [
"tab.all()"
]
},
{
"cell_type": "markdown",
"id": "e9c11a2e",
"metadata": {},
"source": [
"### Table.any()\n",
"\n",
"```\n",
"Table.any(axis=0, bool_only=False, skipna=True)\n",
"```\n",
"\n",
"Returns whether or not any values across the given axis have a `truthy` value.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate `any` across 0 is columns, 1 is rows. | 0 |\n",
"| bool_only | bool | Only use columns of the table that are boolean types. | False |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `any` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "501721e8",
"metadata": {},
"outputs": [],
"source": [
"tab.any()"
]
},
{
"cell_type": "markdown",
"id": "cb69b61a",
"metadata": {},
"source": [
"### Table.at[]\n",
"\n",
"```\n",
"Table.at[row, col]\n",
"```\n",
"\n",
"Access a single value for a row / column pair.\n",
"\n",
"Similar to `loc[]`, in that both provide label-based lookups. Use `at` if you only need to get or set a single value.\n",
"\n",
"The `at` property can be used for both assignment and retrieval of values at a given row and column."
]
},
{
"cell_type": "markdown",
"id": "8262b005",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Get the value of the `z` column in the 997th row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3664be9c",
"metadata": {},
"outputs": [],
"source": [
"tab.at[997, 'z']"
]
},
{
"cell_type": "markdown",
"id": "043ed9ca",
"metadata": {},
"source": [
"Reassign the value of the `z` column in the 997th row to `3.14159`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c7c4bc7",
"metadata": {},
"outputs": [],
"source": [
"tab.at[997, 'z'] = 3.14159\n",
"tab.at[997, 'z']"
]
},
{
"cell_type": "markdown",
"id": "903c0aac",
"metadata": {},
"source": [
"### Table.get()\n",
"\n",
"```\n",
"Table.get(key, default=None)\n",
"```\n",
"\n",
"Get a column or columns from a table by key, if the key does not exist return the default value.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :--------------------: | :------------------------------------------------------ | :--------: |\n",
"| key | Union[str, list[str]] | The column name or list of names to get from the table. | _required_ |\n",
"| default | int | The default value if the key is not found. | None |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---------------: | :------------------------------------------------------------------- |\n",
"| Union[Table, Any] | A table containing only the columns requested or the default value. |"
]
},
{
"cell_type": "markdown",
"id": "3d094b7b",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Get the `y` column from the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7809ac4a",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.get('y')"
]
},
{
"cell_type": "markdown",
"id": "2ddd9659",
"metadata": {},
"source": [
"Get the `y` and `z` columns from the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "78c9f224",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.get(['y', 'z'])"
]
},
{
"cell_type": "markdown",
"id": "379219ef",
"metadata": {},
"source": [
"Attempt to get the `q` column from the table and receive none as that column does not exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "010d9d98",
"metadata": {},
"outputs": [],
"source": [
"print(tab.get('q'))"
]
},
{
"cell_type": "markdown",
"id": "3ee99633",
"metadata": {},
"source": [
"Attempt to get the `q` column from the table and receive the default value `not found` as that column does not exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffd3a851",
"metadata": {},
"outputs": [],
"source": [
"tab.get('q', 'not found')"
]
},
{
"cell_type": "markdown",
"id": "34016a3f",
"metadata": {},
"source": [
"### Table.head()\n",
"\n",
"```\n",
"Table.head(n=5)\n",
"```\n",
"\n",
"Get the first n rows from a table.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--: | :--: | :---------------------------- | :-----: |\n",
"| n | int | The number of rows to return. | 5 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :------------------------------- |\n",
"| Table | The first `n` rows of the table. |"
]
},
{
"cell_type": "markdown",
"id": "d823513a",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Return the first 5 rows of the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5618880c",
"metadata": {},
"outputs": [],
"source": [
"tab.head()"
]
},
{
"cell_type": "markdown",
"id": "c5a8b2e8",
"metadata": {},
"source": [
"Return the first 10 rows of the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90071dcf",
"metadata": {},
"outputs": [],
"source": [
"tab.head(10)"
]
},
{
"cell_type": "markdown",
"id": "5e21bef1",
"metadata": {},
"source": [
"### Table.isna()\n",
"\n",
"```\n",
"Table.isna()\n",
"```\n",
"\n",
"Detects null values in a Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table | A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8ff16e1",
"metadata": {},
"outputs": [],
"source": [
"tabDemo = kx.Table(data= {\n",
" 'a': [1, 0, float('nan')],\n",
" 'b': [1, 0, float('nan')],\n",
" 'c': [float('nan'), 4, 0]\n",
" })"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8ff16e1",
"metadata": {},
"outputs": [],
"source": [
"tabDemo.isna()"
]
},
{
"cell_type": "markdown",
"id": "47d20b00",
"metadata": {},
"source": [
"### Table.isnull()\n",
"\n",
"```\n",
"Table.isnull()\n",
"```\n",
"\n",
"Alias of Table.isna().\n",
"\n",
"Detects null values in a Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table | A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "400c209e",
"metadata": {},
"outputs": [],
"source": [
"tabDemo.isnull()"
]
},
{
"cell_type": "markdown",
"id": "fb3164d5",
"metadata": {},
"source": [
"### Table.notna()\n",
"\n",
"```\n",
"Table.notna()\n",
"```\n",
"\n",
"Boolean inverse of Table.isna().\n",
"\n",
"Detects non-null values on a Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table | A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4206eec3",
"metadata": {},
"outputs": [],
"source": [
"tabDemo.notna()"
]
},
{
"cell_type": "markdown",
"id": "4e8e5c07",
"metadata": {},
"source": [
"### Table.notnull()\n",
"\n",
"```\n",
"Table.notna()\n",
"```\n",
"\n",
"Boolean inverse of Table.isnull(). Alias of Table.isna()\n",
"\n",
"Detects non-null values on a Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table | A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3138d21a",
"metadata": {},
"outputs": [],
"source": [
"tabDemo.notnull()"
]
},
{
"cell_type": "markdown",
"id": "d97d6bae",
"metadata": {},
"source": [
"### Table.iloc[]\n",
"\n",
"```\n",
"Table.iloc[:, :]\n",
"```\n",
"\n",
"Purely integer-location based indexing for selection by position.\n",
"\n",
"`iloc` is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a BooleanVector.\n",
"\n",
"Allowed inputs are:\n",
"- An integer, e.g. 5.\n",
"- A list or array of integers, e.g. [4, 3, 0].\n",
"- A slice object with ints, e.g. 1:7.\n",
"- A BooleanVector.\n",
"- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.\n",
"- A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------- |\n",
"| Table | A table containing only the columns / rows requested. |"
]
},
{
"cell_type": "markdown",
"id": "a3945130",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Get the second row from a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f83db52",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.iloc[1]"
]
},
{
"cell_type": "markdown",
"id": "72b468a1",
"metadata": {},
"source": [
"Get the first 5 rows from a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5354ca81",
"metadata": {},
"outputs": [],
"source": [
"tab.iloc[:5]"
]
},
{
"cell_type": "markdown",
"id": "9295eddc",
"metadata": {},
"source": [
"Get all rows of the table where the `y` column is equal to `AAPL`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6410e870",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.iloc[tab['y'] == 'AAPL']"
]
},
{
"cell_type": "markdown",
"id": "08792c1d",
"metadata": {},
"source": [
"Get all rows of the table where the `y` column is equal to `AAPL`, and only return the `y`, `z` and `w` columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d61b8396",
"metadata": {},
"outputs": [],
"source": [
"tab.iloc[tab['y'] == 'AAPL', ['y', 'z', 'w']]"
]
},
{
"cell_type": "markdown",
"id": "4525b646",
"metadata": {},
"source": [
"Replace all null values in the column `v` with the value `-100`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b65e7a05",
"metadata": {},
"outputs": [],
"source": [
"tab.iloc[tab['v'] == kx.q('0N'), 'v'] = -100\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "dc97669c",
"metadata": {},
"source": [
"### Table.loc[]\n",
"\n",
"```\n",
"Table.loc[:, :]\n",
"```\n",
"\n",
"Access a group of rows and columns by label or by BooleanVector.\n",
"\n",
"`loc` is a label based form of indexing, but may also be used with a boolean array.\n",
"\n",
"Allowed inputs are:\n",
"\n",
"- A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index)\n",
"- A list or array of labels, e.g. ['a', 'b', 'c']\n",
"- A slice object with labels, e.g. 'a':'f'\n",
" - **Warning contrary to usual python slices, both the start and the stop are included**\n",
"- A BooleanVector of the same length as the axis being sliced\n",
"- An alignable BooleanVector. The index of the key will be aligned before masking\n",
"- An alignable Index. The Index of the returned selection will be the input\n",
"- A callable function with one argument (the calling Table like object) and that returns valid output for indexing (e.g. one of the above)\n",
"\n",
"Note:\n",
"When the Pandas API is enabled, using `[]` to index into a table will use `Table.loc[]`\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------- |\n",
"| Table | A table containing only the columns / rows requested. |"
]
},
{
"cell_type": "markdown",
"id": "f90efe27",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Get every row in the `y` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20974780",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab[:, 'y']"
]
},
{
"cell_type": "markdown",
"id": "ceccd5a9",
"metadata": {},
"source": [
"Get all rows of the table where the value in the `z` column is greater than `250.0`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e99478b5",
"metadata": {},
"outputs": [],
"source": [
"tab[tab['z'] > 250.0]"
]
},
{
"cell_type": "markdown",
"id": "5300666e",
"metadata": {},
"source": [
"Replace all null values in the column `v` with the value `-100`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "889ddbd3",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.loc[tab['v'] == kx.LongAtom.null, 'v'] = -100\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "e52f569f",
"metadata": {},
"source": [
"Replace all locations in column `v` where the value is `-100` with a null."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2df5ddff",
"metadata": {},
"outputs": [],
"source": [
"tab[tab['v'] == -100, 'v'] = kx.LongAtom.null\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "ca371dea",
"metadata": {},
"source": [
"Usage of the `loc` functionality under the hood additionally allows users to set columns within a table for single or multiple columns. Data passed for this can be q/Python."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c5b1db2",
"metadata": {},
"outputs": [],
"source": [
"tab['new_col'] = kx.random.random(1000, 1.0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87d71574",
"metadata": {},
"outputs": [],
"source": [
"tab[['new_col1', 'new_col2']] = [20, kx.random.random(1000, kx.GUIDAtom.null)]"
]
},
{
"cell_type": "markdown",
"id": "53c9631f",
"metadata": {},
"source": [
"### Table.sample()\n",
"\n",
"```\n",
"Table.sample(n, frac, replace, weights, random_state, axis, ignore_index)\n",
"```\n",
"\n",
"Sample random data from the table.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :---: | :----------------------------------------------------------------- | :-----: |\n",
"| n | int | Number of rows to return. Cannot be used with `frac`. Default is 1 if `frac` is None. | None |\n",
"| frac | float | Fraction of the rows to return. Cannot be used with `n`. | None |\n",
"| replace | bool | Whether or not it should be possible to sample the same row twice. | False |\n",
"| weights | None | Not yet implemented. | None |\n",
"| random_state | None | Not yet implemented. | None |\n",
"| axis | None | Not yet implemented. | None |\n",
"| ignore_index | bool | Not yet implemented. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given column(s) renamed. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "845e22d6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# The examples in this section will use this example table filled with random data\n",
"N = 1000\n",
"tab = kx.Table(data = {\n",
" 'x': kx.q.til(N),\n",
" 'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'z': kx.random.random(N, 500.0),\n",
" 'w': kx.random.random(N, 1000),\n",
" 'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})\n",
"tab.head()"
]
},
{
"cell_type": "markdown",
"id": "c9d84056",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Sample 10 Rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebfeeec5",
"metadata": {},
"outputs": [],
"source": [
"tab.sample(n=10)"
]
},
{
"cell_type": "markdown",
"id": "d3150483",
"metadata": {},
"source": [
"Sample 10% of the rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67844a62",
"metadata": {},
"outputs": [],
"source": [
"tab.sample(frac=0.1)"
]
},
{
"cell_type": "markdown",
"id": "dce42092",
"metadata": {},
"source": [
"Sample 10% of the rows and allow the same row to be sampled twice."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a2326fd",
"metadata": {},
"outputs": [],
"source": [
"tab.sample(frac=0.1, replace=True)"
]
},
{
"cell_type": "markdown",
"id": "7d42cde9",
"metadata": {},
"source": [
"### Table.select_dtypes()\n",
"\n",
"```\n",
"Table.select_dtypes(include=None, exclude=None)\n",
"```\n",
"\n",
"Return a subset of the DataFrame’s columns based on the column dtypes.\n",
"\n",
"Allowed inputs for `include`/`exclude` are:\n",
"- A single dtype or string.\n",
"- A list of dtypes or strings.\n",
"- Inputs given for `include` and `exclude` cannot overlap.\n",
"\n",
"The dtype `kx.CharVector` will return an error. Use `kx.CharAtom` for a column of single chars.\n",
"Both `kx.*Atom` and `kx.*Vector` will be taken to mean a column containing a single item per row of type `*`. `kx.List` will include/exclude any columns containing mixed list data (including string columns).\n",
" \n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :--------------: | :----------------------------------------------: | :-----: |\n",
"| include | Union[List, str] | A selection of dtypes or strings to be included. | None |\n",
"| exclude | Union[List, str] | A selection of dtypes or strings to be excluded. | None |\n",
" \n",
"At least one of these parameters must be supplied.\n",
" \n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :-------: | :----------------------------------------------------------------------------------------------: |\n",
"| Dataframe | The subset of the frame including the dtypes in `include` and excluding the dtypes in `exclude`. |"
]
},
{
"cell_type": "markdown",
"id": "bb6fc886",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"The examples in the section will use the example table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca9b5532",
"metadata": {},
"outputs": [],
"source": [
"df = kx.Table(data = {\n",
" 'c1': kx.SymbolVector(['a', 'b', 'c']),\n",
" 'c2': kx.ShortVector([1, 2, 3]),\n",
" 'c3': kx.LongVector([1, 2, 3]),\n",
" 'c4': kx.IntVector([1, 2, 3])\n",
" })"
]
},
{
"cell_type": "markdown",
"id": "8eb25b29",
"metadata": {},
"source": [
"Exclude columns containing symbols"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de81564b",
"metadata": {},
"outputs": [],
"source": [
"df.select_dtypes(exclude = kx.SymbolVector)"
]
},
{
"cell_type": "markdown",
"id": "1e842cc3",
"metadata": {},
"source": [
"Include a list of column types"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba874cb6",
"metadata": {},
"outputs": [],
"source": [
"df.select_dtypes(include = [kx.ShortVector, kx.LongVector])"
]
},
{
"cell_type": "markdown",
"id": "5bb4eaa2",
"metadata": {},
"source": [
"### Table.tail()\n",
"\n",
"```\n",
"Table.tail(n=5)\n",
"```\n",
"\n",
"Get the last n rows from a table.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--: | :--: | :---------------------------- | :-----: |\n",
"| n | int | The number of rows to return. | 5 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :------------------------------- |\n",
"| Table | The last `n` rows of the table. |"
]
},
{
"cell_type": "markdown",
"id": "2c9de3b3",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Return the last 5 rows of the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c31fc24",
"metadata": {},
"outputs": [],
"source": [
"tab.tail()"
]
},
{
"cell_type": "markdown",
"id": "5ad81954",
"metadata": {},
"source": [
"Return the last 10 rows of the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02974f05",
"metadata": {},
"outputs": [],
"source": [
"tab.tail(10)"
]
},
{
"cell_type": "markdown",
"id": "a2edb648",
"metadata": {},
"source": [
"## Sorting"
]
},
{
"cell_type": "markdown",
"id": "ee65b6ab",
"metadata": {},
"source": [
"### Table.sort_values()\n",
"\n",
"```\n",
"Table.sort_values(by, ascending=True)\n",
"```\n",
"\n",
"Sort Table objects based on the value of a selected column.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :---------: | :--: | :-------------------------------------------------------------------------------- | :-----: |\n",
"| by | str or list of str | The name of the column to sort by. | _required_ |\n",
"| ascending | bool | The order in which to sort the values, ascending is True and descending is False. | True |\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------ |\n",
"| Table | The resulting table after the sort has been perfomed |"
]
},
{
"cell_type": "markdown",
"id": "6b4c5b68",
"metadata": {},
"source": [
"**Examples:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e996a181",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data={\n",
" 'column_a': [20, 3, 100],\n",
" 'column_b': [56, 15, 42],\n",
" 'column_c': [45, 80, 8]})\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "5093808f",
"metadata": {},
"source": [
"Sort a Table by the second column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08eb698c",
"metadata": {},
"outputs": [],
"source": [
"tab.sort_values(by='column_b')"
]
},
{
"cell_type": "markdown",
"id": "4a48687d",
"metadata": {},
"source": [
"Sort a Table by the third column in descending order"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ba2b42f",
"metadata": {},
"outputs": [],
"source": [
"tab.sort_values(by='column_c', ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "29930425",
"metadata": {},
"source": [
"### Table.nsmallest()\n",
"```\n",
"Table.nsmallest(\n",
" n,\n",
" columns,\n",
" keep='first'\n",
")\n",
"```\n",
"\n",
"Return the first n rows of a Table ordered by columns in ascending order\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :---------: | :--: | :----------------------------------------------------------------------------------| :-----------: |\n",
"| n | int | The number of rows to return | _required_ |\n",
"| columns | str or list of str | Column labels to order by | _required_ |\n",
"| keep | str | Can be 'first', 'last' or 'all'. Used in case of duplicate values | 'first' | \n",
"\n",
"**Returns**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------ |\n",
"| Table | The first n rows ordered by the given columns in ascending order |"
]
},
{
"cell_type": "markdown",
"id": "64976edc",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Sample table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "302d4b08",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data={\n",
" 'column_a': [2, 3, 2, 2, 1],\n",
" 'column_b': [56, 15, 42, 102, 32],\n",
" 'column_c': [45, 80, 8, 61, 87]})\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "c687bc12",
"metadata": {},
"source": [
"Get the row where the first column is the smallest"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f2e6e8b",
"metadata": {},
"outputs": [],
"source": [
"tab.nsmallest(n=1, columns='column_a')"
]
},
{
"cell_type": "markdown",
"id": "580d8d06",
"metadata": {},
"source": [
"Get the 4 rows where the first column is the smallest, then any equal values are sorted based on the second column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "039083ba",
"metadata": {},
"outputs": [],
"source": [
"tab.nsmallest(n=4,columns=['column_a', 'column_b'])"
]
},
{
"cell_type": "markdown",
"id": "db0280b3",
"metadata": {},
"source": [
"Get the 2 rows with the smallest values for the first column and in case of duplicates, take the last entry in the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb02553b",
"metadata": {},
"outputs": [],
"source": [
"tab.nsmallest(n=2, columns=['column_a'], keep='last')"
]
},
{
"cell_type": "markdown",
"id": "fbb4e07f",
"metadata": {},
"source": [
"### Table.nlargest()\n",
"```\n",
"Table.nlargest(\n",
" n,\n",
" columns,\n",
" keep='first'\n",
")\n",
"```\n",
"\n",
"Return the first n rows of a Table ordered by columns in descending order\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :---------: | :--: | :-------------------------------------------------------------------------------- | :-------: |\n",
"| n | int | The number of rows to return | _required_|\n",
"| columns | str or list of str | Column labels to order by | _required_|\n",
"| keep | str | Can be 'first', 'last' or 'all'. Used in case of duplicate values | 'first' | \n",
"\n",
"**Returns**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------ |\n",
"| Table | The first n rows ordered by the given columns in descending order|"
]
},
{
"cell_type": "markdown",
"id": "394bdd98",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Sample table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ead5bfc0",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data={\n",
" 'column_a': [2, 3, 2, 2, 1],\n",
" 'column_b': [102, 15, 42, 56, 32],\n",
" 'column_c': [45, 80, 8, 61, 87]})\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "efc9b4c7",
"metadata": {},
"source": [
"Get the row with the largest value for the first column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7c6363a",
"metadata": {},
"outputs": [],
"source": [
"tab.nlargest(n=1, columns='column_a')"
]
},
{
"cell_type": "markdown",
"id": "18b2a6ce",
"metadata": {},
"source": [
"Get the 4 rows where the first column is the largest, then any equal values are sorted based on the third column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9162934a",
"metadata": {},
"outputs": [],
"source": [
"tab.nlargest(n=4,columns=['column_a', 'column_c'])"
]
},
{
"cell_type": "markdown",
"id": "65fce7c3",
"metadata": {},
"source": [
"Get the 2 rows with the smallest values for the first column and in case of duplicates, take all rows of the same value for that column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0bc8991",
"metadata": {},
"outputs": [],
"source": [
"tab.nsmallest(n=2, columns=['column_a'], keep='all')"
]
},
{
"cell_type": "markdown",
"id": "ffc7e449",
"metadata": {},
"source": [
"## Data Joins/Merging"
]
},
{
"cell_type": "markdown",
"id": "6a4c9fc9",
"metadata": {},
"source": [
"### Table.merge()\n",
"\n",
"```\n",
"Table.merge(\n",
" right,\n",
" how='inner',\n",
" on=None,\n",
" left_on=None,\n",
" right_on=None,\n",
" left_index=False,\n",
" right_index=False,\n",
" sort=False,\n",
" suffixes=('_x', '_y'),\n",
" copy=True,\n",
" validate=None,\n",
" q_join=False\n",
")\n",
"```\n",
"\n",
"Merge Table or KeyedTable objects with a database-style join.\n",
"\n",
"The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :---------: | :--: | :-------------------------------------------------------------------------------- | :-----: |\n",
"| right | Union[Table/KeyedTable] | The object to merge with. | _required_ |\n",
"| how | str | The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}. | ‘inner’ |\n",
"| on | str | The column name to join on. | None |\n",
"| left_on | str | The column name in the left table to join on. | None |\n",
"| right_on | str | The column name in the right table to join on. | None |\n",
"| left_index | bool | Use the index of the left Table. | False |\n",
"| right_index | bool | Use the index of the right Table. | False |\n",
"| sort | bool | Sort the join keys of the resulting table. | False |\n",
"| suffixes | Tuple(str, str) | The number of rows to return. | ('\\_x', '\\_y') |\n",
"| copy | bool | If False avoid copies and modify the input table. | None |\n",
"| validate | str | If specified checks if merge matches specified type.
- “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
- “one_to_many” or “1:m”: check if merge keys are unique in left dataset.
- “many_to_one” or “m:1”: check if merge keys are unique in right dataset.
- “many_to_many” or “m:m”: allowed, but does not result in checks.
| None |\n",
"| q_join | bool | If True perform native q joins instead of the pandas SQL like joins. More documentation around these joins can be found [here.](https://code.kx.com/q/basics/joins/) | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------ |\n",
"| Table / KeyedTable | The resulting table-like object after the join has been preformed. |"
]
},
{
"cell_type": "markdown",
"id": "3fbf575d",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Merge tab1 and tab2 on the lkey and rkey columns. The value columns have the default suffixes, \\_x and \\_y, appended."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f5f134f",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab1 = kx.Table(data={'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})\n",
"tab2 = kx.Table(data={'rkey': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8]})\n",
"tab1.merge(tab2, left_on='lkey', right_on='rkey')"
]
},
{
"cell_type": "markdown",
"id": "e9a9809e",
"metadata": {},
"source": [
"Merge tab1 and tab2 with specified left and right suffixes appended to any overlapping columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86b35497",
"metadata": {},
"outputs": [],
"source": [
"tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right'))"
]
},
{
"cell_type": "markdown",
"id": "c2a3ed1a",
"metadata": {},
"source": [
"Merge tab1 and tab2 but raise an exception if the Tables have any overlapping columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6326a4c",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"try:\n",
" tab1.merge(tab2, left_on='lkey', right_on='rkey', suffixes=(False, False))\n",
"except BaseException as e:\n",
" print(f'Caught Error: {e}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d56ecee",
"metadata": {},
"outputs": [],
"source": [
"tab1 = kx.Table(data={'a': ['foo', 'bar'], 'b': [1, 2]})\n",
"tab2 = kx.Table(data={'a': ['foo', 'baz'], 'c': [3, 4]})"
]
},
{
"cell_type": "markdown",
"id": "c97d6764",
"metadata": {},
"source": [
"Merge tab1 and tab2 on the `a` column using an inner join."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "756423a2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab1.merge(tab2, how='inner', on='a')"
]
},
{
"cell_type": "markdown",
"id": "cad8a08e",
"metadata": {},
"source": [
"Merge tab1 and tab2 on the `a` column using a left join."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3511b35",
"metadata": {},
"outputs": [],
"source": [
"tab1.merge(tab2, how='left', on='a')"
]
},
{
"cell_type": "markdown",
"id": "cba56e88",
"metadata": {},
"source": [
"Merge tab1 and tab2 using a cross join."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d8eb680",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab1 = kx.Table(data={'left': ['foo', 'bar']})\n",
"tab2 = kx.Table(data={'right': [7, 8]})\n",
"tab1.merge(tab2, how='cross')"
]
},
{
"cell_type": "markdown",
"id": "caa8cb07",
"metadata": {},
"source": [
"Merge tab1 and tab2_keyed using a left join with `q_join` set to `True`. Inputs/Outputs will match q [lj](https://code.kx.com/q/ref/lj/) behaviour."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a7fb401",
"metadata": {},
"outputs": [],
"source": [
"tab1 = kx.Table(data={'a': ['foo', 'bar', 'baz'], 'b': [1, 2, 3]})\n",
"tab2 = kx.Table(data={'a': ['foo', 'baz', 'baz'], 'c': [3, 4, 5]})\n",
"tab2_keyed = tab2.set_index(1)\n",
"tab1.merge(tab2_keyed, how='left', q_join=True)"
]
},
{
"cell_type": "markdown",
"id": "b465b9fc",
"metadata": {},
"source": [
"Inputs/Outputs will match q [ij](https://code.kx.com/q/ref/ij/) behaviour."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb0c0f70",
"metadata": {},
"outputs": [],
"source": [
"tab3 = kx.Table(data={'a': ['foo', 'bar'], 'd': [6, 7]})\n",
"tab3_keyed = tab3.set_index(1)\n",
"tab1.merge(tab3_keyed, how='inner', q_join=True)"
]
},
{
"cell_type": "markdown",
"id": "125d8479",
"metadata": {},
"source": [
"Merge using `q_join` set to `True`, and `how` set to `left`, will fail when `tab2` is not a keyed table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d71a5e4",
"metadata": {},
"outputs": [],
"source": [
"#Will error as Left Join requires a keyed column for the right dataset.\n",
"try:\n",
" tab1.merge(tab2, how='left', q_join=True)\n",
"except ValueError as e:\n",
" print(f'Caught Error: {e}')"
]
},
{
"cell_type": "markdown",
"id": "42158c05",
"metadata": {},
"source": [
"### Table.merge_asof()\n",
"\n",
"```\n",
"Table.merge_asof(\n",
" right,\n",
" on=None,\n",
" left_on=None,\n",
" right_on=None,\n",
" left_index=False,\n",
" right_index=False,\n",
" by=None,\n",
" left_by=None,\n",
" right_by=None,\n",
" suffixes=('_x', '_y'),\n",
" tolerance=None,\n",
" allow_exact_matches=True,\n",
" direction='backward'\n",
"\n",
")\n",
"```\n",
"\n",
"Merge Table or KeyedTable objects with a database-style join.\n",
"\n",
"The join is done on columns or keys. If joining columns on columns, the Table key will be ignored. Otherwise if joining keys on keys or keys on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :---------: | :--: | :-------------------------------------------------------------------------------- | :-----: |\n",
"| right | Union[Table/KeyedTable] | The object to merge with. | _required_ |\n",
"| how | str | The type of join to be used. One of {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}. | ‘inner’ |\n",
"| on | str | The column name to join on. | None |\n",
"| left_on | str | The column name in the left table to join on. | None |\n",
"| right_on | str | The column name in the right table to join on. | None |\n",
"| left_index | bool | Use the index of the left Table. | False |\n",
"| right_index | bool | Use the index of the right Table. | False |\n",
"| by | str | Not yet implemented. | None |\n",
"| left_by | str | Field names to match on in the left table. | None |\n",
"| right_by | str | Field names to match on in the right table. | None |\n",
"| suffixes | Tuple(str, str) | The number of rows to return. | ('\\_x', '\\_y') |\n",
"| tolerance | Any | Not yet implemented. | None |\n",
"| allow_exact_matches | bool | Not yet implemented. | True |\n",
"| direction | str | Not yet implemented. | 'backward' |\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------ |\n",
"| Table / KeyedTable | The resulting table like object after the join has been preformed. |"
]
},
{
"cell_type": "markdown",
"id": "8712f68e",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Perform a simple asof join on two tables."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16fbf21a",
"metadata": {},
"outputs": [],
"source": [
"left = kx.Table(data={\"a\": [1, 5, 10], \"left_val\": [\"a\", \"b\", \"c\"]})\n",
"right = kx.Table(data={\"a\": [1, 2, 3, 6, 7], \"right_val\": [1, 2, 3, 6, 7]})\n",
"left"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8d023aa",
"metadata": {},
"outputs": [],
"source": [
"right"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2f2766b",
"metadata": {},
"outputs": [],
"source": [
"left.merge_asof(right)"
]
},
{
"cell_type": "markdown",
"id": "e10eced6",
"metadata": {},
"source": [
"Perform a asof join on two tables but first merge them on the by column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "943dd5b1",
"metadata": {},
"outputs": [],
"source": [
"trades = kx.Table(data={\n",
" \"time\": [\n",
" pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.030\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.041\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.049\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.072\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.075\")\n",
" ],\n",
" \"ticker\": [\n",
" \"GOOG\",\n",
" \"MSFT\",\n",
" \"MSFT\",\n",
" \"MSFT\",\n",
" \"GOOG\",\n",
" \"AAPL\",\n",
" \"GOOG\",\n",
" \"MSFT\"\n",
" ],\n",
" \"bid\": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],\n",
" \"ask\": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]\n",
"})\n",
"quotes = kx.Table(data={\n",
" \"time\": [\n",
" pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.038\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
" pd.Timestamp(\"2016-05-25 13:30:00.048\")\n",
" ],\n",
" \"ticker\": [\"MSFT\", \"MSFT\", \"GOOG\", \"GOOG\", \"AAPL\"],\n",
" \"price\": [51.95, 51.95, 720.77, 720.92, 98.0],\n",
" \"quantity\": [75, 155, 100, 100, 100]\n",
"})\n",
"trades"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20657aed",
"metadata": {},
"outputs": [],
"source": [
"quotes"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a858ec29",
"metadata": {},
"outputs": [],
"source": [
"trades.merge_asof(quotes, on=\"time\")"
]
},
{
"cell_type": "markdown",
"id": "e6280a9a",
"metadata": {},
"source": [
"## Analytic functionality"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5d4844f",
"metadata": {},
"outputs": [],
"source": [
"# All the examples in this section will use this example table.\n",
"N = 100\n",
"kx.Table(data={\n",
" 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'price': 250 + kx.random.random(N, 500.0),\n",
" 'traded': 100 - kx.random.random(N, 200),\n",
" 'hold': kx.random.random(N, False)\n",
" })\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "fa9c8fc5",
"metadata": {},
"source": [
"### Table.abs()\n",
"\n",
"```\n",
"Table.abs(numeric_only=False)\n",
"```\n",
"\n",
"Take the absolute value of each element in the table. This will raise an error if there are columns that contain data that have no absolute value.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| numeric_only | bool | Only use columns of the table that can be converted to an absolute value. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table / KeyedTable | The resulting table like object with only positive numerical values. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "032c6006",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.abs(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "d644f8ee",
"metadata": {},
"source": [
"### Table.count()\n",
"\n",
"```\n",
"Table.count(axis=0, numeric_only=False)\n",
"```\n",
"\n",
"Returns the count of non null values across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to count elements across 1 is columns, 0 is rows. | 0 |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represent the column name / row number and the values are the result of calling `count` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd70f67c",
"metadata": {},
"outputs": [],
"source": [
"tab.count()"
]
},
{
"cell_type": "markdown",
"id": "f8554641",
"metadata": {},
"source": [
"### Table.max()\n",
"\n",
"```\n",
"Table.max(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Returns the maximum value across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the maximum across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `max` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "743d7fb5",
"metadata": {},
"outputs": [],
"source": [
"tab.max()"
]
},
{
"cell_type": "markdown",
"id": "bc5b6dde",
"metadata": {},
"source": [
"### Table.min()\n",
"\n",
"```\n",
"Table.min(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Returns the minimum value across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the minimum across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `min` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d730d7e0",
"metadata": {},
"outputs": [],
"source": [
"tab.min()"
]
},
{
"cell_type": "markdown",
"id": "b52627d2",
"metadata": {},
"source": [
"### Table.idxmax()\n",
"\n",
"```\n",
"Table.idxmax(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Return index of first occurrence of maximum over requested axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the idxmax across. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `idxmax` on that column / row. |"
]
},
{
"cell_type": "markdown",
"id": "838a07dd",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the idxmax across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c6d3384",
"metadata": {},
"outputs": [],
"source": [
"tab.idxmax()"
]
},
{
"cell_type": "markdown",
"id": "30738846",
"metadata": {},
"source": [
"Calculate the idxmax across the rows of a table using only columns thar are of a numeric data type"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edf9e128",
"metadata": {},
"outputs": [],
"source": [
"tab.idxmax(axis=1, numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "fdb4114c-640a-41ac-a4e7-6c236e9d93ea",
"metadata": {},
"source": [
"### Table.idxmin()\n",
"\n",
"```\n",
"Table.idxmax(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Return index of first occurrence of minimum over requested axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the idxmin across. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `idxmin` on that column / row. |"
]
},
{
"cell_type": "markdown",
"id": "57053c36-932e-4805-916c-4cd5e3d33e82",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the idxmin across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da7cbf8f",
"metadata": {},
"outputs": [],
"source": [
"tab.idxmin()"
]
},
{
"cell_type": "markdown",
"id": "fb531e00",
"metadata": {},
"source": [
"Calculate the idxmin across the rows of a table using only columns thar are of a numeric data type"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9907226a",
"metadata": {},
"outputs": [],
"source": [
"tab.idxmin(axis=1, numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "4aee2790",
"metadata": {},
"source": [
"### Table.sum()\n",
"\n",
"```\n",
"Table.sum(axis=0, skipna=True, numeric_only=False, min_count=0)\n",
"```\n",
"\n",
"Returns the sum of all values across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the sum across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"| min_count | int | If not set to 0 if there are less then `min_count` values across the axis a null value will be returned | 0 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `sum` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4303521e",
"metadata": {},
"outputs": [],
"source": [
"tab.sum()"
]
},
{
"cell_type": "markdown",
"id": "3fd35bc7",
"metadata": {},
"source": [
"### Table.mean()\n",
"\n",
"```\n",
"Table.mean(axis=0, numeric_only=False)\n",
"```\n",
"\n",
"Get the mean of values across the requested axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :--------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate mean across 0 is columns, 1 is rows. | 0 |\n",
"| numeric_only | bool | Include only columns / rows with numeric data. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :--------------------------------------------------------------------------------------------- |\n",
"| Dictionary | The mean across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "4ce8168f",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the mean across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50b58aad",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc7ab777",
"metadata": {},
"outputs": [],
"source": [
"tab.mean()"
]
},
{
"cell_type": "markdown",
"id": "f3b85934",
"metadata": {},
"source": [
"Calculate the mean across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f85e05c",
"metadata": {},
"outputs": [],
"source": [
"tab.mean(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "b0eff83a",
"metadata": {},
"source": [
"### Table.median()\n",
"\n",
"```\n",
"Table.median(axis=0, numeric_only=False)\n",
"```\n",
"\n",
"Get the median of values across the requested axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :----------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate median across 0 is columns, 1 is rows. | 0 |\n",
"| numeric_only | bool | Include only columns / rows with numeric data. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :----------------------------------------------------------------------------------------------- |\n",
"| Dictionary | The median across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "80f2f2a1",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the median across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46ca7078",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bd18f87",
"metadata": {},
"outputs": [],
"source": [
"tab.median()"
]
},
{
"cell_type": "markdown",
"id": "8312046c",
"metadata": {},
"source": [
"Calculate the median across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fd7558c",
"metadata": {},
"outputs": [],
"source": [
"tab.median(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "929fe196",
"metadata": {},
"source": [
"### Table.mode()\n",
"\n",
"```\n",
"Table.mode(axis=0, numeric_only=False, dropna=True)\n",
"```\n",
"\n",
"Get the mode of values across the requested axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------ | :-----: |\n",
"| axis | int | The axis to calculate mode across 0 is columns, 1 is rows. | 0 |\n",
"| numeric_only | bool | Include only columns / rows with numeric data. | False |\n",
"| dropna | bool | Remove null values from the data before calculating the mode. | True |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :------------------------------------------------------------------------------------------------ |\n",
"| Table | The mode across each row / column with the column corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "880e64c2",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the mode across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0b087e3",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19d3a003",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.mode()"
]
},
{
"cell_type": "markdown",
"id": "85ce92d2",
"metadata": {},
"source": [
"Calculate the median across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d418ed9",
"metadata": {},
"outputs": [],
"source": [
"tab.mode(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "097ff9d9",
"metadata": {},
"source": [
"Calculate the mode across columns and keep null values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "503efd21",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'x': [0, 1, 2, 3, 4, 5, 6, 7, np.NaN, np.NaN],\n",
" 'y': [10, 11, 12, 13, 14, 15, 16, 17, 18, np.NaN],\n",
" 'z': ['a', 'b', 'c', 'd', 'd', 'e', 'e', 'f', 'g', 'h']\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94f25640",
"metadata": {},
"outputs": [],
"source": [
"tab.mode(dropna=False)"
]
},
{
"cell_type": "markdown",
"id": "7371feb5",
"metadata": {},
"source": [
"### Table.prod()\n",
"\n",
"```\n",
"Table.prod(axis=0, skipna=True, numeric_only=False, min_count=0)\n",
"```\n",
"\n",
"Returns the product of all values across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the product across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"| min_count | int | If not set to 0 if there are less then `min_count` values across the axis a null value will be returned | 0 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represents the column name / row number and the values are the result of calling `prd` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7852e009",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# This example will use a smaller version of the above table\n",
"# as the result of calculating the product quickly goes over the integer limits.\n",
"N = 10\n",
"tab = kx.Table(data={\n",
" 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'price': 2.5 - kx.random.random(N, 5.0),\n",
" 'traded': 10 - kx.random.random(N, 20),\n",
" 'hold': kx.random.random(N, False)\n",
" })\n",
"tab[tab['traded'] == 0, 'traded'] = 1\n",
"tab[tab['price'] == 0, 'price'] = 1.0\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ced8761",
"metadata": {},
"outputs": [],
"source": [
"tab.prod(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "fe565b65-fbf2-47ba-a26e-791d09fd4f55",
"metadata": {},
"source": [
"### Table.kurt()\n",
"\n",
"```\n",
"Table.kurt(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Return unbiased kurtosis over requested axis. Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.\n",
"\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | Axis for the function to be applied on. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Not yet implemented | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :--------------------------------------------------------------------------------------- |\n",
"| Dictionary | Map of columns and their yielded kurtosis values |"
]
},
{
"cell_type": "markdown",
"id": "e6069cac-d260-4f80-9688-3d1ec273cd22",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the kurt across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4219c826-a84b-4722-9847-372d3837acdb",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "437ab485-bf73-4209-b63e-aa0d1bfa5d58",
"metadata": {},
"outputs": [],
"source": [
"tab.kurt()"
]
},
{
"cell_type": "markdown",
"id": "ea3e1cf6-2304-4061-a846-1cbc0572ea9d",
"metadata": {},
"source": [
"Calculate the kurtosis across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63312e8b-76f0-46eb-b4d7-b2213561c86e",
"metadata": {},
"outputs": [],
"source": [
"tab.kurt(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "b248fef1",
"metadata": {},
"source": [
"### Table.sem()\n",
"\n",
"```\n",
"Table.sem(axis=0, skipna=True, numeric_only=False, ddof=0)\n",
"```\n",
"Return unbiased standard error of the mean over requested axis. Normalized by N-1 by default. This can be changed using the `ddof` argument.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the sum across. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | not yet implemented | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"| ddof | int | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. | 1 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | The sem across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "71bd1d6f",
"metadata": {},
"source": [
"**Examples**\n",
"\n",
"Calculate the sem across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "350c2b7c",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14],\n",
" }\n",
" )\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b89307e9",
"metadata": {},
"outputs": [],
"source": [
"tab.sem()"
]
},
{
"cell_type": "markdown",
"id": "6933f01f",
"metadata": {},
"source": [
"Calculate the sem across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3edd3feb",
"metadata": {},
"outputs": [],
"source": [
"tab.sem(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "ae7afe5a",
"metadata": {},
"source": [
"Calculate sem accross columns with ddof=0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de626961",
"metadata": {},
"outputs": [],
"source": [
"tab.sem(ddof=0)"
]
},
{
"cell_type": "markdown",
"id": "ff51630f",
"metadata": {},
"source": [
"### Table.skew()\n",
"\n",
"```\n",
"Table.skew(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Returns the skewness of all values across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the skewness across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Ignore any null values along the axis. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | A dictionary where the key represent the column name / row number and the values are the result of calling `skew` on that column / row. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af65b9ab",
"metadata": {},
"outputs": [],
"source": [
"tab.skew(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "b054645b",
"metadata": {},
"source": [
"### Table.std()\n",
"\n",
"```\n",
"Table.std(axis=0, skipna=True, numeric_only=False, ddof=0)\n",
"```\n",
"\n",
"Return sample standard deviation over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument.\n",
"\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the sum across 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | not yet implemented | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"| ddof | int | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. | 1 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Table | The std across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "9a0c1a5d",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the std across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42c3e6bf",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "947435db",
"metadata": {},
"outputs": [],
"source": [
"tab.std()"
]
},
{
"cell_type": "markdown",
"id": "463894f1",
"metadata": {},
"source": [
"Calculate the std across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d918f6a",
"metadata": {},
"outputs": [],
"source": [
"tab.std(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "ad38071b",
"metadata": {},
"source": [
"Calculate std accross columns with ddof=0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77c7aaa3",
"metadata": {},
"outputs": [],
"source": [
"tab.std(ddof=0)"
]
},
{
"cell_type": "markdown",
"id": "5f1e5350",
"metadata": {},
"source": [
"## Group By"
]
},
{
"cell_type": "markdown",
"id": "57fe61a2",
"metadata": {},
"source": [
"### Table.groupby()\n",
"\n",
"```\n",
"Table.groupby(\n",
" by=None,\n",
" axis=0,\n",
" level=None,\n",
" as_index=True,\n",
" sort=True,\n",
" group_keys=True,\n",
" observed=False,\n",
" dropna=True\n",
")\n",
"```\n",
"\n",
"Group data based on like values within columns to easily apply operations on groups.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--------------: | :--: | :-------------------------------------------------------------------------- | :------: |\n",
"| by | Union[Symbol/SymbolVector/int/list] | The column name(s) or column index(es) to group the data on. | None |\n",
"| axis | int | Not Yet Implemented. | 0 |\n",
"| level | Union[Symbol/SymbolVector/int/list] | The column name(s) or column index(es) to group the data on. | None | \n",
"| as_index | bool | Return the table with groups as the key column. | True |\n",
"| sort | bool | Sort the resulting table based off the key. | True |\n",
"| group_keys | bool | Not Yet Implemented. | True | \n",
"| observed | bool | Not Yet Implemented. | False |\n",
"| dropna | bool | Drop groups where the group is null. | True | \n",
"\n",
"Either `by` or `level` can be used to specify the columns to group on, using both will raise an error.\n",
"\n",
"Using and integer or list of integers is only possible when calling `groupby` on a `KeyedTable` object.\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------: | :---------------------------------------------- |\n",
"| GroupbyTable | The resulting table after the grouping is done. |\n",
"\n",
"**Examples:**\n",
"\n",
"Example Table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa82d895",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab = kx.Table(data={\n",
" 'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],\n",
" 'Max Speed': [380., 370., 24., 26.],\n",
" 'Max Altitude': [570., 555., 275., 300.]\n",
"})\n",
"\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "0487cfe5",
"metadata": {},
"source": [
"Group on the `Animal` column and calculate the mean of the resulting `Max Speed` and `Max Altitude` columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db5f0dd6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.groupby(kx.SymbolVector(['Animal'])).mean()"
]
},
{
"cell_type": "markdown",
"id": "361019ba",
"metadata": {},
"source": [
"Example table with multiple columns to group on."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1985906",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(\n",
" data={\n",
" 'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],\n",
" 'Type': ['Captive', 'Wild', 'Captive', 'Wild', 'Wild'],\n",
" 'Max Speed': [390., 350., 30., 20., 25.]\n",
" })\n",
"tab = tab.set_index(2)\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "ae3d3244",
"metadata": {},
"source": [
"Group on multiple columns using thier indexes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb9dd53b",
"metadata": {},
"outputs": [],
"source": [
"tab.groupby(level=[0, 1]).mean()"
]
},
{
"cell_type": "markdown",
"id": "14dfdd2a",
"metadata": {},
"source": [
"Example table with Nulls."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f389591",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(\n",
" [\n",
" [\"a\", 12, 12],\n",
" [kx.SymbolAtom.null, 12.3, 33.],\n",
" [\"b\", 12.3, 123],\n",
" [\"a\", 1, 1]\n",
" ],\n",
" columns=[\"a\", \"b\", \"c\"]\n",
")\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "62e3f5f5",
"metadata": {},
"source": [
"Group on column `a` and keep null groups."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcca967d",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tab.groupby('a', dropna=False).sum()"
]
},
{
"cell_type": "markdown",
"id": "2ddc596a",
"metadata": {},
"source": [
"Group on column `a` keeping null groups and not using the groups as an index column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8f9a0b4",
"metadata": {},
"outputs": [],
"source": [
"tab.groupby('a', dropna=False, as_index=False).sum()"
]
},
{
"cell_type": "markdown",
"id": "56cf152e",
"metadata": {},
"source": [
"## Apply\n",
"\n",
"### Table.apply()\n",
"\n",
"```\n",
"Table.apply(\n",
" func,\n",
" *args,\n",
" axis=0,\n",
" raw=None,\n",
" result_type=None,\n",
" **kwargs\n",
")\n",
"```\n",
"\n",
"Apply a function along an axis of the DataFrame.\n",
"\n",
"Objects passed to a function are passed as kx list objects.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--------------: | :---------------------------------: | :-------------------------------------------------------------------------- | :------: |\n",
"| func | function | Function to apply to each column or row. | |\n",
"| `*args` | any | Positional arguments to pass to `func` in addition to the kx list. | |\n",
"| axis | int | The axis along which the function is applied, `0` applies function to each column, `1` applied function to each row. | 0 | \n",
"| raw | bool | Not yet implemented. | None |\n",
"| result_type | str | Not yet implemented. | None |\n",
"| `**kwargs` | dict | Additional keyword arguments to pass as keywords to `func`, this argument is not implemented in the case `func` is a kx callable function. | None | \n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :-----------------------: | :---------------------------------------------- |\n",
"| List, Dictionary or Table | Result of applying `func` along the giveen axis of the `kx.Table`. |\n",
"\n",
"**Examples:**\n",
"\n",
"Example Table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a85caee",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table([[4, 9]] * 3, columns=['A', 'B'])\n",
"\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "e4cddd7b",
"metadata": {},
"source": [
"Apply square root on each item within a column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0895f9c5",
"metadata": {},
"outputs": [],
"source": [
"tab.apply(kx.q.sqrt)"
]
},
{
"cell_type": "markdown",
"id": "47b6ca70",
"metadata": {},
"source": [
"Apply a reducing function sum on either axis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "901a692b",
"metadata": {},
"outputs": [],
"source": [
"tab.apply(kx.q.sum)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43ab33ab",
"metadata": {},
"outputs": [],
"source": [
"tab.apply(lambda x: sum(x), axis=1)"
]
},
{
"cell_type": "markdown",
"id": "c20acb8a",
"metadata": {},
"source": [
"## Aggregate\n",
"\n",
"### Table.agg()\n",
"\n",
"```\n",
"Table.agg(\n",
" func,\n",
" axis=0,\n",
" *args,\n",
" **kwargs\n",
")\n",
"```\n",
"\n",
"Aggregate data using one or more operations over a specified axis\n",
"\n",
"Objects passed to a function are passed as kx vector/list objects.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--------------: | :---------------------------------: | :-------------------------------------------------------------------------- | :------: |\n",
"| func | function, str, list or dict | Function to use for aggregating the data. If a function this must either work when passed a `Table` or when passed to `Table.apply`
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. `[kx.q.sum, 'mean']`
- dict of axis labels -> functions or function names
| |\n",
"| `*args` | any | Positional arguments to pass to `func` in addition to the kx list. | |\n",
"| axis | int | The axis along which the function is applied, `0` applies function to each column, at present row based application is not supported. | 0 | \n",
"| `**kwargs` | dict | Additional keyword arguments to pass as keywords to `func`, this argument is not implemented in the case `func` is a kx callable function. | None | \n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :-----------------------: | :---------------------------------------------- |\n",
"| List, Dictionary or Table | Result of applying `func` along the giveen axis of the `kx.Table`. |\n",
"\n",
"**Examples:**\n",
"\n",
"Example Table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fd05e6e",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9],\n",
" [kx.FloatAtom.null, kx.FloatAtom.null, kx.FloatAtom.null]],\n",
" columns=['A', 'B', 'C'])\n",
"\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "cecd45f0",
"metadata": {},
"source": [
"Aggregate a list of functions over rows"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "857ff7cf",
"metadata": {},
"outputs": [],
"source": [
"tab.agg(['sum', 'min'])"
]
},
{
"cell_type": "markdown",
"id": "8bc17135",
"metadata": {},
"source": [
"Perform an aggregation using a user specified function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4108f2e5",
"metadata": {},
"outputs": [],
"source": [
"import statistics\n",
"def mode(x):\n",
" return statistics.mode(x)\n",
"tab.agg(mode)"
]
},
{
"cell_type": "markdown",
"id": "ba013165",
"metadata": {},
"source": [
"Apply an aggregation supplying column specification for supplied function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1cf2c721",
"metadata": {},
"outputs": [],
"source": [
"tab.agg({'A': 'max', 'B': mode})"
]
},
{
"cell_type": "markdown",
"id": "dc726b75",
"metadata": {},
"source": [
"## Data Preprocessing"
]
},
{
"cell_type": "markdown",
"id": "d508891a",
"metadata": {},
"source": [
"### Table.add_prefix()\n",
"\n",
"```\n",
"Table.add_prefix(columns)\n",
"```\n",
"\n",
"Rename columns adding a prefix in a table and return the resulting Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :-------------: | :------------------------------------------------------------------ | :--------: |\n",
"| prefix | str | The string that will be concatenated with the name of the columns | _required_ |\n",
"| axis | int | Axis to add prefix on. | 0 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given column(s) renamed adding a prefix. |"
]
},
{
"cell_type": "markdown",
"id": "4255701a",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"The initial table to which a prefix will be added to its columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "905c810d",
"metadata": {},
"outputs": [],
"source": [
"tab.head()"
]
},
{
"cell_type": "markdown",
"id": "cd6a4005",
"metadata": {},
"source": [
"Add \"col_\" to table columns:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11296af4",
"metadata": {},
"outputs": [],
"source": [
"tab.add_prefix(prefix=\"col_\").head()"
]
},
{
"cell_type": "markdown",
"id": "8fb874ba",
"metadata": {},
"source": [
"### Table.add_suffix()\n",
"\n",
"```\n",
"Table.add_suffix(columns)\n",
"```\n",
"\n",
"Rename columns adding a suffix in a table and return the resulting Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :-------------: | :------------------------------------------------------------------ | :--------: |\n",
"| suffix | str | The string that will be concatenated with the name of the columns | _required_ |\n",
"| axis | int | Axis to add suffix on. | 0 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given column(s) renamed adding a suffix. |"
]
},
{
"cell_type": "markdown",
"id": "47618c02",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"The initial table to which a suffix will be added to its columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e5c17b3",
"metadata": {},
"outputs": [],
"source": [
"tab.head()"
]
},
{
"cell_type": "markdown",
"id": "e93f30cb",
"metadata": {},
"source": [
"Add \"_col\" to table columns:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5625768b",
"metadata": {},
"outputs": [],
"source": [
"tab.add_suffix(suffix=\"_col\").head()"
]
},
{
"cell_type": "markdown",
"id": "a5bb7631",
"metadata": {},
"source": [
"### Table.astype()\n",
"\n",
"```\n",
"Table.astype(dtype, copy=True, errors='raise')\n",
"```\n",
"\n",
"Cast a column/columns of the Dataframes object to a specified `dtype`.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-: | :-: | :-: | :-: |\n",
"| dtype | data type, or dict of column name -> data type | Use a PyKx wrapper data type or Python type to cast all columns to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is PyKx wrapper data type to cast one or more of the DataFrame’s columns to column-specific types. | |\n",
"| copy | Boolean | Default of True, False not implemented | True |\n",
"| errors | {‘raise’, ‘ignore’} | If passed anything other than 'raise', it will return the dataframe | 'raise' |\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :-: | :-: |\n",
"| Dataframe | The dataframe with columns casted according to passed dtypes |"
]
},
{
"cell_type": "markdown",
"id": "e0af2087",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"The examples in the section will use the example table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "deb4809e",
"metadata": {},
"outputs": [],
"source": [
"df = kx.Table(data = {\n",
" 'c1': kx.IntVector([1, 2, 3]),\n",
" 'c2': kx.LongVector([1, 2, 3]),\n",
" 'c3': kx.ShortVector([1, 2, 3]),\n",
" 'c4': kx.IntVector([1, 2, 3])\n",
" })"
]
},
{
"cell_type": "markdown",
"id": "9126a84d",
"metadata": {},
"source": [
"Cast all columns to dtype LongVector"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da1b75cb",
"metadata": {},
"outputs": [],
"source": [
"df.astype(kx.LongVector)"
]
},
{
"cell_type": "markdown",
"id": "3799183f",
"metadata": {},
"source": [
"Casting as specified in the dictionary supplied with given dtype per column"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77de55f5",
"metadata": {},
"outputs": [],
"source": [
"df.astype({'c1':kx.LongVector, 'c2':'kx.ShortVector'})"
]
},
{
"cell_type": "markdown",
"id": "e73a33cd",
"metadata": {},
"source": [
"The next example will use this table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73e47ecc",
"metadata": {},
"outputs": [],
"source": [
"df = kx.Table(data={\n",
" 'c1': kx.TimestampAtom('now'),\n",
" 'c2': ['abc', 'def', 'ghi'],\n",
" 'c3': [1, 2, 3],\n",
" 'c4': [b'abc', b'def', b'ghi'],\n",
" 'c5': b'abc',\n",
" 'c6': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n",
" })\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "5eb8e9f2",
"metadata": {},
"source": [
"Casting char and string columns to symbol columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b56e61ab",
"metadata": {},
"outputs": [],
"source": [
"df.astype({'c4':kx.SymbolVector, 'c5':kx.SymbolVector})"
]
},
{
"cell_type": "markdown",
"id": "c7422edd",
"metadata": {},
"source": [
"### Table.drop()\n",
"\n",
"```\n",
"Table.drop(item, axis=0)\n",
"```\n",
"\n",
"Remove either columns or rows from a table and return the resulting Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :--------------------: | :---------------------------------------------------------- | :--------: |\n",
"| item | Union[str, list[str]] | The column name(s) or row number(s) to drop from the table. | _required_ |\n",
"| axis | int | The column name or list of names to pop from the table. | 0 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given column(s) / row(s) removed. |"
]
},
{
"cell_type": "markdown",
"id": "6b589694",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Drop rows from a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0df894a",
"metadata": {},
"outputs": [],
"source": [
"# The examples in this section will use this example table filled with random data\n",
"N = 1000\n",
"tab = kx.Table(data = {\n",
" 'x': kx.q.til(N),\n",
" 'y': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'z': kx.random.random(N, 500.0),\n",
" 'w': kx.random.random(N, 1000),\n",
" 'v': kx.random.random(N, [kx.LongAtom.null, 0, 50, 100, 200, 250])})\n",
"tab.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7553c97",
"metadata": {},
"outputs": [],
"source": [
"tab.drop([0, 2, 4, 6, 8, 10]).head()"
]
},
{
"cell_type": "markdown",
"id": "3b68fcbf",
"metadata": {},
"source": [
"Drop columns from a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a07c27f",
"metadata": {},
"outputs": [],
"source": [
"tab.drop('y', axis=1).head()"
]
},
{
"cell_type": "markdown",
"id": "d30d870b",
"metadata": {},
"source": [
"### Table.drop_duplicates()\n",
"\n",
"```\n",
"Table.drop_duplicates()\n",
"```\n",
"\n",
"Remove either columns or rows from a table and return the resulting Table object.\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :--------------------------------------- |\n",
"| Table | A table with all duplicate rows removed. |"
]
},
{
"cell_type": "markdown",
"id": "3c633610",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Create a table with duplicates for the example"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "672ae369",
"metadata": {},
"outputs": [],
"source": [
"N = 100\n",
"tab2 = kx.Table(data ={\n",
" 'x': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'x1': kx.random.random(N, 3)\n",
" })\n",
"tab2"
]
},
{
"cell_type": "markdown",
"id": "5912fc4e",
"metadata": {},
"source": [
"Drop all duplicate rows from the table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cc0d387",
"metadata": {},
"outputs": [],
"source": [
"tab2.drop_duplicates()"
]
},
{
"cell_type": "markdown",
"id": "6110d8d9",
"metadata": {},
"source": [
"### Table.pop()\n",
"\n",
"```\n",
"Table.pop(item)\n",
"```\n",
"\n",
"Remove a column or columns from a table by column name and return the column after it has been removed.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-----: | :--------------------: | :------------------------------------------------------ | :--------: |\n",
"| item | Union[str, list[str]] | The column name or list of names to pop from the table. | _required_ |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table containing only the columns removed from the input table. |"
]
},
{
"cell_type": "markdown",
"id": "70c2c22a",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Remove the `v` column from the table and return it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc1770f6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"display(tab.head())\n",
"print('\\n\\nPop the `v` column out of the table')\n",
"display(tab.pop(\"v\"))\n",
"print('\\n\\nUpdated Table')\n",
"display(tab.head())"
]
},
{
"cell_type": "markdown",
"id": "e4843e47",
"metadata": {},
"source": [
"Remove the `z` and `w` columns from the table and return them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c9dda2a",
"metadata": {},
"outputs": [],
"source": [
"display(tab.head())\n",
"print('\\n\\nPop the `z` and `w` columns out of the table')\n",
"display(tab.pop([\"z\", \"w\"]).head())\n",
"print('\\n\\nUpdated Table')\n",
"display(tab.head())"
]
},
{
"cell_type": "markdown",
"id": "68e67196",
"metadata": {},
"source": [
"### Table.rename()\n",
"\n",
"```\n",
"Table.rename(labels=None, index=None, columns=None, axis=None, copy=None, inplace=False, level=None, errors='ignore', mapper=None)\n",
"```\n",
"\n",
"Rename columns in a table and return the resulting Table object.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :------: | :----: | :------------------------------------------------------------------------------------------------------------------| :---:|\n",
"| labels | dict | Deprecated. Please use `mapper` keyword. | None |\n",
"| columns | dict | A dictionary of column name to new column name to use when renaming. | None |\n",
"| index | dict | A dictionary of index to new index name to use when renaming single key column keyed tables. | None |\n",
"| axis | {0 or 'index', 1 or 'columns'} | Designating the axis to be renamed by the _mapper_ dictionary. | None |\n",
"| copy | None | Not yet implemented. | None |\n",
"| inplace | bool | Not yet implemented. | None |\n",
"| level | None | Not yet implemented. | None |\n",
"| errors | string | Not yet implemented. | None |\n",
"| mapper | dict | A dictionary of either new index or column names to new names to be used in conjunction with the _axis_ parameter. | None |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given columns or indices renamed. |"
]
},
{
"cell_type": "markdown",
"id": "08c8748e",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"The initial table we will be renaming columns on and a keyed table to rename the index on."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e131bae9",
"metadata": {},
"outputs": [],
"source": [
"tab.head()\n",
"key_tab = kx.KeyedTable(data=tab) "
]
},
{
"cell_type": "markdown",
"id": "b5ef3e3d",
"metadata": {},
"source": [
"Rename column `x` to `index` and `y` to `symbol` using the `columns` keyword."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e03e5b8e",
"metadata": {},
"outputs": [],
"source": [
"tab.rename(columns={'x': 'index', 'y': 'symbol'}).head()"
]
},
{
"cell_type": "markdown",
"id": "6d25ea19",
"metadata": {},
"source": [
"Rename column `x` to `index` and `y` to `symbol` by setting the `axis` keyword."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a8da84c",
"metadata": {},
"outputs": [],
"source": [
"tab.rename({'x': 'index', 'y': 'symbol'}, axis = 1).head()"
]
},
{
"cell_type": "markdown",
"id": "9d887f84",
"metadata": {},
"source": [
"Rename index of a keyed table by using literal `index` as the `axis` parameter."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4619e64e",
"metadata": {},
"outputs": [],
"source": [
"key_tab.rename({0:\"a\", 1:\"b\"}, axis = 'index').head()"
]
},
{
"cell_type": "markdown",
"id": "fda14bd0-5be3-44f3-a5ba-36ab067eb384",
"metadata": {},
"source": [
"### Table.replace()\n",
"``` Table.replace(to_replace, value) ```\n",
"\n",
"Replace all values in a table with another given value.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-------: | :--- | :------------------------------------------------------------------------------------------| :-----: |\n",
"| to_replace| any | Value of element in table you wish to replace. | None |\n",
"| value | any | New value to perform replace with. | None |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :---: | :----------------------------------------------------------------- |\n",
"| Table | A table with the given elements replaced with new value. |"
]
},
{
"cell_type": "markdown",
"id": "d211a836-b74c-42df-9da4-b20896c6c1f7",
"metadata": {},
"source": [
"**Examples**\n",
"\n",
"Create an unkeyed `Table` and a `KeyedTable` with elements to be replaced."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bbbec511-0395-4be3-b9b4-e6d3c09a21a7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" | \n",
" b | \n",
" c | \n",
" d | \n",
" e | \n",
"
\n",
" \n",
" a | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 4 | \n",
" 1b | \n",
" a | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 0b | \n",
" b | \n",
" 2 | \n",
"
\n",
" \n",
" 3 | \n",
" 6 | \n",
" 1b | \n",
" c | \n",
" `a | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"pykx.KeyedTable(pykx.q('\n",
"a| b c d e \n",
"-| --------\n",
"2| 4 1 a 1 \n",
"2| 2 0 b 2 \n",
"3| 6 1 c `a\n",
"'))"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tab = kx.q('([] a:2 2 3; b:4 2 6; c:(1b;0b;1b); d:(`a;`b;`c); e:(1;2;`a))')\n",
"ktab = kx.q('([a:2 2 3]b:4 2 6; c:(1b;0b;1b); d:(`a;`b;`c); e:(1;2;`a))')\n",
"ktab"
]
},
{
"cell_type": "markdown",
"id": "cbfcf189-628d-45fe-ab85-2330b46fdcc9",
"metadata": {},
"source": [
"Replace all instances of `2` in the `KeyedTable` with `123`. Note the key column remains unchanged."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "3a36a978-022a-4e49-8191-05a768d5f30e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" | \n",
" b | \n",
" c | \n",
" d | \n",
" e | \n",
"
\n",
" \n",
" a | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 4 | \n",
" 1b | \n",
" a | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 123 | \n",
" 0b | \n",
" b | \n",
" 123 | \n",
"
\n",
" \n",
" 3 | \n",
" 6 | \n",
" 1b | \n",
" c | \n",
" `a | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"pykx.KeyedTable(pykx.q('\n",
"a| b c d e \n",
"-| -----------\n",
"2| 4 1 a 1 \n",
"2| 123 0 b 123\n",
"3| 6 1 c `a \n",
"'))"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ktab.replace(2,123)"
]
},
{
"cell_type": "markdown",
"id": "6cc51c70-af14-4061-bdd7-d2fa7d8df20b",
"metadata": {},
"source": [
"Replace all `True` values with a list of strings."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "a1b87680-f2aa-4434-bcb6-2f4b384b735c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
" c | \n",
" d | \n",
" e | \n",
"
\n",
" \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2 | \n",
" 4 | \n",
" `one`two`three | \n",
" a | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 2 | \n",
" 0b | \n",
" b | \n",
" 2 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
" `one`two`three | \n",
" c | \n",
" `a | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"pykx.Table(pykx.q('\n",
"a b c d e \n",
"-----------------------\n",
"2 4 `one`two`three a 1 \n",
"2 2 0b b 2 \n",
"3 6 `one`two`three c `a\n",
"'))"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tab.replace(True, (b\"one\", b\"two\", b\"three\"))"
]
},
{
"cell_type": "markdown",
"id": "73059996",
"metadata": {},
"source": [
"### Table.reset_index()\n",
"\n",
"```\n",
"Table.reset_index(levels, *,\n",
" drop=False, inplace=False,\n",
" col_level=0, col_fill='',\n",
" allow_duplicates=False,\n",
" names=None)\n",
"```\n",
"\n",
"Reset the keys/index of a keyed PyKX table. This can be used to remove/unset one or more keys within a table.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :-: | :-: | :-: | :-: |\n",
"| level | int, str or list | The name/indices of the keys to be reset within the table. | None |\n",
"| drop | Boolean | Should remaining key columns be removed from the table post index resetting. | False |\n",
"| inplace | Boolean | Not Yet Implemented | False |\n",
"| col_level | int or str | Not Yet Implemented | 0 |\n",
"| col_fill | object | Not Yet Implemented | '' |\n",
"| allow_duplicates | Boolean | Can duplicate columns be created | False |\n",
"| names | str or list | Not Yet Implemented | None |\n",
"\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :-: | :-: |\n",
"| Dataframe | The dataframe with table updated following index reset request |\n",
"\n",
"**Examples:**\n",
"\n",
"Generate data to be used for index resetting "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05f5d858",
"metadata": {},
"outputs": [],
"source": [
"N = 1000\n",
"qtab = kx.Table(data = {\n",
" 'x0': kx.random.random(N, ['a', 'b', 'c']),\n",
" 'x1': kx.random.random(N, ['d', 'e', 'f']),\n",
" 'x2': kx.random.random(N, ['g', 'h', 'i']),\n",
" 'y0': kx.random.random(N, 10.0),\n",
" 'y1': kx.random.random(N, 10.0),\n",
" 'y2': kx.random.random(N, kx.GUIDAtom.null)\n",
" }).set_index(['x0', 'x1', 'x2'])\n",
"qtab"
]
},
{
"cell_type": "markdown",
"id": "ac9a7e94",
"metadata": {},
"source": [
"Resetting the index of the table will result in original index columns being added to the table directly"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35f78f09",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index()"
]
},
{
"cell_type": "markdown",
"id": "ea62a377",
"metadata": {},
"source": [
"Reset the index adding a specified named column to the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c136c0f7",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index('x0')"
]
},
{
"cell_type": "markdown",
"id": "4a4223bb",
"metadata": {},
"source": [
"Reset the index using multiple named columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be677606",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index(['x0', 'x1'])"
]
},
{
"cell_type": "markdown",
"id": "535841af",
"metadata": {},
"source": [
"Reset the index specifying the column `number` which is to be added to the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3e6bda0",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index(0)"
]
},
{
"cell_type": "markdown",
"id": "80719030",
"metadata": {},
"source": [
"Reset the index specifying multiple numbered columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fab2e4e7",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index([0, 2])"
]
},
{
"cell_type": "markdown",
"id": "ed82d445",
"metadata": {},
"source": [
"Drop index columns from table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "945b8293",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"id": "db72bcbb",
"metadata": {},
"source": [
"Drop specified key columns from table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9646f1d",
"metadata": {},
"outputs": [],
"source": [
"qtab.reset_index('x0', drop=True)"
]
},
{
"cell_type": "markdown",
"id": "2201d826",
"metadata": {},
"source": [
"### Table.set_index()\n",
"\n",
"```\n",
"Table.set_index(\n",
" keys,\n",
" drop=True,\n",
" append=False,\n",
" inplace=False,\n",
" verify_integrity=False,\n",
")\n",
"```\n",
"\n",
"Add index/indexes to a Table/KeyedTable.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :--------------: | :--: | :----------------------------------------------------------------------- | :--------: |\n",
"| keys | Union[Symbol/SymbolVector/Table] | The key(s) or data to key on | _required_ |\n",
"| drop | bool | Not Yet Implemented | True |\n",
"| append | bool | Whether to append columns to existing index. | False | \n",
"| inplace | bool | Not Yet Implemented | False |\n",
"| verify_integrity | bool | Check the new index for duplicates | False | \n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :--------------------------------------------- |\n",
"| KeyedTable | The resulting table after the index is applied |\n",
"\n",
"**Examples:**\n",
"\n",
"Adding indexes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2ef05c3",
"metadata": {},
"outputs": [],
"source": [
"N = 10\n",
"tab = kx.Table(data={\n",
" 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n",
" 'price': 2.5 - kx.random.random(N, 5.0),\n",
" 'traded': 10 - kx.random.random(N, 20),\n",
" 'hold': kx.random.random(N, False)\n",
" })"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f561efd4",
"metadata": {},
"outputs": [],
"source": [
"#Setting a single index\n",
"tab.set_index('sym')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66f9b964",
"metadata": {},
"outputs": [],
"source": [
"#Setting multipe indexes\n",
"tab.set_index(['sym', 'traded'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00dda488",
"metadata": {},
"outputs": [],
"source": [
"#Pass a table as index (lengths must match)\n",
"status = kx.q('{select movement from ungroup select movement:`down`up 0<=deltas price by sym from x}',tab)\n",
"tab.set_index(status)"
]
},
{
"cell_type": "markdown",
"id": "965ef63a",
"metadata": {},
"source": [
"Appending:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf53a132",
"metadata": {},
"outputs": [],
"source": [
"#Default is false - previous index 'sym' deleted and replaced by 'hold'\n",
"tab.set_index('sym').set_index('hold')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19f88bde",
"metadata": {},
"outputs": [],
"source": [
"#append= True will retain 'sym' index and add 'hold' as second index\n",
"tab.set_index('sym').set_index('hold', append= True)"
]
},
{
"cell_type": "markdown",
"id": "7d605454",
"metadata": {},
"source": [
"Verify Integrity:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63c810f0",
"metadata": {},
"outputs": [],
"source": [
"#Will allow duplicates in index:\n",
"tab.set_index('sym')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "266dbc68",
"metadata": {},
"outputs": [],
"source": [
"#Will error as 'sym' has duplicates\n",
"try:\n",
" tab.set_index('sym', verify_integrity= True)\n",
"except kx.QError as e:\n",
" print(f'Caught Error: {e}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}