KDB.AI Python API Client

Overview

This is the API reference for the Python client to KDB.AI. For example usage, see the Quickstart guide.

`Session`

Session represents a connection to a KDB.AI instance.

Parameters:

Name	Type	Description	Default
`api_key`	`str`	API Key to be used for authentication.	`None`
`host`	`str`	Host name or IP address of the KDB.AI server to connect.	`HOST`
`port`	`int`	REST API gateway port on KDB.AI server.	`PORT`
`protocol`	`str`	`http` or `https`.	`PROTOCOL`

Example

Open a session on KDB.AI Cloud with an API key:

import kdbai_client as kdbai

session = kdbai.Session(api_key='YOUR_API_KEY')

Open a session on a custom KDB.AI instance on http://localhost:8082:

session = kdbai.Session(host='localhost', port=8082, protocol='http')
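The second example suggests that `protocol`, `host` and `port` combine into the base URL of the REST gateway. The sketch below illustrates that composition; `build_base_url` is a hypothetical helper written for this page, not a function of the client, and the client's actual URL handling may differ.

```python
def build_base_url(host: str, port: int, protocol: str) -> str:
    """Hypothetical helper (not part of the KDB.AI client): compose a base URL."""
    # The parameter table only allows these two protocol values.
    if protocol not in ("http", "https"):
        raise ValueError(f"protocol must be 'http' or 'https', got {protocol!r}")
    # Valid TCP ports are 1..65535.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"{protocol}://{host}:{port}"


base_url = build_base_url("localhost", 8082, "http")
print(base_url)  # http://localhost:8082
```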

`config`

Retrieve the server configuration.

Returns:

Type	Description
`Dict`	A `dict` containing all metadata of the KDB.AI instance.

`list`

Retrieve the list of tables.

Returns:

Type	Description
`List[str]`	A list of strings with the names of the existing tables.

Example

session.list()
["trade", "quote"]

`table`

Retrieve an existing table.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the table to retrieve.	required

Returns:

Type	Description
`Table`	A `Table` object representing the KDB.AI table.

Example

Retrieve the trade table:

table = session.table("trade")

`create_table`

Create a table with a schema.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the table to create.	required
`schema`	`dict`	Schema of the table to create. This schema must contain a list of columns. All columns must have either a `pytype` or a `qtype` specified except the column of vectors. One column of vector embeddings may also have a `vectorIndex` attribute with the configuration of the index for similarity search - this column is implicitly an array of `float32`.	required

Raises:

Type	Description
`KDBAIException`	Raised when an error happens during the creation of the table.

Example

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
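The rules in the `schema` description can be checked locally before calling `create_table`. The following is a minimal sketch under that reading of the description; `validate_schema` is a hypothetical helper, not part of the client, and the server may apply further checks that are not modelled here.

```python
def validate_schema(schema: dict) -> list:
    """Hypothetical client-side check (not part of the KDB.AI client).

    Returns a list of problems; an empty list means the schema satisfies
    the rules stated in the `schema` parameter description.
    """
    columns = schema.get("columns")
    if not isinstance(columns, list):
        return ["schema must contain a list under 'columns'"]

    problems = []
    # The description allows a vectorIndex on one column of embeddings.
    vector_columns = [c for c in columns if "vectorIndex" in c]
    if len(vector_columns) > 1:
        problems.append("only one column may carry a 'vectorIndex'")

    for col in columns:
        has_type = "pytype" in col or "qtype" in col
        # Only the vector column may omit both pytype and qtype.
        if not has_type and "vectorIndex" not in col:
            problems.append(f"column {col.get('name')!r} needs a 'pytype' or 'qtype'")
    return problems


schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
print(validate_schema(schema))  # []
```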

`Table`

KDB.AI table.

A `Table` object should be created with `session.create_table(...)` or retrieved with `session.table(...)`. Do not call this constructor directly.

`schema`

Retrieve the schema of the table.

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during schema retrieval.

Returns:

Type	Description
`Dict`	A `dict` containing the table name and the list of column names and appropriate numpy datatypes.

Example

table.schema()

{'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
              {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
              {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
              {'name': 'embeddings',
               'pytype': 'float32',
               'qtype': 'reals',
               'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
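A returned schema of this shape can be inspected with ordinary dictionary access. As a small sketch, the hypothetical helper `vector_column` (not part of the client) locates the column that carries a `vectorIndex` and reads its `dims`; the literal `schema` below simply copies the example output.

```python
from typing import Optional, Tuple


def vector_column(schema: dict) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (column name, dims) of the indexed vector column."""
    for col in schema.get("columns", []):
        index = col.get("vectorIndex")
        if index is not None:
            return col["name"], index["dims"]
    # No column carries a vectorIndex.
    return None


# Same shape as the example output of table.schema().
schema = {'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
                      {'name': 'embeddings',
                       'pytype': 'float32',
                       'qtype': 'reals',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
print(vector_column(schema))  # ('embeddings', 1536)
```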

`train`

Train the index (IVF and IVFPQ only).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Pandas dataframe with column names/types matching the target table.	required
`warn`	`bool`	If True, display a warning when `data` has a trivial index, which will be dropped before training.	`True`

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during training.

`insert`

Insert data into the table.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Pandas dataframe with column names/types matching the target table.	required
`warn`	`bool`	If True, display a warning when `data` has a trivial index, which will be dropped before insertion.	`True`

Examples:

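from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

# Build ROWS random records; 'price' holds a float32 vector of length DIMS.
# int(...) is needed because timedelta does not accept numpy integer types.
data = {
    "time": [timedelta(microseconds=int(np.random.randint(0, int(1e10)))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)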

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during insert.

`query`

Query data from the table.

Parameters:

Name	Type	Description	Default
`filter`	`Optional[List[list]]`	A list of filter conditions as triplets in the following format: `[['function', 'column name', 'parameter'], ... ]` See all filter operators here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-filter-functions	`None`
`group_by`	`Optional[str]`	A list of column names to use for group by.	`None`
`aggs`	`Optional[List[list]]`	Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: `[['output_column', 'agg_function', 'input_column'], ... ]` See all aggregation functions here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-aggregations	`None`
`sort_by`	`Optional[List[str]]`	List of column names to sort on.	`None`
`fill`	`Optional[str]`	How to handle null values. Must be `'forward'`, `'zero'`, or `None`.	`None`
`start_time`	`Optional[str]`	Start of the time interval to query as an ISO 8601 formatted string (`start_time` included in the time range). `kdbai_client.MIN_DATETIME` corresponds to the minimum datetime supported.	`None`
`end_time`	`Optional[str]`	End of the time interval to query as an ISO 8601 formatted string (`end_time` excluded in the time range). `kdbai_client.MAX_DATETIME` corresponds to the maximum datetime supported.	`None`
`input_timezone`	`Optional[str]`	Timezone of `start_time` and `end_time`; defaults to UTC if not specified.	`None`
`output_timezone`	`Optional[str]`	Timezone of the output timestamp columns; defaults to UTC if not specified.	`None`

Examples:

table.query(group_by = ['sensorID', 'qual'])
table.query(filter = [['within', 'qual', [0, 2]]])
table.query(start_time='2000-05-26', end_time='2000-05-27')
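Both `filter` and the aggregation form of `aggs` are lists of three-element entries, so a malformed entry is easy to catch before the request is sent. The sketch below assumes a hypothetical helper `check_triplets` (not part of the client) and reuses the `qual` and `sensorID` column names from the examples above; the output column name `sumQual` is made up for illustration. It does not cover the case where `aggs` is a plain list of column names.

```python
def check_triplets(items, label):
    """Hypothetical helper: ensure every entry is a 3-element list or tuple."""
    for item in items:
        if not isinstance(item, (list, tuple)) or len(item) != 3:
            raise ValueError(f"{label} entry must have 3 elements: {item!r}")
    # Normalise tuples to lists so both spellings produce the same payload.
    return [list(item) for item in items]


# ['function', 'column name', 'parameter']
filter_ = check_triplets([["within", "qual", [0, 2]]], "filter")
# ['output_column', 'agg_function', 'input_column']
aggs = check_triplets([("sumQual", "sum", "qual")], "aggs")

query_kwargs = {"filter": filter_, "aggs": aggs, "group_by": ["sensorID"]}
# With a real table these could be passed as table.query(**query_kwargs).
print(query_kwargs)
```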

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during query.

Returns:

Type	Description
`DataFrame`	Pandas dataframe with the query results.
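The `start_time` and `end_time` descriptions define a half-open interval: the start is included and the end is excluded. The sketch below illustrates that rule with the standard library only; `in_query_interval` is a hypothetical function, timezones are ignored, and it makes no claim about how the server evaluates the range.

```python
from datetime import datetime


def in_query_interval(timestamp: str, start_time: str, end_time: str) -> bool:
    """Illustrative only: True when start_time <= timestamp < end_time."""
    ts = datetime.fromisoformat(timestamp)
    return datetime.fromisoformat(start_time) <= ts < datetime.fromisoformat(end_time)


# A date-only ISO 8601 string is parsed as midnight of that day.
print(in_query_interval("2000-05-26T00:00:00", "2000-05-26", "2000-05-27"))  # True
print(in_query_interval("2000-05-26T23:59:59", "2000-05-26", "2000-05-27"))  # True
print(in_query_interval("2000-05-27T00:00:00", "2000-05-26", "2000-05-27"))  # False
```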

`search`

Perform similarity search on the table.

Parameters:

Name	Type	Description	Default
`vectors`	`list of lists`	Query vectors for the search.	required
`n`	`int`	Number of neighbours to return.	`1`
`index_options`	`dict`	Index specific options for similarity search.	`None`
`distances`	`str`	Optional name of a column to output the distances. If not specified, `__nn_distance` will be added as an extra column to the result table.	`None`
`filter`	`Optional[List[list]]`	A list of filter conditions as triplets in the following format: `[('function', 'column name', 'parameter'), ... ]` See all filter operators here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-filter-functions	`None`
`group_by`	`Optional[str]`	A list of column names to use for group by.	`None`
`aggs`	`Optional[List[list]]`	Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: `[('output_column', 'agg_function', 'input_column'), ... ]` See all aggregation functions here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-aggregations	`None`
`sort_by`	`Optional[List[str]]`	List of column names to sort on.	`None`
`start_time`	`Optional[str]`	Start of the time interval to query as an ISO 8601 formatted string (`start_time` included in the time range). `kdbai_client.MIN_DATETIME` corresponds to the minimum datetime supported.	`None`
`end_time`	`Optional[str]`	End of the time interval to query as an ISO 8601 formatted string (`end_time` excluded in the time range). `kdbai_client.MAX_DATETIME` corresponds to the maximum datetime supported.	`None`
`input_timezone`	`Optional[str]`	Timezone of `start_time` and `end_time`; defaults to UTC if not specified.	`None`
`output_timezone`	`Optional[str]`	Timezone of the output timestamp columns; defaults to UTC if not specified.	`None`

Examples:

# Find the closest neighbour of a single query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)

# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)

# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[('sumSize','sum','size')],
             group_by=['sym'],
             sort_by=['sumSize'])

# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[('within','size',(5,999)),('like','sym','AAP*')])

# More advanced options
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             input_timezone='UTC',
             output_timezone='Europe/Zurich',
             start_time='2023.08.01D12:13:14.123456789',
             end_time='2030.08.30D12:13:14.123456789',
             distances='myDist')

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during search.

Returns:

Type	Description
`List[DataFrame]`	List of Pandas dataframes with one dataframe of matching neighbours for each query vector.
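To make the shape of the result concrete, the sketch below runs a brute-force nearest-neighbour search in plain Python and returns one result list per query vector, mirroring the one-dataframe-per-query-vector layout described above. It is an illustration only, not KDB.AI's implementation: `flat_l2_search` is a hypothetical function, it uses plain Euclidean distance, and whether the service reports squared or unsquared L2 distances is not stated here.

```python
import math


def flat_l2_search(rows, vectors, n=1):
    """Illustrative brute-force search, not KDB.AI's implementation.

    rows: list of stored vectors; vectors: list of query vectors.
    Returns one result list per query vector, each holding up to n
    (row_index, distance) pairs ordered from nearest to farthest.
    """
    results = []
    for query in vectors:
        # Score every stored row against the current query vector.
        scored = [(i, math.dist(query, row)) for i, row in enumerate(rows)]
        scored.sort(key=lambda pair: pair[1])
        results.append(scored[:n])
    return results


rows = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
results = flat_l2_search(rows, vectors=[[0.0, 0.0], [3.0, 3.0]], n=2)
for neighbours in results:
    print(neighbours)
```

With two query vectors and `n=2`, `results` holds two lists of two neighbours each.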

`drop`

Drop the table.

Examples:

table.drop()

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during the table deletion.

`KDBAIException`

Bases: Exception

KDB.AI exception.
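Because `KDBAIException` derives from `Exception`, it can be caught with an ordinary `try`/`except`. The sketch below shows the pattern with a local stand-in class so that it runs without the client installed; the failing operation and its error message are hypothetical, and real code would catch the exception class provided by the client instead.

```python
class KDBAIException(Exception):
    """Local stand-in so this sketch runs without the client installed."""


def drop_missing_table():
    """Hypothetical operation that fails the way a client call might."""
    raise KDBAIException("table does not exist")


def run_safely(operation):
    """Run operation and return (ok, message) instead of propagating the error."""
    try:
        operation()
    except KDBAIException as exc:
        return False, str(exc)
    return True, ""


print(run_safely(drop_missing_table))  # (False, 'table does not exist')
```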