KDB.AI Python API Client

Overview

This is the API reference for the Python client to KDB.AI. For example usage, see the Quickstart guide.

Session

Session represents a connection to a KDB.AI instance.

Parameters:

Name Type Description Default
api_key str

API Key to be used for authentication.

None
host str

Host name or IP address of the KDB.AI server to connect to.

HOST
port int

REST API gateway port on KDB.AI server.

PORT
protocol str

http or https.

PROTOCOL
Example

Open a session on KDB.AI Cloud with an API key:

import kdbai_client as kdbai

session = kdbai.Session(api_key='YOUR_API_KEY')

Open a session on a custom KDB.AI instance on http://localhost:8082:

session = kdbai.Session(host='localhost', port=8082, protocol='http')
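Hard-coding connection settings is easy to get wrong, so one option is to read them from the environment. The sketch below only assembles the keyword arguments listed in the parameter table above. The helper name and the environment variable names (KDBAI_API_KEY, KDBAI_HOST, KDBAI_PORT, KDBAI_PROTOCOL) are hypothetical and are not defined by the client.

```python
import os
from typing import Mapping, Optional


def session_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Session keyword arguments from environment-style settings.

    The variable names are hypothetical; only the keys of the returned
    dict (api_key, host, port, protocol) come from the Session parameters.
    """
    env = os.environ if env is None else env
    kwargs = {}
    if env.get("KDBAI_API_KEY"):
        kwargs["api_key"] = env["KDBAI_API_KEY"]
    if env.get("KDBAI_HOST"):
        kwargs["host"] = env["KDBAI_HOST"]
    if env.get("KDBAI_PORT"):
        kwargs["port"] = int(env["KDBAI_PORT"])  # port is documented as int
    if env.get("KDBAI_PROTOCOL"):
        protocol = env["KDBAI_PROTOCOL"].lower()
        if protocol not in ("http", "https"):
            raise ValueError(f"protocol must be http or https, got {protocol!r}")
        kwargs["protocol"] = protocol
    return kwargs


# Example with explicit settings instead of the real environment:
print(session_kwargs({"KDBAI_HOST": "localhost",
                      "KDBAI_PORT": "8082",
                      "KDBAI_PROTOCOL": "http"}))
```

The result could then be passed on as kdbai.Session(**session_kwargs()); settings that are not present are simply left to the client defaults.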

config

Retrieve the server configuration.

Returns:

Type Description
Dict

A dict containing all metadata of the KDB.AI instance.

list

Retrieve the list of tables.

Returns:

Type Description
List[str]

A list of strings with the names of the existing tables.

Example

session.list()
["trade", "quote"]

table

Retrieve an existing table.

Parameters:

Name Type Description Default
name str

Name of the table to retrieve.

required

Returns:

Type Description
Table

A Table object representing the KDB.AI table.

Example

Retrieve the trade table:

table = session.table("trade")
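Because list, table and create_table are all methods of the session, they can be combined into a small "get or create" helper. This is a minimal sketch, not part of the client: the function name is hypothetical, and it assumes nothing creates or drops the table between the list() call and the call that follows it.

```python
def get_or_create_table(session, name, schema):
    """Return the table called `name`, creating it from `schema` if absent.

    `session` is expected to behave like the Session documented here:
    list() returns the table names, table(name) retrieves a table and
    create_table(name, schema) creates one.
    """
    if name in session.list():
        return session.table(name)
    return session.create_table(name, schema)
```

A call such as get_or_create_table(session, "trade", schema) would then return the existing trade table without touching its schema.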

create_table

Create a table with a schema.

Parameters:

Name Type Description Default
name str

Name of the table to create.

required
schema dict

Schema of the table to create. The schema must contain a list of columns. Every column must specify either a pytype or a qtype, except the column of vectors. One column of vector embeddings may also carry a vectorIndex attribute holding the configuration of the index for similarity search; this column is implicitly an array of float32.

required

Raises:

Type Description
KDBAIException

Raised when an error occurs during the creation of the table.

Example
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
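The schema rules described for the schema parameter can be checked locally before calling create_table. The following is a rough sketch of such a check under stated assumptions: the function name is hypothetical, and it reads "one column of vector embeddings" as "at most one column with a vectorIndex", which the text above does not state explicitly. The server remains the authority on what it accepts.

```python
def schema_problems(schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    columns = schema.get("columns")
    if not isinstance(columns, list) or not columns:
        return ["schema must contain a non-empty 'columns' list"]

    problems = []
    vector_columns = []
    for position, column in enumerate(columns):
        name = column.get("name")
        if not name:
            problems.append(f"column at position {position} has no name")
            name = f"<column {position}>"
        if "vectorIndex" in column:
            # The vector column is exempt from the pytype/qtype requirement.
            vector_columns.append(name)
        elif "pytype" not in column and "qtype" not in column:
            problems.append(f"column {name!r} needs a pytype or a qtype")
    if len(vector_columns) > 1:
        # Assumption: only one indexed vector column is allowed.
        problems.append(f"more than one vectorIndex column: {vector_columns}")
    return problems


schema = {"columns": [{"name": "id", "pytype": "str"},
                      {"name": "tag"},
                      {"name": "embeddings",
                       "vectorIndex": {"dims": 1536, "metric": "L2", "type": "flat"}}]}
print(schema_problems(schema))  # ["column 'tag' needs a pytype or a qtype"]
```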

Table

KDB.AI table.

A Table object should be created with session.create_table(...) or retrieved with session.table(...). Do not call this constructor directly.

schema

Retrieve the schema of the table.

Raises:

Type Description
KDBAIException

Raised when an error occurs during schema retrieval.

Returns:

Type Description
Dict

A dict containing the table name and the list of column names and appropriate numpy datatypes.

Example
table.schema()

{'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
             {'name': 'embeddings',
              'pytype': 'float32',
              'qtype': 'reals',
              'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
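The returned dict can be inspected with ordinary Python, for example to find the indexed column and to check query vectors against its dims before searching. This is a small sketch that assumes the dict has the shape shown in the example output; both helper names are hypothetical.

```python
def vector_index_info(schema: dict):
    """Return (column name, vectorIndex settings) of the indexed column, or None."""
    for column in schema.get("columns", []):
        if "vectorIndex" in column:
            return column["name"], column["vectorIndex"]
    return None


def wrong_length_vectors(schema: dict, vectors: list) -> list:
    """Return the positions of query vectors whose length differs from 'dims'."""
    info = vector_index_info(schema)
    if info is None:
        raise ValueError("schema has no column with a vectorIndex")
    dims = info[1]["dims"]
    return [i for i, vector in enumerate(vectors) if len(vector) != dims]


schema = {"columns": [{"name": "id", "pytype": "str", "qtype": "symbol"},
                      {"name": "embeddings", "pytype": "float32", "qtype": "reals",
                       "vectorIndex": {"dims": 3, "metric": "L2", "type": "flat"}}]}
print(vector_index_info(schema)[0])                       # embeddings
print(wrong_length_vectors(schema, [[0, 0, 0], [1, 1]]))  # [1]
```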

train

Train the index (IVF and IVFPQ only).

Parameters:

Name Type Description Default
data DataFrame

Pandas dataframe with column names/types matching the target table.

required
warn bool

If True, display a warning when data has a trivial index, which will be dropped before training.

True

Raises:

Type Description
KDBAIException

Raised when an error occurs during training.
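Since training applies only to IVF and IVFPQ indexes, code that works with several tables may want to decide from the schema whether to call train at all. The sketch below is one way to do that. The spellings "ivf" and "ivfpq" for the index type are an assumption (this reference only shows 'flat' as a concrete value), so the comparison is case-insensitive, and the helper name is hypothetical.

```python
# Assumed spellings of the trainable index types; compared case-insensitively.
TRAINABLE_INDEX_TYPES = {"ivf", "ivfpq"}


def needs_training(schema: dict) -> bool:
    """Return True if the schema's vectorIndex is of a type that is trained."""
    for column in schema.get("columns", []):
        index = column.get("vectorIndex")
        if index and str(index.get("type", "")).lower() in TRAINABLE_INDEX_TYPES:
            return True
    return False


flat = {"columns": [{"name": "embeddings",
                     "vectorIndex": {"dims": 8, "metric": "L2", "type": "flat"}}]}
print(needs_training(flat))  # False
```

With a live table this could be used as: if needs_training(table.schema()): table.train(df).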

insert

Insert data into the table.

Parameters:

Name Type Description Default
data DataFrame

Pandas dataframe with column names/types matching the target table.

required
warn bool

If True, display a warning when data has a trivial index, which will be dropped before insertion.

True

Examples:

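from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

# Build ROWS random rows; "price" holds one float32 vector of length DIMS per row.
# int(...) is needed because timedelta does not accept numpy integer types.
data = {
    "time": [timedelta(microseconds=int(np.random.randint(0, int(1e10)))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)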

Raises:

Type Description
KDBAIException

Raised when an error occurs during insert.
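For large dataframes it can be convenient to insert in several smaller calls. This reference does not state whether the server limits the size of a single insert, so treat the following as an optional pattern rather than a requirement. The helper name and the batch size are hypothetical.

```python
def batch_ranges(n_rows: int, batch_size: int):
    """Yield (start, stop) row positions covering n_rows in order, without gaps."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


print(list(batch_ranges(5, 2)))  # [(0, 2), (2, 4), (4, 5)]
```

With a pandas dataframe df, each range could be sent as table.insert(df.iloc[start:stop].reset_index(drop=True)), where start and stop come from batch_ranges(len(df), 1000).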

query

Query data from the table.

Parameters:

Name Type Description Default
filter Optional[List[list]]

A list of filter conditions as triplets in the following format: [['function', 'column name', 'parameter'], ... ]. See all filter operators here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-filter-functions

None
group_by Optional[str]

A list of column names to use for group by.

None
aggs Optional[List[list]]

Either a list of column names to select or a list of aggregations to perform, given as triplets in the following form: [['output_column', 'agg_function', 'input_column'], ... ]. See all aggregation functions here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-aggregations

None
sort_by Optional[List[str]]

List of column names to sort on.

None
fill Optional[str]

How to handle null values. Must be 'forward', 'zero' or None.

None
start_time Optional[str]

Start of the time interval to query as an ISO 8601 formatted string (start_time included in the time range). kdbai_client.MIN_DATETIME corresponds to the minimum datetime supported.

None
end_time Optional[str]

End of the time interval to query as an ISO 8601 formatted string (end_time excluded from the time range). kdbai_client.MAX_DATETIME corresponds to the maximum datetime supported.

None
input_timezone Optional[str]

The timezone of start_time and end_time. Defaults to UTC if not specified.

None
output_timezone Optional[str]

The timezone of output timestamp columns. Defaults to UTC if not specified.

None

Examples:

table.query(group_by = ['sensorID', 'qual'])
table.query(filter = [['within', 'qual', [0, 2]]])
table.query(start_time='2000-05-26', end_time='2000-05-27')

Raises:

Type Description
KDBAIException

Raised when an error occurs during query.

Returns:

Type Description
DataFrame

Pandas dataframe with the query results.
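Because start_time is included and end_time is excluded, consecutive time windows can share a boundary without any row being returned twice. The sketch below generates such windows as ISO 8601 strings with the standard library. The helper name is hypothetical, and the exact ISO 8601 variants the server accepts are not listed in this reference, so the output format of datetime.isoformat() is an assumption.

```python
from datetime import datetime, timedelta


def daily_windows(first_day: datetime, days: int):
    """Yield start_time/end_time keyword arguments for consecutive days."""
    for offset in range(days):
        start = first_day + timedelta(days=offset)
        end = start + timedelta(days=1)
        yield {"start_time": start.isoformat(), "end_time": end.isoformat()}


windows = list(daily_windows(datetime(2000, 5, 26), 2))
print(windows[0])  # {'start_time': '2000-05-26T00:00:00', 'end_time': '2000-05-27T00:00:00'}
```

Each window could be passed as table.query(**window); the end of one window equals the start of the next.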

search

Perform similarity search on the table.

Parameters:

Name Type Description Default
vectors list of lists

Query vectors for the search.

required
n int

Number of neighbours to return.

1
index_options dict

Index specific options for similarity search.

None
distances str

Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table.

None
filter Optional[List[list]]

A list of filter conditions as triplets in the following format: [('function', 'column name', 'parameter'), ... ]. See all filter operators here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-filter-functions

None
group_by Optional[str]

A list of column names to use for group by.

None
aggs Optional[List[list]]

Either a list of column names to select or a list of aggregations to perform, given as triplets in the following form: [('output_column', 'agg_function', 'input_column'), ... ]. See all aggregation functions here: https://code.kx.com/insights/1.6/api/database/query/get-data.html#supported-aggregations

None
sort_by Optional[List[str]]

List of column names to sort on.

None
start_time Optional[str]

Start of the time interval to query as an ISO 8601 formatted string (start_time included in the time range). kdbai_client.MIN_DATETIME corresponds to the minimum datetime supported.

None
end_time Optional[str]

End of the time interval to query as an ISO 8601 formatted string (end_time excluded from the time range). kdbai_client.MAX_DATETIME corresponds to the maximum datetime supported.

None
input_timezone Optional[str]

The timezone of start_time and end_time. Defaults to UTC if not specified.

None
output_timezone Optional[str]

The timezone of output timestamp columns. Defaults to UTC if not specified.

None

Examples:

# Find the closest neighbour of a single query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)

# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)

# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[('sumSize', 'sum', 'size')],
             group_by=['sym'],
             sort_by=['sumSize'])

# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[('within', 'size', (5, 999)), ('like', 'sym', 'AAP*')])

# More advanced options
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             input_timezone='UTC',
             output_timezone='Europe/Zurich',
             start_time='2023.08.01D12:13:14.123456789',
             end_time='2030.08.30D12:13:14.123456789',
             distances='myDist')
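To make the meaning of "n closest neighbours" concrete, here is a brute-force version of the idea in plain Python for the L2 metric. It is an illustration only and not how the server implements search: it runs locally on a list of vectors, and whether the server reports plain or squared Euclidean distances is not stated in this reference (the sketch uses plain Euclidean distance). The function name is hypothetical.

```python
import math


def l2_nearest(stored, query, n=1):
    """Return the n (row_index, distance) pairs closest to `query`."""
    scored = [(math.dist(row, query), index) for index, row in enumerate(stored)]
    scored.sort()  # smallest distance first; ties broken by row index
    return [(index, distance) for distance, index in scored[:n]]


stored = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(l2_nearest(stored, [0.0, 0.0], n=2))  # [(0, 0.0), (2, 1.0)]
```

A flat index is expected to give the same neighbours as an exhaustive scan like this one, whereas approximate index types may return slightly different results.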

Raises:

Type Description
KDBAIException

Raised when an error occurs during search.

Returns:

Type Description
List[DataFrame]

List of Pandas dataframes, with one dataframe of matching neighbours for each query vector.

drop

Drop the table.

Examples:

table.drop()

Raises:

Type Description
KDBAIException

Raised when an error occurs during the table deletion.

KDBAIException

Bases: Exception

KDB.AI exception.
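Since every Table method documented here raises KDBAIException on failure, callers can wrap calls in a single handler. The sketch below shows a generic retry wrapper as one possible pattern. It is written against any exception type so that it runs without a server; the function name and the retry policy are hypothetical, and not every KDBAIException is necessarily worth retrying (a malformed schema, for instance, will fail every time).

```python
import time


def call_with_retry(operation, retry_on, attempts=3, delay_seconds=0.5):
    """Call operation(); on `retry_on`, wait and try again up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay_seconds * attempt)  # simple linear back-off
```

Assuming the exception class is importable from the client package as kdbai.KDBAIException, a read-only call could be wrapped as call_with_retry(lambda: table.query(), kdbai.KDBAIException). Retrying insert is riskier, because a call that failed after the server applied it could insert the rows twice.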