Python API Client

This section contains the API reference for the KDB.AI Python client. For example usage, see the Quickstart Guide.

`Session`

Session represents a connection to a KDB.AI instance.

Parameters:

Name	Type	Description	Default
`api_key`	`str`	API key used for authentication.	`None`
`endpoint`	`str`	Server endpoint to connect to.	`'http://localhost:8082'`

Example

Open a session on KDB.AI Cloud with an API key:

import kdbai_client as kdbai
session = kdbai.Session(endpoint='YOUR_INSTANCE_ENDPOINT', api_key='YOUR_API_KEY')

Open a session on a custom KDB.AI instance running at http://localhost:8082:

session = kdbai.Session(endpoint='http://localhost:8082')
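If the same code has to run against both KDB.AI Cloud and a local instance, the two `Session` calls above can be driven by configuration instead. This is a minimal sketch; the helper `session_kwargs` and the environment variable names `KDBAI_ENDPOINT` and `KDBAI_API_KEY` are hypothetical choices, not part of the client.

```python
import os


def session_kwargs(default_endpoint: str = "http://localhost:8082") -> dict:
    """Build keyword arguments for kdbai.Session from environment variables.

    KDBAI_ENDPOINT and KDBAI_API_KEY are hypothetical names chosen for this
    sketch; they are not part of the documented client API.
    """
    kwargs = {"endpoint": os.environ.get("KDBAI_ENDPOINT", default_endpoint)}
    api_key = os.environ.get("KDBAI_API_KEY")
    if api_key:  # only pass an API key when one is configured, e.g. for KDB.AI Cloud
        kwargs["api_key"] = api_key
    return kwargs
```

With the client installed, `session = kdbai.Session(**session_kwargs())` then opens a session without hard-coding the key in source.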

`list`

Retrieve the list of tables.

Returns:

Type	Description
`List[str]`	A list of strings with the names of the existing tables.

Example

session.list()
["trade", "quote"]

`table`

Retrieve an existing table, such as one created in a previous session.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the table to retrieve.	required

Returns:

Type	Description
`Table`	A `Table` object representing the KDB.AI table.

Example

Retrieve the `trade1` table created in an earlier session:

session1 = kdbai.Session(endpoint='http://localhost:8082') # Previous session
table1 = session1.create_table('trade1', schema)           # Create table 'trade1'

session2 = kdbai.Session(endpoint='http://localhost:8082') # Current session
table2 = session2.table("trade1")                          # Retrieve table 'trade1'
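Because `session.table` only retrieves tables that already exist, a script that may run more than once can combine it with `session.list` and `session.create_table`. The helper below is a hypothetical sketch that uses only those three documented methods.

```python
def get_or_create_table(session, name, schema):
    """Return the table called `name`, creating it from `schema` if it is missing.

    Hypothetical helper built only from session.list(), session.table() and
    session.create_table(), as documented in this reference.
    """
    if name in session.list():  # list() returns the names of existing tables
        return session.table(name)
    return session.create_table(name, schema)
```

The check and the creation are separate calls, so two processes racing on the same name could both reach `create_table`; the second would then presumably see a `KDBAIException`.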

`create_table`

Create a table with a schema.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the table to create.	required
`schema`	`dict`	Schema of the table to create. The schema must contain a list of columns. Every column must specify either a `pytype` or a `qtype`, except the column of vector embeddings. One column of vector embeddings may also have a `vectorIndex` attribute with the configuration of the index used for similarity search; this column is implicitly an array of `float32`. A sparse column instead carries a `sparseIndex` attribute, as in the examples below.	required

Returns:

Type	Description
`Table`	A newly created `Table` object based on the schema.

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during the creation of the table.

Example Flat Index

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)

Example IVF Index

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 1000,
                                       'metric': 'CS',
                                       'type': 'ivf',
                                       'nclusters': 10}}]}
table = session.create_table('documents', schema)

Example IVFPQ Index

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 5000,
                                       'metric': 'L2',
                                       'type': 'ivfpq',
                                       'nclusters': 50,
                                       'nsplits': 8,
                                       'nbits': 8}}]}
table = session.create_table('documents', schema)

Example HNSW Index

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'IP',
                                       'type': 'hnsw',
                                       'efConstruction': 8,
                                       'M': 8}}]}
table = session.create_table('documents', schema)

Example Sparse Index

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'sparseIndex': {'k': 1.25, 'b': 0.75}}]}
table = session.create_table('documents', schema)
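The `k` and `b` values in `sparseIndex` look like the two Okapi BM25 parameters (usually written k1 and b), which suggests, though this reference does not state it, that the sparse index ranks documents with BM25. Under that assumption, the sketch below shows one query term's contribution to a document's score: `k` limits how much repeated occurrences of a term help, and `b` sets how strongly longer documents are penalised. `bm25_term_weight` is a hypothetical helper, not a client function.

```python
import math


def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k=1.25, b=0.75):
    """Okapi BM25 score contributed by one query term to one document.

    tf: occurrences of the term in the document; df: documents containing it;
    n_docs: corpus size; doc_len / avg_doc_len: document length vs. the average.
    """
    # Inverse document frequency (the non-negative variant used by e.g. Lucene).
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b = 0 disables length normalisation; b = 1 applies it fully.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k controls how quickly extra occurrences saturate.
    return idf * tf * (k + 1) / (tf + k * length_norm)
```

As `tf` grows, the weight approaches `idf * (k + 1)` but never exceeds it, which is why a larger `k` lets repeated terms keep contributing for longer.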

Example Flat + Sparse Indexes

schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'denseCol',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'L2',
                                       'type': 'flat'}},
                      {'name': 'sparseCol',
                       'sparseIndex': {'k': 1.25,
                                       'b': 0.75}}]}
table = session.create_table('documents', schema)
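The column-typing rule from the `schema` parameter can be checked on the client before calling `create_table`. The hypothetical helper below encodes only that rule, treating a `sparseIndex` column like the vector column as the examples above do; the server remains the authority on which schemas it accepts.

```python
def schema_problems(schema: dict) -> list:
    """List problems in a create_table schema (client-side sketch only)."""
    columns = schema.get("columns")
    if not isinstance(columns, list):
        return ["schema must contain a 'columns' list"]
    problems = []
    for col in columns:
        name = col.get("name")
        if not name:
            problems.append("every column needs a 'name'")
        # Index columns (vectorIndex / sparseIndex) are exempt from the type rule.
        has_index = "vectorIndex" in col or "sparseIndex" in col
        if not has_index and "pytype" not in col and "qtype" not in col:
            problems.append(f"column {name!r} needs a 'pytype' or a 'qtype'")
    return problems
```

An empty result means the schema passes this particular check, not that the server will accept it.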

`Table`

KDB.AI table.

Create `Table` objects with `session.create_table(...)` or retrieve them with `session.table(...)`; do not call this constructor directly.

`schema`

Retrieve the schema of the table.

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during schema retrieval

Returns:

Type	Description
`Dict`	A `dict` with a `columns` list; each entry gives a column's name, its Python type (`pytype`) and q type (`qtype`), plus any index configuration.

Example

table.schema()

{'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
              {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
              {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
              {'name': 'embeddings',
               'pytype': 'float32',
               'qtype': 'reals',
               'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
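Since `schema()` returns a plain `dict`, ordinary Python is enough to inspect it. The two hypothetical helpers below, written against the structure of the output above, map each column to its `pytype`/`qtype` pair and list the indexed columns.

```python
def column_types(schema: dict) -> dict:
    """Map each column name to its (pytype, qtype) pair."""
    return {c["name"]: (c.get("pytype"), c.get("qtype")) for c in schema["columns"]}


def index_columns(schema: dict) -> list:
    """Names of the columns that carry a vectorIndex or a sparseIndex."""
    return [c["name"] for c in schema["columns"] if "vectorIndex" in c or "sparseIndex" in c]
```

For the output shown above, `index_columns(table.schema())` would give `['embeddings']`.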

`train`

Train the index (IVF and IVFPQ only).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	pandas DataFrame with column names and types matching the target table.	required
`warn`	`bool`	If True, display a warning when `data` has a trivial pandas index, which will be dropped before training.	`True`

Returns:

Type	Description
`str`	A string describing the status after training.

Examples:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.train(df)

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during training.

`insert`

Insert data into the table.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	pandas DataFrame with column names and types matching the target table.	required
`warn`	`bool`	If True, display a warning when `data` has a trivial pandas index, which will be dropped before insertion.	`True`

Returns:

Type	Description
`bool`	A boolean which is True if the insertion was successful.

Examples:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during insert.
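The `warn` parameter of `train` and `insert` refers to a "trivial" pandas index. This reference does not define the term; one plausible reading, assumed in the sketch below, is pandas' default `RangeIndex` running from 0 to n-1, which carries no information. `has_default_index` is a hypothetical helper for checking a DataFrame before sending it.

```python
import pandas as pd


def has_default_index(df: pd.DataFrame) -> bool:
    """True when df still has pandas' default 0..n-1 RangeIndex."""
    idx = df.index
    return isinstance(idx, pd.RangeIndex) and idx.start == 0 and idx.step == 1
```

If the index does carry data (for example after `df.set_index('time')`), `df.reset_index()` turns it back into an ordinary column, so it can be sent like any other column.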

`query`

Query data from the table.

Parameters:

Name	Type	Description	Default
`filter`	`Optional[List[list]]`	A list of filter conditions as triplets in the following format: `[['function', 'column name', 'parameter'], ... ]`. See the filter documentation for all supported operators.	`None`
`group_by`	`Optional[List[str]]`	A list of column names to group by.	`None`
`aggs`	`Optional[List[list]]`	Either a list of column names to select, or a list of aggregations to perform as triplets in the following format: `[['output_column', 'agg_function', 'input_column'], ... ]`. See the aggregation documentation for all supported functions.	`None`
`sort_by`	`Optional[List[str]]`	List of column names to sort on.	`None`
`fill`	`Optional[str]`	This defines how to handle null values. This should be either `'forward'` or `'zero'` or `None`.	`None`

Returns:

Type	Description
`DataFrame`	pandas DataFrame with the query results.

Examples:

# Group by two columns
table.query(group_by=['sensorID', 'qual'])

# Keep rows whose 'qual' value lies within [0, 2]
table.query(filter=[['within', 'qual', [0, 2]]])

# Select a subset of columns
table.query(aggs=['size'])
table.query(aggs=['size', 'price'])

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during query.
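To make the filter and aggregation formats concrete, here is a rough in-memory analogue written with pandas. `local_query` is hypothetical and covers only the `within` and `like` filters used in these examples, assuming `within` is inclusive on both ends (as in q) and `like` treats `*` as a wildcard; it does not reproduce server behaviour such as `fill`, and it applies `group_by` only together with aggregation triplets.

```python
from fnmatch import fnmatchcase

import pandas as pd


def local_query(df, filter=None, group_by=None, aggs=None, sort_by=None):
    """Roughly mimic query() parameters on an in-memory DataFrame (illustration only)."""
    out = df
    # Each filter is a ['function', 'column name', 'parameter'] triplet; all must hold.
    for op, col, param in filter or []:
        if op == "within":
            lo, hi = param
            out = out[out[col].between(lo, hi)]  # inclusive on both ends
        elif op == "like":
            out = out[out[col].map(lambda v: fnmatchcase(str(v), param))]  # '*' wildcard
        else:
            raise NotImplementedError(f"filter {op!r} is not covered by this sketch")
    if aggs and all(isinstance(a, (list, tuple)) for a in aggs):
        # Triplets are ['output_column', 'agg_function', 'input_column'].
        if group_by:
            named = {name: (src, fn) for name, fn, src in aggs}
            out = out.groupby(group_by).agg(**named).reset_index()
        else:
            out = pd.DataFrame({name: [out[src].agg(fn)] for name, fn, src in aggs})
    elif aggs:
        out = out[list(aggs)]  # plain list of column names: select a subset
    if sort_by:
        out = out.sort_values(list(sort_by))  # ascending order assumed
    return out.reset_index(drop=True)
```

For example, `local_query(df, group_by=['sym'], aggs=[['sumSize', 'sum', 'size']])` sums `size` per `sym` into a column named `sumSize`.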

`search`

Perform similarity search on the table. Both dense and sparse query vectors are supported.
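Sparse query vectors are dicts keyed by token ID, such as `{101:1,4578:1,102:1}` in the examples below. Assuming the values are term counts (the examples are consistent with that, but it is not stated here), a tokenizer's output can be converted with `collections.Counter`. `to_sparse_vector` is a hypothetical helper; the token IDs should presumably come from the same tokenizer that produced the table's sparse column.

```python
from collections import Counter


def to_sparse_vector(token_ids):
    """Turn a sequence of token IDs into a {token_id: count} sparse query vector."""
    return dict(Counter(int(t) for t in token_ids))
```

A repeated token simply gets a higher count, matching the shape of `{101:1,6079:2,102:1}` in the `hybrid_search` examples.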

Parameters:

Name	Type	Description	Default
`vectors`	`List[list] | List[dict]`	Query vectors for the search: lists of numbers for a dense index, or dicts keyed by token ID for a sparse index.	required
`n`	`int`	Number of neighbours to return.	`1`
`index_options`	`dict`	Index-specific options for similarity search.	`None`
`distances`	`str`	Name of the output column holding the distances. If not specified, the distances are returned in an extra column named `__nn_distance`.	`None`
`filter`	`Optional[List[list]]`	A list of filter conditions as triplets in the following format: `[['function', 'column name', 'parameter'], ... ]`. See the filter documentation for all supported operators.	`None`
`group_by`	`Optional[List[str]]`	A list of column names to group by.	`None`
`aggs`	`Optional[List[list]]`	Either a list of column names to select, or a list of aggregations to perform as triplets in the following format: `[['output_column', 'agg_function', 'input_column'], ... ]`. See the aggregation documentation for all supported functions.	`None`
`sort_by`	`Optional[List[str]]`	List of column names to sort on.	`None`

Returns:

Type	Description
`List[DataFrame]`	List of pandas DataFrames, one per query vector, each containing that vector's matching neighbours.

Examples:

# Find the closest neighbour of a single (dense) query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)

# Find the closest neighbour of a single (sparse) query vector
table.search(vectors=[{101:1,4578:1,102:1}], n=1)

# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)

# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[['sumSize', 'sum', 'size']],
             group_by=['sym'],
             sort_by=['sumSize'])

# Return a subset of columns for each match
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size'])
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size', 'price'])

# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[['within', 'size', (5, 999)], ['like', 'sym', 'AAP*']])

# Customized distance column name
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             distances='myDist')

# Index options: efSearch for an HNSW index, clusters for an IVF or IVFPQ index
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             index_options=dict(efSearch=512))
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]],
             n=3,
             index_options=dict(clusters=16))

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during search.
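For a `flat` index with the `L2` metric, similarity search is conceptually an exact nearest-neighbour scan. The sketch below reproduces that locally with NumPy to show what `n`, the one-DataFrame-per-query result and the distance column mean. `flat_l2_search` is a hypothetical helper; this reference does not say whether the server reports plain or squared L2 distances, and the sketch uses plain Euclidean distance.

```python
import numpy as np
import pandas as pd


def flat_l2_search(df, vector_col, vectors, n=1, distances="__nn_distance"):
    """Exact L2 nearest-neighbour search; returns one DataFrame per query vector."""
    # Stack the per-row vectors into a (rows, dims) float32 matrix.
    matrix = np.vstack(list(df[vector_col])).astype(np.float32)
    results = []
    for query in vectors:
        q = np.asarray(query, dtype=np.float32)
        # Euclidean distance from this query to every stored vector.
        dist = np.linalg.norm(matrix - q, axis=1)
        # Row positions of the n smallest distances, nearest first.
        top = np.argsort(dist, kind="stable")[:n]
        hits = df.iloc[top].copy()
        hits[distances] = dist[top]
        results.append(hits.reset_index(drop=True))
    return results
```

The scan costs one distance computation per stored row for every query, which is the trade-off the approximate `ivf`, `ivfpq` and `hnsw` index types exist to avoid.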

`hybrid_search`

Perform hybrid search on the table, combining a dense and a sparse similarity search.

Parameters:

Name	Type	Description	Default
`dense_vectors`	`List[list]`	Dense query vectors for the search.	required
`sparse_vectors`	`List[dict]`	Sparse query vectors for the search.	required
`n`	`int`	Number of neighbours to return.	`1`
`dense_index_options`	`dict`	Index-specific options for the dense search.	`None`
`sparse_index_options`	`dict`	Index-specific options for the sparse search.	`None`
`alpha`	`float`	Weight of the dense search relative to the sparse search, in [0, 1]: 0 uses only the sparse search and 1 only the dense search.	`0.5`
`distances`	`str`	Name of the output column holding the distances. If not specified, the distances are returned in an extra column named `__nn_distance`.	`None`
`filter`	`Optional[List[list]]`	A list of filter conditions as triplets in the following format: `[['function', 'column name', 'parameter'], ... ]`. See the filter documentation for all supported operators.	`None`
`group_by`	`Optional[List[str]]`	A list of column names to group by.	`None`
`aggs`	`Optional[List[list]]`	Either a list of column names to select, or a list of aggregations to perform as triplets in the following format: `[['output_column', 'agg_function', 'input_column'], ... ]`. See the aggregation documentation for all supported functions.	`None`
`sort_by`	`Optional[List[str]]`	List of column names to sort on.	`None`

Returns:

Type	Description
`List[DataFrame]`	List of pandas DataFrames, one per hybrid query, each containing the matching neighbours.

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during search.

Examples:

# Find the closest neighbour of a single hybrid query vector
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1)

# Find the 3 closest neighbours for 2 hybrid queries
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
                    sparse_vectors=[{101:1,4578:1,102:1},{101:1,6079:2,102:1}],
                    n=3)

# Weight the sparse leg of the query higher by setting alpha=0.1
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    alpha=0.1,
                    n=1)

# Filter
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    filter=[['within','size',(5,999)],['like','sym','AAP*']])

# Index options
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    dense_index_options=dict(efSearch=521),
                    sparse_index_options={'k':1.4,'b':0.78})

# Customized distance name
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    distances='myDist')
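This reference does not say how the dense and sparse result sets are fused, so the following only illustrates what `alpha` controls, assuming a simple convex combination of per-document scores where higher is better. `blend_scores` is hypothetical and is not the server's fusion algorithm.

```python
def blend_scores(dense: dict, sparse: dict, alpha: float = 0.5) -> list:
    """Rank documents by alpha * dense + (1 - alpha) * sparse (higher is better).

    alpha = 1 keeps only the dense scores, alpha = 0 only the sparse scores.
    A document missing from one result set scores 0 for that leg.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    combined = {
        doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * sparse.get(doc, 0.0)
        for doc in dense.keys() | sparse.keys()
    }
    # Best score first; ties broken by document id so the order is deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

With `alpha=0.1`, as in the example above, a document that scores well only on the sparse leg can outrank one that scores well only on the dense leg.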

`drop`

Drop the table.

Returns:

Type	Description
`bool`	A boolean which is True if the table was successfully dropped.

Examples:

table.drop()

Raises:

Type	Description
`KDBAIException`	Raised when an error occurs during the table deletion.
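Because `drop` is called on a `Table` object, dropping a table only when it exists means checking `session.list()` first. A hypothetical sketch using only the documented methods:

```python
def drop_if_exists(session, name):
    """Drop the table called `name` if it exists; return True if it was dropped.

    Hypothetical helper built from session.list(), session.table() and Table.drop().
    """
    if name not in session.list():
        return False
    return session.table(name).drop()  # drop() returns True on success
```

This keeps clean-up scripts from failing when a table has already been removed.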

`KDBAIException`

Bases: Exception

KDB.AI exception, raised by the client methods above when an operation fails.