Python API Client

This section contains the API references for the Python client to KDB.AI. For example usage, see the Quickstart Guide.

Note

Before you start, ensure you have the following installed on your machine:

Note: Index Options

The index_options argument of search() and the dense_index_options argument of hybrid_search() carry the index-specific options for similarity search. For example, efSearch can be specified for HNSW indexes, while clusters can be specified for IVF and IVFPQ indexes.

For details on using index_options, see the How to use an Index in KDB.AI page.
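As a rough sketch of how the options named in the note above might be assembled before a search, the helper below maps each index type to the search-time keys this page mentions. The helper name, the lookup table, and the decision to reject unlisted keys are assumptions for illustration, not part of the client.

```python
# Hypothetical helper: maps each index type to the search-time options
# named on this page. Other options may exist; only these are listed here.
SEARCH_OPTIONS = {
    "hnsw": {"efSearch"},
    "ivf": {"clusters"},
    "ivfpq": {"clusters"},
}


def build_index_options(index_type, **options):
    """Return a dict of search options, rejecting keys not listed for the index type."""
    allowed = SEARCH_OPTIONS.get(index_type, set())
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"{sorted(unknown)} not valid for index type {index_type!r}")
    return dict(options)


hnsw_opts = build_index_options("hnsw", efSearch=512)
ivf_opts = build_index_options("ivf", clusters=16)
```

The resulting dicts have the same shape as the values passed in the search examples on this page.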

Session

Session represents a connection to a KDB.AI instance.

Parameters:

Name Type Description Default
api_key str

API Key to be used for authentication.

None
endpoint str

Server endpoint to connect to.

'http://localhost:8082'
Example

Open a session on KDB.AI Cloud with an API key:

session = Session(endpoint='YOUR_INSTANCE_ENDPOINT', api_key='YOUR_API_KEY')

Open a session on a custom KDB.AI instance on http://localhost:8082:

session = kdbai.api.Session(endpoint='http://localhost:8082')
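To avoid hard-coding credentials, the Session arguments could be read from the environment. The sketch below only builds the keyword arguments; the variable names KDBAI_ENDPOINT and KDBAI_API_KEY and the helper itself are hypothetical, and the default endpoint is the one listed in the parameter table.

```python
import os

DEFAULT_ENDPOINT = "http://localhost:8082"  # default from the parameter table


def session_kwargs(env=os.environ):
    """Build keyword arguments for Session from environment variables.

    KDBAI_ENDPOINT and KDBAI_API_KEY are hypothetical variable names.
    """
    kwargs = {"endpoint": env.get("KDBAI_ENDPOINT", DEFAULT_ENDPOINT)}
    api_key = env.get("KDBAI_API_KEY")
    if api_key:  # leave api_key out so it keeps its default of None
        kwargs["api_key"] = api_key
    return kwargs


local = session_kwargs(env={})
cloud = session_kwargs(env={"KDBAI_ENDPOINT": "YOUR_INSTANCE_ENDPOINT",
                            "KDBAI_API_KEY": "YOUR_API_KEY"})
# session = kdbai.Session(**cloud)  # needs the client and a reachable instance
```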

version

Retrieve version information from the server.

list

Retrieve the list of tables.

Returns:

Type Description
List[str]

A list of strings with the names of the existing tables.

Example
session.list()
["trade", "quote"]

table

Retrieve an existing table that was created in a previous session.

Parameters:

Name Type Description Default
name str

Name of the table to retrieve.

required

Returns:

Type Description
Table

A Table object representing the KDB.AI table.

Example

Retrieve the trade1 table created in a previous session:

session1 = kdbai.Session(endpoint='http://localhost:8082') # Previous session
table1 = session1.create_table('trade1', schema)           # Create table 'trade1'

session2 = kdbai.Session(endpoint='http://localhost:8082') # Current session
table2 = session2.table("trade1")                          # Retrieve table 'trade1'
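A common pattern is to retrieve a table if it already exists and create it otherwise. The sketch below combines list(), table() and create_table() as documented on this page; the helper name and the in-memory _FakeSession used to exercise it are hypothetical stand-ins, not part of the client.

```python
def get_or_create_table(session, name, schema):
    """Return the existing table called name, or create it with schema."""
    if name in session.list():
        return session.table(name)
    return session.create_table(name, schema)


class _FakeSession:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self):
        self._tables = {}

    def list(self):
        return list(self._tables)

    def table(self, name):
        return self._tables[name]

    def create_table(self, name, schema):
        self._tables[name] = {"name": name, "schema": schema}
        return self._tables[name]


fake = _FakeSession()
first = get_or_create_table(fake, "trade1", {"columns": []})
second = get_or_create_table(fake, "trade1", {"columns": []})
```

With a real Session the same helper would return a Table object in both cases.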

create_table

Create a table with a schema.

Parameters:

Name Type Description Default
name str

Name of the table to create.

required
schema dict

Schema of the table to create. This schema must contain a list of columns. All columns must have a pytype specified except the column of vectors. One column of vector embeddings may also have a vectorIndex attribute with the configuration of the index for similarity search - this column is implicitly an array of float32.

required

Returns:

Type Description
Table

A newly created Table object based on the schema.

Raises:

Type Description
KDBAIException

Raised when an error occurs during the creation of the table.

Example Flat Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
Example qFlat Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'qFlat'}}]}
table = session.create_table('documents', schema)
Example IVF Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 1000,
                                       'metric': 'CS',
                                       'type': 'ivf',
                                       'nclusters': 10}}]}
table = session.create_table('documents', schema)
Example IVFPQ Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 5000,
                                       'metric': 'L2',
                                       'type': 'ivfpq',
                                       'nclusters': 50,
                                       'nsplits': 8,
                                       'nbits': 8}}]}
table = session.create_table('documents', schema)
Example HNSW Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'IP',
                                       'type': 'hnsw',
                                       'efConstruction': 8, 'M': 8}}]}
table = session.create_table('documents', schema)
Example Sparse Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'sparseIndex': {'k': 1.25,
                                       'b': 0.75}}]}
table = session.create_table('documents', schema)
Example Flat with Sparse Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'denseCol',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'L2',
                                       'type': 'flat'}},
                      {'name': 'sparseCol',
                       'sparseIndex': {'k': 1.25,
                                       'b': 0.75}}]}
table = session.create_table('documents', schema)
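The schema rules described for create_table can be checked on the client side before a request is sent. This is a minimal sketch, assuming the rules read as: a list of columns is present, every non-vector column has a pytype, and at most one column carries a vectorIndex. Treating sparseIndex columns as exempt from pytype is inferred from the examples above. The function name is hypothetical and the server remains the authority.

```python
def validate_schema(schema):
    """Check a schema dict against the rules in the create_table description.

    Hypothetical client-side check; the server remains the authority.
    """
    columns = schema.get("columns")
    if not isinstance(columns, list) or not columns:
        raise ValueError("schema must contain a non-empty list of columns")
    for col in columns:
        if "name" not in col:
            raise ValueError("every column needs a name")
        # Columns with an index definition may omit pytype (assumption
        # for sparseIndex, taken from the examples on this page).
        is_vector = "vectorIndex" in col or "sparseIndex" in col
        if not is_vector and "pytype" not in col:
            raise ValueError(f"column {col['name']!r} needs a pytype")
    if sum("vectorIndex" in col for col in columns) > 1:
        raise ValueError("at most one column may carry a vectorIndex")
    return True


flat_schema = {"columns": [{"name": "id", "pytype": "str"},
                           {"name": "embeddings",
                            "vectorIndex": {"dims": 1536, "metric": "L2",
                                            "type": "flat"}}]}
flat_ok = validate_schema(flat_schema)
```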

Table

KDB.AI table.

A Table object should be created with session.create_table(...) or retrieved with session.table(...). Do not call this constructor directly.

schema

Retrieve the schema of the table.

Raises:

Type Description
KDBAIException

Raised when an error occurs during schema retrieval.

Returns:

Type Description
Dict

A dict containing the table name and the list of column names and appropriate numpy datatypes.

Example
table.schema()

{'columns': [{'name': 'id', 'pytype': 'str'},
              {'name': 'tag', 'pytype': 'str'},
              {'name': 'text', 'pytype': 'bytes'},
              {'name': 'embeddings',
               'pytype': 'float32',
               'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}

train

Train the index (IVF and IVFPQ only).

Parameters:

Name Type Description Default
data DataFrame

Pandas dataframe with column names/types matching the target table.

required
warn bool

If True, display a warning when data has a non-trivial index, which will be dropped before training.

True

Returns:

Type Description
str

A string containing the status after training.

Examples:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.train(df)

Raises:

Type Description
KDBAIException

Raised when an error occurs during training.

insert

Insert data into the table.

Parameters:

Name Type Description Default
data DataFrame

Pandas dataframe with column names/types matching the target table.

required
warn bool

If True, display a warning when data has a non-trivial index, which will be dropped before insertion.

True

Returns:

Type Description
bool

A boolean which is True if the insertion was successful.

Examples:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10

data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)

Raises:

Type Description
KDBAIException

Raised when an error occurs during insert.

query

Query data from the table.

Parameters:

Name Type Description Default
filter Optional[List[list]]

A list of filter conditions as triplets in the following format: [['function', 'column name', 'parameter'], ...]. See all filter operators here.

None
group_by Optional[str]

A list of column names to use for group by.

None
aggs Optional[List[list]]

Either a list of column names to select or a list of aggregations to perform, given as triplets in the following form: [['output_column', 'agg_function', 'input_column'], ...]. See all aggregation functions here.

None
sort_by Optional[List[str]]

List of column names to sort on.

None
fill Optional[str]

This defines how to handle null values. This should be either 'forward' or 'zero' or None.

None

Returns:

Type Description
DataFrame

Pandas dataframe with the query results.

Examples:

table.query(group_by=['sensorID', 'qual'])
table.query(filter=[['within', 'qual', [0, 2]]])

# Select subset of columns
table.query(aggs=['size'])
table.query(aggs=['size', 'price'])

Raises:

Type Description
KDBAIException

Raised when an error occurs during query.
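The filter and aggs arguments both take nested lists, which are easy to get wrong. The sketch below checks their shapes as described in the parameter table: filters are triplets of function, column name and parameter, and aggs is either a flat list of column names or a list of triplets. Both helper names are hypothetical, and the checks cover shape only, not whether the server accepts a given function name.

```python
def check_filter(conditions):
    """Verify every condition is a [function, column name, parameter] triplet."""
    for cond in conditions:
        if len(cond) != 3:
            raise ValueError(f"expected a triplet, got {cond!r}")
        function, column, _parameter = cond
        if not isinstance(function, str) or not isinstance(column, str):
            raise ValueError(f"function and column must be strings in {cond!r}")
    return conditions


def check_aggs(aggs):
    """Classify aggs as a column selection or a list of aggregation triplets."""
    if all(isinstance(a, str) for a in aggs):
        return "columns"
    if all(isinstance(a, (list, tuple)) and len(a) == 3 for a in aggs):
        return "aggregations"
    raise ValueError("aggs must be all column names or all triplets")


selection = check_aggs(["size", "price"])
aggregation = check_aggs([["sumSize", "sum", "size"]])
conditions = check_filter([["within", "qual", [0, 2]]])
```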

search

Perform similarity search on the table. Both dense and sparse queries are supported.

Parameters:

Name Type Description Default
vectors List[list] | List[dict]

Query vectors for the search.

required
n int

Number of neighbours to return.

5
index_options dict

Index specific options for similarity search.

None
distances str

Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table.

None
filter Optional[List[list]]

A list of filter conditions as triplets in the following format: [['function', 'column name', 'parameter'], ...]. See all filter operators here.

None
group_by Optional[str]

A list of column names to use for group by.

None
aggs Optional[List[list]]

Either a list of column names to select or a list of aggregations to perform, given as triplets in the following form: [['output_column', 'agg_function', 'input_column'], ...]. See all aggregation functions here.

None
sort_by Optional[List[str]]

List of column names to sort on.

None

Returns:

Type Description
List[DataFrame]

List of Pandas dataframes with one dataframe of matching neighbors for each query vector.

Examples:

# Find the closest neighbour of a single (dense) query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)

# Find the closest neighbour of a single (sparse) query vector
table.search(vectors=[{101:1,4578:1,102:1}], n=1)

# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)

# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[['sumSize','sum','size']],
             group_by=['sym'],
             sort_by=['sumSize'])

# Return a subset of columns for each match
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size'])
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size', 'price'])

# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[['within','size',(5,999)],['like','sym','AAP*']])

# Customized distance name
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             distances='myDist')

# Index options
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(efSearch=512))
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(clusters=16))

Raises:

Type Description
KDBAIException

Raised when an error occurs during search.
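The sparse query vectors in the examples are dicts of integer keys and integer values. One plausible reading, and it is an assumption here, is that keys are token ids from a tokenizer and values are how often each token occurs. Under that assumption, a sparse vector can be built from a list of token ids with the standard library; the helper name is hypothetical.

```python
from collections import Counter


def to_sparse_vector(token_ids):
    """Map each token id to the number of times it occurs.

    Assumes sparse vectors are {token_id: count} dicts, as the
    examples on this page suggest.
    """
    return dict(Counter(token_ids))


query = to_sparse_vector([101, 4578, 102])
repeated = to_sparse_vector([101, 6079, 6079, 102])
```

The first result has the same shape as the sparse query in the search example, so it could be passed as vectors=[query].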

hybrid_search

Perform hybrid search on the table.

Parameters:

Name Type Description Default
dense_vectors list of lists

Dense query vectors for the search.

required
sparse_vectors list of dicts

Sparse query vectors for the search.

required
n int

Number of neighbours to return.

5
dense_index_options dict

Index-specific options for the dense similarity search.

None
sparse_index_options dict

Index-specific options for the sparse similarity search.

None
alpha float

Weight between the two search strategies, in [0, 1]: 0 uses only the sparse search, 1 uses only the dense search.

0.5
distances str

Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table.

None
filter Optional[List[list]]

A list of filter conditions as triplets in the following format: [['function', 'column name', 'parameter'], ...]. See all filter operators here.

None
group_by Optional[str]

A list of column names to use for group by.

None
aggs Optional[List[list]]

Either a list of column names to select or a list of aggregations to perform, given as triplets in the following form: [['output_column', 'agg_function', 'input_column'], ...]. See all aggregation functions here.

None
sort_by Optional[List[str]]

List of column names to sort on.

None

Returns:

Type Description
List[DataFrame]

List of Pandas dataframes with one dataframe of matching neighbors for each query vector.

Raises:

Type Description
KDBAIException

Raised when an error occurs during search.

Examples:

# Find the closest neighbour of a single hybrid query vector
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1)

# Find the 3 closest neighbours for 2 hybrid queries
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
                    sparse_vectors=[{101:1,4578:1,102:1},{101:1,6079:2,102:1}],
                    n=3)

# Weight the sparse leg of the query higher setting alpha = 0.1
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    alpha=0.1,
                    n=1)

# Filter
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    filter=[['within','size',(5,999)],['like','sym','AAP*']])

# Index options
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    dense_index_options=dict(efSearch=521),
                    sparse_index_options={'k':1.4,'b':0.78})

# Customized distance name
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    distances='myDist')
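To build intuition for alpha, the sketch below blends a dense score and a sparse score as a convex combination, so that alpha=1 keeps only the dense score and alpha=0 only the sparse one. This page does not describe how the server actually combines the two searches, so the formula and the function name are assumptions for illustration only.

```python
def blend(dense_score, sparse_score, alpha=0.5):
    """Illustrative convex combination of two scores.

    Assumption: the real server-side scoring may differ; this only
    mirrors the documented meaning of alpha (0 sparse, 1 dense).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * dense_score + (1.0 - alpha) * sparse_score


dense_only = blend(1.0, 0.0, alpha=1.0)
sparse_only = blend(1.0, 0.0, alpha=0.0)
balanced = blend(0.8, 0.4)
```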

drop

Drop the table.

Returns:

Type Description
bool

A boolean which is True if the table was successfully dropped.

Examples:

table.drop()

Raises:

Type Description
KDBAIException

Raised when an error occurs during the table deletion.

KDBAIException

Bases: Exception

KDB.AI exception.
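Every method on this page that can fail is documented as raising KDBAIException, so a single except clause is enough to handle client errors. The sketch below wraps drop() this way. To stay runnable without the client installed it defines a local stand-in exception class and two hypothetical test doubles; in real code the exception would come from the client package instead.

```python
class KDBAIException(Exception):
    """Local stand-in so the sketch runs without the client installed."""


def drop_quietly(table):
    """Drop a table, returning False instead of raising on failure."""
    try:
        return bool(table.drop())
    except KDBAIException as exc:
        print(f"drop failed: {exc}")
        return False


class _FailingTable:
    """Hypothetical test double whose drop() always fails."""

    def drop(self):
        raise KDBAIException("table not found")


class _OkTable:
    """Hypothetical test double whose drop() succeeds."""

    def drop(self):
        return True


dropped = drop_quietly(_OkTable())
not_dropped = drop_quietly(_FailingTable())
```

Swallowing the error is a design choice that suits cleanup code; elsewhere it is usually better to let the exception propagate.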