Python API Client
This section contains the API references for the Python client to KDB.AI. For example usage, see the Quickstart Guide.
Note
Before you start, ensure you have the following installed on your machine:
- Python (versions 3.8 to 3.11)
- kdbai-client
- PyKX dependencies
Note: Index Options
The index_options argument of search() and the dense_index_options argument of hybrid_search() carry index-specific options for similarity search. For example, efSearch can be specified for HNSW indexes, while clusters can be specified for IVF/IVFPQ indexes.
For details on using index_options, see the How to use an Index in KDB.AI page.
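The choice of option key depends only on the index type, so it can be captured in a small lookup. This is a minimal sketch: index_options_for is a hypothetical helper, the lowercase type names follow the create_table schema examples below, and only the two options named in this note (efSearch, clusters) are covered.

```python
from typing import Dict

# Search-time option name per index type (hypothetical helper table).
# Only the options named in the note are covered: efSearch for HNSW, clusters for IVF/IVFPQ.
_SEARCH_OPTION_BY_INDEX = {
    "hnsw": "efSearch",
    "ivf": "clusters",
    "ivfpq": "clusters",
}


def index_options_for(index_type: str, value: int) -> Dict[str, int]:
    """Build an index_options (or dense_index_options) dict for the given index type."""
    key = _SEARCH_OPTION_BY_INDEX.get(index_type.lower())
    if key is None:
        # flat/qFlat indexes have no search-time option listed in this note
        raise ValueError(f"no search-time option listed for index type {index_type!r}")
    return {key: value}
```

The result is passed straight through, for example table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=3, index_options=index_options_for('hnsw', 512)).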
Session
Session represents a connection to a KDB.AI instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_key | str | API Key to be used for authentication. | None |
endpoint | str | Server endpoint to connect to. | 'http://localhost:8082' |
Example
Open a session on KDB.AI Cloud with an API key:
import kdbai_client as kdbai
session = kdbai.Session(endpoint='YOUR_INSTANCE_ENDPOINT', api_key='YOUR_API_KEY')
Open a session on a custom KDB.AI instance on http://localhost:8082:
session = kdbai.Session(endpoint='http://localhost:8082')
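To avoid hard-coding credentials, the Session arguments can be read from the environment. This is a sketch under assumptions: the variable names KDBAI_ENDPOINT and KDBAI_API_KEY and the helper session_kwargs are hypothetical; only the endpoint default comes from the parameter table above.

```python
import os
from typing import Dict, Mapping, Optional


def session_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect keyword arguments for kdbai.Session from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    # Fall back to the documented default endpoint for a local instance
    kwargs = {"endpoint": env.get("KDBAI_ENDPOINT", "http://localhost:8082")}
    api_key = env.get("KDBAI_API_KEY")
    if api_key:  # only pass api_key when one is set, e.g. for KDB.AI Cloud
        kwargs["api_key"] = api_key
    return kwargs
```

With this in place, session = kdbai.Session(**session_kwargs()) works for both the Cloud and the local case.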
version
Retrieve version information from the server.
list
table
Retrieve an existing table, such as one created in a previous session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to retrieve. | required |
Returns:
Type | Description |
---|---|
Table | A Table object for the retrieved table. |
Example
Retrieve the trade1 table:
session1 = kdbai.Session(endpoint='http://localhost:8082') # Previous session
table1 = session1.create_table('trade1', schema) # Create table 'trade1'
session2 = kdbai.Session(endpoint='http://localhost:8082') # Current session
table2 = session2.table("trade1") # Retrieve table 'trade1'
create_table
Create a table with a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create. | required |
schema | dict | Schema of the table to create. This schema must contain a list of columns. All columns must have a | required |
Returns:
Type | Description |
---|---|
Table | A newly created Table object. |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during the creation of the table. |
Example Flat Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
Example qFlat Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'qFlat'}}]}
table = session.create_table('documents', schema)
Example IVF Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'vectorIndex': {'trainingVectors': 1000,
'metric': 'CS',
'type': 'ivf',
'nclusters': 10}}]}
table = session.create_table('documents', schema)
Example IVFPQ Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'vectorIndex': {'trainingVectors': 5000,
'metric': 'L2',
'type': 'ivfpq',
'nclusters': 50,
'nsplits': 8,
'nbits': 8}}]}
table = session.create_table('documents', schema)
Example HNSW Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'vectorIndex': {'dims': 1536,
'metric': 'IP',
'type': 'hnsw',
'efConstruction' : 8, 'M': 8}}]}
table = session.create_table('documents', schema)
Example Sparse Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'sparseIndex': {'k': 1.25,
'b': 0.75}}]}
table = session.create_table('documents', schema)
Example Flat with Sparse Index
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'denseCol',
'vectorIndex': {'dims': 1536,
'metric': 'L2',
'type': 'flat'}},
{'name': 'sparseCol',
'sparseIndex': {'k': 1.25,
'b': 0.75}}]}
table = session.create_table('documents', schema)
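The schemas above differ mainly in the index column, so they can be assembled from small builders. A minimal sketch: vector_column, sparse_column and make_schema are hypothetical helpers that only reproduce the dictionary shapes shown in these examples; they do not validate which index parameters KDB.AI accepts.

```python
from typing import Any, Dict, List


def vector_column(name: str, index_type: str, metric: str, **params: Any) -> Dict[str, Any]:
    """Dense index column; params are e.g. dims=1536, or trainingVectors/nclusters for IVF."""
    return {"name": name, "vectorIndex": {"metric": metric, "type": index_type, **params}}


def sparse_column(name: str, k: float = 1.25, b: float = 0.75) -> Dict[str, Any]:
    """Sparse index column; k and b default to the values used in the examples above."""
    return {"name": name, "sparseIndex": {"k": k, "b": b}}


def make_schema(metadata: Dict[str, str], *index_cols: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Metadata columns (name -> pytype) first, then the index columns, in order."""
    columns: List[Dict[str, Any]] = [{"name": col, "pytype": pytype} for col, pytype in metadata.items()]
    columns.extend(index_cols)
    return {"columns": columns}


# Rebuilds the "Example HNSW Index" schema
hnsw_schema = make_schema(
    {"id": "str", "tag": "str", "text": "bytes"},
    vector_column("embeddings", "hnsw", "IP", dims=1536, efConstruction=8, M=8),
)
```

The resulting dictionary is passed to session.create_table('documents', hnsw_schema) exactly like the hand-written schemas.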
Table
KDB.AI table.
A Table object should be created with session.create_table(...) or retrieved with session.table(...).
Do not call this constructor directly.
schema
Retrieve the schema of the table.
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during schema retrieval. |
Returns:
Type | Description |
---|---|
Dict | A dict containing the table schema. |
Example
table.schema()
{'columns': [{'name': 'id', 'pytype': 'str'},
{'name': 'tag', 'pytype': 'str'},
{'name': 'text', 'pytype': 'bytes'},
{'name': 'embeddings',
'pytype': 'float32',
'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
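The returned dictionary can be inspected to find which columns carry an index, for example to check whether a table has both a dense and a sparse index before calling hybrid_search(). A sketch only: index_columns is a hypothetical helper that relies on the 'vectorIndex' and 'sparseIndex' keys seen in the schemas on this page.

```python
from typing import Any, Dict, List


def index_columns(schema: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return the names of dense ('vectorIndex') and sparse ('sparseIndex') index columns."""
    found: Dict[str, List[str]] = {"vectorIndex": [], "sparseIndex": []}
    for column in schema.get("columns", []):
        for kind in found:
            if kind in column:
                found[kind].append(column["name"])
    return found


# The schema returned in the example above
example = {'columns': [{'name': 'id', 'pytype': 'str'},
                       {'name': 'tag', 'pytype': 'str'},
                       {'name': 'text', 'pytype': 'bytes'},
                       {'name': 'embeddings',
                        'pytype': 'float32',
                        'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
```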
train
Train the index (IVF and IVFPQ only).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Returns:
Type | Description |
---|---|
str | A |
Examples:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
ROWS = 50
DIMS = 10
data = {
"time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
"sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
"realTime": [datetime.utcnow() for _ in range(ROWS)],
"price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
"size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.train(df)
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during training. |
insert
Insert data into the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Returns:
Type | Description |
---|---|
bool | A boolean which is True if the insertion was successful. |
Examples:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
ROWS = 50
DIMS = 10
data = {
"time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
"sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
"realTime": [datetime.utcnow() for _ in range(ROWS)],
"price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
"size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during insert. |
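For large dataframes, a common pattern is to call insert() on consecutive slices rather than sending every row in one call. This page does not state a size limit for insert(), so the batch size in the usage comment is an arbitrary assumption, and batch_slices is a hypothetical helper.

```python
from typing import Iterator, Tuple


def batch_slices(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges covering n_rows in chunks of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# Usage (needs a pandas DataFrame df and a Table object table):
# for start, stop in batch_slices(len(df), 10_000):
#     table.insert(df.iloc[start:stop])
```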
query
Query data from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter | Optional[List[list]] | A list of filter conditions as triplets of the form [operator, column, value], e.g. ['within', 'qual', [0, 2]]. | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as triplets of the form [alias, aggregation, column], e.g. ['sumSize', 'sum', 'size']. | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
fill | Optional[str] | This defines how to handle null values. This should be either | None |
Returns:
Type | Description |
---|---|
DataFrame | Pandas dataframe with the query results. |
Examples:
table.query(group_by=['sensorID', 'qual'])
table.query(filter = [['within', 'qual', [0, 2]]])
# Select subset of columns
table.query(aggs=['size'])
table.query(aggs=['size', 'price'])
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during query. |
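Because filter and aggs are nested lists, a malformed triplet is easy to pass by accident. The sketch below assembles and shape-checks the shared keyword arguments before calling query() or search(). The helper query_kwargs is hypothetical, and the [operator, column, value] and [alias, aggregation, column] orders are inferred from the examples on this page.

```python
from typing import Any, Dict, Optional, Sequence


def query_kwargs(
    filters: Optional[Sequence[Sequence[Any]]] = None,
    aggs: Optional[Sequence[Any]] = None,
    group_by: Optional[Sequence[str]] = None,
    sort_by: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for table.query()/table.search(), omitting unset ones."""
    kwargs: Dict[str, Any] = {}
    if filters:
        for f in filters:
            if len(f) != 3:  # expected [operator, column, value]
                raise ValueError(f"filter must be a triplet, got {f!r}")
        kwargs["filter"] = [list(f) for f in filters]
    if aggs:
        for a in aggs:
            # either a plain column name or [alias, aggregation, column]
            if not isinstance(a, str) and len(a) != 3:
                raise ValueError(f"aggregation must be a column name or a triplet, got {a!r}")
        kwargs["aggs"] = [a if isinstance(a, str) else list(a) for a in aggs]
    if group_by:
        kwargs["group_by"] = list(group_by)
    if sort_by:
        kwargs["sort_by"] = list(sort_by)
    return kwargs
```

The result is unpacked into the call, e.g. table.query(**query_kwargs(filters=[['within', 'qual', [0, 2]]])).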
search
Perform similarity search on the table. Supports dense or sparse query vectors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vectors | List[list] or List[dict] | Query vectors for the search: lists for dense queries, dicts for sparse queries. | required |
n | int | Number of neighbours to return. | 5 |
index_options | dict | Index-specific options for similarity search. | None |
distances | str | Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table. | None |
filter | Optional[List[list]] | A list of filter conditions as triplets of the form [operator, column, value], e.g. ['within', 'size', (5, 999)]. | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as triplets of the form [alias, aggregation, column], e.g. ['sumSize', 'sum', 'size']. | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
Returns:
Type | Description |
---|---|
List[DataFrame] | List of Pandas dataframes, one dataframe of matching neighbours per query vector. |
Examples:
# Find the closest neighbour of a single (dense) query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)
# Find the closest neighbour of a single (sparse) query vector
table.search(vectors=[{101:1,4578:1,102:1}], n=1)
# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)
# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
n=3,
aggs=[['sumSize','sum','size']],
group_by=['sym'],
sort_by=['sumSize'])
# Returns a subset of columns for each match
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size'])
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size', 'price'])
# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
n=3,
filter=[['within','size',(5,999)],['like','sym','AAP*']])
# Customized distance name
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
n=3,
distances='myDist')
# Index options: efSearch for an HNSW index, clusters for an IVF/IVFPQ index
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(efSearch=512))
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(clusters=16))
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during search. |
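The sparse examples above pass dicts such as {101:1,4578:1,102:1}, which read as token id to weight, while dense queries must presumably have as many entries as the dims of the vector index. A sketch under assumptions: token ids come from a tokenizer of your choice, raw occurrence counts are used as weights (KDB.AI's expected weighting is not stated on this page), and sparse_vector and dense_batch are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def sparse_vector(token_ids: Iterable[int]) -> Dict[int, int]:
    """Map each token id to the number of times it occurs (assumed weighting)."""
    return dict(Counter(token_ids))


def dense_batch(vectors: Sequence[Sequence[float]], dims: int) -> List[List[float]]:
    """Check every dense query vector has `dims` entries and convert it to a float list."""
    out = []
    for i, vec in enumerate(vectors):
        if len(vec) != dims:
            raise ValueError(f"query vector {i} has {len(vec)} entries, expected {dims}")
        out.append([float(x) for x in vec])
    return out
```

For example, table.search(vectors=[sparse_vector([101, 4578, 102])], n=1) sends the same sparse query as the example above.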
hybrid_search
Perform hybrid search on the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dense_vectors | list of lists | Dense query vectors for the search. | required |
sparse_vectors | list of dicts | Sparse query vectors for the search. | required |
n | int | Number of neighbours to return. | 5 |
dense_index_options | dict | Index-specific options for the dense similarity search. | None |
sparse_index_options | dict | Index-specific options for the sparse search, e.g. k and b. | None |
alpha | float | Weight between the two search strategies, in [0, 1]: values closer to 0 favour the sparse search, values closer to 1 favour the dense search. | 0.5 |
distances | str | Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table. | None |
filter | Optional[List[list]] | A list of filter conditions as triplets of the form [operator, column, value], e.g. ['within', 'size', (5, 999)]. | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as triplets of the form [alias, aggregation, column], e.g. ['sumSize', 'sum', 'size']. | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
Returns:
Type | Description |
---|---|
List[DataFrame] | List of Pandas dataframes, one dataframe of matching neighbours per query vector. |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during search. |
Examples:
# Find the closest neighbour of a single hybrid query vector
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
sparse_vectors=[{101:1,4578:1,102:1}],
n=1)
# Find the 3 closest neighbours for 2 hybrid queries
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
sparse_vectors=[{101:1,4578:1,102:1},{101:1,6079:2,102:1}],
n=3)
# Weight the sparse leg of the query more heavily by setting alpha=0.1
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
sparse_vectors=[{101:1,4578:1,102:1}],
alpha=0.1,
n=1)
# Filter
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
sparse_vectors=[{101:1,4578:1,102:1}],
n=1,
filter=[['within','size',(5,999)],['like','sym','AAP*']])
# Index options
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
sparse_vectors=[{101:1,4578:1,102:1}],
n=1,
dense_index_options=dict(efSearch=521),
sparse_index_options={'k':1.4,'b':0.78})
# Customized distance name
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
sparse_vectors=[{101:1,4578:1,102:1}],
n=1,
distances='myDist')
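In every example above, each hybrid query pairs one dense vector with one sparse vector, and alpha stays in [0, 1]. A cheap client-side pre-check can catch mismatched inputs before the request is sent. This is a hypothetical helper, and the one-to-one pairing is inferred from the examples rather than stated as a rule.

```python
from typing import Dict, Sequence


def check_hybrid_inputs(
    dense_vectors: Sequence[Sequence[float]],
    sparse_vectors: Sequence[Dict[int, float]],
    alpha: float = 0.5,
) -> int:
    """Validate hybrid_search() inputs and return the number of hybrid queries."""
    if len(dense_vectors) != len(sparse_vectors):
        # assumed pairing: query i uses dense_vectors[i] together with sparse_vectors[i]
        raise ValueError(
            f"{len(dense_vectors)} dense vs {len(sparse_vectors)} sparse query vectors"
        )
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")
    return len(dense_vectors)
```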
drop
Drop the table.
Returns:
Type | Description |
---|---|
bool | A boolean which is True if the table was successfully dropped. |
Examples:
table.drop()
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during the table deletion. |