Python API Client
This section contains the API references for the Python client to KDB.AI. For example usage, see the Quickstart Guide.
Session
Session represents a connection to a KDB.AI instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_key | str | API Key to be used for authentication. | None |
endpoint | str | Server endpoint to connect to. | 'http://localhost:8082' |
Example
Open a session on KDB.AI Cloud with an API key:
```python
import kdbai_client as kdbai

session = kdbai.Session(endpoint='YOUR_INSTANCE_ENDPOINT', api_key='YOUR_API_KEY')
```
Open a session on a custom KDB.AI instance on http://localhost:8082:
```python
session = kdbai.Session(endpoint='http://localhost:8082')
```
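Hard-coding an API key in source files makes it easy to leak. The following is a minimal sketch, not part of the client, that builds the `Session` keyword arguments from the environment and falls back to the default endpoint listed in the table above. The variable names `KDBAI_ENDPOINT` and `KDBAI_API_KEY` and the helper `session_kwargs` are hypothetical choices for this example.

```python
import os

DEFAULT_ENDPOINT = 'http://localhost:8082'  # default endpoint from the table above


def session_kwargs(env=None):
    """Build keyword arguments for Session from environment variables.

    KDBAI_ENDPOINT and KDBAI_API_KEY are hypothetical names chosen for
    this sketch; the client itself does not read them.
    """
    env = os.environ if env is None else env
    kwargs = {'endpoint': env.get('KDBAI_ENDPOINT', DEFAULT_ENDPOINT)}
    api_key = env.get('KDBAI_API_KEY')
    if api_key:  # only pass api_key when one is configured
        kwargs['api_key'] = api_key
    return kwargs


# Usage, assuming the client is imported as `kdbai`:
#   session = kdbai.Session(**session_kwargs())
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real environment.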
list
table
Retrieve an existing table, for example one that was created in a previous session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to retrieve. | required |
Returns:
Type | Description |
---|---|
Table | A |
Example
Retrieve the trade1 table:
```python
session1 = kdbai.Session(endpoint='http://localhost:8082')  # Previous session
table1 = session1.create_table('trade1', schema)            # Create table 'trade1'
session2 = kdbai.Session(endpoint='http://localhost:8082')  # Current session
table2 = session2.table('trade1')                           # Retrieve table 'trade1'
```
create_table
Create a table with the given schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create. | required |
schema | dict | Schema of the table to create. This schema must contain a list of columns. All columns must have either a | required |
Returns:
Type | Description |
---|---|
Table | A newly created |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error happens during the creation of the table. |
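Because a malformed schema only fails once it reaches the server, it can help to check it locally first. The sketch below is a hypothetical helper (`validate_schema` is not part of the client). It assumes, based on the examples in this section, that every column needs a name plus either a `pytype` or an index definition (`vectorIndex` or `sparseIndex`); the server may enforce further rules.

```python
INDEX_KEYS = ('vectorIndex', 'sparseIndex')  # index keys seen in the examples


def validate_schema(schema):
    """Return a list of problems found in a schema dict (empty if none)."""
    columns = schema.get('columns')
    if not isinstance(columns, list) or not columns:
        return ["schema must contain a non-empty 'columns' list"]
    problems = []
    seen = set()
    for i, col in enumerate(columns):
        name = col.get('name')
        if not name:
            problems.append(f'column {i} has no name')
        elif name in seen:
            problems.append(f'duplicate column name: {name}')
        seen.add(name)
        # Assumption: a column carries a pytype, an index definition, or both.
        if 'pytype' not in col and not any(k in col for k in INDEX_KEYS):
            problems.append(f'column {name!r} needs a pytype or an index definition')
    return problems
```

A caller could run `validate_schema(schema)` and only call `session.create_table(...)` when the returned list is empty.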
Example Flat Index
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
```
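The `metric` field selects how vectors are compared. The examples in this section use 'L2', 'CS' and 'IP', which conventionally stand for Euclidean distance, cosine similarity and inner product. The sketch below computes the three standard definitions in plain Python to show how they differ. It illustrates the definitions only; how the server scales or reports these values (for example, whether L2 is squared) is not stated in this reference.

```python
import math


def inner_product(a, b):
    """IP: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2: Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """CS: inner product of the two vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 2.0]
print(l2_distance(a, b))        # distance between the two points
print(inner_product(a, b))      # zero, the vectors are orthogonal
print(cosine_similarity(a, b))  # zero for the same reason
```

For L2 a smaller value means a closer match, while for CS and IP a larger value does, which matters when interpreting the distance column of search results.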
Example IVF Index
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 1000,
                                       'metric': 'CS',
                                       'type': 'ivf',
                                       'nclusters': 10}}]}
table = session.create_table('documents', schema)
```
Example IVFPQ Index
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'trainingVectors': 5000,
                                       'metric': 'L2',
                                       'type': 'ivfpq',
                                       'nclusters': 50,
                                       'nsplits': 8,
                                       'nbits': 8}}]}
table = session.create_table('documents', schema)
```
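To get a feel for what `nsplits` and `nbits` trade off, the sketch below works through the arithmetic of product quantization. It assumes the two options carry their usual meaning (each vector is cut into `nsplits` sub-vectors, and each sub-vector is replaced by an `nbits`-bit code); this reference does not spell that out. The IVFPQ example above does not set `dims`, so the 1536 used here is borrowed from the flat index example purely for illustration, and `pq_code_stats` is a hypothetical helper.

```python
def pq_code_stats(dims, nsplits, nbits):
    """Size arithmetic for product quantization under the usual definitions."""
    if dims % nsplits != 0:
        raise ValueError('dims must be divisible by nsplits')
    return {
        'sub_vector_dims': dims // nsplits,       # length of each sub-vector
        'centroids_per_split': 2 ** nbits,        # codebook size per sub-vector
        'code_bytes': (nsplits * nbits + 7) // 8,  # compressed size per vector
        'float32_bytes': dims * 4,                # uncompressed size per vector
    }


stats = pq_code_stats(dims=1536, nsplits=8, nbits=8)
print(stats)
```

Under these assumptions a vector shrinks from a few kilobytes of float32 data to a handful of bytes, at the cost of approximate distances.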
Example HNSW Index
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'IP',
                                       'type': 'hnsw',
                                       'efConstruction': 8,
                                       'M': 8}}]}
table = session.create_table('documents', schema)
```
Example Sparse Index
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'sparseIndex': {'k': 1.25, 'b': 0.75}}]}
table = session.create_table('documents', schema)
```
Example Flat + Sparse Indexes
```python
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'denseCol',
                       'vectorIndex': {'dims': 1536,
                                       'metric': 'L2',
                                       'type': 'flat'}},
                      {'name': 'sparseCol',
                       'sparseIndex': {'k': 1.25,
                                       'b': 0.75}}]}
table = session.create_table('documents', schema)
```
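The `k` and `b` values of a `sparseIndex` match the usual free parameters of BM25 ranking: `k` controls how quickly repeated terms stop adding weight, and `b` controls document-length normalization. Assuming that is what the sparse index implements, and assuming sparse vectors map token IDs to counts (as the `{101:1, 4578:1, 102:1}` style dictionaries in the search examples suggest), the sketch below builds a sparse vector from token IDs and evaluates the standard BM25 term-frequency weight. The IDF factor is left out and both function names are illustrative.

```python
from collections import Counter


def to_sparse_vector(token_ids):
    """Count token IDs into a {token_id: count} dictionary."""
    return dict(Counter(token_ids))


def bm25_tf_weight(tf, doc_len, avg_doc_len, k=1.25, b=0.75):
    """Standard BM25 term-frequency component (without the IDF factor)."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k + 1) / (tf + k * length_norm)


vector = to_sparse_vector([101, 4578, 102, 4578])
print(vector)
print(bm25_tf_weight(tf=2, doc_len=4, avg_doc_len=4))
```

Raising `k` lets repeated terms keep contributing for longer, while `b = 0` switches length normalization off in this formula.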
Table
KDB.AI table.
A Table object is created with session.create_table(...) or retrieved with session.table(...). Do not call the constructor directly.
schema
Retrieve the schema of the table.
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during schema retrieval. |
Returns:
Type | Description |
---|---|
Dict | A |
Example
```python
table.schema()
{'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
             {'name': 'embeddings',
              'pytype': 'float32',
              'qtype': 'reals',
              'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
```
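The returned dictionary can be inspected like any other Python dict, for instance to find which columns carry an index before building query vectors. The sketch below is a hypothetical helper (`indexed_columns` is not part of the client) that works on a schema shaped like the output above.

```python
def indexed_columns(schema):
    """Map column name -> (index kind, index options) for indexed columns."""
    found = {}
    for col in schema.get('columns', []):
        for kind in ('vectorIndex', 'sparseIndex'):
            if kind in col:
                found[col['name']] = (kind, col[kind])
    return found


# Schema shaped like the output of table.schema() shown above.
schema = {'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
                      {'name': 'embeddings',
                       'pytype': 'float32',
                       'qtype': 'reals',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
kind, options = indexed_columns(schema)['embeddings']
print(kind, options['dims'])
```

A check such as `len(query_vector) == options['dims']` before searching catches dimension mismatches on the client side.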
train
Train the index (IVF and IVFPQ only).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Returns:
Type | Description |
---|---|
str | A |
Examples:
```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10
data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.train(df)
```
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during training. |
insert
Insert data into the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Returns:
Type | Description |
---|---|
bool | A boolean which is True if the insertion was successful. |
Examples:
```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

ROWS = 50
DIMS = 10
data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)
```
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during insert. |
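For large dataframes it can be convenient to insert in slices rather than in a single call, so that a failure affects one slice only. This reference does not state a size limit per insert, so the batch size below is an arbitrary choice, and `batch_ranges` is a hypothetical helper rather than part of the client.

```python
def batch_ranges(total_rows, batch_size):
    """Yield (start, stop) row ranges that cover total_rows in order."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    for start in range(0, total_rows, batch_size):
        yield start, min(start + batch_size, total_rows)


# Usage with a pandas DataFrame `df` and a table as in the example above:
#   for start, stop in batch_ranges(len(df), 2000):
#       table.insert(df.iloc[start:stop])
print(list(batch_ranges(5, 2)))
```

Each `(start, stop)` pair is half-open, which matches how `df.iloc[start:stop]` slices rows.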
query
Query data from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter | Optional[List[list]] | A list of filter conditions as triplets in the following format: | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
fill | Optional[str] | This defines how to handle null values. This should be either | None |
Returns:
Type | Description |
---|---|
DataFrame | Pandas dataframe with the query results. |
Examples:
```python
table.query(group_by=['sensorID', 'qual'])
table.query(filter=[['within', 'qual', [0, 2]]])
# Select subset of columns
table.query(aggs=['size'])
table.query(aggs=['size', 'price'])
```
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during query. |
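To make the triplet formats concrete, the sketch below emulates them locally on a list of dictionaries. It assumes, from the examples in this reference, that a filter triplet reads `[operator, column, operand]` and an aggregation triplet reads `[output_name, function, column]`. Only the `within` and `like` operators and a few common aggregations are modelled; the server's actual operator set and semantics are not enumerated here, and `local_query` is a hypothetical stand-in, not how the server evaluates queries.

```python
from fnmatch import fnmatchcase

# Local stand-ins for the operators that appear in the examples.
FILTER_OPS = {
    'within': lambda value, bounds: bounds[0] <= value <= bounds[1],
    'like': lambda value, pattern: fnmatchcase(str(value), pattern),
}
AGG_FUNCS = {'sum': sum, 'min': min, 'max': max}


def local_query(rows, filter=None, group_by=None, aggs=None):
    """Apply filter and aggregation triplets to a list of dict rows."""
    for op, column, operand in filter or []:
        rows = [r for r in rows if FILTER_OPS[op](r[column], operand)]
    if not aggs:
        return rows
    if all(isinstance(a, str) for a in aggs):  # plain column selection
        return [{c: r[c] for c in aggs} for r in rows]
    groups = {}
    for r in rows:
        key = tuple(r[c] for c in group_by or [])
        groups.setdefault(key, []).append(r)
    out = []
    for key, members in groups.items():
        result = dict(zip(group_by or [], key))
        for out_name, func, column in aggs:
            result[out_name] = AGG_FUNCS[func]([m[column] for m in members])
        out.append(result)
    return out


rows = [{'sym': 'AAPL', 'size': 10}, {'sym': 'AAPL', 'size': 20},
        {'sym': 'MSFT', 'size': 7}, {'sym': 'AAPX', 'size': 3}]
print(local_query(rows,
                  filter=[['within', 'size', (5, 999)], ['like', 'sym', 'AAP*']],
                  group_by=['sym'],
                  aggs=[['sumSize', 'sum', 'size']]))
```

Filters are applied in order and combine as a logical AND in this emulation, which is why the 'AAPX' row (size 3) and the 'MSFT' row are both dropped before grouping.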
search
Perform a similarity search on the table. Both dense and sparse queries are supported.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vectors | List[list] \| List[dict] | Query vectors for the search. | required |
n | int | Number of neighbours to return. | 1 |
index_options | dict | Index specific options for similarity search. | None |
distances | str | Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table. | None |
filter | Optional[List[list]] | A list of filter conditions as triplets in the following format: | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
Returns:
Type | Description |
---|---|
List[DataFrame] | List of Pandas dataframes with one dataframe of matching neighbors for each query vector. |
Examples:
```python
# Find the closest neighbour of a single (dense) query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)
# Find the closest neighbour of a single (sparse) query vector
table.search(vectors=[{101:1,4578:1,102:1}], n=1)
# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)
# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[['sumSize','sum','size']],
             group_by=['sym'],
             sort_by=['sumSize'])
# Return a subset of columns for each match
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size'])
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, aggs=['size', 'price'])
# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[['within','size',(5,999)],['like','sym','AAP*']])
# Customized distance name
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             distances='myDist')
# Index options
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(efSearch=512))
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]], n=3, index_options=dict(clusters=16))
```
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during search. |
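The return shape, one result set per query vector, is easiest to see with a small local model. The sketch below performs an exhaustive (flat) nearest-neighbour search with Euclidean distance over an in-memory list. It is only an illustration of the idea behind a flat index and of the per-query result layout; `flat_search` is a hypothetical function and says nothing about how the server is implemented.

```python
import math


def flat_search(stored, queries, n=1):
    """Exhaustive nearest-neighbour search with Euclidean distance.

    Returns one result list per query vector, each holding up to n
    (row_index, distance) pairs sorted by increasing distance.
    """
    results = []
    for q in queries:
        scored = [(i, math.dist(q, v)) for i, v in enumerate(stored)]
        scored.sort(key=lambda pair: pair[1])
        results.append(scored[:n])
    return results


stored = [[0, 0], [1, 1], [5, 5]]
for query_result in flat_search(stored, [[0, 0], [4, 4]], n=2):
    print(query_result)
```

Two query vectors produce two result lists, mirroring the list of dataframes that `table.search` returns.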
hybrid_search
Perform hybrid search on the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dense_vectors | list of lists | Dense query vectors for the search. | required |
sparse_vectors | list of dicts | Sparse query vectors for the search. | required |
n | int | Number of neighbours to return. | 1 |
dense_index_options | dict | Index specific options for the dense similarity search. | None |
sparse_index_options | dict | Index specific options for the sparse similarity search. | None |
alpha | float | Weighting between the two strategies, in [0, 1]: 0 is sparse only, 1 is dense only. | 0.5 |
distances | str | Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table. | None |
filter | Optional[List[list]] | A list of filter conditions as triplets in the following format: | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
Returns:
Type | Description |
---|---|
List[DataFrame] | List of Pandas dataframes with one dataframe of matching neighbors for each query vector. |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during search. |
Examples:
```python
# Find the closest neighbour of a single hybrid query vector
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1)
# Find the 3 closest neighbours for 2 hybrid queries
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
                    sparse_vectors=[{101:1,4578:1,102:1},{101:1,6079:2,102:1}],
                    n=3)
# Weight the sparse leg of the query higher by setting alpha = 0.1
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    alpha=0.1,
                    n=1)
# Filter
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    filter=[['within','size',(5,999)],['like','sym','AAP*']])
# Index options
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    dense_index_options=dict(efSearch=521),
                    sparse_index_options={'k':1.4,'b':0.78})
# Customized distance name
table.hybrid_search(dense_vectors=[[0,0,0,0,0,0,0,0,0,0]],
                    sparse_vectors=[{101:1,4578:1,102:1}],
                    n=1,
                    distances='myDist')
```
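The role of `alpha` can be pictured as a blend of a dense score and a sparse score. This reference only states the two endpoints (0 for sparse, 1 for dense), so the linear interpolation below is an assumption made for illustration, as is the premise that both scores are already on a comparable scale. `blend_scores` is a hypothetical function, not the server's scoring rule.

```python
def blend_scores(dense_score, sparse_score, alpha=0.5):
    """Linear blend of two scores: alpha=1 is dense only, alpha=0 sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError('alpha must be in [0, 1]')
    return alpha * dense_score + (1.0 - alpha) * sparse_score


# A document that matches well on the sparse leg but poorly on the dense leg
# scores higher as alpha moves toward 0.
for alpha in (0.0, 0.1, 0.5, 1.0):
    print(alpha, blend_scores(dense_score=0.2, sparse_score=0.9, alpha=alpha))
```

Under this assumption, `alpha=0.1` as used in the example above would let the sparse leg dominate the final ranking.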
drop
Drop the table.
Returns:
Type | Description |
---|---|
bool | A boolean which is True if the table was successfully dropped. |
Examples:
```python
table.drop()
```
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during the table deletion. |