KDB.AI Python API Client
Overview
This is the API reference for the Python client to KDB.AI. For example usage, see the Quickstart guide.
Session
Session represents a connection to a KDB.AI instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_key | str | API key to be used for authentication. | None |
host | str | Host name or IP address of the KDB.AI server to connect to. | HOST |
port | int | REST API gateway port on the KDB.AI server. | PORT |
protocol | str | | PROTOCOL |
Example
Open a session on KDB.AI Cloud with an API key:
import kdbai_client as kdbai
session = kdbai.Session(api_key='YOUR_API_KEY')
Open a session on a custom KDB.AI instance on http://localhost:8082:
session = kdbai.Session(host='localhost', port=8082, protocol='http')
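Connection settings are often kept out of source code. The following is a minimal sketch, assuming hypothetical environment variable names (KDBAI_API_KEY, KDBAI_HOST, KDBAI_PORT, KDBAI_PROTOCOL) that are not defined by KDB.AI. It only assembles the keyword arguments documented above and leaves the call to Session to the caller.

```python
import os

# Hypothetical environment variable names; they are not defined by KDB.AI.
ENV_TO_PARAM = {
    "KDBAI_API_KEY": "api_key",
    "KDBAI_HOST": "host",
    "KDBAI_PORT": "port",
    "KDBAI_PROTOCOL": "protocol",
}


def session_kwargs(env=None):
    """Collect Session keyword arguments from environment-style settings.

    Unset or empty variables are skipped so that Session uses its defaults.
    """
    env = os.environ if env is None else env
    kwargs = {}
    for var, param in ENV_TO_PARAM.items():
        value = env.get(var)
        if value is None or value == "":
            continue
        # port is documented as an int, so convert it from its string form
        kwargs[param] = int(value) if param == "port" else value
    return kwargs


if __name__ == "__main__":
    print(session_kwargs({"KDBAI_HOST": "localhost",
                          "KDBAI_PORT": "8082",
                          "KDBAI_PROTOCOL": "http"}))
```

The resulting dictionary could then be unpacked into the constructor, for example kdbai.Session(**session_kwargs()).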
config
Retrieve the server configuration.
Returns:
Type | Description |
---|---|
Dict | A |
list
table
create_table
Create a table with a schema
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create. | required |
schema | dict | Schema of the table to create. This schema must contain a list of columns. All columns must have either a | required |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error happens during the creation of the table. |
Example
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)
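When several tables share the same layout, the schema dictionary can be assembled programmatically. This is a minimal sketch of a hypothetical helper, make_schema, that reproduces the shape of the example above. The dictionary keys and the 'L2' and 'flat' defaults are copied from that example; which other metrics and index types are accepted is not covered here.

```python
def make_schema(scalar_columns, vector_column, dims, metric="L2", index_type="flat"):
    """Build a schema dict shaped like the create_table example.

    scalar_columns is a list of (name, pytype) pairs. This helper is
    hypothetical and is not part of the KDB.AI client.
    """
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    columns = [{"name": name, "pytype": pytype} for name, pytype in scalar_columns]
    # The vector column carries its index definition under 'vectorIndex'
    columns.append({
        "name": vector_column,
        "vectorIndex": {"dims": dims, "metric": metric, "type": index_type},
    })
    return {"columns": columns}


schema = make_schema([("id", "str"), ("tag", "str"), ("text", "bytes")],
                     vector_column="embeddings", dims=1536)
```

The value of schema built this way is equal to the literal dictionary in the example and could be passed to session.create_table in the same manner.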
Table
KDB.AI table.
Table objects shall be created with session.create_table(...) or retrieved with session.table(...).
This constructor shall not be used directly.
schema
Retrieve the schema of the table.
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during schema retrieval. |
Returns:
Type | Description |
---|---|
Dict | A |
Example
table.schema()
{'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
             {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
             {'name': 'embeddings',
              'pytype': 'float32',
              'qtype': 'reals',
              'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}
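The returned dictionary can be inspected like any other Python dict. As a minimal sketch, the hypothetical helper below collects the columns that carry a 'vectorIndex' entry, which is one way to look up the expected vector dimension. It assumes the schema has the shape shown in the output above.

```python
def vector_index_columns(schema):
    """Map each column that has a 'vectorIndex' entry to that entry."""
    return {
        col["name"]: col["vectorIndex"]
        for col in schema.get("columns", [])
        if "vectorIndex" in col
    }


# Sample value copied from the table.schema() output shown above
sample = {'columns': [{'name': 'id', 'pytype': 'str', 'qtype': 'symbol'},
                      {'name': 'tag', 'pytype': 'str', 'qtype': 'symbol'},
                      {'name': 'text', 'pytype': 'bytes', 'qtype': 'string'},
                      {'name': 'embeddings',
                       'pytype': 'float32',
                       'qtype': 'reals',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}

indexes = vector_index_columns(sample)
dims = indexes["embeddings"]["dims"]
```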
train
Train the index (IVF and IVFPQ only).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during training. |
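Because training applies to IVF and IVFPQ indexes only, it can be useful to check the index type before calling train. The sketch below is hypothetical and rests on an assumption: that the schema's 'type' value for these indexes is spelled 'ivf' or 'ivfpq' (compared case-insensitively). Only 'flat' appears in the examples in this reference.

```python
# Assumed index type names; only 'flat' appears in the examples in this reference.
TRAINABLE_INDEX_TYPES = {"ivf", "ivfpq"}


def needs_training(schema):
    """Return True if any vector index in the schema is of a trainable type."""
    for col in schema.get("columns", []):
        index = col.get("vectorIndex")
        if index and str(index.get("type", "")).lower() in TRAINABLE_INDEX_TYPES:
            return True
    return False
```

A caller might then write: if needs_training(table.schema()): table.train(df).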
insert
Insert data into the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Pandas dataframe with column names/types matching the target table. | required |
warn | bool | If True, display a warning when | True |
Examples:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

ROWS = 50
DIMS = 10

# Random sample data; "price" holds one DIMS-dimensional float32 vector per row
data = {
    "time": [timedelta(microseconds=np.random.randint(0, int(1e10))) for _ in range(ROWS)],
    "sym": [f"sym_{np.random.randint(0, 999)}" for _ in range(ROWS)],
    "realTime": [datetime.utcnow() for _ in range(ROWS)],
    "price": [np.random.rand(DIMS).astype(np.float32) for _ in range(ROWS)],
    "size": [np.random.randint(1, 100) for _ in range(ROWS)],
}
df = pd.DataFrame(data)
table.insert(df)
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during insert. |
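Since the dataframe's column names must match the target table, a quick client-side comparison can catch mismatches before the request is sent. This is a minimal sketch of a hypothetical helper. It compares names only, not types, and assumes the schema has the shape returned by table.schema().

```python
def column_mismatch(schema, data_columns):
    """Return (missing, unexpected) column names relative to the schema."""
    expected = [col["name"] for col in schema["columns"]]
    given = list(data_columns)
    # Columns the table defines but the data lacks
    missing = [name for name in expected if name not in given]
    # Columns the data has but the table does not define
    unexpected = [name for name in given if name not in expected]
    return missing, unexpected
```

For a dataframe df, the call would be column_mismatch(table.schema(), df.columns). Two empty lists indicate that the names line up.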
query
Query data from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter | Optional[List[list]] | A list of filter conditions as triplets in the following format: | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
fill | Optional[str] | This defines how to handle null values. This should be either | None |
start_time | Optional[str] | Start of the time interval to query as an ISO 8601 formatted string ( | None |
end_time | Optional[str] | End of the time interval to query as an ISO 8601 formatted string ( | None |
input_timezone | Optional[str] | The timezone of start_time and end_time. Defaults to UTC if not specified. | None |
output_timezone | Optional[str] | The timezone of output timestamp columns. Defaults to UTC if not specified. | None |
Examples:
table.query(group_by = ['sensorID', 'qual'])
table.query(filter = [['within', 'qual', [0, 2]]])
table.query(start_time='2000-05-26', end_time='2000-05-27')
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during query. |
Returns:
Type | Description |
---|---|
DataFrame | Pandas dataframe with the query results. |
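The query options combine freely, so it can help to build them in one place. The following is a minimal sketch with two hypothetical helpers: within builds a filter triplet in the shape used by the example above, and query_kwargs assembles keyword arguments while skipping unset options. It renders datetime objects with isoformat(), which produces ISO 8601 strings. Whether the server accepts this exact ISO 8601 variant is an assumption.

```python
from datetime import datetime


def within(column, low, high):
    """Filter triplet in the shape used by the 'within' example."""
    return ["within", column, [low, high]]


def query_kwargs(filters=None, group_by=None, start=None, end=None):
    """Assemble keyword arguments for table.query, skipping unset options.

    start and end are datetime objects rendered as ISO 8601 strings.
    """
    kwargs = {}
    if filters:
        kwargs["filter"] = list(filters)
    if group_by:
        kwargs["group_by"] = list(group_by)
    if start is not None:
        kwargs["start_time"] = start.isoformat()
    if end is not None:
        kwargs["end_time"] = end.isoformat()
    return kwargs


kwargs = query_kwargs(
    filters=[within("qual", 0, 2)],
    group_by=["sensorID", "qual"],
    start=datetime(2000, 5, 26),
    end=datetime(2000, 5, 27),
)
```

The result could then be unpacked into the call as table.query(**kwargs).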
search
Perform similarity search on the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vectors | list of lists | Query vectors for the search. | required |
n | int | Number of neighbours to return. | 1 |
index_options | dict | Index specific options for similarity search. | None |
distances | str | Optional name of a column to output the distances. If not specified, __nn_distance will be added as an extra column to the result table. | None |
filter | Optional[List[list]] | A list of filter conditions as triplets in the following format: | None |
group_by | Optional[str] | A list of column names to use for group by. | None |
aggs | Optional[List[list]] | Either a list of column names to select or a list of aggregations to perform as a list of triplets in the following form: | None |
sort_by | Optional[List[str]] | List of column names to sort on. | None |
start_time | Optional[str] | Start of the time interval to query as an ISO 8601 formatted string ( | None |
end_time | Optional[str] | End of the time interval to query as an ISO 8601 formatted string ( | None |
input_timezone | Optional[str] | The timezone of start_time and end_time. Defaults to UTC if not specified. | None |
output_timezone | Optional[str] | The timezone of output timestamp columns. Defaults to UTC if not specified. | None |
Examples:
# Find the closest neighbour of a single query vector
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0]], n=1)
# Find the 3 closest neighbours of 2 query vectors
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0], [1,1,1,1,1,1,1,1,1,1]], n=3)
# With aggregation and sorting
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             aggs=[('sumSize','sum','size')],
             group_by=[('sym')],
             sort_by=[('sumSize')])
# Filter
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             filter=[('within','size',(5,999)),('like','sym','AAP*')])
# More advanced options
table.search(vectors=[[0,0,0,0,0,0,0,0,0,0],[1,1,1,1,1,1,1,1,1,1]],
             n=3,
             input_timezone='UTC',
             output_timezone='Europe/Zurich',
             start_time='2023.08.01D12:13:14.123456789',
             end_time='2030.08.30D12:13:14.123456789',
             distances='myDist')
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during search. |
Returns:
Type | Description |
---|---|
List[DataFrame] | List of Pandas dataframes with one dataframe of matching neighbours for each query vector. |
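Query vectors are passed as a list of lists, and in the examples each one has as many entries as the vectors stored in the table. As a minimal sketch, the hypothetical helper below checks that shape on the client side before calling search. Taking the expected dimension from the 'dims' value of the table's vector index is an assumption.

```python
def check_query_vectors(vectors, dims):
    """Raise ValueError unless vectors is a non-empty list of dims-long lists."""
    if not vectors:
        raise ValueError("at least one query vector is required")
    for position, vector in enumerate(vectors):
        if len(vector) != dims:
            raise ValueError(
                f"query vector {position} has {len(vector)} dimensions, expected {dims}"
            )
    # Return the input unchanged so the call can be used inline
    return vectors
```

It could be used inline, for example table.search(vectors=check_query_vectors(queries, 10), n=3), where queries is the caller's own list of query vectors.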
drop
Drop the table.
Examples:
table.drop()
Raises:
Type | Description |
---|---|
KDBAIException | Raised when an error occurs during the table deletion. |
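Because drop can raise, cleanup code may want to handle the failure rather than let it propagate. This is a minimal sketch of a hypothetical wrapper. This reference does not show where KDBAIException is imported from, so the exception class is taken as a parameter and defaults to Exception as a stand-in.

```python
def drop_table(table, error_type=Exception):
    """Call table.drop() and report whether it succeeded.

    error_type should be the client's KDBAIException class; it defaults to
    Exception only because its import path is not shown in this reference.
    """
    try:
        table.drop()
    except error_type as exc:
        print(f"drop failed: {exc}")
        return False
    return True
```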