KDB.AI Python API
This page contains the references for KDB.AI's Python API. For example usage, see the Quickstart Guide.
Note
Before you start, ensure you have the following installed on your machine:
- Python (versions 3.8 to 3.11)
- An active KDB.AI Cloud or Server license
- kdbai-client
- PyKX dependencies.
Note: Index Options
The index_option argument of search() holds the index-specific options for similarity search. For example, efSearch can be specified for HNSW indexes, while clusters can be specified for IVF/IVFPQ indexes.
For details of the usage of index_option, see the How to use an Index in KDB.AI page.
Session
Session represents a connection to a KDB.AI instance. To interact with KDB.AI Cloud or Server, you first need to create a session. This section summarizes how to create and close a session.
Create session
kdbai_client.Session
Session represents a connection to a KDB.AI instance.
Input parameters:
Name | Type | Description | Required | Default |
---|---|---|---|---|
api_key | str | API Key to be used for authentication. | No | None |
endpoint | str | Server endpoint to connect to. | No | 'http://localhost:8081' |
host | str | Hostname of the KDB.AI server. | No | None |
port | int | Port number on the server. | No | - 8081 if mode='rest' - 8082 if mode='qipc' |
mode | str | Implementation method used for the session. Possible values: rest and qipc | No | None |
Important
- If you don't provide the mode parameter:
  - A REST-based session is created if the endpoint starts with https://cloud.kdb.ai.
  - Otherwise, a qIPC-based session is created.
- Note that the REST-based implementation:
  - has worse performance due to payload serialization and deserialization.
  - has a 10MB limit on payload size for the train and insert methods.
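The default-mode rule above can be sketched as a small helper. This only illustrates the behaviour described on this page: `infer_mode` is a hypothetical name, not a function of kdbai_client, and the cloud URL in the usage lines is made up.

```python
CLOUD_PREFIX = "https://cloud.kdb.ai"

def infer_mode(endpoint, mode=None):
    """Return the session mode the documented defaulting rule would pick."""
    if mode is not None:
        if mode not in ("rest", "qipc"):
            raise ValueError(f"mode must be 'rest' or 'qipc', got {mode!r}")
        return mode
    # No explicit mode: cloud endpoints get REST, everything else gets qIPC.
    return "rest" if endpoint.startswith(CLOUD_PREFIX) else "qipc"

# Hypothetical endpoints, used only to exercise the rule.
print(infer_mode("https://cloud.kdb.ai/instance/example"))  # rest
print(infer_mode("http://localhost:8082"))                   # qipc
```

Passing mode explicitly, as in the examples on this page, avoids relying on the endpoint prefix.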
Example:
import kdbai_client as kdbai
### local server
session = kdbai.Session(endpoint='http://localhost:8082')
session = kdbai.Session(endpoint='http://localhost:8082', mode='qipc')
### local server using REST
session = kdbai.Session(endpoint='http://localhost:8081', mode='rest')
### local server using TLS
session = kdbai.Session(endpoint='http://localhost:8082', options={'tls': True})
### cloud instance
session = kdbai.Session(api_key="abc", endpoint="https://...", mode="rest")
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Session is created and KDB.AI instance can be interacted with. | True | N/A |
Fail: Incorrect API Key is provided when attempting to connect to a KDB.AI Cloud. | KDBAIException with appropriate error message: - qIPC: "Error during creating connection, make sure KDB.AI server is running and accepts qIPC connection on port {port}: {e}" where e is the original underlying error. - REST: "Failed to open a session on {self.endpoint} using API key with prefix {tmp}. Please double check your endpoint and api_key." | Check endpoint (host/port), credentials, and mode parameter. Check port forwarding in your environment and what port rules are allowed/denied. |
Fail: No API Key is provided when attempting to connect to a KDB.AI Cloud. | KDBAIException with appropriate error message: - qIPC: "Error during creating connection, make sure KDB.AI server is running and accepts qIPC connection on port {port}: {e}" where e is the original underlying error. - REST: "Failed to open a session on {self.endpoint} using API key with prefix {tmp}. Please double check your endpoint and api_key." | Check endpoint (host/port), credentials, and mode parameter. Check port forwarding in your environment and what port rules are allowed/denied. |
Fail: Server and client versions are incompatible. | Your KDB.AI server is not compatible with this client (kdbai_client=={version}). Use kdbai_client >={versions['clientMinVersion']} and <={versions['clientMaxVersion']} | Upgrade/downgrade either Server or client. |
Error: Session cannot be created because KDB.AI is not available. | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Close session
session.close()
You cannot execute any client-server interaction after this call.
Example:
session.close()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Session is closed and KDB.AI instance can no longer be interacted with. | True | N/A |
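Since no client-server interaction is possible after close(), it can help to tie the call to a `with` block so it always runs. A minimal sketch using `contextlib.closing` from the standard library, which calls close() on exit for any object that has one. `DemoSession` is a hypothetical stand-in so the snippet runs without a server; a real session object would take its place.

```python
from contextlib import closing

class DemoSession:
    """Hypothetical stand-in exposing the same close() method as a session."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

demo = DemoSession()
with closing(demo) as session:
    pass  # interact with the session here

# close() has been called on leaving the block, even if an error had been raised.
print(demo.closed)  # True
```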
Get version
session.version()
Retrieve version info from server and compatible client min/max version.
Example:
session.version()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Version info is returned. | {'serverVersion': '1.4.0', 'clientMinVersion': '1.4.0', 'clientMaxVersion': 'latest'} | N/A |
Error: KDBAI is not available. | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
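The dictionary returned by session.version() can be used to check a client version before doing any work. The sketch below is an illustration, not part of kdbai_client: `parse_version` and `client_is_compatible` are hypothetical helpers, and it assumes purely numeric dotted versions and that 'latest' means there is no upper bound.

```python
def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))

def client_is_compatible(client_version, info):
    """Check a client version against a session.version()-style dictionary."""
    client = parse_version(client_version)
    if client < parse_version(info["clientMinVersion"]):
        return False
    max_version = info["clientMaxVersion"]
    # Assumption: 'latest' means no upper bound on the client version.
    return max_version == "latest" or client <= parse_version(max_version)

# Same shape as the success message shown above.
info = {"serverVersion": "1.4.0", "clientMinVersion": "1.4.0", "clientMaxVersion": "latest"}
print(client_is_compatible("1.4.2", info))  # True
print(client_is_compatible("1.3.0", info))  # False
```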
Database
Create, delete, and retrieve databases.
In KDB.AI, a database is a collection of tables which store related data.
Key principles for database management
To simplify database design/management and prevent naming conflicts, follow the principles below:
- Unique database names: Each database must have a unique name and can contain multiple tables.
- Unique table names within a database: Tables within a database must have unique names, but different databases can contain tables with the same name. This is similar to the concept of namespaces.
- Cascade deletion: When deleting a database, all child entities (tables) will also be deleted.
- Default database: You don't need to create a database to create tables. If you create a table without specifying a database, it will be placed in a default, undeletable database.
Create database
session.create_database
Input parameters:
Name | Type | Description | Required | Default |
---|---|---|---|---|
database | str | Name of the database to create. | Yes | None |
Database name rules
- Max length is 128 characters
- Must contain only alphanumeric characters and underscore
- Must start with an alpha character
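The name rules above can be checked on the client before calling the server. A minimal sketch, assuming "alpha" and "alphanumeric" mean ASCII letters and digits; `is_valid_name` is a hypothetical helper and not part of kdbai_client.

```python
import re

# One leading letter, then letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 128

def is_valid_name(name):
    """Return True if name satisfies the three naming rules."""
    return (
        isinstance(name, str)
        and len(name) <= MAX_NAME_LENGTH
        and NAME_PATTERN.fullmatch(name) is not None
    )

print(is_valid_name("myDatabase"))  # True
print(is_valid_name("1database"))   # False
print(is_valid_name("my-database")) # False
```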
Example:
session.create_database("myDatabase")
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Database is created and returned | database instance | N/A |
Fail: Database name is not unique | Raise exception | A database with the given name already exists. Create a database with another name. |
Fail: Database name is not a valid name | Raise exception | Provide a valid str for the database name. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Get database
session.database
Retrieve database with a given name.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
database | str | Name of the database to be retrieved | Yes |
Example:
session.database("myDatabase")
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Database with given name is found | Database instance. | N/A |
Fail: Database with given name is not found | KDB.AI Exception: database {name} does not exist | Check the name of the database you are searching for as it does not seem to exist. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Refresh database
database.refresh()
This method ensures that the list of tables associated with the loaded database is current. If the list is not up-to-date, it updates it. This is particularly useful if tables have been added to the database after the getDatabase function was called.
Example:
database.refresh()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Database is refreshed | None | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
List databases
session.databases
Retrieve list of databases in ascending order.
Example:
session.databases()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Returns the list of database names, including the default database | list of database names | N/A |
Error: Databases cannot be listed because KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Delete database
database.drop
Delete database with a given name and all associated tables.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
database | str | Name of the database to be deleted. | Yes |
Example:
db=session.database("myDatabase")
db.drop()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Database with given name has been deleted | N/A | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Table
Create, delete, update, and retrieve tables.
Create table
database.create_table
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
database | instance name | Name of the database. | Yes |
table | str | Name of the table to create. | Yes |
external_data_references | dict | Should contain the keys: - path (path to the existing kdb+ table mounted in our Docker container) - provider (set to kx) WARNING: the name of the table should match the name of the target table in the existing kdb+ database. | No |
schema | list of dict | Schema details for the table. | Yes - if external_data_references is not specified. |
indexes | list of dict | List of index definitions | No |
partition_column | str | Column name to partition on | No |
embeddingConfigurations | dict | Should be keyed by embedding column name | No |
Table name rules
- Max length is 128 characters
- Must contain only alphanumeric characters and underscore
- Must start with an alpha character
Example:
schema = [{'name': 'id', 'type': 'int16'},
{'name': 'tag', 'type': 'bool'},
{'name': 'author', 'type': 'str'},
{'name': 'length', 'type': 'int32'},
{'name': 'content', 'type': 'str'},
{'name': 'createdDate', 'type': 'datetime64[D]'},
{'name': 'embeddings', 'type': 'float64s'}]
indexes = [
{'type': 'flat', 'name': 'flat', 'column': 'embeddings', 'params': {'dims': 1536}},
{'type': 'hnsw', 'name': 'fast_hnsw', 'column': 'embeddings', 'params': {'dims': 1536,'M': 8, 'efConstruction': 8}},
{'type': 'hnsw', 'name': 'accurate_hnsw','column': 'embeddings', 'params': {'dims': 1536,'M': 64, 'efConstruction':256}}
]
db = session.database("default")
db.create_table(table="myTable", schema=schema, indexes=indexes)
# create partitioned table
db.create_table(table="myPartitionedTable", schema=schema, indexes=indexes, partition_column='createdDate')
schema
Attributes:
Name | Type | Description | Required |
---|---|---|---|
name | str | Column name | Yes |
type | str | Column type | Yes |
Example:
schema = [ { 'name': 'id', 'type': 'int32'}, { 'name': 'isValid', 'type': 'bool'},
{ 'name': 'embeddings', 'type': 'float32s' }, { 'name': 'sparse_col', 'type': 'general' } ]
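Because every column needs both a name and a type, a quick local check can catch malformed schema entries before create_table is called. A minimal sketch, not part of kdbai_client: `missing_schema_keys` is a hypothetical helper, and it only checks that both attributes are present as strings, not that the type is one the server accepts.

```python
REQUIRED_KEYS = ("name", "type")

def missing_schema_keys(schema):
    """Return (position, key) pairs for columns lacking a required attribute."""
    problems = []
    for position, column in enumerate(schema):
        for key in REQUIRED_KEYS:
            if not isinstance(column.get(key), str):
                problems.append((position, key))
    return problems

good = [{"name": "id", "type": "int32"}, {"name": "embeddings", "type": "float32s"}]
bad = [{"name": "id", "type": "int32"}, {"name": "embeddings"}]
print(missing_schema_keys(good))  # []
print(missing_schema_keys(bad))   # [(1, 'type')]
```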
indexes
Attributes:
Name | Type | Description | Required |
---|---|---|---|
name | str | Index name | Yes |
type | str | Index type, for example: flat, qFlat, hnsw, ivf, ivfpq, qHnsw | Yes |
column | str | kdb+ column name to apply index | Yes |
params | dict | Index parameters containing index-specific attributes for Flat, qFlat, HNSW, ivf, ivfpq, qHNSW | Yes |
Example:
indexes = [
{'type': 'flat', 'name': 'flat', 'column': 'embeddings', 'params': {'dims': 1536}},
{'type': 'hnsw', 'name': 'fast_hnsw', 'column': 'embeddings', 'params': {'dims': 1536, 'M': 8, 'efConstruction': 8}},
{'type': 'hnsw', 'name': 'accurate_hnsw','column': 'embeddings', 'params': {'dims': 1536, 'M': 64, 'efConstruction':256}}
]
flat
Index-specific attributes (params) for type = flat
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
dims | Dimension of vector space | int | Yes | N/A |
metric | Distance metric | str | No | L2 |
qFlat
Index-specific attributes (params) for type = qFlat
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
dims | Dimension of vector space | int | Yes | N/A |
metric | Distance metric | str | No | L2 |
hnsw
Index-specific attributes (params) for type = hnsw
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
dims | Dimension of vector space | int | Yes | N/A |
M | Graph valency | int | No | 8 |
efConstruction | Search depth at construction | int | No | 8 |
metric | Distance metric | str | No | L2 |
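To avoid retyping the same dictionary shape, an HNSW index definition can be produced by a small builder that fills in the defaults from the table above. This is a sketch: `hnsw_index` is a hypothetical helper, not part of kdbai_client. It simply returns a dictionary in the format used by the indexes examples on this page.

```python
def hnsw_index(name, column, dims, M=8, efConstruction=8, metric="L2"):
    """Build an index definition for type = hnsw using the documented defaults."""
    return {
        "type": "hnsw",
        "name": name,
        "column": column,
        "params": {
            "dims": dims,
            "M": M,
            "efConstruction": efConstruction,
            "metric": metric,
        },
    }

indexes = [
    hnsw_index("fast_hnsw", "embeddings", dims=1536),
    hnsw_index("accurate_hnsw", "embeddings", dims=1536, M=64, efConstruction=256),
]
print(indexes[1]["params"]["M"])  # 64
```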
qHnsw
Index-specific attributes (params) for type = qHnsw
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
dims | Dimension of vector space | int | Yes | N/A |
M | Graph valency | int | No | 8 |
efConstruction | Search depth at construction | int | No | 8 |
metric | Distance metric | str | No | L2 |
mmapLevel | Level of memory mapping. Accepted values: - 0 for both vectors and node connections in memory; - 1 for memory-mapped vectors and in-memory nodes; - 2 for both vectors and node connections memory mapped. | int | No | 1 |
An index consists of vectors and nodes. Vectors represent the data points in the vector space, while nodes are part of the graph structure used to organize and search through these vectors efficiently. Nodes connect vectors based on their similarity, forming a graph that facilitates fast nearest-neighbor searches.
ivf
Index-specific attributes (params) for type = ivf
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
nclusters | Number of clusters | long | No | 8 |
metric | Distance metric | str | No | L2 |
ivfpq
Index-specific attributes (params) for type = ivfpq
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
nclusters | Number of clusters | long | No | 8 |
nbits | Number of bits to quantize | long | No | 8 |
nsplits | Number of vectors to split | long | No | 8 |
metric | Distance metric | str | No | L2 |
external_data_references
Attributes:
Name | Type | Description | Required |
---|---|---|---|
path | byte str | Path to external table, for instance the existing kdb+ table mounted in our Docker container. | Yes |
provider | str | Provider of external table, for example kx. | Yes |
Example:
Launch the KDB.AI Server container with the -v flag to mount an existing kdb+ DB in the container, for example:
# the second -v mounts a local ./taq/db under /tmp/kx/remote in the container as read-only
docker run -it -e NUM_WRK=1 \
  -e SECONDARIES=16 \
  -e KDB_LICENSE_B64 \
  -v $PWD/vecdb/data:/tmp/kx/data/vdb \
  -v $PWD/taq/db:/tmp/kx/remote:ro \
  -p 8082:8082 \
  kdbai-db:local
database.create_table("tq", external_data_references=[{'path': b'/tmp/kx/remote', 'provider': 'kx'}])
The name of the table (tq) should match the name of the target table in the existing kdb+ db.
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Table is created and returned | success result`error!True;table_dictionary;" | N/A |
Fail: Table name is not unique | Raise exception | Specify a different table name as it appears a table with this name already exists. |
Fail: Table name is not valid | Raise exception | Use a valid string for the table name. |
Fail: Any of the input parameters are of wrong type | ValueError: "invalid arguments types: " ... | Provide the correct type of input parameters required. |
Fail: Any of the input parameters are missing | ValueError: "missing arguments: " ... | Provide required input parameters. |
Fail: Any of the input parameters are invalid | ValueError: "invalid arguments: " ... | Provide known or valid input parameters. |
Fail: Schema individual attributes are not valid | ValueError: "invalid table attributes: " ... | Provide valid attributes in the schema. |
Fail: Schema individual types are not valid | ValueError: "invalid column types: " ... | Provide valid column types in the schema. |
Fail: Index individual parameters are not valid | ValueError: "invalid index parameters: " ... | Double check the parameters of one of the specified indexes. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Get table
database.table
Retrieve a table from a database with a given name.
Example:
db=session.database("default")
db.table("myTable")
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Table with given name is found | Table meta dictionary as Pandas DataFrame | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Refresh table
table.refresh()
This method ensures that the index and schema information associated with the table is current by calling the getTable function.
Example:
table.refresh()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Table is refreshed | None | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
List tables
database.tables
Retrieve a list of tables from a database with a given name.
Tables are cached on the database instance. As a result, the data might have changed since the last get or refresh.
Example:
db = session.database("myDatabase")
db.tables
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Tables found | List of table names | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Delete table
table.drop()
Delete a table with a given name and all associated indexes.
Example:
db = session.database("default")
table = db.table("myTable")
table.drop()
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Table with given name has been deleted | N/A | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Index
Retrieve and list indexes.
Get index
table.index
Retrieve an index from a table.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
name | str | Name of the index to be retrieved | Yes |
Example:
table.index('trade_flat_index')
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Index with given name is found and returned | dictionary | N/A |
Fail: Index name is not valid | ValueError: Index name is invalid | Provide a valid string for the index name. |
Fail: Index with given name is not found | ValueError: Index name is not found | Provide correct index name. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
List indexes
table.indexes
List all indexes for a table.
Example:
db = session.database("default")
table = db.table("myTable")
table.indexes
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Indexes found and returned | list of dictionaries | N/A |
Error: KDBAI is not available | Cannot write to handle ... | Check your connection and if your server is running. |
Update indexes
table.update_indexes
Build one or more indexes.
Builds indexes from scratch. Only supported for kdb+ HDB tables.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
indexes | list | List of index names to build. | Yes |
parts | list | List of partitions to build the index on, in case of a partitioned database. If not given, indexes are built on all partitions. | No |
Example:
db = session.database("default")
table = db.table("SEC")
table.update_indexes(indexes=["flat_index"], parts=[1,2,3]) #assuming we have a partition column with integer type
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Index(es) with given name(s) updated successfully | None | N/A |
Fail: Operation called on a table managed by kdbai | KDBAIException: feature not supported: build index is only allowed on reference database | Use build index only on reference tables. |
Fail: Index name is not valid | ValueError: Index name is invalid | Provide a valid string for the index name |
Fail: Index with given name is not found | KDBAIException: index not found: invalid | Provide correct index name. |
Fail: Update operation is not valid | ValueError: Update operation is not valid | |
Error: KDBAI is not available | Cannot write to handle ... | Check your connection and if your server is running. |
Data
Insert, query, and search data.
Insert data
table.insert
Add rows to a table.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
payload | dataframe | Data to insert. | No - not required when using external database. |
Example:
db = session.database("default")
table = db.table("myTable")
table.insert(data)
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Data inserted successfully. | dictionary | N/A |
Fail: Data table does not match with table schema. | KDBAIException: "data has wrong types: cols provided | Check data schema and expected table schema. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
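With the REST-based implementation, insert payloads are limited to 10MB, so large datasets may need to be inserted in batches. The sketch below shows one way to split the data by row count. `iter_chunks` is a hypothetical helper, not part of kdbai_client, and a suitable `chunk_rows` depends on how large each row is, so it has to be chosen for the data at hand.

```python
def iter_chunks(data, chunk_rows):
    """Yield consecutive slices of data holding at most chunk_rows rows each."""
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be at least 1")
    for start in range(0, len(data), chunk_rows):
        # Integer slicing selects rows for lists and for pandas DataFrames alike.
        yield data[start:start + chunk_rows]

rows = list(range(10))
print([len(chunk) for chunk in iter_chunks(rows, 4)])  # [4, 4, 2]
```

With a real table, each chunk would be passed to table.insert in turn, for example inside `for chunk in iter_chunks(data, 2000)`, where 2000 is only a placeholder value.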
Train data
table.train
Train data.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
payload | table | Data to train on. | Yes |
Example:
db = session.database("default")
table = db.table("myTable")
table.train(payload=data)
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Index(es) with given name(s) updated successfully | True | N/A |
Fail: Index name is not valid | ValueError: Index name is invalid | Provide a valid string for the index name. |
Fail: Index with given name is not found | ValueError: Index name is not found | Provide correct index name. |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Query data
table.query
Query data from a table.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
filter | list of tuples | List of filter conditions, parse tree style. | No |
sort_columns | list of str | The columns by which to sort the results. | No |
group_by | list of str | The column values by which to group the results. | No |
aggs | dictionary | Aggregation rules. Dictionary structure: - Key → new column name - Value → old column name or parse tree style aggregation rule | No |
limit | int | Number of rows to return. | No |
Example:
db = session.database("default")
table = db.table("myTable")
table.query() #returns all rows in the table
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Successful query | Pandas DataFrame | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |
Search data
table.search
Perform a similarity search.
Input parameters:
Name | Type | Description | Required |
---|---|---|---|
type | str | Specify the type of search (tss or otherwise). | No |
vectors | dictionary | Indexes to query with query vectors. | Yes |
n | int | Number of neighbors to return. | No |
range | float | Range within which the nearest neighbours are returned. (only for qFlat) | No |
index_params | dictionary (key is index name and value is dictionary of parameters for that index) | Weights required for multi index search. | No |
options | dictionary | Use this dictionary: - to rename the distance column with distanceColumn=newname - to not return metadata columns with indexOnly=True - to return TSS matched patterns with returnMatches=True - to force a TSS search on a partitioned table with failing partitions with force=True | Yes |
filter | list of tuples | List of filter conditions, parse tree style. | No |
searchBy | str or list of str | (Non Transformed TSS only) Perform a TSS search on each group inferred from the specified columns (not to be confused with groupBy, which is used for final aggregation of the results) | No |
group_by | list of str | The column values by which to group the results. | No |
aggs | dictionary | Aggregation rules. | No |
sort_columns | list of str | The columns by which to sort the results. | No |
Example:
db = session.database("default")
table = db.table("myTable")
table.search(vectors={"indexName":v},n=10)
# Filter the data using 'range' (only for qFlat indexes)
table.search(vectors={"indexName":v}, range=5.5)
options
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
distanceColumn | Rename distance column to this. | str | No | None |
indexOnly | Return only index information | bool | No | None |
returnMatches | (Non Transformed TSS only) Return the full detected pattern for each match | boolean | No | None |
force | (Non Transformed TSS only) Force the TSS search even if some searchBy group or table partition fails, for example when a partition has fewer data points than the searched pattern | boolean | No | None |
index_params
index_params is a dictionary where each key is an index name and each value is a dictionary with the arguments below.
Attribute | Description | Type | Required | Default |
---|---|---|---|---|
weight | Weight for each index. | float | Required for multi index input. | None |
Important! For multi index searches, you have to allocate a weight to each index. The sum of all weights must be equal to 1.
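The weight rule above can be enforced before calling search. A minimal sketch, assuming the index names from the earlier indexes example; `build_index_params` is a hypothetical helper, not part of kdbai_client, and the tolerance used for the sum check is an arbitrary choice to absorb floating-point rounding.

```python
import math

def build_index_params(weights):
    """Build index_params from {index_name: weight}, checking the weights sum to 1."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError(f"index weights must sum to 1, got {total}")
    return {name: {"weight": weight} for name, weight in weights.items()}

params = build_index_params({"fast_hnsw": 0.3, "accurate_hnsw": 0.7})
print(params)
```

The resulting dictionary has the shape described for index_params: index name as key, and a dictionary holding the weight as value.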
Error handling:
Description | Message | Troubleshooting |
---|---|---|
Success: Successful query | list of Pandas DataFrames | N/A |
Error: KDBAI is not available | RuntimeError('Error during request, make sure KDB.AI server running') | Check your connection and if your server is running. |