Performing Searches

This section describes how to execute similarity searches. For more advanced details on filtered similarity searches, see the filtered search documentation. Similarity searches in KDB.AI are based on (approximate) nearest neighbor algorithms.

Selecting the table to search

Each table in KDB.AI has an associated name. To perform a search, specify the table in which the relevant vector embeddings are stored. Using the Python client, you can create a table object from the session.

Python

documents = session.table("documents")

Searching

Given a new vector embedding, you can search for its nearest neighbors. In this example, the embeddings are assumed to be 12-dimensional and the number of nearest neighbors is set to 3.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3)
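
As a rough illustration of what a nearest-neighbor search returns, the sketch below ranks stored vectors by Euclidean distance to a query using plain Python. It is an exact brute-force search over made-up 3-dimensional data, not KDB.AI's implementation, which relies on (approximate) nearest neighbor algorithms. The function name `nearest_neighbors` and the sample vectors are assumptions made for illustration only.

```python
import math

def nearest_neighbors(stored, query, n):
    """Return (index, distance) pairs for the n stored vectors closest to query.

    Hypothetical helper: an exact brute-force scan, shown only to make the
    idea of "n nearest neighbors" concrete.
    """
    scored = []
    for idx, vec in enumerate(stored):
        if len(vec) != len(query):
            raise ValueError("stored vector and query have different dimensions")
        # Euclidean distance between the stored vector and the query
        scored.append((idx, math.dist(vec, query)))
    # Smallest distance first, then keep the top n
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]

# Made-up sample data (3 dimensions to keep the example short)
stored_vectors = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [5.0, 5.0, 5.0],
]
result = nearest_neighbors(stored_vectors, [1.0, 0.0, 1.0], n=3)
```

Here `result` holds the three closest stored vectors in order of increasing distance, which mirrors the role of `n=3` in the search call above.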

Batch searches

For larger workloads it can be helpful to send multiple query vectors at once.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0],[1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0]], n=3)
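
When the number of query vectors is large, it may be convenient to check their dimensions and split them into fixed-size batches on the client side before searching. The sketch below shows one way to do that in plain Python. The helper name `make_batches`, the batch size, and the sample queries are assumptions for illustration; nothing here is part of the KDB.AI client.

```python
def make_batches(query_vectors, batch_size, dims):
    """Validate query dimensions and split the queries into fixed-size batches.

    Hypothetical helper, not part of any client library.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i, vec in enumerate(query_vectors):
        if len(vec) != dims:
            raise ValueError(f"query {i} has {len(vec)} dimensions, expected {dims}")
    # Slice the list of queries into consecutive chunks of at most batch_size
    return [
        query_vectors[start:start + batch_size]
        for start in range(0, len(query_vectors), batch_size)
    ]

# Five made-up 12-dimensional queries, grouped two at a time
queries = [[float(i)] * 12 for i in range(5)]
batches = make_batches(queries, batch_size=2, dims=12)
```

Each element of `batches` is a list of query vectors that could be passed as the `vectors` argument of a single search call.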

Processing results

It is possible to return a subset of the columns in the table, reducing the amount of data sent back to the client.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=[["author"],["context"]])

In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=[('sumLength','sum','length')], group_by=['author'], sort_by=['sumLength'])
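
To make the grouping and aggregation concrete, the sketch below computes a per-author sum of `length` over a handful of made-up rows and sorts the groups by that sum, using plain Python. It only illustrates the idea of "sum, group by, sort"; it is not how KDB.AI evaluates the query, and the helper name `sum_by_group`, the sample rows, and the ascending sort order are assumptions.

```python
from collections import defaultdict

def sum_by_group(rows, group_col, value_col, out_col):
    """Sum value_col per distinct group_col value, sorted by the sum.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    grouped = [{group_col: key, out_col: total} for key, total in totals.items()]
    # Ascending order is an assumption made for this sketch
    grouped.sort(key=lambda r: r[out_col])
    return grouped

# Made-up rows standing in for the nearest neighbors of one query
matches = [
    {"author": "Austen", "length": 120},
    {"author": "Borges", "length": 80},
    {"author": "Austen", "length": 40},
]
summary = sum_by_group(matches, "author", "length", "sumLength")
```

The result contains one row per author with a `sumLength` value, analogous to combining `aggs`, `group_by`, and `sort_by` in the search call above.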