Conduct Similarity Search

This section describes how to run similarity searches. For more advanced search filters, see Customize Filters.

Similarity searches in KDB.AI are based on approximate nearest neighbor algorithms.
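
An approximate search trades a small amount of accuracy for speed; the result it approximates is the exact nearest-neighbor ranking. As a rough illustration of that baseline (not of how KDB.AI is implemented), the sketch below ranks a handful of made-up two-dimensional vectors by brute force. The Euclidean metric, the sample data, and the helper names are all assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nearest(query, stored, n):
    """Return (index, distance) pairs for the n stored vectors closest to query.

    Hypothetical helper: compares the query against every stored vector,
    which is the exact result an approximate index tries to reproduce.
    """
    scored = [(i, euclidean(query, vec)) for i, vec in enumerate(stored)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]

# Made-up sample data for illustration only.
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
result = exact_nearest([1.0, 0.2], stored, n=2)
print(result)
```

Brute force scans every stored vector for every query, which is why approximate indexes are preferred for large tables.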

Selecting the table to search

To perform a search, specify the name of the table that stores the relevant vector embeddings. With the Python client, you can create a table object from the session.

Python

documents = session.table("documents")

Searching

Once you have a vector embedding for your query, you can search for its nearest neighbors. The Python client uses the table object created above, whereas the REST client takes the table name. In this example, the embeddings are twelve-dimensional and the number of nearest neighbors is set to three.

Use the following command to search for the nearest neighbors.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3)

REST

curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]]}'
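
The REST request body can also be assembled programmatically rather than typed by hand. The sketch below uses only the Python standard library. `build_search_payload` is a hypothetical helper, not part of any KDB.AI client; it mirrors just the `table`, `n`, and `vectors` fields shown in the curl example, and the equal-length check is an added assumption about what a sensible request looks like.

```python
import json

def build_search_payload(table, vectors, n):
    """Build the JSON body for a search request (hypothetical helper).

    Raises ValueError if the query vectors are missing or differ in length,
    since a mismatch usually points to a mistake made before the request is sent.
    """
    lengths = {len(vec) for vec in vectors}
    if len(lengths) != 1:
        raise ValueError(f"expected vectors of one common length, got lengths {sorted(lengths)}")
    return json.dumps({"table": table, "n": n, "vectors": vectors})

# The same twelve-element query vector used in the examples above.
query = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
body = build_search_payload("documents", [query], n=3)
print(body)
```

The resulting string can be passed to curl with `-d`, or sent with any HTTP client, alongside the `Content-Type: application/json` header.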

Batch searches

For larger workloads, you can send multiple query vectors in a single request, as shown in the following command.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0],[1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0]], n=3)

REST

curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0],[1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0]]}'

Processing results

You can return a subset of the columns in the table, which reduces the amount of data sent back to the client.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=["author","content"])

REST

curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]],
"agg":["author","content"]}'

In addition to returning a subset of the columns, you can return aggregated results, grouped by categorical variables, and sorted based on a column name.

Python

documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=[('sumLength','sum','length')], group_by=['author'], sort_by=['sumLength'])

REST

curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]],
"agg":[["sumLength","sum","length"]],
"groupBy":["author"],"sortCols":["sumLength"]}'
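
To make the aggregation concrete, the sketch below reproduces the same idea on the client side in plain Python: sum the `length` column per `author`, name the result `sumLength`, and sort by it. The rows are invented sample data, `sum_by` is a hypothetical helper, and the ascending sort order is an assumption; the shape and ordering of what the service actually returns may differ.

```python
from collections import defaultdict

def sum_by(rows, group_col, value_col, out_col):
    """Sum value_col per distinct group_col value, sorted ascending by the total.

    Hypothetical helper that illustrates a grouped 'sum' aggregation;
    it is not part of the KDB.AI client.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    grouped = [{group_col: key, out_col: total} for key, total in totals.items()]
    return sorted(grouped, key=lambda r: r[out_col])

# Invented rows standing in for the nearest neighbors of one query.
rows = [
    {"author": "alice", "length": 120},
    {"author": "bob", "length": 300},
    {"author": "alice", "length": 80},
]

summary = sum_by(rows, "author", "length", "sumLength")
print(summary)
```

Running the aggregation in the search request itself, as in the commands above, avoids transferring the individual rows to the client first.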

All supported aggregations are listed here.