Conduct Similarity Search
This section explains how to run similarity searches. For more advanced search filters, see Customize Filters.
Similarity searches in KDB.AI are based on Approximate Nearest Neighbor (ANN) algorithms.
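To make the idea of a nearest-neighbor search concrete, here is a minimal sketch of an exact (brute-force) search in plain Python. It does not show how KDB.AI works internally. It only illustrates the result an ANN index approximates without scanning every stored vector. The function name `exact_nearest`, the Euclidean metric, and the sample vectors are all assumptions made for illustration.

```python
import math

def exact_nearest(query, vectors, n):
    """Return (index, distance) pairs for the n stored vectors closest to query."""
    # Score every stored vector; this full scan is what ANN indexes avoid.
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]

# Illustrative two-dimensional data, not taken from any real table.
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
result = exact_nearest([1.0, 0.6], stored, n=2)
nearest_indices = [i for i, _ in result]
```

An exact scan like this costs time proportional to the number of stored vectors per query, which is why approximate methods are generally preferred for large collections.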
Setup
Before you start, make sure you have:
- An active KDB.AI Cloud or Server license
- Installed the latest version of KDB.AI Cloud or Server
- A valid API key if you're using KDB.AI Cloud
- Python Client
Select the table to search
To perform a search, specify the name of the table in which the relevant vector embeddings are stored. Using Python Client, you can create a table object from the session:
documents = session.table("documents")
Search
Once you have a query vector embedding, you can search for its nearest neighbors. Python Client uses the table object created above, whereas REST Client takes the table name in the request body. In this example, the embeddings are assumed to be twelve-dimensional and the number of nearest neighbors is set to three.
Use the following commands to search for the nearest neighbors:
documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3)
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]]}'
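Hand-written JSON request bodies are easy to get wrong, so it can help to build them programmatically. The sketch below assembles the same body as the curl example above using only the Python standard library. The helper name `build_search_payload` and its validation rules are illustrative assumptions, not part of Python Client.

```python
import json

def build_search_payload(table, vectors, n):
    """Build a JSON body with the keys used in the curl example: table, n, vectors."""
    if not vectors:
        raise ValueError("at least one query vector is required")
    dim = len(vectors[0])
    # All query vectors in one request should share the same dimension.
    if any(len(v) != dim for v in vectors):
        raise ValueError("all query vectors must have the same dimension")
    return json.dumps({"table": table, "n": n, "vectors": vectors})

query = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
body = build_search_payload("documents", [query], n=3)
```

The resulting string can be passed to curl with `-d` or sent with any HTTP client.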
Batch searches
For larger workloads, you can send multiple query vectors in a single request, as shown in the following commands.
documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0],[1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0]], n=3)
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0],[1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0,1.0,7.0,1.0]]}'
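When the number of query vectors is very large, one option is to split them into fixed-size batches and issue one search per batch. The following is a minimal sketch of that splitting step. The helper name `chunked` and the batch size are assumptions chosen for illustration, and the source does not state any limit on request size.

```python
def chunked(vectors, batch_size):
    """Yield consecutive slices of at most batch_size query vectors."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]

# Five illustrative twelve-dimensional query vectors.
queries = [[float(i)] * 12 for i in range(5)]
batches = list(chunked(queries, batch_size=2))

# With a live session, each batch could then be passed to the search call
# shown above, for example: documents.search(vectors=batch, n=3)
```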
Processing results
You can return a subset of the columns in the table, reducing the amount of data sent back to the client:
documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=["author","content"])
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]],
"agg":["author","content"]}'
In addition to returning a subset of the columns, you can return aggregated results, grouped by categorical variables, and sorted by column name:
documents.search(vectors=[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]], n=3, aggs=[('sumLength','sum','length')], group_by=['author'], sort_by=['sumLength'])
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/kxi/search \
-d '{"table":"documents","n":3,
"vectors":[[1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0]],
"agg":[["sumLength","sum","length"]],
"groupBy":["author"],"sortCols":["sumLength"]}'
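To clarify what a grouped aggregation such as `('sumLength','sum','length')` computes, the sketch below reproduces the same logic locally on a few made-up rows. It is an illustration of the semantics only, not the service's implementation. The helper name `sum_by`, the sample rows, and the ascending sort order are assumptions.

```python
from collections import defaultdict

# Made-up rows standing in for search results; not real output.
rows = [
    {"author": "ann", "length": 120},
    {"author": "bob", "length": 300},
    {"author": "ann", "length": 80},
]

def sum_by(rows, group_col, value_col, out_col):
    """Sum value_col per distinct group_col value, sorted by the summed column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    grouped = [{group_col: key, out_col: total} for key, total in totals.items()]
    # Ascending order is assumed here for illustration.
    return sorted(grouped, key=lambda r: r[out_col])

summary = sum_by(rows, "author", "length", "sumLength")
```

Each output row holds one group value and the aggregated column named by the first element of the aggregation tuple.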
You can find all supported aggregations listed here.
Next steps
Now that you're familiar with similarity searches, you can do the following:
- Read about multimodal Retrieval Augmented Generation on the Learning hub, visit the GitHub repo, open the sample, or run the notebook directly in Google Colab.
- Practice document search after reading our Learning hub article. Visit the GitHub repo, open the Semantic search on PDF Documents sample, or run the notebook directly in Google Colab.
- Run a sentiment analysis. Read details on the Learning hub, go to the GitHub repo, open the Sentiment analysis on resort reviews sample, or run the notebook directly in Google Colab.
- Develop recommendation systems. Learn more on the Learning hub, visit the GitHub repo, open the Music recommendation on Spotify data sample, or run the notebook directly in Google Colab.