Quickstart Guide
This guide outlines the essential steps for using KDB.AI. Before proceeding, ensure your environment is set up as described in Pre-requisites and that you have the information needed to connect to your KDB.AI database from Python.
The following instructions apply to both KDB.AI Cloud and KDB.AI Server users.
Create a new table
Before creating a table, you must define its schema. The schema is a Python dictionary containing a list of columns. For each column you must define the name and either a pytype or a qtype. The vector embeddings column should contain a vectorIndex attribute with the configuration of the index used for similarity search. Full schema definition specifications are available in the manage tables section.
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'vectors',
                       'vectorIndex': {'dims': 8, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('quickstartkdbai', schema)
curl -H "Content-Type: application/json" localhost:8082/api/v1/config/table/quickstartkdbai -d @table.json
The request body is in a file called table.json:
{
"type": "splayed",
"columns": [
{"name": "id", "type": "symbol"},
{
"name": "vectors",
"type": "reals",
"vectorIndex": {
"dims": 8,
"type": "flat",
"metric": "L2"
}
}
]
}
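The schema rules described in this section can be checked locally before calling create_table. The following is a minimal sketch, not part of the KDB.AI client: check_schema is a hypothetical helper, and it assumes (based only on the example above) that a column carrying a vectorIndex may omit pytype/qtype and that a vectorIndex needs dims, metric, and type.

```python
def check_schema(schema):
    """Return a list of problems found in a quickstart-style schema dict.

    Hypothetical helper for illustration; not part of the KDB.AI client.
    """
    columns = schema.get("columns")
    if not isinstance(columns, list) or not columns:
        return ["schema must contain a non-empty 'columns' list"]

    problems = []
    for i, col in enumerate(columns):
        name = col.get("name")
        if not name:
            problems.append(f"column {i} has no name")
        has_type = "pytype" in col or "qtype" in col
        index = col.get("vectorIndex")
        if index is None:
            # Ordinary column: the guide requires a pytype or a qtype.
            if not has_type:
                problems.append(f"column {name!r} needs a pytype or a qtype")
            continue
        # Assumption: these three keys are required, as in the example schema.
        for key in ("dims", "metric", "type"):
            if key not in index:
                problems.append(f"vectorIndex of {name!r} is missing {key!r}")
    return problems


schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'vectors',
                       'vectorIndex': {'dims': 8, 'metric': 'L2', 'type': 'flat'}}]}
print(check_schema(schema))  # prints []
```

An empty list means the dictionary has the shape used in this guide; the server remains the authority on what is actually accepted.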
Retrieve a list of tables
Display a list of tables, including your recently created table, using the following command.
session.list()
# Return: ['quickstartkdbai']
curl -s localhost:8082/api/v1/config/table
If you pipe the result to a tool such as jq, you can see the result pretty printed:
curl -s localhost:8082/api/v1/config/table | jq
{
"quickstartkdbai": {
"type": "splayed",
"columns": [
{
"name": "id",
"type": "symbol"
},
{
"name": "vectors",
"type": "reals",
"vectorIndex": {
"dims": 8,
"type": "flat",
"metric": "L2"
}
}
]
}
}
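The REST response shown above is a JSON object keyed by table name. As a small sketch of how it could be consumed from Python, the snippet below parses a copy of that sample output with the standard library; summarise_tables is a hypothetical helper, and the response text is pasted in rather than fetched from a server.

```python
import json

# Sample response body copied from the jq output above.
response_text = """
{"quickstartkdbai": {"type": "splayed", "columns": [
  {"name": "id", "type": "symbol"},
  {"name": "vectors", "type": "reals",
   "vectorIndex": {"dims": 8, "type": "flat", "metric": "L2"}}]}}
"""


def summarise_tables(text):
    """Map each table name to the list of its column names."""
    tables = json.loads(text)
    return {name: [col["name"] for col in cfg["columns"]]
            for name, cfg in tables.items()}


summary = summarise_tables(response_text)
print(summary)  # prints {'quickstartkdbai': ['id', 'vectors']}
```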
Add data to your table
Generate an array of five 8-dimensional vectors to use as the vector embeddings.
You can then add these to a pandas DataFrame, ensuring the column names and types match the table schema.
import numpy as np
import pandas as pd
ids = ['h', 'e', 'l', 'l', 'o'] # Example ID values
vectors = np.random.rand(40).astype(np.float32).reshape(5,8)
df = pd.DataFrame({"id": ids, "vectors": list(vectors)})
table.insert(df)
JSON data can be generated in any language. Below, .j.j from q is used to generate example data and write it to the file insert.json.
# start the kdb+ binary with the q command
q
# run the following on the 'q)' prompt
`insert.json 0: enlist .j.j `table`rows!(`quickstartkdbai;([] id:"hello";vectors:(5;8)#(5*8)?1e))
\\
The contents of the generated file should be:
$ cat insert.json
{"table":"quickstartkdbai","rows":
[
{"id":"h","vectors":[0.3927524,0.5170911,0.5159796,0.4066642,0.1780839,0.3017723,0.785033,0.5347096]},
{"id":"e","vectors":[0.7111716,0.411597,0.4931835,0.5785203,0.08388858,0.1959907,0.375638,0.6137452]},
{"id":"l","vectors":[0.5294808,0.6916099,0.2296615,0.6919531,0.4707882,0.6346716,0.9672399,0.2306385]},
{"id":"l","vectors":[0.949975,0.439081,0.5759051,0.5919004,0.8481566,0.389056,0.391543,0.08123546]},
{"id":"o","vectors":[0.9367504,0.2782122,0.2392341,0.1508133,0.1567317,0.9785,0.7043314,0.9441671]}
]
}
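Since the JSON can be generated in any language, here is a rough Python equivalent of the q one-liner, using only the standard library. build_insert_body is a hypothetical helper name; the body shape (a "table" key plus a list of "rows") is taken from the file contents shown above, and the random values will differ from that sample.

```python
import json
import random


def build_insert_body(table, ids, dims, seed=None):
    """Build a dict shaped like insert.json: one random vector per id."""
    rng = random.Random(seed)
    rows = [{"id": i, "vectors": [rng.random() for _ in range(dims)]}
            for i in ids]
    return {"table": table, "rows": rows}


body = build_insert_body("quickstartkdbai", list("hello"), dims=8, seed=0)

# Every vector must have the 8 dimensions declared in the table schema.
assert all(len(row["vectors"]) == 8 for row in body["rows"])

text = json.dumps(body)

# To write the file used in this guide (not executed here):
# with open("insert.json", "w") as f:
#     f.write(text)
```

Passing a seed makes the generated values repeatable, which helps when comparing query results between runs.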
As above, if you pipe the result to a tool like jq, you can see the result pretty printed.
$ cat insert.json | jq
{
"table": "quickstartkdbai",
"rows": [
{
"id": "h",
"vectors": [
0.3927524,
0.5170911,
0.5159796,
0.4066642,
0.1780839,
0.3017723,
0.785033,
0.5347096
]
},
...
Query the table
Use the following command to query data from the table.
The query function accepts a wide range of arguments to make it easy to filter, aggregate, and sort. Run ?table.query to see them all.
table.query()
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/data -d '{"table":"quickstartkdbai"}'
As above, if you pipe the result to a tool like jq, you can see the result pretty printed.
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/data -d '{"table":"quickstartkdbai"}' | jq ".payload"
[
{
"id": "h",
"vectors": [
0.3503749,
0.7027296,
0.07756515,
0.4045712,
0.5368044,
0.9517315,
0.1455024,
0.2176988
]
},
...
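The curl query above can also be expressed with Python's standard library. This is a sketch only: build_query_request is a hypothetical helper, the base URL assumes the same localhost:8082 endpoint used in the curl examples, and the request is built but not sent so the snippet runs without a server.

```python
import json
import urllib.request


def build_query_request(table, base_url="http://localhost:8082"):
    """Build (but do not send) the POST request that queries a table."""
    payload = json.dumps({"table": table}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/data",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


request = build_query_request("quickstartkdbai")
print(request.get_method(), request.full_url)
# prints POST http://localhost:8082/api/v1/data

# To send it against a running server (not executed here); the "payload"
# key is the one selected by jq in the example above:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["payload"]
```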
Run similarity search
Search for the nearest neighbours using the following command.
The dimension of each input query vector must match the vector embedding dimension of the table, as defined in the schema above.
table.search(vectors=[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], n=3)
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/search -d '{"table":"quickstartkdbai","n":3,"vectors":[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]],"distances":"dist"}'
As above, if you pipe the result to a tool like jq, you can see the result pretty printed.
curl -s -H "Content-Type: application/json" localhost:8082/api/v1/search -d '{"table":"quickstartkdbai","n":3,"vectors":[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]],"distances":"dist"}' | jq '."payload"'
[
[
{
"id": "l",
"vectors": [
0.1692735,
0.6401389,
0.9439349,
0.5964162,
0.2678166,
0.7119821,
0.8890669,
0.6621503
],
"dist": 0.6006493
},
{
"id": "e",
"vectors": [
0.7630598,
0.645776,
...
The closest matching neighbours for the query vector are returned, along with the L2 (Euclidean) distance computed for each.
The search API supports batch querying and filtered search.
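To make the search result easier to interpret, the sketch below reimplements a brute-force nearest-neighbour search in plain Python. It is an illustration, not KDB.AI's implementation: squared_l2 and flat_search are hypothetical names, and the second row is made up. One observation from the sample output above: the dist value of the first result (0.6006493) matches the sum of squared differences between the query and the returned vector, without a square root, so that is the quantity computed here. Treat this as a reading of that one sample rather than a documented guarantee.

```python
def squared_l2(a, b):
    """Sum of squared differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("query and stored vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(rows, query, n):
    """Return the n rows closest to query, each with an added 'dist' field."""
    scored = [dict(row, dist=squared_l2(row["vectors"], query)) for row in rows]
    return sorted(scored, key=lambda r: r["dist"])[:n]


query = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
rows = [
    # First result from the sample search output above.
    {"id": "l", "vectors": [0.1692735, 0.6401389, 0.9439349, 0.5964162,
                            0.2678166, 0.7119821, 0.8890669, 0.6621503]},
    # Made-up row for contrast; not from the original data.
    {"id": "far", "vectors": [0.0] * 8},
]

nearest = flat_search(rows, query, n=1)
print(nearest[0]["id"], round(nearest[0]["dist"], 4))  # prints l 0.6006
```

The dimension check in squared_l2 mirrors the requirement that query vectors have the same number of dimensions as the stored embeddings.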
Delete table
Use the following command when you want to delete a table.
table.drop()
curl -s -X DELETE localhost:8082/api/v1/config/table/quickstartkdbai
Once you delete a table, you cannot use it again.
In KDB.AI, when you delete a table, the associated index is also removed.
Next steps
Now that you have successfully created an index with KDB.AI, you can start inserting your own data and analysing it.
Samples
You can also explore our samples on our Learning Hub.