How to Query Data in KDB.AI
This page covers how to perform queries against tables in the KDB.AI database.
Selecting the table to query
Each table in KDB.AI has an associated name. To perform a query, specify the name of the table in which the relevant data is stored. Using Python client you can create a table object from the session. The REST client is more direct with the table name supplied as a field in the JSON payload.
documents = session.database('default').table("documents")
Queries
Obtain the data from the entire table using the following command:
documents.query()
curl -s -X POST localhost:8081/api/v2/databases/default/tables/documents/query | jq .
gw(`query;`database`table!(`default;`documents));
Filters
To select a subset of the data, you can apply filters as documented here. These serve as where
clauses.
import datetime
start_time=datetime.datetime(2024,1,10)
end_time=datetime.datetime(2024,10,10)
results = table.query(filter=[("within","createdDate",[start_time, end_time]),("<=", "length", 100)])
print(f'Results: {results}')
curl -s -H "Content-Type: application/json" localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"table":"documents","filter":[["within","createdDate",["2020.07.10D15:00:00.0000", "2021.07.10D15:00:00.0000"]],["<=", "length", 100]]}'
gw(`query;`database`table`filter!(`default;`documents;((within;`time;(2000.06.15D00:00:00.000000001; 2000.08.15D00:00:00.000000001));(<=;`length;100))));
Processing results
You can return a subset of the columns in the table, reducing the amount of data sent back to the client.
documents.query(aggs={"author":"author","content":"content"})
curl -s -X POST -H 'Content-Type: application/json' localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"author":"author","content":"content"}}' | jq .
gw(`query;`database`table`aggs!(`default;`documents;`author`content!`author`content))
In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.
aggs = dict()
aggs['SumLength'] = ('sum','length')
print(f'Table data:\n\n {table.query()}')
results = table.query(aggs=aggs, group_by=['author'], sort_columns=['SumLength'])
print(f'Aggregate data results:\n {results}')
curl -s -H "Content-Type: application/json" localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"sumLength":["sum","length"]}, "groupBy":["author"], "sortColumns":["sumLength"]}'
gw(`query;`database`table`aggs`groupBy`sortColumns!(`default;`documents;(enlist `sumLength)!(enlist `sum`length);`author;`sumLength))
Supported aggregations
The table below lists all supported aggregation functions.
Function | Description |
---|---|
all |
Returns the logical 'and' of all values in a set. |
any |
Returns the logical 'or' of all values in a set. |
avg |
Calculates the mean value across the set of matching records. |
count |
Returns the number of records in the current selection. |
dev |
Calculates the standard deviation of a column. |
distinct |
Returns the distinct values from a column. |
first |
Returns the first occurrence of a value. This is useful when performing a group by aggregation. |
last |
Returns the last occurrence of a value. This is useful when performing a group by aggregation. |
max |
Takes the maximum value of a set of records. |
min |
Takes the minimum value of a set of records. |
prd |
Calculates the product of matching records. |
sdev |
Calculates the sample deviation of matching records. |
scov |
Calculates the sample covariance between matching records. |
sum |
Calculates the sum of matching records. |
svar |
Calculates the sample variance of matching records. |
var |
Calculates the variance of matching records. |
Next Steps
Now that you're familiar with querying, move on to the following:
- Try querying and fuzzy filtering in our sample project.
- Review our articles on metadata filtering and fuzzy filtering for more context.
- Watch our video to learn more about querying in vector databases.