Query Data
This page covers how to perform queries against tables in the KDB.AI database.
Selecting the table to query
Each table in KDB.AI has an associated name. To perform a query, specify the name of the table in which the relevant data is stored. Using Python client you can create a table object from the session. The REST client is more direct with the table name supplied as a field in the JSON payload.
documents = session.database('default').table("documents")
Queries
Obtain the data from the entire table using the following command:
documents.query()
curl -s -X POST localhost:8082/api/v2/databases/default/tables/documents/query | jq .
gw(`query;`database`table!(`default;`documents));
Filters
To select a subset of the data, you can apply filters as documented here. These serve as where
clauses.
import datetime
start_time=datetime.datetime(2024,1,10)
end_time=datetime.datetime(2024,10,10)
results = table.query(filter=[("within","createdDate",[start_time, end_time]),("<=", "length", 100)])
print(f'Results: {results}')
curl -s -H "Content-Type: application/json" localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"table":"documents","filter":[["within","createdDate",["2020.07.10D15:00:00.0000", "2021.07.10D15:00:00.0000"]],["<=", "length", 100]]}'
gw(`query;`database`table`filter!(`default;`documents;((within;`time;(2000.06.15D00:00:00.000000001; 2000.08.15D00:00:00.000000001));(<=;`length;100))));
Processing results
You can return a subset of the columns in the table, reducing the amount of data sent back to the client.
documents.query(aggs={"author":"author","content":"content"})
curl -s -X POST -H 'Content-Type: application/json' localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"author":"author","content":"content"}}' | jq .
gw(`query;`database`table`aggs!(`default;`documents;`author`content!`author`content))
In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.
aggs = dict()
aggs['SumLength'] = ('sum','length')
print(f'Table data:\n\n {table.query()}')
results = table.query(aggs=aggs, group_by=['author'], sort_columns=['SumLength'])
print(f'Aggregate data results:\n {results}')
curl -s -H "Content-Type: application/json" localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"sumLength":["sum","length"]}, "groupBy":["author"], "sortColumns":["sumLength"]}'
gw(`query;`database`table`aggs`groupBy`sortColumns!(`default;`documents;(enlist `sumLength)!(enlist `sum`length);`author;`sumLength))
Supported aggregations
The table below lists all supported aggregation functions.
Function | Description |
---|---|
all |
Returns the logical 'and' of all values in a set. |
any |
Returns the logical 'or' of all values in a set. |
avg |
Calculates the mean value across the set of matching records. |
count |
Returns the number of records in the current selection. |
dev |
Calculates the standard deviation of a column. |
distinct |
Returns the distinct values from a column. |
first |
Returns the first occurrence of a value. This is useful when performing a group by aggregation. |
last |
Returns the last occurrence of a value. This is useful when performing a group by aggregation. |
max |
Takes the maximum value of a set of records. |
min |
Takes the minimum value of a set of records. |
prd |
Calculates the product of matching records. |
sdev |
Calculates the sample deviation of matching records. |
scov |
Calculates the sample covariance between matching records. |
sum |
Calculates the sum of matching records. |
svar |
Calculates the sample variance of matching records. |
var |
Calculates the variance of matching records. |