Query Data
This section covers how to perform queries against tables in the KDB.AI database.
Selecting the table to query
Each table in KDB.AI has an associated name. To perform a query, specify the name of the table in which the relevant data is stored. Using Python client you can create a table object from the session. The REST client is more direct with the table name supplied as a field in the JSON payload.
documents = session.table("documents")
Queries
Obtain the data from the entire table using the following command:
documents.query()
curl -s -H "Content-Type: application/json" localhost:8082/data -d '{"table":"documents"}'
Filters
To select a subset of the data, you can apply filters as documented here. These serve as where
clauses.
import datetime
start_time=datetime.datetime(2024,1,10)
end_time=datetime.datetime(2024,10,10)
results = table.query(filter=[("within","createdDate",[start_time, end_time]),("<=", "length", 100)])
print(f'Results: {results}')
curl -s -H "Content-Type: application/json" localhost:8082/data \
-d '{"table":"documents","filter":[["within","createdDate",["2020.07.10D15:00:00.0000", "2021.07.10D15:00:00.0000"]],["<=", "length", 100]]}'
Processing results
You can return a subset of the columns in the table, reducing the amount of data sent back to the client.
documents.query(aggs=["author","content"])
curl -s -H "Content-Type: application/json" localhost:8082/data \
-d '{"table":"documents",agg=["author","content"]}'
In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.
results = table.query(aggs=[('sumLength','sum','length')], group_by=['author'], sort_by=['sumLength'])
print(f'Aggregate data results: {results}')
curl -s -H "Content-Type: application/json" localhost:8082/data \
-d '{"table":"documents",
agg=[["sumLength","sum","length"]],
groupBy=["author"],sortCols=["sumLength"]}'
Supported aggregations
The table below lists all supported aggregation functions.
Function | Description |
---|---|
all |
Returns the logical 'and' of all values in a set. |
any |
Returns the logical 'or' of all values in a set. |
avg |
Calculates the mean value across the set of matching records. |
count |
Returns the number of records in the current selection. |
dev |
Calculates the standard deviation of a column. |
distinct |
Returns the distinct values from a column. |
first |
Returns the first occurrence of a value. This is useful when performing a group by aggregation. |
last |
Returns the last occurrence of a value. This is useful when performing a group by aggregation. |
max |
Takes the maximum value of a set of records. |
min |
Takes the minimum value of a set of records. |
prd |
Calculates the product of matching records. |
sdev |
Calculates the sample deviation of matching records. |
scov |
Calculates the sample covariance between matching records. |
sum |
Calculates the sum of matching records. |
svar |
Calculates the sample variance of matching records. |
var |
Calculates the variance of matching records. |