Query Data

This page covers how to perform queries against tables in the KDB.AI database.

Selecting the table to query

Each table in KDB.AI has an associated name. To perform a query, specify the name of the table in which the relevant data is stored. Using Python client you can create a table object from the session. The REST client is more direct with the table name supplied as a field in the JSON payload.

Python

documents = session.database('default').table("documents")

Queries

Obtain the data from the entire table using the following command:

PythonRESTq

documents.query()

curl -s -X POST localhost:8082/api/v2/databases/default/tables/documents/query | jq .

gw(`query;`database`table!(`default;`documents));

Filters

To select a subset of the data, you can apply filters as documented here. These serve as where clauses.

PythonRESTq

    import datetime
    start_time=datetime.datetime(2024,1,10)
    end_time=datetime.datetime(2024,10,10)
    results = table.query(filter=[("within","createdDate",[start_time, end_time]),("<=", "length", 100)])
    print(f'Results: {results}')

curl -s -H "Content-Type: application/json" localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"table":"documents","filter":[["within","createdDate",["2020.07.10D15:00:00.0000", "2021.07.10D15:00:00.0000"]],["<=", "length", 100]]}'

gw(`query;`database`table`filter!(`default;`documents;((within;`time;(2000.06.15D00:00:00.000000001; 2000.08.15D00:00:00.000000001));(<=;`length;100))));

Processing results

You can return a subset of the columns in the table, reducing the amount of data sent back to the client.

PythonRESTq

documents.query(aggs={"author":"author","content":"content"})

curl -s -X POST -H 'Content-Type: application/json' localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"author":"author","content":"content"}}' | jq .

gw(`query;`database`table`aggs!(`default;`documents;`author`content!`author`content))

In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.

PythonRESTq

    aggs = dict()
    aggs['SumLength'] = ('sum','length')

    print(f'Table data:\n\n {table.query()}')
    results = table.query(aggs=aggs, group_by=['author'], sort_columns=['SumLength'])
    print(f'Aggregate data results:\n {results}')

curl -s -H "Content-Type: application/json" localhost:8082/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"sumLength":["sum","length"]}, "groupBy":["author"], "sortColumns":["sumLength"]}'

gw(`query;`database`table`aggs`groupBy`sortColumns!(`default;`documents;(enlist `sumLength)!(enlist `sum`length);`author;`sumLength))

Supported aggregations

The table below lists all supported aggregation functions.

Function	Description
`all`	Returns the logical 'and' of all values in a set.
`any`	Returns the logical 'or' of all values in a set.
`avg`	Calculates the mean value across the set of matching records.
`count`	Returns the number of records in the current selection.
`dev`	Calculates the standard deviation of a column.
`distinct`	Returns the distinct values from a column.
`first`	Returns the first occurrence of a value. This is useful when performing a group by aggregation.
`last`	Returns the last occurrence of a value. This is useful when performing a group by aggregation.
`max`	Takes the maximum value of a set of records.
`min`	Takes the minimum value of a set of records.
`prd`	Calculates the product of matching records.
`sdev`	Calculates the sample deviation of matching records.
`scov`	Calculates the sample covariance between matching records.
`sum`	Calculates the sum of matching records.
`svar`	Calculates the sample variance of matching records.
`var`	Calculates the variance of matching records.