How to Query Data in KDB.AI

This page covers how to perform queries against tables in the KDB.AI database.

Selecting the table to query

Each table in KDB.AI has an associated name. To perform a query, specify the name of the table in which the relevant data is stored. Using Python client you can create a table object from the session. The REST client is more direct with the table name supplied as a field in the JSON payload.

Python

documents = session.database('default').table("documents")

Queries

Obtain the data from the entire table using the following command:

PythonRESTq

documents.query()

curl -s -X POST localhost:8081/api/v2/databases/default/tables/documents/query | jq .

gw(`query;`database`table!(`default;`documents));

Filters

To select a subset of the data, you can apply filters as documented here. These serve as where clauses.

PythonRESTq

    import datetime
    start_time=datetime.datetime(2024,1,10)
    end_time=datetime.datetime(2024,10,10)
    results = table.query(filter=[("within","createdDate",[start_time, end_time]),("<=", "length", 100)])
    print(f'Results: {results}')

curl -s -H "Content-Type: application/json" localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"table":"documents","filter":[["within","createdDate",["2020.07.10D15:00:00.0000", "2021.07.10D15:00:00.0000"]],["<=", "length", 100]]}'

gw(`query;`database`table`filter!(`default;`documents;((within;`time;(2000.06.15D00:00:00.000000001; 2000.08.15D00:00:00.000000001));(<=;`length;100))));

Processing results

You can return a subset of the columns in the table, reducing the amount of data sent back to the client.

PythonRESTq

documents.query(aggs={"author":"author","content":"content"})

curl -s -X POST -H 'Content-Type: application/json' localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"author":"author","content":"content"}}' | jq .

gw(`query;`database`table`aggs!(`default;`documents;`author`content!`author`content))

In addition to returning a subset of the columns, you can return aggregated results, group by categorical variables, and sort based on a column name.

PythonRESTq

    aggs = dict()
    aggs['SumLength'] = ('sum','length')

    print(f'Table data:\n\n {table.query()}')
    results = table.query(aggs=aggs, group_by=['author'], sort_columns=['SumLength'])
    print(f'Aggregate data results:\n {results}')

curl -s -H "Content-Type: application/json" localhost:8081/api/v2/databases/default/tables/documents/query \
-d '{"aggs":{"sumLength":["sum","length"]}, "groupBy":["author"], "sortColumns":["sumLength"]}'

gw(`query;`database`table`aggs`groupBy`sortColumns!(`default;`documents;(enlist `sumLength)!(enlist `sum`length);`author;`sumLength))

Supported aggregations

The table below lists all supported aggregation functions.

Function	Description
`all`	Returns the logical 'and' of all values in a set.
`any`	Returns the logical 'or' of all values in a set.
`avg`	Calculates the mean value across the set of matching records.
`count`	Returns the number of records in the current selection.
`dev`	Calculates the standard deviation of a column.
`distinct`	Returns the distinct values from a column.
`first`	Returns the first occurrence of a value. This is useful when performing a group by aggregation.
`last`	Returns the last occurrence of a value. This is useful when performing a group by aggregation.
`max`	Takes the maximum value of a set of records.
`min`	Takes the minimum value of a set of records.
`prd`	Calculates the product of matching records.
`sdev`	Calculates the sample deviation of matching records.
`scov`	Calculates the sample covariance between matching records.
`sum`	Calculates the sum of matching records.
`svar`	Calculates the sample variance of matching records.
`var`	Calculates the variance of matching records.

Next Steps

Now that you're familiar with querying, move on to the following:

Try querying and fuzzy filtering in our sample project.
Review our articles on metadata filtering and fuzzy filtering for more context.
Watch our video to learn more about querying in vector databases.