Send Feedback
Skip to content

Manage Tables in KDB-X DB Service

This page introduces table management in the KDB-X DB Service. It covers how tables are defined, and outlines the key design considerations that affect data organization and query performance.

Table operations

You can perform the following actions on tables in the KDB-X DB Service:

  • List - returns a simple list of table names in the database.

  • Describe - returns schema and metadata, including the table type, column definitions, partition/sorting columns, and table/column descriptions.

  • Create - defines a table with a specified schema and settings. You can also dynamically create tables using the import API. However, this endpoint is recommended for permanent tables as it gives you more control. Tables and columns can be given descriptions.

  • Drop - deletes a table and all associated data. Deletion is permanent.

Refer to the API spec for detailed reference, including examples for all APIs. Sample notebooks demonstrating API usage are bundled with the DB Service repo on GitHub.

List

The list tables endpoint is GET /api/v0/tables. The API returns a simple list of all tables in the database.

Example list

session.listTables[]
session.list_tables()
curl -s -X GET "http://localhost:8080/api/v0/tables" -H "Accept: application/json"

Describe

The describe table endpoint is GET /api/v0/tables/{table}. The endpoint outputs the entire table definition, including columns. The output is ideal for re-using in a create; initial exploratory imports of data using createTable can be used to generate an initial schema that can be refined.

Example describe

session.describeTable["fxquote"]
session.describe_table(table="fxquote")
curl -s -X GET "http://localhost:8080/api/v0/tables/fxquote" -H "Accept: application/json"

Create

The create table endpoint is POST /api/v0/tables/{table}. See the sections below for things to consider while you're designing a table to ensure your queries are as fast as possible.

Example create

// Define columns
fxquoteCols:(`name`type!("trddate";"date");`name`type!("ts";"timestamp");`name`type`attrMem`attrDisk`attrOrd!("sym";"symbol";"grouped";"parted";"parted");`name`type!("bid";"float");`name`type!("ask";"float"))

// Create partitioned table ('fxquote')
session.createTable[`table`type`prtnCol`sortColsDisk`sortColsOrd`columns!("fxquote";"partitioned";"ts";enlist "sym";enlist "sym";fxquoteCols)]
session.create_table(
    table="fxquote",
    type="partitioned",
    prtnCol="ts",
    sortColsDisk=["sym"],
    sortColsOrd=["sym"],
    columns=[
        {"name": "trddate", "type": "date"},
        {"name": "ts", "type": "timestamp"},
        {"name": "sym", "type": "symbol", "attrMem": "grouped", "attrDisk": "parted", "attrOrd": "parted"},
        {"name": "bid", "type": "float"},
        {"name": "ask", "type": "float"},
    ]
)
curl -s -X POST "http://localhost:8080/api/v0/tables/fxquote" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
    "type": "partitioned",
    "prtnCol": "ts",
    "sortColsDisk": ["sym"],
    "sortColsOrd": ["sym"],
    "columns": [
    {"name": "trddate", "type": "date"},
    {"name": "ts", "type": "timestamp"},
    {"name": "sym", "type": "symbol", "attrMem": "grouped", "attrDisk": "parted", "attrOrd": "parted"},
    {"name": "bid", "type": "float"},
    {"name": "ask", "type": "float"}
    ]
}'

Drop

The drop table endpoint is DELETE /api/v0/tables/{table}

Example drop

session.dropTable["instruments"]
session.drop_table(table="instruments")
curl -s -X DELETE "http://localhost:8080/api/v1/tables/instruments" -H "Accept: application/json"

Table types

The KDB-X DB Service supports four types of tables. partitioned tables are the appropriate type for most timeseries data, while basic, splayed, and splayed_mem (fastest, at the expense of memory) are intended for non-timeseries or reference data.

As in a typical KDB-X tick architectures, data is organized in tiers: an in-memory RDB for same-day data, an on-disk IDB for intraday writedowns, and an on-disk HDB for end-of-day writedown.

Type Description
partitioned Data is stored in date partitions, and distributed across RDB, IDB, and HDB. The KDB-X DB Service partitions data into date directories based on a timestamp column in the table.
basic Data is stored as a single file in the IDB directory. The same copy of the data is memory mapped on-disk to the RDB, IDB, and HDB.
splayed Data is stored as a single directory (each column an individual file) in the IDB. The same copy of the data is memory mapped on-disk to the RDB, IDB, and HDB.
splayed_mem Data is stored as a single directory (each column an individual file) in the IDB. The same copy of the data is loaded into memory in RDB, IDB, and HDB.

Refer to tables in the filesystem to learn more about KDB-X table types.

Partitioned tables

Partitioned tables are the appropriate type for time series datasets.

The KDB-X DB Service partitions timeseries data based on a timestamp column. If one is not already present, it must be added to data - for example, by using a postparse dictionary during import. The column name is specified with prtnCol in the API call to create the table. This value is required if the table type is set to partitioned.

The prtnCol is the column used by startTS and endTS when querying the data, and is used to locate the data in the correct partition on disk. Consider this carefully when constructing your schema; it should be the primary datetime column you will use to query the data.

Keyed tables

Keyed tables can be defined with the primaryKeys parameter. Data with matching key(s) overwrites the previous records with the same keys. Keyed tables are useful as last-value stores, or for keeping track of the current state of something.

Column definitions

In the KDB-X DB Service, column definitions require only two fields: name and type. Descriptions are optional, but strongly recommended as useful metadata.

  • name: the column name, restricted on primitives (.Q.id), reserved words date/int, and a special reserved prefix label_
  • type: the data type of the column
  • description: optional text describing the column
  • attrMem, attrOrd, attrDisk: column attributes for each tier

Optimal query performance in the service depends on appropriate selection of column attributes.

Note that all dates and times are considered UTC in the KDB-X DB Service, these can however be queried and converted to other time zones.

Column attributes

Column attributes are used to tune query performance in a KDB-X database. You can set attributes differently depending on the different tiers of your database and you can tune attributes to optimize performance at each tier. Applying attributes slightly increases storage space usage, RAM usage and processing required during ingest.

Note

The sorted attribute is not the same as the sorting setting (sortColsMem, sortColsOrd, sortColsDisk).

The following column attributes are available:

  • no attributes - no attribute applied; filters require a linear scan.
  • sorted - requires sorting - ascending values; enables binary search for filters against the column.
  • parted - requires sorting; maintains an index allowing constant time access to the start of a group of identical values.
  • grouped - does not require sorting; maintains a separate index for the column, identifying the positions of each distinct value. Note that this index can take up significant storage space.
  • unique - requires all values be distinct; allows a constant-time lookup, and is typically used for primary keys.

For more information about attributes, their use, and tradeoffs, refer to the kdb+ documentation.

Attributes are configured per column in the table definition using:

  • attrMem: applied to in-memory data (RDB, real-time tier)
  • attrOrd: applied to intraday on-disk data (IDB tier)
  • attrDisk: applied to persisted on-disk data (HDB and object storage tiers)

It is recommended that you use the parted attribute on the columns most frequently used in filters. Only one column per table can have the sorted or parted attribute. The grouped attribute can be applied to multiple columns and does not require sorting. Note that applying the parted attribute requires sorting, which can be memory-intensive.

Column sorting

Sorting can be configured independently for each storage tier: sortColsMem for the RDB, sortColsOrd for the IDB, and sortColsDisk for the HDB. Each setting accepts a list of columns to sort by. Sorting can improve query performance, but introduces additional processing overhead.

Sorting is not recommended for the RDB, as it is applied on every update. The IDB and HDB configurations (sortColsOrd and sortColsDisk) should generally be aligned.

Column sorting is separate from the sorted attribute applied to a column.

Next steps