Schema configuration

This page describes the configuration options available under the Schema Settings tab in the kdb Insights Enterprise Web Interface.

A schema is a description of all of the tables used within a kdb Insights Database, and is comprised of one or more tables and metadata properties. Each table defines a set of columns (sometimes referred to as fields) and metadata information. Each column within a table must have a name and a type associated with it.

Schemas can also be used in pipelines to transform datatypes to match a particular table.

Tables can be added or removed from a schema, but schemas can only be changed when the associated database is torn down.

Performance tuning of your database schema ensures optimal database performance. It is recommended that you use the performance tuning wizard to apply attributes before you deploy a database.

Create a schema

Schemas are created in the Schema Settings tab of the Database screen. This is the first tab opened when you are creating a new database.

Schema overview

A schema has a set of tables corresponding to the data stored in the database.

Every data table in a schema must have at least one column with a Data Type set to Timestamp. See column properties. The Timestamp column represents the time characteristic in the data used for data tier migration.

Configuring the schema involves:

Defining the table properties, including table types.
Defining the data column properties.
Performance tuning the schema to achieve optimal performance.

You can manually add and configure tables and columns in the schema settings screen. However, for large schemas pre-configured in JSON, you can use the code view.

Additional schemas are saved as part of the parent database.

Code view

Schemas can be created in the Code View. These schemas use table properties defined for the YAML schema configuration, encoded as JSON.

Click Code View to open the Schema Code View, as shown below.
- A validation check prevents submission of invalid code.
- Error details appear if you hover over the underlined problem.
- In addition, when editing a property, supported attributes are displayed, as shown below.

Table properties

A table is a definition of a set of data columns (fields) correlated by a common feature set. The definition of a table describes the types and purpose of each column within that table, as well as some usage metadata properties.

Table properties

name	description
Name	The table name. A new schema comes with a default table, table1, this can be changed or deleted. The new name must be unique, and start with a letter and end with an alphanumeric character. The name is case insensitive, and underscore or hyphen can be used to join multiple words. An error is displayed if these conditions are not met.
Primary Keys	Primary keys are used for indicating unique data within a table. This field displays the names of the columns that have been enabled as Primary Key columns in the column properties. Primary Keys can be removed by deleting them here. When new Primary Keys are enabled in column properties they appear here.
Description	Enter a textual description for the table. This can be used to provide an overview of the data collected in the current table.
Type	The type describes how the table is managed on disk. See table types for a full description of the following options: Partitioned: Use this for time-based tables. At least one of your tables must be Partitioned. If there is no partitioned table, the Deploy button is disabled. Splayed: Use this for supplementary data. See Performance tuning for details on choosing the best type for maximum performance.
Partition Data Column	The name of the column within the table to use to partition the content. The column it points to must be of Type timestamp. This value is required when the table Type is Partitioned.
Block Size	This value indicates when data should be written to disk. After this many records are received, data is written to disk. Writing more frequently increases disk IO but uses less memory. If omitted, the default value is 12 million records.
Column Sorting:	You can configure sorting of a table on specified column(s), for each tier of the database. Columns with the Parted attribute usually require the table to be sorted on that column. The sorting options: Real-Time, Intraday, and Historical are described below. - Click on the relevant tier: Real-time, Intraday, or Historical - Click +Add Column Sort to add a column. Click again to add more columns. - Click and drag to change the sort order. - Click the recycle bin to remove a column. See Performance tuning for help on choosing the best sort settings for optimal performance.
Real-time	Specify the sorting rules applied to data in the Real-Time tier (RDB). This tier leverages in-memory storage to manage the most recent data, providing the fastest query performance for immediate analysis.
Intraday	Specify the sorting rules applied to data in the Intraday tier (IDB). This tier stores data that has aged beyond the Real-Time tier but is not yet archived. It facilitates efficient access to high-frequency updates that are flushed from the Real-Time tier every 10 minutes, serving as a bridge to long-term storage.
Historical	Specify the sorting rules applied to data in the Historical tier (HDB). This tier retains data older than the current day, organized into daily partitions to optimize performance for long-term analysis and reporting. Data remains in this tier for a defined retention period before potential migration to an object storage tier.

Sorting in real-time affects performance

Adding a real-time sort means it is applied on every update and has significant negative performance impact on ingestion latency for real-time data.

Keyed tables and object storage

Keyed tables update values for records that have matching keys, except for data that has been migrated to an object storage tier. Once data has migrated to an object storage tier, it cannot be changed.

Table Types

kdb Insights Enterprise web interface supports two table types:

Partitioned: Data is stored in date partitions, with each partition being a directory on disk that holds a time slice of data. These partitions are distributed across RDB, IDB, and HDB. The kdb+ database structure requires each database schema to have at least one partitioned table. This type of table is recommended where the table contains more than 100 million rows or is growing steadily. This is the table type used in most cases.
Splayed: Data is stored as a single directory (each column represents an individual file therein) in the IDB directory. The same copy of the data is memory mapped in RDB, IDB, and HDB.

With a splayed table, a query loads in memory only the files of the column(s) it requires.

This table type is suitable for tables up to 100 million rows, and is recommended for reference or non-timeseries data.

If a splayed table grows, holds more than 100 million records, or has columns that exceed the maximum size of a vector in memory, consider partitioning it.

Column Properties

Columns describe the distinct features within a table with a unique name and datatype. Each row of data in the system for a given table must have a value for each column within a table.

Column properties

Click Add Data Column to add a new column or use the options in the Column Actions drop-down.

name	description
Name	The name of the column in the table. The name must be unique within a single table and should be a single word that represents the purpose of the data within the column. This name must conform to kdb+ naming restrictions and should not use any reserved kdb+ primitives. See The .Q Namespace for details.
Data Type	The type of the column. See Types below for more details.
Compound	Checking this field changes the type to a compound type for the chosen cells Data Column. For example checking this converts int to ints. The Can be compounded column in the Types table indicates the types that can be compounded.
Primary Key	Checking this field keys the table on this data column ensuring that each of the values for this data column are unique. When checked: this data column name is added to the set of Primary Keys keys in the Table Properties for this table. the table is keyed by these columns and any updates that have matching keys update records with matching keys. this table/column can be selected as the Foreign Key for another table. Read more about Foreign Key further down this table.
RDB Attribute	Attributes to apply when the table is in memory. This typically only applies to the RDB tier. See Attributes details below for more information.
IDB Attribute (Ordinal Partitioning)	Attributes to apply when the table is ordinal partitioned. This typically only applies to the IDB tier. See Attributes details below for more information.
HDB Attribute (Temporal Partitioning)	Attributes to apply when the table is partitioned on disk. This typically only applies to the HDB tier. See Attributes details below for more information.
Description	A textual description of the column for documentation purposes.
Foreign Key	A foreign key refers to the Primary Key data column in a different table in the database. This allows for the appendage of the other tables data for queries. Double-click on this field to display a list of Primary Keys that can be selected. These are listed in the format table.column, where the table is another table in the same schema and column is a data column in that table. In the following example there are 2 tables.table2 has a data column col2 which is enabled as a primary key. table1 has a Foreign Key for the time column set to table2.col2. If the Data Type of the selected Foreign Key is different to that of selected data column it is updated to match the type of the selected Primary Key. A message is displayed, at the bottom of the screen. If the type is changed, after it has been assigned, the type of the data column referencing the Foreign Key is updated, and a message is displayed. If the selected Foreign Key is deleted or its Primary Key check is disabled the value in Foreign Key field is the removed.

Types

Column types support all of the base datatypes supported in kdb+. For columns containing only atom values, use the singular form of the data type. To indicate that a column is intended to support vectors of a given type, use the plural of the type name.

Partition tables need a timestamp column

Partition tables require a timestamp column to order data and to propagate it through database tiers. The Partition Column column in the table must be set to the temporal field that will be used for ordering data.

Type example

For a column of long numbers, use the type Long.

x
-
0
1
2

For a column of vectors of long integers, use Longs.

name	type	description	can be compounded
Any	`0h`	Allows for mixed types within a single column. Mixed types are useful for having mixed data or lists of strings but do have a performance trade-off and should be used sparingly.	no
Boolean	`1h`	True or false values.	yes
Guid	`2h`	Unique identifiers in the form of `00000000-0000-0000-0000-000000000000`.	yes
Byte	`4h`	Individual byte values on the range of `0x00` to `0xFF`.	yes
Short	`5h`	A 2 byte short integer value in the range of `-32767h` to `32767h`.	yes
Int	`6h`	A 4 byte integer value in the range of `-2147483647` to `2147483647`	yes
Long	`7h`	An 8-byte integer value with the maximum unsigned integer byte range.	yes
Real	`8h`	A 4-byte floating point value.	yes
Float	`9h`	An 8-byte floating point value.	yes
Char	`10h`	A byte value representing a character value.	yes
Symbol	`11h`	A symbol is a string of characters that is stored as an enumeration on disk. Reserve this datatype for repeated character values. Using this for unique character data incurs significant query performance overhead.	yes
Timestamp	`12h`	Stores a date and time as the number of nanoseconds since 2000.01.01. All partitioned tables must have at least one field that is a timestamp for partitioning its data on.	yes
Month	`13h`	Represents a month and year value without a day.	yes
Date	`14h`	Represents a month, year and day value as a date.	yes
Datetime	`15h`	A deprecated format for storing temporal values that uses a float as its underlying data structure. When using this datatype, it is possible to have two datetimes point to the same day but not be equivalent due to float precision. Use a timestamp over datetime whenever possible.	yes
Timespan	`16h`	Stores a time duration as a number of nanoseconds.	yes
Minute	`17h`	Stores hours and minutes of a timestamp.	yes
Second	`18h`	Stores hours, minutes and seconds of a timestamp.	yes
Time	`19h`	Stores hours, minutes, seconds and sub-seconds to nanosecond precision.	yes
String	`inbound`	Stores a text as a sequence of characters.	no

The Can be compounded column in the table above indicates which types can be compounded by ticking the Compound checkbox for the data column, in Schema Settings.

Attributes

Applying attributes to your tables can significantly improve their performance. RDB/IDB/HDB Attributes, defined in column properties are a key performance tuning point for modifying the performance of queries in a kdb+ Insights database. Attributes can be set differently depending on the different tiers of your database and can be tuned to optimize performance at each tier.

You can set the attributes manually or use the performance tuning wizard to configure them for optimal performance.

The following column attributes are available:

None - requires a linear scan for filters against the field
Sorted - ascending values - allows binary search for filters against the field.
Parted - (requires sorted) - maintains an index allowing constant time access to the start of a group of identical values.
It is recommended that the Parted attribute is used on the column you need to query against most often. You can have multiple fields with a parted attribute if there are multiple fields frequently used for filtering. Note that setting the parted attribute requires a significant amount of memory when constructing the field query index at write down. Ensure that enough memory is allocated to the Storage Manager for a full day's worth of data for the largest field that has this attribute set.
Grouped - (does not require sorted) maintains a separate index for the field, identifying the positions of each distinct value. note - this index can take up significant storage space.
Unique - (requires all values be distinct) allows a constant-time lookup - typically used for primary keys

More information about attributes, their use, and trade-offs is available in the kdb+ documentation.

Within the table configuration, an attribute property can be set on particular columns within the table. There are three levels of attributes that can be set:

attrMem - The memory level attribute applies when data is in memory which is used for any real-time tiers (RDB) in your database configuration.
attrOrd - The ordinal level attribute is used for data that is stored intermittently throughout the day. This applies to the IDB tier of a database.
attrDisk - The disk attribute applies to all disk tiers including the HDB and subsequent tiers such as object storage.

Restrictions on adding attributes to compound types

a) Restrictions on the `parted` attribute

The parted attribute (p) is incompatible with compound types (e.g., columns storing lists) because it relies on scalar ordering. This attribute requires the column to be sorted, but compound types lack a defined scalar order. Attempting to apply parted to a compound column results in a type error:

td: ([] col1: 1 2 3; col2: (`a`a; `b`b; `c`c))
update col2: `p#col2 from td  // Error: 'type

Even if applicable, the parted attribute provides little benefit for compound types, as queries typically do not leverage partition indexing effectively for such data.

b) General limitations of attributes on compound types

While some attributes like grouped (g) and sorted (s) can be applied to compound types, doing so often has limited or no practical benefit due to:

Inefficient indexing: Attributes treat each list as a single value, limiting optimization.
High memory overhead: Indices for attributes like grouped can consume significant memory for columns with high variability.
Exact matching only: Operations on compound types require exact matches, which limits use cases.

Recommendations

Avoid attributes on compound types: Use query-specific functions like group for filtering or grouping.
Flatten data: Split compound types into scalar columns for better compatibility with attributes.
Formatting compound data: Combine elements into a single scalar value (e.g., concatenated symbols) if attributes are required.

By understanding these limitations, you can optimize schema design while avoiding unnecessary complexity.

Data column actions

Open the column actions menu by either clicking on Data Column Actions or right-clicking on a column.

This provides you with the following options for updating the data columns in your schema:

Add Data Column Above or Add Data Column Below to add new columns.
Move to Top or Move to Bottom - to move columns.
Clone Data Column(s) - to clone data column(s) and their properties.
Remove Columns - to delete data column(s).

Delete a table

To delete a table from a database schema:

In the left-hand menu, click on the name of the database containing the table you want to remove and select the Schema Settings tab.
Click the tab corresponding to the table you want to delete. For example, to delete the crime table, click on the crime tab, as shown below.
Click Delete Table.
A Confirm Delete dialog opens, showing the name of table for deletion. Click Delete to proceed.
Click Save to mark the table for deletion.

If the database is deployed Save is disabled. You must tear down and then save your changes.

The change is saved to the package when you click Save but only takes effect after the database is restarted (when the package is redeployed) and a schema conversion completes.

See Schema conversion progress indicator for details on how to track the progress.

Performance tuning

Performance tuning aims to help you achieve optimal database performance. By configuring schema attributes correctly, you can prevent issues with query performance or lagging tables when scrolling large data sets.

The Performance Tuning wizard is designed to guide you in configuring the schema for optimal performance by providing recommendations on attribute settings.

If you attempt to deploy a database without first conducting performance tuning, a warning is displayed.

Start performance tuning

Performance tuning is the process of applying attributes to describe your data. When applied to columns, attributes act like shortcuts for querying your data and are a must-have if you want to unlock the full speed and power of kdb+.

In the Schema settings tab, click Start Performance Tuning to open the Performance Tuning wizard.

Click on the appropriate Table Type. The options available change depending on the table type selected. Click on a tab below to follow the steps for the selected table type.

PartitionedSplayed

The following screenshot shows the partitioned options, described in the table below.

Performance Tuning Partitioned

Option	Action	Description
1. Table Type	Click Time Series(Partitioned).	Storing data this way drastically improves query performance and is best suited to large, complex, time series data. This is the default table type.
2. Data Partitioning	Select this table's primary time column. This must be a timeseries data type.	Although kdb+ excels at managing data in memory, as datasets expand, individual columns may surpass memory constraints. To optimize performance, we partition data based on time. It's essential to partition the primary time-related column, typically the one used for filtering queries.
Data Filtering & Grouping	Select a column that is typically used for filtering or grouping data.	Each table in kdb+ serves a specific purpose, often filtered, grouped, or aggregated by a particular column. Identifying this key column allows kdb+ to create a dedicated index, significantly enhancing performance. To enable performance tuning, ensure there is at least one column with a Guid, Integer, Long, or Symbol data type. If none of these data types are present, Preview Changes is disabled.

The following screenshot shows the splayed options, described in the table below.

Performance Tuning Partitioned

Option	Action	Description
1. Table Type	Click Reference Data(Splayed).	This is generally reserved for non-time series reference tables with under 100m rows of data.
2. Primary Column	Select the primary column for this table.	Each table in kdb+ serves a specific purpose, often filtered, grouped, or aggregated by a particular column. Identifying this key column allows kdb+ to create a dedicated index, significantly enhancing performance.

If you set a table to Splayed, at least one other table in your database must be partitioned.

Click Preview Changes. The recommended settings displayed are based on your responses in the previous sections. This includes attributes settings designed to help improve the performance for your table.
Click Apply Changes to accept these settings. Alternatively, click Back to review the settings, or Cancel to return to the Schema Settings screen.

When you apply the changes the attributes are updated on the Schema Settings screen. The following screenshot shows the recommended attribute values applied in the schema settings.

Modifying a schema

You can modify a schema, changing table properties, data column properties or adding/removing columns.

Before you attempt to modify a schema the schema must not be in use. Therefore if you have a deployed schema it must be torn down before modifying the schema.

The database is unavailable, for either write down or query, for the duration of the schema conversion

Backup before converting a database

We recommend that you always backup your database before applying schema changes. See backup for details.

Stop Publishing

The Storage Manager no longer accepts any new data while a schema conversion is taking place. When performing a schema upgrade on a large database, this process can take some time to finish. This is especially true when an object storage tier is used. During a schema conversion, it is recommended that all data publishers are stopped until the schema conversion is complete. This avoids accumulating back pressure on any streams in the system. Once the schema conversion is complete, Storage Manager processes any messages that were buffered in the stream while the conversion was taking place.

After you have made any required updates to the schema, and you Save and Deploy the updated database a schema conversion progress indicator may be displayed.

Schema conversion progress indicator

When you modify a schema and deploy the database, if the changes take longer than 10 seconds to implement, a Schema Conversion Progress Indicator is displayed. This is a useful tool for monitoring the progress of schema conversions, especially for large data sets which require a significant amount of time.

The indicator is visible above the settings on the Database screen and displays a notification bar stating Applying schema change.
A status bar increments showing the number of updates completed.

Schema Status Indicator

The status of the conversion is also visible when you hover over the database name on the left-hand side, showing the status of components being deployed.

Storage Status Indicator

You can also access information on the conversion by examining logs through Diagnostics. The following screenshot shows an example of schema conversion logs. Filtering the search using Running Step or Processing Partition gives the best results.

Schema conversion logs

Schema modifications

Schema modifications are supported for package-based deployments; export the database, modify the schema, then push and deploy it using the kdb Insights CLI.

Schema configuration

Create a schema

Code view

Table properties

Table Types

Column Properties

Types

Attributes

Restrictions on adding attributes to compound types

a) Restrictions on the parted attribute

b) General limitations of attributes on compound types

Recommendations

Data column actions

Delete a table

Performance tuning

Start performance tuning

Modifying a schema

Schema conversion progress indicator

Further reading

a) Restrictions on the `parted` attribute