Storage Manager configuration

The Storage Manager (SM) takes its configuration from the Assembly Configuration file specified by the KXI_ASSEMBLY_FILE environment variable.

SM expects the following sections to be specified in the assembly.

name         short name for this assembly
tables       schemas for the tables operated upon within the assembly (dictionary)
mounts       mount points for stored data (dictionary)
bus          configuration of the message bus used for coordination between elements (dictionary)
elements.sm  SM configuration (dictionary); includes source and tiers
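
As a rough illustration, the sections listed above can be checked mechanically once the assembly file has been parsed into a dictionary. The sketch below assumes the parsed assembly is a plain Python dict; the function name `missing_sections` is hypothetical and is not part of SM.

```python
def missing_sections(assembly):
    """Return dotted names of expected sections absent from a parsed assembly."""
    # Top-level sections SM expects, as listed above.
    missing = [key for key in ("name", "tables", "mounts", "bus") if key not in assembly]
    # elements.sm must be present and include source and tiers.
    sm = (assembly.get("elements") or {}).get("sm")
    if sm is None:
        missing.append("elements.sm")
    else:
        missing += [f"elements.sm.{key}" for key in ("source", "tiers") if key not in sm]
    return missing
```

An empty result means every expected section is present; it says nothing about whether the contents of each section are valid.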

URI schemes

mounts[X].baseURI and elements.sm.tiers[N].store permit URIs; these may presently use the file:// or s3:// URI schemes. Other schemes may be supported in the future.
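
A minimal sketch of checking a URI against the two prefixes named above, using Python's standard `urllib.parse`. The helper name `has_supported_scheme` is hypothetical; SM's own validation may differ.

```python
from urllib.parse import urlparse

# The two URI prefixes the text says are presently accepted.
SUPPORTED_SCHEMES = {"file", "s3"}


def has_supported_scheme(uri):
    """True if the URI starts with file:// or s3://."""
    return urlparse(uri).scheme in SUPPORTED_SCHEMES
```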

Tables

A table schema has the following structure.

key           required  purpose                                                value
description             purpose of the table                                   string
type          yes       type of the table                                      object
                                                                               splayed
                                                                               partitioned
columns       yes       see columns                                            list
prtnCol       yes       a timestamp column used for data partitioning          string
updTsCol                a timestamp column used for data latency monitoring    string
blockSize               number of rows to keep in-memory before write to disk  integer
primaryKeys             names of primary key columns                           string list
sortColsMem             names of sort columns (in-memory)                      string list
sortColsOrd             names of sort columns (on-disk IDB)                    string list
sortColsDisk            names of sort columns (on-disk HDB)                    string list

Columns

A column schema has the following structure.

key          required  purpose
description            purpose of the column
name         yes       name of the column
type         yes       type of the column
attrMem                column attribute (in-memory)
attrOrd                column attribute (on-disk IDB)
attrDisk               column attribute (on-disk HDB)
attrObj                column attribute (on-disk HDB in object store, e.g. S3)
anymap                 allow mapped lists to nest within other mapped lists

The list of supported column type values is:

boolean  guid  byte  short  int  long  real  float  char   symbol
timestamp  month  date  datetime  timespan  minute  second  time

booleans guids bytes shorts ints longs reals floats string symbols
timestamps months dates datetimes timespans minutes seconds times

or leave blank for a mixed type.

The list of supported column attribute values is:

grouped  parted  sorted  unique

Use the grouped attribute for an in-memory column with a lot of repeated values. Use the parted attribute for an on-disk column where common values are adjacent. Use the sorted attribute for an in-memory column with ascending values, typically a time. Use the unique attribute for a column where all items are distinct, typically a primary key.

Attributes are metadata applied to table columns whose values have a special form, and are often used to speed up query response times. See the kdb+ documentation on attributes for more information.
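
To make the column rules concrete, here is a small sketch that checks one column entry against the type and attribute lists above. It assumes a column has been parsed into a Python dict; the helper `column_errors` and its error messages are hypothetical, and it treats a missing or empty type as the "blank" mixed type.

```python
# Type and attribute names copied from the lists above.
SCALAR_TYPES = ("boolean guid byte short int long real float char symbol "
                "timestamp month date datetime timespan minute second time").split()
LIST_TYPES = ("booleans guids bytes shorts ints longs reals floats string symbols "
              "timestamps months dates datetimes timespans minutes seconds times").split()
COLUMN_TYPES = set(SCALAR_TYPES) | set(LIST_TYPES) | {""}  # "" stands for a mixed type
ATTRIBUTES = {"grouped", "parted", "sorted", "unique"}
ATTR_KEYS = ("attrMem", "attrOrd", "attrDisk", "attrObj")


def column_errors(column):
    """Return a list of problems found in one column entry."""
    errors = []
    if not column.get("name"):
        errors.append("name is required")
    col_type = column.get("type") or ""  # assumption: absent type means mixed
    if col_type not in COLUMN_TYPES:
        errors.append(f"unsupported type: {col_type}")
    for key in ATTR_KEYS:
        if key in column and column[key] not in ATTRIBUTES:
            errors.append(f"unsupported {key}: {column[key]}")
    return errors
```

For example, a symbol column that is grouped in memory and parted on disk, following the guidance above, passes with no errors: `column_errors({"name": "sym", "type": "symbol", "attrMem": "grouped", "attrDisk": "parted"})`.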

Bus

SM ingests data from an event stream; a Bus contains the information necessary to subscribe to that stream.

The bus section consists of a dictionary of bus entries. Each entry provides:

key       required              purpose                                             value
protocol  yes                   protocol of the messaging system                    rt
                                                                                    custom
topic     if protocol is rt     subset of messages in this stream that consumers   list
                                are interested in
nodes     if protocol is custom connection strings to machines or services which   hostname:port
                                can be used for subscribing to this bus

Protocol values:

rt       use Insights Reliable Transport (RT)
custom   use a custom solution that complies with the RT interface. A custom q code module
         should be loaded from the path given by the environment variable `KXI_RT_LIB`.
         For this protocol, the `nodes` list should contain a single `hostname:port`.

SM loads the bus configuration referenced by the source property of the sm entry under elements.
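
The bus rules above can be sketched as follows, assuming the assembly has been parsed into nested Python dicts. Both helper names are hypothetical, and the check is only an approximation of the "required" column: it does not reproduce SM's actual validation.

```python
def sm_bus_entry(assembly):
    """Look up the bus entry named by elements.sm.source."""
    source = assembly["elements"]["sm"]["source"]
    return assembly["bus"][source]


def bus_entry_errors(entry):
    """Return a list of problems found in one bus entry."""
    errors = []
    protocol = entry.get("protocol")
    if protocol not in ("rt", "custom"):
        errors.append("protocol must be rt or custom")
    elif protocol == "rt" and not entry.get("topic"):
        errors.append("topic is required when protocol is rt")
    elif protocol == "custom":
        nodes = entry.get("nodes") or []
        if isinstance(nodes, str):  # tolerate a bare hostname:port string
            nodes = [nodes]
        if len(nodes) != 1:
            errors.append("nodes must contain a single hostname:port when protocol is custom")
    return errors
```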

Mounts

SM migrates data between a hierarchy of tiers, each with its own locality, segmentation format, and rollover configuration. Mounts describe where other services can then access this data.

The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries with the following fields:

key        required  purpose                                                    value
type       yes                                                                  stream
                                                                                local
                                                                                object [1]
baseURI    yes       base URI where that data can be mounted by other services  string
partition  yes       partitioning scheme for this mount                         none
                                                                                ordinal
                                                                                date

The full URI for mounting the local on-disk data is <baseURI>/current (current is a symbolic link pointing to a loadable kdb+ database).

Partition values:

none     do not partition; store in arrival order
ordinal  partition by a numeric virtual column which increments according to
         a corresponding storage tier's schedule and resets
         when the subsequent tier (if any) rolls over
date     partition by each table's prtnCol column, interpreted as a date

The following rules apply to the mounts section:

  • If specified, a mount of type stream must have partition none, and its baseURI is ignored
  • Exactly one local mount with partition ordinal and exactly one local mount with partition date are required. The baseURI of each must be of the form file://<mount_root>, where <mount_root> is an absolute path (e.g. file:///data). This directory is managed by SM from this point on
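
The two rules above can be expressed as a short check. This is a sketch under the assumption that the mounts section has been parsed into a Python dict of dicts; `mount_errors` is a hypothetical helper, and the mount names used in the usage note are illustrative only.

```python
from urllib.parse import urlparse


def mount_errors(mounts):
    """Return a list of violations of the mount rules described above."""
    errors = []
    local_counts = {"ordinal": 0, "date": 0}
    for name, mount in mounts.items():
        mtype, partition = mount.get("type"), mount.get("partition")
        if mtype == "stream":
            # baseURI is ignored for stream mounts, so it is not inspected here.
            if partition != "none":
                errors.append(f"{name}: stream mount must have partition none")
        elif mtype == "local" and partition in local_counts:
            local_counts[partition] += 1
            uri = urlparse(mount.get("baseURI", ""))
            # file://<mount_root> with an absolute path: empty host, path starts with "/".
            if uri.scheme != "file" or uri.netloc or not uri.path.startswith("/"):
                errors.append(f"{name}: baseURI must be file://<absolute path>")
    for partition, count in local_counts.items():
        if count != 1:
            errors.append(f"exactly one local mount with partition {partition} is required")
    return errors
```

A mounts dictionary with one stream mount (partition none), one local ordinal mount at file:///data/idb and one local date mount at file:///data/hdb yields an empty list.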

In the future, SM will allow a setup without an exposed intraday database (i.e. without a local mount with partition ordinal). Because this database is still required for internal use by SM, a separate SM configuration property will be defined to specify its on-disk location.

SM configuration

Configuration options for SM go in the sm entry of elements:

key               required  purpose                                           value & default
source            yes       name of bus entry
tiers             yes       storage tiers                                     list
enforceSchema               whether to enforce table schemas when persisting  boolean; default false
                            (with performance penalty; for debugging)
disableREST                 whether to disable the REST interface, leaving    boolean; default false
                            only q IPC support
disableDiscovery            whether to disable registration with discovery    boolean; default false
chunkSize                   chunk size used for writing tables                integer; default 500000
sortLimitGB                 memory limit when sorting splayed tables or       integer; default 10
                            partitions on disk, in GB
waitTm                      time to wait between connection attempts, in      integer; default 250
                            milliseconds
eodPeachLevel               level at which EOD peaches to parallelize HDB     list: part, table
                            table processing                                  in any combination
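
For reference, the defaults in the table above can be applied to a parsed sm entry as in the sketch below. The dict representation and the helper name `with_sm_defaults` are assumptions made for illustration; eodPeachLevel is left out because the table does not state a default for it.

```python
# Defaults copied from the table above.
SM_DEFAULTS = {
    "enforceSchema": False,
    "disableREST": False,
    "disableDiscovery": False,
    "chunkSize": 500000,
    "sortLimitGB": 10,
    "waitTm": 250,
}


def with_sm_defaults(sm):
    """Return the sm entry with documented defaults filled in."""
    for key in ("source", "tiers"):
        if key not in sm:
            raise ValueError(f"elements.sm.{key} is required")
    # Values given explicitly in the assembly override the defaults.
    return {**SM_DEFAULTS, **sm}
```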

Tiers

Tiers describe the locality, segmentation format, and rollover configuration of each storage tier.

A storage tier has the following structure:

key          required  purpose                                             value & default
name         yes
mount        yes       corresponding mounts entry, which determines
                       locality and segmentation format, and the location
                       at which data in the tier may be accessed
store                  where the tier will physically store data           see below
schedule               policy for when rollovers should be considered      see below
retain                 policy for how much data should be stored in this   see below
                       tier before it is rolled over into the next tier
compression            policy for compression of data                      see below

store

URI describing where this tier will physically store data. If not specified, it defaults to <baseURI>/data of the corresponding mount (this default is enforced, even if store is specified, for mounts of type local with partition:ordinal). Among multiple tiers within the same mount, at most one may omit an explicit store. If specified explicitly, store must be outside the mount's baseURI.
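
The defaulting rule for store can be sketched as below, assuming tiers and mounts are parsed into Python dicts. The helper `tier_store` is hypothetical, and the sketch covers only the default and the local/ordinal override; it does not check that an explicit store lies outside the mount's baseURI.

```python
def tier_store(tier, mounts):
    """Resolve where a tier physically stores data."""
    mount = mounts[tier["mount"]]
    default = mount["baseURI"].rstrip("/") + "/data"
    # For local mounts with partition ordinal the default is enforced.
    enforced = mount.get("type") == "local" and mount.get("partition") == "ordinal"
    if enforced or "store" not in tier:
        return default
    return tier["store"]
```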

schedule

If present, this dictionary contains the following keys.

  • freq: HH:MM:SS Used by the ordinal partition mount to specify length of interval in each ordinal partition. Default 00:10:00.
  • snap: HH:MM:SS Used by the date partition mount to specify when to move data from ordinal to date partition mount. Default 00:00:00.

snap

With a snap value of 00:01:00, late data belonging to the previous date partition that arrives in the one minute between 00:00 and 00:01 is saved to that partition. Late data belonging to the previous date partition that arrives after 00:01:00 is written at the next snap. Data received between 00:00 and 00:01 that belongs to the current date partition is also saved at this time.
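
As a small illustration of the schedule keys, the sketch below parses the HH:MM:SS strings and falls back to the defaults stated above (00:10:00 for freq, 00:00:00 for snap). It assumes the values arrive as strings; `parse_hms` and `schedule_settings` are hypothetical helpers.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def schedule_settings(schedule=None):
    """Return freq and snap as timedeltas, using the documented defaults."""
    schedule = schedule or {}
    return {
        "freq": parse_hms(schedule.get("freq", "00:10:00")),  # ordinal partition interval
        "snap": parse_hms(schedule.get("snap", "00:00:00")),  # offset for the move to the date partition
    }
```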

retain

This dictionary may have one or more of the following keys.

  • time: A timespan consisting of a number followed by a unit: {Years,Months,Weeks,Days,Hours,Minutes}, e.g. 2 Years. Data which has been stored for this length of time is rolled over.
  • sizePct: A size as percentage of total storage of corresponding mount, specified as a number from 1 to 100.

If multiple keys are set, they are interpreted in an inclusive-OR fashion.

A mount partitioned as ordinal, or of type stream, cannot be used with a storage tier that has a retain policy.
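
The inclusive-OR reading of retain can be sketched as follows. Several details here are assumptions rather than documented behavior: the number in time is taken to be a whole number, the caller supplies whether the time limit has been reached, and sizePct is treated as reached when usage is at or above the configured percentage. Both helper names are hypothetical.

```python
import re

RETAIN_UNITS = ("Years", "Months", "Weeks", "Days", "Hours", "Minutes")


def parse_retain_time(text):
    """Split a value such as '2 Years' into (2, 'Years')."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in RETAIN_UNITS:
        raise ValueError(f"invalid retain time: {text!r}")
    return int(match.group(1)), match.group(2)


def should_roll_over(retain, age_exceeded, used_pct):
    """Inclusive OR over whichever retain keys are set.

    age_exceeded: whether data has been stored longer than retain["time"]
                  (computed by the caller).
    used_pct:     current usage of the mount, in percent (assumed semantics).
    """
    checks = []
    if "time" in retain:
        checks.append(age_exceeded)
    if "sizePct" in retain:
        checks.append(used_pct >= retain["sizePct"])
    return any(checks)
```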

compression

If present, this dictionary contains the following keys.

  • algorithm: Compression algorithm: {none, qipc, gzip, snappy, lz4hc}
  • block: Block size
  • level: Compression level

The compression policy currently applies only to tiers associated with a mount of type:local and partition:date.
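
A minimal sketch of checking a tier's compression policy against the list of algorithms and the mount restriction above. It assumes parsed Python dicts and treats a missing algorithm as an error, which the text does not state explicitly; `compression_errors` is a hypothetical helper.

```python
COMPRESSION_ALGORITHMS = {"none", "qipc", "gzip", "snappy", "lz4hc"}


def compression_errors(tier, mounts):
    """Return a list of problems with a tier's compression policy, if it has one."""
    compression = tier.get("compression")
    if not compression:
        return []
    errors = []
    if compression.get("algorithm") not in COMPRESSION_ALGORITHMS:
        errors.append(f"unsupported algorithm: {compression.get('algorithm')}")
    mount = mounts[tier["mount"]]
    # The policy currently applies only to type:local, partition:date mounts.
    if (mount.get("type"), mount.get("partition")) != ("local", "date"):
        errors.append("compression applies only to local mounts with partition date")
    return errors
```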

Tiers can be categorized according to their locality and segmentation format, which imply the characteristics and governing rules:

Stream based tier

A stream-based tier represents the in-memory data that is received between write-down events. It is implicit and need not be specified.

Local-ordinal based tier

There must always be one tier that corresponds to a mount of type local with partition ordinal. However, its configuration can be omitted, in which case the frequency defaults to 10 minutes.

Local-date based tier

There can be one or more tiers that correspond to a mount of type local with partition date. However, when only one tier is used, its configuration can be omitted, in which case the snap time defaults to midnight, the frequency to 1 day, and retain to infinite.


  1. The object mount type is currently not supported by SM. Its use is limited to scenarios where the database is read-only (SM is not part of the installation).