Storage Manager configuration

Assembly configuration file (AC)

A set of services collectively making up a data ingestion, storage, and access pipeline is called an Assembly.

To aid in using them together as part of an assembly, KXI components share a common configuration file format, called an Assembly Configuration file (AC). An AC is a YAML document read from a path specified for each service by the KXI_ASSEMBLY_FILE environment variable.

The Storage Manager expects the following sections in an AC.

name         short name for this assembly
description  purpose of the assembly (optional)
tables       schemas for the tables operated upon within the assembly (dictionary)
mounts       mount points for stored data (dictionary)
bus          configuration of the message bus used for coordination between elements (dictionary)
elements     services that should run within the assembly, and any configuration they each require (dictionary)
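As a rough orientation, the top-level layout of an AC might look like the sketch below. The assembly name and description are hypothetical, and the empty dictionaries only stand in for the sections described in the rest of this page.

```yaml
# Hypothetical top-level skeleton of an Assembly Configuration file.
name: example-assembly                                # illustrative short name
description: Example ingestion and storage pipeline   # optional
tables: {}      # table schemas, see Tables
mounts: {}      # storage locations, see Mounts
bus: {}         # message bus entries, see Bus
elements: {}    # per-service configuration, see Elements
```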

URI schemes

mounts[X].uri, elements.sm.source, and elements.sm.tiers[N].store accept URIs. At present these may use the file:// or s3:// URI schemes; other schemes may be supported in the future.

Tables

A table schema has the following structure.

key          reqd  value                   purpose
description        string                  purpose of the table
type         yes   splayed or partitioned
primaryKeys        list                    names of primary key columns
prtnCol            string                  column to be used for storage partitioning
shards             integer                 shard count
partitions         integer                 partition count
blockSize          integer                 block size
updTsCol           string                  name of the arrival timestamp column
columns      yes   list                    column schemas

A column schema has the following structure.

key          reqd  purpose
name         yes   name of the column
description        purpose of the column
type         yes   q type name
foreign            foreign key into another table in this assembly, in the form table.column
attrMem            column attribute when stored in memory
attrDisk           column attribute when stored on disk
attrOrd            column attribute when stored on disk with an ordinal partition scheme
attrObj            column attribute when stored in object store (e.g. S3)
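A minimal sketch of a tables section follows, assuming that the dictionary is keyed by table name. The table name, column names, and descriptions are hypothetical. Column attributes are left out because their accepted values are not listed on this page.

```yaml
tables:
  trade:                        # hypothetical table name
    description: Trade data
    type: partitioned           # required: splayed or partitioned
    prtnCol: time               # column used for storage partitioning
    columns:                    # required: list of column schemas
      - name: time
        description: Time of the trade
        type: timestamp         # q type name
      - name: sym
        description: Instrument symbol
        type: symbol
      - name: price
        type: float
```

Only type and columns are marked as required for a table, and only name and type for a column; the other keys here are included for illustration.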

Mounts

The Storage Manager migrates data between a hierarchy of tiers, each with its own locality, segmentation format, and rollover configuration. Mounts describe where other services can then access this data.

The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries with the following fields:

key        reqd  value                     purpose
type       yes   stream, local, or object
uri        yes   string                    URI where that data can be mounted by other services
partition  yes   none, ordinal, or date    partitioning scheme for this mount

Partition values:

none     do not partition; store in arrival order
ordinal  partition by a numeric virtual column which increments according to
         a corresponding storage tier's schedule and resets
         when the subsequent tier (if any) rolls over
date     partition by each table's prtnCol column, interpreted as a date
  • A mount of type stream must have partition none
  • A mount of type local must have partition ordinal or date, and its URI must be of the form <mount_root>/current, where the <mount_root> directory is managed by the Storage Manager
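The rules above could be satisfied by a mounts section along these lines. The mount names and paths are hypothetical; the points to notice are that the stream mount uses partition none and that both local URIs end in /current.

```yaml
mounts:
  rdb:                                # hypothetical mount names
    type: stream
    uri: file://stream                # illustrative path
    partition: none                   # required for type stream
  idb:
    type: local
    uri: file://data/db/idb/current   # <mount_root>/current
    partition: ordinal
  hdb:
    type: local
    uri: file://data/db/hdb/current   # <mount_root>/current
    partition: date                   # partitions by each table's prtnCol
```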

Bus

The Storage Manager ingests data from an event stream; a Bus contains the information necessary to subscribe to that stream.

The bus section consists of a dictionary of bus entries. Each entry provides:

key       reqd  value          purpose
protocol  yes   custom †       protocol of the messaging system
topic           list           subset of messages in this stream that consumers are interested in
nodes     yes   hostname:port  connection strings to machines or services which can be used for subscribing to this bus

† Currently, the only valid protocol is custom. It means that custom q code is loaded from the path given by the KXI_RT_LIB environment variable. For this protocol, the nodes list should contain a single hostname:port.
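A bus section with a single entry might be sketched as follows. The entry name, host, and port are hypothetical, and writing nodes as a one-item YAML list is an assumption based on the note above. The optional topic key is omitted.

```yaml
bus:
  stream:                 # hypothetical bus entry name
    protocol: custom      # currently the only valid protocol
    nodes:
      - tp:5000           # single hostname:port; host and port are illustrative
```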

Elements

Assemblies coordinate a number of processes and/or microservices, which we call elements of the assembly. The elements section provides configuration details that are relevant only to specific services.

Configuration options for the Storage Manager go in the sm entry of elements:

key               reqd  value    default  purpose
tiers             yes   list              storage tiers
enforceSchema           boolean  false    whether to enforce table schemas when persisting (with performance penalty; for debugging)
disableREST             boolean  false    whether to disable the REST interface, leaving only q IPC support
disableDiscovery        boolean  false    whether to disable registration with discovery
chunkSize               integer  500000   chunk size used for writing tables
sortLimitGB             integer  10       memory limit when sorting splayed tables or partitions on disk, in GB
waitTm                  integer  250      time to wait between connection attempts, in milliseconds
eodPeachLevel           list              level at which EOD peaches to parallelize HDB table processing; list items are part and table, in any combination
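For illustration, an sm entry that spells out the documented defaults could look like the sketch below. The tier and mount names are hypothetical, and the list syntax shown for eodPeachLevel is an assumption.

```yaml
elements:
  sm:
    enforceSchema: false           # default
    disableREST: false             # default
    disableDiscovery: false        # default
    chunkSize: 500000              # default
    sortLimitGB: 10                # default, in GB
    waitTm: 250                    # default, in milliseconds
    eodPeachLevel: [part, table]   # assumed list syntax
    tiers:                         # required
      - name: streaming            # hypothetical tier name
        mount: rdb                 # hypothetical; must name an entry under mounts
```

Since every key except tiers has a default or is optional, a minimal entry would presumably need only the tiers list.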

A storage tier has the following structure:

key          reqd  value      purpose
name         yes
store              URI        where the tier will physically store data (default: uri field of the corresponding mount)
mount        yes              corresponding mounts entry at which data in the tier may be accessed
schedule           see below  policy for when rollovers should be considered
retain             see below  policy for how much data should be stored in this tier before it is rolled over into the next tier
compression        see below  policy for compression of data

schedule

If present, this dictionary contains the following keys.

  • freq: Timespan, in q notation. How often should this tier roll data over into the next tier? For example, 00:10:00 means roll over every 10 minutes.
  • snap: Time, in q notation. At what whole multiples of time should rollovers be scheduled? For example, 01:00:00 means roll over at the beginning of an hour.
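The two schedule keys might be used as in the following sketch. Tier and mount names are hypothetical, and combining freq with snap in one tier reflects one reasonable reading of the descriptions above rather than a documented requirement.

```yaml
elements:
  sm:
    tiers:
      - name: interval         # hypothetical tier name
        mount: idb             # hypothetical mount name
        schedule:
          freq: 00:10:00       # roll over every 10 minutes
      - name: ondisk
        mount: hdb
        schedule:
          freq: 01:00:00       # roll over every hour
          snap: 01:00:00       # align rollovers to the beginning of an hour
```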

retain

This dictionary may have one or more of the following keys.

  • time: A timespan consisting of a number followed by a unit: {Years,Months,Weeks,Days,Hours,Minutes}, e.g. 2 Years. Data which has been stored for this length of time is rolled over.
  • sizePct: A size as a percentage of the total storage of the corresponding mount, specified as a number from 1 to 100.

If multiple keys are set, they are interpreted in an inclusive-OR fashion.

A mount partitioned as ordinal, or of type stream, cannot be used with a storage tier that has a retain policy.

The name and store keys belong to the tier itself rather than to retain:

  • name: String used to refer to a particular tier.
  • store: URI describing where this tier will physically store data. If not specified, it is derived from the uri field of the corresponding mount: for uri:<mount_root>/current, the effective store is <mount_root>/data. For mounts of type local with partition:ordinal, this derived value is enforced even if store is specified.
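A tier with a retain policy might be sketched as follows. The names, paths, and thresholds are hypothetical. The mount is partitioned by date, since ordinal and stream mounts cannot be combined with retain.

```yaml
mounts:
  hdb:                                 # hypothetical mount name
    type: local
    uri: file://data/db/hdb/current
    partition: date
elements:
  sm:
    tiers:
      - name: ondisk                   # hypothetical tier name
        mount: hdb
        # store is omitted; by the rule described above, the effective store
        # for this mount would be file://data/db/hdb/data
        retain:
          time: 3 Months               # illustrative threshold
          sizePct: 80                  # illustrative: 80 percent of the mount's storage
```

Both retain keys are set here, so they combine in the inclusive-OR fashion described above.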

compression

If present, this dictionary contains the following keys.

  • algorithm: Compression algorithm: {none, qipc, gzip, snappy, lz4hc}
  • block: Block size
  • level: Compression level

The compression policy currently applies only to tiers associated with a mount of type:local and partition:date.
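A compression policy on a qualifying tier could look like the sketch below. The tier and mount names are hypothetical. The block and level values assume kdb+ conventions (block size given as a power-of-two exponent, gzip levels from 0 to 9), which this page does not confirm.

```yaml
elements:
  sm:
    tiers:
      - name: ondisk           # hypothetical tier name
        mount: hdb             # assumed to be type local with partition date
        compression:
          algorithm: gzip      # one of none, qipc, gzip, snappy, lz4hc
          block: 17            # assumed: log2 of the block size, as in kdb+
          level: 6             # assumed: gzip level in the 0 to 9 range
```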