Assembly Configuration

Assembly configuration comes in two flavors. The first is the format consumed by KXI components. The second is a version embedded in a Kubernetes custom resource document (CRD), which contains KXI Operator-specific keys.

This document describes the Assembly Configuration in the format consumed by KXI components. Information about the assembly CRD can be found here.

An assembly configuration is a machine-readable description of the structure of a dataset, its life cycle, and the services that operate upon it. This description is used by KXI services to self-configure and coordinate amongst themselves, and also provides room for user extension.

KXI services typically load their assembly configuration from the file specified by the KXI_ASSEMBLY_FILE environment variable. This file is written in YAML, which allows for hierarchically structured data, future extension, and inline comments.

An assembly has the following top-level structure:

name         short name for this assembly (required)
description  purpose of the assembly (optional)
labels       user defined keys and values used for representing the purview of the assembly (optional)
tables       schemas for the tables operated upon within the assembly (dictionary)
mounts       mount points for stored data (dictionary)
bus          configuration of the message bus used for coordination between elements (dictionary)
elements     services that should run within the assembly, and any configuration they each require (dictionary)
overrides    overrides to base values (optional)

This document focuses on the top-level sections listed above. The components that typically appear under elements are described in their respective documentation.
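
As a rough orientation, the top-level structure above could be laid out as in the following sketch. The name, description, and label values are placeholders invented for illustration, and whether KXI components accept empty dictionaries for the remaining sections is not stated here.

```yaml
# Hypothetical skeleton of an assembly file; all values are placeholders.
name: example                                  # required
description: Illustrative assembly skeleton    # optional
labels:                                        # optional
  region: amer
tables: {}       # table schemas, keyed by table name
mounts: {}       # storage locations, keyed by user-defined mount name
bus: {}          # bus entries, keyed by bus name
elements: {}     # element configuration, keyed by element name or type
```

Each section is filled in according to the descriptions that follow.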

Labels

Labels define the purview of the DA services contained within the assembly, that is, the data that the assembly grants access to. If using the KX Insights Service Gateway, these are the values reported as the DAP's purview (see the Service Gateway page).

Below are some examples.

Example 1 - Provides FX data for America.

labels:
    region: amer
    assetClass: fx

Example 2 - Provides electrical, weekly billing for residential customers.

labels:
    sensorType: electric
    clientType: residential
    billing: weekly

Tables

Table schemas describe the metadata and columns of tables.

A table schema has the following structure:

key           required  purpose                                        value
description             purpose of the table                           string
type          yes                                                      splayed, partitioned
oldName                 old name of the table, for automatic renaming  string
primaryKeys             names of primary key columns                   string list
prtnCol                 column to be used for storage partitioning     string
shards                  shard count                                    integer
slices                  slice count                                    integer
hashCol                 column to hash for sharding                    string
partitions              partition count                                integer
blockSize               block size for memory/disk manipulation        integer
updTsCol                name of the arrival timestamp column           string
sortColsMem             names of sort columns (in-memory)              string list
sortColsOrd             names of sort columns (on-disk IDB)            string list
sortColsDisk            names of sort columns (on-disk HDB)            string list
columns       yes       column schemas (see below)                     list

A column schema has the following structure:

key          required  purpose
name         yes       name of the column
description            purpose of the column
type         yes       q type name
oldName                old name of the column, for automatic renaming
foreign                foreign key into another table in this assembly in the form table.column
attrMem                column attribute when stored in memory
attrDisk               column attribute when stored on disk
attrOrd                column attribute when stored on disk with an ordinal partition scheme
attrObj                column attribute when stored in object store (e.g. S3)

The list of supported column type values is:

boolean  guid  byte  short  int  long  real  float  char   symbol
timestamp  month  date  datetime  timespan  minute  second  time

booleans guids bytes shorts ints longs reals floats string symbols
timestamps months dates datetimes timespans minutes seconds times

or leave blank for a mixed type.

The list of supported column attribute values is: grouped, parted, sorted, unique.

Use the grouped attribute for an in-memory column with a lot of repeated values. Use the parted attribute for an on-disk column where common values are adjacent. Use the sorted attribute for an in-memory column with ascending values, typically a time. Use the unique attribute for a column where all items are distinct, typically a primary key.

Attributes are metadata applied to table columns whose values have a special form, and are often used to speed up query response times. See here for more information.
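
To make the table and column schemas more concrete, here is a minimal sketch of a single partitioned table. The table name, column names, and descriptions are invented for illustration; the keys, type names, and attribute values come from the lists above, and the attribute choices simply follow the guidance given for grouped, parted, and sorted.

```yaml
tables:
  trade:                                  # hypothetical table name
    description: Illustrative trade records
    type: partitioned
    prtnCol: time                         # column used for storage partitioning
    sortColsDisk: [sym]                   # keeps equal sym values adjacent on disk
    columns:
      - name: time
        description: Time of the trade
        type: timestamp
        attrMem: sorted                   # ascending values in memory
      - name: sym
        description: Instrument identifier
        type: symbol
        attrMem: grouped                  # many repeated values in memory
        attrDisk: parted                  # common values adjacent on disk
      - name: price
        type: float
      - name: size
        type: long
```

Whether a given combination of sort columns and attributes suits a particular workload depends on the query patterns, so treat these choices as a starting point rather than a recommendation.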

Mounts

Assemblies store data in multiple places. The KXI Storage Manager (SM) component migrates data between a hierarchy of "tiers", each with its own locality, segmentation format, and rollover configuration. Other components might use entries in this section to coordinate other forms of data storage and access.

The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries with the following fields:

key         required  purpose                                                                       value
type        yes                                                                                     stream, local, object (see note 1)
baseURI     yes       base URI where that data can be mounted by other services                     string
partition   yes       partitioning scheme for this mount                                            none, ordinal, date
sym                   (object storage only) a file:// URI or object storage URI path to a sym file  string
par                   (object storage only) a file:// URI or object storage URI to a par.txt file   string
storageURI            (object storage only) an object storage URI that points to a kdb+ database    string

The full URI for mounting the local on-disk data is <baseURI>/current (current is a symbolic link pointing to a loadable kdb+ database).

Partition values:

none     do not partition; store in arrival order
ordinal  partition by a numeric virtual column which increments according to
         a corresponding storage tier's schedule and resets
         when the subsequent tier (if any) rolls over
date     partition by each table's prtnCol column, interpreted as a date

  • A mount of type stream must have partition none
  • A mount of type local must have partition ordinal or date, and its URI must be of the form <mount_root>/current, where the <mount_root> directory is managed by the Storage Manager
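
The following sketch shows how a mounts section might look for one stream mount and two local mounts. The mount names and file paths are hypothetical, and the `none` placeholder used as the stream mount's baseURI is an assumption: the table above only says that the key is required. The local baseURI values omit the trailing /current, on the reading that the full mount path is <baseURI>/current.

```yaml
mounts:
  rdb:                              # hypothetical mount name
    type: stream
    baseURI: none                   # assumed placeholder for a stream mount
    partition: none                 # stream mounts must use partition none
  idb:
    type: local
    baseURI: file:///data/db/idb    # hypothetical path; data loads from <baseURI>/current
    partition: ordinal
  hdb:
    type: local
    baseURI: file:///data/db/hdb    # hypothetical path
    partition: date                 # partitioned by each table's prtnCol
```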

Bus

The Bus provides information about whatever EMS-like system (or systems) is available to elements within this assembly for communication.

The bus section consists of a dictionary of bus entries. The names internal and external are suggested for a bus used for communication within the assembly and for a bus used for communication with the outside world (perhaps other assemblies), respectively, but assemblies may contain further entries for user-defined purposes.

Each bus entry provides:

key       required  purpose                                                                                   value
protocol  yes       protocol of the messaging system                                                          rt, custom
topic               subset of messages in this stream that consumers are interested in                        list
nodes               connection strings to machines or services which can be used for subscribing to this bus  hostname:port

Protocol values:

rt       use Insights Reliable Transport (RT)
custom   use a custom solution that complies with the RT interface. A custom q code module
         should be loaded from the path given by the environment variable `KXI_RT_LIB`.
         For this protocol, the `nodes` list should contain a single `hostname:port`.
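
A bus section using the suggested entry names might look like the sketch below. The host names and ports are placeholders, and the optional topic key is omitted. The custom entry also assumes that `KXI_RT_LIB` is set in the environment of the processes that use it.

```yaml
bus:
  internal:                       # communication within the assembly
    protocol: custom
    nodes:
      - localhost:5001            # hypothetical; custom expects a single hostname:port
  external:                       # communication with the outside world
    protocol: rt
    nodes:
      - rt.example.com:5002       # hypothetical
```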

Elements

Assemblies coordinate a number of components or processes, which we call elements of the assembly. The elements section provides configuration details only relevant to specific services.

When the processes comprising the elements of an assembly are initialized, each will have access to the Assembly configuration. Furthermore, every process will know its element type, a short name describing its purpose. (The KXI components sp, gw, sm, and dap are all element types, but an assembly might contain an open-ended collection of user-defined types.)

If processes are started with the environment variable KXI_NAME, they can search for the configuration details of an element with this element name. Otherwise, they should look for configuration details for their element type. It is a fatal error for a process to launch as part of an assembly if its type is not listed in elements, or if the entry matching its name has the wrong type.

Each element entry has the following structure:

key          required  purpose                                         value
description            the purpose of the element                      string
instances              maps instance names to options dictionaries     dictionary
image                  describes container name used for this element  dictionary
The image dictionary contains the following:

key   required  purpose                                  value
repo            a URL indicating the image repository    string
name            name of the image                        string
tag             name/version of the image                string

Each element may have its own specific set of key/value configurations. Refer to the documentation of the individual KXI components for information on element-specific configuration.

Any other keys in the element entry will be applied to each item in instances unless overridden.
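
The sketch below illustrates the shape of an element entry and how a key set on the element is inherited by its instances. The image values, the instance names, and the `exampleOption` key are all hypothetical; real option names are defined by each component's own documentation.

```yaml
elements:
  dap:                                  # element type mentioned above
    description: Data access processes
    image:
      repo: registry.example.com        # hypothetical repository URL
      name: example-dap                 # hypothetical image name
      tag: 1.0.0                        # hypothetical tag
    exampleOption: base                 # hypothetical key, applied to every instance
    instances:
      first: {}                         # inherits exampleOption: base
      second:
        exampleOption: special          # overrides the element-level value
```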

Example

An example can be found here

Overrides

Overrides are used to define multiple flavors of an assembly within a single file. This makes it easy to define multiple assemblies with significant overlap in one file, rather than copy-pasting common fields across many files. You can define multiple named overrides and they can be layered onto the base values in any order.

The following is an example of using overrides:

name: "Override example"
tables:
  balls:
    description: Master record of each golf ball manufactured.
    type: splayed
    primaryKeys: serial
    columns:
      - name: serial
        type: int
        attrMem: unique
      - name: nft
        description: They say these are all the rage lately!
        type: guid
      - name: factory
        type: short
      - name: batch
        type: int
      - name: machine
        type: int
      - name: bornTS
        type: timestamp
elements:
  x: "abc"
  y: 123

overrides:
  #
  # Add a new column to balls and a new elements value.
  #
  myOverride1:
    tables:
      balls:
        columns:
          - name: extraCol
            type: symbol
          - name: factory
            type: byte
    elements:
      z: newValue

  #
  # Add a new table, add/modify elements
  #
  myOverride2:
    tables:
      clubs:
        description: Master record of each golf club manufactured.
        type: splayed
        primaryKeys: serial
        columns:
          - name: serial
            attrMem: unique
          - name: type
            type: string
            description: "e.g. wood 1, putter, etc..."
          - name: size
            type: int
    elements:
      y: 456
      z: newerValue

The above assembly file defines two overrides: myOverride1 and myOverride2. Specify which overrides to apply by setting the KXI_ASSEMBLY_OVERRIDES environment variable to an ordered, colon-separated list of override names.

KXI_ASSEMBLY_OVERRIDES="${override_1}:${override_2}:..." # Applies override_1, then override_2, etc...

Example:

  • KXI_ASSEMBLY_OVERRIDES=""

    No overrides. Uses only the base-level assembly values.

  • KXI_ASSEMBLY_OVERRIDES="myOverride1"

    Applies myOverride1 only. The balls table has an extra column (extraCol), the factory column is of type byte, and elements.z="newValue".

  • KXI_ASSEMBLY_OVERRIDES="myOverride1:myOverride2"

    Applies myOverride1, then myOverride2. The balls table has an extra column (extraCol), the factory column is of type byte, the clubs table is added, elements.y=456, and elements.z="newerValue" (since myOverride2 is applied second).
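
As an illustration of the last case, the effective elements values after applying both overrides in that order would be as sketched below. This is derived by hand from the example file above, not produced by a KXI tool.

```yaml
# Effective elements section for KXI_ASSEMBLY_OVERRIDES="myOverride1:myOverride2"
elements:
  x: "abc"          # unchanged from the base values
  y: 456            # base value 123 replaced by myOverride2
  z: newerValue     # added by myOverride1, then replaced by myOverride2
```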


  1. The object mount type is currently not supported by SM. It is mainly intended for scenarios where SM is not part of the installation.