Assembly Configuration
Assembly configuration comes in two flavors. The first is the format consumed by KXI components. The second is a version embedded in a Kubernetes custom resource document (CRD), which contains KXI-Operator-specific keys.
This document describes the assembly configuration in the format consumed by KXI components. The assembly CRD is described in its own documentation.
An assembly configuration is a machine-readable description of the structure of a dataset, its life cycle, and the services that operate upon it. This description is used by KXI services to self-configure and coordinate amongst themselves, and also provides room for user extension.
KXI services typically load their assembly configuration from the file specified by the `KXI_ASSEMBLY_FILE` environment variable. This file is written in YAML, which allows for hierarchically structured data, future extension, and inline comments.
An assembly has the following top-level structure:
- `name`: short name for this assembly (required)
- `description`: purpose of the assembly (optional)
- `labels`: user-defined keys and values used for representing the purview of the assembly (optional)
- `tables`: schemas for the tables operated upon within the assembly (dictionary)
- `mounts`: mount points for stored data (dictionary)
- `bus`: configuration of the message bus used for coordination between elements (dictionary)
- `elements`: services that should run within the assembly, and any configuration they each require (dictionary)
- `overrides`: overrides to base values (optional)
This document focuses on the top-level sections mentioned above. The components that would typically appear under `elements` are described in their respective documentation.
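As a rough illustration of the top-level structure listed above, a skeleton assembly file might look like the following. The assembly name, description, and label are hypothetical placeholders, and the empty dictionaries stand in for the sections described in the rest of this document.

```yaml
# Skeleton assembly file; all values are illustrative placeholders.
name: golf                           # required short name
description: Illustrative assembly   # optional
labels:                              # optional user-defined keys and values
  region: emea
tables: {}                           # table schemas, keyed by table name
mounts: {}                           # storage locations, keyed by user-defined name
bus: {}                              # message bus entries
elements: {}                         # per-service configuration
overrides: {}                        # optional named overrides of the base values
```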
Tables
Table schemas describe the metadata and columns of tables.
A table schema has the following structure:
key | required | purpose | value |
---|---|---|---|
description | | purpose of the table | string |
type | yes | | splayed, partitioned |
primaryKeys | | names of primary key columns | string list |
prtnCol | | column to be used for storage partitioning | string |
shards | | shard count | integer |
partitions | | partition count | integer |
blockSize | | block size for memory/disk manipulation | integer |
updTsCol | | name of the arrival timestamp column | string |
sortColsMem | | names of sort columns (in-memory) | string list |
sortColsOrd | | names of sort columns (on-disk IDB) | string list |
sortColsDisk | | names of sort columns (on-disk HDB) | string list |
columns | yes | column schemas | list |
A column schema has the following structure:
key | required | purpose |
---|---|---|
name | yes | name of the column |
description | | purpose of the column |
type | yes | q type name |
foreign | | foreign key into another table in this assembly in the form table.column |
attrMem | | column attribute when stored in memory |
attrDisk | | column attribute when stored on disk |
attrOrd | | column attribute when stored on disk with an ordinal partition scheme |
attrObj | | column attribute when stored in object store (e.g. S3) |
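To make the two schemas above concrete, here is a minimal sketch of a `tables` section. The `strokes` table and all of its columns are hypothetical and exist only to show `prtnCol` and a `foreign` reference in the `table.column` form. Only keys listed in the tables above are used.

```yaml
tables:
  balls:
    description: Master record of each golf ball manufactured.
    type: splayed
    primaryKeys: [serial]
    columns:
      - name: serial
        type: int
        attrMem: unique          # column attribute while held in memory
  strokes:                       # hypothetical table, for illustration only
    description: One row per recorded stroke.
    type: partitioned
    prtnCol: strokeTS            # column used for storage partitioning
    columns:
      - name: ball
        type: int
        foreign: balls.serial    # foreign key, written as table.column
      - name: strokeTS
        type: timestamp
      - name: distance
        type: short
```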
Mounts
Assemblies store data in multiple places. The KXI Storage Manager (SM) component migrates data between a hierarchy of "tiers", each with its own locality, segmentation format, and rollover configuration. Other components might use entries in this section to coordinate other forms of data storage and access.
The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries with the following fields:
key | required | purpose | value |
---|---|---|---|
type | yes | | stream, local, object [1] |
baseURI | yes | base URI where that data can be mounted by other services | string |
partition | yes | partitioning scheme for this mount | none, ordinal, date |
The full URI for mounting the local on-disk data is `<baseURI>/current` (`current` is a symbolic link pointing to a loadable kdb+ database).
Partition values:
- `none`: do not partition; store in arrival order
- `ordinal`: partition by a numeric virtual column which increments according to a corresponding storage tier's schedule and resets when the subsequent tier (if any) rolls over
- `date`: partition by each table's `prtnCol` column, interpreted as a date
- A mount of type `stream` must have partition `none`.
- A mount of type `local` must have partition `ordinal` or `date`, and its URI must be of the form `<mount_root>/current`, where the `<mount_root>` directory is managed by the Storage Manager.
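A `mounts` section that follows the rules above could be sketched as below. The mount names (`rdb`, `idb`, `hdb`) and the `baseURI` values are assumptions chosen for illustration. In particular, the text does not say what a `stream` mount's `baseURI` should contain, so `none` is only a placeholder.

```yaml
mounts:
  rdb:                              # user-defined mount name (hypothetical)
    type: stream
    baseURI: none                   # placeholder; not specified by this document
    partition: none                 # stream mounts must use partition none
  idb:
    type: local
    baseURI: file:///data/db/idb    # hypothetical path; data is mounted at <baseURI>/current
    partition: ordinal              # local mounts must use ordinal or date
  hdb:
    type: local
    baseURI: file:///data/db/hdb    # hypothetical path
    partition: date                 # partitioned by each table's prtnCol
```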
Bus
The Bus provides information about whatever EMS-like system (or systems) is available to elements within this assembly for communication.
The `bus` section consists of a dictionary of bus entries. The names `internal` and `external` are suggested for a bus used for communication within the assembly and a bus used for communication with the outside world (perhaps other assemblies), respectively, but assemblies may contain further entries for user-defined purposes.
Each bus entry provides:
key | required | purpose | value |
---|---|---|---|
protocol | yes | protocol of the messaging system | rt, custom |
topic | | subset of messages in this stream that consumers are interested in | list |
nodes | | connection strings to machines or services which can be used for subscribing to this bus | hostname:port |
Protocol values:
- `rt`: use Insights Reliable Transport (RT)
- `custom`: use a custom solution that complies with the RT interface. A custom q code module should be loaded from the path given by the environment variable `KXI_RT_LIB`. For this protocol, the `nodes` list should contain a single `hostname:port`.
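As a sketch of how the entries above fit together, the following `bus` section defines the two suggested entries. The host names and ports are hypothetical, and `topic` is left out because only `protocol` is required.

```yaml
bus:
  internal:
    protocol: rt                    # Insights Reliable Transport
    nodes:
      - "rt-internal.example:5002"  # hypothetical hostname:port
  external:
    protocol: custom                # q module loaded from the path in KXI_RT_LIB
    nodes:
      - "bus.example.com:6000"      # custom protocol: a single hostname:port
```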
Elements
Assemblies coordinate a number of components or processes, which we call elements of the assembly. The `elements` section provides configuration details only relevant to specific services.
When the processes comprising the elements of an assembly are initialized, each will have access to the assembly configuration. Furthermore, every process will know its element type, a short name describing its purpose. (The KXI components `sp`, `gw`, `sm`, and `dap` are all element types, but an assembly might contain an open-ended collection of user-defined types.)
If processes are started with the environment variable `KXI_NAME`, they can search for the configuration details of an element with this element name. Otherwise, they should look for configuration details for their element type. It is a fatal error for a process to launch as part of an assembly if its type is not listed in `elements`, or if the entry matching its name has the wrong type.
Each element entry has the following structure:
key | required | purpose | value |
---|---|---|---|
description | | the purpose of the element | string |
instances | | maps instance names to options dictionaries | dictionary |
image | | describes container name used for this element | dictionary |
The image dictionary contains the following:
key | required | purpose | value |
---|---|---|---|
repo | | a URL indicating the image repository | string |
name | | name of the image | string |
tag | | name/version of the image | string |
Each element may have its own specific set of key/value configurations. Please refer to the documentation of individual KXI components for information on element-specific configurations.
Any other keys in the element entry will be applied to each item in `instances` unless overridden.
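The general shape of an element entry might be sketched as follows. The repository URL, image name, tag, instance names, and the `logLevel` key are all hypothetical. Real element-specific keys are defined in each component's own documentation.

```yaml
elements:
  sm:                                 # element type
    description: Storage Manager for this assembly.
    image:
      repo: registry.example.com/kxi  # hypothetical repository URL
      name: kxi-sm                    # hypothetical image name
      tag: "1.0.0"                    # hypothetical version
    logLevel: info                    # hypothetical key, applied to every instance
    instances:
      sm-a: {}                        # inherits logLevel: info
      sm-b:
        logLevel: debug               # overrides the element-level value
```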
Example
An example can be found here
Overrides
Overrides are used to define multiple flavors of an assembly within a single file. This makes it easy to define multiple assemblies with significant overlap in one file, rather than copy-pasting common fields across many files. You can define multiple named overrides and they can be layered onto the base values in any order.
The following is an example of using overrides:
    name: "Override example"
    tables:
      balls:
        description: Master record of each golf ball manufactured.
        type: splayed
        primaryKeys: serial
        columns:
          - name: serial
            type: int
            attrMem: unique
          - name: nft
            description: They say these are all the rage lately!
            type: guid
          - name: factory
            type: short
          - name: batch
            type: int
          - name: machine
            type: int
          - name: bornTS
            type: timestamp
    elements:
      x: "abc"
      y: 123
    overrides:
      #
      # Add a new column to balls and a new elements value.
      #
      myOverride1:
        tables:
          balls:
            columns:
              - name: extraCol
                type: symbol
              - name: factory
                type: byte
        elements:
          z: newValue
      #
      # Add a new table, add/modify elements
      #
      myOverride2:
        tables:
          clubs:
            description: Master record of each golf club manufactured.
            type: splayed
            primaryKeys: serial
            columns:
              - name: serial
                attrMem: unique
              - name: type
                type: string
                description: "e.g. wood 1, putter, etc..."
              - name: size
                type: int
        elements:
          y: 456
          z: newerValue
The above assembly file defines two overrides: `myOverride1` and `myOverride2`. Specify the override(s) to apply by defining the `KXI_ASSEMBLY_OVERRIDES` environment variable. This is an ordered, `:`-separated list defining which overrides to apply.
KXI_ASSEMBLY_OVERRIDES="${override_1}:${override_2}:..." # Applies override_1, then override_2, etc...
Example:
- `KXI_ASSEMBLY_OVERRIDES=""`: No overrides. Uses only the base-level assembly values.
- `KXI_ASSEMBLY_OVERRIDES="myOverride1"`: Applies `myOverride1` only. The `balls` table has an extra column (`extraCol`), the `factory` column is of type `byte`, and `elements.z="newValue"`.
- `KXI_ASSEMBLY_OVERRIDES="myOverride1:myOverride2"`: Applies `myOverride1`, then `myOverride2`. The `balls` table has an extra column (`extraCol`), the `factory` column is of type `byte`, the `clubs` table is added, `elements.y=456`, and `elements.z="newerValue"` (since `myOverride2` is applied second).
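To illustrate the layering order, the effective `elements` values for `KXI_ASSEMBLY_OVERRIDES="myOverride1:myOverride2"` would be as sketched below. This is a hand-written view of the merged result described in the list above, not output produced by any KXI tool.

```yaml
# Effective elements after applying myOverride1, then myOverride2.
elements:
  x: "abc"        # base value, untouched by either override
  y: 456          # base value 123 replaced by myOverride2
  z: newerValue   # set to newValue by myOverride1, then replaced by myOverride2
```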
[1] The `object` mount type is currently not supported by SM. Its use is mainly for scenarios where SM is not part of the installation.