Configuration

A set of services that together make up a data ingestion, storage, and access pipeline is called an Assembly. To make them easier to use together within an assembly, KXI components share a common configuration file format, called an Assembly Configuration file (AC). An AC is a YAML document that each service reads from the path given by its KXI_ASSEMBLY_FILE environment variable.

A Data Access image expects the following top-level structure in an AC:

  • name: String giving a short name for this assembly.
  • description: String describing the purpose of this assembly. Optional.
  • tables: Dictionary of schemas for the tables operated upon within the assembly.
  • mounts: Dictionary of mount points for stored data.
  • bus: Dictionary containing the configuration of the message bus used for coordination between elements.
  • elements: Dictionary of services that should run within this assembly, and any configuration they each require.
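
As a rough illustration of the top-level structure listed above, a skeleton AC might look like the following. The assembly name and description are hypothetical, and the empty dictionaries are placeholders only; a working assembly would populate each section as described in the rest of this page.

```yaml
# Skeleton Assembly Configuration (illustrative values only)
name: example-assembly                              # hypothetical short name
description: Illustrative assembly for a trade feed # optional
tables: {}      # table schemas go here
mounts: {}      # mount points for stored data go here
bus: {}         # message bus entries go here
elements: {}    # per-service configuration goes here (e.g. da)
```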

Tables

A Table schema has the following structure:

  • description: String describing the purpose of this table. Optional.
  • type: String; one of {splayed, partitioned}.
  • primaryKeys: List of names of primary key columns. Optional.
  • partCol: Name of a column to be used for storage partitioning. Optional.
  • shards: Integer; shard count. Optional.
  • partitions: Integer; partition count. Optional.
  • blockSize: Integer; block size. Optional.
  • updTsCol: Name of the arrival timestamp column. Optional.
  • columns: List of column schemas.

A column schema has the following structure:

  • name: Name of the column.
  • description: String describing the purpose of this column. Optional.
  • type: Q type name.
  • foreign: String of the form table.column, marking this column as a foreign key into another table in this assembly. Optional.
  • attrMem: String; column attribute when stored in memory. Optional.
  • attrDisk: String; column attribute when stored on disk. Optional.
  • attrOrd: String; column attribute when stored on disk with an ordinal partition scheme. Optional.
  • attrObj: String; column attribute when stored in Object store (e.g. S3). Optional.
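
The table and column fields above could be combined as in the sketch below. It assumes that the tables dictionary is keyed by table name. The table names, column names, and descriptions are hypothetical, and the attribute value shown for attrMem is an assumed spelling, since this page does not list the accepted attribute strings.

```yaml
tables:
  instrument:                      # hypothetical reference table
    description: Static instrument data
    type: splayed
    primaryKeys: [sym]
    columns:
      - name: sym
        type: symbol
      - name: exchange
        type: symbol

  trade:                           # hypothetical event table
    description: Trade events
    type: partitioned
    partCol: time                  # column used for storage partitioning
    columns:
      - name: time
        description: Event time
        type: timestamp
      - name: sym
        type: symbol
        foreign: instrument.sym    # table.column form
        attrMem: grouped           # assumed attribute spelling
      - name: price
        type: float
      - name: size
        type: long
```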

Mounts

Data Access can mount data from any of the supported tiers, each with its own locality and format. Loosely speaking, the type of a Data Access process is defined by the type of its mount: stream is similar to a traditional kdb+ RDB, and local is equivalent to an HDB. The object tier is specific to cloud-based storage.

The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries with the following fields:

  • type: String; one of {stream, local, object}.
  • uri: String URI representing where that data can be mounted by other services. Presently, the file:// URI scheme is supported.
  • partition: Partitioning scheme for this mount. One of:
      • none: do not partition; store in the order it arrives.
      • ordinal: partition by a numeric virtual column which increments according to a corresponding storage tier's schedule and resets when the subsequent tier (if any) rolls over.
      • date: partition by each table's partCol column, interpreted as a date.

Notes:

  • A mount of type stream must be partition:none.
  • A mount of type local or object must be partition:ordinal or partition:date.
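
A minimal sketch of a mounts section that respects the two notes above is shown here. The mount names (rdb, idb, hdb) and the file:// paths are hypothetical and chosen only for illustration.

```yaml
mounts:
  rdb:                               # hypothetical name; stream tier
    type: stream
    uri: file:///mnt/example/stream/
    partition: none                  # stream mounts must use none
  idb:                               # hypothetical name; local tier
    type: local
    uri: file:///mnt/example/idb/
    partition: ordinal               # local mounts use ordinal or date
  hdb:                               # hypothetical name; local tier
    type: local
    uri: file:///mnt/example/hdb/
    partition: date                  # partitioned by each table's partCol
```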

Bus

Data Access ingests data from an event stream; a Bus contains the information necessary to subscribe to that stream.

The bus section consists of a dictionary of bus entries. Each bus entry provides several fields:

  • protocol: Short string indicating the protocol of the messaging system. Currently, the only valid choice for this protocol is custom, which indicates that custom Q code should be loaded from the path given by an environment variable KXI_RT_LIB. In the future, the protocol will also support out-of-the-box EMS protocols, like kraftmq.
  • topic: String indicating the subset of messages in this stream that consumers are interested in. Optional.
  • nodes: List of one or more connection strings to machines/services which can be used for subscribing to this bus. In the case of the custom protocol, this list should contain a single hostname:port string.
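
For example, a bus section with a single entry using the custom protocol might be sketched as follows. The entry name, topic, and hostname:port value are hypothetical. With protocol: custom, the Q code to load is taken from the path in the KXI_RT_LIB environment variable rather than from this file.

```yaml
bus:
  stream:                    # hypothetical bus entry name
    protocol: custom         # currently the only valid protocol
    topic: dataStream        # optional; hypothetical topic name
    nodes:
      - localhost:5000       # custom protocol: a single hostname:port
```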

Elements

Assemblies coordinate a number of processes and/or microservices, which we call elements of the assembly. The elements section provides configuration details which are only relevant to individual services. This guide will focus on the configuration options for Data Access, which go in the da entry of elements.

The da element configuration has the following structure:

  • opEnabled: Whether to enable the data access operator.
  • opHost: Host of operator process.
  • opPort: Port of operator process.
  • gwArch: Architecture of gateway process. Supported values are traditional and asymmetric.
  • gwEndpoints: List of hostname:port strings of known gateways, if the discovery service is unavailable.
  • gwAssembly: The name of the assembly containing this DA's gateway (if using a shared gateway).
  • tableLoad: How to populate in-memory database tables.
  • mountName: Name of mount.
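
The fields above might be laid out as in the following sketch of the da entry. It assumes the fields sit directly under da, as listed. All hosts, ports, and names are hypothetical, and tableLoad is left out because this page does not enumerate its accepted values.

```yaml
elements:
  da:
    opEnabled: false               # data access operator disabled here
    opHost: localhost              # hypothetical operator host
    opPort: 6000                   # hypothetical operator port
    gwArch: traditional
    gwEndpoints:                   # used if discovery is unavailable
      - localhost:5050             # hypothetical gateway hostname:port
    gwAssembly: gateway-assembly   # hypothetical; only for a shared gateway
    mountName: rdb                 # hypothetical; should name an entry in mounts
```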