Data Access configuration

A set of services that together make up a data ingestion, storage, and access pipeline is called an Assembly.

To aid in using them together as part of an assembly, KXI components share a common configuration file format, called an Assembly Configuration file (AC). An AC is a YAML document read from a path specified for each service by the KXI_ASSEMBLY_FILE environment variable.

A Data Access image expects the following top-level structure in an AC:

name            short name for the assembly (string)
description     purpose of the assembly (optional) (string)
tables          schemas for tables operated upon within the assembly (dictionary)
mounts          mount points for stored data (dictionary)
bus             configuration of the message bus used for 
                coordination between elements (dictionary)
elements        services that should run within the assembly, 
                and any configurations they require (dictionary)
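
As a rough illustration of the top-level structure above, an AC skeleton might look like the following. The assembly name and description are hypothetical placeholders, and the empty dictionaries stand in for the sections described in the rest of this page.

```yaml
# Sketch of an Assembly Configuration (AC) skeleton.
# All values are illustrative placeholders, not defaults.
name: example-assembly                 # hypothetical short name
description: Example ingestion and query pipeline   # optional
tables: {}      # table name -> table schema
mounts: {}      # mount name -> mount definition
bus: {}         # bus name -> bus entry
elements: {}    # element name (e.g. da) -> element configuration
```

Each service would then be pointed at this file through its KXI_ASSEMBLY_FILE environment variable.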

Tables

A table schema has the following structure.

description      purpose of this table (optional) (string)
type             ["splayed"|"partitioned"] (string)
primaryKeys      names of primary key columns (optional) (list)
partCol          name of column for storage partitioning (optional) (string)
shards           shard count (optional) (integer)
partitions       partition count (optional) (integer)
blockSize        block size (optional) (integer)
updTsCol         name of the arrival timestamp column (optional) (string)
columns          column schemas (list)

A column schema has the following structure.

name             name of the column
description      purpose of this column (optional) (string)
type             q type name
foreign          foreign key into another table in the assembly 
                 in the form table.column (optional)
attrMem          column attribute when stored in memory (optional) (string)
attrDisk         column attribute when stored on disk (optional) (string)
attrOrd          column attribute when stored on disk with an 
                 ordinal partition scheme (optional) (string)
attrObj          column attribute when stored in object store (e.g. S3) 
                 (optional) (string)
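
A tables section following the two structures above could be sketched as below. The table name, column names, and the attribute values (grouped, parted) are assumptions made for illustration; this page does not list the accepted attribute spellings, so check them against your release before use.

```yaml
# Sketch only: table and column names are hypothetical.
tables:
  trade:
    description: Trade data            # optional
    type: partitioned
    partCol: time                      # column used for storage partitioning
    columns:
      - name: time
        description: Time of the trade # optional
        type: timestamp                # q type name
      - name: sym
        type: symbol
        attrMem: grouped               # assumed attribute value
        attrDisk: parted               # assumed attribute value
      - name: price
        type: float
```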

Mounts

Data Access can mount data from any of the supported tiers, each with its own locality and format. Loosely speaking, the type of a Data Access process is defined by the type of its mount: stream is similar to a traditional kdb+ RDB, and local is equivalent to an HDB. The object tier is unique to cloud-based storage.

The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries, with the following fields.

type             ["stream"|"local"|"object"]
uri              URI representing where that data can be mounted by other services; 
                 presently supports the file:// protocol (string)
partition        partitioning scheme for the mount. One of:
  none           do not partition; store in arrival order
  ordinal        partition by a numeric virtual column which increments according 
                 to a corresponding storage tier's schedule 
                 and resets when the subsequent tier (if any) rolls over
  date           partition by each table's partCol column, interpreted as a date

A mount of type stream must be partition:none.

A mount of type local or object must be partition:ordinal or partition:date.
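
The pairing rules above can be seen in a mounts section such as the following sketch. The mount names (rdb, idb, hdb) and the file:// paths are hypothetical; only the type and partition combinations are taken from the rules stated here.

```yaml
# Sketch only: mount names and URIs are hypothetical.
mounts:
  rdb:
    type: stream
    uri: file://stream
    partition: none        # a stream mount must use none
  idb:
    type: local
    uri: file://data/db/idb
    partition: ordinal     # local/object mounts use ordinal or date
  hdb:
    type: local
    uri: file://data/db/hdb
    partition: date        # partitioned by each table's partCol
```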

Bus

Data Access ingests data from an event stream; a Bus contains the information necessary to subscribe to that stream.

The bus section consists of a dictionary of bus entries. Each bus entry provides several fields:

protocol         protocol of the messaging system (short string). Currently, the 
                 only valid choice is custom, which indicates that custom q code 
                 should be loaded from the path given by the KXI_RT_LIB 
                 environment variable. In the future, the protocol will also 
                 support out-of-the-box EMS protocols, like kraftmq.
topic            subset of messages in this stream that consumers are 
                 interested in (optional) (string)
nodes            one or more connection strings to machines/services which can 
                 be used for subscribing to this bus (list). In the case of the 
                 custom protocol, this list should contain a single 
                 hostname:port string.
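
A bus section with a single entry using the custom protocol might look like this sketch. The entry name, topic, and host and port are hypothetical values chosen for illustration.

```yaml
# Sketch only: entry name, topic, and address are hypothetical.
bus:
  stream:
    protocol: custom       # q code is loaded from the path in KXI_RT_LIB
    topic: dataStream      # optional
    nodes:
      - localhost:5000     # custom protocol: a single hostname:port string
```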

Elements

Assemblies coordinate a number of processes and/or microservices, which we call elements of the assembly. The elements section provides configuration details which are only relevant to individual services. This guide will focus on the configuration options for Data Access, which go in the da entry of elements.

The da element configuration has the following structure.

opEnabled        whether to enable the data access operator
opHost           host of operator process
opPort           port of operator process
gwArch           architecture of gateway process: ["traditional"|"asymmetric"]
gwEndpoints      list of hostname:port strings of known gateways, 
                 if the discovery service is unavailable
gwAssembly       name of the assembly containing this DA's gateway 
                 if using a shared gateway
tableLoad        how to populate in-memory database tables
mountName        name of mount
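
As an illustration, a da entry might be filled in as follows. Every value here is a hypothetical placeholder: the boolean form of opEnabled, the hosts, and the ports are assumptions, and mountName is assumed to refer to a name defined in the mounts section. tableLoad and gwAssembly are left out because their accepted values are not listed on this page.

```yaml
# Sketch only: all values are hypothetical placeholders.
elements:
  da:
    opEnabled: true        # assumed to be a boolean
    opHost: localhost
    opPort: 5100
    gwArch: asymmetric     # or traditional
    gwEndpoints:           # used if the discovery service is unavailable
      - localhost:5050
    mountName: rdb         # assumed to match a key under mounts
```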