Data Access configuration
A set of services that together make up a data ingestion, storage, and access pipeline is called an Assembly.
To aid in using them together as part of an assembly, KXI components share a common configuration file format, called an Assembly Configuration file (AC). An AC is a YAML document read from a path specified for each service by the KXI_ASSEMBLY_FILE environment variable.
A Data Access image expects the following top-level structure in an AC:
name         short name for the assembly (string)
description  purpose of the assembly (optional) (string)
tables       schemas for tables operated upon within the assembly (dictionary)
mounts       mount points for stored data (dictionary)
bus          configuration of the message bus used for coordination between elements (dictionary)
elements     services that should run within the assembly, and any configurations they require (dictionary)
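As a rough illustration of the top-level structure listed above, a skeleton AC might look like the following. The assembly name and description are hypothetical, and the empty dictionaries are placeholders for the sections described in the rest of this page.

```yaml
# Skeleton Assembly Configuration file (illustrative values only)
name: example-assembly                  # hypothetical short name
description: Example assembly for trade data
tables: {}                              # table schemas
mounts: {}                              # mount points for stored data
bus: {}                                 # message bus configuration
elements: {}                            # per-service configuration
```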
A table schema has the following structure.
description  purpose of this table (optional) (string)
type         ["splayed"|"partitioned"] (string)
primaryKeys  names of primary key columns (optional) (list)
partCol      name of column for storage partitioning (optional) (string)
shards       shard count (optional) (integer)
partitions   partition count (optional) (integer)
blockSize    block size (optional) (integer)
updTsCol     name of the arrival timestamp column (optional) (string)
columns      column schemas (list)
A column schema has the following structure.
name         name of the column
description  purpose of this column (optional) (string)
type         q type name
foreign      foreign key into another table in the assembly in the form table.column (optional)
attrMem      column attribute when stored in memory (optional) (string)
attrDisk     column attribute when stored on disk (optional) (string)
attrOrd      column attribute when stored on disk with an ordinal partition scheme (optional) (string)
attrObj      column attribute when stored in object store (e.g. S3) (optional) (string)
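To show how the table and column fields above fit together, here is a minimal sketch of a tables section. The table name, column names, and blockSize value are hypothetical. The attribute values (grouped, parted) are an assumption based on kdb+ attribute names and should be checked against your version.

```yaml
tables:
  trade:                          # hypothetical table name
    description: Trade data
    type: partitioned
    partCol: time                 # column used for storage partitioning
    blockSize: 10000              # illustrative value
    columns:
      - name: time
        description: Time of the trade
        type: timestamp           # q type name
      - name: sym
        type: symbol
        attrMem: grouped          # assumed attribute names (kdb+ conventions)
        attrDisk: parted
        attrOrd: parted
      - name: price
        type: float
      - name: size
        type: long
```

Optional fields such as primaryKeys, shards, partitions, updTsCol, foreign, and attrObj are left out here to keep the sketch short.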
Data Access can mount data from any of the supported tiers, each with its own locality and format. Loosely speaking, the type of a Data Access process is defined by the type of its mount: stream is similar to a traditional kdb+ RDB, and local is equivalent to an HDB. The object tier is unique to cloud-based storage.
The Mounts section is a dictionary mapping user-defined names of storage locations to dictionaries, with the following fields.
type       ["stream"|"local"|"object"]
uri        URI representing where that data can be mounted by other services; presently supports the file:// protocol (string)
partition  partitioning scheme for the mount. One of:
             none     do not partition; store in arrival order
             ordinal  partition by a numeric virtual column which increments according to a corresponding storage tier's schedule and resets when the subsequent tier (if any) rolls over
             date     partition by each table's partCol column, interpreted as a date
A mount of type stream must be
A mount of type object must be
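The following sketch shows one possible mounts section built from the fields described above. The mount names (rdb, idb, hdb) and the file:// paths are hypothetical, and the pairing of each mount type with a partition scheme is illustrative: check it against the constraints on stream and object mounts for your version. An object mount is not shown.

```yaml
mounts:
  rdb:                            # hypothetical name; in-memory, RDB-like tier
    type: stream
    uri: file://stream            # hypothetical path
    partition: none               # store in arrival order
  idb:                            # hypothetical name; intraday on-disk tier
    type: local
    uri: file://data/db/idb       # hypothetical path
    partition: ordinal
  hdb:                            # hypothetical name; HDB-like tier
    type: local
    uri: file://data/db/hdb       # hypothetical path
    partition: date               # partitioned by each table's partCol
```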
Data Access ingests data from an event stream; a Bus contains the information necessary to subscribe to that stream.
The bus section consists of a dictionary of bus entries. Each bus entry provides several fields:
protocol (short string)
    Protocol of the messaging system. Currently, the only valid choice for this protocol is
    In the future, the protocol will also support out-of-the-box EMS protocols, like
topic (string)
    Subset of messages in this stream that consumers are interested in. (Optional.)
nodes (list)
    One or more connection strings to machines/services which can be used for subscribing to this bus.
    In the case of the
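As a sketch of the shape of a bus section, the fragment below defines a single bus entry. Everything in it is hypothetical: the entry name, the topic, and the host:port form of the connection string are assumptions, and PROTOCOL_NAME is a placeholder to be replaced with the protocol value valid for your deployment.

```yaml
bus:
  stream:                         # hypothetical bus entry name
    protocol: PROTOCOL_NAME       # placeholder, not a real protocol value
    topic: dataStream             # hypothetical topic (optional field)
    nodes:
      - tp:5000                   # hypothetical connection string
```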
Assemblies coordinate a number of processes and/or microservices, which we call elements of the assembly. The elements section provides configuration details which are only relevant to individual services. This guide focuses on the configuration options for Data Access, which go in the da entry of elements.
The da element configuration has the following structure.
opEnabled    whether to enable the data access operator
opHost       host of operator process
opPort       port of operator process
gwArch       architecture of gateway process: ["traditional"|"asymmetric"]
gwEndpoints  list of hostname:port strings of known gateways, if the discovery service is unavailable
gwAssembly   name of the assembly containing this DA's gateway if using a shared gateway
tableLoad    how to populate in-memory database tables
mountName    name of mount
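A minimal sketch of a da entry using the fields above might look like this. The gateway endpoint and the mount name are hypothetical, and mountName is assumed to refer to one of the names defined in the mounts section. tableLoad is omitted because its accepted values are not listed here.

```yaml
elements:
  da:
    opEnabled: false              # run without the data access operator
    gwArch: asymmetric            # or "traditional"
    gwEndpoints:
      - gw-host:5050              # hypothetical hostname:port of a known gateway
    mountName: rdb                # hypothetical; assumed to match a mounts entry
```

With opEnabled set to false, opHost and opPort are left out of this sketch. Supply them when the operator is enabled.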