Data Access configuration
In its most basic form, Data Access is a set of Docker images that are combined using minimal configuration. Below is an explanation of the images required, the configuration parameters that need to be defined, and some example configurations.
Two images are provided for deploying Data Access Processes:
To deploy a Data Access Process providing query access to a single mount (e.g., an
HDB), use the
Additionally, set the preferred mount using
mountName: <mount> in the assembly for that dap instance.
    elements:
      dap:
        instances:
          rdb:
            mountName: stream
To deploy a Data Access Process providing query access to multiple mounts within a single image (to share compute resources across tiers), use the
Additionally, set the preferred list of mounts using
mountList: [<mounts>] in the assembly, rather than
    elements:
      dap:
        instances:
          db:
            mountList: [stream, intraday, historical]
An example of a multi-tier deployment can be found in the Docker example.
The DA microservice relies on certain environment variables to be defined in the containers. The variables are described below.
| Variable | Required | Containers | Description |
|---|---|---|---|
| KXI_PORT | No | DA | Port. Can also be started with |
| KXI_SC | Yes | DA | Service Class type for data access (e.g. RDB, IDB, HDB) |
| KXI_LOG_FORMAT | No | DA, sidecar | Message format (see qlog documentation). |
| KXI_LOG_DEST | No | DA, sidecar | Endpoints (see qlog documentation). |
| KXI_LOG_LEVELS | No | DA, sidecar | Component routing (see qlog documentation). |
| KXI_CONFIG_FILE | Yes | sidecar | Discovery configuration file (see KXI Service Discovery documentation). |
| KXI_CUSTOM_FILE | No | DA | File containing custom code to load in DA processes. |
| KXI_DAP_SANDBOX | No | DA | Whether this DAP is a sandbox. |
| SBX_MAX_ROWS | No | DA | Maximum number of rows, per partitioned table, to store in memory. |
| KXI_ALLOWED_SBX_APIS | No | DA | Comma-delimited list of sandbox APIs to allow in non-sandbox DAPs (e.g. ".kxi.sql,.kxi.qsql"). |
| KXI_DA_RELOAD_STAGGER | No | DA | Time in seconds between DAPs of the same class reloading after an EOX (default: |
| KXI_DA_USE_REAPER | No | DA | Whether to use KX Reaper and object storage cache - follow the [configuration](#object-store-config) (default: |
| KXI_SAPI_HB_FREQ | No | DA | Time in milliseconds to run the heartbeat to connected processes (default is |
| KXI_SAPI_HB_TOL | No | DA | Number of heartbeat intervals a process can miss before being disconnected (default is |
| KXI_GC_AFTER_API | No | DA | Whether to garbage collect after executing an API or not (default |
See the Docker deployment example for examples of setting environment variables.
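As a rough illustration of how these variables might be checked before a DA container is started, the following Python sketch validates an environment mapping. The helper name `check_da_env` is hypothetical and not part of Data Access; it encodes only two facts from the table: `KXI_SC` is required for DA, and `KXI_ALLOWED_SBX_APIS` is a comma-delimited list.

```python
def check_da_env(env):
    """Validate a DA container environment (hypothetical helper, not a KX API)."""
    # KXI_SC is the only DA variable the table marks as required.
    missing = [name for name in ("KXI_SC",) if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    # KXI_ALLOWED_SBX_APIS is a comma-delimited list of API names.
    raw = env.get("KXI_ALLOWED_SBX_APIS", "")
    allowed = [api.strip() for api in raw.split(",") if api.strip()]

    return {"service_class": env["KXI_SC"], "allowed_sbx_apis": allowed}


example = check_da_env({"KXI_SC": "RDB", "KXI_ALLOWED_SBX_APIS": ".kxi.sql,.kxi.qsql"})
print(example)
```

To check the environment of the current process, pass `dict(os.environ)` instead of a literal dictionary.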
Object store config
The Data Access HDB processes are able to cache and reap object storage results to avoid repeated downloads of the same data.
Be sure to configure RT log archiving to not overlap with the cache
If using RT and the RT log volume, be sure to size the RT log volume appropriately to make additional room for the object storage cache.
The following environment variables should be set:
| Variable | Required | Containers | Description |
|---|---|---|---|
| KX_OBJSTOR_CACHE_PATH | Yes (unless Platform) | DA | Path to where the object storage cache should be. This uses the RT Log Volume in Platform. |
| KX_OBJSTOR_CACHE_SIZE | Yes | DA | Size of the object storage cache in MB. Increase the RT Log Volume by this amount in Platform. |
For Platform, the RT Log Volume is used for the object storage cache. Since all RT Log Volumes must be sized identically for log archiving, increase the RT Log Volume by the object storage cache size. For example, for a 20Gi log volume and a desired 5Gi object storage cache, set the RT Log Volume size for the HDB to 25Gi and set
"true" for the HDB DAP element.
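The sizing arithmetic in the example above can be sketched in Python as follows. The function names are hypothetical, and the conversion assumes 1 Gi equals 1024 MB, which the text does not state; adjust it if your deployment counts megabytes differently.

```python
def gi_to_mb(gi):
    """Convert Gi to MB, assuming 1 Gi = 1024 MB (an assumption about units)."""
    return gi * 1024


def hdb_volume_plan(log_volume_gi, cache_gi):
    """Return the enlarged RT Log Volume size and the matching cache size.

    Hypothetical helper: the volume grows by the cache size, and
    KX_OBJSTOR_CACHE_SIZE is expressed in MB.
    """
    return {
        "rt_log_volume": f"{log_volume_gi + cache_gi}Gi",
        "KX_OBJSTOR_CACHE_SIZE": str(gi_to_mb(cache_gi)),
    }


plan = hdb_volume_plan(20, 5)
print(plan)  # {'rt_log_volume': '25Gi', 'KX_OBJSTOR_CACHE_SIZE': '5120'}
```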
Data Access process names help determine the order in which
RDB processes reload, to help avoid processes all reloading at once. This is handled by Kubernetes StatefulSet configuration, which will name Pods as
pod-name-<ordinal> and by Docker Compose, which will name containers as
container-name_<ordinal>. In cases where this naming convention isn't followed, either explicitly or via Kubernetes/Docker Compose, the reloads will be immediate with no staggering. Use the
KXI_DA_RELOAD_STAGGER environment variable to control the time period between reloads.
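To make the role of the ordinal concrete, here is a minimal Python sketch of one way a staggered delay could be derived from a process name. It is not the Data Access implementation: the function name is hypothetical, and the formula (ordinal multiplied by the stagger) is an assumption, since the text only says that names determine the reload order and that names without an ordinal reload immediately.

```python
import re

# Matches a trailing ordinal after "-" (StatefulSet) or "_" (Docker Compose).
_ORDINAL = re.compile(r"[-_](\d+)$")


def reload_delay(process_name, stagger_seconds):
    """Return an assumed reload delay in seconds for a named process."""
    match = _ORDINAL.search(process_name)
    if match is None:
        # Naming convention not followed: reload immediately, no staggering.
        return 0
    # Assumption: each ordinal waits one more stagger period than the last.
    return int(match.group(1)) * stagger_seconds


for name in ("rdb-0", "rdb-2", "rdb_1", "rdb"):
    print(name, reload_delay(name, 30))
```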
The assembly configuration is a YAML file that defines the DA configuration, i.e. what data it is expected to offer and how it responds to queries. Assemblies are used in all KX Insights microservices.
See the assembly configuration documentation for more information and examples.
The Docker deployment example provides an example of an assembly file in the context of an end-to-end example.
Data Access Elements
DAP instances are configured within the
dap element of the assembly
elements field. It provides the following options:
rcEndpoints: List of hostname:port strings of known resource coordinators to connect to if the discovery service is unavailable.
rcName: The name of the resource coordinator for the DAP to connect to, as defined by its
hostname:port strings of storage manager service for data access process to connect to.
tableLoad: How to populate in-memory database tables. Support for
links. Default behaviour is
mountName: Name of mount from the mounts section of the assembly for DA to mount and provide access to.
mountList: (if not using mountName) List of mount names from the mounts section of the assembly for DA to mount and provide access to.
mapPartitions: Whether a local mount should map partitions after a remount. See kdb+ documentation here.
purview: Inclusive start, exclusive end purview for startup of DA process.
enforceSchema: Whether stream DAP should validate all incoming table data against what's defined in the schema. There is a performance cost having this enabled.
pctMemThreshold: Percentage of available memory to allocate to ingestion of a single interval. Decimal value between 0 and 1.
allowPartialResults: Whether an HDB DAP should return a successful response if it's entered low memory mode and stopped ingesting late data (exceeding the
pctMemThreshold). Default is
Within the assembly it is structured under the
dap element, instances. Config that applies to all DAPs is placed one level above the instances themselves, and can be overridden at the instance level.
    elements:
      dap:
        # These configs apply to all DA below
        rcName: sg_rc # Used with discovery to determine resource coordinator to connect to
        instances:
          RDB: # Config specific to DAPs with a KXI_SC of RDB
            mountName: rdb # Must match name of mount in "mounts" section
          IDB: # Config specific to DAPs with a KXI_SC of IDB
            mountName: idb
          HDB: # Config specific to DAPs with a KXI_SC of HDB
            mountName: hdb
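The override behaviour described above (shared settings one level above the instances, with instance-level values taking precedence) can be modelled with a short Python sketch. This is an illustration only, not how Data Access resolves its configuration: the function name and the `other_rc` value are hypothetical, and the check that an instance sets `mountName` or `mountList` but not both is one reading of the option descriptions.

```python
def resolve_dap_config(dap):
    """Merge dap-level settings into each instance; instance values win."""
    shared = {key: value for key, value in dap.items() if key != "instances"}
    resolved = {}
    for name, instance in dap.get("instances", {}).items():
        instance = instance or {}
        # mountList is described as the alternative to mountName.
        if "mountName" in instance and "mountList" in instance:
            raise ValueError(f"{name}: set mountName or mountList, not both")
        resolved[name] = {**shared, **instance}
    return resolved


dap = {
    "rcName": "sg_rc",
    "instances": {
        "RDB": {"mountName": "rdb"},
        "IDB": {"mountName": "idb"},
        # "other_rc" is a made-up name used to show an instance-level override.
        "HDB": {"mountName": "hdb", "rcName": "other_rc"},
    },
}

for name, config in resolve_dap_config(dap).items():
    print(name, config)
```

The dictionary mirrors the `dap` element of the assembly; loading it from the YAML file itself would need a YAML parser, which is left out to keep the sketch dependency-free.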
See the deployment example for an example of DAP configuration.
By default, the database microservices (SG, DA, SM) use environment variables to connect to one another. An example of using environment variables is outlined in the deployment example. In this mode, the dynamic processes connect to the static processes (DAs connect to SG and SM), so processes can still come and go despite being explicitly configured.
Alternatively, all database microservices can use KX Insights Service Discovery so that processes discover and connect with each other dynamically (see the KXI Service Discovery documentation). When using service discovery, all images must be configured to use discovery; the two modes cannot be mixed. The images required for this are as follows.
| Process | Description | Image |
|---|---|---|
| discovery | Discovery client. Configure one, which all processes seamlessly connect to. | kxi-eureka-discovery |
The DA processes load the q file pointed to by the
KXI_CUSTOM_FILE environment variable. In this file, you can load any custom APIs/functions that you want accessible by the DA processes. Note that while DA only supports loading a single file, you can load other files from within this file using
\l (allowing you to control load order). The current working directory (
pwd) at load time is the base directory of the file.
This can be combined with the Service Gateway microservice (which allows custom aggregation functions) to create full custom API support within KX Insights (see "Service Gateway" for details).
Note: It's recommended to avoid
.da* namespaces to avoid colliding with DA functions.
To make an API executable within DA, use the
.api.registerAPI API, whose signature is as follows.
api - symbol - Aggregation function name.
metadata - list|string|dictionary - Aggregation function metadata (see "SAPI - Metadata" documentation).
API functions MUST be registered with
.api.registerAPI in order to be invokable by the DA processes. See Custom file example for an example.
If using the Service Gateway microservice, you can see which APIs are available (and in which DAP) by using the
.kxi.getMeta API (see "SG - APIs").
When creating custom analytics that access data, there is a helper function
.kxi.selectTable which understands the data model within each DAP and can help select from the tables necessary to return the appropriate records. Its interface is as follows:
| Name | Type | Description |
|---|---|---|
| tn | symbol | Name of table to retrieve data from |
| ts | timestamp | Time period of interest |
| wc | list | Where clause of what to select |
| bc | dict/boolean | By clause for select |
| cn | symbol | Names of columns to select for. Include any columns needed in aggregations |
| agg | dict | Select clause/aggregations to apply to table |
EOX Event Hooks
When loading a custom file into a Data Access Process, there are two functions which are intended to be overwritten to augment the DAP's EOX event handling. These functions are
.da.prntEndCB is invoked by receipt of the
_prtnEnd table published by Storage Manager to mark the end of an interval. This callback function is invoked after the DAP has adjusted any receive filters and redirected updates to any delta tables.
| Name | Type | Description |
|---|---|---|
| startTS | timestamp | Start timestamp of interval |
| endTS | timestamp | End timestamp of interval |
| opts | dictionary | List of additional options (detailed below) |
Where the options can have these keys:
| Name | Type | Description |
|---|---|---|
| date | date | Date of interval |
| partNo | long | EOI partition number |
| soiTS | timestamp | Start of interval timestamp |
.da.reloadCB is invoked when Storage Manager notifies the DAPs that the EOX has been finished and committed. The callback function is invoked after any database has been reloaded and tables have been purged, but before the DAP has marked itself as available to the Resource Coordinator. The function takes a dictionary of arguments with the following keys:
| Name | Type | Description |
|---|---|---|
| ts | timestamp | Timestamp of reload event |
| minTS | timestamp | Lower inclusive start of this DAP's purview |
| maxTS | timestamp | Upper inclusive start of this DAP's purview |
| startTS | timestamp | Start time of interval |
| endTS | timestamp | End time of interval |
Custom file example
The DAP process can load a custom code file, in which you can define custom functions and APIs. Below is an example file, along with some example API calls that exercise the custom code.
A full example of an integrated deployment using Docker Compose is available here.