Data Access configuration

In its most basic form, kdb Insights Data Access is a set of Docker images that are combined using minimal configuration. Below is an explanation of the images required, the configuration parameters that need to be defined, and some example configurations.

Images

Two images are provided for deploying Data Access Processes:

Single mount

To deploy a Data Access Process providing query access to a single mount (e.g., an RDB, IDB, or HDB), use the registry.dl.kx.com/kxi-da image.

Additionally, set the preferred mount using mountName: <mount> in the assembly for that dap instance.

For example,

elements:
  dap:
    instances:
      rdb:
        mountName: stream

Multiple mounts

To deploy a Data Access Process providing query access to multiple mounts within a single image (to share compute resources across tiers), use the registry.dl.kx.com/kxi-da-single image.

Additionally, set the list of preferred mounts using mountList: [<mounts>] in the assembly, rather than mountName.

elements:
  dap:
    instances:
      db:
        mountList: [stream, intraday, historical]

An example of a multi-tier deployment can be found in the Docker example.

Environment variables

The DA microservice relies on certain environment variables to be defined in the containers. The variables are described below:

| variable | required | containers | description |
|---|---|---|---|
| KXI_NAME | Yes | DA | Process name. |
| KXI_PORT | No | DA | Port. Can also be started with "-p $KXI_PORT". |
| KXI_SC | Yes | DA | Service Class type for data access (e.g. RDB, IDB, HDB). |
| KXI_LOG_FORMAT | No | DA, sidecar | Message format (see qlog documentation). |
| KXI_LOG_DEST | No | DA, sidecar | Endpoints (see qlog documentation). |
| KXI_LOG_LEVELS | No | DA, sidecar | Component routing (see qlog documentation). |
| KXI_ASSEMBLY_FILE | Yes | DA | Assembly YAML file. |
| KXI_CONFIG_FILE | Yes | sidecar | Discovery configuration file (see KXI Service Discovery documentation). |
| KXI_CUSTOM_FILE | No | DA | File containing custom code to load in DA processes. |
| KXI_DAP_SANDBOX | No | DA | Whether this DAP is a sandbox. |
| SBX_MAX_ROWS | No | DA | Maximum number of rows, per partitioned table, to store in memory. |
| KXI_ALLOWED_SBX_APIS | No | DA | Comma-delimited list of sandbox APIs to allow in non-sandbox DAPs (e.g. ".kxi.sql,.kxi.qsql"). |
| KXI_DA_RELOAD_STAGGER | No | DA | Time in seconds between DAPs of the same class reloading after an EOX (default: 30). |
| KXI_DA_USE_REAPER | No | DA | Whether to use KX Reaper and object storage cache (default: false). |
| KXI_SAPI_HB_FREQ | No | DA | Time in milliseconds between heartbeats to connected processes (default: 30000). |
| KXI_SAPI_HB_TOL | No | DA | Number of heartbeat intervals a process can miss before being disconnected (default: 2). |
| KXI_GC_FREQ | No | DA | Frequency in milliseconds to run garbage collection on a timer (default: 600000, set to 0 to disable). |
| KXI_ENABLE_FLUSH | No | DA | Set to "true" to enable async flush on messages from DA to Agg (default: "false"). |
| KX_OBJSTR_INVENTORY_FILE | No | DA | Set to a path relative to the root of the bucket to use an inventory file. |

See the Docker deployment example for examples of setting environment variables.
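As a rough illustration of how the variables above might be set, the following Docker Compose fragment defines a single RDB DAP. It is a sketch under assumptions: the service name, the DA_VERSION substitution variable, the port number, the file names, and the host directories are all hypothetical, and only the variable names and the /opt/kx/custom directory come from this page.

```yaml
# Sketch only: service name, image tag variable, port and paths are assumptions.
services:
  rdb:
    image: registry.dl.kx.com/kxi-da:${DA_VERSION}   # DA_VERSION is a hypothetical variable
    environment:
      - KXI_NAME=rdb                                  # process name (required)
      - KXI_SC=RDB                                    # service class (required)
      - KXI_PORT=5080                                 # optional; "-p $KXI_PORT" also works
      - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly.yaml   # assembly YAML file (required)
      - KXI_CUSTOM_FILE=/opt/kx/custom/custom.q       # optional custom code
      - KXI_DA_RELOAD_STAGGER=30                      # seconds between reloads (the default)
    volumes:
      - ./cfg:/opt/kx/cfg                             # hypothetical host directories
      - ./custom:/opt/kx/custom
```

The two required DA variables without defaults (KXI_NAME and KXI_SC) plus KXI_ASSEMBLY_FILE are the minimum here; the remaining entries only restate defaults or optional features from the table.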

Object store config

The Data Access HDB processes are able to cache and reap object storage results to avoid repeated downloads of the same data.

Be sure to configure RT log archiving to not overlap with the cache

If using RT and the RT log volume, be sure to size the RT log volume appropriately to make additional room for the object storage cache.

The following environment variables should be set:

| variable | required | containers | description |
|---|---|---|---|
| KX_OBJSTOR_CACHE_PATH | Yes (unless Enterprise) | DA | Path to where the object storage cache should be. This uses the RT Log Volume in kdb Insights Enterprise. |
| KX_OBJSTOR_CACHE_SIZE | Yes | DA | Size of the object storage cache in MB. Increase the RT Log Volume by this amount in kdb Insights Enterprise. |

For kdb Insights Enterprise, the RT Log Volume is used for the object storage cache. Since all RT Log Volumes must be sized identically for log archiving, increase the RT Log Volume by the object storage cache size. For example, for a 20Gi log volume and a desired 5Gi object storage cache, set the RT Log Volume size for the HDB to 25Gi and set KXI_DA_USE_REAPER to "true" for the HDB DAP element.
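For a non-Enterprise deployment, the cache settings above could be passed to the HDB container roughly as follows. This is a sketch, not a verified configuration: the service name, the /cache path, and the volume name are hypothetical, and the size assumes that "MB" is interpreted as MiB, so that a 5Gi cache corresponds to 5120.

```yaml
# Sketch only: service name, cache path and volume name are assumptions.
services:
  hdb:
    environment:
      - KXI_SC=HDB
      - KXI_DA_USE_REAPER=true          # enable KX Reaper and the object storage cache
      - KX_OBJSTOR_CACHE_PATH=/cache    # hypothetical mount point for the cache
      - KX_OBJSTOR_CACHE_SIZE=5120      # cache size in MB (about 5Gi, assuming MiB)
    volumes:
      - objstor-cache:/cache            # hypothetical named volume

volumes:
  objstor-cache:
```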

Inventory files

DA processes can be set up to load object storage data from an inventory file. Set KX_OBJSTR_INVENTORY_FILE to a path relative to the root of your storage.

For example, with the following s3 layout, the setting should be: KX_OBJSTR_INVENTORY_FILE=inventory/inventory.tgz:

s3://examplebucket/
    db/
      2022.01.01/
      2022.01.02/
    inventory/
      inventory.tgz

The SM may be configured to write inventory files at EOD, as well as produce them at startup if none exist. Please refer to the SM configuration.
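With the bucket layout shown above, the setting could be supplied to the HDB container as in the fragment below. The service name is hypothetical; the variable name and value are taken from this section.

```yaml
# Sketch only: the service name is an assumption.
services:
  hdb:
    environment:
      # Path is relative to the bucket root (s3://examplebucket/ in the layout above)
      - KX_OBJSTR_INVENTORY_FILE=inventory/inventory.tgz
```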

Names

Data Access process names help determine the order in which RDB processes reload, so that they do not all reload at once. This is handled by Kubernetes StatefulSet configuration, which names Pods pod-name-<ordinal>, and by Docker Compose, which names containers container-name_<ordinal>. If this naming convention isn't followed, either explicitly or via Kubernetes/Docker Compose, reloads are immediate with no staggering. Use the KXI_DA_RELOAD_STAGGER environment variable to control the time period between reloads.

Assembly

The assembly configuration is a YAML file that defines the DA configuration, i.e. what data it is expected to offer and how it responds to queries. Assemblies are used in all kdb Insights microservices.

See the assembly configuration documentation for more information and examples.

The Docker deployment example provides an example of an assembly file in the context of an end-to-end example.

Data Access Elements

DAP instances are configured within the dap element of the assembly elements field. It provides the following options:

  • rcEndpoints: List of hostname:port strings of known resource coordinators to connect to if the discovery service is unavailable.
  • rcName: The name of the resource coordinator for the DAP to connect to, as defined by its KXI_NAME environment variable.
  • smEndpoints: The hostname:port strings of the storage manager service for the data access process to connect to.
  • tableLoad: How to populate in-memory database tables. Supports empty, splay, and links. Default behaviour is empty.
  • mountName: Name of the mount from the mounts section of the assembly for DA to mount and provide access to.
  • mountList: (if not using mountName) List of mount names from the mounts section of the assembly for DA to mount and provide access to.
  • mapPartitions: Whether a local mount should map partitions after a remount. See kdb+ documentation here.
  • purview: Inclusive start, exclusive end purview for startup of the DA process.
  • enforceSchema: Whether a stream DAP should validate all incoming table data against what's defined in the schema. Enabling this has a performance cost.
  • pctMemThreshold: Percentage of available memory to allocate to ingestion of a single interval. Decimal value between 0 and 1.
  • allowPartialResults: Whether an HDB DAP should return a successful response if it has entered low memory mode and stopped ingesting late data (exceeding the pctMemThreshold). Default is true.

Within the assembly, DAP configuration is structured under the instances field of the dap element. Config that applies to all DAPs is indented one level above the instances themselves, and can be overridden at the instance level.

elements:
  dap:
    # These configs apply to all DA below
    rcName: sg_rc # Used with discovery to determine resource coordinator to connect to
    instances:
      RDB:
        # Config specific to DAPs with a KXI_SC of RDB
        mountName: rdb # Must match name of mount in "mounts" section
      IDB:
        # Config specific to DAPs with a KXI_SC of IDB
        mountName: idb
      HDB:
        # Config specific to DAPs with a KXI_SC of HDB
        mountName: hdb

See the deployment example for an example of DAP configuration.
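To illustrate shared settings being overridden per instance, the sketch below sets tableLoad once for all DAPs and overrides it for the HDB instance only. The hostname, port, and the specific option values are hypothetical, and writing booleans and decimals as plain YAML scalars is an assumption; only the option names come from the list above.

```yaml
# Sketch only: endpoint and option values are illustrative assumptions.
elements:
  dap:
    # Shared by every instance unless overridden below
    rcName: sg_rc
    rcEndpoints: ["sg-rc:5050"]     # hypothetical fallback if discovery is unavailable
    tableLoad: empty                # the documented default, stated explicitly
    instances:
      RDB:
        mountName: rdb
        enforceSchema: true         # validate incoming data; has a performance cost
        pctMemThreshold: 0.75       # decimal value between 0 and 1
      IDB:
        mountName: idb
      HDB:
        mountName: hdb
        tableLoad: splay            # instance-level override of the shared value
```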

Discovery

By default, the kdb Insights Database microservices (SG, DA, SM) use environment variables to connect to one another. An example of using environment variables is outlined in the deployment example. In this mode, the dynamic processes connect to the static processes (DAs connect to SG and SM), so processes can still come and go despite being explicitly configured.

Alternatively, all kdb Insights Database microservices can use kdb Insights Service Discovery so that processes discover and connect with each other dynamically (see the kdb Insights Service Discovery documentation). When using Service Discovery, all images must be configured to use discovery; the two modes cannot be intermixed. The images required for this are as follows:

| process | description | image |
|---|---|---|
| sidecar | Discovery sidecar | kxi_sidecar |
| discovery | Discovery client. Configure one, which all processes seamlessly connect to. | kxi-eureka-discovery |
| proxy | Discovery proxy | discovery_proxy |

Custom file

The DA processes load the q file pointed to by the KXI_CUSTOM_FILE environment variable. In this file, you can load any custom APIs/functions that you want accessible by the DA processes. Note that while DA only supports loading a single file, you can load other files from within this file using \l (allowing you to control load order). The current working directory (pwd) at load time is the base directory of the file.

This can be combined with the Service Gateway microservice (which allows custom aggregation functions) to create full custom API support within kdb Insights (see "Service Gateway" for details).

Note: It's recommended to avoid .da* namespaces to avoid colliding with DA functions.

To make an API executable within DA, use the .da.registerAPI API, whose signature is as follows:

  • api - symbol - Aggregation function name.
  • metadata - list|string|dictionary - Aggregation function metadata (see "SAPI - Metadata" documentation).

API functions MUST be registered with .da.registerAPI in order to be invocable by the DA processes. See Custom file example for an example.

If using the Service Gateway microservice, you can see which APIs are available (and in which DAP) using the .kxi.getMeta API (see "SG - APIs").

When creating custom analytics that access data, there is a helper function, .kxi.selectTable, which understands the data model within each DAP and can help select from the tables necessary to return the appropriate records. Its interface is as follows:

| name | type | description |
|---|---|---|
| tn | symbol | Name of table to retrieve data from |
| ts | timestamp[2] | Time period of interest |
| wc | list[] | Where clause of what to select |
| bc | dict/boolean | By clause for select |
| cn | symbol | Names of columns to select for. Include any columns needed in aggregations |
| agg | dict | Select clause/aggregations to apply to table |

EOX Event Hooks

When loading a custom file into a Data Access Process, there are two functions which are intended to be overwritten to augment the DAP's EOX event handling. These functions are .da.prtnEndCB and .da.reloadCB.

The function .da.prtnEndCB is invoked on receipt of the _prtnEnd table published by Storage Manager to mark the end of an interval. This callback function is invoked after the DAP has adjusted any receive filters and redirected updates to any delta tables.

| name | type | description |
|---|---|---|
| startTS | timestamp | Start timestamp of interval |
| endTS | timestamp | End timestamp of interval |
| opts | dictionary | List of additional options (detailed below) |

Where the options can have these keys:

| name | type | description |
|---|---|---|
| date | date | Date of interval |
| partNo | long | EOI partition number |
| soiTS | timestamp | Start of interval timestamp |
| intv | int | Interval length |

The function .da.reloadCB is invoked when Storage Manager notifies the DAPs that the EOX has finished and been committed. The callback function is invoked after any database has been reloaded and tables have been purged, but before the DAP has marked itself as available to the Resource Coordinator. The function takes a dictionary of arguments with the following keys:

| name | type | description |
|---|---|---|
| ts | timestamp | Timestamp of reload event |
| minTS | timestamp | Lower inclusive start of this DAP's purview |
| maxTS | timestamp | Upper inclusive end of this DAP's purview |
| startTS | timestamp | Start time of interval |
| endTS | timestamp | End time of interval |
| pos | int | Position of _prtnEnd event that triggered this EOX |

Custom file example

The DAP process can load a custom code file, in which you can define custom functions and register APIs. Below is an example file that defines and registers a custom API:

// Sample DA custom file.

// Can load other files within this file. Note that the current directory
// is the directory of this file (in this example: /opt/kx/custom).
/ \l subFolder/otherFile1.q
/ \l subFolder/otherFile2.q

//
// @desc Define a new API. Counts number of entries by specified columns.
//
// @param table     {symbol}            Table name.
// @param byCols    {symbol|symbol[]}   Column(s) to count by.
// @param startTS   {timestamp}         Start time (inclusive).
// @param endTS     {timestamp}         End time (exclusive).
//
// @return          {table}             Count by specified columns.
//
countBy:{[table;startTS;endTS;byCols]
    ?[table;enlist(within;`realTime;(startTS;endTS-1));{x!x,:()}byCols;enlist[`cnt]!enlist(count;`i)]
    }

// Register with the DA process.
.da.registerAPI[`countBy;
    .sapi.metaDescription["Define a new API. Counts number of entries by specified columns."],
    .sapi.metaParam[`name`type`isReq`description!(`table;-11h;1b;"Table name.")],
    .sapi.metaParam[`name`type`isReq`description!(`byCols;-11 11h;1b;"Column(s) to count by.")],
    .sapi.metaParam[`name`type`isReq`description!(`startTS;-12h;1b;"Start time (inclusive).")],
    .sapi.metaParam[`name`type`isReq`description!(`endTS;-12h;1b;"End time (exclusive).")],
    .sapi.metaReturn[`type`description!(98h;"Count by specified columns.")],
    .sapi.metaMisc[enlist[`safe]!enlist 1b]
    ]

Example

A full example of an integrated deployment using Docker Compose is available here.