Data Access configuration
In its most basic form, kdb Insights Data Access is a set of Docker images combined using minimal configuration. Below is an explanation of the images required, the configuration parameters that need to be defined, and some example configurations.
Images
Two images are provided for deploying Data Access Processes:
Single mount
To deploy a Data Access Process providing query access to a single mount (e.g., an RDB, IDB, or HDB), use the `registry.dl.kx.com/kxi-da` image. Additionally, set the preferred mount using `mountName: <mount>` in the assembly for that DAP instance.
For example,
elements:
  dap:
    instances:
      rdb:
        mountName: stream
Multiple mounts
To deploy a Data Access Process providing query access to multiple mounts within a single image (to share compute resources across tiers), use the `registry.dl.kx.com/kxi-da-single` image. Additionally, set the list of preferred mounts using `mountList: [<mounts>]` in the assembly, rather than `mountName`.
elements:
  dap:
    instances:
      db:
        mountList: [stream, intraday, historical]
An example of a multi-tier deployment can be found in the Docker example.
Environment variables
The DA microservice relies on certain environment variables to be defined in the containers. The variables are described below:
variable | required | containers | description |
---|---|---|---|
KXI_NAME | Yes | DA | Process name. |
KXI_PORT | No | DA | Port. Can also be started with "-p $KXI_PORT". |
KXI_SC | Yes | DA | Service Class type for data access (e.g. RDB, IDB, HDB). |
KXI_LOG_FORMAT | No | DA, sidecar | Message format (see qlog documentation). |
KXI_LOG_DEST | No | DA, sidecar | Endpoints (see qlog documentation). |
KXI_LOG_LEVELS | No | DA, sidecar | Component routing (see qlog documentation). |
KXI_ASSEMBLY_FILE | Yes | DA | Assembly YAML file. |
KXI_CONFIG_FILE | Yes | sidecar | Discovery configuration file (see KXI Service Discovery documentation). |
KXI_CUSTOM_FILE | No | DA | File containing custom code to load in DA processes. |
KXI_DAP_SANDBOX | No | DA | Whether this DAP is a sandbox. |
SBX_MAX_ROWS | No | DA | Maximum number of rows, per partitioned table, to store in memory. |
KXI_ALLOWED_SBX_APIS | No | DA | Comma-delimited list of sandbox APIs to allow in non-sandbox DAPs (e.g. ".kxi.sql,.kxi.qsql"). |
KXI_DA_RELOAD_STAGGER | No | DA | Time in seconds between DAPs of the same class reloading after an EOX (default: 30). |
KXI_DA_USE_REAPER | No | DA | Whether to use KX Reaper and the object storage cache (default: false). |
KXI_SAPI_HB_FREQ | No | DA | Time in milliseconds between heartbeats to connected processes (default: 30000). |
KXI_SAPI_HB_TOL | No | DA | Number of heartbeat intervals a process can miss before being disconnected (default: 2). |
KXI_GC_FREQ | No | DA | Frequency in milliseconds at which to run garbage collection on a timer (default: 600000; set to 0 to disable). |
KXI_ENABLE_FLUSH | No | DA | Set to "true" to enable async flush on messages from DA to Agg (default: "false"). |
KX_OBJSTR_INVENTORY_FILE | No | DA | Path, relative to the root of the bucket, of an inventory file to use. |
See the Docker deployment example for examples of setting environment variables.
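As a sketch, these variables might be supplied through a Docker Compose service like the following; the service name, image tag, port, and file paths are illustrative assumptions, not defaults:

```yaml
# Illustrative Docker Compose service for a single-mount RDB DAP.
services:
  dap-rdb:
    image: registry.dl.kx.com/kxi-da:latest   # tag is an assumption
    environment:
      - KXI_NAME=dap-rdb
      - KXI_SC=RDB
      - KXI_PORT=5080
      - KXI_ASSEMBLY_FILE=/opt/kx/assembly.yaml
      - KXI_CUSTOM_FILE=/opt/kx/custom/custom.q
    volumes:
      - ./assembly.yaml:/opt/kx/assembly.yaml
      - ./custom:/opt/kx/custom
```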
Object store config
The Data Access HDB processes are able to cache and reap object storage results to avoid repeated downloads of the same data.
Note: configure RT log archiving so that it does not overlap with the cache. If using RT and the RT log volume, be sure to size the RT log volume appropriately to make additional room for the object storage cache.
The following environment variables should be set:
variable | required | containers | description |
---|---|---|---|
KX_OBJSTOR_CACHE_PATH | Yes (unless Enterprise) | DA | Path to where the object storage cache should be. This uses the RT Log Volume in kdb Insights Enterprise. |
KX_OBJSTOR_CACHE_SIZE | Yes | DA | Size of the object storage cache in MB. Increase the RT Log Volume by this amount in kdb Insights Enterprise. |
For kdb Insights Enterprise, the RT Log Volume is used for the object storage cache. Since all RT Log Volumes must be sized identically for log archiving, increase the RT Log Volume by the object storage cache size. For example, for a 20Gi log volume and a desired 5Gi object storage cache, set the RT Log Volume size for the HDB to 25Gi and set KXI_DA_USE_REAPER to "true" for the HDB DAP element.
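For instance, an HDB DAP running outside kdb Insights Enterprise might be configured with settings along these lines; the cache path is an illustrative assumption:

```shell
# Illustrative settings for a 5Gi object storage cache.
export KX_OBJSTOR_CACHE_PATH=/tmp/objstor-cache   # where cached objects are written (path is an assumption)
export KX_OBJSTOR_CACHE_SIZE=5120                 # cache size in MB (5Gi)
export KXI_DA_USE_REAPER=true                     # enable the reaper to manage the cache
```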
Inventory files
DA processes can be set up to load object storage data from an inventory file. Set KX_OBJSTR_INVENTORY_FILE to a path relative to the root of your storage. For example, with the following s3 layout, the setting should be KX_OBJSTR_INVENTORY_FILE=inventory/inventory.tgz:
s3://examplebucket/
  db/
    2022.01.01/
    2022.01.02/
  inventory/
    inventory.tgz
The SM may be configured to write inventory files at EOD, as well as produce them at startup if none exist. Please refer to the SM configuration.
Names
Data Access process names help determine the order in which RDB processes reload, to avoid all processes reloading at once. This is handled by the Kubernetes StatefulSet configuration, which names Pods as `pod-name-<ordinal>`, and by Docker Compose, which names containers as `container-name_<ordinal>`. In cases where this naming convention isn't followed, either explicitly or via Kubernetes/Docker Compose, the reloads are immediate with no staggering. Use KXI_DA_RELOAD_STAGGER to control the time period between reloads.
Assembly
The assembly configuration is a YAML file that defines the DA configuration, i.e. what data it is expected to offer and how it responds to queries. Assemblies are used in all kdb Insights microservices.
See the assembly configuration documentation for more information and examples.
The Docker deployment example provides an example of an assembly file in the context of an end-to-end example.
Data Access Elements
DAP instances are configured within the `dap` element of the assembly `elements` field. It provides the following options:
- `rcEndpoints`: List of `hostname:port` strings of known resource coordinators to connect to if the discovery service is unavailable.
- `rcName`: The name of the resource coordinator for the DAP to connect to, as defined by its `KXI_NAME` environment variable.
- `smEndpoints`: The `hostname:port` strings of the Storage Manager service for the Data Access process to connect to.
- `tableLoad`: How to populate in-memory database tables. Supports `empty`, `splay`, and `links`. Default behaviour is `empty`.
- `mountName`: Name of a mount from the `mounts` section of the assembly for the DAP to mount and provide access to.
- `mountList`: (if not using `mountName`) List of mount names from the `mounts` section of the assembly for the DAP to mount and provide access to.
- `mapPartitions`: Whether a local mount should map partitions after a remount. See the kdb+ documentation.
- `purview`: Inclusive start, exclusive end purview for startup of the DA process.
- `enforceSchema`: Whether a stream DAP should validate all incoming table data against what's defined in the schema. There is a performance cost to having this enabled.
- `pctMemThreshold`: Percentage of available memory to allocate to ingestion of a single interval. Decimal value between 0 and 1.
- `allowPartialResults`: Whether an HDB DAP should return a successful response if it has entered low-memory mode and stopped ingesting late data (exceeding `pctMemThreshold`). Default is `true`.
Within the assembly, this configuration is structured under the `dap` element's `instances`. Config that applies to all DAPs is set one level above the instances themselves, and can be overridden at the instance level.
elements:
  dap:
    # These configs apply to all DAPs below
    rcName: sg_rc # Used with discovery to determine resource coordinator to connect to
    instances:
      RDB:
        # Config specific to DAPs with a KXI_SC of RDB
        mountName: rdb # Must match name of mount in "mounts" section
      IDB:
        # Config specific to DAPs with a KXI_SC of IDB
        mountName: idb
      HDB:
        # Config specific to DAPs with a KXI_SC of HDB
        mountName: hdb
See the deployment example for an example of DAP configuration.
Discovery
By default, the kdb Insights Database microservices (SG, DA, SM) use environment variables to connect to one another. An example of using environment variables is outlined in the deployment example. In this mode, the dynamic processes connect to the static processes (DAs connect to SG and SM), so processes can still come and go despite being explicitly configured.
Alternatively, all kdb Insights Database microservices can use kdb Insights Service Discovery in order for processes to discover and connect with each other dynamically (see the kdb Insights Service Discovery documentation). When using Service Discovery, all images must be configured to use discovery; modes cannot be intermixed. The images required for this are as follows:
process | description | image |
---|---|---|
sidecar | Discovery sidecar | kxi_sidecar |
discovery | Discovery client. Configure one, which all processes seamlessly connect to. | kxi-eureka-discovery |
proxy | Discovery proxy | discovery_proxy |
Custom file
The DA processes load the q file pointed to by the `KXI_CUSTOM_FILE` environment variable. In this file, you can load any custom APIs/functions that you want to be accessible from the DA processes. Note that while DA only supports loading a single file, you can load other files from within this file using `\l` (allowing you to control load order). The current working directory (`pwd`) at load time is the base directory of the file.
This can be combined with the Service Gateway microservice (which allows custom aggregation functions) to create full custom API support within kdb Insights (see "Service Gateway" for details).
Note: It's recommended to avoid the `.da*` namespaces, to avoid colliding with DA functions.
To make an API executable within DA, use the `.api.registerAPI` API, whose signature is as follows:

- `api` - symbol - Aggregation function name.
- `metadata` - list|string|dictionary - Aggregation function metadata (see "SAPI - Metadata" documentation).
API functions MUST be registered with `.api.registerAPI` in order to be invokable by the DA processes. See the Custom file example for an example.
If using the Service Gateway microservice, you can see which APIs are available (and in which DAPs) using the `.kxi.getMeta` API (see "SG - APIs").
When creating custom analytics that access data, there is a helper function `.kxi.selectTable` which understands the data model within each DAP and can help select from the tables necessary to return the appropriate records. Its interface is as follows:
name | type | description |
---|---|---|
tn | symbol | Name of table to retrieve data from |
ts | timestamp[2] | Time period of interest |
wc | list[] | Where clause of what to select |
bc | dict/boolean | By clause for select |
cn | symbol | Names of columns to select for. Include any columns needed in aggregations |
agg | dict | Select clause/aggregations to apply to table |
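As a sketch, a custom API might use the helper like this; the table name, column names, and the exact positional form of the call are illustrative assumptions based on the parameter table above:

```q
// Hypothetical custom API built on .kxi.selectTable.
// Counts trade records by sym over the interval [startTS;endTS).
tradeCount:{[startTS;endTS]
  .kxi.selectTable[
    `trade;                          / tn: table name (assumed)
    (startTS;endTS);                 / ts: time period of interest
    ();                              / wc: no additional where clause
    (enlist`sym)!enlist`sym;         / bc: group by sym
    enlist`sym;                      / cn: columns needed
    (enlist`cnt)!enlist(count;`i)]   / agg: count rows per group
  }
```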
EOX Event Hooks
When loading a custom file into a Data Access Process, there are two functions which are intended to be overwritten to augment the DAP's EOX event handling: `.da.prtnEndCB` and `.da.reloadCB`.
The function `.da.prtnEndCB` is invoked on receipt of the `_prtnEnd` table published by Storage Manager to mark the end of an interval. This callback function is invoked after the DAP has adjusted any receive filters and redirected updates to any delta tables. It takes the following arguments:
name | type | description |
---|---|---|
startTS | timestamp | Start timestamp of interval |
endTS | timestamp | End timestamp of interval |
opts | dictionary | Dictionary of additional options (detailed below) |
Where the options can have these keys:
name | type | description |
---|---|---|
date | date | Date of interval |
partNo | long | EOI partition number |
soiTS | timestamp | Start of interval timestamp |
intv | int | Interval length |
The function `.da.reloadCB` is invoked when Storage Manager notifies the DAPs that the EOX has finished and been committed. The callback function is invoked after the database has been reloaded and tables have been purged, but before the DAP has marked itself as available to the Resource Coordinator. The function takes a dictionary of arguments with the following keys:
name | type | description |
---|---|---|
ts | timestamp | Timestamp of reload event |
minTS | timestamp | Lower inclusive start of this DAP's purview |
maxTS | timestamp | Upper inclusive end of this DAP's purview |
startTS | timestamp | Start time of interval |
endTS | timestamp | End time of interval |
pos | int | Position of _prtnEnd event that triggered this EOX |
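A minimal sketch of overriding both hooks from a custom file; the log messages are illustrative, and the parameter lists follow the tables above:

```q
// Override the EOX hooks from the custom file.
// .da.prtnEndCB receives the interval bounds and an options dictionary.
.da.prtnEndCB:{[startTS;endTS;opts]
  -1"interval ended: ",string[startTS]," to ",string endTS;
  }

// .da.reloadCB receives a dictionary of reload details.
.da.reloadCB:{[args]
  -1"reloaded at ",string[args`ts],"; purview ",string[args`minTS]," to ",string args`maxTS;
  }
```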
Custom file example
The DAP process can load a custom code file, wherein you can define custom functions and register APIs. Below is an example file that defines and registers a custom API:
// Sample DA custom file.
// Can load other files within this file. Note that the current directory
// is the directory of this file (in this example: /opt/kx/custom).
/ \l subFolder/otherFile1.q
/ \l subFolder/otherFile2.q
//
// @desc Define a new API. Counts number of entries by specified columns.
//
// @param table {symbol} Table name.
// @param startTS {timestamp} Start time (inclusive).
// @param endTS {timestamp} End time (exclusive).
// @param byCols {symbol|symbol[]} Column(s) to count by.
//
// @return {table} Count by specified columns.
//
countBy:{[table;startTS;endTS;byCols]
?[table;enlist(within;`realTime;(startTS;endTS-1));{x!x,:()}byCols;enlist[`cnt]!enlist(count;`i)]
}
// Register with the DA process.
.da.registerAPI[`countBy;
.sapi.metaDescription["Define a new API. Counts number of entries by specified columns."],
.sapi.metaParam[`name`type`isReq`description!(`table;-11h;1b;"Table name.")],
.sapi.metaParam[`name`type`isReq`description!(`byCols;-11 11h;1b;"Column(s) to count by.")],
.sapi.metaParam[`name`type`isReq`description!(`startTS;-12h;1b;"Start time (inclusive).")],
.sapi.metaParam[`name`type`isReq`description!(`endTS;-12h;1b;"End time (exclusive).")],
.sapi.metaReturn[`type`description!(98h;"Count by specified columns.")],
.sapi.metaMisc[enlist[`safe]!enlist 1b]
]
Example
A full example of an integrated deployment using Docker Compose is available here.