
Data Access User Guide

In its most basic form, Data Access is a set of Docker images combined using minimal configuration. Below is an explanation of the images required, the configuration parameters that need to be defined, and some example configurations.

Images

There is one image per process type in the service. The architecture allows multiple DAs to be run in parallel (upstream gateways can load balance as desired). Note, however, that there can be only one operator process, which can be queried for metadata about all running DA processes.

process       number   required   image
DA            many     Yes        kxi-da
DA Operator   1        No         kxi-da-operator
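
The operator is not included in the docker-compose example later on this page. A minimal, hypothetical fragment for running it alongside replicated DA containers might look as follows (the image tag and port are assumptions chosen to match the rest of this page, not documented requirements):

services:
    da-operator:
        image: kxi-da-operator:0.8.0 # assumed tag, matching the DA images used below
        command: -p 8000             # assumed port, matching opPort in the example assembly
    da:
        image: kxi-da:0.8.0
        deploy:
            mode: replicated
            replicas: 2              # multiple DA processes can run in parallel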

In addition, Data Access can optionally use KX Insights Service Discovery in order for processes to discover and connect with each other seamlessly (see the KXI Service Discovery documentation). Images required are as follows.

process     description                                                                   image
sidecar     Discovery sidecar.                                                            kxi_sidecar
discovery   Discovery client. Configure one, to which all processes seamlessly connect.   kxi-eureka-discovery
proxy       Discovery proxy.                                                              discovery_proxy

Assembly

The assembly configuration is a yaml file that defines the DA configuration, i.e. what data it is expected to offer and how it responds to queries. Assemblies are used in all KX Insights microservices.

field         required   description
name          Yes        Assembly name.
description   No         Description of the assembly.
labels        Yes        Labels (i.e. dimensions) along which the data is partitioned in the DAs, and possible values (see Labels).
elements      Yes        Additional, service-specific configuration (see Elements).

See Labels/Elements or Example for example assembly yaml configurations. The assembly yaml file must be included in the Docker container.
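
Putting these fields together, a minimal assembly skeleton looks something like the following (the names and values are placeholders; a complete configuration is shown in the Example section at the end of this page):

name: example-assembly
description: Example Data Access assembly
labels:
  region: amer
  assetClass: fx
elements:
  RDB:
    mountName: rdb   # service-specific configuration; see Elements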

Labels

Labels are used to define the DA purview, that is, the data to which it grants access. If using the KX Insights Service Gateway, these are the values reported as the DAP's purview (see the "Service Gateway" page).

Below are some examples.

Example 1 -- Provides FX data for America.

labels:
    region: amer
    assetClass: fx

Example 2 -- Provides weekly electricity billing for residential customers.

labels:
    sensorType: electric
    clientType: residential
    billing: weekly

Elements

The elements tag is used to define DA details and configuration for the service. The section name must match the KXI_SC value set in the environment variables. An example assembly is shown below, and the parameters are defined in the configuration section.
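
For instance, a DA process started with KXI_SC=RDB reads its configuration from the RDB section under elements. A minimal sketch (parameter values here are placeholders; see the full assembly in the Example section):

elements:
  RDB:                        # section name matches KXI_SC=RDB
    description: RDB
    gwArch: asymmetric        # gateway architecture to use
    gwAssembly: gw-assembly   # GW assembly name
    mountName: rdb            # name of the mount this process serves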

Environment variables

The DA microservice relies on certain environment variables to be defined in the containers. The variables are described below.

variable            required   containers    description
KXI_NAME            Yes        DA            Process name.
KXI_PORT            No         DA            Port. The process can also be started with "-p $KXI_PORT".
KXI_SC              Yes        DA            Service Class type for data access (e.g. RDB, IDB, HDB).
KXI_LOG_FORMAT      No         DA, sidecar   Message format (see qlog documentation).
KXI_LOG_DEST        No         DA, sidecar   Endpoints (see qlog documentation).
KXI_LOG_LEVELS      No         DA, sidecar   Component routing (see qlog documentation).
KXI_ASSEMBLY_FILE   Yes        DA            Assembly yaml file.
KXI_CONFIG_FILE     Yes        sidecar       Discovery configuration file (see KXI Service Discovery documentation).
KXI_CUSTOM_FILE     No         DA            File containing custom code to load in DA processes.

See example section below.

Custom file

The DA processes load the q file pointed to by the KXI_CUSTOM_FILE environment variable. In this file, you can load any custom APIs/functions that you want accessible by the DA processes. Note that while DA only supports loading a single file, you can load other files from within this file using \l (allowing you to control load order). The current working directory (pwd) at load time is the base directory of the file.

This can be combined with the Service Gateway microservice (which allows custom aggregation functions) to create full custom API support within KX Insights (see "Service Gateway" for details).

Note: it is recommended to avoid the .da* namespaces, to prevent collisions with DA functions.
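
For example, custom definitions can be kept in a dedicated namespace (the .example namespace below is purely illustrative):

// Keep custom definitions out of the .da* namespaces.
// The .example namespace is an illustrative choice.
\d .example
echo:{[x] x}    / defined as .example.echo
\d .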

Example

Below is a sample configuration. A docker-compose yaml is used here, but this can be adapted to other formats. Note: variables of the form ${...} are user-defined and depend on your local directory structure and file names. Sections and lines marked Optional may be omitted.

Docker-compose yaml file:

#
# Optional: Create volumes to include licence/configuration in the containers.
#
x-vols: &vols
    volumes:
    - ${kx_licence_dir}:/opt/kx/lic
    - ${cfg_dir}:/opt/kx/cfg
    - ${mnt_dir}:/data
    - ${custom_dir}:/opt/kx/custom # Optional mount for loading custom code

#
# Optional: Create a network for processes to communicate.
#
x-kxnet: &kxnet
    networks:
    - kx

networks:
    kx:
        name: kx
        driver: bridge

#
# Services.
#
services:

    #
    # Realtime Database
    #
    rdb:
        image: kxi-da:0.8.0
        command: -p 5080
        environment:
            - KXI_NAME=rdb
            - KXI_SC=RDB
            - KXI_LOG_FORMAT=text # Optional
            - KXI_LOG_LEVELS=default:trace # Optional
            - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/${assembly_file_yaml}
            - KXI_RT_LIB=/opt/kx/cfg/docker/rt_tick_client_lib.q
            - KXI_CUSTOM_FILE=/opt/kx/custom/${custom_rdb_code}.q # Optional
        ports:
            - 5080-5084:5080
        deploy:
            mode: replicated
            replicas: 2
        <<: *vols # Optional
        <<: *kxnet # Optional

    #
    # Optional: RDB sidecar. Only required if using discovery; otherwise it may be omitted.
    #
    rdb_sidecar:
        image: kxi_sidecar:0.8.0
        environment:
        - KXI_CONFIG_FILE=/opt/kx/cfg/${rdb_sidecar_config_json}
        - KXI_LOG_LEVELS=default:debug # Optional
        <<: *vols # Optional
        <<: *kxnet # Optional

    #
    # Intraday Database
    #
    idb:
        image: kxi-da:0.8.0
        command: -p 5090
        environment:
            - KXI_NAME=idb
            - KXI_SC=IDB
            - KXI_LOG_FORMAT=text # Optional
            - KXI_LOG_LEVELS=default:trace # Optional
            - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/${assembly_file_yaml}
            - KXI_CUSTOM_FILE=/opt/kx/custom/${custom_idb_code}.q # Optional
        ports:
            - 5090-5094:5090
        deploy:
            mode: replicated
            replicas: 2
        <<: *vols # Optional
        <<: *kxnet # Optional

    #
    # Historical Database
    #
    hdb:
        image: kxi-da:0.8.0
        command: -p 5100
        environment:
            - KXI_NAME=hdb
            - KXI_SC=HDB
            - KXI_LOG_FORMAT=text # Optional
            - KXI_LOG_LEVELS=default:trace # Optional
            - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/${assembly_file_yaml}
            - KXI_CUSTOM_FILE=/opt/kx/custom/${custom_hdb_code}.q # Optional
        ports:
            - 5100-5104:5100
        deploy:
            mode: replicated
            replicas: 2
        <<: *vols # Optional
        <<: *kxnet # Optional

    #
    # Optional: Eureka Service Discovery Registry. Only required if using discovery; otherwise it may be omitted.
    #
    eureka:
        image: kxi-eureka-discovery:0.8.0
        ports:
        - 9000:8761

    #
    # Optional: Discovery proxy. Only required if using discovery; otherwise it may be omitted.
    #
    proxy:
        image: discovery_proxy:0.8.0
        ports:
            - 4000:4000
        environment:
            - KXI_CONFIG_FILE=/opt/app/cfg/${proxy_config_json}
        command: -p 4000
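
Once the referenced configuration files (assembly, sidecar/proxy configs, and any custom code) described below are in place, the stack can be started in the usual docker-compose fashion, for example:

docker-compose up -d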

Assembly:

Here's an example assembly configuration where the Data Access processes are tagged with a region of "New York" and an assetClass of "stocks".

name: integration-env
description: Data access assembly configuration
labels:
  region: New York
  assetClass: stocks

tables:
  trade:
    description: Trade data
    type: partitioned
    shards: 11
    blockSize: 10000
    prtnCol: realTime
    columns:
      - name: time
        description: Time
        type: timespan
      - name: sym
        description: Symbol name
        type: symbol
        attrMemory: grouped
        attrDisk: parted
        attrOrd: parted
      - name: realTime
        description: Real timestamp
        type: timestamp
      - name: price
        description: Trade price
        type: float
      - name: size
        description: Trade size
        type: long

  quote:
    description: Quote data
    type: partitioned
    shards: 11
    blockSize: 10000
    prtnCol: realTime
    columns:
      - name: time
        description: Time
        type: timespan
      - name: sym
        description: Symbol name
        type: symbol
        attrMemory: grouped
        attrDisk: parted
        attrOrd: parted
      - name: realTime
        description: Real timestamp
        type: timestamp
      - name: bid
        description: Bid price
        type: float
      - name: ask
        description: Ask price
        type: float
      - name: bidSize
        description: Bid size
        type: long
      - name: askSize
        description: Ask size
        type: long

bus:
  stream:
    protocol: custom
    nodes: tp:5000
    topic: dataStream

mounts:
  rdb:
    type: stream
    uri: file://stream
    partition: none
  idb:
    type: local
    uri: file://data/db/idb/current
    partition: ordinal
  hdb:
    type: local
    uri: file://data/db/hdb/current
    partition: date

elements:
  RDB:
    description: RDB
    opEnabled: false # Whether to connect to the operator
    opHost: # Host of operator process
    opPort: 8000  # Port of operator process
    gwArch: asymmetric  # Whether DA should respond with traditional or asymmetric gw architecture
    gwEndpoints:  # Null endpoints, rely on discovery
    gwAssembly: gw-assembly # GW assembly name
    smConn:  # Endpoint to storage manager; set when discovery is not being used
    tableLoad: default # How to populate database tables
    mountName: rdb # Name of mount

  IDB:
    description: IDB
    opEnabled: false # Whether to connect to the operator
    opHost:  # Host of operator process
    opPort:  # Port of operator process
    gwArch: asymmetric  # Whether DA should respond with traditional or asymmetric gw architecture
    gwEndpoints:  # Null endpoints, rely on discovery
    gwAssembly: gw-assembly # GW assembly name
    smConn:  # Endpoint to storage manager; set when discovery is not being used
    tableLoad: default # How to populate database tables
    mountName: idb # Name of mount


  HDB:
    description: HDB
    opEnabled: false # Whether to connect to the operator
    opHost: # Host of operator process
    opPort: # Port of operator process
    gwArch: asymmetric  # Whether DA should respond with traditional or asymmetric gw architecture
    gwEndpoints:  # Null endpoints, rely on discovery
    gwAssembly: integration-env # GW assembly name
    smConn:  # Endpoint to storage manager; set when discovery is not being used
    tableLoad: default # How to populate database tables
    mountName: hdb # Name of mount

The RDB discovery sidecar

The sidecar config file is configured as per the KXI Service Discovery documentation.

{
    "connection": ":rdb:5080",
    "frequencySecs": 5,
    "discovery":
    {
        "registry": ":proxy:4000",
        "adaptor": "discEurekaAdaptor.q",
        "heartbeatSecs": 30,
        "leaseExpirySecs": 90
    }
}

Custom file

Each DA process can load a custom file for custom API support. For example,

// Sample DA custom file.

// Can load other files within this file. Note that the current directory
// is the directory of this file (in this example: /opt/kx/custom).
\l subFolder/otherFile1.q
\l subFolder/otherFile2.q

//
// @desc Define a new API. Counts number of entries by specified columns.
//
// @param table     {symbol}            Table name.
// @param startTS   {timestamp}         Start time (inclusive).
// @param endTS     {timestamp}         End time (exclusive).
// @param byCols    {symbol|symbol[]}   Column(s) to count by.
//
// @return          {table}             Count by specified columns.
//
countBy:{[table;startTS;endTS;byCols]
    ?[table;enlist(within;`realTime;(startTS;endTS-1));{x!x,:()}byCols;enlist[`cnt]!enlist(count;`i)]
    }

// etc...
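
Once loaded, such an API is callable in the DA process like any other q function. For instance, using the trade table from the example assembly above:

// Count trades by symbol over one day (times are illustrative).
countBy[`trade; 2021.01.01D00:00:00; 2021.01.02D00:00:00; `sym]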