Query Resilience

There are four process types in the query path: Gateway (GW), Resource Coordinator (RC), Data Access Process (DAP), and Aggregator (Agg). Each process can be configured with multiple replicas for resiliency. Process connections are as follows.

  • GWs connect to multiple RCs. Each GW distributes requests round-robin across all known RCs.

    Figure: GW-RC connections

  • DAPs and Aggs connect to exactly one RC each. Hence, every RC owns its set of DAPs/Aggs.

    Figure: RC-DAP-Agg connections

  • RCs can connect to each other.

    Figure: RC-RC connections

In general, it is best practice to allocate multiple replicas of each process type at each connection point. That is:

  • Allocate multiple RCs, so that if one dies, the GWs can distribute to the remaining ones. If no RCs remain, requests return a "No Resource Coordinator connections are available and ready for service" error (see Troubleshooting).
  • Allocate multiple DAPs (of each type RDB/IDB/HDB) for each label set to each RC. Multiple DAPs increase query throughput as RCs can distribute queries to several DAPs in parallel. If a DAP dies, the RC continues to distribute to the remaining ones. If no DAPs for a particular tier/label set are available, requests queue up in the RCs (see Queueing).
  • Allocate multiple Aggs to each RC. Multiple Aggs increase query throughput as RCs can allocate queries across several Aggs. If an Agg dies, the RC allocates to the remaining ones. If no Aggs remain for a particular RC, requests received by this RC return a "No aggregator available" error (see Troubleshooting).
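The failover behavior described above can be illustrated with a minimal sketch. It models only the GW side: requests rotate round-robin over the RCs that are still connected, and an error is raised once none remain. The class and method names (`RoundRobinRouter`, `drop`, `pick`) are hypothetical and are not part of kdb Insights; only the error message is taken from the text above.

```python
from itertools import count


class NoCoordinatorError(RuntimeError):
    """Raised when no RC connection is left to serve a request."""


class RoundRobinRouter:
    """Hypothetical stand-in for a GW's view of its RC connections."""

    def __init__(self, coordinators):
        self.coordinators = list(coordinators)  # RCs currently connected
        self._ticket = count()                  # increasing request counter

    def drop(self, name):
        """Forget an RC whose connection was lost."""
        self.coordinators.remove(name)

    def pick(self):
        """Return the RC that should receive the next request."""
        if not self.coordinators:
            raise NoCoordinatorError(
                "No Resource Coordinator connections are available and ready for service"
            )
        return self.coordinators[next(self._ticket) % len(self.coordinators)]


router = RoundRobinRouter(["rc-0", "rc-1", "rc-2"])
first_three = [router.pick() for _ in range(3)]    # one request to each RC in turn
router.drop("rc-1")                                 # simulate an RC dying
after_failure = {router.pick() for _ in range(4)}   # only the survivors are used
```

With a single RC, the first `drop` would leave nothing to pick from, which is why multiple RC replicas are recommended.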

Configuration

kdb Insights

Using kdb Insights offers the greatest degree of flexibility around process connection at the cost of extra configuration. All processes connect to the RCs. The details for how to configure each process type are described below.

Gateway

You can configure the Gateway to connect to the RC in one of three ways. They are listed below in order of decreasing precedence.

  • Environment variable

    The simplest method of configuration is to explicitly set the RC address in an environment variable. Set the host and port of the RC in the KXI_SG_RC_ADDR environment variable in the GW container.

    KXI_SG_RC_ADDR="<rc_host>:<rc_port>"
    

    Note that this method restricts the GW to connect to a single RC.

  • Kubernetes control plane

    If using Kubernetes, configure the GW to connect to RCs using Kubernetes labels. For this method, the GW pod requires Kubernetes RBAC permissions for the "get", "watch", and "list" verbs of the "pods" resource. The following is an example GW configuration snippet.

    # GW pod.
    apiVersion: v1
    kind: Pod
    metadata:
      name: insights-gateway
    spec:
      serviceAccountName: insights-gateway-service-account # Must match the ServiceAccount name below
      containers:
      # GW container
      - ...
        env:
        # Set the following environment variable. The key-value pair must match the metadata labels of the RC(s).
        - name: KXI_RC_LABEL_SELECTOR
          value: app.kubernetes.io/name=resource-coordinator
    ---
    # GW service account.
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: insights-gateway-service-account
    ---
    # RBAC role.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: insights-gateway-role
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "watch", "list"]
    ---
    # Bind the RBAC role to the GW's ServiceAccount.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: insights-gateway-role-binding
    subjects:
    - kind: ServiceAccount
      name: insights-gateway-service-account
      apiGroup: ""
    roleRef:
      kind: Role
      name: insights-gateway-role
      apiGroup: rbac.authorization.k8s.io
    

    The GW(s) connect to all RCs with the corresponding metadata labels:

    kind: Pod
    metadata:
      name: insights-resource-coordinator
      labels:
        app.kubernetes.io/name: "resource-coordinator" # Must match GW's KXI_RC_LABEL_SELECTOR
    spec:
      containers:
      - ...
        ports:
        - ...
          containerPort: 5050 # Must set a port for the GW to connect to
          protocol: TCP
    
  • Service discovery

    See Discovery.
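The precedence order above can be summarized in a short sketch. It is illustrative only and does not reflect how the GW is implemented: the function names are hypothetical, and the only identifiers taken from the text are the two environment variables and the `<rc_host>:<rc_port>` address format.

```python
def gateway_connection_method(env):
    """Return the first configured method, highest precedence first."""
    if env.get("KXI_SG_RC_ADDR"):
        return "environment variable"   # explicit address: single RC only
    if env.get("KXI_RC_LABEL_SELECTOR"):
        return "kubernetes labels"      # all RC pods matching the selector
    return "service discovery"          # fallback when nothing else is set


def parse_rc_addr(addr):
    """Split '<rc_host>:<rc_port>' into a (host, port) pair."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <rc_host>:<rc_port>, got {addr!r}")
    return host, int(port)


# Hypothetical environment in which both variables are set.
env = {
    "KXI_SG_RC_ADDR": "rc.example.internal:5050",
    "KXI_RC_LABEL_SELECTOR": "app.kubernetes.io/name=resource-coordinator",
}
method = gateway_connection_method(env)        # the explicit address wins
address = parse_rc_addr(env["KXI_SG_RC_ADDR"])
```

Because the explicit address takes precedence and limits the GW to a single RC, leave KXI_SG_RC_ADDR unset when the GW should connect to several RCs.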

Data access process

DAPs connect to their respective RCs in one of two ways. They are listed here in order of decreasing precedence.

  • Environment variable

    Configure a DAP to explicitly connect to a particular RC by defining the RC address in the KXI_SG_RC_ADDR environment variable.

    KXI_SG_RC_ADDR="<rc_host>:<rc_port>"
    
  • Service discovery

    See Discovery. Using this method the DAPs connect to RCs by ordinal.

Aggregator

Aggs connect to their respective RCs in one of two ways. They are listed here in order of decreasing precedence.

  • Environment variable

    Configure an Agg to explicitly connect to a particular RC by defining the RC address in the KXI_SG_RC_ADDR environment variable.

    KXI_SG_RC_ADDR="<rc_host>:<rc_port>"
    
  • Service discovery

    See Discovery. Using this method the Aggs connect to RCs by ordinal.

Resource coordinator

RCs connect to each other so that an RC that receives a request but does not own the DAPs needed to complete it can enlist its peers for help (see Routing). RCs can only connect to each other using Kubernetes labels. The RC pods require Kubernetes RBAC permissions for the "get" and "list" verbs of the "pods" resource. The following is an example RC configuration snippet.

# RC pod.
apiVersion: v1
kind: Pod
metadata:
  name: insights-resource-coordinator
  labels:
    app.kubernetes.io/name: resource-coordinator
spec:
  serviceAccountName: insights-resource-coordinator-service-account # Must match the ServiceAccount name below
  containers:
  # Note: the RC's pod may have multiple containers, but the RC container MUST be the last container not named *sidecar*.
  - ...
    ports:
    - ...
      containerPort: 5050 # Must set a port for peer RCs to connect to
      protocol: TCP
---
# RC service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: insights-resource-coordinator-service-account
---
# RBAC role.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: insights-resource-coordinator-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
# Bind the RBAC role to the RC's ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: insights-resource-coordinator-role-binding
subjects:
- kind: ServiceAccount
  name: insights-resource-coordinator-service-account
  apiGroup: ""
roleRef:
  kind: Role
  name: insights-resource-coordinator-role
  apiGroup: rbac.authorization.k8s.io
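The note in the RC pod snippet about container ordering can be read as the selection rule sketched below. This is one interpretation of that comment, not the documented peer-discovery implementation: it assumes "*sidecar*" is a glob pattern matched against container names, and the function and container names are hypothetical.

```python
from fnmatch import fnmatchcase


def last_non_sidecar(container_names):
    """Return the last container whose name does not match '*sidecar*'."""
    candidates = [n for n in container_names if not fnmatchcase(n, "*sidecar*")]
    if not candidates:
        raise ValueError("pod has no container that is not a sidecar")
    return candidates[-1]


# Hypothetical pod with a helper container and a trailing sidecar.
chosen = last_non_sidecar(["init-helper", "resource-coordinator", "metrics-sidecar"])
```

Under this reading, placing any non-sidecar container after the RC container would cause the wrong container to be selected.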

kdb Insights Enterprise

If you use kdb Insights Enterprise, no extra configuration is needed. GWs, DAPs, and Aggs connect to RCs using Service discovery. In particular, DAP-RC and Agg-RC connections are made by ordinal.

Ordinal connection

If you use Service discovery to connect to RCs, DAPs and Aggs select their RC by ordinal. A process's ordinal is the number following the last "-" or "_" in the process's host name. If a process's host name has no number following the last "-" or "_", then its ordinal is 0. For example:

host name                 ordinal
resource-coordinator-3    3
dap-hdb_11                11
aggregatorOne             0
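The ordinal rule described above can be sketched in a few lines. This is an illustration rather than the product's own parsing code; the function name is hypothetical, and it assumes that everything after the last "-" or "_" must be digits for the suffix to count as a number.

```python
def ordinal(host_name):
    """Return the number after the last '-' or '_' in host_name, else 0."""
    cut = max(host_name.rfind("-"), host_name.rfind("_"))  # -1 if neither occurs
    suffix = host_name[cut + 1:] if cut >= 0 else ""
    # Assumption: the whole suffix must consist of ASCII digits.
    return int(suffix) if suffix.isascii() and suffix.isdigit() else 0


examples = {name: ordinal(name)
            for name in ("resource-coordinator-3", "dap-hdb_11", "aggregatorOne")}
```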

It is important to use properly numbered RC, DAP, and Agg replicas with sequential ordinals. Use Kubernetes StatefulSets or Docker Compose replicas to do this.

In kdb Insights, for this method to work, RCs must set the KXI_RC_STS_SIZE environment variable to the total number of RCs.

KXI_RC_STS_SIZE=<total_number_of_RCs>

In kdb Insights Enterprise, this environment variable is automatically set.

A DAP or Agg with ordinal n connects to the (unique) RC whose ordinal is congruent to n modulo KXI_RC_STS_SIZE. For example, in a system with 6 DAPs and KXI_RC_STS_SIZE=3:

DAP      RC
dap-0    rc-0
dap-1    rc-1
dap-2    rc-2
dap-3    rc-0
dap-4    rc-1
dap-5    rc-2
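The modulo rule can be checked with a small sketch that reproduces the example for 6 DAPs and KXI_RC_STS_SIZE=3. The function name is hypothetical, and the "dap-" and "rc-" prefixes are just the host names used in the example.

```python
def rc_ordinal_for(process_ordinal, rc_sts_size):
    """Return the ordinal of the RC that a DAP or Agg with this ordinal uses."""
    if rc_sts_size < 1:
        raise ValueError("KXI_RC_STS_SIZE must be at least 1")
    return process_ordinal % rc_sts_size


# 6 DAPs, KXI_RC_STS_SIZE=3, as in the example above.
assignment = {f"dap-{n}": f"rc-{rc_ordinal_for(n, 3)}" for n in range(6)}
```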

Note

  • You must have at least as many DAPs and Aggs as RCs so that each RC has at least one DAP and one Agg.
  • We recommend that the number of DAPs and the number of Aggs each be a multiple of the number of RCs, so that every RC has equal query throughput capacity.
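These two notes follow directly from the modulo rule, as the sketch below shows. It assumes replicas carry sequential ordinals starting at 0, as they would under a StatefulSet. The function names are hypothetical and the check is not part of kdb Insights.

```python
def replicas_per_rc(replica_count, rc_count):
    """Count how many replicas with ordinals 0..replica_count-1 land on each RC."""
    load = {rc: 0 for rc in range(rc_count)}
    for n in range(replica_count):
        load[n % rc_count] += 1
    return load


def check_layout(replica_count, rc_count):
    """Report whether every RC is served and whether the load is even."""
    load = replicas_per_rc(replica_count, rc_count)
    return {
        "every_rc_served": min(load.values()) >= 1,  # first note
        "balanced": len(set(load.values())) == 1,    # second note
    }


multiple = check_layout(6, 3)   # 6 DAPs over 3 RCs: 2 each
too_few = check_layout(2, 3)    # rc-2 gets no DAP
uneven = check_layout(4, 3)     # rc-0 gets 2, the others get 1
```

Run the same check separately for each DAP type and label set, and for Aggs, since each group is distributed over the RCs independently.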