Monitoring

The kdb Insights Database can be configured to report metrics about the data it ingests and the queries it services to a monitoring endpoint. The metrics are scraped by the kdb Insights sidecar container, which uses the kxi-sidecar image and can be set to report to an event monitoring and alerting application such as Prometheus.

Query Metrics

Component	Name	Type	Description
SG	`kxi_sg_ipc_requests_total`	counter	Total QIPC requests
SG	`kxi_sg_ipc_responses_total`	counter	Total QIPC responses
SG	`kxi_sg_http_requests_total`	counter	Total HTTP requests
SG	`kxi_sg_http_responses_total`	counter	Total HTTP responses
SG	`kxi_sg_pending`	gauge	Number of pending queries (both HTTP and IPC)
SG	`kxi_sg_connected_aggregators`	gauge	Number of connected aggregators
SG	`kxi_sg_connected_coordinators`	gauge	Number of connected coordinators
SG	`kxi_sg_connected_clients`	gauge	Number of connected q clients
RC	`kxi_rc_reqs_total`	counter	Service requests received
RC	`kxi_rc_queue_length`	gauge	Length of the outstanding request queue
RC	`kxi_rc_connected_daps`	gauge	Number of connected target DAPs
RC	`kxi_rc_connected_aggs`	gauge	Number of connected Aggs
RC	`kxi_rc_retry_count`	counter	Total number of request retry attempts
RC	`kxi_rc_req_complete_time`	histogram	Histogram of request completion times
Agg	`kxi_agg_fn_time`	histogram	Histogram of duration of aggregation functions
Agg	`kxi_agg_errors`	counter	Number of errors from aggregation functions
Agg	`kxi_agg_timeouts`	counter	Number of timeouts for requests for this agg
Agg	`kxi_agg_partials_received`	counter	Number of partial responses received
Agg	`kxi_agg_requests_held`	counter	Number of requests in progress
Agg	`kxi_agg_http_json_reqs`	counter	Number of HTTP JSON requests
Agg	`kxi_agg_http_octet_reqs`	counter	Number of HTTP octet stream requests
Agg	`kxi_agg_ipc_reqs`	counter	Number of IPC requests
DA	`kxi_da_purview_start`	gauge	Start timestamp of DA purview
DA	`kxi_da_purview_end`	gauge	End timestamp of DA purview
DA	`kxi_da_records_after_purge`	gauge	Total records remaining after a purge
DA	`kxi_da_stream_msgs`	counter	Number of inbound messages received
DA	`kxi_da_stream_records`	counter	Number of inbound records received
DA	`kxi_da_stream_pos`	counter	Current RT stream position
DA	`kxi_da_requests`	counter	Count of requests received in interval
DA	`kxi_da_failed_requests`	counter	Count of failed requests received in interval
DA	`kxi_da_request_time`	histogram	Duration, in milliseconds, of requests received in interval. Buckets can be set with the `KXI_REQUEST_METRIC_BUCKETS` environment variable. Default: "50 100 500 1000 2000 10000"
SM	`kxi_sm_clients`	gauge	Currently connected clients
SM	`kxi_sm_stream_records`	gauge	Number of records read from RT stream
SM	`kxi_sm_msgs`	gauge	Number of messages read from RT stream
SM	`kxi_sm_eoi_requests_pending`	gauge	EOI requests awaiting completion
SM	`kxi_sm_eod_requests_pending`	gauge	EOD requests awaiting completion
SM	`kxi_sm_eoi_count`	counter	Number of completed End of Interval runs
SM	`kxi_sm_eod_count`	counter	Number of completed End of Day runs
SM	`kxi_sm_eoi_duration_seconds`	gauge	Duration of the most recent End of Interval
SM	`kxi_sm_eod_duration_seconds`	gauge	Duration of the most recent End of Day
SM	`kxi_sm_eoi_stream_pos`	gauge	Current RT stream position
SM	`kxi_sm_eoi_records`	gauge	Number of records written during EOI
SM	`kxi_sm_hdb_date_records`	gauge	Number of total records in latest EOD partition
SM	`kxi_sm_hdb_size`	gauge	Size of HDB (in MB)
SM	`kxi_sm_hdb_partitions`	gauge	Number of partitions in HDB
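
To illustrate how the bucket list for `kxi_da_request_time` is read, here is a minimal Python sketch. It assumes the value of `KXI_REQUEST_METRIC_BUCKETS` is a space-separated list of upper bounds in milliseconds (as the quoted default suggests) and applies the usual Prometheus convention of cumulative "less than or equal" buckets plus a final +Inf bucket. The function names and sample durations are hypothetical, and this is not a description of how kdb Insights computes the histogram internally.

```python
import bisect

def parse_buckets(spec: str) -> list[float]:
    """Parse a space-separated bucket list such as "50 100 500" into sorted upper bounds."""
    return sorted(float(tok) for tok in spec.split())

def cumulative_counts(durations_ms: list[float], bounds: list[float]) -> dict[float, int]:
    """Return Prometheus-style cumulative counts per upper bound, ending with +Inf."""
    per_bucket = [0] * (len(bounds) + 1)
    for d in durations_ms:
        # bisect_left finds the first bound >= d, i.e. the smallest bucket that holds d.
        # Durations above every bound fall into the last slot (+Inf).
        per_bucket[bisect.bisect_left(bounds, d)] += 1
    counts: dict[float, int] = {}
    running = 0
    for bound, n in zip(bounds + [float("inf")], per_bucket):
        running += n  # each bucket also counts everything in the smaller buckets
        counts[bound] = running
    return counts

DEFAULT_BUCKETS = "50 100 500 1000 2000 10000"  # default quoted in the table above
bounds = parse_buckets(DEFAULT_BUCKETS)
# Hypothetical request durations in milliseconds, for illustration only.
counts = cumulative_counts([12, 50, 75, 640, 15000], bounds)
```

With these sample durations, the 50 ms bucket holds two requests, the 1000 ms bucket holds four, and the +Inf bucket holds all five.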

Configuration

Service Gateway

To enable metrics for the service gateway, configure the following environment variables:

- name: KXI_SG_METRICS_ENABLED
  value: "true"
- name: KXI_SG_METRICS_ENDPOINT
  value: /metrics
- name: KXI_SG_METRICS_PORT
  value: "8081"

Once these variables are configured, you can retrieve metrics by querying http://localhost:8081/metrics.
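
As a rough sketch of what querying that endpoint can look like, the Python below fetches the URL above and reads the response, assuming it is served in the Prometheus text exposition format. The helper names `parse_metrics` and `fetch_metrics` are hypothetical, the sample values are made up for illustration, and the parser deliberately ignores optional trailing timestamps.

```python
from urllib.request import urlopen

def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # A sample line looks like "<name>[{labels}] <value>".
        # This sketch assumes no trailing timestamp is present.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

def fetch_metrics(url: str = "http://localhost:8081/metrics", timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse metrics from the service gateway endpoint configured above."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))

# Illustrative payload only; real values come from the running service gateway.
SAMPLE = """\
# TYPE kxi_sg_pending gauge
kxi_sg_pending 3
# TYPE kxi_sg_http_requests_total counter
kxi_sg_http_requests_total 120
"""
sample = parse_metrics(SAMPLE)
```

Against a running gateway, `fetch_metrics()` would return the same kind of dictionary, keyed by the metric names listed under Query Metrics.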

You may wish to configure your service gateway so that internal probes, such as a ServiceMonitor, can reach port 8081 over HTTP; this technique is referred to as scraping metrics.

How you scrape metrics depends on how your services are deployed (e.g. in Kubernetes), and what your monitoring stack is (e.g. Prometheus).

For example, within Kubernetes, you can configure a ServiceMonitor that references a named 'metrics' port defined in a corresponding Service.

Below are partial snippets of ServiceMonitor and Service YAML that refer to a port by name:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
    - port: "metrics"
      path: "/metrics"
      interval: "2m"

Ensure that the port name used in the ServiceMonitor endpoints matches the port name in the corresponding Service YAML:

apiVersion: v1
kind: Service
spec:
  type: ClusterIP
  ports:
    - name: metrics
      protocol: TCP
      port: 8081
      targetPort: 8081
...
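
The port-name match described above can be checked mechanically. The following Python sketch works on plain dictionaries that mirror the two partial manifests (loading the YAML files is left out to avoid assuming a particular YAML library). The function name `unmatched_endpoint_ports` is hypothetical.

```python
def unmatched_endpoint_ports(service_monitor: dict, service: dict) -> list[str]:
    """Return ServiceMonitor endpoint port names that the Service does not define."""
    service_ports = {p.get("name") for p in service.get("spec", {}).get("ports", [])}
    endpoints = service_monitor.get("spec", {}).get("endpoints", [])
    return [e["port"] for e in endpoints if "port" in e and e["port"] not in service_ports]

# Dictionaries mirroring the partial ServiceMonitor and Service snippets above.
service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "spec": {"endpoints": [{"port": "metrics", "path": "/metrics", "interval": "2m"}]},
}
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "spec": {
        "type": "ClusterIP",
        "ports": [{"name": "metrics", "protocol": "TCP", "port": 8081, "targetPort": 8081}],
    },
}

missing = unmatched_endpoint_ports(service_monitor, service)  # empty when the names line up
```

An empty result means every endpoint in the ServiceMonitor refers to a port name that the Service defines; any names returned would need to be corrected in one of the two manifests.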

q containers

A Docker Compose example of how to set up kxi-sidecar and Prometheus in an environment is detailed below. First, the sidecar and Prometheus processes need to be added to the Docker Compose file.

  rdb: # Data Access Process RDB
    image: kxi-da
    command: -p 5080
    environment:
      - KXI_NAME=rdb
      - KXI_PORT=5080
      - KXI_SC=RDB
      - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/assembly.yml
    networks:
      - kx

  rdb-sidecar: # Sidecar for data access process named RDB
    image: kxi_sidecar:0.9.0 # Can be pulled from kdb Insights repo
    command: -p 8080
    environment:
      - KXI_CONFIG_FILE=/opt/kx/cfg/docker/rdb.json
    networks:
      - kx
    volumes:
      - ./config:/etc/kx/cfg # Make rdb.json available to container

  prometheus: # Prometheus monitoring
    image: prometheus
    command: --config.file=/etc/prometheus/prometheus.yml
    ports:
      - "8080:8080"
    networks:
      - kx
    volumes:
      - ./config:/etc/prometheus # Make prometheus configuration available

An example rdb.json file is shown below. The connection field points to the main DAP container, and metrics.frequency sets the sidecar to scrape that container every 5 seconds.

{
    "connection": ":rdb_1:5080",
    "frequencySecs": 5,
    "metrics":
    {
        "enabled": "true",
        "frequency": 5,
        "handler": {
            "pc": true,
            "pg": true,
            "ph": true,
            "po": true,
            "pp": true,
            "ps": true,
            "ts": true,
            "wc": true,
            "wo": true,
            "ws": true
        }
    }
}
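
Before mounting a file like this into the sidecar, it can help to sanity-check it. The Python sketch below reads a configuration with the same field names as the example and extracts the target host, port, and scrape frequency. It assumes the connection value has the form `:host:port`, as in the example. The helper names are hypothetical, and the sidecar itself may accept other forms.

```python
import json

def parse_connection(conn: str) -> tuple[str, int]:
    """Split a ':host:port' connection string into (host, port)."""
    prefix, host, port = conn.split(":")  # raises ValueError if the shape is different
    if prefix != "" or not host:
        raise ValueError(f"expected ':host:port', got {conn!r}")
    return host, int(port)

def summarize(config_text: str) -> dict:
    """Extract the scrape target and metrics settings from a sidecar config."""
    cfg = json.loads(config_text)
    host, port = parse_connection(cfg["connection"])
    metrics = cfg.get("metrics", {})
    return {
        "host": host,
        "port": port,
        # The example stores "enabled" as the string "true", so compare as text.
        "enabled": str(metrics.get("enabled")).lower() == "true",
        "frequency": metrics.get("frequency"),
    }

# Abbreviated copy of the example above (handler list shortened for brevity).
RDB_JSON = """
{
    "connection": ":rdb_1:5080",
    "frequencySecs": 5,
    "metrics": {"enabled": "true", "frequency": 5, "handler": {"pc": true, "pg": true}}
}
"""
summary = summarize(RDB_JSON)
```

Here `summary` reports host `rdb_1`, port 5080, metrics enabled, and a frequency of 5, matching the values in the example file.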

An example prometheus.yml that scrapes the sidecar process every 15 seconds is shown below.

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  - job_name: 'rdb-monitoring'
    static_configs:
      - targets: ['rdb-sidecar_1:8080'] # Point to RDB's sidecar

Storage Metrics

When integrating with the monitoring sidecar, the following metrics will be available.

Component	Type	Name	Description
SM	gauge	`kxi_sm_clients`	Currently connected clients
SM	gauge	`kxi_sm_stream_records`	Number of records read from RT stream
SM	gauge	`kxi_sm_msgs`	Number of messages read from RT stream
SM	gauge	`kxi_sm_eoi_requests_pending`	EOI requests awaiting completion
SM	gauge	`kxi_sm_eod_requests_pending`	EOD requests awaiting completion
SM	counter	`kxi_sm_eoi_count`	Number of completed End of Interval runs
SM	counter	`kxi_sm_eod_count`	Number of completed End of Day runs
SM	gauge	`kxi_sm_eoi_duration_seconds`	Duration of the most recent End of Interval
SM	gauge	`kxi_sm_eod_duration_seconds`	Duration of the most recent End of Day
SM	gauge	`kxi_sm_eoi_stream_pos`	Current RT stream position
SM	gauge	`kxi_sm_eoi_records`	Number of records written during EOI
SM	gauge	`kxi_sm_hdb_date_records`	Number of total records in latest EOD partition
SM	gauge	`kxi_sm_hdb_size`	Size of HDB (in MB)
SM	gauge	`kxi_sm_hdb_partitions`	Number of partitions in HDB