Monitoring

Data Access can be configured to report metrics about ingested data and serviced queries to a monitoring endpoint. The metrics are scraped by the Insights sidecar container, which uses the kxi-sidecar image and can be set to report to an event monitoring and alerting application such as Prometheus.

Metrics

The Data Access Processes (DA) and the Service Gateway components, namely the Gateway (SG), Resource Coordinator (RC) and Aggregator (Agg), currently report the following metrics:

Component  Name                           Type       Description
DA         kxi_da_purview_start           gauge      Start timestamp of DA purview
DA         kxi_da_purview_end             gauge      End timestamp of DA purview
DA         kxi_da_records_after_purge     gauge      Total records remaining after a purge
DA         kxi_da_stream_msgs             counter    Number of inbound messages received
DA         kxi_da_stream_records          counter    Number of inbound records received
DA         kxi_da_stream_pos              counter    Current RT stream position
DA         kxi_da_requests                counter    Count of requests received in interval
DA         kxi_da_failed_requests         counter    Count of failed requests received in interval
DA         kxi_da_request_time            summary    Duration of requests received in interval
SG         kxi_sg_ipc_requests_total      counter    Total qIPC requests
SG         kxi_sg_ipc_responses_total     counter    Total qIPC responses
SG         kxi_sg_http_requests_total     counter    Total HTTP requests
SG         kxi_sg_http_responses_total    counter    Total HTTP responses
SG         kxi_sg_pending                 gauge      Number of pending queries (both HTTP and IPC)
SG         kxi_sg_connected_aggregators   gauge      Number of connected aggregators
SG         kxi_sg_connected_coordinators  gauge      Number of connected coordinators
SG         kxi_sg_connected_clients       gauge      Number of connected q clients
RC         kxi_rc_reqs_total              counter    Service requests received
RC         kxi_rc_queue_length            gauge      Length of the outstanding request queue
RC         kxi_rc_connected_daps          gauge      Number of connected target DAPs
RC         kxi_rc_connected_aggs          gauge      Number of connected Aggregators
RC         kxi_rc_retry_count             counter    Total number of request retry attempts
RC         kxi_rc_req_complete_time       histogram  Histogram of request completion times
Agg        kxi_agg_fn_time                histogram  Histogram of duration of aggregation functions
Agg        kxi_agg_errors                 counter    Number of errors from aggregation functions
Agg        kxi_agg_timeouts               counter    Number of timeouts for requests for this Aggregator
Agg        kxi_agg_partials_received      counter    Number of partial responses received
Agg        kxi_agg_requests_held          counter    Number of requests in progress
Agg        kxi_agg_http_json_reqs         counter    Number of HTTP JSON requests
Agg        kxi_agg_http_octet_reqs        counter    Number of HTTP octet-stream requests
Agg        kxi_agg_ipc_reqs               counter    Number of IPC requests
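
As an illustration of how these counters might be combined, the sketch below derives a request failure ratio from two successive readings of kxi_da_requests and kxi_da_failed_requests. It assumes both behave as cumulative Prometheus counters that restart from zero when the process restarts; the readings and the helper names are made up for the example and are not part of Data Access.

```python
def counter_delta(previous, current):
    """Increase of a cumulative counter between two readings.

    If the value dropped, the process is assumed to have restarted,
    so the current reading is taken as the increase since the restart.
    """
    return current if current < previous else current - previous


def failure_ratio(prev, curr):
    """Share of requests that failed between two readings (0.0 if no requests)."""
    requests = counter_delta(prev["kxi_da_requests"], curr["kxi_da_requests"])
    failed = counter_delta(prev["kxi_da_failed_requests"], curr["kxi_da_failed_requests"])
    return failed / requests if requests else 0.0


# Hypothetical readings taken one scrape apart.
prev = {"kxi_da_requests": 120.0, "kxi_da_failed_requests": 3.0}
curr = {"kxi_da_requests": 170.0, "kxi_da_failed_requests": 5.0}

if __name__ == "__main__":
    # 2 failed requests out of 50 received between the two readings.
    print(failure_ratio(prev, curr))
```

In a Prometheus deployment the same calculation would normally be written as a query over the scraped series rather than in application code.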

Configuration

A Docker Compose example of how to set up the kxi-sidecar and Prometheus in an environment is detailed below. First, the sidecar and Prometheus processes need to be added to the Docker Compose file.

  rdb: # Data Access Process RDB
    image: kxi-da
    command: -p 5080
    environment:
      - KXI_NAME=rdb
      - KXI_PORT=5080
      - KXI_SC=RDB
      - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/assembly.yml
    networks:
      - kx

  rdb-sidecar: # Sidecar for data access process named RDB
    image: kxi-sidecar:0.9.0 # Can be pulled from Insights repo
    command: -p 8080
    environment:
      - KXI_CONFIG_FILE=/opt/kx/cfg/docker/rdb.json
    networks:
      - kx
    volumes:
      - ./config:/opt/kx/cfg/docker # Make rdb.json available at the path set in KXI_CONFIG_FILE

  prometheus: # Prometheus monitoring
    image: prom/prometheus # Official Prometheus image on Docker Hub
    command: --config.file=/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090" # Prometheus serves its UI and API on port 9090 by default
    networks:
      - kx
    volumes:
      - ./config:/etc/prometheus # Make prometheus configuration available

An example rdb.json file is shown below. The connection field points to the main DAP container, and metrics.frequency sets the sidecar to scrape that container every 5 seconds.

{
    "connection": ":rdb_1:5080",
    "frequencySecs": 5,
    "metrics": {
        "enabled": "true",
        "frequency": 5,
        "handler": {
            "pc": true,
            "pg": true,
            "ph": true,
            "po": true,
            "pp": true,
            "ps": true,
            "ts": true,
            "wc": true,
            "wo": true,
            "ws": true
        }
    }
}
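
Before starting the containers, it can help to sanity-check the sidecar configuration. The sketch below is one possible check, not an official schema: the function name and the rules it applies (a connection of the form :host:port, metrics enabled, a positive metrics.frequency) are assumptions inferred from the example above.

```python
import json


def check_sidecar_config(cfg):
    """Return a list of problems found in a sidecar config dict (empty if none)."""
    problems = []

    # The example writes the connection as ":host:port".
    parts = str(cfg.get("connection", "")).split(":")
    if len(parts) != 3 or parts[0] or not parts[1] or not parts[2].isdigit():
        problems.append("connection should look like ':host:port'")

    metrics = cfg.get("metrics", {})
    # The example stores "enabled" as the string "true"; a boolean is accepted too.
    if str(metrics.get("enabled", "")).lower() != "true":
        problems.append("metrics.enabled is not true")

    frequency = metrics.get("frequency")
    if isinstance(frequency, bool) or not isinstance(frequency, (int, float)) or frequency <= 0:
        problems.append("metrics.frequency should be a positive number of seconds")

    return problems


# Trimmed copy of the example configuration.
EXAMPLE = json.loads("""
{
    "connection": ":rdb_1:5080",
    "metrics": {"enabled": "true", "frequency": 5}
}
""")

if __name__ == "__main__":
    print(check_sidecar_config(EXAMPLE) or "no problems found")
```

To check a real file, load it with json.load and pass the resulting dict to check_sidecar_config.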

An example prometheus.yml is shown below. It sets Prometheus to scrape its targets every 15 seconds and lists the sidecar process as the scrape target.

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  - job_name: 'rdb-monitoring'
    static_configs:
      - targets: ['rdb-sidecar_1:8080'] # Point to RDB's sidecar
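
Because this configuration does not override metrics_path, Prometheus requests the default /metrics path on each target. The sketch below shows one way to inspect what the sidecar serves at that path by parsing the Prometheus text exposition format. The parser is deliberately simplified, the function names are illustrative, and the sample payload (including the api label) is invented for the example.

```python
import urllib.request


def parse_metrics(text):
    """Return (name, labels, value) tuples from Prometheus text exposition format.

    Simplified: comment lines are skipped and optional timestamps are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{label="value",...} value [timestamp]
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            # name value [timestamp]
            name, rest = line.split(None, 1)
            labels = ""
        samples.append((name.strip(), labels, float(rest.split()[0])))
    return samples


def fetch_metrics(url):
    """Fetch and parse the metrics page served at `url`."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical payload, for illustration only.
SAMPLE = """\
# HELP kxi_da_stream_msgs Number of inbound messages received
# TYPE kxi_da_stream_msgs counter
kxi_da_stream_msgs 42
kxi_da_requests{api="getData"} 7
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

From a container attached to the kx network, calling fetch_metrics("http://rdb-sidecar_1:8080/metrics") would read the live values, assuming the sidecar is reachable under the hostname used in the scrape configuration.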