Monitoring
The kdb Insights Database can be configured to report metrics about data ingested and queries serviced to a monitoring endpoint. The metrics are scraped by the kdb Insights sidecar container, which uses the kxi-sidecar image and can be set to report to an event monitoring and alerting application such as Prometheus.
Query Metrics
Component | Name | Type | Description |
---|---|---|---|
SG | kxi_sg_ipc_requests_total | counter | Total QIPC requests |
SG | kxi_sg_ipc_responses_total | counter | Total QIPC responses |
SG | kxi_sg_http_requests_total | counter | Total HTTP requests |
SG | kxi_sg_http_responses_total | counter | Total HTTP responses |
SG | kxi_sg_pending | gauge | Number of pending queries (both HTTP and IPC) |
SG | kxi_sg_connected_aggregators | gauge | Number of connected aggregators |
SG | kxi_sg_connected_coordinators | gauge | Number of connected coordinators |
SG | kxi_sg_connected_clients | gauge | Number of connected q clients |
RC | kxi_rc_reqs_total | counter | Service requests received |
RC | kxi_rc_queue_length | gauge | Length of the outstanding request queue |
RC | kxi_rc_connected_daps | gauge | Number of connected target DAPs |
RC | kxi_rc_connected_aggs | gauge | Number of connected Aggs |
RC | kxi_rc_retry_count | counter | Total number of request retry attempts |
RC | kxi_rc_req_complete_time | histogram | Histogram of request completion times |
Agg | kxi_agg_fn_time | histogram | Histogram of duration of aggregation functions |
Agg | kxi_agg_errors | counter | Number of errors from aggregation functions |
Agg | kxi_agg_timeouts | counter | Number of timeouts for requests for this agg |
Agg | kxi_agg_partials_received | counter | Number of partial responses received |
Agg | kxi_agg_requests_held | counter | Number of requests in progress |
Agg | kxi_agg_http_json_reqs | counter | Number of HTTP JSON requests |
Agg | kxi_agg_http_octet_reqs | counter | Number of HTTP octet stream requests |
Agg | kxi_agg_ipc_reqs | counter | Number of IPC requests |
DA | kxi_da_purview_start | gauge | Start timestamp of DA purview |
DA | kxi_da_purview_end | gauge | End timestamp of DA purview |
DA | kxi_da_records_after_purge | gauge | Total records remaining after a purge |
DA | kxi_da_stream_msgs | counter | Number of inbound messages received |
DA | kxi_da_stream_records | counter | Number of inbound records received |
DA | kxi_da_stream_pos | counter | Current RT stream position |
DA | kxi_da_requests | counter | Count of requests received in interval |
DA | kxi_da_failed_requests | counter | Count of failed requests received in interval |
DA | kxi_da_request_time | histogram | Duration in milliseconds of requests received in interval. Buckets can be set with the KXI_REQUEST_METRIC_BUCKETS environment variable. Default "50 100 500 1000 2000 10000" |
SM | kxi_sm_clients | gauge | Currently connected clients |
SM | kxi_sm_stream_records | gauge | Number of records read from RT stream |
SM | kxi_sm_msgs | gauge | Number of messages read from RT stream |
SM | kxi_sm_eoi_requests_pending | gauge | EOI requests awaiting completion |
SM | kxi_sm_eod_requests_pending | gauge | EOD requests awaiting completion |
SM | kxi_sm_eoi_count | counter | Number of completed End of Interval runs |
SM | kxi_sm_eod_count | counter | Number of completed End of Day runs |
SM | kxi_sm_eoi_duration_seconds | gauge | Duration of the most recent End of Interval |
SM | kxi_sm_eod_duration_seconds | gauge | Duration of the most recent End of Day |
SM | kxi_sm_eoi_stream_pos | gauge | Current RT stream position |
SM | kxi_sm_eoi_records | gauge | Number of records written during EOI |
SM | kxi_sm_hdb_date_records | gauge | Number of total records in latest EOD partition |
SM | kxi_sm_hdb_size | gauge | Size of HDB (in MB) |
SM | kxi_sm_hdb_partitions | gauge | Number of partitions in HDB |
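The bucket values for kxi_da_request_time are upper bounds in milliseconds. As a rough illustration of how a Prometheus-style histogram counts observations cumulatively against such bounds, the Python sketch below parses the default bucket string and tallies a few request durations. The function names and the sample durations are hypothetical, and this is not the DA's own implementation.

```python
import os

# Default bucket string as documented for KXI_REQUEST_METRIC_BUCKETS.
DEFAULT_BUCKETS = "50 100 500 1000 2000 10000"


def parse_buckets(raw):
    """Parse a space-separated bucket string into sorted upper bounds (ms)."""
    return sorted(float(token) for token in raw.split())


def cumulative_counts(durations_ms, bounds):
    """Count observations less than or equal to each bound.

    Prometheus histograms are cumulative: each bucket includes every
    observation from the smaller buckets, and a final +Inf bucket holds
    the total number of observations.
    """
    counts = {bound: sum(1 for d in durations_ms if d <= bound) for bound in bounds}
    counts[float("inf")] = len(durations_ms)
    return counts


# Use the environment variable if it is set, otherwise the documented default.
bounds = parse_buckets(os.environ.get("KXI_REQUEST_METRIC_BUCKETS", DEFAULT_BUCKETS))

# Hypothetical request durations in milliseconds, for illustration only.
sample_durations = [30, 75, 75, 1500, 20000]
for bound, count in cumulative_counts(sample_durations, bounds).items():
    print(f"le={bound}: {count}")
```

Choosing bounds close to the latencies you actually care about (for example, an SLA threshold) makes the resulting buckets more useful for alerting.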
Configuration
Service Gateway
To enable metrics for the service gateway, configure the following environment variables:
- name: KXI_SG_METRICS_ENABLED
  value: "true"
- name: KXI_SG_METRICS_ENDPOINT
  value: /metrics
- name: KXI_SG_METRICS_PORT
  value: "8081"
Once these variables are configured, you can retrieve metrics by querying http://localhost:8081/metrics.
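To check the endpoint by hand, a small script can fetch the page and read the values. The following is a minimal Python sketch, assuming the endpoint returns the standard Prometheus text exposition format with one sample per line and no trailing timestamps; the function names are hypothetical and only the standard library is used.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample_name: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Labels,
    if present, stay part of the key. Trailing timestamps are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url="http://localhost:8081/metrics", timeout=5.0):
    """Fetch the metrics page and return the parsed samples."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

With a gateway running locally, `fetch_metrics().get("kxi_sg_pending")` would return the current number of pending queries, or `None` if that sample is not exposed.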
You may wish to configure your service gateway so that internal probes, such as a ServiceMonitor, can reach port 8081 over HTTP; this technique is referred to as scraping metrics.
How you scrape metrics depends on how your services are deployed (e.g. in Kubernetes), and what your monitoring stack is (e.g. Prometheus).
For example, within Kubernetes, you may configure a ServiceMonitor and have it reference a named 'metrics' port that is defined in a corresponding Service.
Below are partial snippets of ServiceMonitor and Service YAML which refer to a port by name:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
  - port: "metrics"
    path: "/metrics"
    interval: "2m"
Ensure that the port name used in the ServiceMonitor endpoints matches the port name in the corresponding Service YAML:
apiVersion: v1
kind: Service
spec:
  type: ClusterIP
  ports:
  - name: metrics
    protocol: TCP
    port: 8081
    targetPort: 8081
...
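A mismatch between the two port names is easy to miss, because nothing fails loudly when the ServiceMonitor simply finds no matching port. As a minimal sketch, the Python below compares the names in two manifests that have already been loaded into dictionaries; the helper name `missing_port_names` is hypothetical, and the dictionaries mirror only the partial snippets above.

```python
def missing_port_names(service_monitor, service):
    """Return ServiceMonitor endpoint port names that the Service does not define."""
    wanted = {
        endpoint["port"]
        for endpoint in service_monitor["spec"]["endpoints"]
        if "port" in endpoint
    }
    defined = {
        port["name"]
        for port in service["spec"]["ports"]
        if "name" in port
    }
    return sorted(wanted - defined)


# Dictionaries equivalent to the partial YAML snippets shown above.
service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "spec": {
        "endpoints": [{"port": "metrics", "path": "/metrics", "interval": "2m"}]
    },
}
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "spec": {
        "type": "ClusterIP",
        "ports": [
            {"name": "metrics", "protocol": "TCP", "port": 8081, "targetPort": 8081}
        ],
    },
}

# An empty list means every referenced port name is defined on the Service.
print(missing_port_names(service_monitor, service))
```

In practice the dictionaries would come from parsing the manifests with a YAML library of your choice; that step is left out here to keep the sketch dependency-free.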
q containers
A Docker Compose example of how to set up kxi-sidecar and Prometheus in an environment is detailed below. First, the sidecar and Prometheus processes need to be added to the Docker Compose file.
rdb: # Data Access Process RDB
  image: kxi-da
  command: -p 5080
  environment:
    - KXI_NAME=rdb
    - KXI_PORT=5080
    - KXI_SC=RDB
    - KXI_ASSEMBLY_FILE=/opt/kx/cfg/assembly/assembly.yml
  networks:
    - kx
rdb-sidecar: # Sidecar for data access process named RDB
  image: kxi_sidecar:0.9.0 # Can be pulled from kdb Insights repo
  command: -p 8080
  environment:
    - KXI_CONFIG_FILE=/opt/kx/cfg/docker/rdb.json
  networks:
    - kx
  volumes:
    - ./config:/etc/kx/cfg # Make rdb.json available to container
prometheus: # Prometheus monitoring
  image: prometheus
  command: --config.file=/etc/prometheus/prometheus.yml
  ports:
    - "8080:8080"
  networks:
    - kx
  volumes:
    - ./config:/etc/prometheus # Make prometheus configuration available
An example rdb.json file is shown below. In it, the connection field points to the main DAP container, and metrics.frequency sets the sidecar to scrape that container every 5 seconds.
{
    "connection": ":rdb_1:5080",
    "frequencySecs": 5,
    "metrics": {
        "enabled": "true",
        "frequency": 5,
        "handler": {
            "pc": true,
            "pg": true,
            "ph": true,
            "po": true,
            "pp": true,
            "ps": true,
            "ts": true,
            "wc": true,
            "wo": true,
            "ws": true
        }
    }
}
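Before starting the containers, it can help to confirm that the file parses and points where you expect. The Python sketch below reads a sidecar configuration of the shape shown above and summarizes it. The function name and the summary fields are hypothetical, and treating "enabled" as either the string "true" or a boolean is an assumption based on this example rather than on a documented schema.

```python
import json


def summarize_sidecar_config(raw):
    """Summarize the connection target and metrics settings of a sidecar config."""
    config = json.loads(raw)

    # The connection string has the form ":host:port"; drop the leading colon.
    host, _, port = config["connection"].lstrip(":").rpartition(":")

    metrics = config.get("metrics", {})
    # Assumption: "enabled" may be the string "true" (as above) or a boolean.
    enabled = str(metrics.get("enabled", "false")).lower() == "true"
    # Names of the handler entries that are switched on.
    handlers = sorted(name for name, on in metrics.get("handler", {}).items() if on)

    return {
        "host": host,
        "port": int(port),
        "enabled": enabled,
        "frequency": metrics.get("frequency"),
        "handlers": handlers,
    }
```

For the example file, `summarize_sidecar_config(open("config/rdb.json").read())` would report host `rdb_1`, port `5080`, and a frequency of `5`, assuming the file lives under `./config` as in the Compose volume mapping.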
An example prometheus.yml, set to scrape the sidecar process every 15 seconds, is shown below.
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
scrape_configs:
  - job_name: 'rdb-monitoring'
    static_configs:
      - targets: ['rdb-sidecar_1:8080'] # Point to RDB's sidecar
Storage Metrics
When integrating with the monitoring sidecar, the following metrics will be available.
component | type | name | description |
---|---|---|---|
SM | gauge | kxi_sm_clients | Currently connected clients |
SM | gauge | kxi_sm_stream_records | Number of records read from RT stream |
SM | gauge | kxi_sm_msgs | Number of messages read from RT stream |
SM | gauge | kxi_sm_eoi_requests_pending | EOI requests awaiting completion |
SM | gauge | kxi_sm_eod_requests_pending | EOD requests awaiting completion |
SM | counter | kxi_sm_eoi_count | Number of completed End of Interval runs |
SM | counter | kxi_sm_eod_count | Number of completed End of Day runs |
SM | gauge | kxi_sm_eoi_duration_seconds | Duration of the most recent End of Interval |
SM | gauge | kxi_sm_eod_duration_seconds | Duration of the most recent End of Day |
SM | gauge | kxi_sm_eoi_stream_pos | Current RT stream position |
SM | gauge | kxi_sm_eoi_records | Number of records written during EOI |
SM | gauge | kxi_sm_hdb_date_records | Number of total records in latest EOD partition |
SM | gauge | kxi_sm_hdb_size | Size of HDB (in MB) |
SM | gauge | kxi_sm_hdb_partitions | Number of partitions in HDB |