
Metrics Reference

This page lists the metrics generated by the Platform. See Overview for details on how metrics are captured, stored and presented. Each component generates a range of metrics. When metrics are enabled on a component, kdb+ info, memory and CPU metrics are generated.

License

kdb_info

This gauge metric details the kdb+ license. The license details are reported as labels on the metric.

metric name | type | description
kdb_info | gauge | kdb+ license info
kdb_info{license_expiry_date="2022.01.27", os_version="l64", process_cores="4", release_date="2021.06.12", release_version="4.1", service="component"}
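The sample above is written in the style of the Prometheus text exposition format. A line contains a metric name, optional labels in braces, and then the sample value. The kdb_info line on this page is shown without its value. You may want to inspect these labels outside Prometheus, for example from a raw scrape of a component's metrics endpoint. In that case, a minimal Python parser could look like the following. `parse_sample` is a hypothetical helper. It assumes that label values contain no escaped quotes or closing braces, so it is not a complete implementation of the format.

```python
import re
from typing import Dict, Optional, Tuple

# name, optional {labels}, optional value (simplified; see assumptions above)
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s*(?P<value>\S+)?\s*$'
)
# key="value" pairs inside the braces
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], Optional[float]]:
    """Split one exposition line into (metric name, labels, value or None)."""
    m = _SAMPLE.match(line.strip())
    if m is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(_LABEL.findall(m.group("labels") or ""))
    value = m.group("value")
    return m.group("name"), labels, float(value) if value is not None else None


name, labels, value = parse_sample(
    'kdb_info{license_expiry_date="2022.01.27", os_version="l64", '
    'process_cores="4", release_date="2021.06.12", release_version="4.1", '
    'service="component"}'
)
print(name, labels["release_version"], value)  # kdb_info 4.1 None
```

Because the labels come back as a plain dict, you can compare fields such as license_expiry_date directly, for example to warn before the license expires.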

Memory stats

Enabling global metrics captures kdb+ memory statistics.

metric name | type | description
memory_usage_bytes | gauge | Current memory usage
memory_heap_bytes | gauge | Heap memory
memory_heap_peak_bytes | gauge | Maximum heap size so far
memory_heap_limit_bytes | gauge | Heap memory limit
memory_mapped_bytes | gauge | Mapped memory
memory_physical_bytes | gauge | Physical memory
kdb_syms_total | gauge | Number of symbols
kdb_syms_memory_bytes | gauge | Memory used by symbols

memory_usage_bytes

This gauge metric reports the current memory usage in bytes of the component container. Output is taken from .Q.w[].

q) .Q.w[]`used
memory_usage_bytes{service="component"}  1010128

memory_heap_bytes

This gauge metric reports the memory available in the heap in bytes for the component container. Output is taken from .Q.w[].

q) .Q.w[]`heap
memory_heap_bytes{service="component"}  67108860

memory_heap_peak_bytes

This gauge metric reports the maximum heap size so far in bytes for the component container. Output is taken from .Q.w[].

q) .Q.w[]`peak
memory_heap_peak_bytes{service="component"}  67108860

memory_heap_limit_bytes

This gauge metric reports the heap memory limit, in bytes, for the component container, as set by the -w command-line option. A value of 0 means that no limit has been set.

Output is taken from .Q.w[].

q) .Q.w[]`wmax
memory_heap_limit_bytes{service="component"}  0
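In kdb+, a -w value of 0 (the default) means there is no heap limit. Any dashboard or alert that compares heap size with memory_heap_limit_bytes should therefore treat 0 as "unlimited" rather than as a zero-byte limit. Below is a minimal Python sketch, assuming you already have both gauge values as numbers. The function name `heap_headroom_bytes` and the 256 MiB limit in the second call are hypothetical.

```python
from typing import Optional


def heap_headroom_bytes(heap_bytes: float, heap_limit_bytes: float) -> Optional[float]:
    """Bytes left before the -w heap limit, or None when no limit is set.

    A memory_heap_limit_bytes value of 0 means kdb+ was started without -w,
    so there is nothing to compare the heap against.
    """
    if heap_limit_bytes == 0:
        return None
    # Clamp at zero in case the heap has already reached the limit.
    return max(heap_limit_bytes - heap_bytes, 0.0)


# Sample values from this page: heap 67108860 bytes, limit 0 (no -w set).
print(heap_headroom_bytes(67108860, 0))  # None
# Hypothetical 256 MiB limit, for illustration only.
print(heap_headroom_bytes(67108860, 268435456))
```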

memory_mapped_bytes

This gauge metric reports the mapped memory in bytes for the component container.

Output is taken from .Q.w[].

q) .Q.w[]`mmap
memory_mapped_bytes{service="component"}   0

memory_physical_bytes

This gauge metric reports the physical memory available in bytes for the component container.

Output is taken from .Q.w[].

q) .Q.w[]`mphy
memory_physical_bytes{service="component"}  16788270000

kdb_syms_total

This gauge metric reports the number of symbols for the component container.

Output is taken from .Q.w[].

q) .Q.w[]`syms
kdb_syms_total{service="component"}   2150

kdb_syms_memory_bytes

This gauge metric reports the memory use of symbols in bytes for the component container.

Output is taken from .Q.w[].

q) .Q.w[]`symw
kdb_syms_memory_bytes{service="component"}  106648
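Each metric in this section reads one key of the dictionary returned by .Q.w[]. The Python sketch below summarises that mapping and renders one sample line per key from a plain dict that stands in for the .Q.w[] result. It is a hypothetical illustration, not the Platform's exporter. The dict values are the samples shown on this page, and `render_memory_metrics` is an assumed name.

```python
from typing import Dict, List

# .Q.w[] key -> metric name, as documented in this section.
QW_TO_METRIC: Dict[str, str] = {
    "used": "memory_usage_bytes",
    "heap": "memory_heap_bytes",
    "peak": "memory_heap_peak_bytes",
    "wmax": "memory_heap_limit_bytes",
    "mmap": "memory_mapped_bytes",
    "mphy": "memory_physical_bytes",
    "syms": "kdb_syms_total",
    "symw": "kdb_syms_memory_bytes",
}


def render_memory_metrics(qw: Dict[str, int], service: str) -> List[str]:
    """Render one sample line per .Q.w[] key present in `qw`."""
    return [
        f'{QW_TO_METRIC[key]}{{service="{service}"}} {qw[key]}'
        for key in QW_TO_METRIC
        if key in qw
    ]


# Sample values from this page, standing in for a real .Q.w[] result.
sample = {"used": 1010128, "heap": 67108860, "peak": 67108860, "wmax": 0,
          "mmap": 0, "mphy": 16788270000, "syms": 2150, "symw": 106648}
for line in render_memory_metrics(sample, "component"):
    print(line)
```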

.z Handler Metrics

Each component can generate metrics from the .z.* handlers. These metrics are enabled by setting the relevant field to true under the metrics.handler object in your values file.

metric name | type | description
kdb_ipc_opened_total | counter | Number of IPC sockets opened
kdb_handles_total | gauge | Number of open handles (IPC and websocket)
kdb_ipc_closed_total | counter | Number of IPC sockets closed
kdb_ws_opened_total | counter | Number of websockets opened
kdb_ws_closed_total | counter | Number of websockets closed
kdb_sync_total | counter | Number of sync requests made
kdb_sync_err_total | counter | Number of errors returned in sync requests
kdb_sync_histogram_seconds | histogram | Count and time taken by sync requests
kdb_async_total | counter | Number of async requests made
kdb_async_err_total | counter | Number of errors returned in async requests
kdb_async_histogram_seconds | histogram | Count and time taken by async requests
kdb_http_get_total | counter | Number of HTTP GET requests made
kdb_http_get_err_total | counter | Number of errors returned in HTTP GET requests
kdb_http_get_histogram | histogram | Count and time taken by HTTP GET requests
kdb_http_post_total | counter | Number of HTTP POST requests made
kdb_http_post_err_total | counter | Number of errors returned in HTTP POST requests
kdb_http_post_histogram | histogram | Count and time taken by HTTP POST requests
kdb_ts_total | counter | Number of timer calls made
kdb_ts_err_total | counter | Number of errors returned in timer calls
kdb_ts_histogram | histogram | Count and time taken by timer calls
kdb_ws_total | counter | Number of websocket calls made
kdb_ws_err_total | counter | Number of errors returned in websocket calls
kdb_ws_histogram | histogram | Count and time taken by websocket calls
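The handler metrics come in families that can be combined: a *_total counter, a *_err_total counter and a histogram. The Python sketch below derives the mean time per request and the error ratio over an interval from two successive readings. It assumes each histogram is exposed with the standard Prometheus `_sum` and `_count` series, for example `kdb_sync_histogram_seconds_sum`. Confirm the exact series names in your own scrape output. `Reading` and `rate_stats` are hypothetical names, and the numbers are illustrative rather than measured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Reading:
    """One scrape of the counters for a single handler (e.g. sync requests)."""
    total: float       # e.g. kdb_sync_total
    err_total: float   # e.g. kdb_sync_err_total
    hist_sum: float    # e.g. kdb_sync_histogram_seconds_sum (assumed name)
    hist_count: float  # e.g. kdb_sync_histogram_seconds_count (assumed name)


def rate_stats(prev: Reading, curr: Reading) -> Tuple[Optional[float], Optional[float]]:
    """Mean seconds per request and error ratio between two scrapes.

    Returns None for a value when no requests happened in the interval.
    Counter resets (e.g. a process restart) are not handled in this sketch.
    """
    d_count = curr.hist_count - prev.hist_count
    d_total = curr.total - prev.total
    mean_latency = (curr.hist_sum - prev.hist_sum) / d_count if d_count > 0 else None
    error_ratio = (curr.err_total - prev.err_total) / d_total if d_total > 0 else None
    return mean_latency, error_ratio


# Illustrative readings taken one scrape interval apart.
before = Reading(total=100, err_total=2, hist_sum=5.0, hist_count=100)
after = Reading(total=150, err_total=3, hist_sum=7.5, hist_count=150)
print(rate_stats(before, after))  # (0.05, 0.02)
```

In Prometheus itself, the same mean is usually computed with rate() over the `_sum` and `_count` series rather than by hand.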

kdb_ipc_opened_total

This counter metric reports the total number of IPC sockets that have been opened to the component container.

Enable by setting:

handler:
    po: true

kdb_ipc_opened_total{service="component"}   1

kdb_handles_total

This gauge metric reports the total number of open handles (IPC and websocket) to the component container.

Enabled by setting any of the following:

handler:
    po: true
    pc: true
    wo: true
    wc: true

kdb_handles_total{service="component"}  1

Note

This metric is incremented and decremented by multiple handlers. Enable all of them to get an accurate view of the application.
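The sketch below shows why this matters. It models kdb_handles_total as a single gauge that the open handlers increment and the close handlers decrement. This is a hypothetical Python model of the behaviour described in the note, not the Platform's q implementation. If only some of the handlers are enabled, the gauge drifts away from the true number of open handles.

```python
from typing import Iterable, Set

# +1 for open handlers, -1 for close handlers (model of the described behaviour).
DELTA = {"po": 1, "wo": 1, "pc": -1, "wc": -1}


def handles_gauge(events: Iterable[str], enabled: Set[str]) -> int:
    """Replay handler events (in firing order) against the handles gauge.

    `enabled` holds the handler flags set to true in the values file;
    events from disabled handlers leave the gauge unchanged.
    """
    gauge = 0
    for event in events:
        if event in enabled:
            gauge += DELTA[event]
    return gauge


# Two IPC clients connect and disconnect, then one websocket opens.
events = ["po", "po", "pc", "pc", "wo"]
print(handles_gauge(events, {"po", "pc", "wo", "wc"}))  # 1: correct
print(handles_gauge(events, {"po"}))                    # 2: closes are missed
print(handles_gauge(events, {"po", "pc"}))              # 0: the websocket is missed
```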

kdb_ipc_closed_total

This counter metric reports the total number of IPC sockets to the component container that have been closed.

Enable by setting:

handler:
    pc: true

kdb_ipc_closed_total{service="component"}  0

kdb_ws_opened_total

This counter metric reports the total number of websockets opened to the component container.

Enable by setting:

handler:
    wo: true

kdb_ws_opened_total{service="component"}  0

kdb_ws_closed_total

This counter metric reports the total number of websockets closed to the component container.

Enable by setting:

handler:
    wc: true

kdb_ws_closed_total{service="component"}  0

kdb_sync

Metrics prefixed with kdb_sync relate to sync requests made to the component container.

Enable by setting:

handler:
    pg: true

kdb_async

Metrics prefixed with kdb_async relate to async requests made to the component container.

Enable by setting:

handler:
    ps: true

kdb_http_get

Metrics prefixed with kdb_http_get relate to HTTP GET requests made to the component container.

Enable by setting:

handler:
    ph: true

kdb_http_post

Metrics prefixed with kdb_http_post relate to HTTP POST requests made to the component container.

Enable by setting:

handler:
    pp: true

kdb_ts

Metrics prefixed with kdb_ts relate to timer calls made within the component container.

Enable by setting:

handler:
    ts: true

kdb_ws

Metrics prefixed with kdb_ws relate to websocket messages received by the component container. Note that the kdb_ws_opened_total and kdb_ws_closed_total metrics are controlled by their own settings (wo and wc).

Enable by setting:

handler:
    ws: true
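The flag-to-metric mapping on this page can be captured as a lookup. This is useful for checking that a handler block in a values file produces the metrics a dashboard expects. The Python sketch is hypothetical. It encodes only the mapping documented above, and it keeps the histogram names exactly as listed in the table, where some lack a `_seconds` suffix. It does not read or validate a real values file.

```python
from typing import Dict, List, Set

# Handler flag -> metrics it enables, as documented in this section.
HANDLER_METRICS: Dict[str, List[str]] = {
    "po": ["kdb_ipc_opened_total", "kdb_handles_total"],
    "pc": ["kdb_ipc_closed_total", "kdb_handles_total"],
    "wo": ["kdb_ws_opened_total", "kdb_handles_total"],
    "wc": ["kdb_ws_closed_total", "kdb_handles_total"],
    "pg": ["kdb_sync_total", "kdb_sync_err_total", "kdb_sync_histogram_seconds"],
    "ps": ["kdb_async_total", "kdb_async_err_total", "kdb_async_histogram_seconds"],
    "ph": ["kdb_http_get_total", "kdb_http_get_err_total", "kdb_http_get_histogram"],
    "pp": ["kdb_http_post_total", "kdb_http_post_err_total", "kdb_http_post_histogram"],
    "ts": ["kdb_ts_total", "kdb_ts_err_total", "kdb_ts_histogram"],
    "ws": ["kdb_ws_total", "kdb_ws_err_total", "kdb_ws_histogram"],
}


def expected_metrics(handler: Dict[str, bool]) -> Set[str]:
    """Metrics expected for a handler block such as {"po": True, "pc": True}."""
    return {
        metric
        for flag, enabled in handler.items()
        if enabled
        for metric in HANDLER_METRICS.get(flag, [])
    }


print(sorted(expected_metrics({"po": True, "pc": True, "pg": False})))
```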

kxi-controller

Each call to a kxi-controller API endpoint generates metrics.

sandbox

Relates to the management of sandboxes from the UI. Available metrics are:

metric name | type | description
kxi_kxic_list_histogram_seconds | histogram | Count and time taken to list sandboxes
kxi_kxic_list_failure_total | counter | Number of failed calls to list sandboxes
kxi_kxic_create_histogram_seconds | histogram | Count and time taken to create sandboxes
kxi_kxic_create_failure_total | counter | Number of failed calls to create sandboxes
kxi_kxic_listOne_histogram_seconds | histogram | Count and time taken to list a single sandbox
kxi_kxic_listOne_failure_total | counter | Number of failed calls to list a single sandbox
kxi_kxic_status_histogram_seconds | histogram | Count and time taken to retrieve the status of sandboxes
kxi_kxic_status_failure_total | counter | Number of failed calls to retrieve the status of sandboxes
kxi_kxic_expiresAfter_histogram_seconds | histogram | Count and time taken to return the sandbox expiresAfter timestamp
kxi_kxic_expiresAfter_failure_total | counter | Number of failed calls to return the sandbox expiresAfter timestamp
kxi_kxic_refresh_histogram_seconds | histogram | Count and time taken to renew the lease on sandboxes
kxi_kxic_refresh_failure_total | counter | Number of failed calls to renew the lease on sandboxes
kxi_kxic_teardown_histogram_seconds | histogram | Count and time taken to tear down sandboxes
kxi_kxic_teardown_failure_total | counter | Number of failed calls to tear down sandboxes

schema

Relates to the management of schemas from the UI. Available metrics are:

metric name | type | description
kxi_kxic_schema_list_histogram_seconds | histogram | Count and time taken to list schemas
kxi_kxic_schema_list_failure_total | counter | Number of failed calls to list schemas
kxi_kxic_schema_create_histogram_seconds | histogram | Count and time taken to create schemas
kxi_kxic_schema_create_failure_total | counter | Number of failed calls to create schemas
kxi_kxic_schema_get_histogram_seconds | histogram | Count and time taken to get a schema by ID
kxi_kxic_schema_get_failure_total | counter | Number of failed calls to get a schema by ID
kxi_kxic_schema_update_histogram_seconds | histogram | Count and time taken to update schemas
kxi_kxic_schema_update_failure_total | counter | Number of failed calls to update schemas
kxi_kxic_schema_delete_histogram_seconds | histogram | Count and time taken to delete schemas
kxi_kxic_schema_delete_failure_total | counter | Number of failed calls to delete schemas
kxi_kxic_schema_count | gauge | Number of existing schemas

database

Relates to the management of databases from the UI. Available metrics are:

metric name | type | description
kxi_kxic_db_list_histogram_seconds | histogram | Count and time taken to list databases
kxi_kxic_db_list_failure_total | counter | Number of failed calls to list databases
kxi_kxic_db_create_histogram_seconds | histogram | Count and time taken to create databases
kxi_kxic_db_create_failure_total | counter | Number of failed calls to create databases
kxi_kxic_db_get_histogram_seconds | histogram | Count and time taken to get databases by ID
kxi_kxic_db_get_failure_total | counter | Number of failed calls to get databases by ID
kxi_kxic_db_update_histogram_seconds | histogram | Count and time taken to update databases
kxi_kxic_db_update_failure_total | counter | Number of failed calls to update databases
kxi_kxic_db_delete_histogram_seconds | histogram | Count and time taken to delete databases
kxi_kxic_db_delete_failure_total | counter | Number of failed calls to delete databases
kxi_kxic_db_count | gauge | Number of existing databases

pipeline

Relates to the management of sandbox pipelines from the UI. Available metrics are:

metric name | type | description
kxi_kxic_pipeline_list_histogram_seconds | histogram | Count and time taken to list pipelines
kxi_kxic_pipeline_list_failure_total | counter | Number of failed calls to list pipelines
kxi_kxic_pipeline_create_histogram_seconds | histogram | Count and time taken to create pipelines
kxi_kxic_pipeline_create_failure_total | counter | Number of failed calls to create pipelines
kxi_kxic_pipeline_get_histogram_seconds | histogram | Count and time taken to get pipelines by ID
kxi_kxic_pipeline_get_failure_total | counter | Number of failed calls to get pipelines by ID
kxi_kxic_pipeline_update_histogram_seconds | histogram | Count and time taken to update pipelines
kxi_kxic_pipeline_update_failure_total | counter | Number of failed calls to update pipelines
kxi_kxic_pipeline_delete_histogram_seconds | histogram | Count and time taken to delete pipelines
kxi_kxic_pipeline_delete_failure_total | counter | Number of failed calls to delete pipelines
kxi_kxic_pipeline_count | gauge | Number of existing pipelines

stream

Relates to the management of streams from the UI. Available metrics are:

metric name | type | description
kxi_kxic_stream_list_histogram_seconds | histogram | Count and time taken to list streams
kxi_kxic_stream_list_failure_total | counter | Number of failed calls to list streams
kxi_kxic_stream_create_histogram_seconds | histogram | Count and time taken to create streams
kxi_kxic_stream_create_failure_total | counter | Number of failed calls to create streams
kxi_kxic_stream_get_histogram_seconds | histogram | Count and time taken to get streams by ID
kxi_kxic_stream_get_failure_total | counter | Number of failed calls to get streams by ID
kxi_kxic_stream_update_histogram_seconds | histogram | Count and time taken to update streams
kxi_kxic_stream_update_failure_total | counter | Number of failed calls to update streams
kxi_kxic_stream_delete_histogram_seconds | histogram | Count and time taken to delete streams
kxi_kxic_stream_delete_failure_total | counter | Number of failed calls to delete streams
kxi_kxic_streams_count | gauge | Number of existing streams

assembly

Relates to the management of assemblies from the UI. Available metrics are:

metric name | type | description
kxi_kxic_assembly_list_histogram_seconds | histogram | Count and time taken to list assemblies
kxi_kxic_assembly_list_failure_total | counter | Number of failed calls to list assemblies
kxi_kxic_assembly_create_histogram_seconds | histogram | Count and time taken to create assemblies
kxi_kxic_assembly_create_failure_total | counter | Number of failed calls to create assemblies
kxi_kxic_assembly_get_histogram_seconds | histogram | Count and time taken to get assemblies
kxi_kxic_assembly_get_failure_total | counter | Number of failed calls to get assemblies
kxi_kxic_assembly_update_histogram_seconds | histogram | Count and time taken to update assemblies
kxi_kxic_assembly_update_failure_total | counter | Number of failed calls to update assemblies
kxi_kxic_assembly_delete_histogram_seconds | histogram | Count and time taken to delete assemblies
kxi_kxic_assembly_delete_failure_total | counter | Number of failed calls to delete assemblies
kxi_kxic_assembly_deploy_histogram_seconds | histogram | Count and time taken to deploy assemblies
kxi_kxic_assembly_deploy_failure_total | counter | Number of failed calls to deploy assemblies
kxi_kxic_assembly_export_histogram_seconds | histogram | Count and time taken to export assemblies
kxi_kxic_assembly_export_failure_total | counter | Number of failed calls to export assemblies
kxi_kxic_assembly_teardown_histogram_seconds | histogram | Count and time taken to tear down assemblies
kxi_kxic_assembly_teardown_failure_total | counter | Number of failed calls to tear down assemblies
kxi_kxic_assembly_count | gauge | Number of existing assemblies
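The kxi-controller histogram and failure-counter names in the tables above follow a regular pattern. Each name starts with `kxi_kxic_`, followed by the resource, then the operation, then `_histogram_seconds` or `_failure_total`. The resource is `schema`, `db`, `pipeline`, `stream` or `assembly`, and it is omitted for sandbox operations. The hypothetical Python helper below builds both names from that pattern, which can help avoid typos when writing queries. The pattern is inferred from the tables rather than from a published naming rule. It also does not cover the count gauges, whose naming is not uniform: the stream gauge is kxi_kxic_streams_count.

```python
from typing import Optional, Tuple


def kxic_metric_names(operation: str, resource: Optional[str] = None) -> Tuple[str, str]:
    """(histogram, failure counter) names for a kxi-controller operation.

    Sandbox operations have no resource segment (e.g. kxi_kxic_list_...);
    the other groups include one (e.g. kxi_kxic_schema_list_...).
    """
    prefix = "kxi_kxic_" + (f"{resource}_" if resource else "")
    return f"{prefix}{operation}_histogram_seconds", f"{prefix}{operation}_failure_total"


print(kxic_metric_names("refresh"))             # sandbox lease renewal
print(kxic_metric_names("deploy", "assembly"))  # assembly deployment
```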

discovery-proxy

Each call to a discovery-proxy API endpoint will generate metrics.

metric name | type | description
kxi_sd_register_histogram_seconds | histogram | Count and time taken to register services with discovery
kxi_sd_register_failure_total | counter | Number of failed calls to register services with discovery
kxi_sd_updateDetails_histogram_seconds | histogram | Count and time taken to update service details with the registry
kxi_sd_updateDetails_failure_total | counter | Number of failed calls to update service details with the registry
kxi_sd_getServices_histogram_seconds | histogram | Count and time taken to get the latest services from the registry
kxi_sd_getServices_failure_total | counter | Number of failed calls to get the latest services from the registry
kxi_sd_heartbeat_histogram_seconds | histogram | Count and time taken to heartbeat with the registry
kxi_sd_heartbeat_failure_total | counter | Number of failed calls to heartbeat with the registry
kxi_sd_updateStatus_histogram_seconds | histogram | Count and time taken to update status with the registry
kxi_sd_updateStatus_failure_total | counter | Number of failed calls to update status with the registry
kxi_sd_deregister_histogram_seconds | histogram | Count and time taken to deregister from the registry
kxi_sd_deregister_failure_total | counter | Number of failed calls to deregister from the registry
kxi_sd_alive_histogram_seconds | histogram | Count and time taken to check that the proxy process is responsive
kxi_sd_alive_failure_total | counter | Number of failed calls to check that the proxy process is responsive
kxi_sd_ready_histogram_seconds | histogram | Count and time taken to check that the proxy process is ready
kxi_sd_ready_failure_total | counter | Number of failed calls to check that the proxy process is ready

kxi-operator

KXI Operator generates metrics on each attempt to interact with a namespace's SP Coordinator or with a Keycloak instance.

metric name | type | description
kxi_operator_keycloak_errors_total | counter | Number of failed requests to Keycloak
kxi_operator_keycloak_request_seconds | histogram | Count and time taken to make requests to Keycloak
kxi_operator_pipeline_errors_total | counter | Number of failed calls to the SP Coordinator
kxi_operator_pipeline_request_seconds | histogram | Count and time taken to make requests to the SP Coordinator

information service

Each call to an information service API endpoint will generate metrics.

metric name | type | description
kxi_info_details_histogram_seconds | histogram | Count and time taken to get details for a specific client ID
kxi_info_details_failure_total | counter | Number of failed calls to get details for a specific client ID

client-controller

Each call to a client-controller API endpoint will generate metrics.

metric name | type | description
kxi_com_kx_cc_enrol_histogram_seconds | histogram | Count and time taken to enroll clients
kxi_com_kx_cc_enrol_failure_total | counter | Number of failed calls to enroll clients
kxi_com_kx_cc_leave_histogram_seconds | histogram | Count and time taken to remove clients
kxi_com_kx_cc_leave_failure_total | counter | Number of failed calls to remove clients

reliable transport

Reliable Transport (RT), also known as a KX Insights Stream, publishes status and performance metrics by default. These metrics can be disabled by setting the environment variable RT_EXPORT_METRICS="0".

metric name | type | description | component | node
kxi_rt_seq_leader | gauge | Leadership status of node | sequencer | all
kxi_rt_in_bytes_total | counter | Count of input bytes sequenced | sequencer | leader
kxi_rt_in_messages_total | counter | Count of input messages sequenced | sequencer | leader
kxi_rt_in_bytes | counter | Count of input bytes sequenced per directory | sequencer | leader
kxi_rt_in_messages | counter | Count of input messages sequenced per directory | sequencer | leader
kxi_rt_out_bytes_total | counter | Count of bytes merged | merger | all
kxi_rt_out_messages_total | counter | Count of messages merged | merger | all
kxi_out_bytes | counter | Count of bytes merged per directory | merger | all
kxi_out_messages | counter | Count of messages merged per directory | merger | all
kxi_rt_merge_queue_size | gauge | Merge instructions waiting in queue | merger | all

Note

  • As well as exporting the total number of messages and bytes transferred (*_total metrics), RT exports the number of messages and bytes transferred per publisher, using a directory label set to topicname.hostname.

  • The sequencer leader metric is set on restart or leader change, so it is always up to date. The other sequencer metrics are updated every second, but only on the leader node. Ignore them on all other nodes.

  • The merger metrics are all updated every second for all nodes.
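Sequencer metrics other than kxi_rt_seq_leader are maintained only on the leader. Anything that reads them across nodes should therefore select the leader first. The Python sketch below does this for the per-directory input message counts. It assumes the samples have already been parsed into (name, labels, value) tuples, and it uses the raft_node_index label described under Labels to identify nodes. It also assumes kxi_rt_seq_leader reports 1 on the leader and 0 elsewhere, which this page does not state explicitly. `leader_in_messages` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]


def leader_in_messages(samples: List[Sample]) -> Dict[str, float]:
    """Per-directory kxi_rt_in_messages values, read only from the leader.

    Assumes kxi_rt_seq_leader == 1 marks the leader (an assumption, see above);
    values from non-leader nodes are stale and are skipped.
    """
    leaders = {
        labels["raft_node_index"]
        for name, labels, value in samples
        if name == "kxi_rt_seq_leader" and value == 1
    }
    counts: Dict[str, float] = {}
    for name, labels, value in samples:
        if name == "kxi_rt_in_messages" and labels.get("raft_node_index") in leaders:
            counts[labels["directory"]] = value
    return counts


# Hypothetical scrape: node 0 leads; node 1 still shows an old value.
samples = [
    ("kxi_rt_seq_leader", {"raft_node_index": "0"}, 1.0),
    ("kxi_rt_seq_leader", {"raft_node_index": "1"}, 0.0),
    ("kxi_rt_in_messages", {"raft_node_index": "0", "directory": "trades.pub-a"}, 120.0),
    ("kxi_rt_in_messages", {"raft_node_index": "1", "directory": "trades.pub-a"}, 40.0),
]
print(leader_in_messages(samples))  # {'trades.pub-a': 120.0}
```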

Labels

All metrics include the following labels:

label | description | example
ha_type | HA configuration | 3-node
raft_node_index | the node index from the hostname | 0

Metrics defined for each individual publisher also include the following labels:

label | description | example
directory | name of the directory | topicname.hostname
dedup_stream | name of the topic, if the input stream is being deduplicated | topicname