Metrics Reference
This page lists the metrics generated by the Platform. See Overview for details on how metrics are captured, stored and presented. Each component generates a range of metrics. With metrics enabled on a component, kdb+ info, memory and CPU metrics are generated.
License
kdb_info
This is a gauge metric detailing the kdb+ license.
metric name | type | description |
---|---|---|
kdb_info | gauge | kdb+ license info |
kdb_info{license_expiry_date="2022.01.27", os_version="l64", process_cores="4", release_date="2021.06.12", release_version="4.1", service="component"}
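The labels on this sample carry the useful information. As a minimal sketch, assuming the line is scraped as plain text in the form shown above, the labels can be pulled out and the license expiry date turned into a day count. The helper names `parse_labels` and `days_until_expiry` are illustrative and not part of the Platform.

```python
import re
from datetime import date

# Sample line copied from the example above.
SAMPLE = ('kdb_info{license_expiry_date="2022.01.27", os_version="l64", '
          'process_cores="4", release_date="2021.06.12", '
          'release_version="4.1", service="component"}')

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_labels(sample: str) -> dict:
    """Return the label pairs found between the braces of one sample line."""
    start, end = sample.index("{"), sample.rindex("}")
    return dict(LABEL_RE.findall(sample[start + 1:end]))


def days_until_expiry(labels: dict, today: date) -> int:
    """Days from `today` to license_expiry_date (kdb+ style yyyy.mm.dd)."""
    year, month, day = (int(p) for p in labels["license_expiry_date"].split("."))
    return (date(year, month, day) - today).days


labels = parse_labels(SAMPLE)
print(labels["release_version"])                     # 4.1
print(days_until_expiry(labels, date(2022, 1, 1)))   # 26
```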
Memory stats
Enabling metrics globally enables the capture of kdb+ memory stats.
metric name | type | description |
---|---|---|
memory_usage_bytes | gauge | Current memory usage |
memory_heap_bytes | gauge | Heap memory |
memory_heap_peak_bytes | gauge | Maximum heap size so far |
memory_heap_limit_bytes | gauge | Heap memory limit |
memory_mapped_bytes | gauge | Mapped memory |
memory_physical_bytes | gauge | Physical memory |
kdb_syms_total | gauge | Number of symbols |
kdb_syms_memory_bytes | gauge | Memory use of symbols |
memory_usage_bytes
This gauge metric reports the current memory usage in bytes of the component container. Output is taken from .Q.w[].
q) .Q.w[]`used
memory_usage_bytes{service="component"} 1010128
memory_heap_bytes
This gauge metric reports the memory available in the heap in bytes for the component container. Output is taken from .Q.w[].
.Q.w[]`heap
memory_heap_bytes{service="component"} 67108860
memory_heap_peak_bytes
This gauge metric reports the maximum heap size so far in bytes for the component container. Output is taken from .Q.w[].
.Q.w[]`peak
memory_heap_peak_bytes{service="component"} 67108860
memory_heap_limit_bytes
This gauge metric reports the heap memory limit in bytes for the component container, as set by -w. Output is taken from .Q.w[].
.Q.w[]`wmax
memory_heap_limit_bytes{service="component"} 0
memory_mapped_bytes
This gauge metric reports the mapped memory in bytes for the component container. Output is taken from .Q.w[].
.Q.w[]`mmap
memory_mapped_bytes{service="component"} 0
memory_physical_bytes
This gauge metric reports the physical memory available in bytes for the component container. Output is taken from .Q.w[].
.Q.w[]`mphy
memory_physical_bytes{service="component"} 16788270000
kdb_syms_total
This gauge metric reports the number of symbols for the component container. Output is taken from .Q.w[].
q) .Q.w[]`syms
kdb_syms_total{service="component"} 2150
kdb_syms_memory_bytes
This gauge metric reports the memory use of symbols in bytes for the component container. Output is taken from .Q.w[].
q) .Q.w[]`symw
kdb_syms_memory_bytes{service="component"} 106648
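The relationship between the .Q.w[] keys and the metric names in this section can be summarised as a lookup table. The following is a minimal sketch, not the Platform's implementation: `QW_TO_METRIC` and `render` are illustrative names, and the values are the ones shown in the examples above.

```python
# Mapping of .Q.w[] keys to metric names, as listed in this section.
QW_TO_METRIC = {
    "used": "memory_usage_bytes",
    "heap": "memory_heap_bytes",
    "peak": "memory_heap_peak_bytes",
    "wmax": "memory_heap_limit_bytes",
    "mmap": "memory_mapped_bytes",
    "mphy": "memory_physical_bytes",
    "syms": "kdb_syms_total",
    "symw": "kdb_syms_memory_bytes",
}


def render(qw: dict, service: str) -> list:
    """Format .Q.w[]-style values as sample lines, in the table's order."""
    return [f'{metric}{{service="{service}"}} {qw[key]}'
            for key, metric in QW_TO_METRIC.items() if key in qw]


# Values copied from the examples in this section.
qw = {"used": 1010128, "heap": 67108860, "peak": 67108860, "wmax": 0,
      "mmap": 0, "mphy": 16788270000, "syms": 2150, "symw": 106648}

for line in render(qw, "component"):
    print(line)
```

The first line printed is `memory_usage_bytes{service="component"} 1010128`, matching the example for that metric.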
.z Handler Metrics
Each component can generate metrics from the .z.* handlers. Enable them by setting the relevant field to true under the metrics.handler object in your values file. See details here.
metric name | type | description |
---|---|---|
kdb_ipc_opened_total | counter | Number of opened ipc sockets |
kdb_handles_total | gauge | Number of open handles (ipc and websocket) |
kdb_ipc_closed_total | counter | Number of ipc sockets closed |
kdb_ws_opened_total | counter | Number of websockets opened |
kdb_ws_closed_total | counter | Number of websockets closed |
kdb_sync_total | counter | Number of sync requests made |
kdb_sync_err_total | counter | Number of errors returned in sync requests |
kdb_sync_histogram_seconds | histogram | Count and time taken by sync requests |
kdb_async_total | counter | Number of async requests made |
kdb_async_err_total | counter | Number of errors returned in async requests |
kdb_async_histogram_seconds | histogram | Count and time taken by async requests |
kdb_http_get_total | counter | Number of http GET requests made |
kdb_http_get_err_total | counter | Number of errors returned in http GET requests |
kdb_http_get_histogram | histogram | Count and time taken by http GET requests |
kdb_http_post_total | counter | Number of http POST requests made |
kdb_http_post_err_total | counter | Number of errors returned in http POST requests |
kdb_http_post_histogram | histogram | Count and time taken by http POST requests |
kdb_ts_total | counter | Number of timer calls made |
kdb_ts_err_total | counter | Number of errors returned in timer calls |
kdb_ts_histogram | histogram | Count and time taken by timer calls |
kdb_ws_total | counter | Number of websocket calls made |
kdb_ws_err_total | counter | Number of errors returned in websocket calls |
kdb_ws_histogram | histogram | Count and time taken by websocket calls |
kdb_ipc_opened_total
This counter metric reports the total number of ipc sockets that have been opened to the component container.
Enable by setting:
handler:
  po: true
kdb_ipc_opened_total{service="component"} 1
kdb_handles_total
This gauge metric reports the total number of open handles (ipc and websocket) to the component container.
Enabled by setting any of the following:
handler:
  po: true
  pc: true
  wo: true
  wc: true
kdb_handles_total{service="component"} 1
Note
This metric is incremented and decremented by multiple handlers. Enable all of them to get an accurate view of the application.
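To illustrate why partial enablement distorts this gauge, the sketch below replays a hypothetical sequence of handler events. It assumes that open handlers (po, wo) increment the gauge and close handlers (pc, wc) decrement it, which is a reading of the note above rather than a confirmed description of the implementation. The function name and event list are illustrative.

```python
OPEN_HANDLERS = {"po", "wo"}    # ipc open, websocket open
CLOSE_HANDLERS = {"pc", "wc"}   # ipc close, websocket close


def handles_gauge(events, enabled):
    """Replay open/close events, counting only those from enabled handlers."""
    value = 0
    for handler in events:
        if handler not in enabled:
            continue  # a disabled handler never updates the gauge
        if handler in OPEN_HANDLERS:
            value += 1
        elif handler in CLOSE_HANDLERS:
            value -= 1
    return value


# Hypothetical activity: three handles opened, two closed, one still open.
events = ["po", "po", "wo", "pc", "wc"]

print(handles_gauge(events, {"po", "pc", "wo", "wc"}))  # 1
print(handles_gauge(events, {"po", "wo"}))              # 3, closes are never seen
```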
kdb_ipc_closed_total
This counter metric reports the total number of ipc sockets to the component container that have been closed.
Enable by setting:
handler:
  pc: true
kdb_ipc_closed_total{service="component"} 0
kdb_ws_opened_total
This counter metric reports the total number of websockets opened to the component container.
Enable by setting:
handler:
  wo: true
kdb_ws_opened_total{service="component"} 0
kdb_ws_closed_total
This counter metric reports the total number of websockets to the component container that have been closed.
Enable by setting:
handler:
  wc: true
kdb_ws_closed_total{service="component"} 0
kdb_sync
Metrics prefixed with kdb_sync relate to the sync requests made to the component container.
Enable by setting:
handler:
  pg: true
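The counters and histogram in this family combine into two common derived figures: the error ratio and the mean request duration. The sketch below assumes the histogram is exposed with the usual Prometheus `_sum` and `_count` series, so the names `kdb_sync_histogram_seconds_sum` and `kdb_sync_histogram_seconds_count` are assumptions, not confirmed by this page. The sample values are hypothetical.

```python
def mean_seconds(hist_sum: float, hist_count: int) -> float:
    """Average duration per request; 0.0 when nothing has been observed."""
    return hist_sum / hist_count if hist_count else 0.0


def error_ratio(err_total: int, total: int) -> float:
    """Fraction of requests that returned an error; 0.0 when there were none."""
    return err_total / total if total else 0.0


# Hypothetical scraped values, not taken from a real component.
samples = {
    "kdb_sync_total": 200,
    "kdb_sync_err_total": 5,
    "kdb_sync_histogram_seconds_sum": 3.0,     # assumed series name
    "kdb_sync_histogram_seconds_count": 200,   # assumed series name
}

print(mean_seconds(samples["kdb_sync_histogram_seconds_sum"],
                   samples["kdb_sync_histogram_seconds_count"]))  # 0.015
print(error_ratio(samples["kdb_sync_err_total"],
                  samples["kdb_sync_total"]))                     # 0.025
```

The same arithmetic applies to the kdb_async, kdb_http_get, kdb_http_post, kdb_ts and kdb_ws families, under the same assumption about series names.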
kdb_async
Metrics prefixed with kdb_async relate to the async requests made to the component container.
Enable by setting:
handler:
  ps: true
kdb_http_get
Metrics prefixed with kdb_http_get relate to the http GET requests made to the component container.
Enable by setting:
handler:
  ph: true
kdb_http_post
Metrics prefixed with kdb_http_post relate to the http POST requests made to the component container.
Enable by setting:
handler:
  pp: true
kdb_ts
Metrics prefixed with kdb_ts relate to the timer calls made within the component container.
Enable by setting:
handler:
  ts: true
kdb_ws
Metrics prefixed with kdb_ws relate to the websocket messages received by the component container. Note that the special kdb_ws_opened_total and kdb_ws_closed_total metrics have individual settings.
Enable by setting:
handler:
  ws: true
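The handler settings above can be read as a mapping from each metrics.handler field to the metrics it switches on. The following sketch collects that mapping from this section and lists the metrics to expect for a given configuration. `HANDLER_METRICS` and `expected_metrics` are illustrative names, and the input dictionary stands in for the handler object of a values file.

```python
# Handler field -> metrics it enables, as described in this section.
HANDLER_METRICS = {
    "po": ["kdb_ipc_opened_total", "kdb_handles_total"],
    "pc": ["kdb_ipc_closed_total", "kdb_handles_total"],
    "wo": ["kdb_ws_opened_total", "kdb_handles_total"],
    "wc": ["kdb_ws_closed_total", "kdb_handles_total"],
    "pg": ["kdb_sync_total", "kdb_sync_err_total", "kdb_sync_histogram_seconds"],
    "ps": ["kdb_async_total", "kdb_async_err_total", "kdb_async_histogram_seconds"],
    "ph": ["kdb_http_get_total", "kdb_http_get_err_total", "kdb_http_get_histogram"],
    "pp": ["kdb_http_post_total", "kdb_http_post_err_total", "kdb_http_post_histogram"],
    "ts": ["kdb_ts_total", "kdb_ts_err_total", "kdb_ts_histogram"],
    "ws": ["kdb_ws_total", "kdb_ws_err_total", "kdb_ws_histogram"],
}


def expected_metrics(handler_config: dict) -> list:
    """Sorted metric names enabled by the fields set to true."""
    names = set()
    for key, enabled in handler_config.items():
        if enabled:
            names.update(HANDLER_METRICS.get(key, []))
    return sorted(names)


# Hypothetical handler object: po and pg enabled, pc disabled.
for name in expected_metrics({"po": True, "pc": False, "pg": True}):
    print(name)
```

With this input the sketch prints kdb_handles_total, kdb_ipc_opened_total, kdb_sync_err_total, kdb_sync_histogram_seconds and kdb_sync_total, one per line.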
kxi-controller
Each call to a kxi-controller API endpoint generates metrics.
sandbox
Relates to the management of sandboxes from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_list_histogram_seconds | histogram | Count and time taken to list sandboxes |
kxi_kxic_list_failure_total | counter | Number of failed calls to list sandboxes |
kxi_kxic_create_histogram_seconds | histogram | Count and time taken to create sandboxes |
kxi_kxic_create_failure_total | counter | Number of failed calls to create sandboxes |
kxi_kxic_listOne_histogram_seconds | histogram | Count and time taken to list a single sandbox |
kxi_kxic_listOne_failure_total | counter | Number of failed calls to list a single sandbox |
kxi_kxic_status_histogram_seconds | histogram | Count and time taken to retrieve status of sandboxes |
kxi_kxic_status_failure_total | counter | Number of failed calls to retrieve status of sandboxes |
kxi_kxic_expiresAfter_histogram_seconds | histogram | Count and time taken to return the sandbox expiresAfter timestamp |
kxi_kxic_expiresAfter_failure_total | counter | Number of failed calls to return the sandbox expiresAfter timestamp |
kxi_kxic_refresh_histogram_seconds | histogram | Count and time taken to renew the lease on sandboxes |
kxi_kxic_refresh_failure_total | counter | Number of failed calls to renew the lease on sandboxes |
kxi_kxic_teardown_histogram_seconds | histogram | Count and time taken to tear down sandboxes |
kxi_kxic_teardown_failure_total | counter | Number of failed calls to tear down sandboxes |
schema
Relates to the management of schemas from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_schema_list_histogram_seconds | histogram | Count and time taken to list schemas |
kxi_kxic_schema_list_failure_total | counter | Number of failed calls to list schemas |
kxi_kxic_schema_create_histogram_seconds | histogram | Count and time taken to create schemas |
kxi_kxic_schema_create_failure_total | counter | Number of failed calls to create schemas |
kxi_kxic_schema_get_histogram_seconds | histogram | Count and time taken to get a schema by ID |
kxi_kxic_schema_get_failure_total | counter | Number of failed calls to get a schema by ID |
kxi_kxic_schema_update_histogram_seconds | histogram | Count and time taken to update schemas |
kxi_kxic_schema_update_failure_total | counter | Number of failed calls to update schemas |
kxi_kxic_schema_delete_histogram_seconds | histogram | Count and time taken to delete schemas |
kxi_kxic_schema_delete_failure_total | counter | Number of failed calls to delete schemas |
kxi_kxic_schema_count | gauge | Number of existing schemas |
database
Relates to the management of databases from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_db_list_histogram_seconds | histogram | Count and time taken to list databases |
kxi_kxic_db_list_failure_total | counter | Number of failed calls to list databases |
kxi_kxic_db_create_histogram_seconds | histogram | Count and time taken to create databases |
kxi_kxic_db_create_failure_total | counter | Number of failed calls to create databases |
kxi_kxic_db_get_histogram_seconds | histogram | Count and time taken to get databases by ID |
kxi_kxic_db_get_failure_total | counter | Number of failed calls to get databases by ID |
kxi_kxic_db_update_histogram_seconds | histogram | Count and time taken to update databases |
kxi_kxic_db_update_failure_total | counter | Number of failed calls to update databases |
kxi_kxic_db_delete_histogram_seconds | histogram | Count and time taken to delete databases |
kxi_kxic_db_delete_failure_total | counter | Number of failed calls to delete databases |
kxi_kxic_db_count | gauge | Number of existing databases |
pipeline
Relates to the management of pipelines from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_pipeline_list_histogram_seconds | histogram | Count and time taken to list pipelines |
kxi_kxic_pipeline_list_failure_total | counter | Number of failed calls to list pipelines |
kxi_kxic_pipeline_create_histogram_seconds | histogram | Count and time taken to create pipelines |
kxi_kxic_pipeline_create_failure_total | counter | Number of failed calls to create pipelines |
kxi_kxic_pipeline_get_histogram_seconds | histogram | Count and time taken to get pipelines by ID |
kxi_kxic_pipeline_get_failure_total | counter | Number of failed calls to get pipelines by ID |
kxi_kxic_pipeline_update_histogram_seconds | histogram | Count and time taken to update pipelines |
kxi_kxic_pipeline_update_failure_total | counter | Number of failed calls to update pipelines |
kxi_kxic_pipeline_delete_histogram_seconds | histogram | Count and time taken to delete pipelines |
kxi_kxic_pipeline_delete_failure_total | counter | Number of failed calls to delete pipelines |
kxi_kxic_pipeline_count | gauge | Number of existing pipelines |
stream
Relates to the management of streams from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_stream_list_histogram_seconds | histogram | Count and time taken to list streams |
kxi_kxic_stream_list_failure_total | counter | Number of failed calls to list streams |
kxi_kxic_stream_create_histogram_seconds | histogram | Count and time taken to create streams |
kxi_kxic_stream_create_failure_total | counter | Number of failed calls to create streams |
kxi_kxic_stream_get_histogram_seconds | histogram | Count and time taken to get streams by ID |
kxi_kxic_stream_get_failure_total | counter | Number of failed calls to get streams by ID |
kxi_kxic_stream_update_histogram_seconds | histogram | Count and time taken to update streams |
kxi_kxic_stream_update_failure_total | counter | Number of failed calls to update streams |
kxi_kxic_stream_delete_histogram_seconds | histogram | Count and time taken to delete streams |
kxi_kxic_stream_delete_failure_total | counter | Number of failed calls to delete streams |
kxi_kxic_streams_count | gauge | Number of existing streams |
assembly
Relates to the management of assemblies from the UI. Available metrics are:
metric name | type | description |
---|---|---|
kxi_kxic_assembly_list_histogram_seconds | histogram | Count and time taken to list assemblies |
kxi_kxic_assembly_list_failure_total | counter | Number of failed calls to list assemblies |
kxi_kxic_assembly_create_histogram_seconds | histogram | Count and time taken to create assemblies |
kxi_kxic_assembly_create_failure_total | counter | Number of failed calls to create assemblies |
kxi_kxic_assembly_get_histogram_seconds | histogram | Count and time taken to get assemblies |
kxi_kxic_assembly_get_failure_total | counter | Number of failed calls to get assemblies |
kxi_kxic_assembly_update_histogram_seconds | histogram | Count and time taken to update assemblies |
kxi_kxic_assembly_update_failure_total | counter | Number of failed calls to update assemblies |
kxi_kxic_assembly_delete_histogram_seconds | histogram | Count and time taken to delete assemblies |
kxi_kxic_assembly_delete_failure_total | counter | Number of failed calls to delete assemblies |
kxi_kxic_assembly_deploy_histogram_seconds | histogram | Count and time taken to deploy assemblies |
kxi_kxic_assembly_deploy_failure_total | counter | Number of failed calls to deploy assemblies |
kxi_kxic_assembly_export_histogram_seconds | histogram | Count and time taken to export assemblies |
kxi_kxic_assembly_export_failure_total | counter | Number of failed calls to export assemblies |
kxi_kxic_assembly_teardown_histogram_seconds | histogram | Count and time taken to tear down assemblies |
kxi_kxic_assembly_teardown_failure_total | counter | Number of failed calls to tear down assemblies |
kxi_kxic_assembly_count | gauge | Number of existing assemblies |
discovery-proxy
Each call to a discovery-proxy API endpoint will generate metrics.
metric name | type | description |
---|---|---|
kxi_sd_register_histogram_seconds | histogram | Count and time taken to register services with discovery |
kxi_sd_register_failure_total | counter | Number of failed calls to register services with discovery |
kxi_sd_updateDetails_histogram_seconds | histogram | Count and time taken to update service details with the registry |
kxi_sd_updateDetails_failure_total | counter | Number of failed calls to update service details with the registry |
kxi_sd_getServices_histogram_seconds | histogram | Count and time taken to get the latest services from the registry |
kxi_sd_getServices_failure_total | counter | Number of failed calls to get the latest services from the registry |
kxi_sd_heartbeat_histogram_seconds | histogram | Count and time taken to heartbeat with the registry |
kxi_sd_heartbeat_failure_total | counter | Number of failed calls to heartbeat with the registry |
kxi_sd_updateStatus_histogram_seconds | histogram | Count and time taken to update status with the registry |
kxi_sd_updateStatus_failure_total | counter | Number of failed calls to update status with the registry |
kxi_sd_deregister_histogram_seconds | histogram | Count and time taken to deregister from the registry |
kxi_sd_deregister_failure_total | counter | Number of failed calls to deregister from the registry |
kxi_sd_alive_histogram_seconds | histogram | Count and time taken to check the proxy process is responsive |
kxi_sd_alive_failure_total | counter | Number of failed calls to check the proxy process is responsive |
kxi_sd_ready_histogram_seconds | histogram | Count and time taken to check the proxy process is ready |
kxi_sd_ready_failure_total | counter | Number of failed calls to check the proxy process is ready |
kxi-operator
KXI Operator generates metrics on each attempt to interact with a namespace SP Coordinator or Keycloak instance.
metric name | type | description |
---|---|---|
kxi_operator_keycloak_errors_total | counter | Number of failed requests to Keycloak |
kxi_operator_keycloak_request_seconds | histogram | Count and time taken to make requests to Keycloak |
kxi_operator_pipeline_errors_total | counter | Number of failed calls to the SP Coordinator |
kxi_operator_pipeline_request_seconds | histogram | Count and time taken to make requests to the SP Coordinator |
information service
Each call to an information service API endpoint will generate metrics.
metric name | type | description |
---|---|---|
kxi_info_details_histogram_seconds | histogram | Count and time taken to get details for a specific client ID |
kxi_info_details_failure_total | counter | Number of failed calls to get details for a specific client ID |
client-controller
Each call to a client-controller API endpoint will generate metrics.
metric name | type | description |
---|---|---|
kxi_com_kx_cc_enrol_histogram_seconds | histogram | Count and time taken to enroll clients |
kxi_com_kx_cc_enrol_failure_total | counter | Number of failed calls to enroll clients |
kxi_com_kx_cc_leave_histogram_seconds | histogram | Count and time taken to remove clients |
kxi_com_kx_cc_leave_failure_total | counter | Number of failed calls to remove clients |
reliable transport
Reliable Transport (RT), also known as a KX Insights Stream, publishes status and performance metrics by default. These can be disabled by setting the environment variable RT_EXPORT_METRICS="0".
metric name | type | Description | Component | Node |
---|---|---|---|---|
kxi_rt_seq_leader | gauge | Leadership status of node | sequencer | all |
kxi_rt_in_bytes_total | counter | Count of input bytes sequenced | sequencer | leader |
kxi_rt_in_messages_total | counter | Count of input messages sequenced | sequencer | leader |
kxi_rt_in_bytes | counter | Count of input bytes sequenced per directory | sequencer | leader |
kxi_rt_in_messages | counter | Count of input messages sequenced per directory | sequencer | leader |
kxi_rt_out_bytes_total | counter | Count of bytes merged | merger | all |
kxi_rt_out_messages_total | counter | Count of messages merged | merger | all |
kxi_out_bytes | counter | Count of bytes merged per directory | merger | all |
kxi_out_messages | counter | Count of messages merged per directory | merger | all |
kxi_rt_merge_queue_size | gauge | Merge instructions waiting in queue | merger | all |
Note
- As well as exporting the total number of messages and bytes transferred (*_total metrics), RT exports the number of messages and bytes transferred per publisher, using a directory label set to topicname.hostname.
- The sequencer leader metric is set on restart/leader change, therefore it is always up to date. The other sequencer metrics are updated every second, but only on the leader node, and should be ignored for all other nodes.
- The merger metrics are all updated every second for all nodes.
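The rule that sequencer metrics should be ignored on non-leader nodes can be applied when post-processing scraped samples. The sketch below is one way to do that. It assumes kxi_rt_seq_leader has the value 1 on the leader node, which this page does not state, and it represents each sample as a hypothetical (metric name, raft_node_index, value) tuple.

```python
# Sequencer metrics that are only maintained on the leader node.
SEQUENCER_METRICS = {
    "kxi_rt_in_bytes_total",
    "kxi_rt_in_messages_total",
    "kxi_rt_in_bytes",
    "kxi_rt_in_messages",
}


def usable_samples(samples):
    """Drop sequencer metrics reported by nodes that are not the leader.

    Each sample is a (metric_name, raft_node_index, value) tuple.
    Assumption: kxi_rt_seq_leader == 1 marks the leader node.
    """
    leaders = {node for name, node, value in samples
               if name == "kxi_rt_seq_leader" and value == 1}
    return [s for s in samples
            if s[0] not in SEQUENCER_METRICS or s[1] in leaders]


# Hypothetical scrape from a two-node view; node "0" is the leader.
samples = [
    ("kxi_rt_seq_leader", "0", 1),
    ("kxi_rt_seq_leader", "1", 0),
    ("kxi_rt_in_bytes_total", "0", 4096),
    ("kxi_rt_in_bytes_total", "1", 512),     # stale: node "1" is not leader
    ("kxi_rt_out_bytes_total", "1", 4096),   # merger metric, valid on all nodes
]

for sample in usable_samples(samples):
    print(sample)
```

Only the kxi_rt_in_bytes_total sample from node "1" is dropped. The leader and merger samples pass through unchanged.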
Labels
All metrics include the following labels:
label | description | example |
---|---|---|
ha_type | HA configuration | 3-node |
raft_node_index | the node index from the hostname | 0 |
Metrics defined for each individual publisher also include the following labels:
label | description | example |
---|---|---|
directory | name of the directory | topicname.hostname |
dedup_stream | name of the topic, if the input stream is being deduplicated | topicname |