Available metrics

The following metrics are available from the Monitoring system. The Monitoring components are agnostic of the list of available metrics, so the versions listed on this page are the Refinery application versions in which each metric became available.

Refinery

Common metrics

These metrics are available across multiple process types:

Since  Metric name               Description
------------------------------------------------------------------------
4.2.0  ipc.pendingHandleBytes    The total number of bytes pending across all connections in the process. Any pending bytes can signify a slow consumer.
       process.queryTimeout      The current query timeout (in seconds) on the process.
       process.connectionOpen    Information about a new inbound connection to the process.
       process.connectionClosed  Information about any inbound or outbound connection that has closed.
       process.exit              Sent as the process exits gracefully (by either a stop request or a SIGTERM).
       process.userAuthFailure   Any user connection attempt that has failed due to bad credentials.
4.3.2  mem.kdbUsageBytes         The internal memory usage of the kdb+ process, based on the output from .Q.w[].
       process.ping              The response time of each process the Monitoring daemon is connected to.
       system.timeSync           Check for time synchronisation between multiple servers (only if the daemon is running in single-node mode).
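
The description of ipc.pendingHandleBytes says that pending bytes can signify a slow consumer. The sketch below shows one way a downstream consumer of this metric might act on that. It assumes the values have already been extracted into a plain list of integers, one per sample; the function name and the `min_consecutive` threshold are hypothetical and not part of the Monitoring system.

```python
def is_slow_consumer(pending_bytes_samples, min_consecutive=3):
    """Return True if the most recent samples all show pending bytes.

    pending_bytes_samples: ipc.pendingHandleBytes values, oldest first
    (assumed shape, not a documented payload format).
    min_consecutive: how many back-to-back non-zero samples are required
    before flagging (hypothetical threshold).
    """
    if len(pending_bytes_samples) < min_consecutive:
        return False  # not enough history to decide
    recent = pending_bytes_samples[-min_consecutive:]
    return all(value > 0 for value in recent)
```

Requiring several consecutive non-zero samples avoids flagging a single momentary backlog; a stricter or looser rule may suit a given deployment better.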

The supported process types:

since  process type
-------------------------------------
4.2.0  GW, QR, TP, RDB, HDB, CTP, RTE 
4.3.2  PDB
4.5.3  MS, FW, Event Bus
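
As an illustration of reading the table above, the following sketch checks whether a process type reports the common metrics at a given Refinery version. The mapping is transcribed from the table; the function names are hypothetical, and version strings are assumed to be purely numeric and dot-separated.

```python
from __future__ import annotations

# Refinery version from which each process type supports the common metrics,
# transcribed from the table above.
COMMON_METRICS_SINCE = {
    "GW": "4.2.0", "QR": "4.2.0", "TP": "4.2.0", "RDB": "4.2.0",
    "HDB": "4.2.0", "CTP": "4.2.0", "RTE": "4.2.0",
    "PDB": "4.3.2",
    "MS": "4.5.3", "FW": "4.5.3", "Event Bus": "4.5.3",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.3.2' into (4, 3, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supports_common_metrics(process_type: str, refinery_version: str) -> bool:
    """Return True if the process type reports common metrics at this version."""
    since = COMMON_METRICS_SINCE.get(process_type)
    if since is None:
        return False  # process type is not listed in the table
    return parse_version(refinery_version) >= parse_version(since)
```

Comparing integer tuples rather than raw strings keeps a version such as 4.10.0 ordered after 4.5.3.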

Process-level metrics

Process type  Since  Metric name              Description
------------------------------------------------------------------------
TP            4.2.0  tp.updDayCount           Number of rows received in the TP per table; updated every 10 seconds and reset at end of day.
RDB           4.2.0  consumer.updDayCount     Number of rows received in the consuming process per table; updated every 10 seconds and reset at end of day.
RDB           4.3.2  rdb.eodFlush             Once the RDB has been flushed of the previous day's data.
CTP/RTE       4.2.0  consumer.updDayCount     Number of rows received in the consuming process per table; updated every 10 seconds and reset at end of day.
HDB           4.2.0  hdb.availableDates       Most recent date in the HDB (latest) and all dates available in the HDB (all).
HDB           4.2.0  hdb.latestDateRowCounts  Most recent date in the HDB (latest) and the captured row counts per table (rowCounts).
PDB           4.3.2  pdb.rollover             Updates on the status of the rollover within the PDB process.
GW            4.2.0  gw.queryStatus           The status of a query after execution on the Gateway. The status can be checked with the status boolean.
GW            4.5.3  failover.process         An individual process that has been failed over due to a connection drop.
GW            4.5.3  failover.stack           A pipeline of processes that have been failed over due to an FH or TP connection drop.
GW            4.5.3  failover.routing         The routing master has changed due to a connection drop.
QR            4.2.0  qr.queryDispatch         The information in a query as it is sent to the downstream processes to be executed. There is no query status in this metric.
QR            4.2.0  qr.queryResult           The result of a query after downstream execution. The status can be checked with the status boolean.
QR            4.5.3  failover.process         An individual process that has been failed over due to a connection drop.
QR            4.5.3  failover.stack           A pipeline of processes that have been failed over due to an FH or TP connection drop.
QR            4.5.3  failover.routing         The routing master has changed due to a connection drop.
Event Bus     4.5.3  eb.primaryState          Reported when the Event Bus changes from primary to secondary, or vice versa.
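
Because tp.updDayCount and consumer.updDayCount are running totals that reset at end of day, a per-interval figure has to be derived from two successive samples. The sketch below is one way to do that. It assumes each sample has already been reduced to a mapping of table name to cumulative row count; that shape, the function name, and the table names are hypothetical, not the documented payload format.

```python
def rows_since_last_sample(previous, current):
    """Rows received per table between two cumulative day-count samples.

    A count lower than the previous one is treated as an end-of-day reset,
    in which case the current count is taken as the number of new rows.
    Rows that arrived between the previous sample and the reset itself
    cannot be recovered from the two samples and are not counted.
    """
    deltas = {}
    for table, count in current.items():
        before = previous.get(table, 0)  # table not seen before: start at 0
        deltas[table] = count - before if count >= before else count
    return deltas


# Two samples taken 10 seconds apart within the same day.
earlier = {"trade": 1000, "quote": 50000}
later = {"trade": 1250, "quote": 50900}
print(rows_since_last_sample(earlier, later))  # {'trade': 250, 'quote': 900}

# The next sample arrives after the end-of-day reset.
after_reset = {"trade": 40, "quote": 10}
print(rows_since_last_sample(later, after_reset))  # {'trade': 40, 'quote': 10}
```

Dividing each delta by the 10-second update interval gives an approximate rows-per-second rate for the table.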

Ignored processes

By default, any process based on one of the following process templates is completely ignored by the Monitoring components:

  • DS_LAUNCH
  • DS_STARTER
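
When post-processing a list of processes outside the Monitoring components, the same default exclusion can be mirrored with a simple filter. This is a minimal sketch: the two template names come from the list above, while the record shape (a dict with `name` and `template` keys) and the process names are hypothetical.

```python
# Templates that the Monitoring components ignore by default (from the list above).
IGNORED_TEMPLATES = {"DS_LAUNCH", "DS_STARTER"}


def monitored_processes(processes):
    """Return only the processes whose template is not ignored by default."""
    return [p for p in processes if p["template"] not in IGNORED_TEMPLATES]


# Hypothetical process records for illustration only.
example = [
    {"name": "rdb_1", "template": "DS_RDB"},
    {"name": "launcher_1", "template": "DS_LAUNCH"},
]
print([p["name"] for p in monitored_processes(example)])  # ['rdb_1']
```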

System information generator

If the System Information Generator is running on each server, then server-level and additional process-level metrics are available.

Server-level metrics

Since  Metric name      Description
------------------------------------------------------------------------
4.3.2  system.cpu       Usage of server CPUs, individually and in total
       system.fs        Filesystem usage
       system.kernel    Kernel information
       system.load      System load averages
       system.mem       Server RAM usage across all processes
       system.netproto  Detailed TCP and UDP network statistics
       system.nic       Network interface statistics
       system.process
       system.sys

Process-level metrics

Since  Metric name  Description
-------------------------------------
4.3.2  process.cpu  Process CPU usage
       process.mem  Process RAM usage