Available metrics
The following metrics are available from the Monitoring system. The Monitoring components are agnostic of the list of available metrics, so the versions listed on this page are the Refinery application versions in which each metric became available.
Refinery
Common metrics
These metrics are available across multiple process types:
| Since | Metric name | Description |
|---|---|---|
| 4.2.0 | ipc.pendingHandleBytes | The total number of bytes pending across all connections in the process. Any pending bytes can indicate a slow consumer |
| 4.2.0 | process.queryTimeout | The current query timeout (in seconds) on the process |
| 4.2.0 | process.connectionOpen | Information about a new inbound connection to the process |
| 4.2.0 | process.connectionClosed | Information about any inbound or outbound connection that has closed |
| 4.2.0 | process.exit | Sent as the process exits gracefully (either by a stop request or a SIGTERM) |
| 4.2.0 | process.userAuthFailure | Any user connection attempts that have failed due to bad credentials |
| 4.3.2 | mem.kdbUsageBytes | The internal memory usage of the kdb+ process, based on the output from .Q.w[] |
| 4.3.2 | process.ping | The response time of each process the Monitoring daemon is connected to |
| 4.3.2 | system.timeSync | Checks for time synchronisation between multiple servers (only if the daemon is running in single-node mode) |
The supported process types:
| Since | Process types |
|---|---|
| 4.2.0 | GW, QR, TP, RDB, HDB, CTP, RTE |
| 4.3.2 | PDB |
| 4.5.3 | MS, FW, Event Bus |
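
The two tables above can be combined to estimate whether a common metric should be expected from a given process at a given Refinery version. The following is a minimal sketch in Python, not part of Refinery. It assumes a metric is available once the running version has reached both the metric's "since" version and the process type's "since" version; that combination rule is an assumption rather than something the tables state. The dictionary and function names are hypothetical, and a blank "Since" entry is taken to inherit the version listed above it.

```python
# Illustrative sketch only: the data is copied from the tables above,
# and all names here are hypothetical.
METRIC_SINCE = {
    "ipc.pendingHandleBytes": "4.2.0",
    "process.queryTimeout": "4.2.0",
    "process.connectionOpen": "4.2.0",
    "process.connectionClosed": "4.2.0",
    "process.exit": "4.2.0",
    "process.userAuthFailure": "4.2.0",
    "mem.kdbUsageBytes": "4.3.2",
    "process.ping": "4.3.2",
    "system.timeSync": "4.3.2",
}

PROCESS_TYPE_SINCE = {
    "GW": "4.2.0", "QR": "4.2.0", "TP": "4.2.0", "RDB": "4.2.0",
    "HDB": "4.2.0", "CTP": "4.2.0", "RTE": "4.2.0",
    "PDB": "4.3.2",
    "MS": "4.5.3", "FW": "4.5.3", "Event Bus": "4.5.3",
}


def parse_version(text):
    """Turn '4.3.2' into (4, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def common_metric_available(metric, process_type, refinery_version):
    """Assumed rule: the version must satisfy both 'since' values."""
    required = max(
        parse_version(METRIC_SINCE[metric]),
        parse_version(PROCESS_TYPE_SINCE[process_type]),
    )
    return parse_version(refinery_version) >= required
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as "4.10.0" sorting before "4.9.0".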
Process-level metrics
| Process type | Since | Metric name | Description |
|---|---|---|---|
| TP | 4.2.0 | tp.updDayCount | Number of rows received in the TP per table; updated every 10 seconds and reset at end of day |
| RDB | 4.2.0 | consumer.updDayCount | Number of rows received in the consuming process per table; updated every 10 seconds and reset at end of day |
| RDB | 4.3.2 | rdb.eodFlush | Sent once the RDB has been flushed of the previous day's data |
| CTP/RTE | 4.2.0 | consumer.updDayCount | Number of rows received in the consuming process per table; updated every 10 seconds and reset at end of day |
| HDB | 4.2.0 | hdb.availableDates | Most recent date in the HDB (latest) and all dates available in the HDB (all) |
| HDB | 4.2.0 | hdb.latestDateRowCounts | Most recent date in the HDB (latest) and the captured row counts per table (rowCounts) |
| PDB | 4.3.2 | pdb.rollover | Status updates for the rollover within the PDB process |
| GW | 4.2.0 | gw.queryStatus | The status of a query after execution on the Gateway. Status can be checked by the status boolean |
| GW | 4.5.3 | failover.process | An individual process that has been failed over due to a connection drop |
| GW | 4.5.3 | failover.stack | A pipeline of processes that has been failed over due to an FH or TP connection drop |
| GW | 4.5.3 | failover.routing | The routing master has changed due to a connection drop |
| QR | 4.2.0 | qr.queryDispatch | The information in a query as it is sent to the downstream processes to be executed. There is no query status in this metric |
| QR | 4.2.0 | qr.queryResult | The result of a query after downstream execution. Status can be checked by the status boolean |
| QR | 4.5.3 | failover.process | An individual process that has been failed over due to a connection drop |
| QR | 4.5.3 | failover.stack | A pipeline of processes that has been failed over due to an FH or TP connection drop |
| QR | 4.5.3 | failover.routing | The routing master has changed due to a connection drop |
| Event Bus | 4.5.3 | eb.primaryState | Indicates that the Event Bus has changed from primary to secondary or vice versa |
Ignored processes
By default, any process based on one of the following process templates is completely ignored by the Monitoring components:
- DS_LAUNCH
- DS_STARTER
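
When working with a list of processes outside the Monitoring components (for example, when reconciling a dashboard against a process inventory), the same exclusion can be mirrored with a simple filter. This is a hedged sketch in Python, not the Monitoring implementation: the `(process_name, template_name)` input shape and the function name are hypothetical, and only the two template names come from the list above.

```python
# Templates ignored by default, as listed above.
IGNORED_TEMPLATES = frozenset({"DS_LAUNCH", "DS_STARTER"})


def monitored_processes(processes):
    """Return names of processes whose template is not ignored.

    `processes` is an iterable of (process_name, template_name) pairs;
    this input shape is an assumption made for illustration.
    """
    return [
        name
        for name, template in processes
        if template not in IGNORED_TEMPLATES
    ]
```

For example, given the made-up pairs `[("proc_a", "EXAMPLE_TEMPLATE"), ("proc_b", "DS_LAUNCH")]`, only `proc_a` is returned.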
System information generator
If the System Information Generator is running on each server, then server-level and additional process-level metrics are available.
Server-level metrics
| Since | Metric name | Description |
|---|---|---|
| 4.3.2 | system.cpu | usage of server CPUs, individually and total |
| 4.3.2 | system.fs | filesystem usage |
| 4.3.2 | system.kernel | kernel information |
| 4.3.2 | system.load | system load averages |
| 4.3.2 | system.mem | server RAM usage across all processes |
| 4.3.2 | system.netproto | detailed TCP and UDP network statistics |
| 4.3.2 | system.nic | network interface statistics |
| 4.3.2 | system.process | |
| 4.3.2 | system.sys | |
Process-level metrics
| Since | Metric name | Description |
|---|---|---|
| 4.3.2 | process.cpu | process CPU usage |
| 4.3.2 | process.mem | process RAM usage |