Alerts reference
kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.
This page lists the pre-packaged alerts generated by kdb Insights Enterprise.
Alert name prefix
All alert names are prefixed with "NAMESPACE-kxi-".
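As a minimal sketch of how the prefix combines with the names in the table below, the following Python helpers build and strip the full alert name. It assumes that `NAMESPACE` is a placeholder for the namespace of your deployment; the function names and the `insights` namespace are hypothetical and are not part of the product.

```python
def full_alert_name(namespace: str, alert: str) -> str:
    """Build the prefixed alert name: '<namespace>-kxi-<alert>'.

    Assumption: NAMESPACE in the documented prefix stands for the
    namespace of the deployment.
    """
    return f"{namespace}-kxi-{alert}"


def strip_alert_prefix(namespace: str, full_name: str) -> str:
    """Return the short alert name as listed in the table.

    Names that do not start with the expected prefix are returned unchanged.
    """
    prefix = f"{namespace}-kxi-"
    if full_name.startswith(prefix):
        return full_name[len(prefix):]
    return full_name


if __name__ == "__main__":
    # 'insights' is a hypothetical namespace used only for illustration.
    print(full_alert_name("insights", "HighNodeCPUUsage"))
    print(strip_alert_prefix("insights", "insights-kxi-PodNotReady"))
```

Stripping the prefix this way makes it straightforward to look up a firing alert against the short names in the table.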
name | level | description | threshold | context |
---|---|---|---|---|
CriticalNodeCPUUsage | critical | Node CPU utilization % for the last 5 minutes | > 90% | node |
CriticalNodeMemoryUsage | critical | Node Memory usage for the last 5 minutes | > 90% | node |
CriticalPVCDiskUsage | critical | Percentage disk utilization of Non-RT PVC connected to one or multiple pods | > 80% | pvc |
CriticalRookCephDiskUsage | critical | Percentage rook-ceph disk utilization across the cluster | > 80% | cluster |
CriticalRTPVCDiskUsage | warning | Percentage disk utilization of RT PVC connected to one or multiple pods | > 95% | pvc |
DAPIsNotReceivingData | warning | A previously active DAP has not received any data in the last minute | | pod |
DAPPurgeIncomplete | warning | At the last EOI the DAP purged less than 50% of the records written to the Storage Manager | > 50% | assembly |
HighAggErrors | warning | Aggregator errors for the last minute | > 20 | pod |
HighAggQueueSize | warning | Aggregator request queue size for the last minute | > 20 | pod |
HighCPUThrottling | warning | CPU throttling issues for a process in container for the last minute | | container |
HighNodeCPUUsage | warning | Node CPU utilization % for the last 5 minutes | > 80% | node |
HighNodeMemoryUsage | warning | Node Memory usage for the last 5 minutes | > 80% | node |
HighPVCDiskUsage | warning | Percentage disk utilization of Non-RT PVC connected to one or multiple pods | > 60% | pvc |
HighRCQueueSize | warning | Resource Coordinator queue size | > 20 | pod |
HighRCRetries | warning | Resource Coordinator request retries | > 20 | pod |
HighRookCephDiskUsage | warning | Percentage rook-ceph disk utilization across the cluster | > 90% | cluster |
HighSGPendingQueries | warning | Service Gateway pending queries for the last minute | > 20 | container |
HighSMEODTime | info | Time taken for an EOD | > 4h | database |
HighSymFileGrowth | info | Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50MB | > 25% | pod |
KeycloakContainerFailed | warning | Pod responsible for Keycloak failing to restart in the last 5 minutes | | pod |
NoAggsPresent | warning | At least one assembly is deployed, but no Resource Coordinator Aggregators exist | | container |
NoDAPsPresent | warning | At least one assembly is deployed, but no Resource Coordinator DAPs exist | | container |
NodeNotInReadyState | warning | A node is not in a ready state | | node |
NoRDBGrowth | warning | Rate of RDB growth is 0% | = 0% | pod |
NoRTLeader | critical | There is no leader for the Stream, so no messages are merged or made available to subscribers | | RT stream |
PodCrashing | critical | Pod in a CrashLoopBackOff state for the last minute | | pod |
PodCrashLoopBackOff | warning | Pod failing to restart for the last minute | | pod |
PodInFailedState | warning | Pod in Failed state for the last minute | | pod |
PodInUnknownState | warning | Pod in Unknown state for the last minute | | pod |
PodNotReady | warning | Pod is in NotReady state for the last minute | | pod |
PodOOMKilled | warning | Container is Out of memory (OOM) killed and restarting | | container |
PodTargetDown | warning | A target is down | | pod |
PostgreSQLContainerFailed | warning | PostgreSQL container which supports Keycloak is no longer running | | pod |
RCsWithoutDAPs | critical | Resource Coordinators have connected clients but there are no Data Access Processes connected to them | | container |
RTContainerDown | warning | A Reliable Transport container has either failed, or been stopped manually | | container |
RookCephLimitedDiskAvailable | warning | Limited Rook-Ceph disk storage available in MBs | < 2000 MB | node |
SGWithoutAggs | warning | Service Gateway has connected clients but there are no Aggregators connected | | container |
SMContainerDown | warning | Storage Manager container has either failed, or been stopped manually | | container |
SMNoRecordsWrittenDuringEOI | warning | An End of Interval ran but no records were written | | pod |
SMPendingEOIs | warning | Storage Manager has pending End of Interval requests | | pod |