Alerts reference
kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.
This page lists these pre-packaged alerts, with the level, description, threshold, and context of each.
Alert name prefix
All alert names are prefixed with "NAMESPACE-kxi-".
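
As an illustration of the prefix rule, the following minimal Python sketch builds and strips the full alert name. It assumes that `NAMESPACE` stands for the name of the namespace the deployment runs in; the helper names `full_alert_name` and `short_alert_name`, and the example namespace `insights`, are hypothetical and not part of the product.

```python
def full_alert_name(namespace: str, alert: str) -> str:
    """Return the alert name with the NAMESPACE-kxi- prefix applied."""
    return f"{namespace}-kxi-{alert}"


def short_alert_name(namespace: str, full_name: str) -> str:
    """Strip the NAMESPACE-kxi- prefix to recover the name listed in the table."""
    prefix = f"{namespace}-kxi-"
    if not full_name.startswith(prefix):
        raise ValueError(f"{full_name!r} does not start with {prefix!r}")
    return full_name[len(prefix):]


# "insights" is an example namespace chosen for illustration only.
print(full_alert_name("insights", "PodNotReady"))
print(short_alert_name("insights", "insights-kxi-HighNodeCPUUsage"))
```

A lookup like this can help when matching a fired alert back to its row in the table below.
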
| name | level | description | threshold | context |
|---|---|---|---|---|
| CriticalNodeCPUUsage | critical | Node CPU utilization % for the last 5 minutes | > 90% | node |
| CriticalNodeMemoryUsage | critical | Node Memory usage for the last 5 minutes | > 90% | node |
| CriticalPVCDiskUsage | critical | Percentage disk utilization of Non-RT PVC connected to one or multiple pods | > 80% | pvc |
| CriticalRookCephDiskUsage | critical | Percentage rook-ceph disk utilization across the cluster | > 80% | cluster |
| CriticalRTPVCDiskUsage | warning | Percentage disk utilization of RT PVC connected to one or multiple pods | > 95% | pvc |
| DAPIsNotReceivingData | warning | A previously active DAP has not received any data in the last minute | | pod |
| DAPPurgeIncomplete | warning | At the last EOI the DAP purged less than 50% of the records written to the Storage Manager | > 50% | assembly |
| HighAggErrors | warning | Aggregator errors for the last minute | > 20 | pod |
| HighAggQueueSize | warning | Aggregator request queue size for the last minute | > 20 | pod |
| HighCPUThrottling | warning | CPU throttling issues for a process in a container for the last minute | | container |
| HighNodeCPUUsage | warning | Node CPU utilization % for the last 5 minutes | > 80% | node |
| HighNodeMemoryUsage | warning | Node Memory usage for the last 5 minutes | > 80% | node |
| HighPVCDiskUsage | warning | Percentage disk utilization of Non-RT PVC connected to one or multiple pods | > 60% | pvc |
| HighRCQueueSize | warning | Resource Coordinator queue size | > 20 | pod |
| HighRCRetries | warning | Resource Coordinator request retries | > 20 | pod |
| HighRookCephDiskUsage | warning | Percentage rook-ceph disk utilization across the cluster | > 90% | cluster |
| HighSGPendingQueries | warning | Service Gateway pending queries for the last minute | > 20 | container |
| HighSMEODTime | info | Time taken for an EOD | > 4h | database |
| HighSymFileGrowth | info | Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50 MB | > 25% | pod |
| KeycloakContainerFailed | warning | Pod responsible for Keycloak failing to restart in the last 5 minutes | | pod |
| NoAggsPresent | warning | At least one assembly is deployed, but no Resource Coordinator Aggregators exist | | container |
| NoDAPsPresent | warning | At least one assembly is deployed, but no Resource Coordinator DAPs exist | | container |
| NodeNotInReadyState | warning | A node is not in a ready state | | node |
| NoRDBGrowth | warning | Rate of RDB growth is 0% | = 0% | pod |
| NoRTLeader | critical | There is no leader for the stream, so no messages are merged or made available to subscribers | | RT stream |
| PodCrashing | critical | Pod in a CrashLoopBackOff state for the last minute | | pod |
| PodCrashLoopBackOff | warning | Pod failing to restart for the last minute | | pod |
| PodInFailedState | warning | Pod in Failed state for the last minute | | pod |
| PodInUnknownState | warning | Pod in Unknown state for the last minute | | pod |
| PodNotReady | warning | Pod in NotReady state for the last minute | | pod |
| PodOOMKilled | warning | Container was killed for running out of memory (OOM) and is restarting | | container |
| PodTargetDown | warning | A target is down | | pod |
| PostgreSQLContainerFailed | warning | PostgreSQL container which supports Keycloak is no longer running | | pod |
| RCsWithoutDAPs | critical | Resource Coordinators have connected clients but there are no Data Access Processes connected to them | | container |
| RTContainerDown | warning | A Reliable Transport container has either failed or been stopped manually | | container |
| RookCephLimitedDiskAvailable | warning | Limited Rook-Ceph disk storage available in MBs | < 2000 MB | node |
| SGWithoutAggs | warning | Service Gateway has connected clients but there are no Aggregators connected | | container |
| SMContainerDown | warning | Storage Manager container has either failed or been stopped manually | | container |
| SMNoRecordsWrittenDuringEOI | warning | An End of Interval ran but no records were written | | pod |
| SMPendingEOIs | warning | Storage Manager has pending End of Interval requests | | pod |
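
Several alerts in the table come in warning/critical pairs that share a metric but differ in threshold, for example HighNodeCPUUsage (> 80%) and CriticalNodeCPUUsage (> 90%). The Python sketch below shows how such a pair could be read against a measured value. It is an illustration only, not how kdb Insights Enterprise evaluates its alerts: the `THRESHOLDS` mapping and the `classify` function are hypothetical, the threshold values are copied from the table, and the function reports only the most severe alert whose threshold is exceeded.

```python
from typing import Optional

# Warning and critical thresholds (percent) copied from the table above.
# The metric keys are hypothetical labels formed by dropping the
# High/Critical prefix from the alert names.
THRESHOLDS = {
    "NodeCPUUsage": {"warning": 80.0, "critical": 90.0},
    "NodeMemoryUsage": {"warning": 80.0, "critical": 90.0},
    "PVCDiskUsage": {"warning": 60.0, "critical": 80.0},
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return the most severe alert name whose threshold `value` exceeds, or None."""
    levels = THRESHOLDS[metric]
    # Thresholds in the table are strict ("> 90%"), so compare with >.
    if value > levels["critical"]:
        return f"Critical{metric}"
    if value > levels["warning"]:
        return f"High{metric}"
    return None


print(classify("NodeCPUUsage", 85.0))   # above 80, not above 90
print(classify("PVCDiskUsage", 81.5))   # above 80
print(classify("NodeMemoryUsage", 42.0))
```

Because the comparisons are strict, a value exactly equal to a threshold does not trigger the corresponding alert in this sketch.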