Alerts reference

kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.

This page lists the pre-packaged alerts generated by kdb Insights Enterprise.

Alert name prefix

All alert names are prefixed with "NAMESPACE-kxi-".
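
As a small illustration of the prefix rule, the sketch below builds and strips the full alert name. It assumes that NAMESPACE stands for the namespace of your deployment; the helper names and the `insights` namespace are hypothetical and not part of the product.

```python
def full_alert_name(namespace: str, alert: str) -> str:
    """Build the prefixed alert name from a namespace and a base alert name."""
    # Assumption: NAMESPACE in "NAMESPACE-kxi-" is the deployment namespace.
    return f"{namespace}-kxi-{alert}"


def base_alert_name(namespace: str, full_name: str) -> str:
    """Strip the 'NAMESPACE-kxi-' prefix to recover the name used in the table."""
    prefix = f"{namespace}-kxi-"
    if not full_name.startswith(prefix):
        raise ValueError(f"{full_name!r} does not start with {prefix!r}")
    return full_name[len(prefix):]


# 'insights' is a hypothetical namespace used only for illustration.
print(full_alert_name("insights", "PodCrashing"))
print(base_alert_name("insights", "insights-kxi-PodCrashing"))
```

Stripping the prefix this way makes it easier to look up an incoming alert in the reference table, which lists names without the prefix.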

| name | level | description | threshold | context |
|---|---|---|---|---|
| CriticalNodeCPUUsage | critical | Node CPU utilization % for the last 5 minutes | > 90% | node |
| CriticalNodeMemoryUsage | critical | Node memory usage for the last 5 minutes | > 90% | node |
| CriticalPVCDiskUsage | critical | Percentage disk utilization of non-RT PVC connected to one or multiple pods | > 80% | pvc |
| CriticalRookCephDiskUsage | critical | Percentage rook-ceph disk utilization across the cluster | > 80% | cluster |
| CriticalRTPVCDiskUsage | warning | Percentage disk utilization of RT PVC connected to one or multiple pods | > 95% | pvc |
| DAPIsNotReceivingData | warning | A previously active DAP has not received any data in the last minute | | pod |
| DAPPurgeIncomplete | warning | At the last EOI the DAP purged less than 50% of the records written to the Storage Manager | > 50% | assembly |
| HighAggErrors | warning | Aggregator errors for the last minute | > 20 | pod |
| HighAggQueueSize | warning | Aggregator request queue size for the last minute | > 20 | pod |
| HighCPUThrottling | warning | CPU throttling issues for a process in a container for the last minute | | container |
| HighNodeCPUUsage | warning | Node CPU utilization % for the last 5 minutes | > 80% | node |
| HighNodeMemoryUsage | warning | Node memory usage for the last 5 minutes | > 80% | node |
| HighPVCDiskUsage | warning | Percentage disk utilization of non-RT PVC connected to one or multiple pods | > 60% | pvc |
| HighRCQueueSize | warning | Resource Coordinator queue size | > 20 | pod |
| HighRCRetries | warning | Resource Coordinator request retries | > 20 | pod |
| HighRookCephDiskUsage | warning | Percentage rook-ceph disk utilization across the cluster | > 90% | cluster |
| HighSGPendingQueries | warning | Service Gateway pending queries for the last minute | > 20 | container |
| HighSMEODTime | info | Time taken for an EOD | > 4h | database |
| HighSymFileGrowth | info | Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50MB | > 25% | pod |
| KeycloakContainerFailed | warning | Pod responsible for Keycloak failing to restart in the last 5 minutes | | pod |
| NoAggsPresent | warning | At least one assembly is deployed, but no Resource Coordinator Aggregators exist | | container |
| NoDAPsPresent | warning | At least one assembly is deployed, but no Resource Coordinator DAPs exist | | container |
| NodeNotInReadyState | warning | A node is not in a ready state | | node |
| NoRDBGrowth | warning | Rate of RDB growth is 0% | = 0% | pod |
| NoRTLeader | critical | There is no leader for the stream, so no messages are merged or made available to subscribers | | RT stream |
| PodCrashing | critical | Pod in CrashLoopBackOff for the last minute | | pod |
| PodCrashLoopBackOff | warning | Pod failing to restart for the last minute | | pod |
| PodInFailedState | warning | Pod in Failed state for the last minute | | pod |
| PodInUnknownState | warning | Pod in Unknown state for the last minute | | pod |
| PodNotReady | warning | Pod is in NotReady state for the last minute | | pod |
| PodOOMKilled | warning | Container was killed for being out of memory (OOM) and is restarting | | container |
| PodTargetDown | warning | A target is down | | pod |
| PostgreSQLContainerFailed | warning | PostgreSQL container which supports Keycloak is no longer running | | pod |
| RCsWithoutDAPs | critical | Resource Coordinators have connected clients but there are no Data Access Processes connected to them | | container |
| RTContainerDown | warning | A Reliable Transport container has either failed or been stopped manually | | container |
| RookCephLimitedDiskAvailable | warning | Limited Rook-Ceph disk storage available in MBs | < 2000 MB | node |
| SGWithoutAggs | warning | Service Gateway has connected clients but there are no Aggregators connected | | container |
| SMContainerDown | warning | Storage Manager container has either failed or been stopped manually | | container |
| SMNoRecordsWrittenDuringEOI | warning | An End of Interval ran but no records were written | | pod |
| SMPendingEOIs | warning | Storage Manager has pending End of Interval requests | | pod |
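
To make the tiered thresholds concrete, the sketch below maps a node CPU reading to the most severe matching alert, using the 80% and 90% thresholds listed for HighNodeCPUUsage and CriticalNodeCPUUsage. This is an illustration only: the function and rule list are hypothetical, and the product evaluates its own alert rules, which may fire both alerts at once.

```python
from typing import Optional

# (alert name, level, threshold in percent), ordered from most to least severe.
# Values are copied from the table; the data structure itself is an assumption.
NODE_CPU_RULES = [
    ("CriticalNodeCPUUsage", "critical", 90.0),
    ("HighNodeCPUUsage", "warning", 80.0),
]


def node_cpu_alert(cpu_percent: float) -> Optional[str]:
    """Return the most severe alert whose threshold is exceeded, or None."""
    for name, _level, threshold in NODE_CPU_RULES:
        # The table uses strict inequalities ("> 90%", "> 80%").
        if cpu_percent > threshold:
            return name
    return None


for reading in (75.0, 85.0, 95.0):
    print(reading, node_cpu_alert(reading))
```

Because the comparison is strict, a reading of exactly 80% triggers no alert and a reading of exactly 90% maps to the warning-level alert only.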