Alerts reference

kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.

This page lists the pre-packaged alerts generated by kdb Insights Enterprise.

Alert name prefix

All alert names are prefixed with "NAMESPACE-kxi-".
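
The prefix rule can be illustrated with a minimal Python sketch. It assumes that NAMESPACE stands for the namespace of your deployment; the function names and the "insights" namespace value are hypothetical and are not part of the product.

```python
def full_alert_name(namespace: str, alert: str) -> str:
    """Build the prefixed alert name: NAMESPACE-kxi-<alert>."""
    return f"{namespace}-kxi-{alert}"


def strip_alert_prefix(namespace: str, full_name: str) -> str:
    """Return the short alert name as listed in the table below.

    The input is returned unchanged if it does not carry the prefix.
    """
    prefix = f"{namespace}-kxi-"
    if full_name.startswith(prefix):
        return full_name[len(prefix):]
    return full_name


# "insights" is an example namespace chosen for illustration only.
print(full_alert_name("insights", "PodNotReady"))
print(strip_alert_prefix("insights", "insights-kxi-PodNotReady"))
```

Stripping the prefix this way makes it easier to match a fired alert against the short names used in the table.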

name	level	description	threshold	context
CriticalNodeCPUUsage	critical	Node CPU utilization % for the last 5 minutes	> 90%	node
CriticalNodeMemoryUsage	critical	Node Memory usage for the last 5 minutes	> 90%	node
CriticalPVCDiskUsage	critical	Percentage disk utilization of Non-RT PVC connected to one or multiple pods	> 80%	pvc
CriticalRookCephDiskUsage	critical	Percentage rook-ceph disk utilization across the cluster	> 80%	cluster
CriticalRTPVCDiskUsage	warning	Percentage disk utilization of RT PVC connected to one or multiple pods	> 95%	pvc
DAPIsNotReceivingData	warning	A previously active DAP has not received any data in the last minute		pod
DAPPurgeIncomplete	warning	At the last EOI the DAP purged less than 50% of the records written to the Storage Manager	> 50%	assembly
HighAggErrors	warning	Aggregator errors for the last minute	> 20	pod
HighAggQueueSize	warning	Aggregator request queue size for the last minute	> 20	pod
HighCPUThrottling	warning	CPU throttling issues for a process in container for the last minute		container
HighNodeCPUUsage	warning	Node CPU utilization % for the last 5 minutes	> 80%	node
HighNodeMemoryUsage	warning	Node Memory usage for the last 5 minutes	> 80%	node
HighPVCDiskUsage	warning	Percentage disk utilization of Non-RT PVC connected to one or multiple pods	> 60%	pvc
HighRCQueueSize	warning	Resource Coordinator queue size	> 20	pod
HighRCRetries	warning	Resource Coordinator request retries	> 20	pod
HighRookCephDiskUsage	warning	Percentage rook-ceph disk utilization across the cluster	> 90%	cluster
HighSGPendingQueries	warning	Service Gateway pending queries for the last minute	> 20	container
HighSMEODTime	info	Time taken for an EOD	> 4h	database
HighSymFileGrowth	info	Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50 MB	> 25%	pod
KeycloakContainerFailed	warning	Pod responsible for Keycloak failing to restart in the last 5 minutes		pod
NoAggsPresent	warning	At least one assembly is deployed, but no Resource Coordinator Aggregators exist		container
NoDAPsPresent	warning	At least one assembly is deployed, but no Resource Coordinator DAPs exist		container
NodeNotInReadyState	warning	A node is not in a ready state		node
NoRDBGrowth	warning	Rate of RDB growth is 0%	= 0%	pod
NoRTLeader	critical	There is no leader for the stream, so no messages will be merged or made available to subscribers		RT stream
PodCrashing	critical	Pod in a CrashLoopBackOff state for the last minute		pod
PodCrashLoopBackOff	warning	Pod failing to restart for the last minute		pod
PodInFailedState	warning	Pod in Failed state for the last minute		pod
PodInUnknownState	warning	Pod in Unknown state for the last minute		pod
PodNotReady	warning	Pod is in NotReady state for the last minute		pod
PodOOMKilled	warning	Container was killed for running out of memory (OOM) and is restarting		container
PodTargetDown	warning	A target is down		pod
PostgreSQLContainerFailed	warning	PostgreSQL container which supports Keycloak is no longer running		pod
RCsWithoutDAPs	critical	Resource Coordinators have connected clients but there are no Data Access Processes connected to them		container
RTContainerDown	warning	A Reliable Transport container has either failed, or been stopped manually		container
RookCephLimitedDiskAvailable	warning	Limited Rook-Ceph disk storage available in MBs	< 2000 MB	node
SGWithoutAggs	warning	Service Gateway has connected clients but there are no Aggregators connected		container
SMContainerDown	warning	Storage Manager container has either failed, or been stopped manually		container
SMNoRecordsWrittenDuringEOI	warning	An End of Interval ran but no records were written		pod
SMPendingEOIs	warning	Storage Manager has pending End of Interval requests		pod
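
The threshold column above uses a small set of comparison forms, such as "> 90%", "< 2000 MB" and "= 0%". As a reading aid only, the following Python sketch parses such a string and checks a sample value against it. The function names are hypothetical, and the sketch does not describe how kdb Insights Enterprise evaluates its alerts internally.

```python
import operator
import re

# Comparison symbols that appear in the threshold column.
_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Matches e.g. "> 90%", "> 20", "> 4h", "= 0%", "< 2000 MB".
_THRESHOLD = re.compile(r"^\s*([<>=])\s*(\d+(?:\.\d+)?)\s*(\S*)\s*$")


def parse_threshold(text: str):
    """Split a threshold string into (operator, value, unit)."""
    match = _THRESHOLD.match(text)
    if match is None:
        raise ValueError(f"unrecognised threshold: {text!r}")
    op, value, unit = match.groups()
    return op, float(value), unit


def breaches(threshold: str, observed: float) -> bool:
    """Return True if the observed value satisfies the threshold condition.

    The observed value is assumed to be in the same unit as the threshold.
    """
    op, limit, _unit = parse_threshold(threshold)
    return _OPS[op](observed, limit)


# A node at 93.5% CPU would satisfy the CriticalNodeCPUUsage condition.
print(breaches("> 90%", 93.5))
```

Rows with an empty threshold column describe state-based conditions rather than numeric limits, so they fall outside this sketch.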