kdb Insights Enterprise Azure Monitoring Workbooks
The kdb Insights Enterprise on Azure Workbooks are a collection of metrics that help the user monitor the performance and status of kdb Insights Enterprise and the underlying Azure cloud infrastructure from a single, centralized view.
Azure Workbooks are built on Microsoft Azure Log Analytics data, which lets the user obtain performance statistics from the system and integrates tightly with Microsoft-supported deployments.
The kdb Insights Workbooks are deployed automatically alongside each kdb Insights Enterprise deployment to help monitor its performance.
Getting started
- Go to the Azure homepage and click Resource Group.
- Select KX Insights Workbook.
Navigate the Workbook
Because the Workbook can track multiple deployments, the user can switch between Subscriptions without leaving the screen.
The user needs to select the Subscription, Cluster Name, Workspace and Time Range:
- Make your selections on the main tab. The set of tabs below it helps the user navigate each metric category.
- Make your selection on the sub tabs.
Tabs
Cluster Overview
Metrics on this tab provide a general health overview of Azure and the hardware underlying kdb Insights Enterprise, with a Kubernetes cluster-level view of CPU, memory and disk usage.
metric | description | risk |
---|---|---|
Cluster CPU | Percentage of CPU utilised by the product. | |
Cluster Memory | Percentage of available memory utilised by the product. | |
Cluster Disk Usage % | Percentage of available disk utilised by the product. | |
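Each Cluster Overview metric is a used-over-available ratio expressed as a percentage. A minimal sketch of that arithmetic follows, assuming the used and available figures (for example CPU cores or bytes of memory) have already been read from Log Analytics; the function name `utilisation_pct` and the sample values are hypothetical and not part of the Workbook.

```python
def utilisation_pct(used: float, capacity: float) -> float:
    """Return used/capacity as a percentage, the form the Cluster Overview charts display."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


# Hypothetical sample figures: 6 of 8 CPU cores, 16 of 64 GiB of memory.
cpu_pct = utilisation_pct(6.0, 8.0)
mem_pct = utilisation_pct(16 * 2**30, 64 * 2**30)
print(f"CPU {cpu_pct:.1f}%, memory {mem_pct:.1f}%")  # CPU 75.0%, memory 25.0%
```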
Nodes
This tab shows details of the nodes that make up the selected Kubernetes cluster; each node is a virtual or physical machine. The cluster can be identified as “kxinsightaks” (KX Insights Azure Kubernetes Service).
metric | description | risk |
---|---|---|
Node CPU | Percentage of CPU utilised to run the system and functions. | |
Node Memory | Percentage of memory (RAM) utilised to run the system and functions. | |
Node Disk Usage % | Percentage of disk utilised to store ingested data. | |
metric | description |
---|---|
Received Bytes | Rate at which data is received. |
Sent Bytes | Rate at which data is sent. |
Disk Busy % | Utilisation of the disk by transactions and access requests. |
Read Bytes | Data read from disk. |
Written Bytes | Data written to disk. |
Disk IOPs | Total input/output operations being performed. |
Network Bytes IN | Network bytes received. |
Network Bytes OUT | Network bytes transmitted. |
Errors IN | Total errors when receiving data. |
Errors OUT | Total errors when transmitting data. |
IOPs in progress | Input/output operations currently in progress. |
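Several of the node metrics above (Received Bytes, Sent Bytes, Read Bytes, Written Bytes, Disk IOPs) are shown as rates. A common way to derive such a rate, assuming the underlying data is a cumulative counter sampled at known times, is to divide the counter's increase by the elapsed time. The sketch below illustrates that approach; the `Sample` type and `per_second_rate` function are hypothetical and not necessarily how the Workbook's own queries compute these values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative counter, e.g. total bytes received by a node."""
    timestamp_s: float  # seconds since epoch
    value: float        # monotonically increasing counter value


def per_second_rate(prev: Sample, curr: Sample) -> float:
    """Convert two cumulative counter readings into a per-second rate."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.value - prev.value
    if delta < 0:
        # Counter reset (e.g. a node restart): count only the increase since the reset.
        delta = curr.value
    return delta / elapsed


# Hypothetical readings 60 s apart: 6,000,000 bytes received in that window.
rate = per_second_rate(Sample(0.0, 1_000_000), Sample(60.0, 7_000_000))
print(rate)  # 100000.0
```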
Pods
A pod is a group of one or more running containers, each of which can run one or more processes. This tab shows information about the pods of both the kdb Insights Enterprise deployment and Azure Kubernetes Service (AKS).
Azure Kubernetes Service (AKS) relies on controllers to monitor and manage pods and to coordinate resources for software applications. Namespaces provide a mechanism for isolating groups of resources within a single cluster. Resource names must be unique within a namespace, but not across namespaces. Namespace-based scoping applies only to namespaced objects (for example, Deployments and Services) and not to cluster-wide objects (for example, StorageClasses, Nodes and PersistentVolumes).
metric | description | risk |
---|---|---|
CPU Used | CPU used by each pod. | |
Memory Used | Memory used by each pod. | |
Pod Status by Node | List of all pods and their current health: Total, Running, Succeeded, Pending, Failed, Unknown. | |
Controllers per Namespace | Total number of controllers deployed in each namespace within the deployment. | |
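The Pod Status by Node chart groups pods by node and by Kubernetes pod phase (Running, Succeeded, Pending, Failed, Unknown), plus a total. The sketch below reproduces that grouping from a hypothetical list of `(node, phase)` pairs; the node names and the `status_by_node` helper are illustrative only, not the Workbook's query.

```python
from collections import Counter, defaultdict

# Kubernetes pod phases, matching the categories in the Pod Status by Node chart.
PHASES = ("Running", "Succeeded", "Pending", "Failed", "Unknown")


def status_by_node(pods):
    """Summarise (node, phase) pairs into per-node counts, plus a Total column."""
    counts = defaultdict(Counter)
    for node, phase in pods:
        if phase not in PHASES:
            raise ValueError(f"unexpected pod phase: {phase}")
        counts[node][phase] += 1
    return {
        node: {"Total": sum(c.values()), **{p: c[p] for p in PHASES}}
        for node, c in counts.items()
    }


# Hypothetical pod list: (node name, pod phase).
pods = [
    ("node-0", "Running"),
    ("node-0", "Running"),
    ("node-0", "Pending"),
    ("node-1", "Running"),
    ("node-1", "Failed"),
]
summary = status_by_node(pods)
print(summary["node-0"])
# {'Total': 3, 'Running': 2, 'Succeeded': 0, 'Pending': 1, 'Failed': 0, 'Unknown': 0}
```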
Rook-Ceph
Information shown on this tab relates to Rook-Ceph, a storage management tool used on Kubernetes. It automates the storage management processes of the system, making storage self-healing, self-managing and self-scaling.
Note
These charts are populated only if the user chose to deploy Rook-Ceph when configuring their kdb Insights Enterprise deployment. If Rook-Ceph is not selected at deployment time, only metrics for the alternative storage class, Azure NFS, are available in the Workbook, and the Rook-Ceph charts are empty.
Rook-Ceph uses Object Storage Daemons (OSDs) to manage storage devices and ensure data can be accessed, Pools to provide resilience against data loss, and Objects to store data.
metric | description | risk |
---|---|---|
Cluster State | System’s health status: Healthy, Warning, Error. | |
Number of OSDs | Number of Object Storage Daemons deployed on Ceph. | |
Number of OSDs Up | Number of OSDs running. | If Number of OSDs ≠ Number of OSDs Up, the Cluster State changes status. |
Number of Pools | Number of Pools deployed. | |
Number of Objects | Number of Objects deployed. | |
Cluster Disk usage % | Percentage of disk utilised. | |
Read/Write bytes | Total amount of data written and read by the OSDs of Ceph. | |
Read/Writes | Number of read and write operations. | |
Pool stored % | Percentage of disk space used by each Pool. | |
Pool Stored bytes written | Rate at which each Pool writes data. | |
Pool Stored bytes read | Rate at which each Pool reads data. | |
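The risk note for Number of OSDs Up reduces to comparing two counts. A minimal sketch of that rule, with a hypothetical helper name `osd_warning`, is shown below; it only flags the mismatch and does not attempt to reproduce how Ceph chooses between the Warning and Error states.

```python
from typing import Optional


def osd_warning(num_osds: int, num_osds_up: int) -> Optional[str]:
    """Apply the Number of OSDs Up rule: every deployed OSD is expected to be up."""
    if not 0 <= num_osds_up <= num_osds:
        raise ValueError("num_osds_up must be between 0 and num_osds")
    down = num_osds - num_osds_up
    if down == 0:
        return None
    return f"{down} of {num_osds} OSDs not up; Cluster State is expected to change status"


print(osd_warning(3, 3))  # None
print(osd_warning(3, 2))  # 1 of 3 OSDs not up; Cluster State is expected to change status
```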
Assemblies
This section is used to monitor data flow through each assembly in kdb Insights Enterprise. An assembly is the entity that represents the resources needed to ingest data into kdb Insights Enterprise, transform it and store it in the database. An assembly includes a schema and a database, alongside one or more streams and pipelines.
This tab provides information about the volume of data and number of messages flowing through each assembly. It also provides lower-level details about the messages/sec, bytes/sec and the average message size per Assembly and Stream. A stream, also known as a Reliable Transport (or RT), is a component which transports data into kdb Insights Enterprise and between components of kdb Insights Enterprise.
All Assemblies view
metric | description | risk |
---|---|---|
Stream Messages In bytes/sec | Rate at which data is passed into a Stream at a given time. This may be from an external source, or from a Stream Processor. | |
Stream Messages Out bytes/sec | Rate at which data is passed out of a Stream. This may be to the Stream Processor or the Storage Manager. | Data is expected to flow through a Stream, so Stream Messages Out is expected to equal Stream Messages In unless filtering rules are in place to filter out certain messages. |
Database Messages In/sec | Rate of data flow from a Stream into each of the database tiers. | |
Stream Details
metric | description | risk |
---|---|---|
Stream Messages In bytes/sec | Rate at which data is passed into a Stream at a given time. This may be from an external source or from the Stream Processor. | |
Stream Messages In/sec | Number of messages passed into a Stream per second. | |
Average Size of Messages In bytes | Average size of each message passed into a Stream. | |
Stream Messages Out bytes/sec | Rate at which data is passed from the Stream (Reliable Transport, RT) to kdb Insights Enterprise or the Storage Manager. | If Stream Messages Out ≠ Stream Messages In, data may be trapped. |
Stream Messages Out/sec | Rate at which messages are passed out of a Stream. This may be to the Stream Processor or the Storage Manager. | Data is expected to flow through the Stream. The rate of "Stream Messages Out" may differ from the rate of "Stream Messages In" only if filtering rules are in place to filter out certain messages. |
Average Size of Messages Out bytes | Average size of each message flowing out of a Stream. | |
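Two relationships in the Stream Details table can be checked directly: the average message size is bytes/sec divided by messages/sec over the same window, and, when no filtering rules are in place, the Out rate should track the In rate. The sketch below assumes both rates are sampled over the same window; the helper names and the 1% `tolerance` (to absorb small sampling offsets) are hypothetical choices, not Workbook settings.

```python
def average_message_size(bytes_per_sec: float, messages_per_sec: float) -> float:
    """Average bytes per message over one window: bytes/sec divided by messages/sec."""
    if messages_per_sec == 0:
        return 0.0
    return bytes_per_sec / messages_per_sec


def flow_mismatch(in_rate: float, out_rate: float, tolerance: float = 0.01) -> bool:
    """True when the Out rate differs from the In rate by more than `tolerance`
    (a fraction of the In rate). Without filtering rules the two rates are expected
    to match, so a mismatch may mean data is trapped."""
    if in_rate == 0:
        return out_rate != 0
    return abs(in_rate - out_rate) / in_rate > tolerance


print(average_message_size(512_000, 1_000))  # 512.0
print(flow_mismatch(1_000, 995))  # False (0.5% difference)
print(flow_mismatch(1_000, 800))  # True (20% difference)
```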
Database Tier details
A quick summary of the amount of data passed from the Stream into each database tier.
metric | description |
---|---|
Records In/sec | Rate of records entering each database tier. |
Messages In/sec | Number of messages per second entering the different database tiers. |
Records per message | Number of records contained in each message. |
DB Ingestion
This section provides a deeper look into how data is passed through the different database tiers.
Real Time Stream
This tab depicts the number of data messages received by each tier from a Stream.
metric | description |
---|---|
Messages per sec | Number of messages per second entering the different database tiers. |
Records per message | Number of records inside each message at a given point in time entering the different database tiers. |
Note
It is expected that each tier receives the same number of messages.
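The Real Time Stream metrics support two simple checks: records per message is Records In/sec divided by Messages In/sec, and every tier should report the same message count. The sketch below treats the count most tiers agree on as the expected value; that choice, the helper names, the tier labels and the sample figures are assumptions for illustration.

```python
from collections import Counter


def records_per_message(records_per_sec: float, messages_per_sec: float) -> float:
    """Records per message for one tier: Records In/sec divided by Messages In/sec."""
    return records_per_sec / messages_per_sec if messages_per_sec else 0.0


def tiers_out_of_step(messages_by_tier: dict[str, int]) -> list[str]:
    """Return tiers whose message count differs from the count most tiers agree on.

    Each tier is expected to receive the same number of messages from the Stream,
    so any tier listed here is worth investigating.
    """
    if not messages_by_tier:
        return []
    expected, _ = Counter(messages_by_tier.values()).most_common(1)[0]
    return sorted(t for t, n in messages_by_tier.items() if n != expected)


# Hypothetical per-tier message counts over the same time window.
messages = {"RDB": 1_200, "IDB": 1_200, "HDB": 1_150}
print(tiers_out_of_step(messages))        # ['HDB']
print(records_per_message(6_000, 1_200))  # 5.0
```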
Intraday
This tab depicts how data moves from a Real Time Database (RDB) to an Intraday Database (IDB). This happens at regular intervals throughout the day; by default, every 10 minutes.
During an End of Interval (EOI) process, data for the last interval (10 minutes by default) is transferred to the IDB, where it is persisted to disk temporarily until it is written to a historical database (HDB) partition at the End of Day (EOD).
metric | description | risk |
---|---|---|
Duration of last EOI transition | Length of each End of Interval process. | |
Records written during last EOI | Number of records held in the RDB that were written to the IDB during the last EOI. | If the data stream has a steady flow, the number of records written in each transition should be consistent. |
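At the default 10-minute interval, a day contains 24 × 60 / 10 = 144 EOI transitions. The risk note for Records written during last EOI can be approximated by comparing each transition with the typical (median) count; the median is used here so that one bad interval does not skew the baseline. The 50% threshold, the sample history and the helper name `inconsistent_eois` are hypothetical.

```python
from statistics import median

EOI_INTERVAL_MINUTES = 10  # default EOI interval
EOIS_PER_DAY = 24 * 60 // EOI_INTERVAL_MINUTES  # 144 transitions per day by default


def inconsistent_eois(records_written: list[int], max_rel_dev: float = 0.5) -> list[int]:
    """Return indexes of EOI transitions whose record count deviates from the median
    by more than max_rel_dev (a fraction of the median). Only meaningful when the
    data stream has a steady flow."""
    if not records_written:
        return []
    typical = median(records_written)
    if typical == 0:
        return []
    return [
        i for i, n in enumerate(records_written)
        if abs(n - typical) / typical > max_rel_dev
    ]


history = [10_000, 10_200, 9_900, 2_000, 10_100]
print(EOIS_PER_DAY)                # 144
print(inconsistent_eois(history))  # [3]
```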
Historical Database
This tab depicts how the historical database grows with each End of Day (EOD) process. By default, EOD occurs once a day.
metric | description | risk |
---|---|---|
HDB Size | Current size of the HDB. | |
Number of HDB Partitions | Current number of partitions in HDB. | |
Records Written During Last EOD Transition | Number of records transferred to the HDB during the last EOD process. | If the data stream has a steady flow, the number of records written in each transition should be consistent. |
DB Queries
This tab shows information about all queries requested by processes internal or external to kdb Insights Enterprise.
These queries are actioned by the following components: Data Access Process (DAP), Resource Coordinator, Service Gateway and Aggregators.
Data Access Request
Shows how quickly and successfully each database tier actions requests.
metric | description |
---|---|
Request Duration by Database tier | Time taken to complete requests on each database tier. |
Failed Requests by Database Tier/sec | Rate of failed data requests on each database tier. |
Resource Coordinator
The Resource Coordinator takes each request and sends it to every database tier that must provide data for the results of the query.
metric | description | risk |
---|---|---|
Request Completion Time | Time taken by the system to complete requests. | An increase could indicate that a large number of requests is putting the system under pressure, that some requests expect a large volume of data, or that there is a resource issue in kdb Insights Enterprise. |
Queue Length | Total number of requests queued with the Resource Coordinator that have not yet been processed. | If this is high or increasing, the system is under pressure and requests are building up. |
Connected Components | Number of components connected to the Resource Coordinator, including DAPs and Aggregators. | |
Retry Count | Number of retries for the requests. | If the retry count is not zero, resources could be under pressure or an error could be occurring when the request is run. |
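The Queue Length guidance (high or increasing) can be turned into a simple check over recent, evenly spaced samples: compare the latest value with a threshold and estimate the trend with a least-squares slope. The thresholds `high_water` and `min_slope`, and the helper names, are hypothetical and would need tuning for a given deployment.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope of evenly spaced samples (queue length change per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def queue_under_pressure(queue_lengths: list[float], high_water: float = 100,
                         min_slope: float = 1.0) -> bool:
    """Flag a queue that is high (latest value at or above high_water) or increasing."""
    return queue_lengths[-1] >= high_water or slope(queue_lengths) >= min_slope


print(queue_under_pressure([2, 3, 2, 3, 2]))      # False: low and flat
print(queue_under_pressure([5, 10, 18, 25, 33]))  # True: steadily increasing
```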
Service Gateway
The Service Gateway bridges network access and external access requests.
metric | description | risk |
---|---|---|
Connected Components | Number of components currently connected to the Service Gateway. | A high number of connected components may coincide with a high value for the pending requests if the volume of requests is high. |
Pending Requests | Number of requests the Service Gateway has not yet processed. | A rise in this metric may indicate a performance issue as the Service Gateway has a backlog of requests to action. |
HTTP Requests and Responses | Number of HTTP requests and responses. | If Requests ≠ Responses, the system is not processing requests correctly. |
IPC Requests and Responses | Number of IPC requests and responses. | If Requests ≠ Responses, the system is not processing requests correctly. |
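Because requests in flight briefly have no response, a single sample where Requests ≠ Responses is not necessarily a problem. The sketch below, which assumes cumulative request and response counters sampled over a window, flags the condition only when the gap persists in every sample; it applies equally to the HTTP and IPC counts. The helper names and the `max_gap` parameter are hypothetical.

```python
def gap(requests: int, responses: int) -> int:
    """Requests still without a response, from cumulative request/response counters."""
    return requests - responses


def responses_lagging(samples: list[tuple[int, int]], max_gap: int = 0) -> bool:
    """samples: (requests, responses) cumulative counts, oldest first.

    A momentary gap is normal while requests are in flight, so only flag the
    window if the gap stays above max_gap in every sample.
    """
    return bool(samples) and all(gap(req, resp) > max_gap for req, resp in samples)


print(responses_lagging([(100, 100), (150, 140), (200, 200)]))  # False
print(responses_lagging([(100, 95), (150, 140), (200, 180)]))   # True
```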
Aggregator
The Aggregator combines data from multiple database tiers and tables.
metric | description | risk |
---|---|---|
Requests in Progress by Pod | Number of aggregation requests being processed by each aggregator. | |
Errors and Timeouts | Number of aggregation requests that have failed or timed out. | |
Requests by type | Total number of requests by type. | |
Aggregation Duration | Time taken by each aggregator to complete a request. | |
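Aggregation Duration is reported per aggregator, so one way to compare aggregators is to average completed-request durations by pod. The sketch below assumes durations in seconds tagged with a pod name; the pod names and the `mean_duration_by_pod` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_pod(samples: list[tuple[str, float]]) -> dict[str, float]:
    """Average aggregation duration per aggregator pod.

    samples: (pod name, duration in seconds) for each completed aggregation request.
    """
    by_pod: dict[str, list[float]] = defaultdict(list)
    for pod, seconds in samples:
        by_pod[pod].append(seconds)
    return {pod: mean(durations) for pod, durations in by_pod.items()}


# Hypothetical completed requests for two aggregator pods.
samples = [("agg-0", 0.25), ("agg-0", 0.75), ("agg-1", 1.5)]
print(mean_duration_by_pod(samples))  # {'agg-0': 0.5, 'agg-1': 1.5}
```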