Streams

A stream is the kdb Insights Enterprise deployment of a Reliable Transport (or RT) cluster. Streams connect a data source to the database and provide highly available, high performance messaging.

Messages are sent to the stream by a publisher and are replicated, merged and sequenced before being passed to any consumers; for example the Stream Processor and the Storage Manager.

In the kdb Insights Enterprise all data written to the database goes via a stream. Streams can be connected to a Stream Processor and transformed, or they can be directly connected to the database.

Configuration in YAML

This guide discusses how to configure a stream using the kdb Insights Enterprise user interface. Streams can also be configured using YAML configuration file. YAML configured streams can be deployed using the kdb Insights CLI

Create a stream

To create a stream, start by clicking the "+" button beside streams on the left tree.

New stream

A stream minimally needs a name and a resource configuration to be defined. All other fields are optional.

Stream settings

name	description
Stream Name	This is the unique name for this stream that is used to identify the stream. This name is restricted to the DNS characters (alpha numeric and dashes) as it is used for data routing.
Sub Topic	The stream sub topic is used for external publishers to reference this stream for publishing. This field is only required if external publishing is needed. See advanced stream settings for details on creating an external

Once the stream has been configured, click the "Submit" button to save the settings. Streams need to be added to the database writedown settings to add data to a database. Once added to the database, it also must be added to an assembly to be deployed.

Video Tutorial

Create a data stream for an assembly; define CPU, Memory and Volume size resourcing.

Stream compute

Streams are responsible for moving data between components within kdb Insights. They ensure a deterministic order of events from multiple producers in one unified log of the event sequence. The throughput and latency of the system is dependent on the compute resources allocated to a stream compared to how much data is moving through it. The compute resources for a stream allows you to tune your system for your desired throughput.

name	description
Minimum CPU	The minimum amount of virtual CPU cycles to allocate to this stream. This value is measured in fractions of CPU cores which is either represented as `mCPU` for millicpu or just simply `CPU` for whole cores. You can express a half CPU as either `0.5` or as `500mCPU`.
Maximum CPU	The limit of CPU available to for this stream to consume. During burst operations, this process may consume more than the limit for a short period of time. This is a best effort governance of the amount of compute used.
Minimum Memory	The minimum amount of RAM to reserve for this process. RAM is measured in either bytes using multiples of 1024 (the `iB` suffix is used to denote 1024 byte multiples). If not enough memory is available on the current infrastructure, Kubernetes will trigger a scale up event on the install.
Maximum Memory	The maximum amount of RAM to reserve for this process. If the maximum RAM is consumed by this stream, it will be evicted and restarted to free up resources.
Volume Size	This is the amount of disk allocated to the stream to hold in-flight data. By default, this will hold `20Gi`. This should be tuned to hold enough data so that your system can recover from your maximum allowable downtime. For example, if your data was arriving at `1MB/s` and you wanted to allow for a 2 hour recovery, you would need at least `120Gi` of disk space to hold those logs.

Advanced stream settings

To enable advanced stream settings, click the "Advanced toggle".

Advanced stream settings

The advanced settings allow you to create an externally facing ingress to the RT cluster for ingesting data outside of kdb Insights using a feedhandler. This option will enable the 'External Topic Prefix' which allows you to specify an additional prefix for external clients.