Stream configuration
Streams move and sequence data and messages between components within kdb Insights. kdb Insights includes Reliable Transport (RT) as the primary stream bus. Custom streams can also be used, but they must comply with the RT interface.
Configuration
In kdb Insights Enterprise, all streams use Reliable Transport to move data. In this mode, streams are configured under the `spec.elements.sequencer` key of the assembly.
User interface configuration
This guide discusses configuration using YAML files. If you are using kdb Insights Enterprise, you can configure your system using the kdb Insights user interface.
Sequencer
The `sequencer` field under `elements` allows you to optionally define multiple RT stream instances within the assembly.
The operator sets defaults for `sequencer` at install time; these cover target ports and image details.
Under the `sequencer` key, each RT stream instance is defined under its own key, which represents the instance name.
spec:
  # ...
  elements:
    # ...
    sequencer:
      north:
        size: 3
        external: true
        externalNodePort: true
        useInternalLBAnnotations: false
        topicConfig:
          subTopic: "data"
| key | type | required | description | default | validation |
|---|---|---|---|---|---|
| `size` | integer | false | Size of the StatefulSet to be deployed. Note that the size must be consistent for all streams in an assembly. | 3 | Limited to 1 or 3 |
| `external` | boolean | true | External-facing Sequencer. Setting this to true enables an external IP. | "false" | |
| `externalNodePort` | boolean | true | Use the node port type for the externally facing Sequencer service. | "false" | |
| `useInternalLBAnnotations` | boolean | false | When enabled, sets Service annotations to create an internal LoadBalancer for the external service. | "true" | |
| `image` | object | false | Image details for the container. | | |
| `env` | list | false | List of environment variables. | | |
| `args` | string[] | false | Command line arguments to be passed to the container. | | |
| `topicConfig` | object | false | Sequencer topic configuration. See Topic config. | | |
| `volume` | object | false | RT Sequencer directory paths. See Sequencer volume. | | |
| `topicConfigDir` | string | false | Location of the RT 'pull' directory. | "/config/topics/" | `^[\/]+[a-zA-Z0-9\/-_]*$` |
| `volumeMounts` | list | false | List of standard Kubernetes Volume Mount definitions. The volume must be present in `spec.volumes`. | | |
| `k8sPolicy` | object | false | Kubernetes Pod configurations. See Kubernetes policy for more details. | | |
| `archiver` | object | false | Sequencer Archiver. | | |
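The two constraints on `size` in the table above (limited to 1 or 3, and consistent across all streams in an assembly) can be checked before an assembly is applied. The following is a minimal sketch only: the function name `check_sequencer_sizes` is hypothetical, and it assumes the `sequencer` section has already been loaded into a plain Python dict and that an omitted `size` falls back to the documented default of 3.

```python
def check_sequencer_sizes(sequencer, default_size=3):
    """Return the common stream size, or raise ValueError if a rule is broken.

    `sequencer` maps stream names (for example "north") to their settings.
    This helper is illustrative and not part of kdb Insights.
    """
    # Assumption: a stream without an explicit size uses the default.
    sizes = {
        name: (cfg or {}).get("size", default_size)
        for name, cfg in sequencer.items()
    }

    # Rule 1: each size is limited to 1 or 3.
    for name, size in sizes.items():
        if size not in (1, 3):
            raise ValueError(f"stream {name!r}: size must be 1 or 3, got {size!r}")

    # Rule 2: the size must be the same for every stream in the assembly.
    distinct = set(sizes.values())
    if len(distinct) > 1:
        raise ValueError(f"streams have inconsistent sizes: {sizes}")

    return distinct.pop() if distinct else default_size


if __name__ == "__main__":
    print(check_sequencer_sizes({"north": {"size": 3}, "south": {}}))
```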
Topic config
RT streams can be internal or external to a Kubernetes cluster. Setting `external` to `true` and adding the `topicConfig` object allows an external publisher to publish to an RT stream that is running inside the cluster. The presence of the `topicConfig` object in the assembly file results in the operator provisioning a set of load balancers, which serve as a point of ingress to the cluster.
spec:
  # ...
  elements:
    # ...
    sequencer:
      south:
        external: false
      north:
        external: true
        topicConfig:
          subTopic: "ext-north"
| key | type | required | description | default | validation |
|---|---|---|---|---|---|
| `subTopic` | string | false | An external ID for an RT stream. A publisher external to the cluster can use this when requesting RT endpoints from the information service. If `topicConfig` is included, `subTopic` is required. | | `^[a-z0-9]+[a-z0-9-]*[a-z0-9]+$` |
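The validation pattern for `subTopic` can be tried out locally before editing an assembly. This sketch uses only the regular expression given in the table above; the helper name `is_valid_subtopic` is an assumption made for illustration. Note that the pattern requires at least two characters, all lowercase alphanumerics or hyphens, with no leading or trailing hyphen.

```python
import re

# Pattern copied from the validation column of the topicConfig table.
SUBTOPIC_PATTERN = re.compile(r"^[a-z0-9]+[a-z0-9-]*[a-z0-9]+$")


def is_valid_subtopic(name):
    """Return True if `name` satisfies the documented subTopic pattern."""
    return SUBTOPIC_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("ext-north", "data", "a", "Ext-North", "north-"):
        print(candidate, is_valid_subtopic(candidate))
```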
subTopic example
An example of a publisher requesting the RT endpoints from the information service can be found here.
Sequencer volume
The `volume` object allows you to configure the Sequencer's RT log volume. This is the volume containing the Sequencer logs for state, subscribing and publishing topics.
spec:
  # ...
  elements:
    # ...
    sequencer:
      south:
        volume:
          mountPath: "/s/"
          subPaths:
            in: "in"
            out: "out"
            cp: "state"
          size: "20Gi"
| key | type | required | description | default | validation |
|---|---|---|---|---|---|
| `mountPath` | string | false | Mount location of the volume. | "/s/" | `^[\/]+[a-zA-Z0-9\/-_]*$` |
| `accessModes` | string[] | false | Requested Kubernetes access modes for the PVC. | | |
| `storageClass` | string | false | Kubernetes Storage Class. | | |
| `size` | string | false | Kubernetes storage size request. | "20Gi" | |
| `subPaths` | object | false | Sub directories under the mount location. | | |
| `subPaths.in` | string | false | Location of the RT 'in' sub directory. | "in" | `^[a-zA-Z0-9-_]+$` |
| `subPaths.out` | string | false | Location of the RT 'out' sub directory. | "out" | `^[a-zA-Z0-9-_]+$` |
| `subPaths.cp` | string | false | Location of the RT 'cp' sub directory. | "state" | `^[a-zA-Z0-9-_]+$` |
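Because the `subPaths` entries are sub directories under `mountPath`, the resulting directories can be worked out from the table defaults. The sketch below is illustrative only: `rt_log_dirs` is a hypothetical helper, and it assumes the sub directory names are simply joined onto the mount location.

```python
import posixpath

# Defaults taken from the volume table above.
DEFAULT_MOUNT_PATH = "/s/"
DEFAULT_SUB_PATHS = {"in": "in", "out": "out", "cp": "state"}


def rt_log_dirs(volume):
    """Return the full 'in', 'out' and 'cp' directories for a volume dict."""
    mount = volume.get("mountPath", DEFAULT_MOUNT_PATH)
    # Any sub path not set explicitly falls back to its documented default.
    sub_paths = {**DEFAULT_SUB_PATHS, **volume.get("subPaths", {})}
    return {key: posixpath.join(mount, sub) for key, sub in sub_paths.items()}


if __name__ == "__main__":
    print(rt_log_dirs({"mountPath": "/s/", "subPaths": {"cp": "state"}}))
```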
Archiver
Each Sequencer has the option to enable an Archiver deployment. The Archiver truncates the Sequencer's log files, based on log size or age. There is also an option to configure the Sequencer to archive log files to object storage.
Log files cannot be kept on the Sequencer node indefinitely because the node's disk space is finite. The configuration options let you control the rate at which data is truncated, but the log files are eventually truncated. Once a log file has been truncated, its data is no longer available and cannot be recovered. Archiving to object storage provides a backup of your data before the log file is truncated.
Log file truncation
spec:
  # ...
  elements:
    # ...
    sequencer:
      south:
        archiver:
          retentionDuration: 10080
          maxDiskUsagePercent: 90
          maxLogSize: 5
| key | type | required | description | default | validation |
|---|---|---|---|---|---|
| `retentionDuration` | integer | false | Log retention in minutes | 10080 | |
| `maxLogSize` | string | false | Maximum log size | 50G | `^([+-]?[0-9.]+)([eEinukmgtpKMGTP]*[-+]?[0-9]*)$` |
| `maxDiskUsagePercent` | integer | false | Max disk utilization | 90% | |
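Since `retentionDuration` is expressed in minutes, the default of 10080 corresponds to 7 days. A small sketch for converting a retention period into the value to put in the assembly follows; the helper name `retention_minutes` is hypothetical.

```python
from datetime import timedelta


def retention_minutes(days=0, hours=0):
    """Convert a retention period into whole minutes for retentionDuration."""
    return int(timedelta(days=days, hours=hours).total_seconds() // 60)


if __name__ == "__main__":
    # 7 days matches the documented default of 10080 minutes.
    print(retention_minutes(days=7))
```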
Log file archival to S3
The following example configuration enables archiving to S3 object storage.
spec:
  # ...
  elements:
    # ...
    sequencer:
      south:
        annotations:
          serviceAccount:
            eks.amazonaws.com/role-arn: arn:aws:iam::03.....32:role/aws-kxi-rnd-irsa
        k8sPolicy:
          serviceAccount: "my-aws-sa" # Name of service account for AWS authentication
          serviceAccountConfigure:
            create: true
        env:
          - name: RT_AWS_BACKUP_ENABLED
            value: "1"
          - name: RT_AWS_BACKUP_REGION
            value: "us-east-2"
          - name: RT_AWS_BACKUP_BUCKET
            value: "kxi-rnd"
          - name: RT_AWS_BACKUP_KEYPREFIX
            value: "prefix/"
          - name: RT_AWS_BACKUP_LOGLEVEL
            value: "INFO"
          - name: RT_AWS_BACKUP_NUM_THREADS
            value: "4"
          - name: RT_AWS_BACKUP_PARALLEL_FILES
            value: "2"
To configure archival to object storage, a set of environment variables must be set. You must also create a specific AWS role for your cluster, referenced here as `aws-kxi-rnd-irsa`. The setup above adds an AWS service account to the `kxi-rt` container; this service account holds the credentials used to access S3.
Naming convention
When log files are backed up to S3 the object key follows this naming convention:
s3://$RT_AWS_BACKUP_BUCKET/$RT_AWS_BACKUP_KEYPREFIX/<RT_STREAMNAME>/<FILENAME>
`RT_AWS_BACKUP_KEYPREFIX` should be changed between `kxi-rt` sessions to avoid mixing Sequencer logs from different sessions in object storage.
AWS threads
The facility to archive to object storage is built upon the AWS C++ SDK. The environment variable `RT_AWS_BACKUP_NUM_THREADS` sets the number of background threads created by the SDK to copy the data to S3. The default is 4 threads; a high rate of messages sent to RT may require this value to be increased.
| environment variable | default | description |
|---|---|---|
| `RT_AWS_BACKUP_ENABLED` | 0 | The backup is disabled by default, and can be enabled by setting the value to 1. |
| `RT_AWS_BACKUP_BUCKET` | No default | The S3 bucket that the log files should be written to. Required field if AWS backup is enabled. |
| `RT_AWS_BACKUP_REGION` | No default | The AWS region where the bucket is hosted. Required field if AWS backup is enabled. |
| `RT_AWS_BACKUP_KEYPREFIX` | No default | The object key prefix in the bucket under which to back up the log files. This must end in a `/`, such that all the log files are placed under a directory in the S3 bucket. The RT stream name is automatically appended to this prefix. Required field if AWS backup is enabled. |
| `RT_AWS_BACKUP_LOGLEVEL` | INFO | S3 backup logging level, one of `NONE`, `FATAL`, `ERROR`, `WARN`, `INFO`, `DEBUG` or `TRACE`. |
| `RT_AWS_BACKUP_NUM_THREADS` | 4 | The number of threads that the AWS backup service should use. |
| `RT_AWS_BACKUP_PARALLEL_FILES` | 2 | The number of log files that can be backed up in parallel. |
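The rules in the table above (three required variables once backup is enabled, a key prefix ending in `/`, and a fixed set of log levels) lend themselves to a quick pre-flight check. The following is a sketch under stated assumptions, not part of kdb Insights: the function names are hypothetical, the `env` list is assumed to use the `name`/`value` shape shown in the example assembly, and the thread and file counts are assumed to be positive integers.

```python
LOG_LEVELS = {"NONE", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"}
REQUIRED_WHEN_ENABLED = (
    "RT_AWS_BACKUP_BUCKET",
    "RT_AWS_BACKUP_REGION",
    "RT_AWS_BACKUP_KEYPREFIX",
)
# Documented defaults for the numeric settings.
COUNT_DEFAULTS = {"RT_AWS_BACKUP_NUM_THREADS": "4", "RT_AWS_BACKUP_PARALLEL_FILES": "2"}


def env_list_to_dict(env_list):
    """Flatten a list of {'name': ..., 'value': ...} entries into a dict."""
    return {entry["name"]: entry["value"] for entry in env_list}


def check_backup_env(env):
    """Return a list of problems found in the S3 backup settings."""
    problems = []
    if env.get("RT_AWS_BACKUP_ENABLED", "0") != "1":
        return problems  # Backup is disabled, so nothing else is required.

    for name in REQUIRED_WHEN_ENABLED:
        if not env.get(name):
            problems.append(f"{name} is required when backup is enabled")

    prefix = env.get("RT_AWS_BACKUP_KEYPREFIX", "")
    if prefix and not prefix.endswith("/"):
        problems.append("RT_AWS_BACKUP_KEYPREFIX must end in '/'")

    level = env.get("RT_AWS_BACKUP_LOGLEVEL", "INFO")
    if level not in LOG_LEVELS:
        problems.append(f"RT_AWS_BACKUP_LOGLEVEL has unknown level {level!r}")

    # Assumption: both counts must be positive integers.
    for name, default in COUNT_DEFAULTS.items():
        value = env.get(name, default)
        if not value.isdigit() or int(value) < 1:
            problems.append(f"{name} must be a positive integer, got {value!r}")

    return problems


if __name__ == "__main__":
    example = [
        {"name": "RT_AWS_BACKUP_ENABLED", "value": "1"},
        {"name": "RT_AWS_BACKUP_REGION", "value": "us-east-2"},
        {"name": "RT_AWS_BACKUP_BUCKET", "value": "kxi-rnd"},
        {"name": "RT_AWS_BACKUP_KEYPREFIX", "value": "prefix/"},
    ]
    print(check_backup_env(env_list_to_dict(example)))
```

An empty list means no problems were found; each string in a non-empty list names one setting to fix.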