Building assemblies
This page describes how to develop an assembly, using sdk_sample_assembly.yaml as a worked example of an assembly's building blocks.
This assembly and other samples are available to download.
Assembly components
The main components of an assembly are:
- databases - store and provide access to your data.
- schema - defines the structure of that data.
- pipelines - provide a flexible stream processing service.
- streams - provide a reliable transport layer.
Database
A database stores your streaming and historical data.
It consists of a set of tiers, which generally separate the data by age.
It must contain an rdb (real-time), an idb (interval) and an hdb (historic) tier.
It can also be configured with an odb tier for object-storage migrations.
The common parameters are:
- size - number of replicas to deploy for performance and resilience.
- mount - how the data for this tier will be stored (this appears as the mountName key in the example).
- source - used by the rdb to subscribe to streams. Corresponds to a stream name.
- rtLogVolume - size of the pod storage for stream log files.
instances:
  rdb:
    mountName: rdb
    source: south
    rtLogVolume:
      size: 20Gi
    size: 3
  hdb:
    mountName: hdb
    rtLogVolume:
      size: 20Gi
    size: 3
  idb:
    mountName: idb
    rtLogVolume:
      size: 20Gi
    size: 3
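The tier requirement described above can be checked mechanically before deploying. The following is a minimal Python sketch, not part of kdb Insights Enterprise: the helper name missing_tiers is hypothetical, and the instances dictionary simply restates the example configuration as a plain Python dict.

```python
# Illustrative only: missing_tiers is a hypothetical helper, not a kdb Insights API.
REQUIRED_TIERS = ("rdb", "idb", "hdb")  # odb is optional, per the text above


def missing_tiers(instances):
    """Return the required tier names that are absent from an instances mapping."""
    return [tier for tier in REQUIRED_TIERS if tier not in instances]


# The instances section from the example, written as a Python dict.
instances = {
    "rdb": {"mountName": "rdb", "source": "south",
            "rtLogVolume": {"size": "20Gi"}, "size": 3},
    "hdb": {"mountName": "hdb", "rtLogVolume": {"size": "20Gi"}, "size": 3},
    "idb": {"mountName": "idb", "rtLogVolume": {"size": "20Gi"}, "size": 3},
}

print(missing_tiers(instances))  # prints []
```

An empty list means all three required tiers are present; any names returned are the tiers still to be added.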
Mounts
Mounts coordinate storage and access for a database.
The mounts section is a dictionary that maps user-defined storage location names to dictionaries of settings.
The required keys for each storage location are:
- type - one of stream, local or object.
- baseURI - the location from which other services can mount the data.
- partition - the partitioning scheme for this mount: one of none, ordinal or date.
Additionally, the dependency key specifies the interdependent relationship between storage locations.
mounts:
  rdb:
    type: stream
    baseURI: none
    partition: none
    dependency:
      - idb
  idb:
    type: local
    baseURI: file:///data/db/idb
    partition: ordinal
  hdb:
    type: local
    baseURI: file:///data/db/hdb
    partition: date
    dependency:
      - idb
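The rules for a mounts section (required keys, permitted values, and dependencies that name other mounts) lend themselves to a simple consistency check. The following Python sketch is an assumption-laden illustration rather than anything shipped with the product: check_mounts is a hypothetical helper, and the requirement that every dependency refers to a defined mount is inferred from the example, not stated as a rule in the text.

```python
# Illustrative only: check_mounts is a hypothetical helper, not a kdb Insights API.
ALLOWED_TYPES = {"stream", "local", "object"}
ALLOWED_PARTITIONS = {"none", "ordinal", "date"}
REQUIRED_KEYS = ("type", "baseURI", "partition")


def check_mounts(mounts):
    """Return a list of human-readable problems found in a mounts mapping."""
    problems = []
    for name, cfg in mounts.items():
        for key in REQUIRED_KEYS:
            if key not in cfg:
                problems.append(f"{name}: missing key {key}")
        if "type" in cfg and cfg["type"] not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown type {cfg['type']}")
        if "partition" in cfg and cfg["partition"] not in ALLOWED_PARTITIONS:
            problems.append(f"{name}: unknown partition {cfg['partition']}")
        # Assumption: each dependency should name another mount in the same section.
        for dep in cfg.get("dependency", []):
            if dep not in mounts:
                problems.append(f"{name}: dependency {dep} is not a defined mount")
    return problems


# The mounts section from the example, written as a Python dict.
mounts = {
    "rdb": {"type": "stream", "baseURI": "none", "partition": "none",
            "dependency": ["idb"]},
    "idb": {"type": "local", "baseURI": "file:///data/db/idb",
            "partition": "ordinal"},
    "hdb": {"type": "local", "baseURI": "file:///data/db/hdb",
            "partition": "date", "dependency": ["idb"]},
}

print(check_mounts(mounts))  # prints []
```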
Schema
The schema serves as a blueprint for the database, providing a clear and organized structure for storing and retrieving data.
A schema has a name, a data table with at least one timestamp column, and a partition mapped to a timestamp data column.
Each table is defined under the tables key. For example, the trace table in this assembly is defined with 7 columns, each with its own data type and attributes.
spec:
  attach: false
  labels:
    assemblyname: sdk-sample-assembly
  tables:
    trace:
      description: Manufacturing trace data
      type: partitioned
      prtnCol: updateTS
      sortColsOrd: [sensorID]
      sortColsDisk: [sensorID]
      columns:
        - name: sensorID
          description: Sensor Identifier
          type: int
          attrMem: grouped
          attrDisk: parted
          attrOrd: parted
        - name: readTS
          description: Reading timestamp
          type: timestamp
        - name: captureTS
          description: Capture timestamp
          type: timestamp
        - name: valFloat
          description: Sensor value
          type: float
        - name: qual
          description: Reading quality
          type: byte
        - name: alarm
          description: Enumerated alarm flag
          type: byte
        - name: updateTS
          description: Ingestion timestamp
          type: timestamp
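The two schema requirements stated above (at least one timestamp column, and a partition column that is itself a timestamp column) can be expressed as a short check. This Python sketch is illustrative only: check_table is a hypothetical helper, and the trace dictionary keeps just the name and type of each column from the example.

```python
# Illustrative only: check_table is a hypothetical helper, not a kdb Insights API.
def check_table(name, table):
    """Return a list of problems for one table definition."""
    column_types = {col["name"]: col["type"] for col in table.get("columns", [])}
    problems = []
    if "timestamp" not in column_types.values():
        problems.append(f"{name}: no timestamp column")
    prtn_col = table.get("prtnCol")
    if column_types.get(prtn_col) != "timestamp":
        problems.append(f"{name}: prtnCol {prtn_col} is not a timestamp column")
    return problems


# The trace table from the example, reduced to column names and types.
trace = {
    "type": "partitioned",
    "prtnCol": "updateTS",
    "columns": [
        {"name": "sensorID", "type": "int"},
        {"name": "readTS", "type": "timestamp"},
        {"name": "captureTS", "type": "timestamp"},
        {"name": "valFloat", "type": "float"},
        {"name": "qual", "type": "byte"},
        {"name": "alarm", "type": "byte"},
        {"name": "updateTS", "type": "timestamp"},
    ],
}

print(check_table("trace", trace))  # prints []
```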
To query your schema with SQL, augment the assembly by enabling queryEnvironment, as in the following configuration.
See SQL.
spec:
  queryEnvironment:
    enabled: true
    size: 1
Pipelines
Pipelines are how kdb Insights Enterprise ingests data from a source and performs stream processing. Pipelines support a large number of potential data sources and are highly configurable.
Multiple pipelines are supported within a single assembly.
The source and destination keys refer to streams.
The protectedExecution key enables protected execution of the pipeline. It increases the granularity of error reporting within the SP, but has an impact on pipeline performance.
elements:
  sp:
    description: Transforms incoming data to a table and adds a timestamp
    pipelines:
      sdtransform:
        protectedExecution: false
        source: north
        destination: south
        spec: |-
          columns: `sensorID`readTS`captureTS`valFloat`qual`alarm;
          // Add in updateTS column as the ingestion time
          transformList: {[data] update updateTS:.z.p from flip columns!data };
          transformTable: {[data] update updateTS:.z.p from data };
          transform: {[data] $[(type data)=98h; transformTable[data]; transformList[data]]};
          // Start a pipeline that sends all incoming data through
          // the transform function
          .qsp.run
            .qsp.read.fromStream[]
            .qsp.map[transform]
            .qsp.write.toStream[]
Streams
Streams are used to transport data around the application, e.g. from a pipeline into the database.
south and north are the names of the streams used in this example. The database references a stream using the source key.
Streams can be internal or external, and can be associated with pipelines through the source and destination keys.
The subTopic is the stream id for an external publisher to subscribe to.
More details of additional keys can be found here
sequencer:
  south:
    external: false
    volume:
      size: 40Gi
  north:
    external: true
    topicConfig:
      subTopic: "sdk-sample-assembly"
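Because pipelines and database instances refer to streams by name, a mistyped stream name is an easy mistake to make. The following Python sketch shows one way to cross-check those references; undefined_streams is a hypothetical helper, and the three dictionaries restate only the relevant keys from the examples on this page.

```python
# Illustrative only: undefined_streams is a hypothetical helper, not a kdb Insights API.
def undefined_streams(streams, pipelines, instances):
    """Return (location, stream name) pairs that reference an undefined stream."""
    references = []
    for name, pipeline in pipelines.items():
        references.append((f"pipeline {name} source", pipeline.get("source")))
        references.append((f"pipeline {name} destination", pipeline.get("destination")))
    for name, instance in instances.items():
        if "source" in instance:
            references.append((f"instance {name} source", instance["source"]))
    return [(where, stream) for where, stream in references
            if stream is not None and stream not in streams]


# Stream, pipeline and instance names taken from the examples on this page.
streams = {"south": {"external": False}, "north": {"external": True}}
pipelines = {"sdtransform": {"source": "north", "destination": "south"}}
instances = {"rdb": {"source": "south"}, "idb": {}, "hdb": {}}

print(undefined_streams(streams, pipelines, instances))  # prints []
```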
Common configuration
Every assembly deployed by kdb Insights Enterprise is configured with default resources.
These resources are used to ensure optimal performance of your application and to protect the cluster.
However, you may want to override the defaults with specific resource requests. The k8sPolicy field is used to do this.
The rtLogVolume field is used to configure the storage needed for stream log files.
rdb:
  mountName: rdb
  rtLogVolume:
    size: 20Gi
  k8sPolicy:
    resources:
      limits:
        cpu: 100m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 2Gi
  size: 3
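The values in the resources block use Kubernetes quantity notation, where 100m means 0.1 of a CPU core and 2Gi means 2 x 2^30 bytes. As a rough illustration of how requests relate to limits, the Python sketch below parses a small subset of that notation and checks that each request does not exceed its limit. The helper names are hypothetical, and only the m CPU suffix and the Ki, Mi, Gi and Ti memory suffixes are handled.

```python
# Illustrative only: these helpers are hypothetical and cover a subset of
# Kubernetes quantity syntax (CPU suffix "m"; memory suffixes Ki, Mi, Gi, Ti).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '100m' or '2' to a number of cores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def parse_memory(quantity):
    """Convert a memory quantity such as '2Gi' to a number of bytes."""
    text = str(quantity)
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)


def requests_within_limits(resources):
    """Return True when the CPU and memory requests do not exceed the limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (parse_cpu(requests["cpu"]) <= parse_cpu(limits["cpu"])
            and parse_memory(requests["memory"]) <= parse_memory(limits["memory"]))


# The resources block from the example, written as a Python dict.
resources = {
    "limits": {"cpu": "100m", "memory": "2Gi"},
    "requests": {"cpu": "100m", "memory": "2Gi"},
}

print(parse_memory("2Gi"))                # prints 2147483648
print(requests_within_limits(resources))  # prints True
```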