
Building assemblies

Deprecation of deployment with assembly yaml files

The deployment of assembly yaml files using the kxi assembly command in the kdb Insights CLI has been deprecated and may be removed in a future release. Use packages instead, which can be deployed and managed by both the UI and the kdb Insights CLI. For more information, refer to Packages.
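For reference, the deprecated workflow looked like the following. This is a sketch, not a definitive invocation: the file name is the sample used on this page, and the exact subcommands and flags depend on your kdb Insights CLI version.

```
# Deprecated: deploy an assembly yaml directly with the kdb Insights CLI
kxi assembly deploy --filepath sdk_sample_assembly.yaml

# Inspect and tear down running assemblies
kxi assembly list
kxi assembly teardown --name sdk-sample-assembly
```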

This page describes how to develop an assembly, using sdk_sample_assembly.yaml as a worked example. It is a useful example for understanding the building blocks of an assembly.

This assembly and other samples are available to download.

Assembly components

The main components of an assembly are:

  • databases - store and access your data.
  • schemas - define the structure of that data.
  • pipelines - provide a flexible stream processing service.
  • streams - provide a reliable transport layer.

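At the top level, these components map onto sections of the assembly yaml. The following skeleton is a sketch assembled from the fragments on this page: the key names come from the sample, and sp, dap and sequencer are the element names used there for pipelines, databases and streams respectively.

```yaml
name: sdk-sample-assembly

tables:            # schemas: one entry per table
  trace: {}

mounts:            # storage locations used by the database tiers
  rdb: {}
  idb: {}
  hdb: {}

elements:
  sp:              # pipelines
    pipelines: {}
  dap:             # databases
    instances: {}
  sequencer:       # streams
    south: {}
    north: {}
```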

Databases

A database stores your streaming and historical data. It consists of a set of tiers that generally separate the data by age. It must contain rdb (real-time), idb (interval), and hdb (historic) tiers, and it can also be configured with an odb tier for object-storage migrations.

The common parameters are:

  • size - the number of replicas to deploy, for performance and resilience.
  • mountName - the mount where the data for this tier is stored.
  • source - used by the rdb to subscribe to streams; corresponds to a stream name.
  • rtLogVolume - the size of the pod storage for stream log files.
```yaml
elements:
  dap:
    instances:
      rdb:
        mountName: rdb
        source: south
        rtLogVolume:
          size: 20Gi
        size: 3
      hdb:
        mountName: hdb
        rtLogVolume:
          size: 20Gi
        size: 3
      idb:
        mountName: idb
        rtLogVolume:
          size: 20Gi
        size: 3
```


Mounts

A mount coordinates the storage and access of a database.

The mounts section is a dictionary mapping user-defined names of storage locations to dictionaries of settings.

The required keys for each storage location are:

  • type - one of stream, local, or object.
  • baseURI - where the data can be mounted from other services.
  • partition - the partitioning scheme for this mount: none, ordinal, or date.

Additionally, the dependency key specifies dependencies between storage locations.

```yaml
mounts:
  rdb:
    type: stream
    baseURI: none
    partition: none
    dependency:
    - idb
  idb:
    type: local
    baseURI: file:///data/db/idb
    partition: ordinal
  hdb:
    type: local
    baseURI: file:///data/db/hdb
    partition: date
    dependency:
    - idb
```


Schemas

The schema serves as a blueprint for the database, providing a clear and organized structure for storing and retrieving data.

A schema has a name, at least one timestamp column, and a partition column mapped to a timestamp data column. Each table is defined under the tables key. For example, the trace table is defined with 7 columns, each with its own data type and attributes.

```yaml
attach: false
labels:
  assemblyname: sdk-sample-assembly
tables:
  trace:
    description: Manufacturing trace data
    type: partitioned
    prtnCol: updateTS
    sortColsOrd: [sensorID]
    sortColsDisk: [sensorID]
    columns:
      - name: sensorID
        description: Sensor Identifier
        type: int
        attrMem: grouped
        attrDisk: parted
        attrOrd: parted
      - name: readTS
        description: Reading timestamp
        type: timestamp
      - name: captureTS
        description: Capture timestamp
        type: timestamp
      - name: valFloat
        description: Sensor value
        type: float
      - name: qual
        description: Reading quality
        type: byte
      - name: alarm
        description: Enumerated alarm flag
        type: byte
      - name: updateTS
        description: Ingestion timestamp
        type: timestamp
```

To use SQL when querying against your schema, you need to augment the assembly with a queryEnvironment section. For more information, see SQL.

```yaml
queryEnvironment:
  enabled: true
  size: 1
```


Pipelines

Pipelines are how kdb Insights Enterprise ingests data from a source and performs stream processing. Pipelines offer a large number of potential data sources to import from, and are highly configurable.

Multiple pipelines are supported within a single assembly.

The source and destination keys reference streams defined in the assembly.

The protectedExecution key enables protected execution within the pipeline. It increases the granularity of error reporting within the Stream Processor (SP), but has an impact on pipeline performance.

```yaml
elements:
  sp:
    description: Transforms incoming data to a table and adds a timestamp
    pipelines:
      transform:
        protectedExecution: false
        source: north
        destination: south
        spec: |-
          columns: `sensorID`readTS`captureTS`valFloat`qual`alarm;

          // Add in updateTS column as the ingestion time
          transformList: {[data] update updateTS:.z.p from flip columns!data };
          transformTable: {[data] update updateTS:.z.p from data };
          transform: {[data] $[(type data)=98h; transformTable[data]; transformList[data]]};

          // Start a pipeline that sends all incoming data through
          // the transform function
          .qsp.run
            .qsp.read.fromStream[]
            .qsp.map[transform]
            .qsp.write.toStream[]
```
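The transform function in the pipeline spec can also be exercised on its own in a q session, which is a convenient way to check the logic before deploying. The sample data below is made up for illustration:

```q
// Hypothetical local check of the transform logic
columns: `sensorID`readTS`captureTS`valFloat`qual`alarm;

// A list-of-columns message, as an upstream publisher might send it
data: (1 2i; 2#.z.p; 2#.z.p; 1.1 2.2; 0x0001; 0x0000);

transformList: {[data] update updateTS:.z.p from flip columns!data };
transformTable: {[data] update updateTS:.z.p from data };
transform: {[data] $[(type data)=98h; transformTable[data]; transformList[data]]};

transform[data]             / a 2-row table with updateTS appended
transform[transform[data]]  / tables (type 98h) pass through transformTable
```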

Streams

Streams are used to transport data around the application, e.g. from a pipeline into the database.

south and north are the names of the streams used, and are referenced in the database using the source key. Streams can be internal or external, and can be associated with pipelines through the source key.

The subTopic is the stream id for an external publisher to subscribe to.

More details of additional keys can be found in the reference documentation.

```yaml
elements:
  sequencer:
    south:
      external: false
      volume:
        size: 40Gi
    north:
      external: true
      topicConfig:
        subTopic: "sdk-sample-assembly"
```

Common configuration

Every assembly deployed by kdb Insights Enterprise is configured with default resources.

These resources are used to ensure optimal performance of your application and to protect the cluster. However, you may want to override the defaults with specific resource requests; the k8sPolicy field is used to do this.

The rtLogVolume key is used to configure the storage needed for stream log files.

```yaml
elements:
  dap:
    instances:
      rdb:
        mountName: rdb
        rtLogVolume:
          size: 20Gi
        k8sPolicy:
          resources:
            requests:
              cpu: 100m
              memory: 2Gi
            limits:
              cpu: 100m
              memory: 2Gi
        size: 3
```