Creating environment-based pipelines

The following is an extension to the process layout as defined in Creating pipelines. Its purpose is to reduce the overhead of maintaining multiple pipeline YAML files where only the number of instances per host/environment differs.

Environment

Refer to Environment for setup. Once defined, add the environment key to your pipeline.yaml:

pipeline:
  name: envexample
  type: realtime
  environment: KX_REFINERY_ENV

In this instance, we've defined our environment variable as KX_REFINERY_ENV, which has a value of dev. This value selects the dev block of proc-layout for parsing, as outlined in the examples below.
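The selection step described above can be pictured with a short Python sketch. It is illustrative only and not the product's implementation: the function name select_environment_block and the dict representation of a parsed proc-layout element are assumptions made for this example.

```python
import os


def select_environment_block(cluster, env_var="KX_REFINERY_ENV", environ=None):
    """Return (defaults, overrides) for the environment named by env_var.

    `cluster` is one element of the proc-layout array, already parsed into
    a dict. This helper is hypothetical and only mirrors the description
    in the text.
    """
    environ = os.environ if environ is None else environ
    env_name = environ.get(env_var)
    if env_name is None:
        raise KeyError(f"environment variable {env_var} is not set")
    defaults = cluster.get("all-environments", {})
    overrides = cluster.get(env_name, {})
    return defaults, overrides


# One proc-layout element with a dev and a prod block.
cluster = {
    "all-environments": {"all": "primary-server"},
    "dev": {"rdb": "secondary-server"},
    "prod": {"rdb": "primary-server"},
}

# Pass the environment explicitly so the sketch does not depend on the shell.
defaults, overrides = select_environment_block(
    cluster, environ={"KX_REFINERY_ENV": "dev"}
)
print(defaults, overrides)
```

With KX_REFINERY_ENV set to dev, only the dev block is picked up alongside the all-environments defaults; the prod block is ignored.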

Process layout

The environment key determines which proc-layout block is parsed, based on the value of an environment variable. The selected block determines how the processes within an environment pipeline are physically laid out on the available servers. Servers are "tagged" in the system configuration, and these tags are used in this element for layout.

Each element of the array is an environment object containing an array of objects that defines an instance of the pipeline. Running multiple instances allows for redundancy in the data capture.

The supported keys of the object are:

  • all-environments: Applies defaults to all processes or per *process-type*

Within an environment object, the supported keys are:

  • all: All processes in the pipeline
  • *process-type*: All processes of the specified type in the pipeline.
  • Within a *process-type*, there is additional support to define an array of host/instance pairs. This allows N proc-type instances to be defined per host.
    Notes:

    • All disk-based processes per pipeline must be on the same host. The exception is systems using NFS storage, in which case a custom client library can be used to disable this restriction.
    • host and instances keys must be present when defining as an array block (see example below).
  • *process-type*.*process-instance*: The specified instance of the process type in the pipeline

Example

proc-layout:
  -
    all-environments:
      all: primary-server
    dev:
      rdb:
        -
          host: primary-server
          instances: 2
        -
          host: secondary-server
          instances: 3

In this example, there will be a total of 5 rdb instances: 2 on primary-server and 3 on secondary-server. Refer to Creating pipelines for the standard proc-layout templates, all of which are supported within the environment key. This example assigns all of the processes to the same cluster-number of 0, e.g. 0.rdb.0 or 0.rdb.1.
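As a quick check of the arithmetic, the host/instances pairs from the example above can be tallied with a small Python sketch. The helper instances_per_host is hypothetical and exists only for this illustration; it also enforces the note that both host and instances must be present in an array block.

```python
from collections import Counter


def instances_per_host(pairs):
    """Sum the instances declared for each host in an array block."""
    counts = Counter()
    for pair in pairs:
        # Both keys are required when a process type is defined as an array.
        if "host" not in pair or "instances" not in pair:
            raise ValueError("each entry needs both 'host' and 'instances'")
        counts[pair["host"]] += int(pair["instances"])
    return dict(counts)


# The rdb array from the dev block in the example above.
rdb_layout = [
    {"host": "primary-server", "instances": 2},
    {"host": "secondary-server", "instances": 3},
]

per_host = instances_per_host(rdb_layout)
total = sum(per_host.values())
print(per_host, total)
```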

all example

proc-layout:
  -
    all-environments:
      all: primary-server
    dev:
      rdb: secondary-server
    prod:
      rdb:
        -
          host: primary-server
          instances: 2
        -
          host: secondary-server
          instances: 2

In this example, there will be a single rdb instance on dev, on secondary-server. There will be 4 rdb instances on prod: 2 on primary-server and 2 on secondary-server. Again, this example assigns all of the processes to the same cluster-number of 0.

*process-type* example

proc-layout:
  -
    all-environments:
      all: primary-server
      tp: tp-server
      rdb: rdb-server
      hdb: disk-processes-server
      ipdb: disk-processes-server
      epdb: disk-processes-server
    prod:
      tp: primary-server
      rdb:
        -
          host: primary-server
          instances: 2
        -
          host: secondary-server
          instances: 2

In this example, there will be a total of 4 rdb instances on prod, 2 on primary-server and 2 on secondary-server. Again, this example assigns all of the processes to the same cluster-number of 0.

*process-type*.*process-instance* example

proc-layout:
  -
    all-environments:
      all: primary-server
    dev:
      rdb.0: primary-server
      rdb.1: secondary-server
      rdb:
        -
          host: primary-server
          instances: 2
    prod:
      rdb.0: primary-server
      rdb.1: secondary-server
      rdb.2: primary-server
      rdb.3: secondary-server
      rdb:
        -
          host: primary-server
          instances: 2
        -
          host: secondary-server
          instances: 2

In this example, there will be a total of 4 rdb instances on dev: 3 on primary-server and 1 on secondary-server. There will be 8 rdb instances on prod: 4 on primary-server and 4 on secondary-server. Again, this example assigns all of the processes to the same cluster-number of 0.

Note

Tags for hosts can be the same between environment blocks; simply specify different hosts in $DELTADATA_HOME/refinery/system-config/system/system.yaml per environment server.

*cluster-number*.*process-type*.*process-instance*

proc-layout:
  -
    all-environments:
      all: primary-server
    prod:
      all: primary-server
      rdb:
        -
          host: primary-server
          instances: 5
      hdb:
        -
          host: primary-server
          instances: 3
    dev:
      hdb.0: primary-server
  -
    all-environments:
      all: secondary-server
    prod:
      all: secondary-server
      rdb:
        -
          host: secondary-server
          instances: 2
      hdb:
        -
          host: secondary-server
          instances: 2
    dev:
      hdb.1: secondary-server

In this example, there will be a total of 7 rdb instances on prod: 5 on primary-server and 2 on secondary-server. There will also be a total of 5 hdb instances on prod: 3 on primary-server and 2 on secondary-server. There will be 2 rdb instances on dev: 1 on primary-server and 1 on secondary-server. Because the pipeline has been configured with more than 1 cluster, both cluster-number 0 and cluster-number 1 are assigned. Clusters are identified by their cluster-number.
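The prod totals can be reproduced with a minimal Python sketch. It assumes, purely for illustration, that the cluster-number corresponds to the position of each element in the proc-layout array; the source does not state how the numbers are assigned. The helper explicit_instances_by_cluster is hypothetical and counts only process types defined as host/instances arrays, skipping plain tag assignments such as all.

```python
def explicit_instances_by_cluster(proc_layout, env):
    """Count instances declared in array blocks, grouped by cluster index.

    Assumption: cluster-number equals the index of the element in the
    proc-layout array. Plain string assignments (e.g. all) are skipped.
    """
    result = {}
    for cluster_number, cluster in enumerate(proc_layout):
        counts = {}
        for proc_type, value in cluster.get(env, {}).items():
            if isinstance(value, list):
                counts[proc_type] = sum(pair["instances"] for pair in value)
        result[cluster_number] = counts
    return result


# The prod blocks of the two clusters from the example above.
proc_layout = [
    {"prod": {
        "all": "primary-server",
        "rdb": [{"host": "primary-server", "instances": 5}],
        "hdb": [{"host": "primary-server", "instances": 3}],
    }},
    {"prod": {
        "all": "secondary-server",
        "rdb": [{"host": "secondary-server", "instances": 2}],
        "hdb": [{"host": "secondary-server", "instances": 2}],
    }},
]

per_cluster = explicit_instances_by_cluster(proc_layout, "prod")
rdb_total = sum(c.get("rdb", 0) for c in per_cluster.values())
hdb_total = sum(c.get("hdb", 0) for c in per_cluster.values())
print(per_cluster, rdb_total, hdb_total)
```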

Limitations

  • Defining a *process-type*.*process-instance* key as an array of host/instances pairs within an environment block is not supported.
  • Invalid layout:
proc-layout:
-
  all-environments:
    all: primary-server
  dev:
    rdb.0:
    -
      host: primary-server
      instances: 1
    rdb.1:
    -
      host: secondary-server
      instances: 1
    hdb:
    -
      host: primary-server
      instances: 2
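
A layout can be checked for this limitation before deployment. The following Python sketch is one possible check and not part of the product: the helper find_invalid_keys and the key pattern (a process type, a dot, then a numeric instance) are assumptions made for this illustration.

```python
import re

# Assumed shape of a process-type.process-instance key, e.g. "rdb.0".
INSTANCE_KEY = re.compile(r"^[^.]+\.\d+$")


def find_invalid_keys(env_block):
    """Return instance-level keys that are defined as an array block."""
    return [
        key
        for key, value in env_block.items()
        if INSTANCE_KEY.match(key) and isinstance(value, list)
    ]


# The dev block from the invalid layout above, parsed into a dict.
dev = {
    "rdb.0": [{"host": "primary-server", "instances": 1}],
    "rdb.1": [{"host": "secondary-server", "instances": 1}],
    "hdb": [{"host": "primary-server", "instances": 2}],
}

invalid = find_invalid_keys(dev)
print(invalid)
```

The hdb entry is not reported, because array blocks are supported at the *process-type* level.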