Replication

The Control cluster replicates its internal state to every node in the cluster. All state changes made via the public interfaces (Web UI, Process API, etc.) are automatically persisted to a transaction log and streamed to the slaves in real time. Each change is tracked by a UID. When a process starts, it connects to the master and compares its own UID with that of the cluster. If it has fallen behind, it re-syncs with the master and becomes a slave.

The cluster is usually set up via the install script, which writes a CSV file of cluster details to ${DELTA_CONFIG}/failover.csv. The file contains one line per process, giving its host and port, for example:

servera.domain.com,5000
serverb.domain.com,5000
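
As an illustration only, a file in this format could be read with a few lines of Python. This is a minimal sketch that assumes each line holds exactly a host and a port, as in the sample above; the function name parse_failover is hypothetical and is not part of the product, and a real failover.csv may carry further columns.

```python
import csv
import io


def parse_failover(text):
    """Return a list of (host, port) tuples from failover.csv-style text.

    Assumes two columns per line (host, port); blank lines are skipped.
    """
    nodes = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # ignore empty lines
        host = row[0].strip()
        port = int(row[1])
        nodes.append((host, port))
    return nodes


# Sample content copied from the example above.
sample = "servera.domain.com,5000\nserverb.domain.com,5000\n"
nodes = parse_failover(sample)
for host, port in nodes:
    print(host, port)
```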

Start-up

When starting the Control cluster, it is important to start all processes at the same time. This ensures the cluster is fully in sync before further changes are made. The following scenario illustrates why this matters.

  • At the previous shutdown, all the slaves were shut down first
  • The master was temporarily left running and state changes occurred
  • That process is now further ahead than the rest of the cluster
  • On the next start-up, that process is not started at the same time as the others
  • The other processes start and elect a master, but are missing state
  • When the previous master starts later, there are two possibilities:
      • The cluster has progressed beyond the previous master. In this case the process joins as a slave and all state changes previously made to it are lost.
      • The cluster is still behind the previous master. In this case the process refuses to start because its UID is higher than that of the current master.

In the latter case, it is possible to force the process to start anyway and accept any resulting data loss. This is enabled using the setting below in delta.profile. With this mode enabled, the process forces its start-up and demotes the other processes to slave status.

DELTACONTROL_SLAVE_STARTUPOVERWRITE=YES
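
The start-up rules described in this section can be summarised as a small decision function. This is only a sketch of the documented behaviour, not the actual implementation: it assumes UIDs can be compared as integers, treats an equal UID as "in sync, join as slave", and uses hypothetical names (startup_action, force_overwrite) that do not come from the product.

```python
def startup_action(local_uid, master_uid, force_overwrite=False):
    """Decide what a starting process does, given its UID and the master's.

    local_uid       -- UID of the last change held by the starting process
    master_uid      -- UID of the last change held by the current master
    force_overwrite -- models DELTACONTROL_SLAVE_STARTUPOVERWRITE=YES
    """
    if local_uid <= master_uid:
        # Cluster is level with or ahead of this process: re-sync as a slave.
        return "join_as_slave"
    if force_overwrite:
        # This process is ahead and the override is set: it takes over
        # and the other processes are demoted to slaves.
        return "force_start_and_demote_others"
    # This process is ahead and no override is set: refuse to start.
    return "refuse_to_start"


print(startup_action(5, 9))
print(startup_action(9, 5))
print(startup_action(9, 5, force_overwrite=True))
```

Note that a plain UID comparison cannot tell whether the two histories have diverged, which is why the first case can silently discard changes made on the previous master.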

Always slave mode

For disaster recovery (DR) purposes, there is often a requirement to have processes in the cluster that only subscribe to state changes and never become the master. An example might be two processes running in a separate data centre in case of an outage. The network configuration or latency would make it infeasible for them ever to become master, but they can act as a backup at a separate site. To enable this for a Control process, set the following in delta.profile:

DELTACONTROL_ALWAYS_SLAVE=YES
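
Conceptually, this setting removes a process from the pool of possible masters while leaving it subscribed to state changes. The sketch below illustrates that idea only; the ControlProcess class, the master_candidates function and the process names are hypothetical, and how the cluster actually elects a master is not described here.

```python
from dataclasses import dataclass


@dataclass
class ControlProcess:
    name: str
    always_slave: bool = False  # models DELTACONTROL_ALWAYS_SLAVE=YES


def master_candidates(processes):
    """Return the names of processes that are allowed to become master."""
    return [p.name for p in processes if not p.always_slave]


# Two processes at the primary site, two DR processes at a separate site.
cluster = [
    ControlProcess("primary_a"),
    ControlProcess("primary_b"),
    ControlProcess("dr_a", always_slave=True),
    ControlProcess("dr_b", always_slave=True),
]
candidates = master_candidates(cluster)
print(candidates)
```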