Process recovery guide

Setting up a robust system with failover

Let's take a look at a complete guide on how to set up a basic working 2-host hot-hot system.

Note

A hot-hot system can be made up of N hosts and N instances, but for this demonstration only 2 are being used.

There are a few steps to follow to achieve this:

1. Have access to 2 hosts

2. Have the same Refinery packages set up on both hosts

These packages must be set up by the onboard developer. When configuring your Refinery settings, make sure that the following config values are set correctly:

auto-configure-instance-hostname-a = aaa.host.com
delta-control-master-hostname = aaa.host.com
delta-control-slave-hostname = bbb.host.com
auto-configure-instance-hostname-b = bbb.host.com
...
delta-control-clustering = 1

3. Set up a clustered deployment system

The system YAML requires a slight change. The hostnames of the 2 hosts are added to the layout section (primary-server & secondary-server) of the system YAML.

system:
    layout:
        -
            name: primary-server
            nodes:
                -
                    host: aaa.host.com
        -
            name: secondary-server
            nodes:
                -
                    host: bbb.host.com

default-cpu-taskset: 0-256

data-hierarchy:
    - region
    - data-source
    - data-class
    - sub-class

delta-messaging-server: DS_MESSAGING_SERVER:refinery_a

timezone: UTC

time-sort: false

Next, the pipeline YAML requires that you add the primary server and secondary server to the proc-layout. This tells each process which server it runs on. Individual process instances can be split independently between the different servers.

pipeline:
    name: "DemoPipeline"
    type: "realtime"

    expose-to-gw: true

    proc-layout:
    # Example of how process instances can be split across servers
    #    -
    #        tp.0: primary-server
    #        tp.1: secondary-server
    #        hdb.0: primary-server
    #        hdb.1: secondary-server
        -
            all: primary-server
        -
            all: secondary-server

    taxonomy:
        region: test
        data-source: demo

    processes:
        tp:
            pub-mode: timer
            pub-freq-ms: 100
            log-to-journal: true
            rollover-mode: daily-at-time
            rollover-time: "00:00:00.001"
            port: 41221
            enable-analyst: true
        rdb:
            port: 41222
            timeout: 30
            enable-analyst: true
        hdb:
            port: 41223
            timeout: 30
            enable-analyst: true
        ipdb:
            port: 41224
            write-freq: 10000
            write-row-limit: 0
            enable-analyst: true
        idb:
            port: 41225
            timeout: 0
            enable-analyst: true
        epdb:
            timeout: 0
            enable-analyst: true

Make sure that a table is defined so the data can be stored (see the example below).

table:
    name: DemoTable
    id-col: sym
    time-col: time
    intra-persist-type: splay
    end-persist-type: date-partition

    taxonomy:
        -
            region: global
            data-source: demo

    columns:
        -
            name: time
            data-type: timestamp
            attribute: sorted
        -
            name: sym
            data-type: symbol
            attribute: grouped
        -
            name: price
            data-type: float
        -
            name: volume
            data-type: long

Note

Make sure that these changes are applied across both hosts.

4. Start up Refinery

Step 1 - start Control

Start up Delta Control on the primary server first and then run it on the secondary server.

refinery application --start-control  

Check the deltaControl.log file to confirm the two Controls have found each other.

refinery logs --view --process DeltaControl

Step 2 - start Control daemon

Start up the Delta Control daemon on the primary server first and then run it on the secondary server.

refinery application --start-daemon  

Step 3 - start Process Manager

Start up Process Manager on the primary server only. The Process Manager is in charge of running all the pipelines and workflows across both servers.

refinery process-manager --start --wait  

Step 4 - start core workflows

Start up the core workflows. With this being a clustered deployment system, 2 instances of each workflow are started up for the servers (A & B).

refinery workflow --start REFINERY_CORE_A
refinery workflow --start REFINERY_ENTRYPOINT_0_a
refinery workflow --start REFINERY_CORE_B
refinery workflow --start REFINERY_ENTRYPOINT_0_b

Step 5 - start pipelines

Start up the pipelines, starting with the default entrypoint pipeline and then the demo pipeline.

refinery pipeline --start DefaultEntrypoint
refinery pipeline --start DemoPipeline

Step 6 - start gateway client

Lastly, to complete the Refinery setup, start the gateway client.

refinery service-class --start-template refinery-gw-client

5. Publishing data

Now that the clustered deployment system is up and running, it's time to publish some data to it. For this you will need to create a q publisher script. Using the DemoTable that has already been uploaded to Refinery, the following q script opens a connection to the Tickerplant (TP) on each server and publishes dummy data to the table.

Note

This script below publishes randomly generated data at 1 second intervals to both TPs.

tp1: hopen `:aaa.host.com:41221;
tp2: hopen `:bbb.host.com:41221;

i:0;
n:1000;

// format the published row count for logging, e.g. 100000 -> "100,000"
hrf:{reverse "," sv 3 cut reverse string x}

// build a batch of n dummy rows matching the DemoTable schema
// (volume is generated as long to match the table's long column)
gen:{[]
    t: flip `time`sym`price`volume!(n#.z.p-1D;n?`$/:.Q.a;`float$n#i;n?1000);
    i+::1;
    t
    };

.z.ts:{

    tab: gen[];
    tp1(`upd;`DemoTable;tab);
    tp2(`upd;`DemoTable;tab);   
    show"published number ",string[i]," - total ",hrf[i*n]," rows";
    }

\t 1000
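The hrf helper defined in the script simply inserts thousands separators into the published row count. It can be checked in isolation in any q session:

```q
/ hrf groups digits in threes from the right and joins them with commas
hrf:{reverse "," sv 3 cut reverse string x}

hrf 100000     / "100,000"
hrf 1234567    / "1,234,567"
```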

After the dummy data has been sent to the TP, it is published on to the RDB. As the data streams in, the IPDB writes batches of it to disk (the IDB). The data is stored in the IDB until EOD, when the EPDB sorts it, applies attributes, and moves it to the HDB.
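As a quick sanity check, you can confirm that data is arriving in the RDB. The snippet below is a sketch, assuming the RDB on the primary server is reachable at port 41222 as configured in the pipeline YAML above; run it from a separate q session.

```q
/ connect to the RDB on the primary server (port 41222 from the pipeline YAML)
rdb: hopen `:aaa.host.com:41222;

/ number of rows received so far in DemoTable
show rdb "count DemoTable";

/ inspect the five most recent rows
show rdb "-5#select from DemoTable";

hclose rdb;
```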

This completes the setup of a basic Refinery system with hot-hot recovery enabled, along with a demonstration of how to send data through Refinery.

Failover recovery

Failover is the re-routing of queries when one of the processes on the primary route fails. The primary routing state is registered to instance 0 by default and can be observed by running the CLI command:

refinery failover --status --pipeline DemoPipeline

When a process in the primary routing state is killed or fails, the automatic failover operation takes action: on noticing the failed process, the system re-routes queries through the secondary instance of that process.

Failed / killed process

Log message type | Specific process name              | Process | Message details
WARN             | DefaultEntrypoint.0.gw.0-2684 **** | 0 gw    | Active downstream process has disconnected [ Process: DemoPipeline.0.rdb.0 ]

Failover occurring

Log message type | Specific process name              | Process | Message details
INFO             | DefaultEntrypoint.0.gw.0-2684 **** | 0 gw    | Attempting auto-failover to new process instance [ Process: DemoPipeline.0.rdb.0 ] [ Pipeline: DemoPipeline ] [ Instance: 0 ] [ New: 1 ]
INFO             | DefaultEntrypoint.0.gw.0-2684 **** | 0 gw    | Validating new instance process is available [ Process Name: DemoPipeline.1.rdb.0 ]
INFO             | DefaultEntrypoint.0.gw.0-2684 **** | 0 gw    | Updating process primary configuration [ Source: DemoPipeline.0.rdb.0 ] [ New: DemoPipeline.1.rdb.0 ]

Restarting processes after they've failed

To restart a process after it has failed, use Refinery's --force-start CLI command.

refinery pipeline --force-start DemoPipeline --instance N 

Note

For this guide, the primary routing path is set to --instance 0.

Once the process has been restarted, it won't automatically be re-routed back into the primary routing state. This can be clearly seen in the failover status table, where the primary column no longer says yes for DemoPipeline.0.rdb.0 but instead says yes for DemoPipeline.1.rdb.0.

[ DefaultEntrypoint.0.gw.0 ] Primary Routing State:

        processName          pipeline     pipelineInstance primary registered busy busySince
        -------------------- ------------ ---------------- ------- ---------- ---- ---------
        DemoPipeline.0.rdb.0 DemoPipeline 0                no      no         no
        DemoPipeline.0.hdb.0 DemoPipeline 0                yes     yes        no
        DemoPipeline.0.idb.0 DemoPipeline 0                yes     yes        no
        DemoPipeline.1.rdb.0 DemoPipeline 1                yes     yes        no
        DemoPipeline.1.hdb.0 DemoPipeline 1                no      no         no
        DemoPipeline.1.idb.0 DemoPipeline 1                no      no         no

To re-route the primary process back into the primary routing path, the following failover CLI command is required:

refinery failover --failover --pipeline DemoPipeline --to-instance 0

The re-routing back to the original primary processes is confirmed in the failover status table seen below:

[ DefaultEntrypoint.0.gw.0 ] Primary Routing State:

        processName          pipeline     pipelineInstance primary registered busy busySince
        -------------------- ------------ ---------------- ------- ---------- ---- ---------
        DemoPipeline.0.rdb.0 DemoPipeline 0                yes     yes        no
        DemoPipeline.0.hdb.0 DemoPipeline 0                yes     yes        no
        DemoPipeline.0.idb.0 DemoPipeline 0                yes     yes        no
        DemoPipeline.1.rdb.0 DemoPipeline 1                no      yes        no
        DemoPipeline.1.hdb.0 DemoPipeline 1                no      no         no
        DemoPipeline.1.idb.0 DemoPipeline 1                no      no         no

Note

This process recovery guide can be applied to any process that fails.