Refinery Pipeline Processes

This page explains the processes within a Refinery pipeline, how they work, and provides examples of how to use their configurable parameters.

TickerPlant (TP)

What does the TP process do?

The tickerplant (TP) handles all incoming streaming data processed by kdb+. It reads data from external sources and passes it into kdb+. The TP also generates log files during real-time and batch processing.

How does the TP process work?

The process acts as a connector: it reads data from the streaming source and publishes it based on a timer or a row count. It does not store any data on the server; instead, it reads the data and passes it on to the IPDB. It also writes log information according to its parameter configuration and feeds an alert mechanism for monitoring. The log files are stored in $DELTADATA_HOME/refinery/system-config/journal.

Configurable parameters

This section provides examples of configurable parameters that control how data is published, logged, and rolled over to the database.

Example 1

This process reads directly from the external source and rolls data over to the database at the configured rollover time.

tp:
  pub-mode: direct
  log-to-journal: true
  rollover-mode: daily-at-time
  rollover-time: "00:00:00.001"
  enable-analyst: true
  subscribe-from-delta-messaging: true

Example 2

This process publishes buffered data every 100 milliseconds and rolls it over to the database at the configured rollover time.

tp:
  pub-mode: timer 
  pub-freq-ms: 100
  log-to-journal: true
  rollover-mode: daily-at-time
  rollover-time: "00:00:00.001"
  enable-analyst: true
  subscribe-from-delta-messaging: true

Example 3

This process publishes buffered data every 100 milliseconds, in batches of up to 1,000 records, and rolls it over to the database at the configured rollover time.

tp:
  pub-mode: timer-with-row-batch 
  pub-freq-ms: 100
  pub-row-limit: 1000
  log-to-journal: true
  rollover-mode: daily-at-time
  rollover-time: "00:00:00.001"
  enable-analyst: true
  subscribe-from-delta-messaging: true
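The three publish modes above differ only in when buffered rows are flushed to subscribers. The sketch below illustrates that distinction using hypothetical names; it is a simplified model, not the actual TP implementation.

```python
# Hypothetical model of the TP publish modes; names are illustrative.
class TickerPlantBuffer:
    def __init__(self, pub_mode, pub_row_limit=0):
        self.pub_mode = pub_mode          # direct | timer | timer-with-row-batch
        self.pub_row_limit = pub_row_limit
        self.buffer = []
        self.published = []               # batches sent downstream

    def on_row(self, row):
        if self.pub_mode == "direct":
            self.published.append([row])  # forward each row immediately
            return
        self.buffer.append(row)
        if (self.pub_mode == "timer-with-row-batch"
                and len(self.buffer) >= self.pub_row_limit):
            self.flush()                  # row-count trigger fires early

    def flush(self):
        # In the real process this would be driven by the pub-freq-ms timer.
        if self.buffer:
            self.published.append(self.buffer)
            self.buffer = []
```

In `direct` mode every row is forwarded as it arrives; in the timer modes rows accumulate until the timer (or, for `timer-with-row-batch`, the row limit) flushes them.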

Intraday Persisting Database (IPDB)

What does the process do?

The IPDB is responsible for persisting intraday data arriving from the TP. It serves as backup storage for the intraday database, preventing data loss during real-time processing.

How does it work

This process works in-memory, and its behavior may vary depending on whether IDB is active or inactive in the pipeline.

Configurable parameters

ipdb:
  write-freq: 30000
  write-row-limit: 0
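A plausible reading of these parameters: write-freq is the interval in milliseconds between intraday writes, and write-row-limit is a row-count trigger, with 0 assumed to disable the row-based trigger. That flush condition can be sketched as follows (hypothetical, not the actual IPDB code):

```python
# Hypothetical flush condition for the IPDB intraday write.
# write-row-limit of 0 is assumed to disable the row-count trigger.
def should_write(elapsed_ms, buffered_rows, write_freq=30000, write_row_limit=0):
    if elapsed_ms >= write_freq:           # timer-based trigger
        return True
    # row-count trigger, only if a positive limit is configured
    return write_row_limit > 0 and buffered_rows >= write_row_limit
```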

End-of-day Persisting Database (EPDB)

What it does

The EPDB is responsible for end-of-day procedures, including sorting new partitions, applying attributes, and moving data to the HDB directory.

How does it work

The EPDB process handles the "late data" directory in the system. End-of-day (EOD) can occur at any time of the day. If data for an existing HDB date is received again through EOD, EPDB creates a copy of the HDB and merges the new data without affecting the original HDB. The late data directory is $DELTADATA_HOME/refinery/system-config/late.
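The copy-and-merge behaviour described above can be sketched with a simple in-memory model (hypothetical, not the actual EPDB code):

```python
def merge_late_data(hdb, date, late_rows):
    """Hypothetical sketch: merge late rows for an existing HDB date
    into a copy, leaving the original partition untouched."""
    merged = dict(hdb)                              # work on a copy of the HDB
    merged[date] = hdb.get(date, []) + late_rows    # merge late rows into the copy
    return merged                                   # original `hdb` is unchanged
```

The key property mirrored here is that the original partition is never modified in place; the merged copy replaces it only once the merge is complete.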

Refer to EOD Rollover for more details.

Configurable parameters

epdb:
  timeout: 0

Intraday Database (IDB)

What it does

This process stores the current day's data. When the IDB is added to the pipeline, data is flushed from the RDB to the IDB. Refer to IDB for more information.

How does it work

This process reads data from RDB and IPDB based on intraday completion.

Configurable parameters

idb:
  timeout: 300

Historical Database (HDB)

What it does

The HDB maintains all historical data on disk. HDB data is stored in splayed format (split by column) and partitioned by date. Within each date partition, each table is stored as a directory, with one kdb+ binary file per column.

Refer to HDB for more information.
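The splayed, date-partitioned layout can be illustrated with a small sketch that builds the directory structure on disk (the table and column names here are hypothetical examples, not required names):

```python
import os
import tempfile

# Build a miniature date-partitioned, splayed layout:
# <root>/<date>/<table>/<column> - one directory per date,
# one directory per table, one file per column.
root = tempfile.mkdtemp()
for date in ("2024.01.02", "2024.01.03"):
    table_dir = os.path.join(root, date, "trade")
    os.makedirs(table_dir)
    for column in ("time", "sym", "price", "size"):
        open(os.path.join(table_dir, column), "wb").close()
```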

How does it work

The data is stored in $DELTADATA_HOME/refinery/system-config/hdb.

Configurable parameters

See HDB for more details.

hdb:
  timeout: 30

Real-time Database (RDB)

What it does

This process reads data from the TP and acts as a real-time database for queries from the gateway.

How does it work

This process works in-memory. If the IDB is not enabled, data is held only in the RDB; when the IDB is enabled, data is upserted into the IDB as part of the IPDB intraday write process. The IPDB triggers the rollover that moves data to the HDB.

Configurable parameters

See RDB for more information.

rdb:
  timeout: 30
  enable-analyst: true
  instances: 2

Chained TickerPlant (CTP)

What it does

The CTP acts as a protective layer in front of the main Tickerplant (TP) to prevent memory bloat and manage real-time data flow more efficiently. It ensures that if real-time UDF nodes experience a backup, the impact is contained within the CTP rather than affecting the main TP. Additionally, CTP enables integration with legacy KX control processes by broadcasting data onto a Delta Messaging System (DMS) channel when the publish-to-delta-messaging setting is enabled.

How does it work

The CTP manages outbound data buffering, limiting memory overload on the main TP. In case of failures, the CTP can be restarted independently, without affecting the data persistence path from the main TP. The Query Manager (QM) manages subscriptions by brokering between clients and RTE/CTP, ensuring smooth streaming of data.
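The slow-consumer protection controlled by the slow-consumer-disconnect settings can be sketched as follows (a hypothetical model, not the actual CTP code): a subscriber whose buffered outbound bytes exceed the configured limit is disconnected, rather than letting its backlog grow inside the CTP.

```python
# Hypothetical model of CTP slow-consumer handling.
class ChainedTP:
    def __init__(self, disconnect_bytes=1000000):
        self.disconnect_bytes = disconnect_bytes
        self.pending = {}   # subscriber -> buffered outbound bytes

    def publish(self, subscriber, nbytes):
        """Buffer nbytes for a subscriber; drop it if its backlog exceeds
        the configured limit. Returns False if the subscriber was dropped."""
        self.pending[subscriber] = self.pending.get(subscriber, 0) + nbytes
        if self.pending[subscriber] > self.disconnect_bytes:
            del self.pending[subscriber]   # disconnect the slow consumer
            return False
        return True

    def drain(self, subscriber):
        # A healthy consumer acknowledges data, emptying its buffer.
        self.pending[subscriber] = 0
```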

Configurable parameters

ctp:
  pub-mode: direct
  log-to-journal: false
  rollover-mode: passthrough
  slow-consumer-disconnect: true
  slow-consumer-disconnect-bytes: 1000000
  publish-to-delta-messaging: true

Real-Time Processing Engine (RTE)

What it does

The Real-Time Engine (RTE) functions as a Complex Event Processor (CEP) and a dedicated streaming calculation engine within the Refinery system. Its core job is to generate immediate, actionable insights and new data streams from high-velocity raw market data, while keeping the Real-Time Database (RDB) fast and available for querying.

How does it work

The Refinery Real-Time Engine (RTE) works as a high-speed Complex Event Processor (CEP) by establishing a direct subscription to the raw data stream, bypassing the RDB. It processes every incoming data tick, updating its internal, in-memory state to perform complex calculations and create derived data (like VWAP). The RTE then uses a pub/sub model to shed load, publishing only the final, enriched result to a new output channel.
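As a rough illustration of the kind of derived stream the RTE maintains, here is a minimal incremental VWAP calculation. This is hypothetical Python, not the actual RTE API (which is loaded as q code); it only shows the in-memory, per-tick update pattern described above.

```python
# Hypothetical per-tick VWAP state, updated on every incoming trade
# and republished downstream as an enriched value.
class VwapEngine:
    def __init__(self):
        self.value = 0.0    # cumulative price * size
        self.volume = 0     # cumulative size

    def on_tick(self, price, size):
        self.value += price * size
        self.volume += size
        return self.value / self.volume   # enriched output for the pub/sub channel
```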

Configurable parameters

There are two ways to configure the RTE in Refinery:

  • Using the rte.*.q file naming convention to load all the custom functionality into the RTE process in the system.
  • Adding the .q file to the lib folder of the client package, and then adding the additional-q-libraries parameter for the RTE in the YAML configuration file.

rte:
  timeout: 30
  enable-analyst: true
  additional-q-libraries:
    - file_name

Late Data Intraday Database (LDIDB)

What it does

This process has the specific role of exposing late data for queries between the time end of day occurs and the time the data is merged into the HDB.

How does it work

This process allows querying of late data. If the LDIDB is enabled, the late data directory remains available for queries while it is being merged with HDB data. If it is not enabled, no data from the previous day is available for querying during that window. The late data directory is $DELTADATA_HOME/refinery/system-config/late.
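The query-time effect of enabling the LDIDB can be sketched as follows (hypothetical, not the actual query-routing code):

```python
# Hypothetical view of which sources a query can reach while late data
# is still being merged into the HDB.
def queryable_sources(ldidb_enabled, merge_in_progress):
    sources = ["hdb"]
    if ldidb_enabled and merge_in_progress:
        sources.append("late")   # late-data directory stays queryable
    return sources
```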

Refer to LDIDB for more details.

Configurable parameters

ldidb:
  write-freq: 20000