Architecture
Refinery is built around the concept of "data pipelines" (or just "pipelines").
Pipelines
- Pipelines are functionally independent from one another
    - There is no cross-pipeline communication
    - A failure of one pipeline has no impact on any others
- A pipeline is accessed by its data taxonomy
    - The taxonomy uniquely identifies each pipeline
    - This property allows the Gateway to route to a pipeline based on this taxonomy
    - A single getTicks / getStats query cannot currently span multiple pipelines
- A pipeline provides a complete set of real-time capture, persist and query processes
    - Based on schemas defined and linked by the data taxonomy
    - RDBs can be sharded
        - MD5 and 'first letter' sharding are provided out of the box
    - HDBs and IDBs can be clustered
    - There is only a single PDB per pipeline
        - Can be configured to remove the end-of-day sort by writing per symbol intraday
            - This slows intraday writes and is not currently supported alongside the IDB
    - Multiple TPs within a pipeline have limited support
        - No protection against out-of-order updates received from multiple TPs (potential loss of `s# attribute on time)
- Pipelines are extensible
    - Additional process types can be defined and added as required
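To make the RDB sharding schemes mentioned above more concrete, here is a minimal sketch of how an ID could be mapped to one of n shards by 'first letter' or by MD5. The function names `shardByFirstLetter` and `shardByMd5` are hypothetical and exist only for illustration. This is not Refinery's implementation, and the assumption that sharding is keyed on the ID column is ours.

```q
/ Hypothetical helpers for illustration only, not part of Refinery
/ Assumption: the shard is chosen from the ID (a symbol); n is the number of RDB shards

/ 'First letter' scheme: alphabet position of the first character, modulo n
/ (non-alphabetic first characters all map to position 26)
shardByFirstLetter:{[n;id] (.Q.a?lower first string id) mod n}

/ MD5 scheme: sum the 16 digest bytes as longs, modulo n
shardByMd5:{[n;id] (sum "j"$md5 string id) mod n}

/ Example: assign an ID to one of 3 shards under each scheme
shardByFirstLetter[3;`EC4N4TQ]
shardByMd5[3;`EC4N4TQ]
```

A hash-based scheme such as MD5 tends to spread IDs more evenly than the first letter does, which matters when many IDs share a common prefix.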
Architecture diagram
The following diagram shows a 3-pipeline Refinery application:

Example layout
In a very simple (and unrealistic) example, we could define a data taxonomy with a single element, "city" (multiple elements are possible), and a table schema called "electricity", then allocate one pipeline to each of three cities:
- New York - nyc
- London - lon
- Singapore - sin
A feed handler is assigned to each city and publishes to that city's pipeline. The primary ID column is 'postcode' (zip code). To query the electricity meter reading for specific postcodes from the database, we must specify both the city and the postcode(s):
/ Get meter reading for EC4N 4TQ in London today
getTicks `city`dataType`startDate`endDate`idList!(`lon; `electricity; .z.d; .z.d; `EC4N4TQ);
/ Get meter readings for 048583 and 018983 in Singapore from a week ago
getTicks `city`dataType`startDate`endDate`idList!(`sin; `electricity; .z.d - 7; .z.d - 7; `048583`018983);
You could then extend the system with a 'gas' table schema and the 'dataType' field would be used to select between 'gas' and 'electricity'.
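As a sketch of what that extension might look like from the caller's side, the query below reuses the getTicks parameters shown earlier and changes only the 'dataType' value. It assumes a 'gas' schema has already been defined and linked into the taxonomy; nothing else about the call is meant to differ.

```q
/ Get gas meter reading for EC4N 4TQ in London today
/ Assumption: a 'gas' table schema has been added alongside 'electricity'
getTicks `city`dataType`startDate`endDate`idList!(`lon; `gas; .z.d; .z.d; `EC4N4TQ);
```

Because 'city' still selects the pipeline, this query is routed to the same London pipeline as the electricity query; only the table selected within it changes.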