The Stream Processor offers high-performance stateful stream processing by providing an in-memory state store powered by kdb+.
User-defined function and stream operator state can be managed by using the state API, which makes use of an in-memory kdb+ data store for high-performance access. Built-in stateful operators (such as windows and joins) implicitly manage state within the same store.
Any state managed with the state API is checkpointed consistently during periodic checkpoint events, allowing Workers to restore state to a recent snapshot upon recovery after any failure.
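The checkpoint and recovery behavior described above can be illustrated with a minimal Python sketch. This is a toy model only: `ManagedStateStore`, `running_total`, and the `"sum"` operator id are hypothetical names invented for illustration and are not part of the Stream Processor state API, and a plain dictionary stands in for the kdb+ store.

```python
import copy


class ManagedStateStore:
    """Toy stand-in for a managed in-memory state store (hypothetical, not the real API)."""

    def __init__(self):
        self._state = {}      # live state, keyed by operator id
        self._snapshot = {}   # state as of the last checkpoint

    def get(self, operator_id, default=None):
        return self._state.get(operator_id, default)

    def set(self, operator_id, value):
        self._state[operator_id] = value

    def checkpoint(self):
        # Deep-copy so that later updates cannot mutate the snapshot.
        self._snapshot = copy.deepcopy(self._state)

    def restore(self):
        # Roll live state back to the last checkpoint.
        self._state = copy.deepcopy(self._snapshot)


def running_total(store, operator_id, batch):
    """A stateful user-defined function: keeps a running sum across batches."""
    total = store.get(operator_id, 0) + sum(batch)
    store.set(operator_id, total)
    return total


store = ManagedStateStore()
running_total(store, "sum", [1, 2, 3])   # state is now 6
store.checkpoint()                       # periodic checkpoint event
running_total(store, "sum", [10])        # state is now 16, not yet checkpointed
store.restore()                          # simulate recovery after a failure
recovered = store.get("sum")             # back to the checkpointed value
```

In this sketch, the update made after the checkpoint is lost on recovery, which is why a real pipeline would replay the messages received since the snapshot.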
Durability and performance
Because managed state is stored during checkpoints, the durability gained by using it comes with a performance trade-off when very large historical state is required. In that case, it can be desirable to manage user state explicitly using global state.
Local state for an operator (local variables within a user-defined function) is stored only for the lifetime of the function call, as normal.
Global state is not managed by the Stream Processor, but can be useful when very large historical state is required, or when using external state storage mechanisms.
To facilitate managing global state, life-cycle hooks can be used to store appropriate markers and metadata within managed checkpoints to reset or roll back global state during recovery.
As an example, consider explicitly storing global state in an append-only kdb+ on-disk table. This may be useful where state grows rapidly and indefinitely. Rather than storing the entire table within the managed state checkpoints through the state API, state could be inserted directly into the on-disk table, and life-cycle hooks used to store an index marker for the table, so that the on-disk table could be truncated to the point of the checkpoint during recovery.
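The marker-and-truncate approach in this example can be sketched as follows. This is an illustration under stated assumptions, not the Stream Processor's life-cycle hook API: a plain append-only file stands in for the on-disk kdb+ table, the byte offset of the file stands in for the table index marker, and `AppendOnlyLog`, `on_checkpoint`, and `on_recover` are hypothetical names.

```python
import os
import tempfile


class AppendOnlyLog:
    """Append-only file standing in for an on-disk table (illustrative only)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, record):
        with open(self.path, "ab") as f:
            f.write(record.encode("utf-8") + b"\n")

    def marker(self):
        # The current size in bytes marks the end of the durable data.
        return os.path.getsize(self.path)

    def truncate_to(self, marker):
        # Discard everything written after the marker.
        with open(self.path, "r+b") as f:
            f.truncate(marker)

    def records(self):
        with open(self.path, "rb") as f:
            return f.read().decode("utf-8").splitlines()


def on_checkpoint(log):
    """Hypothetical hook: store only the marker in the checkpoint, not the data."""
    return {"log_marker": log.marker()}


def on_recover(log, checkpoint):
    """Hypothetical hook: roll the log back to the checkpointed marker."""
    log.truncate_to(checkpoint["log_marker"])


log_path = os.path.join(tempfile.mkdtemp(), "state.log")
log = AppendOnlyLog(log_path)
log.append("a")
log.append("b")
checkpoint = on_checkpoint(log)   # checkpoint holds a small marker only
log.append("c")                   # written after the checkpoint
on_recover(log, checkpoint)       # simulate recovery: "c" is discarded
recovered_records = log.records()
```

The checkpoint stays small regardless of how large the log grows, which is the point of keeping the bulk of the state outside the managed store.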