The Stream Processor offers high-performance stateful stream processing by providing an in-memory state store powered by kdb+.
State for user-defined functions and stream operators can be managed using the state API, which uses an in-memory kdb+ data store for high-performance access. Built-in stateful operators (such as windows and joins) implicitly manage state within the same store.
Any state managed through the state API is checkpointed consistently during periodic checkpoint events, allowing Workers to restore state to a recent snapshot when recovering from a failure.
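The checkpoint-and-restore behavior described above can be illustrated with a small, self-contained simulation. This is only a sketch of the concept in plain Python, not the Stream Processor's state API: the `ManagedState` class, its methods, and the `running_sum` operator are hypothetical names introduced for illustration.

```python
import copy


class ManagedState:
    """Toy in-memory state store keyed by operator id.

    Hypothetical stand-in for a managed state store; it is NOT the
    Stream Processor state API.
    """

    def __init__(self):
        self._state = {}      # live state, updated as records are processed
        self._snapshot = {}   # state as of the most recent checkpoint

    def get(self, operator_id, default=None):
        return self._state.get(operator_id, default)

    def set(self, operator_id, value):
        self._state[operator_id] = value

    def checkpoint(self):
        # Capture a consistent copy of all operator state.
        self._snapshot = copy.deepcopy(self._state)

    def recover(self):
        # Roll live state back to the most recent checkpoint.
        self._state = copy.deepcopy(self._snapshot)


def running_sum(store, operator_id, batch):
    """Stateful operator: keeps a running total across batches."""
    total = store.get(operator_id, 0) + sum(batch)
    store.set(operator_id, total)
    return total


store = ManagedState()
running_sum(store, "sum", [1, 2, 3])   # state is now 6
store.checkpoint()                     # periodic checkpoint event
running_sum(store, "sum", [10])        # state is now 16, not yet checkpointed
store.recover()                        # simulated failure and recovery
restored = store.get("sum")            # back to the checkpointed value
```

In this sketch, any updates made after the last checkpoint are lost on recovery, so the operator resumes from the snapshot value rather than from the most recent in-memory value.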
Because managed state is written out at every checkpoint, the durability it provides comes at a cost when very large historical state is required, since the checkpoints grow with the state. In such cases it can be preferable, for performance, to manage user state explicitly as global state.
Local state for an operator (local variables within a user-defined function) is only stored for the lifetime of the function call, as normal.
Global state is not managed by the Stream Processor, but can be useful when very large historical state is required, or when using external state storage mechanisms.
To help manage global state, life-cycle hooks can be used to store appropriate markers and metadata within managed checkpoints, so that global state can be reset or rolled back appropriately during recovery.
As an example, consider explicitly storing global state in an append-only kdb+ on-disk table. This may be useful in instances where state grows rapidly and indefinitely. Rather than storing the entire table within the managed state checkpoints through the state API, state could be inserted directly into the on-disk table, and life-cycle hooks used to store an index marker for the table, so that the on-disk table could be truncated to the point of the checkpoint during recovery.
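The marker-and-truncate pattern from this example can be sketched as follows. The sketch uses a plain append-only file in place of an on-disk kdb+ table, and a byte offset in place of a table index. `AppendOnlyLog`, `on_checkpoint`, and `on_recover` are hypothetical names chosen for illustration; they are not the Stream Processor's life-cycle hook API.

```python
import os
import tempfile


class AppendOnlyLog:
    """Append-only file standing in for an on-disk table (illustrative only)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, record: bytes):
        with open(self.path, "ab") as f:
            f.write(record + b"\n")

    def marker(self) -> int:
        # Current end of the log, in bytes; plays the role of the index marker.
        return os.path.getsize(self.path)

    def truncate_to(self, marker: int):
        # Discard everything written after the marker.
        with open(self.path, "r+b") as f:
            f.truncate(marker)

    def records(self):
        with open(self.path, "rb") as f:
            return f.read().splitlines()


def on_checkpoint(log):
    """Hypothetical checkpoint hook: store only the marker, not the data."""
    return {"log_marker": log.marker()}


def on_recover(log, checkpoint):
    """Hypothetical recovery hook: roll the log back to the checkpoint."""
    log.truncate_to(checkpoint["log_marker"])


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendOnlyLog(os.path.join(tmp, "state.log"))
        log.append(b"a")
        log.append(b"b")
        checkpoint = on_checkpoint(log)   # checkpoint holds a small marker only
        log.append(b"c")                  # written after the checkpoint
        on_recover(log, checkpoint)       # simulated failure and recovery
        return log.records()


recovered = demo()
```

The checkpoint here carries a single integer regardless of how large the log grows, which is the point of the approach: the bulk of the state stays outside the managed checkpoint, and only enough metadata is stored to make the external state consistent with it after recovery.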