RT Soft Reset
In kdb Insights Enterprise, each instance of RT runs as a 3-node cluster. RT can continue to run with only 2 of the 3 nodes, but this is not ideal: there is no longer any additional resiliency, and old messages cannot be archived until all 3 nodes are running again. If the 3rd node cannot come back up, or if one of the remaining 2 nodes also goes down, messages will eventually stop flowing.
In these cases, a soft reset can often allow the 3rd node to rejoin without losing any messages, because a soft reset ensures that RT retains its knowledge of what has already been replicated and merged from each publisher.
Soft Reset Workflow
The behavior of the soft reset API is configurable. By default, a soft reset performs the following steps:
- The sequencer service is stopped.
- A snapshot of the current state is saved, consisting of:
  - the last input position from each input directory
  - the watermarks required for deduplication
  - the last output position
- All state files (/s/state) are removed.
- The sequencers are restarted.
Triggering a Soft Reset
A soft reset is triggered by calling an API on the RT supervisor process running in the user's Kubernetes cluster. The call only needs to be made on one of the nodes in the RT cluster; RT automatically orchestrates the soft reset across the other nodes.
Reset
The softReset REST request should be called against port 6000 of one of the RT nodes in an RT cluster:
GET softReset/
Warning
RT can only handle one reset request at a time. If the RT pod is already executing a reset, it responds to further reset requests with HTTP status 503 (Service Unavailable).
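When a soft reset is triggered from a script, the 503 case above is worth handling explicitly. The following shell function is a minimal sketch of doing so. It assumes port 6000 of an RT pod is reachable at RT_URL (a hypothetical variable that defaults to the port-forwarded address used in the examples on this page). Because this page does not state which HTTP status accompanies a successful request, the sketch treats a response body whose status is "Resetting" as success rather than relying on a specific success code.

```shell
#!/usr/bin/env bash
# Sketch only: RT_URL is a hypothetical variable; the default matches the
# port-forwarded address used in this page's examples.
RT_URL="${RT_URL:-http://127.0.0.1:6000}"

# Trigger a soft reset on one RT node.
# Returns 0 if the node reports "Resetting", 2 if it is busy with another
# reset (HTTP 503), and 1 for any other outcome (including no connection).
trigger_soft_reset() {
  local body_file body code
  body_file="$(mktemp)"
  # -s: no progress output; -o: save the body; -w: print only the status code
  code="$(curl -s -o "$body_file" -w '%{http_code}' "$RT_URL/softReset")"
  body="$(cat "$body_file")"
  rm -f "$body_file"
  echo "$body"
  if [ "$code" = "503" ]; then
    echo "RT is already handling a reset request; retry later" >&2
    return 2
  fi
  case "$body" in
    *'"status":"Resetting"'*) return 0 ;;
    *) echo "soft reset not started (HTTP $code)" >&2; return 1 ;;
  esac
}
```

Checking the return code (0 started, 2 busy, 1 anything else) lets a calling script decide whether to wait and retry or to stop.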
Status
The resetStatus REST request can be called against port 6000 of any RT node in an RT cluster and returns the reset status of that node. This lets the user determine whether the node is in the middle of a soft reset, or whether it is available for a hard or soft reset.
GET resetStatus/
The status returned by the resetStatus call is UP during normal operation. Once a soft reset is requested, the node cycles through a series of operations, updating its status at each step. The soft reset is complete when the returned value reverts to UP.
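Since completion is signalled by the status returning to UP, a script can poll resetStatus until the node has first left UP and then come back to it. The sketch below assumes the same hypothetical RT_URL variable and extracts the status field with sed so that it has no dependencies; jq is a sturdier parser where it is installed. It only reports on the node it polls, and if the reset finishes entirely between two polls the function never sees a non-UP value and times out, so a short poll interval is advisable.

```shell
#!/usr/bin/env bash
# Sketch only: RT_URL is a hypothetical variable; the default matches the
# port-forwarded address used in this page's examples.
RT_URL="${RT_URL:-http://127.0.0.1:6000}"

# Print the "status" field of a response such as
# {"hostname":"rt-sdk-sample-assembly-north-0","status":"UP"}.
# Plain sed keeps the sketch dependency-free; `jq -r .status` is sturdier.
json_status() {
  sed -n 's/.*"status":"\([^"]*\)".*/\1/p'
}

# Poll resetStatus until this node has left UP and then returned to UP.
# Optional arguments: timeout in seconds (default 300), poll interval (default 5).
# Returns 0 when the reset has completed on this node, 1 on timeout.
wait_for_reset() {
  local timeout="${1:-300}" interval="${2:-5}" waited=0 seen_reset=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status="$(curl -s "$RT_URL/resetStatus" | json_status)"
    echo "resetStatus: ${status:-<no response>}" >&2
    if [ -n "$status" ] && [ "$status" != "UP" ]; then
      seen_reset=1            # the node is part-way through the reset
    elif [ "$status" = "UP" ] && [ "$seen_reset" -eq 1 ]; then
      return 0                # back to UP after a reset: done
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

For example, wait_for_reset 600 2 waits up to ten minutes, polling every two seconds, and logs each observed status to stderr.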
Example execution
To use either API, the user must first port-forward port 6000 of one of the RT pods to localhost. In the examples below, remote port 6000 has been forwarded to local port 6000:
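The port-forward itself can be set up with kubectl port-forward. The helper below is a hypothetical wrapper around that command; the pod name and namespace are placeholders to be replaced with the user's own values (kubectl get pods -n with the relevant namespace lists the RT pods).

```shell
#!/usr/bin/env bash
# Hypothetical helper: forward local port 6000 to port 6000 of one RT pod.
# Arguments: pod name (required), namespace (placeholder, default "default").
rt_port_forward() {
  local pod="$1" namespace="${2:-default}"
  # Runs in the foreground until interrupted (Ctrl+C)
  kubectl port-forward -n "$namespace" "pod/$pod" 6000:6000
}
```

For example, rt_port_forward rt-sdk-sample-assembly-north-0 my-namespace (where my-namespace is a placeholder) keeps the tunnel open while the curl calls below are run from a second terminal.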
Successful call
curl http://127.0.0.1:6000/softReset
{"hostname":"rt-sdk-sample-assembly-north-2","status":"Resetting"}
Unsuccessful call
curl http://127.0.0.1:6000/softReset
{"hostname":"rt-sdk-sample-assembly-north-2","status":"Not available for Reset"}
Reset Status
curl http://127.0.0.1:6000/resetStatus
{"hostname":"rt-sdk-sample-assembly-north-0","status":"UP"}