RT Hard Reset

A hard reset removes all messages from the RT stream to allow the reliable transport to be restarted with an empty stream.

Warning

This removes all messages from the stream and can result in duplicate or lost data. We recommend you only use this feature if advised to do so by KX Support.

Hard Reset Workflow

The API that executes the hard reset can be configured; however, the default behavior is listed below:

  1. The following directories on the PVCs are deleted:

    • /s/in/
    • /s/state/
  2. The /s/out/ logs are truncated rather than deleted. This ensures that all messages sent prior to the reset are eventually removed from the stream and from each subscriber. See the note below on truncation.

  3. The session number of the merged output log files is incremented, i.e. from log.x.y to log.x+1.y (see the naming sketch at the end of this section).

  4. The RT service is restarted.

    Note

    The session number is maintained in a config map, which is managed outside of the PVCs.

    Truncating /s/out/ directory logs

    The effect of truncating the out logs is that, after the reset, each subscriber's corresponding logs are also truncated. This ensures that all messages sent prior to the reset are removed from both the RT and subscriber nodes.
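
To make the naming in step 3 concrete, the following is a minimal sketch of the session-number increment, assuming merged output logs follow the log.<session>.<sequence> pattern shown above; the rollover itself is performed by RT, not by the user:

```python
# Minimal sketch of the session-number increment described in step 3.
# It assumes merged output logs are named log.<session>.<sequence>
# (e.g. log.3.7); RT performs the actual rollover during the reset.
import re

def next_session_name(log_name: str) -> str:
    """Return the merged log name after a hard reset, e.g. log.3.7 -> log.4.7."""
    m = re.fullmatch(r"log\.(\d+)\.(\d+)", log_name)
    if m is None:
        raise ValueError(f"unexpected log file name: {log_name}")
    session, sequence = int(m.group(1)), int(m.group(2))
    return f"log.{session + 1}.{sequence}"

print(next_session_name("log.3.7"))  # prints log.4.7
```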

Consequences of a Hard Reset

Duplicate data

RT keeps track of what data it has processed to avoid sending duplicate data to a subscriber. When executing the hard reset, RT loses knowledge of what it has processed. Therefore, after carrying out a hard reset, RT will have no knowledge of what data has been replicated/merged from a publisher.

If any data resides on the publisher, it will be (re)replicated back into the RT cluster and published on to the subscriber.

To avoid duplicate data, take the following steps in order (a sketch of the first two steps follows the warning below):

  1. Take down the publisher.
  2. Remove the client log files.
  3. Execute the hard reset.

Warning

Taking down the publisher may result in data loss.
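
As a rough sketch of the first two steps, assuming the publisher runs as a Kubernetes deployment and its RT client log directory is reachable from where the script runs; the deployment name, namespace, and log path below are placeholders, not RT-defined values:

```python
# Illustrative ordering of the first two duplicate-avoidance steps above.
# The deployment name, namespace, and log directory are placeholders; adapt
# them to how your publisher is deployed.
import shutil
import subprocess

def take_down_publisher(deployment: str, namespace: str) -> None:
    # 1. Stop the publisher so it cannot re-replicate old data after the reset.
    subprocess.run(
        ["kubectl", "scale", "deployment", deployment, "--replicas=0",
         "--namespace", namespace],
        check=True,
    )

def remove_client_logs(log_dir: str) -> None:
    # 2. Remove the publisher's local RT client log files (path is hypothetical).
    shutil.rmtree(log_dir, ignore_errors=True)

take_down_publisher("my-publisher", "my-namespace")
remove_client_logs("/path/to/publisher/rt/logs")
# 3. Finally, execute the hard reset via the supervisor API (see the next section).
```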

Data loss

Data loss can occur if the steps in the duplicate data section above are followed: any data that resides only on the publisher is lost when the publisher is taken down and its client log files are removed.

Separately, data loss can itself be the reason a hard reset is needed. For example, if an internal RT log becomes corrupted, a hard reset is required to return the system to a healthy state. In this scenario the hard reset is not the cause of the data loss, but the resolution to a corrupted log. This is similar to a corrupted tickerplant log: data can be recovered up to the point at which the log file was corrupted.

Triggering a Hard Reset

You can trigger a hard reset by calling an API on the RT supervisor process running in your Kubernetes cluster. This only needs to be performed on one of the nodes in the RT cluster; RT automatically orchestrates the hard reset across the other nodes.

Details on this API are available here.
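
As an illustration only, a hard reset trigger might look like the following; the host, port, and path are placeholders rather than the documented endpoint, which is described in the API details linked above:

```python
# Hypothetical sketch of triggering a hard reset via the RT supervisor API.
# The host, port, path, and HTTP method below are placeholders; consult the
# API documentation linked above for the actual endpoint.
import urllib.error
import urllib.request

RESET_URL = "http://rt-node-0.example:8080/reset"  # placeholder endpoint

req = urllib.request.Request(RESET_URL, method="POST")
try:
    with urllib.request.urlopen(req) as resp:
        print("reset accepted:", resp.status)
except urllib.error.HTTPError as err:
    # A 503 response means a reset is already in progress on this node (see Status below).
    print("reset rejected:", err.code)
```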

Status

The reset-status REST request can be called on any of the RT nodes in an RT cluster and provides the reset status of that node. This allows the user to identify whether the RT node is in the middle of a hard reset, or whether it is available for a hard reset.

Warning

RT can only handle one reset request at a time. If the RT pod is in the middle of executing a reset command, it responds to HTTP requests with a status of 503 (Service Unavailable).
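
As an example, a client might poll reset-status and handle the 503 response as follows; the host, port, and exact path are placeholders:

```python
# Hypothetical polling sketch for the reset-status request on a single RT node.
# The host, port, and exact path are placeholders; the 503 handling mirrors the
# behavior described in the warning above.
import time
import urllib.error
import urllib.request

STATUS_URL = "http://rt-node-0.example:8080/reset-status"  # placeholder endpoint

def wait_until_available(url: str, interval: float = 5.0, attempts: int = 60) -> None:
    """Poll the node until it is no longer mid-reset and can accept a new reset."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                print("node status:", resp.status, resp.read().decode())
                return
        except urllib.error.HTTPError as err:
            if err.code == 503:
                # A reset is still in progress on this node; try again shortly.
                time.sleep(interval)
                continue
            raise
    raise TimeoutError("node did not become available for a reset")

wait_until_available(STATUS_URL)
```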