RT External Clients (SDK)
An external publisher, or RT client, requires:
- The ability to do HTTP/REST requests.
- A route to your Insights Platform.
- TLS for all communication.
- The C or Java SDK.
The publisher and consumer of the messages (an Insights data pipeline and/or database) must agree on the following, as sketched below:
- Stream name and parameter usage.
- Message format.
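For illustration only, the sketch below captures such an agreement as a small contract class: a shared stream name plus a fixed message shape. The stream name, table name, and column layout are hypothetical and must match whatever is actually configured in the Insights Platform and the pipeline.

```java
import java.util.List;
import java.util.Map;

// Hypothetical example of the contract a publisher and an Insights pipeline
// might agree on: a stream name plus a fixed message shape (table + columns).
// None of these names come from the SDK itself; they are illustrative only.
public final class TradeStreamContract {
    // Must match the KX Insights Stream configured in the Insights Platform.
    public static final String STREAM_NAME = "data";

    // Agreed message format: the target table and its column values,
    // e.g. {"time": [...], "sym": [...], "price": [...]}.
    public static final String TABLE_NAME = "trade";

    public static Object buildMessage(List<?> times, List<String> syms, List<Double> prices) {
        Map<String, Object> columns = Map.of(
            "time", times,
            "sym", syms,
            "price", prices
        );
        // The SDK's publish call would carry the table name and column data;
        // the pipeline on the other side must decode the same shape.
        return new Object[] { TABLE_NAME, columns };
    }
}
```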
Note
The C implementation of RT provides a distribution of components that the Java SDK leverages.
See Using the KX Insights Platform SDKs for a worked example of how to deploy a pipeline and publish data into the KX Insights Platform using the SDKs.
Client Registration
Before messages can flow into the Insights Platform, the client SDK must first be enrolled. The client asks the Information Service and Client Controller to enrol it, and must be given the following information as part of the enrollment:
- An access token.
- A unique name for the client.
- The name of the stream (or RT) cluster in the Insights Platform that can receive messages.
The Information Service then returns a URL that is used to access the Insights Platform.
Note
The stream names must match those defined in the Insights Platform for the RT cluster that will receive the messages (also known as a KX Insights Stream).
Full details on the client enrollment workflow can be found here.
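As a rough illustration of the enrollment step, the sketch below sends an HTTP request carrying the three pieces of information listed above and returns the URL from the response. The endpoint path, JSON field names, and response handling are assumptions, not the actual Client Controller API; consult the enrollment workflow documentation for the real interface.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical enrollment sketch. The endpoint path and JSON field names are
// illustrative only; the real API is described in the enrollment workflow docs.
public final class EnrollClient {
    public static String enroll(String clientControllerUrl,
                                String accessToken,
                                String clientName,
                                String streamName) throws Exception {
        String body = String.format(
            "{\"name\":\"%s\",\"topic\":\"%s\"}", clientName, streamName);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(clientControllerUrl + "/enrol"))   // illustrative path
            .header("Authorization", "Bearer " + accessToken)  // the access token
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // The response is expected to contain the URL used from then on
        // to reach the Information Service / Insights Platform.
        return response.body();
    }
}
```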
Requesting Dynamic Configuration
Once enrollment has taken place, the RT client (C or Java SDK) uses the URL to consult the Information Service, which provides (among other things) the list of the push_server replicators. The RT client makes this request every 5 seconds to ensure the details are up to date. On failure, an RT client continues to use its existing configuration. (The C SDK polls the configuration URL using libcurl; the Java SDK uses a dedicated configuration thread.) A minimal sketch of such a polling loop follows the list below.
The information the client receives from this request includes:
- name: The unique name for the client, set during enrollment.
- certificate details: PEM-encoded credentials (CA certificate, private key, certificate file).
- stream name: set during enrollment.
- host-port pairs: for the push_server replicators.
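The following sketch illustrates the polling behaviour described above: fetch the configuration every 5 seconds, and keep the last good configuration if a request fails. The RtConfig type and the fetchConfig call are hypothetical stand-ins for the SDK's internal configuration handling.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative configuration poller. RtConfig and fetchConfig() are
// hypothetical; they stand in for what the SDK does internally.
public final class ConfigPoller {
    // Fields mirror the information listed above: client name, PEM-encoded
    // certificate details, stream name and push_server host-port pairs.
    record RtConfig(String name, String caCert, String key, String cert,
                    String stream, List<String> pushServers) {}

    private volatile RtConfig current;   // last known-good configuration
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    public void start(String informationServiceUrl) {
        timer.scheduleAtFixedRate(() -> {
            try {
                current = fetchConfig(informationServiceUrl);  // hypothetical HTTP GET
            } catch (Exception e) {
                // On failure, keep using the existing configuration.
            }
        }, 0, 5, TimeUnit.SECONDS);
    }

    private RtConfig fetchConfig(String url) throws Exception {
        throw new UnsupportedOperationException("stand-in for the SDK's request");
    }
}
```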
Naming convention for topics
RT combines the topic name and host name to provide a unique directory name for the stream files as part of the replication process. It is therefore important to name topics carefully: when there are multiple (independent) publishers on a host, it is vital that each publisher uses a different topic name.
Only one topic publisher per host
There must only be one publisher running on a single host, at any one time, using a specific topic name.
Deduplication
If two hosts are publishing the same messages in the same sequence, RT can ensure only one copy of each message is sent to the consumers. This means the stream can tolerate the failure of one of the publishers.
This deduplication can be performed on a per-topic basis, before the messages are merged and sequenced.
To enable deduplication, a topic name should be suffixed with .dedup.
Stream directory naming
Two publishers that require deduplication must share the same topic name, but as they are on different hosts the naming convention is still honoured, since the stream directory names are defined as $hostname.$topic.
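To make the convention concrete, the helper below builds the stream directory name from the host and topic as described ($hostname.$topic), including the deduplicated case. It is illustrative only and not part of either SDK.

```java
// Illustrative only: shows how the $hostname.$topic convention keeps stream
// directories distinct per host, and how a deduplicated topic is suffixed.
public final class StreamDirNames {
    static String streamDir(String hostname, String topic) {
        return hostname + "." + topic;
    }

    public static void main(String[] args) {
        // Two independent publishers on the same host must use different topics:
        System.out.println(streamDir("hostA", "orders"));       // hostA.orders
        System.out.println(streamDir("hostA", "fills"));        // hostA.fills

        // Two publishers of identical messages on different hosts share a
        // deduplicated topic; the directories still differ because of the host:
        System.out.println(streamDir("hostA", "trades.dedup")); // hostA.trades.dedup
        System.out.println(streamDir("hostB", "trades.dedup")); // hostB.trades.dedup
    }
}
```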
Publisher logs
- The directory is preserved between runs, but is not shared by multiple log-writers.
- The stream files are rolled every 1GB. When the stream file is rolled and has been replicated to all RT nodes it becomes eligible for garbage collection. This means that a loss of networking won't result in data loss (assuming the network does come back and the publisher has enough disk space to queue during the blackout).
Garbage collection of publisher logs
Publisher log files are automatically garbage collected when both of these conditions are met:
- The publisher's log file has rolled, for example from log.0.0 to log.0.1.
- The rolled log, for example log.0.0, has been replicated to all the RT pods and merged.
At this point log.0.0 is garbage collected on each of the RT pods. The archived flag is then propagated back to the publisher so its local log.0.0 is also garbage collected.
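The two conditions can be summarised as a simple predicate. The sketch below is only a restatement of the rule above, with hypothetical flags standing in for RT's internal state.

```java
// Illustrative restatement of the garbage-collection rule for a publisher log
// file such as log.0.0. The fields are hypothetical stand-ins for RT state.
public final class PublisherLogGc {
    record LogSegment(String name, boolean rolled,
                      boolean replicatedToAllRtPods, boolean merged) {}

    // A segment may be garbage collected only once it has rolled (e.g. to
    // log.0.1) and the rolled file has been replicated to all RT pods and merged.
    static boolean eligibleForGc(LogSegment seg) {
        return seg.rolled() && seg.replicatedToAllRtPods() && seg.merged();
    }

    public static void main(String[] args) {
        LogSegment log00 = new LogSegment("log.0.0", true, true, true);
        System.out.println(eligibleForGc(log00));  // true: GC on the RT pods, then the
                                                   // archived flag propagates back so the
                                                   // publisher's local copy is removed too
    }
}
```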