Writers¶

This page explains how to set up writer operators, for kdb Insights Enterprise pipelines, using the Web Interface.

Writers are a specialized type of operators that allow users to push data from a streaming pipeline to different external data sources, using the stream processor.

See APIs for more details

Both q and Python interfaces can be used to build pipelines programmatically. See the q and Python APIs for details.

The pipeline builder uses a drag-and-drop interface to link together operations within a pipeline. For details on how to build a pipeline, see the building a pipeline guide.

Amazon S3¶

The Amazon S3 operator writes data to an object in an Amazon S3 bucket.

Amazon S3 writer properties

Note

See q and Python APIs for more details: Amazon S3

Required Parameters:

Name	Description	Default
Mode	Indicates if this writer should write to a single file path or if it should use a function to derive a function path from the data in the stream.	`Single path`
Destination	If `Single path` is selected, the destination is the path of the object to write to in Amazon S3. If `Dynamic path` is selected, destination is a function that accepts the current batch of data and returns a string of the object path in S3 to save data to
On Teardown	Indicates the desired behavior for any pending data when the pipeline is torn down. The pipeline can perform no action (`None`) and continue to buffer data until the pipeline starts again. This is recommended for most use cases. Alternatively, the pipeline can be set to `Abort` any partial uploads. Or the pipeline can be set to `Complete` any partial uploads.	None

Optional Parameters:

Name	Description	Default
Override File Completion Trigger	When selected, an `Is Complete` function is required to be provided. This allows the completion behavior for partial data to be customized	No
Is Complete	A function that accepts `metadata` and `data` and returns a boolean indicating if after processing this batch of data that all partial data should be completed.
Region	The AWS region of the bucket to authenticate against	`us-east-1`
Use Authentication	Enable Kubernetes secret authentication.	No
Kubernetes Secret	The name of a Kubernetes secret to authenticate with Azure Cloud Storage. Only available if `Use Authentication` is enabled.

Amazon S3 Authentication¶

To access private buckets or files, a Kubernetes secret needs to be created that contains valid AWS credentials. This secret needs to be created in the same namespace as the kdb Insights Enterprise install. The name of that secret is then used in the Kubernetes Secret field when configuring the reader.

To create a Kubernetes secret containing AWS credentials:

kubectl create secret generic --from-file=credentials=<path/to/.aws/credentials> <secret-name>

Where <path/to/.aws/credentials> is the path to an AWS credentials file, and <secret-name> is the name of the Kubernetes secret to create.

Note that this only needs to be done once, and each subsequent usage of the Amazon S3 reader can re-use the same Kubernetes secret.

Console¶

The Console operator writes to the console.

Console writer properties

Note

See q and Python APIs for more details: Console

Optional Parameters:

Name	Description	Default
Prefix	A customized message prefix for data sent to the console.	""
Timestamp	Indicates what timezone should be used for timestamps, either `Local Time`, `UTC` or `No Timestamp`.	`Local Time`
Split Lists	Indicates if each element of a list should be on its own line or if it can all be on a single line.	No
QLog	Logs written by the console output are in raw text format. Check this option to wrap all output in QLog format.	No

kdb Insights Database¶

Use this operator to write data to a kdb Insights database. The operator properties, shown below, are described in the table that follows.

This writer is used by the Import Wizard to configure the writer for pipelines created using the wizard.

There are 2 versions of the kdb Insights Database writer operator.

Version 1: This is the original and default version.
Version 2: Enhances the ingest experience when Write Direct to HDB is enabled.

To change the operator version, click the version drop-down to select either Version 1 or Version 2.

Changing versions overwrites operator configuration

Pipelines built using v1 of this operator retain this version unless you change it. If you change operator versions, the fields of the v2 screen are blank and the v2 fields have to be configured. The v1 current configuration is overwritten and the new configuration is applied when you click Apply. You are notified and asked to confirm this action.

Click the version tabs below to view the operator configuration options available for each version.

Version 2Version 1

Configuration options for the Version 2 kdb Insights Database node are described below.

Database writer Version 2 properties

Read about the Version 2 database writer here.

Multiple database writers

A single pipeline with multiple database writers using direct write is only supported in v2.

In v1, a pipeline cannot include multiple direct write database writers targeting different databases.

However, the following are supported:

multiple direct write database writers targeting the same database
multiple streaming writers

Configuration options for the Version 1 kdb Insights Database node are described below.

Database Version 1 writer properties

Name	Description	Default	V1	V2
Database	The name of the database to write data to.	None	✔	✔
Table	The name of the table to insert data into.	None	✔	✔
Write Direct to HDB	When enabled, data is directly written to the database. For large historical ingests, this option has significant memory savings. When this is checked, data is not available for query until the entire pipeline completes. When a pipeline with this setting enabled is run, a status indicator shows the progress of the data ingest in the banner section of the database screen. The status of the ingest can also be seen on the Batch Ingest tab under Diagnostics.	Unchecked	✔	✔
Deduplicate Stream	For deterministic streams, this option ensures that data that is duplicated during a failure event only arrives in the database once. Only uncheck this if your model can tolerate data that is duplicated but may not be exactly equivalent. This is only available if Write Direct to HDB is disabled. In v2 of this operator, this field is not available if `Write Direct to HDB` is enabled.	Enabled	✔	✔
Set Timeout Value	This controls whether to set a database connection timeout.	Disabled	✔	❌
Timeout Value	The database connection timeout value.	None	✔	❌
Direct Write Overwrite	This is only displayed when Write Direct to HDB is enabled. When checked, this overwrites content within each data partition with the new batch ingest table data. When this is unchecked, batch ingest data is merged with existing data.	Unchecked	✔	✔
Direct Write Mount Name	This is only displayed in the Advanced parameters when Write Direct to HDB is enabled. When using direct write, this name indicates the name of the mount to use writing to the database. This must match the database mount name used for the HDB tier. This value corresponds to `spec.elemeents.sm.tiers.<hdb tier>.mount` in the database assembly file.	`hdb`	✔	❌
Retry Attempts	The maximum number of database connection attempts.	60	✔	✔
Retry Wait Time	The wait time between retry attempts. Click on the time picker to select the hours, minutes, and seconds.	0D00:00:03.000000000	✔	✔

Required fields are marked with an asterisk *. Click ** + Database** to create a new database.

Note

See q and Python APIs for more details: kdb Insights Database

kdb Insights Stream¶

The kdb Insights Stream operator writes data using a kdb Insights Reliable Transport stream. There are two versions of the writer operator. Version 2 is recommended.

Version 1: The original and default version.
Version 2: Provides improved handling of recovery scenarios involving multiple pipelines. While Version 1 supports multiple writers, Version 2 improves recovery behavior and is more robust in these situations.

To change the operator version, click Select Version and select either Version 1 or Version 2.

Changing versions overwrites operator configuration

Pipelines built using Version 1 of this operator retain this version unless you change it. If you change operator versions, the fields of the Version 2 screen are blank and the Version 2 fields have to be configured. The Version 1 current configuration is overwritten and the new configuration is applied when you click Apply. You are notified and asked to confirm this action.

Note

See q and Python APIs for more details: kdb Insights Stream

Select either the Version 2 or Version 1 tab below to view the operator configuration options available for each version.

Version 2Version 1

Stream writer properties

To write data to a stream, you must provide an Assembly Publish Topic value. Configure this in the Settings tab under the Assembly Integration section. The format used should be <package name>-<stream name>

Deploy dialog assembly integration

Required Parameters

Name	Description	Default	V1	V2
Stream	The stream to write to, using the format `<package name>-<stream name>`. This field is only present when using v2.		❌	✔️

Optional Parameters:

Name	Description	Default	V1	V2
Topic	Topic messages will be pushed to.		✔️	✔️
Deduplicate Stream	For deterministic streams, this option ensures that data duplicated during a failure event is written to the database only once. Disable this option only if your data model can tolerate duplicate data that may not be exactly identical.	Yes	✔️	✔️

Note

Topic is called Table in Version 1.

Kafka¶

The Kafka operator publishes data on a Kafka topic.

Kafka writer properties

Note

See q and Python APIs for more details: Kafka

Required Parameters:

Name	Description	Default
Broker	Location of the Kafka broker as host:port. If multiple brokers are available, they can be entered as a comma separated list.
Topic	The name of the Kafka topic to subscribe to.

Optional Parameters:

Name	Description	Default
Use TLS	Enable TLS for encrypting the Kafka connection. When selected, certificate details must be provided with a Kubernetes TLS Secret	No
Kubernetes Secret	The name of a Kubernetes secret that is already available in the cluster and contains your TLS certificates. Only available if `Use TLS` is selected.
Certificate Password	TLS certificate password, if required. Only available if `Use TLS` is selected.
Use Schema Registry	Use the schema registry to automatically decode data in a Kafka stream for JSON and Protocol Buffer payloads. When enabled, will automatically deserialize data based on the latest schema.	No
Registry	Kafka Schema Registry URL. Only available if `Use Schema Registry` is selected.
Auto Register	Whether or not to generate and publish schemas automatically. When enabled, the pipeline will update the registry with a new schema if the shape of data changes. If false, the schema for this topic must already exist.	No

Advanced Parameters:

Name	Description	Default
Retry Attempts	The maximum number of retry attempts allowed.	`5`
Retry Wait Time	The amount of time to wait between retry attempts.	`2s`
Use Advanced Kafka Options	Set to true and click the "+" button to add one or more key-value pairs, where the keys are Kafka configuration properties defined by librdkafka.	No
Consumer Group ID	Sets the Consumer Group ID field.

Use Advanced Configuration

Allows more flexible options for security protocol configuration or changes to fetch intervals, as seen when subscribing to an Azure Event Hub Kafka connector.

See this guide for more details on setting up TLS Secrets

Subscriber¶

This node is used to publish data to websocket subscribers, such as kdb Insights Enterprise Views. It buffers data and publishes at a configurable frequency. Filters can be applied to focus on a specific subset of the data and optionally keys can be applied to see the last value by key. When subscribing to the stream, subscribers will receive an initial snapshot with the current state of the subscription and then receive periodic updates as the node publishes them.

When configuring the operator you must decide whether it is keyed or not.

Keyed: When configured with keyed columns, it stores and publishes the latest record for each unique key combination. For example, a subscriber wanting to see the latest stock price by symbol.
Unkeyed: When configured without any key columns, the node publishes every row it receives and store a configurable buffer size. This buffer is used for snapshots and stores a sliding window of the latest records received. For example, a subscriber to the last 100 trades for a symbol. Once the buffer size limit is reached, it drops older records to store new ones.

Subscriber writer properties

Note

See q and Python APIs for more details: Subscriber

Required Parameters:

Name	Description	Default
Table	Name of the subscription. Used to subscribe to data emitted from this operator.

Optional Parameters:

Name	Description	Default
Keyed Columns	A list of keyed columns in the input data, for example `sym`. Keyed columns must be of type symbol, integer, long or GUID
Publish Frequency	Interval at which the operator sends data to its subscribers (milliseconds)	`500`
Snapshot Cache Limit	Max size (number of rows) for an unkeyed snapshot cache. This is required when the node is configured as unkeyed. When you add keyed columns this field is not displayed.	`2000`

Process¶

The Process operator writes data to a kdb+ process.

Process writer properties

Note

See APIs for more details q and Python APIs: Process

Required Parameters:

Name	Description	Default
Mode	Chose for data to either be added to a table in the remote process or to call a function.	`Upsert to Function`
Handle	An IPC handle to a destination process (ex. `tcps://10.0.1.128:8080`).

Optional Parameters:

Name	Description	Default
Table	When using `Upsert to Function` mode, this is the table name to write data to in the remote process.
Function	When using `Call function` mode, this is the name of the function to invoke in the remote process.
Parameter	When using `Call function` mode, these are additional arguments to pass to the remote function call. The last parameter is always implicitly the data to be sent.
Asynchronous	Indicates if data should be sent to the remote process asynchronously. If unchecked, the pipeline will wait for the remote process to receive each batch of data before sending the next. Disabling this will add significant processing overhead but provide greater reliability.	Yes

Advanced Parameters:

Name	Description	Default
Retry Attempts	The maximum number of retry attempts allowed.	`5`
Retry Wait Time	The amount of time to wait between retry attempts.	`1s`
Queue Length	The maximum async message queue length before flush.
Queue Size	The maximum number of bytes to buffer before flushing.	`1000`

Variable¶

The Variable operator writes the latest data received through the pipeline to a variable.

Variable writer properties

Note

See q and Python APIs for more details: Variable

Required Parameters:

Name	Description	Default
variable	The name of the variable to be written to.

Optional Parameters:

Name	Description	Default
Mode	The writing behaviour. Append will add the latest data to the variable. Overwrite will set the variable to be the last output of the pipeline. Upsert performs an upsert of table data to the variable.	""

Writers¶

Amazon S3¶

Amazon S3 Authentication¶

Console¶

kdb Insights Database¶

kdb Insights Stream¶

Kafka¶

Subscriber¶

Process¶

Variable¶

Further reading¶