Skip to content

Decoders

Decode an external data format into the Stream Processor

Decoding allows data to be converted into a format that can be processed directly within the Stream Processor. Decoders need to be used when ingesting data from an external data format before performing other transformations.

See APIs for more details

A q interface can be used to build pipelines programatically. See the q API for API details.

A Python interface is included along side the q interface and can be used if PyKX is enabled. See the Python API for API details.

The pipeline builder uses a drag-and-drop interface to link together operations within a pipeline. For details on how to wire together a transformation, see the building a pipeline guide.

Arrow

(Beta Feature) Decodes Arrow encoded data

Beta Features

Beta feature are included for early feedback and for specific use cases. They are intended to work but have not been marked ready for production use. To learn more and enable beta features, see enabling beta features.

Arrow decoder node properties

See APIs for more details

q API: .qsp.decode.arrow Python API: kxi.sp.decode.arrow

Required Parameters:

name description default
As List If checked, the decoded result is a list of arrays, corresponding only to the Arrow stream data. Otherwise, by default the decoded result is a table corresponding to both the schema and data in the Arrow stream. No

CSV

Parse CSV data to a table

CSV decoder node properties

See APIs for more details

q API: .qsp.decode.csv •  Python API: kxi.sp.decode.csv

Required Parameters:

name description default
Delimiter Field separator for the records in the encoded data ,

Optional Parameters:

name description default
Header Defines whether source CSV file has a header row, either Always, First and Never. When set to Always, the first record of every batch of data is treated as being a header. This is useful when decoding a stream (ex. Kafka) and each message has a header. First indicates that only the first batch of data will have a CSV header. This should be used when processing files that have a header row. Lastly, None indicates that there is no header row in the data. First

GZIP

(Beta Feature) Inflates (decompresses) gzipped data

Beta Features

Beta feature are included for early feedback and for specific use cases. They are intended to work but have not been marked ready for production use. To learn more and enable beta features, see enabling beta features.

GZIP decoder node properties

See APIs for more details

q API: .qsp.decode.gzip •  Python API: kxi.sp.decode.gzip

Fault Tolerance

GZIP decoding is currently not fault tolerant which is why it is marked as a beta feature. In the event of failure, the incoming data must be entirely reprocessed from the beginnging. The GZIP decoder is only fault tolerant when you are streaming data with independently encoded messages.

GZIP requires no additional configuration. On each batch of data, this operator decodes as much data as it can passing it down the pipeline and buffering any data that cannot be decoded until the next batch arrives.

JSON

Parse JSON data

JSON decoder node properties

See APIs for more details

q API: .qsp.decode.json Python API: kxi.sp.decode.json

Required Parameters:

name description default
Decode Each By default messages passed to the decoder are treated as a single JSON object. Setting decodeEach to true indicates that parsing should be done on each value of a message. This is useful when decoding data that has objects separated by newlines. This allows the pipeline to process partial sets of the JSON file without requiring the entire block to be in memory. No

Protocol Buffers

Decodes Protocol Buffer encoded data.

Configure protocol buffers node properties.

See APIs for more details

q API: .qsp.decode.protobuf Python API: kxi.sp.decode.protobuf

Required Parameters:

name description default
Message Name The name of the Protocol Buffer message type to decode None
Message Definition A .proto definition containing the expected schema of the data to decode. This definition must include a definition of the Message Name referenced above None

Optional Parameters:

name description default
As List If checked, the decoded result is a list of arrays, corresponding only to the Protocol Buffer stream data. Otherwise, by default the decoded result is a table corresponding to both the schema and data in the stream. No

Definition Example:

In this example, the operator is configured to read the "Person" message and decode data with the defined fields. Because As List is unchecked, the resulting data will be a table with the columns name, id and email.

Code Snippet
message Person {
    string name = 1;
    int32 id = 2;
    string email = 3;
    }

Protocol Buffers example configuration