Decoders
Decode an external data format into the Stream Processor
Decoding converts data from an external format into one that can be processed directly within the Stream Processor. A decoder must be applied when ingesting data in an external format, before other transformations are performed.
See APIs for more details
A q interface can be used to build pipelines programmatically. See the q API for API details.
A Python interface is included alongside the q interface and can be used if PyKX is enabled. See the Python API for API details.
The pipeline builder uses a drag-and-drop interface to link together operations within a pipeline. For details on how to wire together a transformation, see the building a pipeline guide.
Arrow
(Beta Feature) Decodes Arrow encoded data
Beta Features
Beta features are included for early feedback and for specific use cases. They are intended to work but have not been marked ready for production use. To learn more and enable beta features, see enabling beta features.
See APIs for more details
- q API: `.qsp.decode.arrow`
- Python API: `kxi.sp.decode.arrow`
Optional Parameters:
name | description | default |
---|---|---|
As List | If checked, the decoded result is a list of arrays, corresponding only to the Arrow stream data. Otherwise, by default the decoded result is a table corresponding to both the schema and data in the Arrow stream. | No |
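A minimal sketch of wiring this decoder into a q pipeline, assuming a callback reader and console writer and that beta features are enabled:

```q
// Decode Arrow stream data arriving via a callback and print the result.
// With As List unchecked (the default), the output is a table built from
// both the schema and the data in the Arrow stream.
.qsp.run
  .qsp.read.fromCallback[`publish]
  .qsp.decode.arrow[]
  .qsp.write.toConsole[]
```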
CSV
Parse CSV data to a table
See APIs for more details
- q API: `.qsp.decode.csv`
- Python API: `kxi.sp.decode.csv`
Required Parameters:
name | description | default |
---|---|---|
Delimiter | Field separator for the records in the encoded data | , |
Optional Parameters:
name | description | default |
---|---|---|
Header | Defines whether the source CSV data has a header row: one of `Always`, `First`, or `Never`. When set to `Always`, the first record of every batch of data is treated as a header; this is useful when decoding a stream (for example, Kafka) where each message has a header. `First` indicates that only the first batch of data will have a CSV header; this should be used when processing files that have a header row. `Never` indicates that there is no header row in the data. | First |
Schema | A table with the desired output schema. | |
Columns to Exclude | A list of columns to exclude from the output. | |
Encoding Format | How the data is expected to be encoded when being consumed. Currently supports `UTF8` and `ASCII`. | UTF8 |
Newlines | Indicates whether newlines may be embedded in strings. Can impact performance when enabled. | 0b |
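A minimal sketch of a CSV pipeline in q, assuming a callback reader and console writer; the empty schema table supplies the column names and types used to parse each field:

```q
// Parse comma-delimited records into a typed table, relying on the
// default "," delimiter and the default First header behavior.
schema: flip `time`sym`price!(`timestamp$(); `$(); `float$());
.qsp.run
  .qsp.read.fromCallback[`publish]
  .qsp.decode.csv[schema]
  .qsp.write.toConsole[]
// For example: publish "time,sym,price\n2021.01.01D00:00:00,ABC,10.1"
```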
Expected type formats
The `parse` option allows string representations to be converted to typed values. For numeric values to be parsed correctly, they must be provided in the expected format; string values in unexpected formats may be processed incorrectly. The sketch after the following list illustrates the expected forms.
- Strings representing bytes are expected as exactly two base-16 digits, e.g. `"ff"`
- Strings representing integers are expected to be decimal, e.g. `"255"`
- Strings representing boolean values have a number of supported options, e.g. `"t"`, `"1"`
- See the kdb+ documentation for more information on the available formats.
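For illustration, the q casts below accept the formats described above; this is a sketch of the expected string forms, not the decoder's internal implementation:

```q
"X"$"ff"     / 0xff: bytes as exactly two base-16 digits
"I"$"255"    / 255i: integers in decimal
"B"$"t"      / 1b: booleans accept several spellings
"B"$"1"      / 1b
```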
GZIP
(Beta Feature) Inflates (decompresses) gzipped data
Beta Features
Beta features are included for early feedback and for specific use cases. They are intended to work but have not been marked ready for production use. To learn more and enable beta features, see enabling beta features.
See APIs for more details
- q API: `.qsp.decode.gzip`
- Python API: `kxi.sp.decode.gzip`
Fault Tolerance
GZIP decoding is currently not fault tolerant, which is why it is marked as a beta feature: in the event of a failure, the incoming data must be entirely reprocessed from the beginning. The decoder is fault tolerant only when streaming data with independently encoded messages.
GZIP requires no additional configuration. On each batch of data, this operator decodes as much as it can, passing the result down the pipeline and buffering any data that cannot yet be decoded until the next batch arrives.
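A minimal sketch of inflating gzipped JSON in q, assuming a callback reader and console writer and that beta features are enabled:

```q
// Inflate gzipped bytes, then decode the resulting JSON. Bytes that
// cannot yet be inflated are buffered until the next batch arrives.
.qsp.run
  .qsp.read.fromCallback[`publish]
  .qsp.decode.gzip[]
  .qsp.decode.json[]
  .qsp.write.toConsole[]
```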
JSON
Parse JSON data
See APIs for more details
- q API: `.qsp.decode.json`
- Python API: `kxi.sp.decode.json`
Optional Parameters:
name | description | default |
---|---|---|
Decode Each | By default, messages passed to the decoder are treated as a single JSON object. Setting decodeEach to true indicates that parsing should be done on each value of a message. This is useful when decoding data that has objects separated by newlines, as it allows the pipeline to process partial sets of the JSON file without requiring the entire block to be in memory. | No |
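A minimal sketch of decoding newline-delimited JSON in q; the callback reader and console writer are assumed for illustration, and the option is passed with the q API's `.qsp.use` convention:

```q
// Parse each value of an incoming message as a separate JSON object,
// e.g. one object per line, rather than one document per message.
.qsp.run
  .qsp.read.fromCallback[`publish]
  .qsp.decode.json[.qsp.use enlist[`decodeEach]!enlist 1b]
  .qsp.write.toConsole[]
```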
Pcap
Decode pcap data
See APIs for more details
- q API: `.qsp.decode.pcap`
- Python API: `kxi.sp.decode.pcap`
Optional Parameters:
name | description | default |
---|---|---|
Columns to Include | The columns to include. If none of the options are selected, the output will include every available column. | |
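A minimal sketch of decoding a capture file in q, assuming the file reader and no column filter, so every available column is returned:

```q
// Read a pcap file from disk and decode its packets into table rows.
.qsp.run
  .qsp.read.fromFile["capture.pcap"]
  .qsp.decode.pcap[]
  .qsp.write.toConsole[]
```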
Protocol Buffers
Decodes Protocol Buffer encoded data.
See APIs for more details
- q API: `.qsp.decode.protobuf`
- Python API: `kxi.sp.decode.protobuf`
Required Parameters:
name | description | default |
---|---|---|
Message Name | The name of the Protocol Buffer message type to decode | |
Message Definition | A .proto definition containing the expected schema of the data to decode. This definition must include a definition of the Message Name referenced above. | |
Optional Parameters:
name | description | default |
---|---|---|
As List | If checked, the decoded result is a list of arrays, corresponding only to the Protocol Buffer stream data. Otherwise, by default the decoded result is a table corresponding to both the schema and data in the stream. | No |
Definition Example:
In this example, the operator is configured to read the "Person" message and decode data with the defined fields. Because As List is unchecked, the resulting data will be a table with the columns `name`, `id`, and `email`.
Code Snippet
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
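A minimal q sketch of applying this decoder. The two-argument form below, taking the message name and a path to the saved definition, is an assumption for illustration; check the q API reference for the exact signature:

```q
// Decode Person messages using the definition above, assumed here to
// be saved as person.proto. With As List unchecked, the result is a
// table with columns name, id, and email.
.qsp.run
  .qsp.read.fromCallback[`publish]
  .qsp.decode.protobuf[`Person; "person.proto"]
  .qsp.write.toConsole[]
```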