Get Data - Kafka

This section walks through how to use kdb Insights Enterprise to stream data from Kafka and monitor NYC subway train punctuality for travel planning.

Apache Kafka is an event streaming platform that integrates with kdb Insights Enterprise: pipelines connected to Kafka data sources process its events in real time.

We have provided a Kafka subway feed for this walkthrough. It generates live alerts for NYC subway trains, tracking arrival time, station location coordinates, direction and route details.

No kdb+ knowledge required

No prior experience with q/kdb+ is required to build this pipeline.

Import Data

The import process creates a Pipeline, which consists of a set of nodes. These nodes read data from a source, transform it to a kdb+-compatible format, and write it to a kdb Insights Enterprise database.

The pipeline requires the following nodes:

Node	Description
Reader	Stores details of the data to import, including any required authentication.
Decoder	Decodes Kafka event data, which is in JSON, into a kdb+-friendly format (a kdb+ dictionary).
Map	Uses enlist to convert the decoded dictionary to a kdb+ table before it is passed to the Transform node.
Transform	Applies a schema that converts the data to types compatible with a kdb Insights Database. Every imported data table requires a schema, and every data table must have a timestamp key to be compatible with kdb's time-series columnar database. The insights-demo database has a predefined schema for subway data.
Writer	Writes the transformed data to the kdb Insights Enterprise database.
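If you have a q session available (optional; it is not needed for this walkthrough), you can see the kind of conversion the Decoder node performs by running the q built-in `.j.k`, which parses a JSON string into a q dictionary. This is a minimal sketch: the message and its field names (`trip_id`, `route_short_name`, `stop_lat`, `arrival_time`) are hypothetical and not taken from the subway feed, and whether the Decoder node uses `.j.k` internally is an assumption.

```q
/ Hypothetical Kafka message value; the field names are illustrative only
msg:"{\"trip_id\":\"A20230518\",\"route_short_name\":\"A\",\"stop_lat\":40.7128,\"arrival_time\":\"2023.05.18D10:15:00\"}"

/ Parse the JSON object into a q dictionary keyed by symbols
d:.j.k msg

type d           / 99h: a dictionary
key d            / `trip_id`route_short_name`stop_lat`arrival_time
d`stop_lat       / 40.7128: JSON numbers become floats
d`arrival_time   / "2023.05.18D10:15:00": JSON strings stay character vectors
```

String fields such as `arrival_time` are still text at this point; converting them to kdb+ types is the job of the schema in the Transform node.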

To set up a pipeline containing the nodes in the table above:

Before you import data, ensure that the insights-demo database has been created, as described here.

On the Overview page, click 2. Import.
In the Import your data screen, select the Kafka reader.
In the Configure Kafka screen:
- Enter values for:

  Setting	Value
  Broker	kafka.trykdb.kx.com:443
  Topic	subway

  The default values can be accepted for the following:

  Setting	Value
  Offset	End
  Use TLS	Unchecked
  Use Schema Registry	Unchecked
- Open the Advanced drop-down and check Advanced Broker Options.
- Click + under Add an Advanced Configuration and enter the following key-value pairs:

  Key	Value
  sasl.username	demo
  sasl.password	demo
  sasl.mechanism	SCRAM-SHA-512
  security.protocol	SASL_SSL
- Click Next.
In the Select a decoder screen, click JSON.
In the Configure JSON screen, leave Decode each unchecked and click Next.
In the Configure Schema screen:
- Keep Data Format set to Any.
- Click the Load Schema icon beside Parse Strings.
- Select insights-demo as the database. This is the database you created here.
- Select subway as the table.
- Click Load.
- Keep Parse Strings set to Auto for all fields.
- Click Next.
In the Configure Writer screen:
- Select insights-demo as the database. This is the database you created here.
- Select subway as the table.
- Keep the default values for the remaining fields:

  Setting	Value
  Write Direct to HDB	Unchecked
  Deduplicate Stream	Checked
  Set Timeout Value	Unchecked
- Click Open Pipeline to open a view of the pipeline.
In the pipeline template:
- Drag a Map node from the list of Functions into the workspace.
- Remove the connection between the Decoder and Transform nodes by right-clicking the link and selecting Remove Edge.
- Connect the Decoder node to the Map node, and the Map node to the Transform node.
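- Click the Map node to edit its properties and enter the following function, which uses enlist to convert the decoded data to a table:
```
{[data]
    / data is the dictionary produced by the Decoder node;
    / enlist converts it to a one-row table for the Transform node
    enlist data
    }
```
- Click Apply to apply the changes to the node.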
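If you want to check what the Map node function does, you can run it in a q session (optional). This sketch assumes a decoded message shaped as a dictionary with hypothetical fields (`trip_id`, `route_short_name`, `arrival_time`); applying the function turns it into a one-row table. The last step approximates what Parse Strings might do for a timestamp field using the q cast `"P"$`; how the Transform node actually parses strings is an assumption here.

```q
/ A decoded message: a dictionary with symbol keys (hypothetical fields)
d:`trip_id`route_short_name`arrival_time!("A20230518";enlist "A";"2023.05.18D10:15:00")

/ The Map node function: enlist turns the dictionary into a one-row table
f:{[data] enlist data}
t:f d

type t    / 98h: a table
count t   / 1
cols t    / `trip_id`route_short_name`arrival_time

/ Hypothetical stand-in for Parse Strings: cast the string column to timestamps
t2:update "P"$arrival_time from t
type t2`arrival_time   / 12h: a timestamp column
```

This is why the Decoder output is routed through the Map node: the schema in the Transform node is applied to a table rather than to a single dictionary.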

Save the Pipeline

Once you have configured and reviewed your Pipeline, you must save it.

Enter a name in the top left of the workspace. The name must be unique among your pipelines; for example, subway-1.
Click Save.
The subway-1 pipeline is available under Pipelines in the left-hand menu.

A Kafka pipeline built using the import wizard.

Deploy the Pipeline

You can now deploy the Pipeline. Deploying it reads the data from its source, transforms it to a kdb+-compatible format, and writes it to the insights-demo database.

Click Save & Deploy in the top panel.
Check the progress of the pipeline in the Running Pipelines panel of the Overview tab. Deployment may take several minutes; the data is ready to query when Status is set to Running.

Pipeline warnings

Once the pipeline is running, some warnings may be displayed in the Running Pipelines panel of the Overview tab. These are expected and can be ignored.

You are now ready to query the subway data.

Next Steps

Now that your Pipeline is up and running you can:
