Get Data - Kafka
The purpose of this walkthrough is to guide you through the steps to ingest data from Apache Kafka into kdb Insights Enterprise.
We have provided a Kafka subway feed for use in this walkthrough. It generates live alerts for NYC Subway trains, tracking arrival time, station location coordinates, direction, and route details.
No kdb+ knowledge required
No prior experience with q/kdb+ is required to build this pipeline.
Before you import data, you must ensure the insights-demo database has been created, as described here.
Import Data
The import process creates a Pipeline, which is a collection of nodes that read data from a source, transform it to a kdb+-compatible format, and write it to a kdb Insights Enterprise database.
Open the import wizard by selecting 2. Import from the Overview page. Next, you are prompted to select a reader node.
Select a Reader
A reader stores details of data to import, including any required authentication.
- Select the Kafka reader and complete the connection details as follows:

  | setting | value |
  | --- | --- |
  | Broker | kafka.trykdb.kx.com:443 |
  | Topic | subway |
  | Offset | End |
  | Use TLS | No |
  | Use Schema Registry | No |

- Expand the Advanced parameters section and add the following key-value pairs into the Use Advanced Kafka Options section:

  | key | value |
  | --- | --- |
  | sasl.username | demo |
  | sasl.password | demo |
  | sasl.mechanism | SCRAM-SHA-512 |
  | security.protocol | SASL_SSL |

  The following screen shows the Configure Kafka screen with the above values set.

- Click Next.
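The advanced options above are standard librdkafka configuration properties that the reader passes through to the Kafka client. As a point of reference only, here is a minimal sketch of the same connection made outside of Insights with the open-source KxSystems kafka (kfk) library; the group.id value is an assumption added for illustration, and this is not part of the walkthrough itself:

```q
\l kfk.q                                             / open-source KxSystems kafka library

/ librdkafka-style properties mirroring the reader configuration above;
/ group.id is hypothetical here (librdkafka consumers require one)
cfg:`metadata.broker.list`group.id`sasl.username`sasl.password`sasl.mechanism`security.protocol!
    `$("kafka.trykdb.kx.com:443";"demo-group";"demo";"demo";"SCRAM-SHA-512";"SASL_SSL")

client:.kfk.Consumer cfg                             / create a consumer client
.kfk.Sub[client;`subway;enlist .kfk.PARTITION_UA]    / subscribe to the subway topic
```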
Select a Decoder
Kafka event data is in JSON and must be decoded to a kdb+-friendly format (a kdb+ dictionary).
- Select a JSON decoder.
- Keep the default JSON decoder settings.
- Click Next.
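To make the decoding step concrete, here is what JSON-to-dictionary decoding looks like in q using the built-in .j.k parser. The message body below is invented for illustration; the real subway feed fields may differ:

```q
q)msg:"{\"route\":\"A\",\"direction\":\"N\",\"arrival_time\":\"2023-01-05T10:15:00\"}"
q).j.k msg        / .j.k parses a JSON string into a kdb+ dictionary
route       | "A"
direction   | "N"
arrival_time| "2023-01-05T10:15:00"
```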
Define the Schema
Next, you need to define a schema that converts data to a type compatible with a kdb Insights Database. Every imported data table requires a schema, and every data table must have a timestamp key to be compatible with kdb's time-series columnar database. insights-demo has a predefined schema for subway data.
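In q terms, a schema is essentially an empty typed table. The following is a hedged sketch of what such a schema might resemble, with column names invented from the feed description above; the actual subway schema is the predefined one shipped with insights-demo:

```q
/ illustrative only: an empty typed table whose first column is a timestamp,
/ satisfying the time-series requirement described above
subway:([] arrival_time:`timestamp$();   / event timestamp
    route:`symbol$();                    / route details
    direction:`symbol$();                / direction of travel
    latitude:`float$();                  / station location coordinates
    longitude:`float$())
```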
- Complete the Apply Schema properties as follows:

  | setting | value |
  | --- | --- |
  | Apply a Schema | Checked |
  | Data Format | Any |

- Click the Load Schema icon and select insights-demo from the database dropdown and subway from the table dropdown, as shown below.
- Click Load and then Next to open the Configure Writer screen.
Configure Writer
You now need to configure the Writer, which writes the transformed data to the kdb Insights Enterprise database.
- Specify the following settings:

  | setting | value |
  | --- | --- |
  | Database | insights-demo |
  | Table | subway |
  | Write Direct to HDB | No |
  | Deduplicate Stream | Yes |
  | Set Timeout Value | No |

- Click Open Pipeline to review the pipeline in the pipeline viewer.
Add Map node to enlist data
The Kafka pipeline requires an additional piece of functionality to convert the decoded data to a kdb+ table prior to deployment. This is done by enlisting the data, which is set up using a Map node, as described in the following steps.
- In the pipeline template view, click-and-drag a Map node from the list of Functions into the workspace.
- Connect the Map node between the Decoder and Transform nodes. Remove the existing connection between the Decoder and Transform nodes by right-clicking the link and selecting Remove Edge, as shown in the following animation.
- Click on the Map node to edit its properties and set it to enlist the data, as shown below:

  {[data] enlist data}

- Click Apply to apply these changes to the node.
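The reason for the enlist: the decoder emits each event as a kdb+ dictionary, and in q, enlist applied to a dictionary yields a one-row table, which is the shape the downstream nodes expect. A quick illustration with invented fields:

```q
q)d:`route`passengers!(`A;42)     / one decoded event as a q dictionary
q)enlist d                        / enlist turns the dictionary into a one-row table
route passengers
----------------
A     42
```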
Save the Pipeline
Once you have configured and reviewed your pipeline, you must save it.
- Enter a name in the top left of the workspace. The name must be unique to the pipeline; for example, subway-1.
- Click Save.
- The subway-1 pipeline is now available under Pipelines in the left-hand menu.
Deploy the Pipeline
You can now deploy the pipeline. This reads the data from its source, transforms it to a kdb+-compatible format, and writes it to the insights-demo database.
- Click Save & Deploy in the top panel.
- Check the progress of the pipeline under the Running Pipelines panel of the Overview tab; deployment may take several minutes. The data is ready to query when Status=Running.
Pipeline warnings
Once the pipeline is running, some warnings may be displayed in the Running Pipelines panel of the Overview tab; these are expected and can be ignored.
Next Steps
Now that your pipeline is up and running, you can:
- Add data from Postgres.
- Query the data (see the example below).
- Build a visualization from the data.
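For example, once the pipeline is running, a minimal q query against the subway table might look like the following; this is standard qSQL, assuming the table name used in this walkthrough:

```q
select count i from subway        / how many records have been ingested so far
select[-10] from subway           / the ten most recently ingested records
```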
Further Reading
To learn more about specific topics mentioned on this page, see the following links: