Get Data - Kafka (Subway)
Apache Kafka is an event streaming platform supported by kdb Insights Enterprise. The
subway Kafka feed generates live alerts for NYC Subway trains - tracking arrival time, station location coordinates, direction and route details.
No kdb+ knowledge required
This example assumes no prior experience with q/kdb+; you can replace the endpoints provided with other Kafka brokers to achieve similar results.
1. Create and deploy a database
To use a pipeline, your database must be deployed and active.
2. Import Data
Open the import wizard by clicking [+] in the ribbon menu, from the left-hand icon menu, or from the Overview page, then click Start Import. The import process creates a pipeline: a connection of nodes that reads data from a source, transforms it to a kdb+ compatible format, and writes it to a kdb Insights Enterprise database.
3. Select a Reader
A reader stores details of the data to import, including any required authentication. Select the Kafka reader and complete the connection details; required properties are marked with an asterisk (*).
Click Next when done.
4. Select a Decoder
Kafka event data is in JSON and has to be decoded to a kdb+ friendly format (a kdb+ dictionary).
Select a JSON decoder.
Keep the default JSON decoder settings:
Decode Each is left unchecked.
Click Next when done.
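As a rough illustration of what the decoder does, q's built-in JSON parser turns an event into a kdb+ dictionary. The field names and values below are made up for the example; they are not the exact subway payload.

```q
/ illustrative only: parse a JSON event string into a kdb+ dictionary with .j.k
msg:"{\"trip_id\":\"L0715\",\"route_id\":\"1\",\"arrival_time\":\"2023-01-01T12:34:56\"}"
.j.k msg     / dictionary with symbol keys trip_id, route_id, arrival_time and string values
```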
5. Define the Schema
The schema converts data to types compatible with a kdb+ database. Every imported data table requires a schema, and every data table must have a timestamp column to be compatible with kdb+'s time-series columnar database.
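To make the timestamp requirement concrete, here is a minimal q sketch of an empty table with a timestamp column; the column names are illustrative, not the exact subway schema.

```q
/ illustrative only: every table written to the database needs a timestamp
/ column so the time-series database can partition and sort on it
subway:([] arrival_time:`timestamp$(); route_id:`symbol$(); stop_id:`symbol$())
meta subway    / column types: p (timestamp), s, s
```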
insights-demo has a predefined schema for subway:
Apply a Schema
Select the insights-demo schema from the dropdown, then the subway table from that schema. Parse Strings is set to auto for all fields.
Parse Strings determines whether input string data is parsed to other datatypes. Generally, Parse Strings is enabled for all string fields unless your input is IPC or RT; retain the Auto default if unsure.
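In q terms, parsing strings amounts to casting the decoded string values into the schema's types; the values below are made up purely to illustrate.

```q
/ illustrative casts from string to typed values, as the schema is applied
"P"$"2023-01-01D12:34:56.000000000"   / string -> timestamp
"F"$"40.7527"                          / string -> float, e.g. a latitude
"J"$"2"                                / string -> long
```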
Manual entry subway schema
If not applying a schema with a table from a database, add the columns of the subway table manually instead; column descriptions are optional and not required here.
Click Next when done.
6. Configure Writer
Write transformed data to the kdb Insights Enterprise database.
Complete the writer settings, selecting the insights-demo database and the subway table to write to; Write Direct to HDB can be left at its default.
Open the pipeline in the pipeline viewer to review it.
A pipeline created by the import wizard reads data from its source, transforms it to a kdb+ compatible format, and writes it to a kdb Insights Enterprise database. The Writer - KX Insights Database node is essential for exploring data in a pipeline; it defines the database to write to, which must be active to receive data. Stream data uses its own writer node, Writer - KX Insights Stream.
7. Review Pipeline
A Kafka pipeline built using the import wizard.
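For readers comfortable with q, the same flow could be expressed, roughly, as a Stream Processor pipeline spec. This is only a sketch: the broker address, schema columns, and option values are placeholders, exact function arguments may differ by version, and the Map step corresponds to the node added in step 8 below.

```q
/ illustrative only: an empty table standing in for the real subway schema
schema:([] arrival_time:`timestamp$(); route_id:`symbol$(); stop_id:`symbol$())

/ rough equivalent of the wizard-built pipeline, written against the SP q API
.qsp.run
    .qsp.read.fromKafka[`subway; "localhost:9092"]     / read events from the subway topic (placeholder broker)
    .qsp.decode.json[]                                  / JSON -> kdb+ dictionary
    .qsp.map[{[data] enlist data}]                      / dictionary -> one-row table (the Map node from step 8)
    .qsp.transform.schema[schema]                       / apply column names and types
    .qsp.write.toDatabase[`subway; `$"insights-demo"]   / write to the database
```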
8. Modify Pipeline
The Kafka pipeline requires an additional piece of functionality to convert the decoded data to a kdb+ table prior to deployment. This is done by enlisting the data, as sketched after the steps below.
- In the pipeline template view, click and drag a Map node from the list of Functions into the workspace.
- Connect the Map node between the Decoder and Transform nodes. Remove the existing connection between the Decoder and Transform nodes by right-clicking the link and selecting the remove option.
Adding a Function Map node to a Kafka data pipeline. Connect edges with click-and-drag; right-click a connection to remove it.
- Edit the Map node properties, entering a function along the lines sketched below.
Select the Function Map node to edit its properties.
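A minimal sketch of the Map node function: it wraps the decoded dictionary in a list, and enlisting a dictionary yields a single-row table. Your pipeline's exact code may differ slightly.

```q
// convert the decoded kdb+ dictionary to a one-row table for downstream nodes
{[data]
    enlist data
    }
```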
9. Save the Pipeline
Save and name the pipeline. The name should be unique to the pipeline; for example, subway-1.
Save the pipeline as subway-1.
The saved subway-1 pipeline is available in the left-hand pipeline menu.
10. Deploy the Pipeline
Deploy the pipeline to access its data.
Deploy the pipeline.
Check the progress of the pipeline under Running Pipelines on the Overview panel. The data is ready to query once the pipeline status shows it is running.
The subway pipeline available for querying.
Database Deployment: If not already active, ensure insights-demo, or the database created with the subway schema table, is deployed from Databases in the left-hand menu so it can receive data from the pipeline.
11. Query the Data
Deploy the insights-demo database and the subway-1 pipeline if they are not already active or running.
Query the data after a successful deployment to the database.
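Once data has landed, a simple q query in the Query window might look like the following. The table name subway and the route_id column are assumptions based on the schema above; adjust them to your actual column names.

```q
/ illustrative only: count alerts per route in the subway table
select count i by route_id from subway
```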
12. Visualize the Data
Build a visualization from the data.