Streaming subway data to a Data Grid in Views

Beta - For evaluation and trial use only

Streaming to Views is currently in beta.

Refer here to the standard terms related to beta features.
We invite you to use this beta feature and to provide feedback using the portal available here.
Refer here for known limitations of this beta feature

The purpose of this walkthrough is to showcase the ability of kdb Insights Enterprise to visualize streaming data in real-time in a View, by streaming the latest co-ordinates of a set of NY City subway trains to a data grid.

We have provided a Kafka subway feed for use in this walkthrough. This feed generates live alerts for NYC subway trains, tracking which trains are stopped at stations. For each train it provides arrival time, station location coordinates, direction and route details.

In this walkthrough you will:

Create a pipeline to ingest and transform the subway data
Create a View with a Data Grid to display the data

Create a pipeline

Begin by opening the Pipeline editor by clicking [+] on the ribbon menu and clicking Pipeline, as shown below, or by clicking [+] beside Pipelines on the left-hand menu.

Click pipeline from the ribbon dropdown menu to open the pipeline editor.

The pipeline editor opens, as shown below.

pipeline editor.

The pipeline requires the following:

Kafka Node to ingest the data
Decoder Node to decode the data
Map Node to convert the decoded data to a kdb+ table
Transform Node to convert data fields to kdb+ compatible types
Subscriber Node to provide real-time updates of data
Environment Variable to ensure the subscriber node updates the stream every 10^th of a second

Once these are configured you can save and deploy the pipeline. You can then:

Create a View
Add filtering to the data grid
Add a map to your view to display the streaming data

Add a Kafka node

You begin your pipeline with a Kafka node to read the data from the subway feed we have provided.

Click-and-drag a Kafka node, from the Readers, into the workspace.
Select the Kafka node and add the following connection details to the Configure Kafka Node panel:

setting value

Broker kafka.trykdb.kx.com:443

Topic subway

Offset End

Use TLS Unchecked

Use Schema Registry Unchecked
Expand the Advanced parameters section and tick Use Advanced Kafka Options. Click the [+] to add an advanced configuration and enter the following key value pairs:

key value

sasl.username demo

sasl.password demo

sasl.mechanism SCRAM-SHA-512

security.protocol SASL_SSL
Click Apply to apply these changes to the Kafka node.

Add a Decoder node

Kafka event data is in JSON and has to be decoded to a kdb+ friendly format (a kdb+ dictionary).

Click-and-drag a JSON decoder node, from the Decoders, into the workspace, and connect it to the Reader node.
Keep the default JSON decoder settings.

Add a Map node

The decoded data needs to be converted to a kdb+ table. The Map node converts an incoming dictionary to a table using an enlist.

Click-and-drag a Map node, from the Functions, into the workspace, and connect it to the Decoder node.
Select the Map node and replace the code in the right-hand Configure Map Node property panel with the following code:
```
{[data]
    enlist data
    }
```
Click Apply to apply these changes to the Map node.

Add a Transform Node

The next step is to add a Transform node which transforms the subway data fields to kdb+ compatible types and drops the fields that are not going to be displayed in the View.

Converting strings to symbols for use as keyed columns in the Subscriber node

The data from Kafka that is used in this example includes strings. The Subscriber node requires any columns defined as keyed columns to be symbols. This step is required to convert the strings to symbols.

If you are using a Subscriber node with your own data and the data in your keyed columns are already symbols this step is not necessary. See the next section for details on keyed columns.

Click-and-drag the Apply Schema node from the list of Transform nodes and connect it to the Decoder node.
In the Schema section click the [+] icon under the "add schema columns below" text. Then add the following columns, leaving Parse Strings set to Auto for all columns:

column name colomn type

arrival_time Timestamp

stop_name Symbol

stop_lat Float

stop_lon Float

direction_id Symbol

route_long_name Symbol

route_color Symbol
Click Apply to apply changes to the node.

Add a Subscriber node

A Subscriber node emits a summarized view of the input data, with one record per unique combination of values in the list of keyed columns. The remaining columns are set to their latest values.

Once deployed, the emitted stream of data is available as a data source in a View. This is described in the Create a View section later.

In this example the node is configured so that it emits updates to the co-ordinates of trains per route and direction of travel (i.e. inbound or outbound).

Click-and-drag the Subscriber node, from the list of Writers into the central workspace, and connect it to the Apply Schema node.
Select the Subscriber node and set the Table to subway.
Add the following Keyed Column values:

value

route_long_name

direction_id

Defining these keyed columns ensures that:
- The output of the Subscriber node has one record per train route and direction.
- You can filter these columns in the View, as you can only filter on Keyed Columns.
Click Apply to apply these settings.

Configure the pipeline settings

An environment variable needs to be added to the pipeline to ensures the subscriber node updates the stream every 10^th of a second.

Click on the Settings tab for the pipeline.
Scroll down to the Environment Variables and click Add Variable. Add the following variable details.

variable value details

KXI_SP_SUBSCRIBER_PUBLISH_INTERVAL 100 This determines how regularly the subscriber updates the web-socket

Save and deploy the pipeline

The pipeline is now configured and ready to be saved and deployed.

Enter subway-streaming in the Name field in the top left of the workspace.
Click Save & Deploy in the top panel.
Check the progress of the pipeline under the Running Pipelines panel of the Overview tab. The data is ready to visualize when the Status is set to Running. Note it may take a few minutes for the pipeline to start running.

Create a View

Now that the data is being streamed to a pipeline containing a Subscriber node, you can create a View with a Data Grid to display the locations of the subway trains that are stopped at stations.

Select Visualize on the Overview page, or View from the ribbon menu, as shown below.
Click-and-drag a Data Grid, which is the first component on the list, from the icon menu into the central workspace.
In the Data Grid component, Click to populate Data Source to open the Data editor.
Click New to create a new data source and name it streaming.
Click on the Streaming radio button and set the following properties:

value	setting
pipeline	This is automatically populated with any running pipelines with a Subscriber node. Select `subway-streaming`.
table	This is automatically populated with tables associated with the pipeline selected. Choose `subway`.
filters	leave empty

Click Execute to check the results of the new query. The results are displayed in the Results panel at the bottom of the screen, as shown below. Verify that the streaming position values, with updates visible as data is streamed, are displayed.

Streaming properties

Click Apply and then Select Item.
With the Data Grid selected, set the Sort Column, in the Basics properties, to route_long_name.
Save the View by clicking the Save Dashboard icon at the top of the workspace.

The Data Grid now displays a record per train showing the most recent stop along the NY City subway.

Streaming to a data grid

Filtering the Data Grid

Now that you have a View that is being updated with streamed data you can add filters, based on any of the Keyed Columns, to show a subset of the data.

You can only filter on the Keyed Columns, filtering on any other column causes a timeout.

With the Data Grid selected in the new View, open the Data Source and choose the streaming Data Source from the list on the left-hand side.
Type in one of the filters below, which are examples of how filtering can be applied:

filter filter description

{"direction_id":"inbound"} All inbound trains

{"route_long_name":"6 Avenue Express"} All trains on the 6 Avenue Express route
Current filter restrictions
- Filters must be defined using the following JSON format: {"keyed_column":"value"}.
- Only one keyed column can be filtered on.
- Only one value can be applied to the keyed column.
Click Execute to check that the results, displayed in the Results panel at the bottom of the screen, have applied the filter to the data.
Click Apply and then Select Item.

The Grid now only lists records that match the filter being applied. The following example shows the grid being updated with data for all inbound trains only. Streaming data inbound trains

Advanced Filtering using View States

If you are familiar with View States you can pair the value in this filter with another component, for example a dropdown, so that you can choose the filter value in the View.

Add a map

If you have a access to an API Key for Google Maps Platform you can add a Map component to your View to display the subway data.

This step is optional. If you want to proceed you must create an API Key for Google Maps Platform as described here.

To add a map to your View:

In the design mode for your View, search for the Map component in the list of components, on the left, and drag it into the workspace.
Click on the Map component and configure it as follows:
1. Click "set Google Maps JavaScript API Key in Map Key Property". The Map Key is saved as a view state inside your view.
2. Click New and name the node mapkey.
3. In the Properties section for mapkey, leave Type set to Symbol. Enter your API Key for Google Maps into the Default and Value properties and click Select Item.
4. Centralize the map on New York by setting the values, in the following table, in the Map properties. These are illustrated in the screenshot below the table.
  
  setting value
  
  CenterX 40.75
  
  CenterY -73.98
Define the Data Source from which the component will get data:
Click on Data Source in Points properties.
Create a new data source and name it streaming-map.

!!!note "If you are want to share the filtering options that you setup for the Data Grid, select the existing streaming data source here."

Click on the Streaming radio button and set the following properties:

value     | setting
--------- | -------
pipeline  | subway-streaming
table     | subway
filters   | leave empty

Click Execute to check the results are displayed in the Results panel at the bottom of the screen, as shown below. This displays the streaming position values with updates visible as data is streamed.
Click Apply and then Select Item.
To ensure the trains are named, positioned and coloured on the map based on the streaming data set the Points properties as follows:

properties value

Selected Attr route_long_name

Latitude Data stop_lat

Longitude Data stop_long

Shape train

Shape Colour (From DB) route_colour

Cluster Unchecked
Save the View by clicking the Save Dashboard icon at the top of the workspace.

Your View now contains a data grid and a map updating for all inbound and outbound trains, as illustrated below.

Streaming View

setting	value
Broker	kafka.trykdb.kx.com:443
Topic	subway
Offset	End
Use TLS	Unchecked
Use Schema Registry	Unchecked

key	value
sasl.username	demo
sasl.password	demo
sasl.mechanism	SCRAM-SHA-512
security.protocol	SASL_SSL

column name	colomn type
arrival_time	Timestamp
stop_name	Symbol
stop_lat	Float
stop_lon	Float
direction_id	Symbol
route_long_name	Symbol
route_color	Symbol

filter	filter description
`{"direction_id":"inbound"}`	All inbound trains
`{"route_long_name":"6 Avenue Express"}`	All trains on the 6 Avenue Express route

properties	value
Selected Attr	route_long_name
Latitude Data	stop_lat
Longitude Data	stop_long
Shape	train
Shape Colour (From DB)	route_colour
Cluster	Unchecked

setting	value
CenterX	40.75
CenterY	-73.98