Protocol Buffers (Crime)

No kdb+ knowledge required

This example assumes no prior experience with q/kdb+; you can replace the endpoints provided with any other Protocol Buffer endpoint to achieve similar results.

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define the data structure first; generated source code then reads and writes that structured data to and from a variety of data streams, in a variety of programming languages.
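As a rough illustration of why the protobuf wire format is smaller than text formats, the sketch below (plain Python, not part of this pipeline) implements the base-128 varint encoding that protobuf uses for integer fields: small values occupy a single byte.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F              # take the low 7 bits
        n >>= 7
        if n:                        # more bits remain: set the continuation bit
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

print(encode_varint(1).hex())        # 01   -> one byte
print(encode_varint(300).hex())      # ac02 -> two bytes
```

By comparison, the same value as XML or JSON text costs at least one byte per decimal digit plus markup.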

The crime data set we will be using contains events occurring in NYC, organized by precinct, including location coordinates of the call, dispatch times and a description of the crime.

I want to learn more on how protocol buffers work with KX Insights Platform

I want to learn more about Protocol Buffers

Pipeline Editor

In this example we will not use the Import Wizard to ingest the data, but instead build a pipeline using the pipeline editor.

Start by creating a new pipeline; click the [+] icon in the document menu bar and select Pipeline, or select [+] from Pipelines in the entity tree menu on the left while in Overview. This opens a new tab in the Document bar with a draft pipeline.


The draft pipeline editor will start with a list of Readers in the left-hand-menu; click-and-drag the Expression node into the central workspace.


Expression Node

Clicking the Expression node updates the node's properties on the right. The Expression node is a kdb+/q editor; here, code is added to take advantage of native REST interfacing functionality (an API supporting interaction with RESTful web services) in the KX Insights Platform using .kurl.sync. This step requires pasting some kdb+/q code:

/ `url` is a placeholder; substitute your Protocol Buffer endpoint
resp: .kurl.sync (url; `GET; ::);
if[200 <> first resp; ' last resp];  / signal the response body as an error on a non-200 status
"\n" vs last resp                    / split the body into lines



The output of the above code will be in a binary format.

The next step will convert the binary format to kdb+/q. To achieve this, drag and drop the Protocol Buffers node from Decoders.


In order to decode the message from binary to kdb+ two values must be supplied in the Decoder node:

1) Message Name: crime

2) Message Definition:

syntax = "proto3";

message crime {
    uint64 event_id = 1;
    string incident_time = 2;
    uint64 nypd_precinct = 3;
    string borough = 4;
    string patrol_borough = 5;
    uint64 call_x_geo = 6;
    uint64 call_y_geo = 7;
    string radio_code = 8;
    string description = 9;
    string crime_in_progress = 10;
    string call_timestamp = 11;
    string dispatch_timestamp = 12;
    string arrival_timestamp = 13;
    string closing_timestamp = 14;
    double latitude = 15;
    double longitude = 16;
}
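To see what the decoder is working with, here is a hand-rolled sketch (plain Python, hypothetical field values) of how two fields of the crime message above are laid out on the wire: every value is prefixed with a key formed from the field number and a wire type.

```python
def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append((byte | 0x80) if n else byte)
        if not n:
            return bytes(out)

def key(field_number: int, wire_type: int) -> bytes:
    """Field key preceding every value: (field number << 3) | wire type."""
    return varint((field_number << 3) | wire_type)

# uint64 event_id = 1  -> wire type 0 (varint)
# string borough  = 4  -> wire type 2 (length-delimited)
msg  = key(1, 0) + varint(42)                          # event_id = 42 (hypothetical)
msg += key(4, 2) + varint(len(b"QUEENS")) + b"QUEENS"  # borough = "QUEENS"
print(msg.hex())  # 082a2206515545454e53
```

The Protocol Buffers decoder node reverses exactly this layout, using the message definition to map field numbers back to names and kdb+ types.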


Next, click-and-drag the Apply a Schema node under Transform into the pipeline workspace. The schema transforms the data to a kdb+/q format compatible with the KX Insights Platform. The properties to edit are:

setting       value
-----------   -------------
Data Format   Table
Schema        insights-demo
Table         crime


With the schema loaded, we next define the parsing to apply to the data; this is typically needed for time, timestamp, and string fields, and is defined as follows. This is an essential step; if set incorrectly, it may cause issues during pipeline deployment:


Parse Strings

This indicates whether input string data must be parsed into other datatypes. Generally, tick Parse Strings for all time, timestamp, and string fields unless your input is IPC or RT.
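As a loose analogy for what the schema node does when Parse Strings is ticked (a Python sketch, not the platform's implementation), timestamp fields such as call_timestamp arrive as strings and must be converted into typed values before they can be stored and queried as timestamps:

```python
from datetime import datetime

# Hypothetical incoming value for a field such as call_timestamp:
raw = "2021-04-10T12:30:45"

# With "Parse Strings" ticked, the schema node performs a conversion
# analogous to this: string in, typed timestamp out.
parsed = datetime.fromisoformat(raw)
print(type(parsed).__name__, parsed.hour)  # datetime 12
```

If the conversion is skipped for a field the schema declares as a time or timestamp, the type mismatch surfaces when the pipeline is deployed, which is why this setting matters.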

Link the Apply a Schema transform node to the Protocol Buffers decoder.


The last step of the data ingestion is to define where data will be written to. Click-and-drag KX Insights Database from the list of Writers into the pipeline workspace and connect it to the existing pipeline.


setting              value
------------------   -------------
Database             insights-demo
Table                crime
Deduplicate Stream   Yes

Writer - KX Insights Database

The Writer - KX Insights Database node is essential for exploring data in a pipeline. This node defines which assembly and table to write the data to. As part of this, the assembly must also be deployed; an assembly or pipeline can be deployed individually, or a pipeline can be associated with an assembly, in which case all pipelines associated with that assembly are deployed when the assembly is deployed.


The next step is to deploy the pipeline created in the Importing stage above; in the Free Trial there is already a pipeline named crime, so give this new pipeline a name such as crime-1.

  • Save your pipeline giving it a name that is indicative of what you are trying to do.

  • Select Deploy.

  • Once deployed you can check on the progress of your pipeline back in the Overview pane where you started.

  • When it reaches Status=Running, it is done and your data is loaded.

I want to learn more about Pipelines


Select Explore from the Overview panel to start the process of exploring your loaded data. See here for an overview of the Explore Window.

In the Explore tab there are a few different options available to you: Query, SQL, and q. Let's look at SQL.

You can run the following SQL to retrieve all records from the crime table:

SELECT * FROM crime


Note that you will need to define an Output Variable and then select Get Data to execute the query. You should see the output below.


Refresh Browser

If the tables are not appearing in the Explore tab as expected, refresh your browser and retry.


If the tables are still not appearing after refreshing your browser, try our Troubleshooting page.

Next, let's visualize some data in a Report to better understand it. Head over to Reports to get started.