Database

Welcome to this introductory walkthrough of the KX Insights Platform. In this walkthrough you will learn about the features of the platform, import data, and visualize the results.

The Free Trial version of the KX Insights Platform comes with a ready-made data assembly called insights-demo; an assembly is a collection of elements necessary to store, manage, and explore your data. With the insights-demo walkthrough tutorial, you will learn how to access it through the platform. You will find insights-demo listed under Data in the left-hand entity tree menu.

Overview

When you are ready to build your own Data assembly you can do so using a step-by-step wizard, but before that, let's look at the elements of insights-demo. The insights-demo assembly is composed of a database, a schema, pipelines, and streams, each described below.

Assembly

To access data from a Data assembly, a pipeline with a KX Insights Platform Writer node directing data to that assembly is required. Both the referenced assembly and pipeline have to be deployed and active for the data to be explored. For Free Trial, all pipelines send data to the insights-demo assembly, so by associating our Free Trial pipelines with the insights-demo assembly, we can deploy both the pipelines and the assembly with a single deploy of insights-demo.

Assembly and Pipeline Deployments

Assemblies can be deployed without pipelines, and pipelines can be deployed directly without being associated with an assembly. A successfully deployed pipeline with a KX Insights Platform Writer node requires the referenced assembly to be actively deployed in order for its data to be explored. A successful deployment of an assembly will show a green tick next to its name; at that point, the data it contains can be explored.

I want to build my own Data assembly.

Data Assembly

Database

A KX Insights Platform database is an organized collection of data stored in the cloud. Free Trial users have a ready-made database for insights-demo, so no further configuration is required. More advanced users can choose to optimize their database settings, although this is not available in Free Trial and is not typically required on the KX Insights Platform.

I want to optimize my database.

Schema

A KX Insights Platform schema maps ingested data to kdb+ data types, which is required for querying on the platform. The insights-demo has pre-defined schemas for each of the four data sets available: weather, subway, health, and crime.

Depending on the method of data import, KX Insights Platform may assign the kdb+ types for you, but there may be situations when schemas have to be manually created and assigned for ingested data; this can be done with a custom schema. Free Trial users do not have to configure a schema and can move to pipelines to continue the tutorial.

Switch through the tabs to view the data column names (which would be used in your queries) and their respective kdb+ types for the four sample insights-demo data sets:

=== "weather"

  | column    | type char | type      |
  | --------- | --------- | --------- |
  | timestamp | p         | timestamp |
  | sensor    | s         | symbol    |
  | airtemp   | f         | float     |
  | name      | s         | symbol    |
  | borough   | s         | symbol    |
  | longitude | f         | float     |
  | latitude  | f         | float     |
  | color     | s         | symbol    |

=== "subway"

  | column           | type char | type      |
  | ---------------- | --------- | --------- |
  | trip_id          | s         | symbol    |
  | arrival_time     | p         | timestamp |
  | stop_id          | s         | symbol    |
  | stop_sequence    | h         | short     |
  | stop_name        | s         | symbol    |
  | stop_lat         | f         | float     |
  | stop_lon         | f         | float     |
  | route_id         | h         | short     |
  | trip_headsign    | s         | symbol    |
  | direction_id     | s         | symbol    |
  | route_short_name | s         | symbol    |
  | route_long_name  | s         | symbol    |
  | route_desc       | C         | string    |
  | route_type       | h         | short     |
  | route_url        | s         | symbol    |
  | route_color      | s         | symbol    |

=== "health"

  | column            | type char | type      |
  | ----------------- | --------- | --------- |
  | name              | C         | string    |
  | lat               | f         | float     |
  | long              | f         | float     |
  | neighborhood      | C         | string    |
  | airquality        | f         | float     |
  | trafficcongestion | f         | float     |
  | bikeacccnt        | i         | integer   |
  | caracccnt         | i         | integer   |
  | timestamp         | p         | timestamp |

=== "crime"

  | column             | type char | type      |
  | ------------------ | --------- | --------- |
  | event_id           | s         | symbol    |
  | incident_time      | t         | time      |
  | nypd_precinct      | s         | symbol    |
  | borough            | s         | symbol    |
  | patrol_borough     | s         | symbol    |
  | call_x_geo         | j         | long      |
  | call_y_geo         | j         | long      |
  | radio_code         | s         | symbol    |
  | description        | C         | string    |
  | crime_in_progress  | C         | string    |
  | call_timestamp     | p         | timestamp |
  | dispatch_timestamp | p         | timestamp |
  | arrival_timestamp  | p         | timestamp |
  | closing_timestamp  | p         | timestamp |
  | latitude           | f         | float     |
  | longitude          | f         | float     |
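
As a rough illustration only (the assembly schema is configured through the UI rather than in code), the weather schema above corresponds to an empty kdb+ table with typed columns, which could be sketched in q as follows:

```q
/ hypothetical sketch: an empty weather table with the kdb+ types listed above
weather:([]
  timestamp:`timestamp$();
  sensor:`symbol$();
  airtemp:`float$();
  name:`symbol$();
  borough:`symbol$();
  longitude:`float$();
  latitude:`float$();
  color:`symbol$())
meta weather   / the t column shows the type chars: p s f s s f f s
```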

I want to build a manual schema for my data table

Pipelines

Pipelines are how KX Insights Platform ingests data from its source for later querying in the platform. Pipelines offer a large number of potential data sources for importing from, and are highly configurable; a data import wizard is available to help users get data into a pipeline and will be used during the walkthrough process.

To access and explore pipeline data, the pipeline can either be deployed directly or be associated with an assembly and deployed along with it. In the case of a direct pipeline deployment, the pipeline should contain a writer node with details of the assembly and table to write to. In addition, the assembly the data is written to must also be actively deployed.

Each of the insights-demo pipelines has a writer node and can be deployed without first associating it with an assembly. For Free Trial, ready-made pipelines are available for the weather, crime, health and subway datasets - something Free Trial offers that an individual Data assembly does not. Custom Data assemblies do not come with pre-made pipelines, but pipelines can be created and added as part of the assembly deploy. A Data assembly can be deployed without requiring a pipeline.

As part of the Free Trial tutorial, instructions for recreating each of the free pipelines are given below.

Streams

Streams are how the KX Insights Platform pushes data from a source to the platform. For example, real-time events like a ticker price would be streamed data. Free Trial does not currently offer a streaming data source. I want to learn more about Streams.

Deploying a Data assembly

When using KX Insights Platform, the first step is to deploy an empty database to store the data you want to ingest.

The Data assembly required for this walkthrough is called insights-demo and can be launched by selecting Overview -> Data -> insights-demo.

If you wish, you can create your own Data assembly; instructions are available here.

During deployment, the Data assembly will go from Deploying to Ready and will show a solid green circle next to its name in the left-hand entity tree menu when ready to use.

While it is deploying, you can hover over the icon to see the status of the deployment and view the logs.

DeployGIF

Check the status of a Data assembly deployment and troubleshoot problems.

Import and Explore

Once the Data assembly has been deployed, you can build pipelines and Explore the imported data.

Select a pipeline to build depending on where the data is currently stored:

Object storage is a means of storing unstructured data, eliminating the scaling limitations of traditional file storage. Limitless scale is the reason object storage is the storage of the cloud; Amazon, Google and Microsoft all employ object storage as their primary storage.

Free Trial offers a sample weather dataset hosted on each of the major cloud providers. Select one of the cloud providers and enter the relevant details for that provider into the first input form.

Import Menu

=== " Google Cloud Storage "

  | setting            | value                       |
  | ------------------ | --------------------------- |
  | Path*              | gs://kxevg/weather/temp.csv |
  | Project ID         | kx-evangelism               |
  | File Mode*         | Binary                      |
  | Offset*            | 0                           |
  | Chunking*          | Auto                        |
  | Chunk Size*        | 1MB                         |
  | Use Authentication | No                          |

  <!-- ![objectstorage](img/weatherimporGCS.jpg) -->

=== " Microsoft Azure Storage"

  | setting            | value               |
  | ------------------ | ------------------- |
  | Path*              | ms://kxevg/temp.csv |
  | Account*           | kxevg               |
  | File Mode*         | Binary              |
  | Offset*            | 0                   |
  | Chunking*          | Auto                |
  | Chunk Size*        | 1MB                 |
  | Use Authentication | No                  |

  <!-- ![objectstorage](img/weatherImportAzure.jpg) -->

=== " AWS S3"

  | setting            | value                      |
  | ------------------ | -------------------------- |
  | Path*              | s3://kxevangelism/temp.csv |
  | Region*            | eu-west-1                  |
  | File Mode*         | Binary                     |
  | Offset*            | 0                          |
  | Chunking*          | Auto                       |
  | Chunk Size         | 1MB                        |
  | Use Authentication | No                         |

  <!-- ![objectstorage](img/weatherImportAWS.jpg) -->

Click Next when done.

Select a Decoder

With the data source defined, the next step defines a decoder, which specifies the type of data we are importing. The source data is a csv file, so set the delimiter type - typically a comma for a csv file - and indicate whether the source data has a header row.

decoder

| setting   | value     |
| --------- | --------- |
| Delimiter | ,         |
| Header    | First Row |
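
To get a feel for what the decoder does, here is a minimal q sketch of comma-delimited text with a header row being parsed into typed kdb+ columns; the column names, values, and type codes below are illustrative, not the actual weather schema applied later:

```q
/ two lines of illustrative CSV text: a header row followed by one data row
lines:("timestamp,sensor,airtemp";"2022.03.31D12:00:00,ST01,54.3");
/ parse as timestamp (P), symbol (S) and float (F); enlist"," means the first line holds column names
("PSF";enlist",") 0: lines
```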

Click Next when done.

Configure Schema

Imported data requires a schema compatible with the KX Insights Platform. The insights-demo has a predefined set of schemas for each of the data sets available in Free Trial, referenced in the schema table. The Configure Schema step applies to the imported data the kdb+ datatypes used by the destination database, which are required for exploring the data in Insights.

However, not every data set imported to KX Insights Platform requires a schema. In this example, there is no requirement to apply a schema, so the Apply A Schema box can be unchecked which disables the schema form.

| setting        | value   |
| -------------- | ------- |
| Apply a Schema | Uncheck |

schema

Click Next when done.

!!! note "Writing data to a table when Apply a Schema is unchecked" For the Free Trial Object Storage import, the Apply a Schema box is unchecked in the Configure Schema step. When it comes to Configure Writer, available tables to write too are pulled from the Schema associated with the Assembly . For Free Trial, these schemas have already been defined for each of the sample data sets, so no further action is required in this tutorial. However, for custom imports it may be necessary to add a data schema to the Assembly for situations where there is no appropriate table in the dropdown to assign the imported data too.

  ![writerschema](img/tablewriter.jpg)

Configure Writer

The final step in pipeline creation is to write the imported data to a table in the database. With Free Trial we will use the insights-demo database and assign the imported data to the weather table.

| setting            | value         |
| ------------------ | ------------- |
| Database           | insights-demo |
| Table              | weather       |
| Deduplicate Stream | Yes           |

Complete the writer step. This finishes the import and opens the pipeline in the pipeline viewer.

!!! note "Writer - KX Insights Database" The Writer - KX Insights Database node is essential for exploring data in a pipeline. This node defines what assembly and table to write the data too. As part of this, the assembly must also be deployed; deployments of an assembly or pipeline can de done individually, or a pipeline can be associated with an assembly and all pipelines associated with that assembly will be deployed when the latter is deployed.

Pipeline view

The pipeline view allows you to review and edit your pipeline by selecting any of the nodes. However, this is not required as part of our Free Trial tutorial.

schema

Click Save and give the pipeline a name. This name should be unique to the pipeline; for Free Trial, there is a pipeline already named weather, so give this new pipeline a name like weather-1.

writerschema

The newly created pipeline will now feature in the list of pipelines in the left-hand entity tree.

writerschema

Apache Kafka is an event streaming platform that can be easily published to and consumed by the KX Insights Platform. The provided Kafka feed has live alerts for NYC Subway trains containing details such as arrival time of train, stop location coordinates, direction and route details, all as part of the subway data set.

Start by selecting Kafka from the import options

Import Menu

Configure Kafka

| setting | value               |
| ------- | ------------------- |
| Broker* | 34.130.174.118:9091 |
| Topic*  | subway              |
| Offset* | End                 |
| Use TLS | No                  |

Click Next when done.

Select a Decoder

Event data on Kafka is of JSON type; select the related decoder to transform the imported data to a kdb+ friendly format (a kdb+ dictionary).
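
In kdb+/q terms, decoding a JSON event is analogous to calling .j.k, which turns a JSON string into a dictionary; a small sketch, with made-up field values:

```q
/ decode an illustrative subway event into a kdb+ dictionary with .j.k
msg:"{\"trip_id\":\"X123\",\"arrival_time\":\"2022-03-31T12:00:00\",\"route_id\":7}";
.j.k msg
```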

Decoder

Click Next when done.

Configure JSON

The next step is a decoder step specific to JSON, and defines whether the decode parse should be done on each value. This can be left unchanged.

Decoder

Click Next when done.

Configure Schema

Imported data requires a schema compatible with the KX Insights Platform. The insights-demo has a predefined set of schemas for each of the data sets available in Free Trial, referenced in the schema table. The Configure Schema step applies to the imported data the kdb+ datatypes used by the destination database, which are required for exploring the data in Insights.

The next step applies a schema to the imported data.

| setting        | value   |
| -------------- | ------- |
| Apply a Schema | Checked |
| Data Format    | Table   |

Then attach the schema to the data set by clicking the Schema icon.

Select the insights-demo Schema from the dropdown

Schema load

then choose the Table, subway

Schema load

Check Parse Strings for the arrival_time and route_desc columns. This is an essential step; if set incorrectly it may cause issues during pipeline deployment.

Parse Strings

This indicates whether parsing of input string data into other datatypes is required. Generally for all time, timestamp, and string fields, Parse Strings should be ticked unless your input is IPC or RT.
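
For a sense of what this option means, here is a hedged q sketch of casting string input to typed kdb+ values; the platform performs the equivalent conversion during ingestion, and the values below are illustrative only:

```q
/ cast a string to a kdb+ timestamp (type char p)
"P"$"2022.03.31D12:00:00"
/ cast a string to a symbol (type char s)
`$"ST01"
```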

Schema load

Click Next when done.

Configure Writer

The final step in pipeline creation is to write the imported data to a table in the database. With Free Trial we will use the insights-demo database and assign the imported data to the subway table.

| setting            | value         |
| ------------------ | ------------- |
| Database           | insights-demo |
| Table              | subway        |
| Deduplicate Stream | Yes           |

Complete the writer step. This finishes the import and opens the pipeline in the pipeline viewer.

Writer - KX Insights Database

The Writer - KX Insights Database node is essential for exploring data in a pipeline. This node defines which assembly and table to write the data to. As part of this, the assembly must also be deployed; deployments of an assembly or pipeline can be done individually, or a pipeline can be associated with an assembly so that all pipelines associated with that assembly are deployed when the latter is deployed.

Pipeline view

The pipeline view allows you to review and edit your pipeline by selecting any of the nodes.

Writer

Additional functionality is required to convert this particular Kafka import to a kdb+ table before it's deployed. This can be done by inserting a Function node into the pipeline created by the import wizard.

From the Pipeline entity-tree menu, select the Map function and drag this inside the pipeline workspace

Pipeline

We will connect this to the pipeline between the Decoder and Transform nodes.

Pipeline

Select the Map function node to edit its properties

Pipeline

The Map function will transform the kdb+ dictionary format using enlist. To achieve this, copy and paste the following kdb+/q into the Configure Map Node code editor and select Apply.

{[data]
    / wrap the incoming dictionary in a list so it becomes a one-row table
    enlist data
}
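
To see why this works, note that in q, enlist applied to a dictionary yields a one-row table whose columns are the dictionary's keys; a minimal sketch with made-up fields:

```q
/ a decoded event as a kdb+ dictionary (illustrative fields and values)
d:`trip_id`route_id!("X123";7);
/ enlist wraps the dictionary, producing a one-row table with those columns
enlist d
```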

Save the pipeline; the pipeline name should be unique; for Free Trial, there is a pipeline already named subway, so give this new pipeline a name like subway-1.

writerschema

The newly created pipeline will now feature in the list of pipelines in the left-hand entity tree.

A SQL database is a relational database managed using the SQL programming language. In this example, the health data set has information relating to air quality, traffic congestion, and car and bike accidents in New York. If you are interested in viewing the data using a PostgreSQL client, instructions are given in the note below - but this step is not required for the Free Trial tutorial.

Optional: Accessing health data using a local PostgreSQL client install


$ psql "sslmode=disable dbname=postgres user=postgres hostaddr=34.132.151.134"
Password for user postgres: dxrHvgtjHHvsOH7B
psql (14.1, server 13.4)
Type "help" for help.

postgres=> \connect health
psql (14.1, server 13.4)
You are now connected to database "health" as user "postgres".
health=> \dt

                List of relations
schema |         name         | type  |  owner
-------+----------------------+-------+----------
public | health               | table | postgres
(5 rows)

For Free Trial we start by selecting PostgreSQL from the Relational database import options

Import Menu

Configure PostgreSQL

Complete the form with the server details.

| setting  | value                |
| -------- | -------------------- |
| Server   | 34.132.151.134       |
| Port     | 5432                 |
| Database | health               |
| Username | postgres             |
| Query    | select * from health |

Click Next when done.

Configure Schema

Imported data requires a schema compatible with the KX Insights Platform. The insights-demo has a predefined set of schemas for each of the data sets available in Free Trial, referenced in the schema table. The Configure Schema step applies to the imported data the kdb+ datatypes used by the destination database, which are required for exploring the data in Insights.

The next step applies a schema to the imported data.

| setting        | value   |
| -------------- | ------- |
| Apply a Schema | Checked |
| Data Format    | Table   |

Then attach the schema to the data set by clicking the Schema icon.

Select the insights-demo Schema from the dropdown

Schema load

then choose the Table, health

Schema load

Check Parse Strings for the timestamp column. This is an essential step; if set incorrectly it may cause issues during pipeline deployment.

Parse Strings

This indicates whether parsing of input string data into other datatypes is required. Generally for all time, timestamp, and string fields, Parse Strings should be ticked unless your input is IPC or RT.

Schema load

Click Next when done.

Configure Writer

The final step in pipeline creation is to write the imported data to a table in the database. With Free Trial we will use the insights-demo database and assign the imported data to the health table.

| setting            | value         |
| ------------------ | ------------- |
| Database           | insights-demo |
| Table              | health        |
| Deduplicate Stream | Yes           |

Complete the writer step. This finishes the import and opens the pipeline in the pipeline viewer.

Writer - KX Insights Database

The Writer - KX Insights Database node is essential for exploring data in a pipeline. This node defines which assembly and table to write the data to. As part of this, the assembly must also be deployed; deployments of an assembly or pipeline can be done individually, or a pipeline can be associated with an assembly so that all pipelines associated with that assembly are deployed when the latter is deployed.

Pipeline view

The pipeline view allows you to review and edit your pipeline by selecting any of the nodes. However, this is not required as part of our Free Trial tutorial.

Writer

Click Save and give the pipeline a name. This name should be unique to the pipeline; for Free Trial, there is a pipeline already named health, so give this new pipeline a name like health-1.

writerschema

The newly created pipeline will now feature in the list of pipelines in the left-hand entity tree.

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data - think XML, but smaller, faster, and simpler. You define the data structure first; specially generated source code then reads and writes that structured data to and from a variety of data streams, in a variety of programming languages.

The crime data set we will be using contains events occurring in NYC, organized by precinct, including location coordinates of the call, dispatch times and a description of the crime.

I want to learn more about how protocol buffers work with the KX Insights Platform

I want to learn more about Protocol Buffers

Pipeline Editor

In this example we will not use the Import Wizard to ingest the data, but instead build a pipeline using the pipeline editor.

Start by creating a new pipeline; click the [+] icon in the document menu bar and select Pipeline, or select [+] from Pipelines in the entity tree menu on the left while in Overview. This will open a new tab in the Document bar with a draft pipeline

protocolbuffer

The draft pipeline editor will start with a list of Readers in the left-hand menu; click and drag the Expression node into the central workspace.

protocolbuffer

Expression Node

Clicking the Expression node will update the properties of the node on the right. The Expression node is a kdb+/q editor; here, code will be added to take advantage of the native REST interfacing functionality (an API supporting interaction with RESTful web services) in the KX Insights Platform using .kurl.sync. This step requires a cut-and-paste of some kdb+/q code:

URL:"https://code.kx.com/kxiwalkthrough/data/crime.msg";
resp:.kurl.sync[(URL;`GET;(::))];
if[200 <> first resp; ' last resp];
"\n" vs last resp

protocolbuffer

Decoder

The output from the aforementioned code will resemble a binary format. The next decoding step converts this binary format to kdb+/q. To achieve this, drag and drop the Protocol Buffers node from the list of Decoders.

protocolbuffer

Transform

Next, click-and-drag the Apply a Schema node under Transform into the pipeline workspace. The schema transforms the data to a kdb+/q format compatible with the KX Insights Platform. The properties to edit are:

| setting     | value         |
| ----------- | ------------- |
| Data Format | Table         |
| Schema      | insights-demo |
| Table       | crime         |

protocolbuffer

With the schema loaded, we next have to define what parsing we want to apply to the data; this is typically done for time, timestamp and string variables and is defined as follows. This is an essential step; if set incorrectly it may cause issues during pipeline deployment:

protocolbuffer

Parse Strings

This indicates whether parsing of input string data into other datatypes is required. Generally, for all time, timestamp, and string fields you should have Parse Strings ticked unless your input is IPC or RT.

Link the Apply a Schema transform node to the Protocol Buffers decoder.

Writer

The last step of the data ingestion is to define where the data will be written to. Click and drag the KX Insights Database node from the list of Writers into the pipeline workspace and connect it to the existing pipeline.

protocolbuffer

| setting            | value         |
| ------------------ | ------------- |
| Database           | insights-demo |
| Table              | crime         |
| Deduplicate Stream | Yes           |

Writer - KX Insights Database

The Writer - KX Insights Database node is essential for exploring data in a pipeline. This node defines which assembly and table to write the data to. As part of this, the assembly must also be deployed; deployments of an assembly or pipeline can be done individually, or a pipeline can be associated with an assembly so that all pipelines associated with that assembly are deployed when the latter is deployed.

Save the pipeline; the pipeline name should be unique; for Free Trial, there is a pipeline already named crime, so give this new pipeline a name like crime-1.

The newly created pipeline will now feature in the list of pipelines in the left-hand entity tree.

I want to learn more about Pipelines

Add a Pipeline to a Data Assembly

A newly created pipeline - whether built via import, built node by node in the pipeline editor, or pre-made as part of Free Trial - must be added to a Data assembly prior to deployment and data exploration. This can be done in the assembly by selecting the pipeline from the list of available pipelines in the dropdown. You can add as many pipelines as needed to that assembly.

Whether you have re-created a pipeline or just want to explore the data freely available in Free Trial, add the protocol buffer/crime pipeline to the insights-demo assembly before continuing.

This is what adding the crime and weather pipelines to the insights-demo assembly looks like:

Assembly

and what it will look like when done:

Assembly

I would like to use the data import wizard to build a Pipeline
I want to learn more about Pipelines

Streams

Streams are how the KX Insights Platform pushes data from a source to the platform. For example, real-time events like a ticker price would be streamed data. Free Trial does not currently offer a streaming data source. To make use of streaming data, upgrade to the starter Assembly.

I want to learn more about Streams.

Deploy

Before working with data in an assembly, it is first necessary to deploy it; this applies to insights-demo and to any Data assembly you create. During deployment, the Data assembly will go from Deploying to Ready and will show a solid green circle next to its name in the left-hand entity tree menu when ready to use.

DeployGIF

Check the status of a Data assembly deployment and troubleshoot problems.

Explore

Once an assembly is running with its associated data pipelines, the next step is to query its data with Explore. Explore can be selected from the document bar [+] or by clicking the Explore icon from the icon menu or the Overview page.

Explore

Data exploration has two sections:

Data

Data written to a database is accessed in the Data window.

There are three methods to query data from a database: Query table, SQL, or kdb+/q. For new users unfamiliar with SQL or kdb+/q, we recommend using the query table as no coding experience is required.

In each case, an Output Variable is defined. It's this Output Variable which is used in the scratchpad to work with your data and is valid only for the current session (the Table name generated by the pipeline is what persists).

For Free Trial, it's not a requirement to use the scratchpad, which requires some knowledge of kdb+/q or Python to explore the data.

The query table is the simplest way to explore your data. Available data tables are listed in the dropdown for Table Name.

Next, you need to define the Output Variable; this will be auto-assigned when selecting your first data table but may need to be changed if you decide to use a different data set. The Output Variable is used for queries in the scratchpad.

Defining the Start Date and End Date filters is mandatory, and the valid range depends on the data used; for example, the crime data is for March 31st 2022.

Additional filters are available, but none of these are required for use in Free Trial.

QUERY

I want to learn more about query filters.

If you are familiar with SQL you can use a simple query to return data.

SELECT * FROM crime

An Output Variable has to be defined for use in queries for the scratchpad.

SQL

If you are familiar with kdb+/q you can use a simple query to return data.

select from crime

An Output Variable has to be defined for use in queries for the scratchpad.

HDB

Scratchpad

Scratchpad is where data returned from a database and outputted to an Output Variable can be manipulated.

Free Trial users are not required to use the Scratchpad to view their data. Scratchpad queries must reference the Output Variable, and not the database table name, although for Free Trial, the database table and the Output Variable name are the same (and this is okay to do).

As a scratchpad example for the weather data set, to convert air temperature from Fahrenheit to Celsius, add the following q or Python:

=== "q"

    update airtempcelcius:(airtemp-32)%1.8 from w

=== "python"

    import pykx as kx
    w = kx.q['w']
    df = w.pd()
    df['airtempcelcius'] = (df['airtemp'] - 32) / 1.8

In the output, there will be a new column called airtempcelcius!

Troubleshooting

If the database or tables are still not outputting after refreshing your browser, try our Troubleshooting page.

I want to learn more about data exploration

Reports

When you are ready to visualize your data, go to report builder.