
Develop Trade Lifecycle Analytics combining Live and Historical Financial Data

Overview

Use kdb Insights Enterprise to stream both live and historical tick data and develop interactive analytics powering faster and better in-the-moment decision-making.

Benefits

  • Live + Historical View: To maximize opportunity, you need both real-time and historical views of your market share. Use kdb Insights Enterprise to stream live tick data alongside historical data for a big-picture view of the market.
  • React in Real Time: React to market events as they happen, not minutes or hours later, to increase market share through efficient execution.
  • Aggregate Multiple Sources: Aggregate your liquidity sourcing for a better understanding of market depth and trading opportunities.

Financial Dataset

kdb Insights Enterprise enables users to work with both live and historical data, so we have provided both a real-time price feed and a historical trade dataset. The real-time price feed will be used to calculate a range of moving averages and a position in the market.

With the Financial dataset, users will learn how kdb Insights Enterprise can manage live and historical data, along with the integration of transactional data.

Step 1: Build Database

Configure Database

The first step is to build a database to store the data you will need to ingest.

Free Trial Users

The Free Trial version of kdb Insights Enterprise comes with a ready-made assembly called equities, accessible from the left-hand toolbar. You can simply open and deploy this assembly. All other users should follow the steps below.

Select Build a database from the home screen Getting Started page.

From here, you can enter the name and select the size of your database. Let's give this database a name and select Next.

Configure Schema

The next step requires the user to know information about the incoming data ahead of time:

  • column header names
  • kdb+ datatypes

Below are the schemas of the data used in this recipe:

trade
column     type char   type
timestamp  p           timestamp
price      f           float
volume     i           integer

close
column     type char   type
Date       d           date
Time       p           timestamp
Open       f           float
High       f           float
Low        f           float
Close      f           float
AdjClose   f           float
Volume     j           long
AssetCode  s           symbol

analytics
column     type char   type
timestamp  p           timestamp
vwap       f           float
twap       f           float
open       f           float
high       f           float
low        f           float
close      f           float

For more information, see the kdb+ datatypes reference in the documentation.

The names of the tables, the column headers, and the kdb+ datatypes can be entered on the Configure Schema page.
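
For orientation, here is a rough q sketch of what these schemas correspond to as empty kdb+ tables. This is illustrative only; in kdb Insights Enterprise you define the schema through the UI rather than by running this code, and the table names trade, close and analytics follow the usage later in this recipe.

// illustrative q equivalents of the three schemas above
trade:([] timestamp:`timestamp$(); price:`float$(); volume:`int$())
close:([] Date:`date$(); Time:`timestamp$(); Open:`float$(); High:`float$();
          Low:`float$(); Close:`float$(); AdjClose:`float$(); Volume:`long$();
          AssetCode:`symbol$())
analytics:([] timestamp:`timestamp$(); vwap:`float$(); twap:`float$();
              open:`float$(); high:`float$(); low:`float$(); close:`float$())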

Before proceeding, toggle the Advanced option in the Schema configuration. Then, for our table trade, ensure the following Table Properties are defined. If you are configuring a different dataset you can adjust as appropriate.

setting value
Type partitioned
Partition Column timestamp
Timestamp Column timestamp

Once all tables have been added, select Next.

Deploy Database

From the next screen you can simply select Save & Deploy.

This will take a few seconds and the grey circle will become a green tick and change to Active status when ready.

That's it, you are now ready to Ingest Data!

Step 2: Ingesting Live Data

Importing from Kafka

Apache Kafka is an event streaming platform that can be easily published to and consumed by kdb Insights Enterprise. The provided Kafka feed has live price alerts that we will consume in this Recipe.

Select Import from the Overview panel to start the process of loading in new data. Select the Kafka node and enter the details as listed below to connect to the datasource.

Here are the broker and topic details for the price feed on Kafka.

Setting Value
Broker* 34.130.174.118:9091
Topic* spx
Offset* End

Decoding

The event data on Kafka is of JSON type so you will need to use the related decoder to transform to a kdb+ friendly format. This will convert the incoming data to a kdb+ dictionary.

Select the JSON Decoder option.
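
To see roughly what the decoder produces, here is a small sketch using q's built-in JSON parser .j.k on a hypothetical price message; the field names mirror the trade schema, but the exact payload of the feed is an assumption.

// parse an illustrative JSON event into a kdb+ dictionary
msg:"{\"timestamp\":\"2023-01-01D09:30:00\",\"price\":3852.36,\"volume\":1000}"
.j.k msg
// yields a dictionary: timestamp maps to a string, price and volume to floats,
// which is why the schema and function steps that follow are needed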

Transforming

The next screen that comes up is Apply Schema. This is a useful tool that transforms the upstream data to the correct datatype of the destination database.

For Data Format, select Table from the dropdown.

Next, click on the blue "+" icon next to Parse Strings; this will pop up a Load Schema dialog.

You can then select the equities database and the trade table from the dropdowns and select "Load".

Leave Parse Strings set to Auto for all fields.

Writing

Finally, from the Writer dropdown select the equities database and the trade table to write the data to, then Open Pipeline.

Apply Function

Before we deploy our pipeline there is one more node to add. The data has already been converted from the incoming JSON format on the Kafka feed to a kdb+-friendly dictionary.

We need to further convert this to a kdb+ table before deploying so it is ready to save to our database. We can do this by adding a Function Map node; the transformation from the kdb+ dictionary format to a kdb+ table is done using enlist.

Expression

In this section you are transforming the kdb+ dictionary format using enlist. Copy and paste the below code into your Map node and select Apply.

{[data]
    // enlist the incoming kdb+ dictionary to return a one-row kdb+ table
    enlist data
}
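
As a quick illustration of what this node does, enlisting a dictionary yields a one-row table whose column names are the dictionary keys (toy values shown; in the pipeline the dictionaries come from the JSON decoder).

// a decoded message is a dictionary of column name to value
d:`timestamp`price`volume!(2023.01.01D09:30:00.000000000;3852.36;1000i)
enlist d      / one-row table with columns timestamp, price and volume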

You will need to remove the existing edge by hovering over the connecting line between the Decoder and Transform nodes, right-clicking, and selecting Remove Edge. You can then add links to and from the Function Map node.

Deploying

Next, you can Save your pipeline, giving it a name that is indicative of what you are trying to do. Then you can select Deploy.

Once deployed you can check on the progress of your pipeline back in the Overview pane where you started. When it reaches Status=Running then it is done and your data is loaded.

Exploring

Select Explore from the Overview panel to start the process of exploring your loaded data.

Select SQL at the top of the page and enter the following SQL to retrieve the count of the trade table.

SELECT COUNT(*) FROM trade
Define an 'Output Variable' and then select 'Get Data' to execute the query.

As new messages appear on the Kafka topic they will be added to the kdb+ table; a rerun of the query will show an increase in the data count as a result.

Nice! You have just successfully set up a table consuming live events from a Kafka topic, with your data updating in real time.

Step 3: Ingesting Historical Data

Importing from Object Storage

Object storage is a means of storing unstructured data, eliminating the scaling limitations of traditional file storage. Limitless scale is the reason object storage is the storage of the cloud; Amazon, Google and Microsoft all employ object storage as their primary storage.

In this recipe we will be importing the historical trade data from Azure Blob Storage.

Select Import from the Overview panel to start the process of loading in new data.

Enter the corresponding details listed to connect to Microsoft Azure Storage.

setting value
Path* ms://kxevg/close_spx.csv
Account* kxevg
File Mode* Binary
Offset* 0
Chunking* Auto
Chunk Size* 1MB

Decoding

Select the CSV Decoder option.

setting value
Delimiter ,
Header First Row

Transforming

The next screen that comes up is Configure Schema. You can select the close table, following similar steps to those outlined in the previous section, and parse all fields.

Remember to Parse

Generally, for all fields transformed after CSV decoding it's necessary to parse the data.
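
This is because the CSV decoder emits every field as a string; parsing casts each string to the kdb+ type declared in the schema. A minimal q sketch of the underlying casts (illustrative values):

"D"$"2022-01-03"      / date
"F"$"4796.56"         / float
"J"$"3831020000"      / long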

Writing

Finally, from the dropdown select the equities database and the close table to write this data to and select Open Pipeline.

Deploying

The next step is to deploy your pipeline that has been created in the above Importing stage.

You can first Save your pipeline, giving it a name that is indicative of what you are trying to do. Then you can select Deploy.

Once deployed you can check on the progress of your pipeline back in the Overview pane where you started. When it reaches Status=Running then it is done and your data is loaded.

Exploring

Select Explore from the Overview panel to start the process of exploring your loaded data.

Select SQL and enter the following SQL to retrieve the contents of the close table.

SELECT * FROM close
Define an 'Output Variable' and then select 'Get Data' to execute the query.

Step 4: Create Powerful Analytics

Prices Over Time

We will query the S&P index data to determine the best time to buy and sell. To do this we can run a simple statement: select SQL and enter the following SQL to retrieve the trade table.

SELECT * FROM trade
Define an 'Output Variable' (the q code below assumes it is named t) and then select 'Get Data' to execute the query. Notice that the output can be viewed in the console, as a table, or even rendered into a chart with rich visualization. When switching between tabs you will need to rerun the query.

In this example, we render a chart that displays prices over time.

Long and Short Position Simple Moving Averages

So far in this recipe, we have just looked at raw data. Where we begin to add real business value is in the ability to run advanced data analytics. We will do this by creating a trade signal using two simple moving average calculations, one of which is faster (i.e. has a shorter time period) than the other, slower average. The trade position is created on the crossover between the two moving averages. This is an "always-in-the-market" signal, so we will either be Long or Short the market: long when the faster moving average is above the slow one, and vice versa for short.

We can do this in kdb+/q by creating a variable named analytics for calculating our fast and slow simple moving average. We can then visualize the output in a chart.

// taking the price we calculate two simple moving averages:
// 1) shortMavg on a window of 10 periods and
// 2) longMavg on a window of 60 periods
analytics : select timestamp,
                   price,
                   shortMavg:mavg[10;price],
                   longMavg:mavg[60;price]
            from t
In the demo we have set 10-period and 60-period windows for our fast and slow simple moving averages. These updates will be streamed into our chart. In a real use case scenario, the periods used by the fast and slow simple moving averages would be back-tested to maximize returns and reduce transaction costs.
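
If you want to experiment with the window sizes as part of such a back-test, one approach is to wrap the calculation in a function so the fast and slow periods become parameters. This is a sketch only: mavgs is a hypothetical helper name, and t is the output variable defined earlier.

// hypothetical helper: moving averages with configurable fast/slow windows
mavgs:{[fast;slow;tbl]
    select timestamp, price,
           shortMavg:mavg[fast;price],
           longMavg:mavg[slow;price]
      from tbl}

analytics:mavgs[10;60;t]     / same result as the query above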

Notice how you can zoom into a particular trading time window for a deeper look at the data trend at a point in time.

Positions

Next, let's create a new variable called positions that uses the data derived from the previous analytics table to help indicate when to buy and sell our assets.

// when shortMavg and longMavg cross each other we create the position 
// 1) +1 to indicate the signal to buy the asset or
// 2) -1 to indicate the signal to sell the asset
positions : select timestamp,
                   position:?[shortMavg<longMavg;-1;1],
                   return:log[price % prev price] 
                from analytics 

q/kdb+ functions explained

If any of the above code is new to you don't worry, we have detailed the q functions used above:

  • ?[x;y;z] when x is true return y, otherwise return z
  • x<y returns true when x is less than y, otherwise return false
  • log to return the natural logarithm
  • x%y divides x by y
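
To make these building blocks concrete, here is a small sketch on toy values (illustrative only, not data from the live feed):

// vector conditional: -1 where the first list is below the second, otherwise 1
?[1 2 3<2 2 2;-1;1]          / -1 1 1

// one-period log returns on toy prices; the first value is null as there is no prior price
p:100 101 103 102f
log p % prev p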

Active vs. Passive Strategy

An active strategy requires someone to actively manage a fund or account, while passive investing involves tracking a major index like the S&P 500 or a basket of stocks, such as an ETF. Find out more about each, including their pros and cons, on Investopedia.

In this recipe we will look to determine if our active strategy performs better than our passive one. To do this we will add new indicators that compare the two and save the result as a variable called performance.

performance : select timestamp, 
                     benchmark: exp sums return, 
                     strategy: exp sums return * prev position 
                from positions 

q/kdb+ functions explained

If any of the above code is new to you don't worry, we have detailed the q functions used above:

  • exp raise e to a power where e is the base of natural logarithms
  • sums calculates the cumulative sum
  • * to multiply
  • prev returns the previous item in a list
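
A short sketch on the same toy prices shows how exp sums turns log returns back into a cumulative growth curve; the 0f^ fill of the leading null is added only so this toy example evaluates cleanly.

p:100 101 103 102f
r:log p % prev p             / log returns; the first value is null
exp sums 0f^r                / 1 1.01 1.03 1.02 = growth relative to the first price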

As you can see in the chart visualization, the active strategy in red outperforms the passive benchmark position in blue, suggesting the potential for greater returns using an active strategy.

Step 5: Reporting

In the previous section we saw that we can create new variables from within the Query window and visualize on the fly there.

There is also a further reporting suite that can be accessed by selecting Reports from the Overview pane. Visualizations can be built using drag-and-drop and point-and-click operations in an easy-to-use editor.

Reports can only use data that has been persisted to a database

By default Reports can only access data that has been saved to an existing database. So in order to use derived data like the analytics created in the previous section, we must first save the output to a database.

Persisting Analytics

In cases where we want the results of our scratchpad analytics to persist we use a pipeline. Better still, we can do this by simply amending our earlier pipeline from Step 2.

Start by opening the pipeline created in Step 2 - we will be adding some new nodes to it.

Our desired output is to have one table of trade price data, and a second analytics table with our position signals.

Split Data

The first step is to split the incoming feed. We will do this by adding the Split node just before the schema node in the pipeline.

This node does not require any configuration.

The first branch of the node will continue as before to feed into the Apply Schema node. Details for building our second branch now follow.

This would be a good time to select "Save As" and give your pipeline a new name.

Add Timer Window

Next, we will add a Timer Window node to aggregate the incoming data by time. Data aggregation helps summarize data into a summary form to make it easier to understand and consume by the end user.

We will set the period to 5 seconds (0D00:00:05); this is how often the window will fire.

Derivation of Analytics

This is the step in which we add all the business logic that is needed. We will use a Map node here and add the following code:

// the following code calculates the fields:
// 1) vwap = weighted average price, adjusted by volume, over a given time period
// 2) twap = weighted average price over a given time period
// 3) open = price at which a security first trades when an exchange opens for the day
// 4) high = highest price the security traded at
// 5) low = lowest price the security traded at
// 6) close = price at which a security last trades when an exchange closes for the day
// 7) volume = volume of trades placed (used to weight vwap)
{[data]
    select last timestamp,
           vwap:|[1;volume] wavg price,
           twap:last[price]^(next[timestamp]-timestamp) wavg price,
           open:first price,
           high:max price,
           low:min price,
           close:last price
        from update "P"$timestamp,
             volume:|[1;volume]
        from data
    }

kdb+/q functions explained

If any of the above code is new to you don't worry, we have detailed the q functions used above:

  • first/last to get the first/last record
  • | to return the greater of 1 and volume
  • ^ (fill) to replace any nulls with the last price
  • wavg to get the weighted average
  • - to subtract
  • max to get the maximum value
  • min to get the minimum value
  • "P"$ to cast to the timestamp datatype
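
As a concrete illustration of the vwap building blocks above, here is a toy calculation with made-up prices and volumes:

price:10 11 12f
volume:100 200 50

|[1;volume]                  / 100 200 50: guards against zero or missing volume
|[1;volume] wavg price       / 10.85714 = (100*10 + 200*11 + 50*12) % 350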

For those interested in learning more and getting a quick and practical introduction to the q language, check out our Introduction to KX Workshop.

Transform, Write & Deploy

In this final step we will add Transform and Writer nodes, as before, to apply our schema and write the data down. These steps are the same as in the previous section.

This time ensure you select the analytics table in the equities database.

Finally, you can Save your pipeline and then select Deploy.

Once deployed you can check on the progress of your pipeline back in the Overview pane where you started. When it reaches Status=Running then it is done and your data should be loading in real time to the trade and analytics tables.

Reporting

Now that we have successfully saved down our analytics table we can use the Reports to build some more advanced visualizations.

The Free Trial version of kdb Insights Enterprise comes with a ready-made completed Report called Equities, accessible from the left-hand toolbar. You can simply open this.

The Equities Report highlights the live Kafka stream and the VWAP, high, low and close data that was derived from the pipelines created earlier in this recipe.

The report also includes daily historical data loaded from our Azure Blob Storage, meaning we have streaming live tick data alongside historical data for a big-picture view of the market.

The Signals Report shows the position of the trade: green if Long (a buyer) or red if Short (a seller). The trade position is also illustrated by the color coding in the table on the left and in the mini bar and bubble charts at the top and bottom of the central price chart. The central price chart shows the Long and Short signal lines overlaying price; a trade is initiated when these lines cross.


Create your own Report

Note that you can also create your own Report by selecting the plus icon in the top toolbar and selecting Report.

Please see our documentation for guidelines on Report creation and design.

You can also download the completed report Equities.json and drop it into a new blank Report.