A Brief Introduction to kdb+
Introduction to kdb+
kdb+ is a powerful database that can be used for streaming, real-time and historical data. q is the SQL-like general purpose programming language built on top of kdb+ that offers high-performance, in-database analytic capabilities.
The kdb+ 32-bit trial version can be downloaded from the Kx website. There are installation instructions in the README.txt contained in the zip download file. For Unix environments it is recommended to download a command line wrapper such as rlwrap.
At the shell prompt, type q to start a q console session where the prompt q) will appear.
Creating a table
To begin learning q/kdb+ we will create a simple table. To do this please type or copy the below code into your q session. Please ensure to remove the leading q) from these code snippets.
q)n:1000000; ￼￼￼￼q)item:`apple `banana `orange `pear; q)city:`beijing `chicago `london `paris;￼￼￼￼￼￼￼ ￼q)tab:(time:asc n?0D0;n?item;amount:n?100;n?city);
This code creates a table called tab which contains 1 million rows and 4 columns of random time-series sales data. For now, understanding these line of code is not important.
The first query we run selects all rows from the table where the item sold is a banana.
q)select from tab where item=`banana time item amount city ------------------------------------------ ￼￼￼￼￼￼0D00:00:00.048360228 banana 62 beijing 0D00:00:00.159745663 banana 27 london￼￼￼￼￼ 0D00:00:00.480262935 banana 40 london 0D00:00:00.548035651 banana 32 chicago￼￼￼￼￼￼ ..
Please note that all columns in the table are returned in the result when there is no column explicitly mentioned.
The next query will calculate the sum of the amounts sold of all items by each city.
q)select sum amount by city from tab city | amount￼￼￼￼￼￼ -------| -------- beijing| 12418161 ￼￼￼￼￼￼chicago| 12342736 london | 12367712￼￼￼￼￼￼ paris | 12383797
This uses the aggregate function sum within the q language. Please note that this returns a keyed table where the key column is city. This key column is returned in alphabetical order automatically by kdb+.
Time-Series Aggregate Query
The following query shows the sum of the amount of each item sold by hour during the day.
q)select sum amount by time.hh,item from tab hh item | amount ---------| ------￼￼￼￼￼￼ 0 apple | 526212 0 banana| 507660￼￼￼￼￼￼ 0 orange| 502360 0 pear | 511619￼￼￼￼￼￼ ..
The result is a keyed table with two key columns, hh for the hour and item. The results are ordered by the keyed columns. This query extracts the hour portion from the nanosecond-precision time column by adding a .hh to the column name.
Congratulations, you have now successfully created and queried your first kdb+ table!