A Brief Introduction to kdb+

From Kx Wiki
(Difference between revisions)
Jump to: navigation, search
m (Copied over from Fintan's pdf)
 
(Creating a table)
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
=Introduction to kdb+=
 
 
==Introduction==
 
==Introduction==
 
kdb+ is a powerful database that can be used for streaming, real-time and historical data. q is the SQL-like general purpose programming language built on top of kdb+ that offers high-performance, in-database analytic capabilities.
 
kdb+ is a powerful database that can be used for streaming, real-time and historical data. q is the SQL-like general purpose programming language built on top of kdb+ that offers high-performance, in-database analytic capabilities.
 
==Installation==
 
==Installation==
The kdb+ 32-bit trial version can be downloaded from the Kx website. There are installation instructions in the README.txt contained in the zip download file. For Unix environments it is recommended to download a command line wrapper such as rlwrap.
+
The kdb+ 32-bit trial version can be [http://kx.com/software-download.php downloaded] from the Kx website. See [[Tutorials/Installation|Installation]] for a step-by-step guide on installing kdb+.
 
==Starting kdb+==
 
==Starting kdb+==
 
At the shell prompt, type q to start a q console session where the prompt q) will appear.
 
At the shell prompt, type q to start a q console session where the prompt q) will appear.
<pre>
+
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;">
 
bash$q
 
bash$q
 
q)
 
q)
Line 12: Line 15:
 
==Creating a table==
 
==Creating a table==
 
To begin learning q/kdb+ we will create a simple table. To do this please type or copy the below code into your q session. Please ensure to remove the leading q) from these code snippets.
 
To begin learning q/kdb+ we will create a simple table. To do this please type or copy the below code into your q session. Please ensure to remove the leading q) from these code snippets.
<pre>
+
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;">
 
q)n:1000000;
 
q)n:1000000;
q)item:`apple `banana `orange `pear;
+
q)item:`apple`banana`orange`pear;
q)city:`beijing `chicago `london `paris;
+
q)city:`beijing`chicago`london`paris;
q)tab:([]time:asc n?0D0;n?item;amount:n?100;n?city);
+
q)tab:([]time:asc n?0D0;n?item;amount:n?100;n?city);
 
</pre>
 
</pre>
 
This code creates a table called tab which contains 1 million rows and 4 columns of random time-series sales data. For now, understanding these line of code is not important.
 
This code creates a table called tab which contains 1 million rows and 4 columns of random time-series sales data. For now, understanding these line of code is not important.
 +
 
==Simple Query==
 
==Simple Query==
 
The first query we run selects all rows from the table where the item sold is a banana.
 
The first query we run selects all rows from the table where the item sold is a banana.
<pre>
+
<pre style="white-space: pre-wrap;
q)select from tab where item=`banana
+
white-space: -moz-pre-wrap;
time item amount city
+
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;">
 +
q)select from tab where item=`banana                                          
 +
time                 item   amount city  
 
------------------------------------------
 
------------------------------------------
0D00:00:00.048360228 banana 62 beijing
+
0D00:00:00.466201454 banana 31    london
0D00:00:00.159745663 banana 27 london
+
0D00:00:00.712388008 banana 86    london
0D00:00:00.480262935 banana 40 london
+
0D00:00:00.952962040 banana 20    london  
0D00:00:00.548035651 banana 32 chicago
+
0D00:00:01.036425679 banana 49    chicago
 +
0D00:00:01.254006475 banana 94    beijing
 
..
 
..
 
</pre>
 
</pre>
Line 36: Line 49:
 
==Aggregate Query==
 
==Aggregate Query==
 
The next query will calculate the sum of the amounts sold of all items by each city.
 
The next query will calculate the sum of the amounts sold of all items by each city.
<pre>
+
<pre style="white-space: pre-wrap;
q)select sum amount by city from tab
+
white-space: -moz-pre-wrap;
city | amount
+
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word;">
 +
q)select sum amount by city from tab                                          
 +
city   | amount 
 
-------| --------
 
-------| --------
beijing| 12418161
+
beijing| 12398569
chicago| 12342736
+
chicago| 12317015
london | 12367712
+
london | 12375412
paris | 12383797
+
paris | 12421447
 
</pre>
 
</pre>
  
Line 49: Line 66:
 
==Time-Series Aggregate Query==
 
==Time-Series Aggregate Query==
 
The following query shows the sum of the amount of each item sold by hour during the day.
 
The following query shows the sum of the amount of each item sold by hour during the day.
<pre>
+
<pre style="white-space: pre-wrap;
q)select sum amount by time.hh,item from tab
+
white-space: -moz-pre-wrap;
hh item | amount
+
white-space: -pre-wrap;
---------| ------
+
white-space: -o-pre-wrap;
0 apple | 526212
+
word-wrap: break-word;">
0 banana| 507660
+
q)select sum amount by time.hh,item from tab                                  
0 orange| 502360
+
hh item | amount
0 pear | 511619
+
---------| ------
 +
0 apple | 522704
 +
0 banana| 506947
 +
0 orange| 503054
 +
0 pear | 515212
 +
1  apple | 513723
 
..
 
..
 
</pre>
 
</pre>
Line 62: Line 84:
  
 
Congratulations, you have now successfully created and queried your first kdb+ table!
 
Congratulations, you have now successfully created and queried your first kdb+ table!
 +
 +
==What's Next?==
 +
Have a look at some more complex [[Sample_Queries|in-memory queries]].

Latest revision as of 14:58, 16 June 2016

Contents

Introduction

kdb+ is a powerful database that can be used for streaming, real-time and historical data. q is the SQL-like general purpose programming language built on top of kdb+ that offers high-performance, in-database analytic capabilities.

Installation

The kdb+ 32-bit trial version can be downloaded from the Kx website. See Installation for a step-by-step guide on installing kdb+.

Starting kdb+

At the shell prompt, type q to start a q console session where the prompt q) will appear.

bash$q
q)

Creating a table

To begin learning q/kdb+ we will create a simple table. To do this please type or copy the below code into your q session. Please ensure to remove the leading q) from these code snippets.

q)n:1000000;
q)item:`apple`banana`orange`pear;
q)city:`beijing`chicago`london`paris;
q)tab:([]time:asc n?0D0;n?item;amount:n?100;n?city);

This code creates a table called tab which contains 1 million rows and 4 columns of random time-series sales data. For now, understanding these line of code is not important.

Simple Query

The first query we run selects all rows from the table where the item sold is a banana.

q)select from tab where item=`banana                                            
time                 item   amount city   
------------------------------------------
0D00:00:00.466201454 banana 31     london 
0D00:00:00.712388008 banana 86     london 
0D00:00:00.952962040 banana 20     london 
0D00:00:01.036425679 banana 49     chicago
0D00:00:01.254006475 banana 94     beijing
..

Please note that all columns in the table are returned in the result when there is no column explicitly mentioned.

Aggregate Query

The next query will calculate the sum of the amounts sold of all items by each city.

q)select sum amount by city from tab                                            
city   | amount  
-------| --------
beijing| 12398569
chicago| 12317015
london | 12375412
paris  | 12421447

This uses the aggregate function sum within the q language. Please note that this returns a keyed table where the key column is city. This key column is returned in alphabetical order automatically by kdb+.

Time-Series Aggregate Query

The following query shows the sum of the amount of each item sold by hour during the day.

q)select sum amount by time.hh,item from tab                                    
hh item  | amount
---------| ------
0  apple | 522704
0  banana| 506947
0  orange| 503054
0  pear  | 515212
1  apple | 513723
..

The result is a keyed table with two key columns, hh for the hour and item. The results are ordered by the keyed columns. This query extracts the hour portion from the nanosecond-precision time column by adding a .hh to the column name.

Congratulations, you have now successfully created and queried your first kdb+ table!

What's Next?

Have a look at some more complex in-memory queries.

Personal tools
Namespaces
Variants
Actions
Navigation
Print/export
Toolbox