On Disk Queries

From Kx Wiki
Revision as of 13:23, 30 May 2014 by Jonny Press (Talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Introduction

These scripts allow you to build an on-disk database and run some queries against it. The database is randomly generated utility (smart meter) data for different customers in different regions and industry sectors, along with some associated payment information. The idea is to allow you to see some of the language and performance. There is some more information in the README file.

Building The Database

The database is built by running the buildsmartmeterdb.q script. You can very the number of days of data to build, and the number of customer records per day. When you run the script, some information will be printed. Type

go[]

to proceed.

jonny$ q buildsmartmeterdb.q 
KDB+ 3.1 2014.05.03 Copyright (C) 1993-2014 Kx Systems

This process is set up to save a daily profile across 61 days for 100000 random customers with a sample every 15 minute(s).
This will generate 9.60 million rows per day and 585.60 million rows in total
Uncompressed disk usage will be approximately 224 MB per day and 13664 MB in total
Compression is switched OFF
Data will be written to :./smartmeterDB

To modify the volume of data change either the number of customers, the number of days, or the sample period of the data.  Minimum sample period is 1 minute
These values, along with compression settings and output directory, can be modified at the top of this file (buildsmartmeterdb.q)

To proceed, type go[]
q)go[]                                                                                                    
2014.05.15T09:17:42.271 Saving static data table to :./smartmeterDB/static
2014.05.15T09:17:42.291 Saving pricing tables to :./smartmeterDB/basicpricing and :./smartmeterDB/timepricing
2014.05.15T09:17:42.293 Generating random data for date 2013.08.01
2014.05.15T09:17:47.011 Saving to hdb :./smartmeterDB
2014.05.15T09:17:47.775 Save complete
2014.05.15T09:17:47.776 Generating random data for date 2013.08.02
2014.05.15T09:17:52.459 Saving to hdb :./smartmeterDB
2014.05.15T09:17:53.196 Save complete
2014.05.15T09:17:53.196 Generating random data for date 2013.08.03
2014.05.15T09:17:58.006 Saving to hdb :./smartmeterDB
2014.05.15T09:17:58.734 Save complete
2014.05.15T09:17:58.734 Generating random data for date 2013.08.04
2014.05.15T09:18:03.438 Saving to hdb :./smartmeterDB
2014.05.15T09:18:04.194 Save complete

... snip ...

2014.05.15T09:23:09.689 Generating random data for date 2013.09.30
2014.05.15T09:23:14.503 Saving to hdb :./smartmeterDB
2014.05.15T09:23:15.266 Save complete
2014.05.15T09:23:15.271 Saving payment table to :./smartmeterDB/payment/


2014.05.15T09:23:15.281 HDB successfully built in directory :./smartmeterDB
2014.05.15T09:23:15.281 Time taken to generate and store 585.60 million rows was 00:05:33
2014.05.15T09:23:15.281 or 1.76 million rows per second

Running The Queries

Once the database has been built, you can start the tutorial by running smartmeterdemo.q. This will print an overview of the database, and a lot of information as to how to step through the queries. Each query will show you the code, a sample of the results, and timing and memory usage information for the query. You can also see the code in smartmeterfunctions.q (you will also see comments in the code in this script which should help explain how it works).

jonny$ q smartmeterdemo.q 
KDB+ 3.1 2014.05.03 Copyright (C) 1993-2014 Kx Systems


DATABASE INFO
-------------
This database consists of 5 tables.
It is using 0 slaves.
There are 61 date partitions.

... snip ... 

To start running queries, execute .tut.n[].

... snip ...

Run .tut.help[] to redisplay the below instructions
Start by running .tut.n[].

.tut.n[]     : run the Next example
.tut.p[]     : run the Previous example
.tut.c[]     : run the Current example
.tut.f[]     : go back to the First example
.tut.j[n]    : Jump to the specific example
.tut.db[]    : print out database statistics
.tut.res     : result of last run query
.tut.gcON[]  : turn garbage collection on after each query
.tut.gcOFF[] : turn garbage collection off after each query
.tut.help[]  : display help information
\\           : quit

q).tut.n[]                                                                                                                                                        

**********  Example 0  **********

Total meter usage for every customer over a 10 day period

Function meterusage has definition:

{[startdate; enddate]
 
 start:select first usage by meterid from meter where date=startdate;
 end:select last usage by meterid from meter where date=enddate;
 
 end-start}

2014.05.15T09:27:55.260 Running: meterusage[2013.08.01;2013.08.10]
2014.05.15T09:27:56.817 Function executed in 1557ms using 387.0 MB of memory

Result set contains 100000 rows.
First 10 element(s) of result set:

meterid | usage   
--------| --------
10000000| 3469.449
10000001| 2277.875
10000002| 4656.111
10000003| 2216.527
10000004| 2746.24 
10000005| 2349.073
10000006| 3599.034
10000007| 2450.384
10000008| 1939.314
10000009| 3934.089

Garbage collecting...

*********************************

q)

Experimentation

Please experiment! There are lots of things to try:

UI

The Smart Meter demo includes a basic UI intended for Business Intelligence type usage. It is intended as a simple example of an html5 front end talking directly to the q database. It is not intended as a demonstration of capability in building advanced BI tools or complex GUIs. It is to show the performance of kdb+ slicing and dicing the data in different ways directly from the raw dataset, and the q language - the whole report is done in one function, usagereport. There isn't any caching of data, there aren't any tricks.

The report allows for 3 things:

To access the UI, start the smart meter demo database on port 5600 and point your browser (Chrome or Firefox only please!) at http://localhost:5600/smartmeter.html. If the date range becomes large, the query will take more time and memory. Similarly, grouping by hour can increase the time taken. It will be possible to exceed the 32bit memory limits of the free version of kdb+ when working on large datasets.

Smartmeterui.png

What's Next?

Maybe you want to start building your own database- a good place to start is to load some data from csv.

Or perhaps you want to try one of the Tutorials.

Personal tools
Namespaces
Variants
Actions
Navigation
Print/export
Toolbox