Programming idioms

How do I select all the columns of a table except one?

If the table has many columns, it is tedious and error-prone to write

q)select c1,c2,... from t

In q we can write

q)delete somecolumn from t

Here, delete does not modify the table t in place.

Watch out

delete does not work on historical databases.

How do I select a range of rows in a table by position?

You can use the row index i.

q)select from tab where i within 42 57

How do I extract the milliseconds from a time?

Given

q)time: 09:10:52.139

we can extract hour, minute and second like this

q)(time.hh; time.mm; time.ss)
9 10 52

To query the milliseconds

q)time mod 1000
139

How do I populate a table with random data?

This is useful for testing. The ? operator can be used to generate random values of a given type.

q)n:1000
q)stocks:`goog`amzn`msft`intel`amd`ibm
q)trade:([]stock:n?stocks; price:n?100.0;amount:100*10+n?20;time:n?24:00:00.000)
q)trade
stock price     amount time
-----------------------------------
ibm   94.1497   2800   10:45:14.943
amzn  96.1774   2800   03:17:33.371
ibm   3.95321   2200   04:53:09.818
goog  11.10307  1000   19:27:15.894
msft  64.73216  1900   17:32:42.558
intel 19.51964  2600   05:05:58.680
ibm   10.18318  1000   05:47:46.437
...

How do I get the hourly lowest and highest price?

This query

q)select price by stock,time.hh from trade

groups the prices hourly by stock

stock hh| price                                                              ..
--------| -------------------------------------------------------------------..
amd   0 | 76.03805 3.632539 16.6526 77.27191 79.27451 7.501845               ..
amd   1 | 16.35361 85.9618 27.60804 61.91134 6.921016 84.99082 50.05533 81.10..
amd   2 | 47.61621 15.33209 23.64018 88.34472                                ..
...

The price column is a list. To get the low and high prices, we need to apply list functions that produce that result:

q)select low: min price,high: max price by stock,time.hh from trade
stock hh| low       high
--------| ------------------
amd   0 | 3.632539  79.27451
amd   1 | 6.921016  85.9618
amd   2 | 15.33209  88.34472
...

How can I extract the time of the lowest and highest prices?

This query makes use of where to extract the time matching the high/low prices:

q)t: `time xasc ([] time:09:30:00.0+100000?23000000; sym:100000?`AAPL`GOOG`IBM; price:50+(floor (100000?0.99)*100)%100)
q)select high:max price,low:min price,time_high:first time where price=max price,time_low:first time where price=min price by sym,time.hh from t
sym  hh| high  low time_high    time_low    
-------| -----------------------------------
AAPL 9 | 50.98 50  09:31:01.254 09:32:19.141
AAPL 10| 50.98 50  10:01:24.975 10:04:21.228
AAPL 11| 50.98 50  11:00:04.438 11:00:19.517
AAPL 12| 50.98 50  12:01:24.891 12:01:09.768
AAPL 13| 50.98 50  13:00:35.044 13:01:15.162
AAPL 14| 50.98 50  14:02:37.634 14:01:06.998
AAPL 15| 50.98 50  15:00:42.958 15:01:24.288
GOOG 9 | 50.98 50  09:30:21.404 09:30:12.264
..

How do I extract regular time series from observed quotes?

Given this table containing observed quotes:

q)t: `time xasc ([] time:09:30:00.0+100000?23000000; sym:100000?`AAPL`GOOG`IBM; bid:50+(floor (100000?0.99)*100)%100; ask:51+(floor (100000?0.99)*100)%100);
q)t
time         sym  bid   ask  
-----------------------------
09:30:00.143 IBM  50.75 51.09
09:30:00.192 IBM  50.03 51.56
09:30:00.507 GOOG 50.23 51.47
09:30:00.540 IBM  50.49 51.22
..

We can extract the last observation of each ‘second’ time period:

q)`second xasc select last bid,last last ask by sym,1 xbar time.second from select from t
sym  second  | bid   ask  
-------------| -----------
AAPL 09:30:00| 50.45 51.4 
GOOG 09:30:00| 50.43 51.04
IBM  09:30:00| 50.49 51.22
AAPL 09:30:01| 50.68 51.11
..

However this solution will skip periods where there were no observations. For example, in the table generated for this document there were no observations for AAPL and IBM at 09:30:03:

..
IBM  09:30:02| 50.39 51.15
GOOG 09:30:03| 50.26 51.94
AAPL 09:30:04| 50.13 51.66
GOOG 09:30:04| 50.07 51.62
IBM  09:30:04| 50.61 51.14
..

A better solution would be to use aj:

q)res: aj[`sym`time;([]sym:`AAPL`IBM`GOOG) cross ([]time:09:30:00+til `int$(16:00:00 - 09:30:00)); select `second$time,sym,bid,ask from t]
q)`time xasc res
sym  time     bid   ask  
-------------------------
AAPL 09:30:00 50.45 51.4 
IBM  09:30:00 50.49 51.22
GOOG 09:30:00 50.43 51.04
AAPL 09:30:01 50.68 51.11
IBM  09:30:01 50.2  51.48
GOOG 09:30:01 50.59 51.72
AAPL 09:30:02 50.3  51.54
IBM  09:30:02 50.39 51.15
GOOG 09:30:02 50.74 51.09
AAPL 09:30:03 50.3  51.54
IBM  09:30:03 50.39 51.15
GOOG 09:30:03 50.26 51.94
AAPL 09:30:04 50.13 51.66
IBM  09:30:04 50.61 51.14
GOOG 09:30:04 50.07 51.62
..

When a millisecond resolution is required, this solution might offer better performances:

q)aapl:select from t where sym=`AAPL
q)res:([]time),'(`bid`ask#aapl) -1+where deltas @[;count[d]-1;:;count time]d:(time:09:30:00.0+til`int$10:00:00.0-09:30:00.0) bin aapl`time
q)select from res where time>09:31:00
time         bid   ask  
------------------------
09:31:00.001 50.41 51.13
09:31:00.002 50.41 51.13
09:31:00.003 50.41 51.13
09:31:00.004 50.41 51.13
09:31:00.005 50.41 51.13
..

How do I select the last n observations for each sym?

A couple of solutions using the same table than in the previous examples:

q)ungroup select sym,-3#'time,-3#'bid,-3#'ask from select time,bid,ask by sym from t where time<15:00:00
sym  time         bid   ask  
-----------------------------
AAPL 14:59:58.564 50.94 51.28
AAPL 14:59:59.450 50.54 51.17
AAPL 14:59:59.650 50.42 51.87
GOOG 14:59:59.159 50.42 51.41
GOOG 14:59:59.302 50.52 51.66
GOOG 14:59:59.742 50.52 51.25
IBM  14:59:56.439 50.01 51.81
IBM  14:59:56.556 50.38 51.33
IBM  14:59:57.116 50.96 51.45
q)select sym,time,bid,ask from t where time<15:00:00,3>(idesc;i) fby sym
sym  time         bid   ask  
-----------------------------
IBM  14:59:56.439 50.01 51.81
IBM  14:59:56.556 50.38 51.33
IBM  14:59:57.116 50.96 51.45
AAPL 14:59:58.564 50.94 51.28
GOOG 14:59:59.159 50.42 51.41
GOOG 14:59:59.302 50.52 51.66
AAPL 14:59:59.450 50.54 51.17
AAPL 14:59:59.650 50.42 51.87
GOOG 14:59:59.742 50.52 51.25

How do I calculate vwap series?

Use xbar and wavg:

q)t: `time xasc ([] time:09:30:00.0+100000?23000000; sym:100000?`AAPL`GOOG`IBM; price:50+(floor (100000?0.99)*100)%100; size:10*100000?100);
q)aapl: select from t where sym=`AAPL,size>0
q)select vwap:size wavg price by 5 xbar time.minute from aapl 
minute| vwap     
------| ---------
09:30 | 50.474908
09:35 | 50.461356
09:40 | 50.46645 
09:45 | 50.493585
09:50 | 50.48062 
..

Q for Mortals 9.13.4 Meaty Queries

How do I extract regular-size vwap series?

One quick solution is to use xbar with the running size sum to average the wanted periods:

q)t:`time xasc ([]time:N?.z.t;price:0.01*floor 100*10 + N?1.0;size:N?500)
q)select time:last time ,vwap: size wavg price by g:1000 xbar sums size from t
g    | time         vwap     
-----| ----------------------
0    | 00:41:19.862 10.771082
1000 | 01:25:20.920 10.656847
2000 | 02:15:20.433 10.522944
3000 | 02:35:18.721 10.295203
4000 | 02:58:30.142 10.320519
5000 | 05:54:20.645 10.564838
..

However this solution approximates regular trade series: the last trade of each period can overflow the requested size (1000 in the examples) and the overflowing amount will not be used in the following period. This query highlights the problem:

q)update total:sum each size from select time:last time,vwap: size wavg price,size by g:1000 xbar sums size from t
g    | time         vwap      size                  total
-----| --------------------------------------------------
0    | 00:41:19.862 10.771082 409 297 80 59 116     961  
1000 | 01:25:20.920 10.656847 126 451 244 149 29    999  
2000 | 02:15:20.433 10.522944 85 417 107 50         659  
3000 | 02:35:18.721 10.295203 422 477 477           1376 
4000 | 02:58:30.142 10.320519 408 19 305            732  
5000 | 05:54:20.645 10.564838 406 277 58 106 323    1170 
..

A more precise (and elaborate) solution consist of splitting the edge trades into two parts so that each size bar sum up to the requested even amount:

rvwap : { [t;size_par]
 // add the bucket and the total size
 t:update bar:size_par xbar tot from update tot:sums size from t;
 // get the indices where the bar changes
 ind:where differ t`bar;
 // re-index t, and sort (duplicate the rows where the bucket changes)
 t:t asc (til count t),ind;
 // shift all the indices due to the table now being larger
 ind:ind+til count ind;
 // update the size in the first trade of the new window and modify the bar
 t:update size:size-tot-bar,bar:size_par xbar tot-size from t where i in ind;
 // update the size in the second trade of the new window
 t:update size:tot-bar from t where i in 1+ind;
 // calculate stats
 select last time,price:size wavg price,sum size by bar from t
 }

q)rvwap[t;1000]
bar  | time         price    size
-----| --------------------------
0    | 00:44:53.037 10.76246 1000
1000 | 01:29:02.279 10.64794 1000
2000 | 02:25:13.605 10.49144 1000
3000 | 02:52:50.130 10.24878 1000
4000 | 03:11:20.253 10.37125 1000
5000 | 06:01:35.780 10.57725 1000
..

How do I apply a function to a sequence sliding window?

Use the scan adverb to build a moving list adding one item at the time from the original table, and the _ drop operator to discard older values.

This example calculates the moving average for a sliding window of 3 items. Note how the second parameter sets the size of the sliding window.

q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]
0 0.33333333 1 2 3 4 5 6 7 8
q)// trace the sliding window
q)swin[0N!; 3; til 10]
0 0 0
0 0 1
0 1 2
1 2 3
2 3 4
3 4 5
..

A different approach based on prev, inserting 0N at the beginning of the window rather than 0:

q)swin2:fwv:{x/'[flip reverse prev\[y-1;z]]}
q)swin2[avg;3;til 10]
0 0.5 1 2 3 4 5 6 7 8
q)swin2[::;3;til 10]
    0
  0 1
0 1 2
1 2 3
2 3 4
3 4 5
..

swin2:fwv finds all the windows using prev and converge-iterate. To find windows of size 3:

q)prev\[2;x:3 5 7 2 4 3 7]
3 5 7 2 4 3 7
  3 5 7 2 4 3
    3 5 7 2 4
q)flip reverse prev\[2;x]
    3 
  3 5
3 5 7
5 7 2
7 2 4
2 4 3
4 3 7
q)max each flip prev\[2;x]
3 5 7 7 7 4 7
q)(3 mmax x)~max each flip prev\[2;x]
1b

A third, more memory efficient function, m is:

q)m:{last{(a;x 1;x[2],z y x[1]+a:1+x 0)}[;z;x]/[n:count z;(0-y;til y;())]}

For larger windows, time and space may be important...

q)\ts swin[max;1000;10000?10]
71 82473552
q)\ts 1000 mmax 10000?10
76 524592
q)\ts m[max;1000;10000?10]
205 401984
q)\ts fwv[max;1000;10000?10]
491 213061888

…c.f. smaller window

q)\ts w mmax v
1 393440
q)\ts fwv[f:max;w:10;v:10000?10]
6 2656576
q)\ts swin[f;w;v]
12 1702416
q)\ts m[f;w;v]
50 262784

To index:

q)w:{(til[count z]-m)+x each flip reverse prev\[m:y-1;z]}
q)x w[{x?max x};3;x]
3 5 7 7 7 4 7

which is useful for addressing other lists:

q)update top:date w[{x?max x};3;volume] from ([]date:2000.01.01+til 7;volume:x)
date       volume top
----------------------------
2000.01.01 3      2000.01.01
2000.01.02 5      2000.01.02
2000.01.03 7      2000.01.03
2000.01.04 2      2000.01.03
2000.01.05 4      2000.01.03
2000.01.06 3      2000.01.05
2000.01.07 7      2000.01.07

Grouping over non-primary keys

Consider a table with multiple entries for the same value of the stock column.

q)table:([]stock:(); price:())
q)insert[`table; (`ibm`ibm`ibm`ibm`intel`intel`intel`intel; 1 2 3 4 1 2 3 4)]
0 1 2 3 4 5 6 7
q)table
stock price
-----------
ibm   1
ibm   2
ibm   3
ibm   4
intel 1
intel 2
intel 3
intel 4

If we select and group by stock ...

q)select by stock from table
stock| price
-----| -----
ibm  | 4
intel| 4

… only the last price is in the result.

What if we want the first price? We reverse the table.

q)select by stock from reverse table
stock| price
-----| -----
ibm  | 1
intel| 1

What if we want all the prices as a list? We use xgroup.

q)`stock xgroup table
stock| price
-----| -------
ibm  | 1 2 3 4
intel| 1 2 3 4

Getting the contents of the columns of a table

Imagine that we want to save a CSV file without the first row that contains the names of the table columns. This can be done by extracting the columns from a table, without the row name, and then saving in CSV format.

Here, we assume that tables are not keyed. We’ll use this table as our example:

q)trade
date       open  high  low   close volume   sym
------------------------------------------------
2006.10.03 24.5  24.51 23.79 24.13 19087300 AMD
2006.10.03 27.37 27.48 27.21 27.37 39386200 MSFT
2006.10.04 24.1  25.1  23.95 25.03 17869600 AMD
2006.10.04 27.39 27.96 27.37 27.94 82191200 MSFT
2006.10.05 24.8  25.24 24.6  25.11 17304500 AMD
2006.10.05 27.92 28.11 27.78 27.92 81967200 MSFT
2006.10.06 24.66 24.8  23.96 24.01 17299800 AMD
2006.10.06 27.76 28    27.65 27.87 36452200 MSFT

There are two ways of extracting columns from a table. The first one uses indexing by the column names:

q)cols trade
`date`open`high`low`close`volume`sym
q)trade (cols trade)
2006.10.03 2006.10.03 2006.10.04 2006.10.04 2006.10.05 2006.10.05 2006.10.06 ..
24.5       27.37      24.1       27.39      24.8       27.92      24.66      ..
24.51      27.48      25.1       27.96      25.24      28.11      24.8       ..
23.79      27.21      23.95      27.37      24.6       27.78      23.96      ..
24.13      27.37      25.03      27.94      25.11      27.92      24.01      ..
19087300   39386200   17869600   82191200   17304500   81967200   17299800   ..
AMD        MSFT       AMD        MSFT       AMD        MSFT       AMD        ..

The second one turns the table into a dictionary and then gets the values

q)value flip trade
2006.10.03 2006.10.03 2006.10.04 2006.10.04 2006.10.05 2006.10.05 2006.10.06 ..
24.5       27.37      24.1       27.39      24.8       27.92      24.66      ..
24.51      27.48      25.1       27.96      25.24      28.11      24.8       ..
23.79      27.21      23.95      27.37      24.6       27.78      23.96      ..
24.13      27.37      25.03      27.94      25.11      27.92      24.01      ..
19087300   39386200   17869600   82191200   17304500   81967200   17299800   ..
AMD        MSFT       AMD        MSFT       AMD        MSFT       AMD        ..

Now we can save in CSV format without the column names:

q)t: trade (cols trade)
q)save `:t.csv
`:t.csv
q)\cat t.csv
"2006-10-03,24.5,24.51,23.79,24.13,19087300,AMD"
"2006-10-03,27.37,27.48,27.21,27.37,39386200,MSFT"
"2006-10-04,24.1,25.1,23.95,25.03,17869600,AMD"
...

Using flip is more efficient (constant time):

q)\t do[1000000; value flip trade]
453
q)\t do[1000000; trade (cols trade)]
2984

However, for splayed tables, only indexing works:

C:\>.\q.exe dir
KDB+ 2.4t 2006.07.27 Copyright (C) 1993-2006 Kx Systems
w32/ 1cpu 384MB ...
q)\v
`s#`sym`trade
q)trade (cols trade)
2006.10.03 2006.10.03 2006.10.04 2006.10.04 2006.10.05 2006.10.05 2006.10.06 ..
24.5       27.37      24.1       27.39      24.8       27.92      24.66      ..
24.51      27.48      25.1       27.96      25.24      28.11      24.8       ..
23.79      27.21      23.95      27.37      24.6       27.78      23.96      ..
24.13      27.37      25.03      27.94      25.11      27.92      24.01      ..
19087300   39386200   17869600   82191200   17304500   81967200   17299800   ..
AMD        MSFT       AMD        MSFT       AMD        MSFT       AMD        ..
q)value flip trade
`:trade/

Column names as parameters to functions

Column names cannot be arguments to parameterized queries. A type error is signalled when a query like that is applied:

q)f:{[tbl; col; amt] select from tbl where col > amt}
q)f[trade][`volume][10000]
{[tbl; col; amt] select from tbl where col > amt}
'type

However, a query can also be built from a string. This allows column names to be used as query parameters. The above parameterized query can be written using strings:

q)f:{[tbl; col; amt] value "select from ", (string tbl), " where ", (string col), " > ", string amt}
q)f[`trade][`volume][10000]
date       open  high  low   close volume   sym
------------------------------------------------
2006.10.03 24.5  24.51 23.79 24.13 19087300 AMD
2006.10.03 27.37 27.48 27.21 27.37 39386200 MSFT
...

Executing queries built from strings has a performance penalty.

q)\t do[100000; select from table where volume > 10000]
234
q)\t do[100000; value "select from ", (string `trade), " where ", (string `volume), " > ", string 10000]
1250

An alternative to building queries by concatenating strings is to use parse trees.

q)f:{[tbl;col;amt] ?[tbl; enlist (>;col;amt); 0b; ()]}
q)f[trade; `volume; 10000]
date       open  high  low   close volume   sym
------------------------------------------------
2006.10.03 24.5  24.51 23.79 24.13 19087300 AMD
2006.10.03 27.37 27.48 27.21 27.37 39386200 MSFT

This is more efficient than using strings:

q)\t do[100000; ?[trade; enlist (>;`volume;10000); 0b; ()]]
312

Note

When f builds the query from

  • strings, tbl is passed by reference, e.g. `trade
  • parse trees, tbl is passed by value, e.g. trade

Glossary for passing by reference and value