9. Queries: qSQL
9.0 Overview
We call the collection of functions for manipulating tables qSQL, since many of them resemble their SQL counterparts in form or function. The usual suspects such as insert, select, update, are present, as well as functionality that is not available in traditional SQL. But appearances can be deceiving: there are significant differences in the syntax and behavior.
The first important difference is that a q table has ordered rows and columns. This is particularly useful when dealing with records that arrive in time order. Appending them to a table ensures that they enter - and stay - in order. Subsequent select operations retrieve the records in order without any need for sorting.
A second difference is that a q table is stored physically as a collection of column lists. This means that operations on column data are vector operations. Moreover, for simple column lists, atomic, aggregate and uniform functions applied to columns are especially simple and fast since they reduce to direct memory addressing.
A third difference is that qSQL provides upsert semantics. Recall that upsert semantics on a dictionary mean that when a key-value pair is applied with , and the key is present the value is updated; otherwise the pair is inserted. In the context of tables and keyed tables, which are both dictionaries, this has far-reaching consequences for many common operations, including joins. Upsert semantics permeate qSQL.
In this chapter, we cover the important features of qSQL, beginning with a simple examples for each Eventually more complex examples are introduced.
Some examples are based on the sp.q script included in the installation files of earlier versions of q. Here it is in case you do not have it.
s:([s:`s1`s2`s3`s4`s5]
name:`smith`jones`blake`clark`adams;
status:20 10 30 20 30;
city:`london`paris`paris`london`athens)
p:([p:`p1`p2`p3`p4`p5`p6]
name:`nut`bolt`screw`screw`cam`cog;
color:`red`green`blue`red`blue`red;
weight:12 17 17 14 12 19;
city:`london`paris`rome`london`paris`london)
sp:([]
s:`s$`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s4; / fkey
p:`p$`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5; / fkey
qty:300 200 400 200 100 100 300 400 200 200 300 400)
You should load and display these tables in your console session to familiarize yourself with them.
9.1 Inserting Records
There are multiple ways to insert – i.e., append – records in q.
Tip
The upsert operator is superior to insert and is to be preferred. We include insert for nostalgia only.
9.1.1 Append Using Amend
Since a table is (logically) a list of records, it is possible to append records in place using ,:. Type checking on field values is performed.
q)t:([] name:`symbol$(); iq:`int$())
q)t,:([name:`Beeblebrox; iq:42])
q)t,:([name:`Dent; iq:98.0])
'type
Amend can also be used with a row of naked field values provided the fields align exactly with the target columns and have the right types.
q)t,:(`Prefect; 126)
Note
Applying amend to a table repeatedly with the same argument results in duplicate records.
q)t,:(`Prefect; 126)
q)t,:(`Prefect; 126)
q)t
name iq
--------------
Beeblebrox 42
Prefect 126
Prefect 126
Prefect 126
You can use amend to append to a keyed table using the full record form, but you will quickly see why no one uses this notation.
q)kt:([eid:`long$()] name:`symbol$(); iq:`long$())
q)kt,:(enlist ([eid:1001]))!enlist ([name:`Beeblebrox; iq:42])
Dictionary definition notation is more palatable.
q)kt,:([eid:1001; name:`Beeblebrox; iq:42])
It is even easier to use naked field values, provided they align exactly with both the key and value columns.
q)kt,:(1002; `Dent; 98)
eid | name iq
----| -------------
1001| Beeblebrox 42
1002| Dent 98
Note
Applying amend to a table repeatedly with the same argument results in duplicate records.
q)kt,:(1002; `Dent; 99)
q)kt
eid | name iq
----| -------------
1001| Beeblebrox 42
1002| Dent 99
9.1.2 Basic insert
Those who prefer the familiarity of SQL can append records to an existing global table using the binary function insert whose left operand is symbol containing the name of a global table (target) and whose right argument is a record, or list of records, conforming to the target. The result is a list of integers representing the offset(s) of
the appended record(s).
Important
Since insert essentially reduces to amend in place, fields are type checked.
- If the target column is simple the type must match exactly.
- If the target column is an untyped empty list, the result will take the type of the field in the initial
insert.
For a regular - i.e., non-keyed - table, the effect of insert is to append a new record with the specified field values. Let's use our simple example.
q)t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
Here is how to insert a single record or the equivalent row list.
q)t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
q)`t insert ([name:`Slartibartfast; iq:134])
,3
q)`t insert (`Marvin; 150)
,4
Tip
Repeatedly applying insert to a table with the same argument results in duplicate records.
q)`t insert (`Marvin; 150)
,5
q)`t insert (`Marvin; 150)
,6
q)t
name iq
------------------
Dent 42
Beeblebrox 98
Prefect 126
Slartibartfast 134
Marvin 150
Marvin 150
Marvin 150
Important
Since a list of conforming records is a table, inserting a list of records is the same as inserting a table. Observe the use of 3# to truncate the table.
q)t:3#t
q)`t insert (([name:`Slartibartfast; iq: 134]); ([name:`Marvin; iq:200]))
3 4
q)t
name iq
------------------
Dent 42
Beeblebrox 98
Prefect 126
Slartibartfast 134
Marvin 200
q)t:3#t
q)`t insert ([] name:`Slartibartfast`Marvin; iq:134 200)
3 4
You can also use insert in prefix form, possibly with the table name projected. For example, the previous insert can be written as,
q)insert[`t; (`Slartibartfast; 134)]
,5
q)insert[`t;] (`Marvin; 150)
,6
9.1.3 Columnar Bulk Insert
We have seen that it is possible to insert a naked list of row values instead of the full record dictionary. We have also seen that it is possible to bulk insert a list of conforming records, which amounts to a table with the same metadata as the target.
It is also possible to bulk insert naked field values but there is q twist. To bulk insert naked field values, you provide a list of columns not a list of rows.
q)t:([] name:`Dent`Beeblebrox; iq:98 42)
q)`t insert ((`Prefect; 126); (`Marvin; 200))
'type
q)`t insert (`Prefect`Marvin; 126 200)
2 3
After a brief q meditation you will realize that this is consistent with the previous bulk insert of a table, since a table is a collection of columns. From this perspective, we should view the insertion of a single naked row more correctly as a degenerate list of column lists.
9.1.4 Insert into Empty Tables
Inserting into a table with empty lists of general type causes the result table to take the type of the first record inserted. In particular, an atomic field in the inserted record results in a simple column with its type.
q)t:([] name:(); iq:())
q)`t insert (`Dent;98)
,0
q)meta t
c | t f a
----| -----
name| s
iq | j
This is fine as long as all the types in the initial record are correct. Should any have an unintended type, subsequent records of the correct type will all be rejected.
q)t:([] name:(); iq:())
q)`t insert (`Dent;98.0)
,0
q)`t insert (`Beeblebrox; 42)
'type
Tip
It is good practice to type all columns in an empty table. This will ensure that incorrect types are rejected and correct ones succeed.
q)t:([] name:`$(); iq:`int$())
q)
q)`t insert (`Dent;98.0)
'type ^
q)`t insert (`Beeblebrox; 42)
,0
It is also possible to insert a list of conforming records (i.e., a table) into a table that does not exist. This is the same as assigning the table to a variable of the specified name.
q)tnew
'tnew
q)`tnew insert enlist ([c1:`a; c2:10])
,0
9.1.5 Insert and Foreign Keys
When inserting data into a table that has foreign key(s) the values destined for the foreign key column(s) are checked to ensure that they appear in the primary key column(s) pointed to by the foreign key(s). This is referential integrity (well, half of it).
Returning to our previous foreign key example.
q)kt:([eid:1001 1002 1003]
name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
q)tdetails:([] eid:`kt$1003 1002 1001 1002 1001;
sc:126 36 92 39 98)
q)`tdetails insert (1002;42)
,5
q)`tdetails insert (1042;150)
'cast
Recall that enumeration is a form of cast.
9.1.6 Insert into Keyed Tables
You can use insert to append data to keyed table, but this normally does not have the desired semantics. Specifically, you can insert into a keyed table only if the key value is not already in the table. For this and other reasons upsert should normally be used instead of insert.
Since a keyed table is a dictionary, to use insert we should properly provide a dictionary entry comprising a key record and a value record.
q)kt:([eid:1001 1002] name:`Dent`Beeblebrox; iq:98 42)
q)`kt insert (enlist ([eid:1003]))!enlist ([name:`W; iq:21])
,2
Yikes! Nobody ever does this. Instead use dictionary definition notation.
q)`kt insert ([eid:1003; name:`W; iq:21])
,2
Or you can provide a list of raw field values with the proviso that they align exactly across the key and value columns.
q)`kt insert (1005; `Marvin; 200)
,3
q)`kt insert (1004;`Slartibartfast; 158)
,4
Repeating the last insert now fails because the key value 1004 already exists.
q)`kt insert (1005; `Marvin; 200)
'insert
Note
The records in the keyed table are stored in insert order rather than key order.
eid | name iq
----| ------------------
1001| Dent 98
1002| Beeblebrox 42
1003| W 21
1005| Marvin 200
1004| Slartibartfast 158
9.2 upsert
The upsert operator is like insert, only better. Except for the last sub-section on keyed tables, the examples in the previous section all work the same for upsert.
9.2.1 upsert Replacing insert
Here we repeat some examples from the previous section to demonstrate that upsert can (and should) be used in its place to append rows to a table.
q)t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
q)`t upsert (`name`iq)!(`Slartibartfast; 134)
`t
q)`t upsert (`Marvin; 150)
`t
q)`t upsert ([] name:`Slartibartfast`Marvin; iq:134 200)
`t
q)t:3#t
q)upsert[`t; (`Slartibartfast; 134)]
`t
Important
To bulk upsert naked field values, use rows instead of columns.
q)`t upsert ((`Prefect; 126); (`Marvin; 200))
`t
9.2.2 upsert by Name
A limitation of insert is that it uses pass-by-name, so it can only operate against global tables. In contrast, (upsert) a supports both pass-by-name and pass-by-value. Thus it can be used with anonymous or local tables.
q)([] c1:`a`b; c2:10 20) upsert (`c; 30)
c1 c2
-----
a 10
b 20
c 30
q)f:{t:([] c1:`a`b; c2:10 20); t upsert x}
q)f (`c; 30)
c1 c2
-----
a 10
b 20
c 30
9.2.3 upsert on Keyed Tables
Observe that insert has normally undesirable semantics on keyed tables – i.e., it rejects a record when the key already exists. What we want is well upsert semantics.
q)kt:([id:1001 1002] name:`Dent`Beeblebrox; iq:42 98)
q)`kt insert (1002; `Beeblebrox; 43)
'insert
q)`kt upsert (1002; `Beeblebrox; 43)
`kt
This is another reason to use upsert instead of insert.
9.2.4 upsert on Empty Table
You can use upsert even when the table is empty or doesn't exist. Ensure that you provide a list of records.
q)() upsert enlist ([c1:`a; c2:10])
c1 c2
-----
a 1
q)ktnew
'ktnew
q)`ktnew upsert ([] c1:`a`b; c2:10 20)
`ktnew
q)ktnew
c1 c2
-----
a 10
b 20
This is very useful in practice. Make sure the type of the initial record is correct as that will establish the types of all columns.
9.2.5 upsert on Persisted Tables
Advanced
You can use upsert to append records to serialized and splayed tables. Simply pass the handle of the file or splayed directory as the name of the table. This is the final strike against insert.
We serialize a table and then append a row to it. Here we are writing to a new directory located in QHOME.
q)`:Q4M/tser set ([] c1:`a`b; c2:1.1 2.2)
`:Q4M/tser
q)`:Q4M/tser upsert (`c; 3.3)
`:Q4M/tser
q)get `:Q4M/tser
c1 c2
------
a 1.1
b 2.2
c 3.3
Note
Upserting to a serialized table reads the entire table into memory, updates it and writes out the result.
Next we splay a table and then append a row to it. Observe that we ensure that all symbols are enumerated, as required for splayed tables.
q)`:Q4M/tsplay/ set ([] c1:`:sym?`a`b; c2:1.1 2.2)
`:Q4M/tsplay/
q)`:Q4M/tsplay upsert (`:sym?`c; 3.3)
`:Q4M/tsplay
q)\l Q4M/tsplay
`tsplay
q)select from tsplay
c1 c2
------
a 1.1
b 2.2
c 3.3
Tip
Upserting to a partitioned table does not read the persisted image into memory; rather, it appends to the ends of the column files. This allows incremental creation of large splayed (or partitioned) tables by upserting chunks that comfortably fit into memory. See 14.2 for details on working with splayed tables.
9.3 The select Template
In this section, we investigate the general form of select, which like all qSQL templates, has required and optional elements. The template elements, in turn, contain phrases which are expressions presumably involving column values of a specified table. The template is converted by the interpreter into a functional form and applied against the table to produce a result table. While the syntax and behavior resemble the analogous SQL statement, the underlying mechanics are quite different.
Important
The result of select is always a table.
We examine each of the constituents of select in detail. We introduce the concepts with illustrative examples using trivial tables so that the basic mechanics are not obscured by large or complex data.
9.3.1 Syntax
The select template has the following form, where elements enclosed in matching angle brackets (<...>) are optional.
select <ps> by pb from texp where pw
The select and from keywords are required; omission or mistyping either results in an error. The table expression texp, which is any q expression whose reduced value is a table or keyed table, is also required. The remaining elements ps, pb and pw are optional. They are called the select, the by and the where phrases, respectively.
Tip
If where is present and texp is itself a select expression, the inner expression should be enclosed in parentheses to avoid confusing you or the interpreter about the binding of the where phrase.
Each phrase in the select template is a comma-sequenced list of subphrases. A subphrase is an arbitrary q expression presumably involving columns of texp or columns of another table accessed via foreign key. The evaluation of subphrases within a phrase is sequenced left-to-right by the commas, but each subphrase expression is evaluated right-to-left, like any q expression.
Important
The commas separating the subphrases are syntactic separators, so it is not necessary to enclose a subphrase in parentheses unless the expression contains the operator ,. An occurrence of the operator , within any template phrase must be enclosed in parentheses to distinguish it from the separator. Forgetting this is very easy and painful.
The order of evaluation of the select template is:
fromexpression texpwherephrase pwbyphrase pbselectphrase ps
To start, the from expression, which is an arbitrary q expression, is fully evaluated and is expected to result in a table or keyed table.
9.3.2 The select Phrase
We begin our examples with the select phrase because it is the easiest. We use the following table for our examples.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
The select phrase specifies the columns in the result table, one per subphrase. If the select phrase ps is absent, all columns are returned. There is no need for the * wildcard of SQL.
q)select from t
c1 c2 c3
---------
a 10 1.1
b 20 2.2
c 30 3.3
q)t~select from t
1b
In other words, to specify result columns, list them separated by commas. Whitespace after commas is optional but some think it improves readability, especially for complicated queries.
q)select c1,c3 from t
c1 c3
------
a 1.1
b 2.2
c 3.3
9.3.2.1 Result Column Names
To specify column names for the result, prepend the name followed by colon to the subphrase.
q)select c1, res:2*c2 from t
c1 res
------
a 20
b 40
c 60
Notes
- The colon used to specify a name is not assignment; it is simply part of the syntax of the template.
- Just as with the use of colon in a variable assignment, the column name is not a symbol and cannot be parameterized. Use functional form if you need this.
- Unlike in SQL, columns in the select phrase do not actually exist until the final result table is returned. Thus an assigned column cannot be used in other column expressions, even to the right of it.
If you do not provide a name for a computed column, q determines one.
- Normally the name is taken from the left-most term in the column expression.
- When q cannot determine a name it uses
x. - If q's chosen name duplicates a previously determined column name, it will suffix it with
1,2, etc. to make it unique. Recall that columns are sequenced from left-to-right:
q)select c1, c1, 2*c2, c2+c3, string c3, 42 from t
c1 c11 x c2 c3 x1
-----------------------
a a 20 11.1 "1.1" 42
b b 40 22.2 "2.2" 42
c c 60 33.3 "3.3" 42
9.3.2.2 The Virtual Column i
A virtual column i containing the offset of each record within the table is implicitly available in the select phrase - i.e., i is the row number.
q)select i, c1 from t
x c1
----
0 a
1 b
2 c
Unfortunately select does not carry the name of its own virtual column i to the result so you will have to name it if you don't like the default name (I don't).
9.3.2.3 select distinct
The special form select distinct returns only unique records in the result - i.e., it eliminates duplicates.
q)select distinct from ([] c1:`a`b`a; c2:10 20 10)
c1 c2
-----
a 10
b 20
9.3.2.4 select[n]
You can return the first or last n records in a select by using function parameter syntax with select. A positive integer parameter returns the first records, a negative parameter the last. This and following examples use the sample tables referred to in the introduction to this chapter. Make sure you have loaded them to run the examples.
q)select[2] from s where city<>`athens
s | name status city
--| -------------------
s1| smith 20 london
s2| jones 10 paris
q)select[-1] from s where city<>`athens
s | name status city
--| -------------------
s4| clark 20 london
Of course we could achieve the same result using # after the select (i.e., to its left).
q)2#select from s where city<>`athens
s | name status city
--| -------------------
s1| smith 20 london
s2| jones 10 paris
q)-1#select from s where city<>`athens
s | name status city
--| -------------------
s4| clark 20 london
The difference is that the # construct requires computing the entire result set and then keeping only the desired rows, whereas 'select[n]' only creates the desired number of rows which is faster for large tables.
This syntax is extended to select[n m] where m is the starting row number and n is the number of rows.
q)select[1 2] from s where city<>`athens
s | name status city
--| ------------------
s2| jones 10 paris
s3| blake 30 paris
One final extension of the syntax specifies a column to be sorted inside the brackets. For ascending sort, place < before the column name and for descending sort use >.
q)select[>name] from s where city<>`athens
s | name status city
--| -------------------
s1| smith 20 london
s2| jones 10 paris
s4| clark 20 london
s3| blake 30 paris
q)select[<city] from s where city<>`athens
s | name status city
--| -------------------
s1| smith 20 london
s4| clark 20 london
s2| jones 10 paris
s3| blake 30 paris
You can combine the two syntax extensions by separating them with a semicolon.
q)select[2; >name] from s where city<>`athens
s | name status city
--| -------------------
s1| smith 20 london
s2| jones 10 paris
9.3.2.5 Select on Nested Columns
You can use select on tables with nested columns but things become more complicated. The rule of thumb is that you will need iterators...lots of iterators.
Let's take a simple example.
q)show tnest:([] c1:`a`b`c; c2:(10 20 30; enlist 40; 50 60))
c1 c2
-----------
a 10 20 30
b ,40
c 50 60
What can we do with this table? We cannot apply an aggregate or uniform operation straight to c2 since the fields do not conform.
q)select avg c2 from tnest
'length
We can apply an aggregate or uniform function to each field of c2 in tnest.
q)select avg each c2 from tnest
c2
--
20
40
5
Similarly we can use Each Parallel to compute a weighted average using two columns.
q)update c3:(1.1 2.2 3.3; enlist 4.4; 5.5 6.6) from `tnest
`tnest
q)select wtavg:c2 wavg' c3 from tnest
wtavg
--------
2.566667
4.4
6.1
9.3.3 Filtering with where
The 'where' phrase controls which records of the input table are used in the query. The effect of the 'where' phrase is to include only the records that meet its criteria.
9.3.3.1 Basic where
The action is a generalization of the built-in (where) function on lists (See 3.13). Recall that a table is logically a list of records.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)t where t[`c2]>15
c1 c2 c3
---------
b 20 2.2
c 30 3.3
q)select from t where c2>15
c1 c2 c3
---------
b 20 2.2
c 30 3.3
9.3.3.2 The Virtual Column i in where
The virtual column i is useful for paginating a table. Use within, which returns a boolean indicating whether the left operand is in the closed interval specified in the right operand, to determine the bounds of the page.
q)tbig:100#t
q)select from tbig where i within 50 99
c1 c2 c3
---------
c 30 3.3
…
Tip
To use this construct with non-literal values as the endpoints, you must use general list notation, since simple list notation cannot be used with variables.
q)st:50
q)end:99
q)select from tbig where i within (s;e)
…
9.3.3.3 Multiple where Subphrases
Each where subphrase is a predicate expression that produces a boolean result vector corresponding to records passing or failing a criterion. The logical effect of multiple subphrases as is if they were joined with "and". Note that parentheses are necessary in the second query. Continuing with t from §9.3.3.1.
q)r1:select from t where c2>15,c3<3.0
q)r2:select from t where (c2>15)&c3<3.0
q)r1~r2
1b
However, since the where subphrases are sequenced from left-to-right, their order affects the actual processing. As each subphrase is applied, only the records it passes are tested in the next subphrase. The net effect is a progressively narrowed sublist of rows to consider. There is often an optimal order that significantly narrows in the first one or few subphrases, which can greatly reduce the processing.
Important
Place the most limiting where subphrase first, followed by others in decreasing strictness.
Consider the following table comprising a million observations of two variables a and b taken every millisecond starting at midnight. On the author's laptop, the version of the query that narrows the time interval first executes in a millisecond whereas the one specifying the variable name first takes 9 milliseconds.
q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)\t select from t where c1 within 00:00:01.000 00:00:01.999, c2=`a
1
q)\t select from t where c2=`a, c1 within 00:00:01.000 00:00:01.999
9
9.3.3.4 Nested Columns in where
Nested columns generally require iterators, and the where phrase is no different. A common mistake made by qbies is when trying to find a specific string in a column of strings because they forget that strings are not first class in q.
q)t:([] f:1.1 2.2 3.3; s:("abc";enlist "d";"ef"))
q)select from t where s="ef"
'length
=
q)select from t where s~"ef"
f s
---
q)select from t where s~\:"ef"
f s
--------
3.3 "ef"
The first query is in error because it tests atomic equality between a simple list and a nested list. The second query does not achieve the desired result because it asks if the entire column matches the specified string. The final query works because it tests the specified string for match against each string in the column.
Tip
For looking up strings, the like operator is efficient. See A.1.1 for a detailed description but we only need the special case that asks if two strings match exactly - i.e., no wildcards. The fact that
like is pseudo-atomic in the left operand makes it suited for comparing against a column. It is actually faster than the above expression with ~\: for large lists.
q)select from t where s like "ef"
f s
--------
3.3 "ef"
9.3.3.5 fby in where (Advanced)
A common use case in a query is to filter on groups. For example, in the following example using the table p from the distribution samples, we wish to include only the records having the maximum weight in their respective cities. We could start with a correlated sub query.
q)select max weight by city from p
city | weight
------| ------
london| 19
paris | 17
rome | 17
But you run into the issue of what to do with the other fields in the subquery. Do you carry them along without aggregation, which loses the information of where the maximum occurs?
q)select name, color, max weight by city from p
city | name color weight
------| ----------------------------------
london| `nut`screw`cog `red`red`red 19
paris | `bolt`cam `green`blue 17
rome | ,`screw ,`blue 17
Or do you apply other aggregates that will pick values that are uncorrelated with the original?
q)select first name, first color, max weight by city from p
city | name color weight
------| ------------------
london| nut red 19
paris | bolt green 17
rome | screw blue 17
In SQL you would use HAVING, but q is having none of that. Instead use fby in the where phrase. Since it returns the value of the aggregate across each group, you simply compare the target column to the fby result to get a boolean vector with 1b precisely at the records whose fields match the aggregate on the group.
Used in a where phrase, fby takes the form
(fagg;exprcol) fby c
The left operand is a two-item list comprising an aggregate function fagg and a column expression exprcol on which the function will be applied. The right operand c is the column to be grouped.
In our example,
q)select from p where weight=(max;weight) fby city
p | name color weight city
--| -------------------------
p2| bolt green 17 paris
p3| screw blue 17 rome
p6| cog red 19 london
Now we include another where phrase for the desired result.
q)select from p where weight=(max;weight) fby city,color=`blue
p | name color weight city
--| -----------------------
p3| screw blue 17 rome
To group on multiple columns, encapsulate them in an anonymous table in the right operand of fby.
q)t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;
ex:`N`O`N`N`N;time:12:10:00 12:30:00 12:45:00 12:50:00 13:30:00;
price:82.1 81.95 23.45 82.05 23.40)
q)select from t where price=(max;price) fby ([]sym;ex)
sym ex time price
----------------------
IBM N 12:10:00 82.1
IBM O 12:30:00 81.95
MSFT N 12:45:00 23.45
9.3.4 Grouping and Aggregation
In contrast to SQL, where grouping and aggregation are performed together, in qSQL they are independent. In this section we again use the tables defined in the sp.q script.
9.3.4.1 Aggregation without Grouping
When an aggregate function is applied against a column of simple type the result is an atom. If all columns in the select phrase are computed with aggregation and there is no grouping, the result will be a table with a single row – e.g., a summary or rollup. While q has many built-in aggregates, you can also define and use your own.
Here we calculate the total and mean order quantity using the built-in aggregates sum and avg.
q)select total:sum qty, mean:avg qty from sp
total mean
--------------
3100 258.3333
9.3.4.2 Grouping without Aggregation
The by phrase groups rows having common values in specified column(s), much like GROUP BY in SQL. The result of a query including a by phrase is a keyed table whose key column(s) are those specified in the by phrase. This is well-defined because the grouping along like values ensures uniqueness of the keys.
The action of the by phrase is a generalization of the built-in function (group) on lists (See A.5.17). A query that groups without aggregation results in nested columns. One way to think of this is that each group of values is folded into a single field in the result.
q)t:([] c1:`a`b`a`b`c; c2:10 20 30 40 50)
q)t[`c2] group t[`c1]
a| 10 30
b| 20 40
c| ,50
q)select c2 by c1 from t
c1| c2
--| -----
a | 10 30
b | 20 40
c | ,50
Tip
Grouping without aggregation is the most common way that qbies unintentionally create nested columns. Nested columns are slower, more cumbersome to use (they require a heavy dose of iterators) and are usually unnecessary. And you can’t just cast a nested column to a simple one; you must apply an aggregate or some other operation that flattens a list.
Observe that ungroup can be used to reverse the nested result of grouping without aggregation It is not quite an inverse since it returns the original records ordered on the by column(s).
q)ungroup select c2 by c1 from t
c1 c2
-----
a 10
a 30
b 20
b 40
c 50
There are use cases that group only on specified column(s) and want the remaining columns to be nested in the result. For this use xgroup, which takes the symbolic column name(s) to be grouped as left operand and a table as right operand. The result is a keyed table that is that same as listing all the non-grouped columns in the equivalent select.
Using the distribution example – i.e., the sample tables from the beginning of this chapter,
q)`p xgroup sp
p | s qty
--| -------------------------------
p1| `s$`s1`s2 300 300
p2| `s$`s1`s2`s3`s4 200 400 200 200
p3| `s$,`s1 ,400
p4| `s$`s1`s4 200 300
p5| `s$`s4`s4 100 400
p6| `s$,`s1 ,100
Again ungroup is only an inverse up to record order.
q)ungroup `p xgroup sp
s qty
---------
p1 s1 300
p1 s2 300
...
9.3.4.3 Grouping with Aggregation
Normally you will group with by together with aggregation in the select phrase. The effect is to aggregate along the groups, collapsing each group of rows into a single record. The result is a keyed table whose key columns are the grouped column(s).
q)select sum c2 by c1 from t
c1| c2
--| --
a | 40
b | 60
c | 50
To group on multiple columns, specify multiple by subphrases, which results in a compound key in the result.
q)t:([] desk:`a`b`a`b`a`b; acct:`1`2`3`4`1`4; pnl:1.1 -2.2 3.3 4.4 5.5 -.5)
q)select ct:count i,sum pnl by desk,acct from t
desk acct| ct pnl
---------| -------
a 1 | 2 6.6
a 3 | 1 3.3
b 2 | 1 -2.2
b 4 | 2 3.9
Tip
In contrast to SQL, every column in the by phrase is automatically included in the key columns of the result and should not be included in the select phrase.
Each by subphrase can be an arbitrary q expression, meaning that you can group on computed columns. This is very powerful and is not present in standard SQL. Following is a useful example that averages the observations of a time series in 100 millisecond buckets.
q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)select avg c3 by 100 xbar c1, c2 from t
c1 c2| c3
---------------| --------
00:00:00.000 a | 53.19259
00:00:00.000 b | 48.21786
00:00:00.100 a | 51.91764
…
Important
In a useful special case of implicit grouping with aggregation, specifying a by phrase together with no select phrase is equivalent to operating on all non-grouped columns with the aggregate last. A moment’s thought reveals this is quite useful for time series.
q)t:([] desk:`a`b`a`b`a`b; acct:`1`2`3`4`1`4; pnl:1.1 -2.2 3.3 4.4 5.5 -.5)
q)select by desk from t
desk| acct pnl
----| ---------
a | 1 5.5
b | 4 -0.5
9.4 The exec Template
The syntax of the exec template is identical to that of select.
exec ps by pb from texp where pw
Whereas select always returns a table, the type of the result of exec depends on the number of columns in its select phrase. One column with aggregate yields an atom; one column without aggregate yields a list; more than one column yields a dictionary.
When more than one column is specified in the select phrase, the result is a dictionary mapping the column names to the column lists produced. The essential difference with select is that the column lists do not have to be rectangular – i.e., they are not required to have the same length – and the resulting dictionary is not flipped into a table. For example, the following query fails with select because the column dictionary is not rectangular and so cannot be flipped into a table. But it succeeds with (exec).
q)t:([] name:`a`b`c`d`e; state:`NY`FL`OH`NY`HI)
q)select name, distinct state from t
'length
q)exec name, distinct state from t
name | `a`b`c`d`e
state| `NY`FL`OH`HI
The most common use of exec is when there is only one result column and no by phrase. The result is the computed column list devoid of other structure – i.e., not a dictionary or table. This is useful to extract a column dynamically. With t as above,
q)select name from t
name
----
a
b
c
d
e
q)exec name from t
`a`b`c`d`e
When using exec to extract a single column of a table you can place constraints on other columns.
q)exec name from t where state in `NY`HI
`a`d`e
9.5 The update Template
9.5.1 Basic update
The update template has identical syntax to select.
update pu by pb from texp where pw
The semantic difference is that colons in the update phrase pu identify modified or new columns instead of simply assigning column names. If the name to the left of a colon is a column that exists in the table, that column is updated with the result of the expression to the right of the colon. If the name to the left of a colon is not a column in the table, a new column with the result of the expression is added to the end of the column list. Note that this occurs against a copy of t.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c1:`x`y`z from t
c1 c2
-----
x 10
y 20
z 30
q)t
c1 c2
-----
a 10
b 20
c 30
q)update c3:`x`y`z from t
c1 c2 c3
--------
a 10 x
b 20 y
c 30 z
The naming conventions are the same as in select so you can omit the name and colon for existing columns in many cases. With t as above,
q)(update c2:c2+100 from t)~update c2+100 from t
1b
In the examples above, the table was passed by value and so the original was not modified. To modify the table in place, pass it by name.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c1:`x`y`z from `t
`t
q)t
c1 c2
-----
x 10
y 20
z 30
q)update c3:`x`y`z from `t
`t
q)t
c1 c2 c3
--------
x 10 x
y 20 y
z 30 z
Tip
Qbies coming from DDL may fail to appreciate how useful it is that update can add new columns dynamically. For example, you can add a “constant” column by taking advantage of that fact that scalars are extended to match vectors. Note that this is not a default value for the column in subsequent appends.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c3:42 from t
c1 c2 c3
--------
a 10 42
b 20 42
c 30 42
Often you will apply update with a 'where' phrase that limits the scope of the modification.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c2:c2+100 from t where c1<>`a
c1 c2
------
a 10
b 120
c 130
Tip
The actions in the where phrase and the update phrase are vector operations on entire column lists. This is the Zen of update.
If you add a new column in update with a where phrase, the fields in the non-selected rows will be the null from the appropriate type.
q)update c3:1 from t where c2>15
c1 c2 c3
--------
a 10
b 20 1
c 30 1
You can use any q expression to the right of the colon providing the size and type match the targeted location.
q)update c2:42 43 from t where c2>15
c1 c2
-----
a 10
b 42
c 43
q)update c2:42 43 44 from t where c2>15
'length
q)update c2:42.0 43 from t where c2>15
'type
9.5.2 update-by
When the by phrase is present, the update operation is performed along groups. This is most useful for aggregate and uniform functions. For an aggregate function, the entire group gets the value of the aggregation on the group.
q)update avg weight by city from p
p | name color weight city
--| -------------------------
p1| nut red 15 london
p2| bolt green 14.5 paris
p3| screw blue 17 rome
p4| screw red 15 london
p5| cam blue 14.5 paris
p6| cog red 15 london
A uniform function is applied along the group in place. This can be used to compute cumulative volume of orders, for example.
q)update cumqty:sums qty by s from sp
s p qty cumqty
----------------
s1 p1 300 300
s1 p2 200 500
s1 p3 400 900
..
9.6 The delete Template
The final template delete allows either rows or columns to be deleted. Its syntax is a simpler form of select, with the restriction that either pcols or pw can be present but not both.
delete pcols from texp where pw
If pcols is present as a comma-separated list of columns, the result is texp with the specified columns removed. If pw is present, the result is texp after records meeting the criteria of pw are removed. Someone always asks can you delete rows and column simultaneously but the rest of us meditating on the Zen of q realize this makes no sense.
As with update, when the table is passed by value, the operation is on a copy. When the table is passed by name, the operation is in place. Deleting from a copy,
q)t:([] c1:`a`b`c; c2:10 20 30)
q)delete c1 from t
c2
--
10
20
30
q)delete from t where c2>15
c1 c2
-----
a 10
q)t
c1 c2
-----
a 10
b 20
c 30
To delete in place,
q)delete from `t where c2=30
`t
q)delete c2 from `t
`t
q)t
c1
--
a
b
Tip
When you want to select all but a few columns, it is easier to delete the ones you don’t want than list all the ones you do.
q)t:([] c1:1 2; c2:`a`b; c3:1.1 2.2; c4:2025.01.01 2025.01.02)
q)(select c1, c2, c4 from t)~delete c3 from t
1b
9.7 Sorting
Recall that tables and keyed tables comprise lists of records and therefore have an inherent order. A table or keyed table can be reordered by sorting on any column(s). In contrast to SQL, there is in general no equivalent to ORDER BY in the select template. Instead the are built-in operators to sort tables that can be applied after select. However, see 9.7.4 for in-memory tables.
9.7.1 xasc
The binary xasc takes a scalar or list of symbol column names as its left operand and a table or table name as its right operand. It returns the records of the table sorted ascending on the specified column(s). The column list is in major-to-minor sort order.
To work on a copy pass by value.
q)t:([] c1:`a`b`c`a; c2:20 10 40 30)
q)`c2 xasc t
c1 c2
-----
b 10
a 20
a 30
c 40
q)`c1`c2 xasc t
c1 c2
-----
a 20
a 30
b 10
c 40
q)t
c1 c2
-----
a 20
b 10
c 40
a 30
To sort in place pass by name.
q)`c1`c2 xasc `t
`t
q)t
c1 c2
-----
a 20
a 30
b 10
c 40
9.7.2 xdesc
The binary xdesc behaves exactly as (xasc), except that the sort is performed in descending order.
q)t:([] c1:`a`b`c`a; c2:20 10 40 30)
q)`c1`c2 xdesc t
c1 c2
-----
c 40
b 10
a 30
a 20
9.7.3 Mixed sort
To combine ascending and descending sort compose xasc and xdesc. For example, to sort c2 descending within c1 ascending in t above, remember right-to-left.
q)`c1 xasc `c2 xdesc t
c1 c2
-----
a 30
a 20
b 10
c 40
9.7.4 Sorting Memory Tables
There is an extension to select for sorting non-memory-mapped tables – i.e., in-memory tables but not splayed, partitioned or segmented tables. In this situation you can add a column name and a sort indicator, '>' for descending or '<' for ascending, in brackets after select.
q)t:([] c1:`a`b`c`a; c2:20 10 40 30)
q)select[<c1] from t
c1 c2
-----
a 20
a 30
b 10
c 40
This can be combined with an indicator of how many rows to retrieve from the sorted result.
q)select[3; >c2] from t
c1 c2
-----
c 40
a 30
a 20
9.8 Renaming and Rearranging Columns
As the flip of a column dictionary, the columns of a table are both named and ordered. Similarly a keyed tabled is a dictionary whose columns also have names and that also has inherent order. There are built-in primitives to rename and reorder columns. Their names are unfortunately chosen and their usage may seem awkward at first since they focus on the left-most columns. Nonetheless, they work well in practice.
We use the following table in this section.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
9.8.1 xcol
The binary xcol takes a scalar or list of symbolic column names as its left operand (names) and a table or keyed table (source) as its right operand. The result is a table obtained by renaming the left-most columns of source according to names.
q)show `new1`new2 xcol t
new1 new2 c3
-------------
a 10 1.1
b 20 2.2
c 30 3.3
Note
There is no pass-by-name version of (xcol). To modify the source reassign it.
You can use the dictionary form of xcol to rename selected columns.
q)([c1:`C1; c3:`C3]) xcol t
C1 c2 C3
---------
a 10 1.1
b 20 2.2
c 30 3.3
9.8.2 xcols
Now you see what we mean about the unfortunate naming convention. The binary xcols takes a scalar or list of symbol column names as its left argument (names) and a table (source) as its right argument. The output is the result of reordering the columns in names so that those in names occur at the beginning – i.e., left-most in the display. Columns not specified in names are left in original order at the end – i.e., right-most.
Notes
- The source operand cannot be a keyed table.
- There is no pass-by-name version of
xcols. To modify the source you must reassign it.
For example,
q)`c3 xcols t
c3 c1 c2
---------
1.1 a 10
…
q)`c3`c2 xcols t
c3 c2 c1
---------
1.1 10 a
…
q)t
c1 c2 c3
---------
a 10 1.1
…
9.9 Joins
The essence of relational database design is normalizing data using relations and keys and then reassembling with joins. Normalization eliminates duplicate data, which takes up space and is hard to keep consistent. Joins restore the original flat rectangular form that makes data easy to work with (there’s a reason spreadsheets are so popular). Simple use cases are a master-detail relation or a lookup table.
In SQL the relational structure with primary and foreign keys is static. It must be defined in a separate language (DDL) before the data can be used. Foreign key/primary key relations must be pre-established in order for joins to take place.
In q, tables are first-class entities in the language. You can define tables and relations statically, but it is easy to create them dynamically. It is even possible to join tables that could have a relation but don’t.
A join sews back together data that has been factored into normal form along a foreign key/primary linkage. A join can be classified as inner or outer. An inner join pairs only records in both operands having matching keys. A left outer join includes all records in the left operand, pairing them with records having matching key in the right operand, should such exist. A right outer join reverses left and right in this description.
In q there are built-in inner and left outer joins; if you need a right join, reverse the operands. There is no operator for a full outer join but you can construct one; be careful what you wish for with large tables.
Joins can also be classified by how matching keys are determined. Most joins are equijoins, meaning that columns are tested for equality during matching. In q there are also non-equijoins, called “as of” joins, in which a column in one table is tested for less-than-or-equal against a column in another table.
9.9.1 Implicit joins
Given a primary key table m with key column(s) k and a table d with a foreign key linking to k, a left join can be expressed in various SQL notations. For example,
m LEFT JOIN d ON m.k = d.k
A SELECT statement for this join refers to columns in the join by using dot notation based on the constituent tables.
SELECT d.cold, m.colm FROM m LEFT JOIN d WHERE m.k = d.k
As we saw in Chapter 8, a foreign key in q is accomplished with an enumeration over the key column(s) of a keyed table. A left join is implicit in the following query on the detail table.
select cold, k.colm from d
For example, in the sp.q distribution script (see Introduction to Chapter 8), the table sp has foreign keys to both s and p. We can extract columns from the left join with s by issuing a query against sp and using dot notation on the foreign key to get columns in s.
q)select sname:s.name, qty from sp
sname qty
---------
smith 300
smith 200
smith 400
smith 200
clark 100
…
This generalizes to the situation where d has multiple foreign keys. For example, in the sp.q distribution script, we can select records from the join of sp, s and p from a query against sp.
q)select sname:s.name, pname:p.name, qty from sp
sname pname qty
---------------
smith nut 300
smith bolt 200
smith screw 400
smith screw 200
clark cam 100
…
Implicit joins extend to the situation in which the targeted keyed table itself has a foreign key to another keyed table.
q)emaster:([eid:101 102 103 104 105] curr:`gbp`eur`eur`gbp`eur)
q)update eid:`emaster$101 102 105 104 103 from `s
`s
q)select s.name, qty, s.eid.currency from sp
name qty curr
--------------
smith 300 gbp
smith 200 gbp
smith 400 gbp
smith 200 gbp
clark 100 gbp
…
9.9.2 Ad-hoc Left Join lj
You can create an ad-hoc left outer join between tables that could have a foreign key relationship using the binary lj. When the foreign key exists, that linkage is used; otherwise, the linkage is constructed dynamically. This join is 2-3 times faster if the foreign key has already been created.
Tip
If you intend to perform the join more than a few times, it pays to create the foreign key up front.
The right operand is a keyed table (target) and the left operand is a table or keyed table (source) having either a foreign key to target or column(s) that match the key column(s) of target in name and type. The result is all the records and columns of source augmented with the records and columns of target along matching keys. For those records in source having no matching key, the augmented columns contain null values.
In the following example we see all the records of t in the result, with null values in the kt column(s) where there is no matching key.
q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)t lj kt
k c v
-------
1 a
2 b 2.2
3 c 3.3
4 d 4.4
Observe that when the source table has a foreign key, an ad-hoc left join is equivalent to listing all columns from both tables in an implicit join.
q)kt:([k:1 2 3 4 5]; v1:10 20 30 40 50; v2:1.1 2.2 3.3 4.4 5.5)
q)tf:([] k:`kt$1 2 3 4; c:10 20 30 40)
q)(tf lj kt)~select k,c,k.v1,k.v2 from tf
1b
Important
When the tables source and target have duplicate non-key columns, lj has upsert semantics. That is, the values in the right operand (target) columns prevail over those in the left operand (source). This is different from SQL where the result contains both columns with suffixes to ensure unique names.
q)t:([] k:1 2 3 4; v:10 20 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)t lj kt
k v
-----
1 10
2 200
3 300
4 400
You can also use lj with a left-operand keyed table.
q)kt1:([k:1 2 3 4]; v:10 0N 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)kt1 lj kt
k| v
-| ---
1| 10
2| 200
3| 300
4| 400
9.9.3 Column Lookup
Folks coming to kdb from SQL-land tend to be join-happy because when the only tool you have is a hammer, everything looks like a nail. In q you can perform a column lookup against a keyed table within a query without using a join.
The insight is that a keyed table is a dictionary whose key list comprises its key records, so it will perform the lookup provided we put the column in an anonymous table conforming to those key records – see 8.4.11. Here we demonstrate the case where the lookup column names do not match the key columns, hence we rename columns to match in the anonymous table.
q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k1:2 3 4 5]; v:2.2 3.3 4.4 5.5)
q)select c, v:kt[([] k1:k); `v] from t
c v
------
10
20 2.2
30 3.3
40 4.4
Especially for a single column, this is simpler (and more impressive to your colleagues) than the equivalent join.
q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k1:1 2 3 4 5]; v1:1.2 2.2 3.3 4.4 5.5; v2:10 20 30 40 50)
q)select c, v1 from t lj `k xkey select k:k1, v1 from kt
c v1
------
10 1.2
20 2.2
30 3.3
40 4.4
Here is an example using compound keys and column renaming to match the lookup table.
q)t:([] f:`rocky`bullwinkle; l:`squirrel`moose; c:10 20)
kt:([fn:`rocky`bullwinkle`fearless; ln:`squirrel`moose`leader]
v:1.1 2.2 3.3)
q)select c, v:kt[([] fn:f; ln:l); `v] from t
c v
------
10 1.1
20 2.2
9.9.4 Ad-Hoc Inner Join ij
Given a primary key table m with key column(s) k and a table d with a foreign key linking to k, an inner join can be expressed in various SQL notations,
m,d WHERE m.k = d.k
m INNER JOIN d ON m.k = d.k
A SELECT statement for this join refers to columns in the join by using dot notation based on the constituent tables.
SELECT d.cold, m.colm FROM m,d WHERE m.k = d.k
The binary inner join operator ij performs an inner join between two tables that could have a foreign key relationship.
Tip
The same performance observations hold as for lj.
As with lj, the right operand is a keyed table (target) and the left operand is a table or keyed table (source) having column(s) that are either foreign key(s) over target or exactly match the key column(s) of target in name and type. The matching is done via the foreign key or by common column name(s) between source and the key column(s) of target if there is no foreign key relationship. The result contains the columns from source and target joined along common keys.
The following example shows that ij returns just those records with matching keys in both tables.
q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k:2 3 4 5]; v:2.2 3.3 4.4 5.5)
q)t ij kt
k c v
--------
2 20 2.2
3 30 3.3
4 40 4.4
As with lj, upsert semantics holds for duplicate columns.
q)t:([] k:1 2 3 4; v:10 20 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)t ij kt
k v
-----
2 200
3 300
4 400
You can also use ij with a left operand keyed table.
q)kt1:([k:1 2 3 4]; v:10 0N 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)kt1 ij kt
k| v
-| ---
2| 200
3| 300
4| 400
9.9.5 Equijoin ej
The ternary equijoin operator ej is an inner join between tables in the second and third parameters along specified column names in the first parameter. Unlike ij the right operand does not have to be a keyed table. As with all joins, upsert semantics holds on the columns.
q)t1:([] k:1 2 3 4; c:10 20 30 40)
q)t2:([] k:2 3 4 5; c:200 300 400 500; v:2.2 3.3 4.4 5.5)
q)ej[`k; t1; t2]
k c v
---------
2 200 2.2
3 300 3.3
4 400 4.4
q)ej[`k; t1; t2]~t1 ij `k xkey t2
1b
9.9.6 Plus Join pj
Plus join pj is a left join that replaces upsert semantics for duplicate column names with addition. This is useful when you have two tables with identical schemas whose non-key columns are all numeric. For example, you have an organizational hierarchy and you want to rollup numeric results.
The operands of pj are the same as for lj with the additional requirement that all non-key columns are numeric. The semantics are that duplicate columns are added along matching keys and missing or null values are treated as zero.
For example,
q)t:([] k:`a`b`c; a:100 200 300; b:10. 20. 30.; c:1 2 3)
q)kt:([k:`a`b] a:10 20; b:1.1 2.2)
q)t pj kt
k a b c
------------
a 110 11.1 1
b 220 22.2 2
c 300 30 3
Observe that this is the same as,
q)kt1:([k:`a`b`c] a:10 20 0; b:1.1 2.2 0.0; c:0 0 0)
q)t pj kt1
k a b c
------------
a 110 11.1 1
b 220 22.2 2
c 300 30 3
9.9.7 Union Join
The equivalent of an ordinary SQL union on tables with matching schemas is simply , except that SQL removes duplicates – use distinct of you need this. Here it joins two lists of compatible records.
q)t1:([] c1:`a`b; c2:1 2)
q)t2:([] c1:`c`d; c2:3 4)
q)t1,t2
c1 c2
-----
a 1
b 2
c 3
d 4
Union join uj is more powerful in that it vertically combines any two tables, or keyed tables. The records of the right operand are appended to those of the left operand in the following manner. The result table is widened with new columns of the same name and type for the columns of the right operand that do not appear in the left operand. The initial records in the result come from the left operand and have nulls in the new columns. Records from the right operand have their field values under the appropriate columns. An example is worth a thousand words.
q)t1:([] c1:`a`b`c; c2: 10 20 30)
q)t2:([] c1:`x`y; c3:8.8 9.9)
q)t1 uj t2
c1 c2 c3
---------
a 10
b 20
c 30
x 8.8
y 9.9
You can also apply uj to keyed tables, where upsert semantics hold on both rows and columns.
q)kt1:([k:1 2 3] v1:`a`b`c; v2:10 20 30)
q)kt2:([k:3 4] v2:300 400; v3:3.3 4.4)
q)kt1 uj kt2
k| v1 v2 v3
-| ----------
1| a 10
2| b 20
3| c 300 3.3
4| 400 4.4
Important
While uj is powerful and provides great flexibility for receiving dynamic records whose names and types may not be fully known at run time, it is also expensive when tables have to be widened to accommodate new columns. If you are using it to receive real-time data that comes as partial records containing only fields that are updated, performance will be much better if you initially create a table wide enough to hold all possible columns to be received and then just upsert the partial records to it. It is faster to remove empty columns later if necessary.
9.9.8 As-of Joins
As-of joins are so-named because they most often join tables along time columns to obtain a value in one table that is current as of a time in another table. As-of joins are non-equijoins that match on less-than-or-equal. They are not restricted to time values and can be applied to any types having underlying integral values.
The fundamental as-of join is the triadic function aj. It joins tables along common columns using most recent values. The syntax of aj is,
aj[`c1...`cn;t1;t2]
where `c1...`cn is a simple list of symbol column names common to t1 and t2, which are the tables to be joined. There is no requirement for any of the join columns to be keys but the join will be faster on keys. The columns of both tables are brought into the result.
The semantics of aj are as follows. The match on all specified columns except the last is by equals. Now assuming the records are sequenced by cn in both tables, for a given cn value in t1, the match picks the greatest cn in t2 less than or equal to the given value in t1. In essence, if the cn columns are sequenced temporal values, for each cn value in t1 the match picks the t2 row whose cn value is in effect "as of" the time in t1.
The canonical example for aj is matching trades with quotes. To know if a trade represents best execution, you want to compare the trade price to the current quote – i.e., the most recent quote up to and including the time of the trade. When matching the tables with trades and quotes, you clearly want an equi-join on the date and symbol and a non-equijoin on the time.
q)show t:([] ti:10:01:01 10:01:03 10:01:04;
sym:`msft`ibm`ge;
qty:100 200 150)
ti sym qty
-----------------
10:01:01 msft 100
10:01:03 ibm 200
10:01:04 ge 150
q)show q:([] ti:10:01:00 10:01:01 10:01:01 10:01:02;
sym:`ibm`msft`msft`ibm;
px:100 99 101 98)
ti sym px
-----------------
10:01:00 ibm 100
10:01:01 msft 99
10:01:01 msft 101
10:01:02 ibm 98
q)aj[`sym`ti;t;q]
ti sym qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm 200 98
10:01:04 ge 150
It is worth stepping through this example in detail. We begin with the first record in t. We look for the quote records matching `msft as of the trade time 10:01:01. There are two such records, both stamped 10:01:01 but the last one, which presumably occurred later, has price 101, which is reflected in the result. Next we look for the last quote record for `ibm as of 10:01.03, which is the quote record stamped 10:00:02 with price 98. Finally there are no quote records matching `ge through 10:01:04.
The simpler function asof performs the same match as aj but with a table against a single record. The result picks out the remaining columns in the matched row of the table.
q)t:([] ti:10:01:01 10:01:03 10:01:04;
sym:`msft`ibm`ge;
qty:100 200 150;
px:45 160 55)
q)t asof ([sym:`ibm; ti:10:01:03])
qty| 200
px | 160
A list of such dictionary records conforms – i.e., is a table – so (asof) against a table matches against each record.
q)t asof ([] sym:`msft`ibm; ti:10:01:01 10:01:03)
qty px
-------
100 45
200 160
There are several other variations of as-of joins closely related to aj. Whereas aj returns the boundary time from t1, aj0 returns the actual time from t2. Also ajf and ajf0 restore the behavior of aj and aj0 from q version 2.8 for backward compatibility. See the KX site for more detailed documentation.
9.9.9 Window Joins
Window joins are a generalization of as-of joins and are specifically geared for analyzing the relationship between trades and quotes in Finance. The idea is that you want to investigate the behavior of quotes in a neighborhood of each trade. For example, to determine how a trade was executed, you need to examine the range of bid and ask prices that were prevalent around the trade time.
Writing such a query manually would be cumbersome. Instead, q provides the built-in window join wj that computes on an interval around each trade. Specifying the set-up takes a bit of work but then all the real work is done for you.
So that things fit on the page, we use a simple example involving trades for one ticker symbol `aapl and times that are in seconds. The general case is essentially the same. First we create the trades and quotes tables; note that they are required to be non-keyed.
q)t:([]sym:3#`aapl;time:09:30:01 09:30:04 09:30:08;price:100 103 101)
q)t
sym time price
-------------------
aapl 09:30:01 100
aapl 09:30:04 103
aapl 09:30:08 101
q)q:([]sym:8#`aapl;
time:09:30:01+(til 5),7 8 9;
ask:101 103 103 104 104 103 102 100;
bid:98 99 102 103 103 100 100 99)
q)q
sym time ask bid
---------------------
aapl 09:30:01 101 98
aapl 09:30:02 103 99
aapl 09:30:03 103 102
aapl 09:30:04 104 103
aapl 09:30:05 104 103
aapl 09:30:08 103 100
aapl 09:30:09 102 100
aapl 09:30:10 100 99
We construct fixed-width windows of 2 seconds before and one second after each trade time. In passing, we note that there is no requirement that the windows be of uniform width across records.
q)show w:flip -2 1+/:t `time
09:29:59 09:30:02 09:30:06
09:30:02 09:30:05 09:30:09
For readability we place the names of the 'sym' and 'time' columns in a variable.
q)c:`sym`time
The actual form for wj is:
wj[w;c;t;(q;(f0;c0);(f1;c1))]
Here w is a pair of lists representing start and end points of the windows, c is the list of sym and time column names, t is the trades table. The final parameter is a list with the quotes table and a pair of lists, each with an aggregate function and a column name.
To see all the values in each window, pass the identity function :: in place of the aggregates. The result is similar to grouping without aggregate in a query and is helpful to see what is happening within each window.
q)wj[w;c;t;(q;(::;`ask);(::;`bid))]
sym time price ask bid
--------------------------------------------------
aapl 09:30:01 100 101 103 98 99
aapl 09:30:04 103 103 103 104 104 99 102 103 103
aapl 09:30:08 101 104 103 102 103 100 100
Now we apply wj with actual aggregates for the maximum ask and minimum bid over each window. Here you see the results of the aggregate functions run within each window.
q)wj[w;c;t;(q;(max;`ask);(min;`bid))]
sym time price ask bid
---------------------------
aapl 09:30:01 100 103 98
aapl 09:30:04 101 104 99
aapl 09:30:08 105 108 104
Instead of the aggregates being run over exactly the values within each window, you can also include quote values that are current at the beginning of the window. This matters if there is no quote precisely at the beginning time of the window. In this case use wj1, whose syntax is identical to wj.
q)wj1[w;c;t;(q;(::;`ask);(::;`bid))]
sym time price ask bid
--------------------------------------------------
aapl 09:30:01 100 101 103 98 99
aapl 09:30:04 103 103 103 104 104 99 102 103 103
aapl 09:30:08 101 103 102 100 100
9.10 Parameterized Queries
Relational databases have stored procedures, which are database programs incorporating SQL statements. The programming languages are not part of the SQL standard. They vary significantly across vendors but are generally third-generation imperative languages.
This situation forces a programmer to make a choice: learn a proprietary language to place functionality close to the data, or extract the data into an application server to perform calculations. Various multi-tier architectures have evolved to address this problem, but they increase system cost and complexity.
This separation is obviated in kdb+, since q is the stored procedure language and it has the power and performance to process big data, whether in memory or on disk. Any q function that operates on a table is effectively a stored procedure. Function parameters can be used to supply specific values for queries.
In particular, the (select), (exec), (update) and (delete) templates can be invoked within a function with parameters to yield a parameterized query.
Important
Parameterized queries with templates have restrictions:
- A column name cannot be passed as a parameter since columns are part of the syntactic sugar of these templates. They only become symbolic names under the covers.
- Parameters used in the queries cannot be implicit parameters to the function – i.e., the parameters must be declared explicitly. Although this seems to work for a single implicit parameter, for consistency we recommend making all function parameters explicit.
We follow our own advice in the following examples.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)select from t where c2>15
c1 c2 c3
---------
b 20 2.2
c 30 3.3
q)proc:{[sc] select from t where c2>sc}
q)proc 15
c1 c2 c3
---------
b 20 2.2
c 30 3.3
q)proc2:{[nms;sc] select from t where c1 in nms, c2>sc}
q)proc2[`a`c; 15]
c1 c2 c3
---------
c 30 3.3
You can pass a table as a parameter to a template, either by value or by name.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)proc3:{[t;nms;delta] update c2+delta from t where c1 in nms}
q)proc3[t;`a`c;100]
c1 c2 c3
----------
a 110 1.1
b 20 2.2
c 130 3.3
q)proc3[`t;`a`c;100]
`t
q)t
c1 c2 c3
----------
a 110 1.1
b 20 2.2
c 130 3.3
Tip
There is no need to restrict stored procs in q to the templates. Any expression that operates on a table can serve.
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)procf:{[cs;t] cs#t}
q)procf[`c1`c2; t]
c1 c2
-----
a 10
b 20
c 30
Advanced: You can effectively parameterize column names in two ways, only one of which is good practice. First, you can mimic a common technique from SQL in which the query is created dynamically: build the query text in a string and then invoke the interpreter programmatically using (value). This is comparatively slow. Worse, it exposes your application to injection attacks, since any q expression that appears inside the text will be executed.
The preferred method is to use the functional form for queries – see 9.12 – which is fast and secure. In functional form, all columns are referred to by symbolic name, so columns names can be passed as symbols. In fact, any component of a query can be passed as an argument in functional form.
9.11 Views
A SQL view is effectively a query expression whose result set can be used like a table. Views are useful for encapsulating data – e.g., hiding columns, or simplifying complex queries.
A qSQL view is a named table expression created as an alias with the double-colon operator ::. It is common to use the templates in views but this is not a requirement.
In the following example, contrast u, which is a q variable that is assigned the result of a query, to the view (alias) v that is the query expression itself.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)u:select from t where c2>15
q)v::select from t where c2>15
q)u~v
1b
q)update c2:15 from `t where c1=`b
`t
q)u
c1 c2
-----
b 20
c 30
q)v
c1 c2
-----
c 30
Observe that when the underlying table changed, u did not change but the next reference to v does reflect the update.
To find the underlying query of a view, or any alias, apply the function view to the symbolic name.
q)view `v
"select from t where c2>15"
Recall from §4.11 to list all the views in a namespace, use the function (views) with the context name.
q)views `.
,`v
9.12 Functional Forms
In the experience of the author, functional form is the most difficult q topic for most programmers to learn. Among the reasons for this are:
- The high level of abstraction and generalization
- There is nothing like it in SQL or other programming languages
- In vintage Arthurian fashion, information density is maximized
Functional form is difficult. Along with iterators and generalized application, it completes the Big Three aspects of q that separate pretenders from contenders. Fortunately there is a cheat that can be helpful in most situations.
Note
The function parse can be applied to a string containing a template query to produce a parse tree whose items work in the equivalent functional form. A complication is that sometimes the operators are in k form instead of q.
The functional forms of the four templates select, exec, update and delete are powerful because they allow all constituents to be parameterized. They can be used for any query but are especially handy for queries that are generated or completed programmatically. The q interpreter parses the syntactic sugar of the templates into their equivalent functional forms, so there is no performance difference.
Tip
We recommend writing table operations in fundamental or template form unless you anticipate needing to parameterize column names. The folks who maintain your code will thank you.
There are two functional forms, one for select and exec, the other for update and delete. The types of the arguments passed determine the overload. The forms are,
?[t;c;b;a] / select and exec
![t;c;b;a] / update and delete
where
tis a table expression or the name of a tableais a dictionary of aggregatesbis a dictionary of group-bys or a flag controlling other aspects of the querycis a list of constraints.
The expressions in a, b and c can involve columns of t and also any variables that are in scope. The rules for expression interpretation are:
- Columns are always represented by their symbolic names
- Consequently, any literal symbols, or lists of symbols, appearing in the expressions must be enlisted.
We warned you this wasn’t going to be easy.
We shall use the following sample table in our examples.
q)show t:([] c1:`a`b`a`c`a`b`c; c2:10*1+til 7; c3:1.1*1+til 7)
c1 c2 c3
---------
a 10 1.1
b 20 2.2
a 30 3.3
c 40 4.4
a 50 5.5
b 60 6.6
c 70 7.7
9.12.1 Functional select
We begin with the functional form corresponding to the simplest select query.
q)select from t
c1 c2 c3
---------
a 10 1.1
b 20 2.2
…
q)?[t; (); 0b; ()]~select from t
1b
In this degenerate functional form,
- The constraint parameter is the empty list
- The by parameter is the special value 0b indicating there is no grouping in a select query
- The aggregate parameter takes the special value of the general empty list, indicating there is no aggregate phrase – i.e., return all columns
Observe that we can pass the table by name and we can also parameterize the table by substituting any q expression that returns a table or keyed table.
q)?[`t; (); 0b; ()]
q)?[`t; (); 0b; ()]
c1 c2 c3
---------
a 10 1.1
b 20 2.2
…
q)ft:{[] t}
q)?[ft[]; (); 0b; ()]
c1 c2 c3
---------
a 10 1.1
b 20 2.2
…
Now we add a constraint – i.e., a 'where' phrase. The constraint parameter must be a list of q parse trees, one item for each subphrase in the 'where' phrase. The parse tree for each subphrase can be obtained by converting the expression fully to prefix form and then transforming each function into a list with it followed by all its arguments. Remember, columns names become symbols and literal symbols get enlisted.
q)select from t where c2>35,c1 in `b`c
c1 c2 c3
---------
c 40 4.4
…
q)?[t; ((>;`c2;35); (in;`c1;enlist `b`c)); 0b; ()]
c1 c2 c3
---------
c 40 4.4
…
Next we demonstrate aggregation without grouping. The aggregate parameter is a dictionary whose key list is column names and whose values are a list of parse trees, one for each subphrase of the aggregate phrase. Again, column names are symbols and literal symbols get enlisted.
Note
While q will assign default columns names in the templates, you must explicitly provide all columns names in functional form.
q)select max c2, c2 wavg c3 from t
c2 c3
------
70 5.5
q)?[t; (); 0b; `maxc2`wtavg!((max;`c2); (wavg;`c2;`c3))]
maxc2 wtavg
-----------
70 5.5
While we could use dictionary definition syntax, that would defeat the purpose of exposing all column names as symbols.
Next we demonstrate grouping without aggregation. As with the aggregate parameter, we construct a dictionary with column names and parse trees of the 'by' subphrases. In this particular example, we demonstrate how to handle the case of grouping on a single column. Recall that a singleton dictionary requires the key and value to be enlisted – this enlist is separate from enlisting literal symbols in functional form.
q)select c2 by c1 from t
c1| c2
--| --------
a | 10 30 50
…
q)?[t; (); (enlist `c1)!enlist `c1; ()]
c1| c2 c3
--| ------
a | 50 5.5
…
Once again we have not used dictionary definition syntax so that columns names are parametrizable.
Finally we put the pieces together. We find functional form code to be more readable if you separately construct the three parameters as variables and pass these in to the functional form.
q)select max c2, c2 wavg c3 by c1 from t where c2>35,c1 in `b`c
c1| c2 c3
--| ------
b | 60 6.6
c | 70 6.5
q)c:((>;`c2;35); (in;`c1;enlist `b`c))
q)b:(enlist `c1)!enlist `c1
q)a:`maxc2`wtavg!((max;`c2); (wavg;`c2;`c3))
q)?[t;c;b;a]
c1| maxc2 wtavg
--| -----------
b | 60 6.6
c | 70 6.5
Now that wasn’t so bad, was it? OK, maybe it was.
Next we show the functional form for the special case select distinct. This case is indicated by setting the 'by' parameter to 1b.
q)t:([] c1:`a`b`a`c`b`c; c2:1 1 1 2 2 2; c3:10 20 30 40 50 60)
q)select distinct c1,c2 from t
c1 c2
-----
a 1
b 1
c 2
b 2
q)?[t; (); 1b; `c1`c2!`c1`c2]
c1 c2
-----
a 1
…
The extended form (select[n]) adds a fifth parameter to the functional form.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)select[2] from t
c1 c2
-----
a 10
b 20
q)?[t;();0b;();2]
c1 c2
-----
a 10
b 20
Advanced: The extended forms (select[n]) and (select[m n]) that limit the number of records returned add an additional parameter to the functional form. In the first case the additional parameter is the integer atom n representing (up to) how many records to retrieve. In the second case it is the list m n where m is the starting record index for retrieval and n is as previous.
q)t:([] c1:`a`b`c`a; c2:10 20 30 40)
q)select[2] from t
c1 c2
-----
a 10
b 20
q)?[t;();0b;(); 2]
c1 c2
-----
a 10
b 20
q)select[1 2] from t
c1 c2
-----
b 20
c 30
q)?[t;();0b;(); 1 2]
c1 c2
-----
b 20
c 30
The maximal extended form (select[n; >ci]) or (select[m n; >ci]) adds one more parameter to the functional form, which is a list with the sort specification. Its first item indicates whether to use iasc or idesc for the sort. The second item is the column name for the sort. Note that this column name becomes a symbol in functional form so it is parametrizable. As before n (or m n) limits the number of records to retrieve.
q)select[2; >c1] from t
c1 c2
-----
c 30
b 20
q)?[t;();0b;(); 2; (>:;`c1)]
c1 c2
-----
c 30
b 20
Finally note that to specify only the column sort but not to limit the number of records, set n to 0W in the maximal form Confused? It's Zen!
q)select[>c1] from t
c1 c2
-----
c 30
b 20
a 10
a 40
q)?[t;();0b;(); 0W; (>:;`c1)]
c1 c2
-----
c 30
…
9.12.2 Functional exec
The functional form for exec is nearly identical to that of (select). Slight variations in the parameters indicate which is intended. Since the constraint parameter carries over unchanged, we omit its discussion in this section.
The functional form for exec on a single result column depends on whether you want a list or dictionary to be returned. When there is no grouping, use the empty list for the 'by' parameter. For a list result, specify the aggregate as a parse tree; for a dictionary result, specify a dictionary mapping the result name to its parse tree.
Note
Here 'distinct' is not part of the exec template as it was with select. Rather, it is applied to column c1.
q)t:([] c1:`a`b`c`a; c2:10 20 30 40; c3:1.1 2.2 3.3 4.4)
q)exec distinct c1 from t
`a`b`c
q)?[t; (); (); (distinct; `c1)]
`a`b`c
q)exec c1:distinct c1 from t
c1| a b c
q)?[t; (); (); (enlist `c1)!enlist (distinct; `c1)]
c1| a b c
For exec on multiple columns without grouping, specify the 'by' parameter as the general empty list and the aggregate parameter as a dictionary mapping column names to a list of parse trees.
q)exec distinct c1, c2 from t
c1| `a`b`c
c2| 10 20 30 40
q)?[t; (); (); `c1`c2!((distinct; `c1); `c2)]
c1| `a`b`c
c2| 10 20 30 40
To group on a single column, specify its symbol name in the 'by' parameter.
q)exec c2 by c1 from t
a| 10 40
b| ,20
c| ,30
q)?[t; (); `c1; `c2]
a| 10 40
…
More complex grouping in an exec seems to revert to the equivalent select.
9.12.3 Functional update
The syntax of functional update is identical to that of select except that ! is used in place of ?. In the following examples you will need to keep track of the different uses of enlist:
- making a list of parse trees from the parse of a single expression
- creating singleton dictionaries
- distinguishing literal symbols from column names.
q)t:([] c1:`a`b`c`a`b; c2:10 20 30 40 50)
q)update c2:100 from t where c1=`a
c1 c2
------
a 100
…
q)c:enlist (=;`c1;enlist `a)
q)b:0b
q)a:(enlist `c2)!enlist 100
q)![t;c;b;a]
c1 c2
------
a 100
…
q)update c2:sums c2 by c1 from t
c1 c2
-----
a 10
b 20
c 30
a 50
b 70
q)![`t; (); 0b; (enlist `c2)!enlist(sums;`c2)]
`t
q)t
c1 c2
------
a 10
b 30
…
9.12.4 Functional delete
The syntax of functional delete is a simplified form of functional update.
![t;c;0b;a]
Here t is a table expression, or the name of a table, c is a list of parse trees for 'where' subphrases and a is a list of symbolic column names. Either c or a, but not both, must be present. If c is present, it specifies which rows are to be deleted. If a is present it is a list of symbolic column names to be deleted. In the latter case you must specify c as an empty general list.
As before, distinguish the various uses of (enlist) in the examples.
q)t:([] c1:`a`b`c`a`b; c2:10 20 30 40 50)
q)delete from t where c1=`b
c1 c2
-----
a 10
c 30
a 40
q)![t;enlist (=;`c1;enlist `b);0b;`symbol$()]
c1 c2
-----
a 10
…
q)delete c2 from t
c1
--
a
b
…
q)![`t;();0b;enlist `c2]
`t
q)t
c1
--
a
b
…
Again observe that we demonstrate call by name in the last query.
9.13 Examples
At this point we know enough about tables to do some damage. You can now go back and (re)read Q Shock and Awe and the section on tables should be straightforward.
9.13.1 The trades Table
We shall demonstrate further examples in this section based on the trades table created there. We encapsulate the steps in a function so that it can be run for an arbitrary list of tickers and an arbitrary number of rows. You should enter it into a text editor, save it in a file with .q extension and then use (\l) to load it into your q session. Here is the file that we stored in the directory Q4M under QHOME.
qmkTrades:{[sz]
tickers:`aapl`goog`ibm;
dt:2025.01.01+sz?31;
tm:sz?24:00:00.000;
sym:sz?tickers;
qty:10*1+sz?1000;
px:90.0+(sz?2001)%100;
t:([] dt; tm; sym; qty; px);
t:`dt`tm xasc t;
t:update px:2*px from t where sym=`aapl;
t:update px:1.75*px from t where sym=`goog;
t:update px:2.7*px from t where sym=`ibm;
t}
Now in q fresh q session,
q)\l Q4M/trades.q
q)trades:mkTrades 10000000
At the time of this writing (July 2025) on the author’s four-year-old MacBook Pro, it takes 1.5 seconds to create a table with 10,000,000 rows.
9.13.2 The instrument Table
The instr table is a static keyed table containing basic information about the companies whose financial instruments (stocks in our case) are traded. Its schema has fields for the stock symbol, the name of the company and the industry classification of the company.
We also include this in our trades.q file for ease of repetition but you should enter it at the console in case you are tempted to use (insert).
q)instr:([sym:`symbol$()] name:(); industry:`symbol$())
We populate instr using upsert.
q)`instr upsert (`ibm; "International Business Machines"; `$"Computer Services")
`instr
q)`instr upsert (`msft; "Microsoft"; `$"Software")
`instr
q)`instr upsert (`goog; "Google"; `$"Search")
`instr
q)`instr upsert (`aapl; "Apple"; `$"Electronics")
`instr
q)instr
sym | name industry
----| ---------------------------------------------------
ibm | "International Business Machines" Computer Services
msft| "Microsoft" Software
goog| "Google" Search
aapl| "Apple"
Now we make sym a foreign key to trades.
q)update `instr$sym from `trades
`trades
q)meta trades
c | t f a
---| ---------
dt | d s
tm | t
sym| s instr
qty| j
px | f
9.13.3 Basic Queries
Here we demonstrate basic operations against the trades and instrument tables.
We can count the total number of trades in several ways. The last version causes q to do the most work and should be avoided. Observe that q does not carry the name of its own virtual column i to the result of select.
q)count trades
10000000
q)exec count i from trades
10000000
q)select count i from trades
x
--------
100000000
q)count select from trades
10000000
Let’s count the number of trades for an individual symbol.
Tip
For splayed and partitioned tables, only the (select) expression will work to count records.
q)exec count i from trades where sym=`ibm
3331308
We can count the number of trades across all symbols. The second version un-keys the result.
q)select ct:count i by sym from trades
sym | ct
----| -------
aapl| 3334041
goog| 3334651
ibm | 3331308
q)() xkey select ct:count i by sym from trades
…
We find one day's trades for aapl.
q)select from trades where dt=2025.01.15,sym=`aapl
dt tm sym qty px
----------------------------------------
2025.01.15 00:00:02.365 aapl 7790 182.44
2025.01.15 00:00:02.496 aapl 6430 200.44
2025.01.15 00:00:02.783 aapl 3690 195.68
…
We find all lunch hour trades for goog. Note that within includes both endpoints. Also recall that you must use general list notation if the endpoints are variables.
q)select from trades where sym=`goog, tm within 12:00:00 13:00:00
dt tm sym qty px
------------------------------------------
2025.01.01 12:00:00.103 goog 160 177.1
2025.01.01 12:00:00.502 goog 710 165.69
…
q)noon:12:00:00
q)thirteen:13:00:00
q)select from trades where sym=`goog, tm within (noon;thirteen)
dt tm sym qty px
------------------------------------------
2025.01.01 12:00:00.103 goog 160 177.1
2025.01.01 12:00:00.502 goog 710 165.69
…
We find the maximum daily price for aapl. Due to our uniform distribution, it is statistically constant.
q)select maxpx:max px by dt from trades where sym=`aapl
dt | maxpx
----------| -----
2025.01.01| 220
2025.01.02| 220
..
We find the minimum and maximum trade price over the trading range for each symbol by company name. We point out that this is actually grouping on a computed column since the foreign key is resolved to the instrument table via an implicit join.
q)select lo:min px, hi:max px by sym.name from trades
name | lo hi
-------------------------------| -----------
Apple | 180 220
Google | 157.5 192.5
International Business Machines| 243 297
We find the total and average trade volume for specific symbols. Due to our simplistic construction, the volumes are statistically the same.
q)select totq:sum qty, avgq:avg qty by sym from trades where sym in `ibm`goog
sym | totq avgq
----| --------------------
goog| 16687533260 5004.282
ibm | 16676821610 5006.088
We find the high, low and close over one minute intervals.
q)select hi:max px,lo:min px,open:first px, close:last px by dt,tm.minute from trades
dt minute| hi lo open close
-----------------| ----------------------------------
2025.01.01 00:00 | 295.569 158.13 181.62 208.42
2025.01.01 00:01 | 296.73 157.7275 170.8175 208.8
..
We demonstrate how to use our own functions in queries. Suppose we define a funky average that weights items by their position. Notice that we use intra-line assignment to avoid counting x three times.
q)favg:{(sum x*1+til ctx)%ctx*ctx:count x}
Then we can apply favg just as we did the built-in q function avg.
q)select favgpx:favg px by sym from trades
sym | favgpx
----| --------
aapl| 100.0066
goog| 87.50431
ibm | 135.0068
9.13.4 Meaty Queries
In this section, we demonstrate more interesting qSQL examples against the trades table.
We find volume weighted average price by day and for 100 millisecond buckets for aapl. Note that latter takes under two seconds for 10,000,000 trades on the author’s four year old laptop.
q)select vwap:qty wavg px by dt from trades where sym=`ibm
dt | vwap
----------| --------
2025.01.01| 270.0084
2025.01.02| 269.9907
..
q)select vwap:qty wavg px by dt,100 xbar tm from trades where sym=`ibm
dt tm | vwap
-----------------------| --------
2025.01.01 00:00:00.700| 261.36
2025.01.01 00:00:01.800| 285.174
..
We use fby to select records attaining the maximum price each day for each sym.
q)select from trades where px=(max;px) fby sym
dt tm sym qty px
---------------------------------------
2025.01.01 00:02:11.365 ibm 750 297
2025.01.01 00:04:34.916 goog 8780 192.5
2025.01.01 00:16:41.350 goog 2640 192.5
..
We use favg from the previous section to demonstrate that user functions can appear in any phrase of the query.
q)select from trades where px<2*(favg;px) fby sym
dt tm sym qty px
-----------------------------------------
2025.01.01 00:00:00.218 aapl 6970 181.62
2025.01.01 00:00:00.272 aapl 270 188.42
2025.01.01 00:00:00.735 ibm 490 261.36
..
We find the average daily volume and price for all instruments and store the result for the following example.
q)show atrades:select avgqty:avg qty, avgpx:avg px by sym, dt from trades
sym dt | avgqty avgpx
---------------| -----------------
aapl 2025.01.01| 4991.744 200.016
aapl 2025.01.02| 5013.513 200.0394
aapl 2025.01.03| 5010.029 200.0046
…
We find the days when the average price went up. Note that we create our own deltas0 to return 0 for the initial item since the behavior of the built-in deltas doesn’t work well for this purpose. Observe that the nested result columns scroll off the page.
q)deltas0:{first[x] -': x}
q)select dt, avgpx by sym from atrades where 0<deltas0 avgpx
sym | dt ----|--------------------------------------------------------..
aapl| 2025.01.02 2025.01.04 2025.01.05 2025.01.07 2025.01.09 ..
goog| 2025.01.02 2025.01.04 2025.01.05 2025.01.07 2025.01.09 ..
ibm | 2025.01.01 2025.01.04 2025.01.06 2025.01.08 2025.01.10 ..
To compactify the display, take only the first few field values.
q)select 2#dt, 2#avgpx by sym from atrades where 0<deltas0 avgpx
sym | dt avgpx
----| ---------------------------------------
aapl| 2025.01.02 2025.01.04 200.0394 200.0133
goog| 2025.01.02 2025.01.04 174.988 174.9687
ibm | 2025.01.01 2025.01.04 269.9992 270.0605
In order to demonstrate operations on nested columns, we denormalize trades into a keyed table with one row and complex columns for each symbol. We can do this either by grouping without aggregation or by using xgroup. The difference is that xgroup does not automatically sort on the result key column.
q)dntrades:select dt,tm,qty,px by sym from trades
q)dntrades~`sym xasc `sym xgroup trades
1b
We display the first two items of each field to make the structure more evident.
q)select 2#dt,2#tm,2#qty,2#px by sym from trades
sym | dt tm qty px
----| ---------------------------------------------------------------------------
aapl| 2025.01.01 2025.01.01 00:00:00.218 00:00:00.272 6970 270 181.62 188.42
goog| 2025.01.01 2025.01.01 00:00:02.777 00:00:05.267 8970 1150 192.2725 180.7575
ibm | 2025.01.01 2025.01.01 00:00:00.735 00:00:01.893 490 4650 261.36 285.174
In such a table with compound columns – i.e., lists of simple lists of the same type – you will need iterators for column operations. Lots of iterators.
q)select sym,ct:count each dt,avgpx:avg each px from dntrades
sym ct avgpx
---------------------
aapl 3334041 200.0091
goog 3334651 175.0039
ibm 3331308 270.0106
We can apply our own unary favg function.
q)select sym, favgpx:favg each px from dntrades
sym favgpx
-------------
aapl 100.0066
goog 87.50431
ibm 135.0068
To find the volume weighted average price by we use the iterator Each Parallel ' with the binary wavg.
q)select sym,vwap:qty wavg' px from dntrades
sym vwap
-------------
aapl 200.0081
goog 174.9991
ibm 270.0097
Note that this iterator generalizes to n-ary functions for any n>1.
We find the profit of the ideal transaction over the month for each symbol. This is the maximum amount of money that could be made with complete foreknowledge of the market. In other words, find the largest profit obtainable by buying at some traded price and selling at the highest subsequently traded price. To solve this, we reverse the perspective. For each traded price, we look at the minimum of prices that preceded it. The largest such difference is our answer. The maximum draw down is dual.
q)select max px-mins px by sym from trades
sym | px
----| --
aapl| 40
goog| 35
ibm | 54
q)select min px-maxs px by sym from trades
sym | px
----| ---
aapl| -40
goog| -35
ibm | -54
Given our uniform distributions these results are expected.
9.13.5 Excursion – Pivot Table
The objective is to pivot a table on a column (or columns) as in Excel. The construct revolves (pun intended) around three columns. The distinct values of the pivot column become the column names in the result. Each field in the value column is placed under the pivot column whose name appears in its original row. The destination row of a value column field is determined by the key column field in its original row. This is a pivot in the sense that the original vertical arrangement is transformed to horizontal. As usual, an example is worth a thousand words.
We begin with the simple table.
q)show t:([]k:1 2 3 2 3; p:`a1`a2`a1`a3`a2; v:100 200 300 400 500)
k p v
--------
1 a1 100
2 a2 200
3 a1 300
2 a3 400
3 a2 500
The desired result pvt is as follows. Observe that this is actually a keyed table but it is simple to unkey it with xkey.
q)pvt
k| a1 a2 a3
-| -----------
1| 100
2| 200 400
3| 300 500
We break the solution into multiple steps, beginning with our sample table and then successively generalizing to handle a wider class of use cases. Here we suppress many results with '_' to force you to type the expressions yourself.
-
Collect the unique values of the pivot column p into a list P.
q)P:exec distinct p from t -
Write a query that extracts the key-value pairs for p and v grouped by k.
q)exec p!v by k from t -
Enhance the previous query to produce a keyed table by rectangularizing the dictionaries by filling missing values. Magic happens! Observe that we do need to name the result key column explicitly. In what follows we suppress the query outputs so you can type them yourself.
q)exec P#p!v by k:k from t k| a1 a2 a3 -| ----------- 1| 100 2| 200 400 3| 300 500 -
Write the query in functional form to extract the unique values of the pivot column.
q)?[t; (); (); (distinct;`p)] _ -
Convert the pivot query to functional form.
q)?[t;(); (enlist `k)!enlist `k; (#;`P;(!;`p;`v))] q)_ -
Place the previous functional form in a function that takes the table and the column names as parameters and returns the pivot result.
q)dopivot:{[t;kn;pn;vn] P:?[t; (); (); (distinct;pn)]; ?[t;(); (enlist kn)!enlist kn; (#;`P; (!;pn;`v))]} q)dopivot[t;`k;`p;`v] _ -
Write an expression that converts the fields of the pivot column to valid names when they are not – e.g., integers. Test it on tn.
q)tn:([]k:1 2 3 2 3;p:(`a1;2;`a1;3;2);v:100 200 300 400 500) q)mkNames:{ x:(::),x; x:1_x:@[x; where not 10h=type each x; string]; `$@[x; where not any x[;0] within/: ("AZ";"az"); "X_",]} q)dopivot:{[t; kn; pn; vn] t:![t; (); 0b; (1#pn)!enlist (`mkNames; pn)]; P:?[t; (); (); (distinct; pn)]; ?[t;(); (1#kn)!1#kn; (#;`P;(!;pn;vn))]} q)dopivot[tn;`k;`p;`v] _ -
Next, we generalize to the case when there are multiple entries in v for a single key value. For example, applied to tr below, our current dopivot misses the 1000 value in v.
q)tr:([]k:1 2 3 2 3 1; p:`a1`a2`a1`a3`a2`a1; v:100 200 300 400 500 1000) q)dopivot[tr;`k;`p;`v] / misses 1000 value k| a1 a2 a3 -| ----------- 1| 100 2| 200 400 3| 300 500We would like to apply an aggregate function such as sum to obtain,
k| a1 a2 a3 -| ------------ 1| 1100 2| 200 400 3| 300 500Modify dopivot to take an aggregate function agg and apply it to v as part of the pivot process. Test it on tr using sum.
q)dopivot:{[t; agg; kn; pn; vn] t:![t; (); 0b; (1#pn)!enlist (`mkNames; pn)]; t:?[t; (); (kn,pn)!kn,pn; (1#vn)!enlist (agg;vn)]; P:?[t; (); (); (distinct; pn)]; ?[t; (); (1#kn)!1#kn; (#;`P;(!;pn;vn))]} q)dopivot[tr;sum;`k;`p;`v] _ -
We would like to handle the case of compound keys – e.g., k1 and k2 in tk below.
q)k1:1 2 3 2 3 1 1 q)k2:10 20 30 40 50 60 10 q)p:`a1`a2`a1`a3`a2`a1`a1 q)v:100 200 300 400 500 1000 10000 q)show tk:([]k1:k1; k2:k2; p:p; v:v) k1 k2 p v -------------- 1 10 a1 100 2 20 a2 200 3 30 a1 300 2 40 a3 400 3 50 a2 500 1 60 a1 1000 1 10 a1 10000Modify dopivot to take a list of key column names and test it on tk
q)dopivot:{[t; agg; ks; pn; vn] ks,:(); t:![t; (); 0b; (1#pn)!enlist (`mkNames; pn)]; t:?[t; (); (ks,pn)!ks,pn; (1#vn)!enlist (agg;vn)]; P:?[t; (); (); (distinct; pn)]; ?[t;(); ks!ks; (#;`P;(!;pn;vn))]} q)dopivot[tk;sum;"k1<>2";`k1`k2;`p;`v] _ -
Modify dopivot to accept a string argument representing a valid 'where' phrase for the input table and use it to constrain the pivot. Test it against tk with the phrase "k1<>2".
q)dopivot:{[t; agg; wh; ks; pn; vn] ks,:(); c:enlist parse wh; t:?[t; c; 0b; (cols t)!cols t]; t:![t; (); 0b; (1#pn)!enlist (`mkNames; pn)]; t:?[t; (); (ks,pn)!ks,pn; (1#vn)!enlist (agg;vn)]; P:?[t; (); (); (distinct; pn)]; ?[t; (); ks!ks; (#;`P;(!;pn;vn))]} q)dopivot[tk;sum;"k1<>2";`k1`k2;`p;`v] _ -
Finally, modify dopivot to accept an empty list in the agg or the wh parameters. In the former case, use (first) as the default aggregate; in the latter, perform no constraint. Test against tk with empty list arguments for both.
q)dopivot:{[t; agg; wh; ks; pn; vn] ks,:(); agg:first agg,first; c:$[count wh; enlist parse wh; ()]; t:?[t; c; 0b; (cols t)!cols t]; t:![t; (); 0b; (1#pn)!enlist (`mkNames; pn)]; t:?[t; (); (ks,pn)!ks,pn; (1#vn)!enlist (agg;vn)]; P:?[t; (); (); (distinct; pn)]; ?[t; (); ks!ks; (#;`P;(!;pn;vn))]} q)dopivot[tk;();"";`k1`k2;`p;`v] _
We note that there are many directions you could take this. It could be extended for more functionality. The code could be made more concise. It could also be optimized for performance on very large tables. We leave these as exercises to the reader.