`aj`, `aj0`, `ajf`, `ajf0`

As-of join

aj  [c; t1; t2]
aj0 [c; t1; t2]
ajf [c; t1; t2]
ajf0[c; t1; t2]

Where

t1 is a table or the name of a table as a symbol (in the latter case, the table is updated in place with the result)
t2 is a simple table
c is a symbol vector of n column names, common to t1 and t2, and of matching type
column c_n is of a sortable type (typically time)

returns a table with records from the left-join of t1 and t2. In the join, columns c₀...c_n-1 are matched for equality, and the last value of c_n (most recent time) is taken. For each record in t1, the result has one record with the items in t1, and

if there are matching records in t2, the items of the last (in row order) matching record are appended to those of t1;
otherwise the remaining columns are null.

q)t:([]time:10:01:01 10:01:03 10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
q)t
time       sym  qty
-----------------
10:01:01 msft 100
10:01:03 ibm  200
10:01:04 ge   150

q)q:([]time:10:01:00 10:01:00 10:01:00 10:01:02;sym:`ibm`msft`msft`ibm;px:100 99 101 98)
q)q
time     sym  px 
-----------------
10:01:00 ibm  100
10:01:00 msft 99 
10:01:00 msft 101
10:01:02 ibm  98 

q)aj[`sym`time;t;q]
time       sym  qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm  200 98
10:01:04 ge   150

aj is a multithreaded primitive.

There is no requirement for any of the join columns to be keys but the join is faster on keys.

`aj`, `aj0`

aj and aj0 return different times in their results:

aj    boundary time from t1
aj0   actual time from t2

`ajf`, `ajf0`

ajf and ajf0 fill from t1 if the corresponding value in t2 is null. For example:

q)t0:([]time:2#00:00:01;sym:`a`b;p:1 1;n:`r`s)
q)t1:([]time:2#00:00:01;sym:`a`b;p:0 1)
q)t2:([]time:2#00:00:00;sym:`a`b;p:1 0N;n:`r`s)
q)t0~ajf[`sym`time;t1;t2]
1b

Performance

Order of search columns

Ensure the first argument to aj, the columns to search on, is in the correct order, e.g. `sym`time. Otherwise you’ll suffer a severe performance hit.

aj should run at a million or two trade records per second; whether the tables are mapped or not is irrelevant. However, for speed:

medium	t2[c₁]	t2[c₂…]	example
memory	`g#`	sorted within `c₁`	`quote` has `g#sym and `time` sorted within `sym`
disk	`p#`	sorted within `c₁`	`quote` has `p#sym and `time` sorted within `sym`

Departure from this incurs a severe performance penalty.

Note that on disk, the g# attribute does not help.

Select the virtual partition column only if you need it. It is constructed on demand, which can be slow for large partitions.

`select` from `t2`

In memory, there is no need to select from t2. Irrespective of the number of records, use, for example:

aj[`sym`time;select … from trade where …;quote]

instead of

aj[`sym`time;select … from trade where …;
             select … from quote where …]

In contrast, on disk, you must map your splayed or partitioned database to memory first:

Splayed:

aj[`sym`time;select … from trade where …;select … from quote]

Partitioned:

aj[`sym`time;select … from trade where …;
             select … from quote where date = …]

If further where constraints are used, the columns will be copied instead of mapped into memory, slowing down the join.

If you are using a database where an individual day’s data is spread over multiple partitions the on-disk p# will be lost when retrieving data with a constraint such as …date=2011.08.05. In this case you will have to reduce the number of quotes retrieved by applying further constraints – or by re-applying the attribute.

asof

Joins

Q for Mortals §9.9.8 As-of Joins

aj, aj0, ajf, ajf0

aj, aj0

ajf, ajf0

Performance

select from t2

`aj`, `aj0`, `ajf`, `ajf0`

`aj`, `aj0`

`ajf`, `ajf0`

`select` from `t2`