How to join data

This page shows you how to join tables in KDB-X. Joins are split into two categories, keyed joins and as-of joins. Joining tables allows you to combine data spread across various tables for better data analysis.

Keyed joins are joins in which columns in the first argument are matched with the key columns of the second argument.

As-of joins are joins in which a time column in the first argument specifies corresponding intervals in a time column of the second argument.

Overview

Keyed joins
As-of joins
- As-of join - aj
- Window join - wj

Keyed joins

Left join

Left join (lj) joins two tables based on the key column of the second table. The result is all rows from t1, updated with values from t2.

Left join (lj) syntax:

t1 lj t2

Where there is a matching record in t2, common columns are replaced from t2.

Where there are no matching records in t2, common columns remain unchanged and new columns are filled with nulls.

t2 needs to be keyed for lj to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.

Example:

q)show x:([]a:1 2 3;b:`I`J`K;c:10 20 30)
a b c
------
1 I 10
2 J 20
3 K 30

q)show y:([a:1 3;b:`I`K]c:1 2;d:10 20)
a b| c d
---| ----
1 I| 1 10
3 K| 2 20

q)x lj y
a b c  d
---------
1 I 1  10
2 J 20
3 K 2  20

Inner join

Inner join (ij) is similar to left join (lj), and returns only the rows that have matching key values in both tables.

Inner join (ij) syntax:

t1 ij t2

The result has one combined record for each row in t1 that matches a row in t2.

t2 needs to be keyed for ij to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.

Example:

q)show t:([]sym:`IBM`FDP`FDP`FDP`IBM`MSFT;price:0.7029677 0.08378167 0.06046216 0.658985 0.2608152 0.5433888)
sym  price
---------------
IBM  0.7029677
FDP  0.08378167
FDP  0.06046216
FDP  0.658985
IBM  0.2608152
MSFT 0.5433888

q)show s:([sym:`IBM`MSFT`GOOG];ex:`N`CME`CME;mc:1000 250 500)
sym | ex  mc
----| --------
IBM | N   1000
MSFT| CME 250
GOOG| CME 500

q)t ij s
sym  price     ex  mc
-----------------------
IBM  0.7029677 N   1000
IBM  0.2608152 N   1000
MSFT 0.5433888 CME 250

Where there are common columns in both tables, values are overwritten from t2:

q)show t1:([k:1 2 3 4]; c:10 20 30 40;s:`a`j`h`y)
k| c  s
-| ----
1| 10 a
2| 20 j
3| 30 h
4| 40 y

q)show t2:([k:2 3 4 5]; c:222 300 400 500;v:22.22 3.3 4.4 5.5)
k| c   v
-| ---------
2| 222 22.22
3| 300 3.3
4| 400 4.4
5| 500 5.5

q)t1 ij t2
k| c   s v
-| -----------
2| 222 j 22.22
3| 300 h 3.3
4| 400 y 4.4

Equi join

Equi join (ej) is similar to inner join (ij), but it lets you specify the columns you want to join on.

Equi join (ej) syntax:

ej[c;t1;t2]

Where:

c is a symbol vector of column names (or a single column name symbol)
t1 and t2 are tables

The result has one combined record for each row in t2 that matches t1 on columns c.

Tables being joined need not have the same schema, and any of the tables being joined may be keyed or unkeyed.

Example:

q)show s:([]sym:`IBM`MSFT`GOOG;ex:`N`CME`CME;mc:1000 250 500)
sym  ex  mc
-------------
IBM  N   1000
MSFT CME 250
GOOG CME 500

q)show t:([]sym:`IBM`FDP`FDP`FDP`IBM`MSFT;price:0.7029677 0.08378167 0.06046216 0.658985 0.2608152 0.5433888)
sym  price
---------------
IBM  0.7029677
FDP  0.08378167
FDP  0.06046216
FDP  0.658985
IBM  0.2608152
MSFT 0.5433888

q)ej[`sym;s;t]
sym  ex  mc   price
-----------------------
IBM  N   1000 0.7029677
IBM  N   1000 0.2608152
MSFT CME 250  0.5433888

Where there are common columns in both tables, values are overwritten from t2:

q)show t1:([]k:1 2 3 4;c:10 20 30 40;s:`a`j`h`y)
k c  s
------
1 10 a
2 20 j
3 30 h
4 40 y

q)show t2:([]k:2 2 3 4 5;c:200 222 300 400 500;v:2.2 22.22 3.3 4.4 5.5)
k c   v
-----------
2 200 2.2
2 222 22.22
3 300 3.3
4 400 4.4
5 500 5.5

q)ej[`k;t1;t2]
k c   s v
-------------
2 200 j 2.2
2 222 j 22.22
3 300 h 3.3
4 400 y 4.4

Plus join

Plus join (pj) is similar to left join (lj), except instead of updating t1 with values from t2, values get added together.

Plus join (pj) syntax:

t1 pj t2

For each record in t1, if there is a matching record in t2, it is added to the t1 record column by column. Common columns must be of appropriate types for addition.

If there is no matching record in t2, common column values are unchanged and values in new columns are set to zero. New columns are only allowed to be of numeric type.

t2 needs to be keyed for pj to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.

Example:

q)show x:([]a:1 2 3;b:`x`y`z;c:10 20 30)
a b c
------
1 x 10
2 y 20
3 z 30

q)show y:([a:1 3;b:`x`z]c:1 2;d:10 20)
a b| c d
---| ----
1 x| 1 10
3 z| 2 20

q)x pj y           / values from columns 'c' are added together
a b c  d
---------
1 x 11 10         
2 y 20 0
3 z 32 20

Comma join

Comma join (,) will join two tables row-wise. It is similar to union join (uj).

Both tables must have the same schema for comma join (,) to work, otherwise a 'mismatch error will be thrown.

Both tables must also either be keyed or unkeyed. If both tables are unkeyed, rows are simply appended (including duplicates). When both tables are keyed, rows with matching keys are merged and the second table's values overwrite the first.

Unkeyed table example:

q)show t1:([]a:1 2 3;b:`a`b`c)
a b
---
1 a
2 b
3 c

q)show s1:([]a:2 11;b:`d`e)
a  b
----
2 d
11 e

q)t1,s1
a  b
----
1  a
2  b
3  c
2  d
11 e

q)t1,([]c:`d`f`g;d:1 2 3)
'mismatch
  [0]  t1,([]c:`d`f`g;d:1 2 3)
        ^

Keyed table example:

q)show t2:([a:1 2 3];b:`a`b`c)
a| b
-| -
1| a
2| b
3| c

q)show s2:([a:2 11];b:`d`e)
a | b
--| -
2 | d
11| e

q)t2,s2
a | b
--| -
1 | a
2 | d           / value from s2 replaces value t2
3 | c
11| e

Union join

Union join (uj) joins tables row-wise. It is similar to comma join (,), but it generalizes the comma join operator and can be used when tables have different schemas.

Union join (uj) syntax:

t1 uj t2

Both tables must either be keyed or unkeyed for uj to work. If both tables are unkeyed, rows are simply appended (including duplicates). When both tables are keyed, rows with matching keys are merged and the second table's values overwrite the first.

Unkeyed table example:

q)show s:([]a:1 2;b:2 3;c:5 7)
a b c
-----
1 2 5
2 3 7

q)show t:([]a:1 2 3;b:2 3 7;c:10 20 30;d:"ABC")
a b c  d
--------
1 2 10 A
2 3 20 B
3 7 30 C

q)s,t                            / tables do not conform for comma join
'mismatch

q)s uj t                         / second table is inserted
a b c  d                         
--------
1 2 5
2 3 7
1 2 10 A
2 3 20 B
3 7 30 C

Keyed table example:

q)show s:([a:1 2;b:2 3];c:5 7)
a b| c
---| -
1 2| 5
2 3| 7

q)show t:([a:1 2 3;b:2 3 7];c:10 20 30;d:"ABC")
a b| c  d
---| ----
1 2| 10 A
2 3| 20 B
3 7| 30 C

q)s uj t                         / matching records are updated
a b| c  d
---| ----
1 2| 10 A
2 3| 20 B
3 7| 30 C

Note

The operation of uj is expensive so it should only be used when necessary. In particular, when the tables conform, use comma join (,) or raze. Be especially cautious when low latency is paramount.

Coalesce

Coalesce (^) will merge two keyed tables, ignoring nulls.

Coalesce (^) syntax:

t1^t2

Where:

t1 and t2 are keyed tables
With no nulls in t2, the result is the same as for comma join (,)
When t2 has null column values, the column values of t1 are updated only with non-null values of t2

The performance of coalesce (^) is slower than that of comma join (,) since each column value of t2 must be checked for nulls.

Unlike comma join (,), both tables need not have the same schema for coalesce (^) to work.

Example:

q)show kt1:([k:1 2 3] c1:10 20 30;c2:`a`b`c)
k| c1 c2
-| -----
1| 10 a
2| 20 b
3| 30 c

q)show kt2:([k:3 4 5] c1:300 400 500;c2:`cc`dd`ee)
k| c1  c2
-| ------
3| 300 cc
4| 400 dd
5| 500 ee

q)kt1^kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee

q)(kt1^kt2)~kt1,kt2
1b

q)show kt3:([k:2 3] c1:0N 3000;c2:`bbb`)
k| c1   c2
-| --------
2|      bbb
3| 3000

q)kt1,kt3
k| c1   c2
-| --------
1| 10   a
2|      bbb
3| 3000

q)kt1^kt3
k| c1   c2
-| --------
1| 10   a
2| 20   bbb
3| 3000 c

q)(kt1^kt3)~kt1,kt3
0b

As-of joins

As-of join

As-of join is most often used to join tables on time columns, to obtain a value in one table that is current as of a time in another table.

As-of join syntax:

aj[c1...cn;t1;t2]
aj0[c1...cn;t1;t2]

Where:

t1 is a table
t2 is an unkeyed table
c1...cn is a symbol list of n column names, common to t1 and t2, and of matching type
column cn is a column of sortable type (typically time)

For each record in t1, the result has one record with the items in t1, and if there are matching records in t2, the items of the last matching record are appended to those of t1. Otherwise the remaining columns are null.

The time column from t1 is used in aj. The time column from t2 is used in aj0.

There is no requirement for any of the join columns to be keys but the join is faster on keys. For more details on performance considerations with aj, please refer to here.

Example:

q)show t:([]time:10:01:01 10:01:03 10:01:04 10:01:09;sym:`msft`ibm`ge`msft;qty:100 200 150 300)
time     sym  qty
-----------------
10:01:01 msft 100
10:01:03 ibm  200
10:01:04 ge   150
10:01:09 msft 300

q)show q:([]time:10:01:00 10:01:00 10:01:00 10:01:02;sym:`ibm`msft`msft`ibm;px:100 99 101 98)
time     sym  px 
-----------------
10:01:00 ibm  100
10:01:00 msft 99 
10:01:00 msft 101
10:01:02 ibm  98 

q)aj[`sym`time;t;q]             / The time column from t is used
time     sym  qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm  200 98
10:01:04 ge   150
10:01:09 msft 300 101

q)aj0[`sym`time;t;q]            / The time column from q is used
time     sym  qty px
---------------------
10:01:00 msft 100 101
10:01:02 ibm  200 98
10:01:04 ge   150
10:01:00 msft 300 101

For an older variation of aj, use ajf, ajf0.

For a simpler form of aj, use asof.

Window join

Window join (wj) is a generalisation of as-of join (aj), handling aggregations as well. Instead of joining the last value as of a given time, it will perform an aggregation over data within specified intervals.

Window join (wj) syntax:

wj[w;c;t;(q;(f0;c0);(f1;c1))]
wj1[w;c;t;(q;(f0;c0);(f1;c1))]

Where:

t and q are simple tables to be joined
q should be sorted on `sym`time with `p# on sym
w is a pair of lists of times/timestamps, that is, of windows
c are the names of the common columns, which must have integral types
f0, f1 are aggregation functions applied to values in q columns c0, c1 over the windows

For each record in t, wj returns a record with additional columns c0 and c1, which are the results of the aggregation functions applied to values over the matching intervals in w.

A wj might typically look something like this:

wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]

To see all the values in each window, pass the identity function (::) in place of the aggregates:

wj[w;`sym`time;trade;(quote;(::;c0);(::;c1))]

wj takes the prevailing quote at the start of each window to be valid for aggregations. If you want to only aggregate what is strictly within your windows use wj1, whose syntax is identical to wj.

Example:

q)show t:([]sym:3#`ibm;time:09:30:01 09:30:04 09:30:08;price:100 103 101)
sym time     price
------------------
ibm 09:30:01 100
ibm 09:30:04 103
ibm 09:30:08 101

q)show q:([]sym:8#`ibm;time:09:30:01+(til 5),7 8 9;ask:101 103 103 104 104 103 102 100;bid:98 99 102 103 103 100 100 99)
sym time     ask bid
--------------------
ibm 09:30:01 101 98
ibm 09:30:02 103 99
ibm 09:30:03 103 102
ibm 09:30:04 104 103
ibm 09:30:05 104 103
ibm 09:30:08 103 100
ibm 09:30:09 102 100
ibm 09:30:10 100 99

q)show w:-2 1+\:t.time
09:29:59 09:30:02 09:30:06
09:30:02 09:30:05 09:30:09

q)wj[w;`sym`time;t;(q;(max;`ask);(min;`bid))]
sym time     price ask bid
--------------------------
ibm 09:30:01 100   103 98       / max ask, min bid within (09:29:59;09:30:02)
ibm 09:30:04 103   104 99       / max ask, min bid within (09:30:02;09:30:05)
ibm 09:30:08 101   104 100      / max ask, min bid within (09:30:05;09:30:09)

q)wj[w;`sym`time;t;(q;(::;`ask);(::;`bid))]
sym time     price ask             bid
-------------------------------------------------
ibm 09:30:01 100   101 103         98 99
ibm 09:30:04 103   103 103 104 104 99 102 103 103
ibm 09:30:08 101   104 103 102     103 100 100

q)wj1[w;`sym`time;t;(q;(max;`ask);(min;`bid))]
sym time     price ask bid
--------------------------
ibm 09:30:01 100   103 98
ibm 09:30:04 103   104 99
ibm 09:30:08 101   103 100

q)wj1[w;`sym`time;t;(q;(::;`ask);(::;`bid))]
sym time     price ask             bid
-------------------------------------------------
ibm 09:30:01 100   101 103         98 99
ibm 09:30:04 103   103 103 104 104 99 102 103 103
ibm 09:30:08 101   103 102         100 100

Summary

In this guide, you:

Learned how to join tables using keyed joins
Learned how to join tables using as-of joins

You now have the essential skills to join tables in KDB-X, and you should be well equipped to handle real world data joining and manipulation.