How to join data
This page shows you how to join tables in KDB-X. Joins are split into two categories, keyed joins and as-of joins. Joining tables allows you to combine data spread across various tables for better data analysis.
Keyed joins are joins in which columns in the first argument are matched with the key columns of the second argument.
As-of joins are joins in which a time column in the first argument specifies corresponding intervals in a time column of the second argument.
Overview
Keyed joins
Left join
Left join (lj) joins two tables based on the key column of the second table. The result is all rows from t1, updated with values from t2.
Left join (lj) syntax:
t1 lj t2
Where there is a matching record in t2, common columns are replaced from t2.
Where there are no matching records in t2, common columns remain unchanged and new columns are filled with nulls.
t2 needs to be keyed for lj to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.
Example:
q)show x:([]a:1 2 3;b:`I`J`K;c:10 20 30)
a b c
------
1 I 10
2 J 20
3 K 30
q)show y:([a:1 3;b:`I`K]c:1 2;d:10 20)
a b| c d
---| ----
1 I| 1 10
3 K| 2 20
q)x lj y
a b c d
---------
1 I 1 10
2 J 20
3 K 2 20
Inner join
Inner join (ij) is similar to left join (lj), and returns only the rows that have matching key values in both tables.
Inner join (ij) syntax:
t1 ij t2
The result has one combined record for each row in t1 that matches a row in t2.
t2 needs to be keyed for ij to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.
Example:
q)show t:([]sym:`IBM`FDP`FDP`FDP`IBM`MSFT;price:0.7029677 0.08378167 0.06046216 0.658985 0.2608152 0.5433888)
sym price
---------------
IBM 0.7029677
FDP 0.08378167
FDP 0.06046216
FDP 0.658985
IBM 0.2608152
MSFT 0.5433888
q)show s:([sym:`IBM`MSFT`GOOG];ex:`N`CME`CME;mc:1000 250 500)
sym | ex mc
----| --------
IBM | N 1000
MSFT| CME 250
GOOG| CME 500
q)t ij s
sym price ex mc
-----------------------
IBM 0.7029677 N 1000
IBM 0.2608152 N 1000
MSFT 0.5433888 CME 250
Where there are common columns in both tables, values are overwritten from t2:
q)show t1:([k:1 2 3 4]; c:10 20 30 40;s:`a`j`h`y)
k| c s
-| ----
1| 10 a
2| 20 j
3| 30 h
4| 40 y
q)show t2:([k:2 3 4 5]; c:222 300 400 500;v:22.22 3.3 4.4 5.5)
k| c v
-| ---------
2| 222 22.22
3| 300 3.3
4| 400 4.4
5| 500 5.5
q)t1 ij t2
k| c s v
-| -----------
2| 222 j 22.22
3| 300 h 3.3
4| 400 y 4.4
Equi join
Equi join (ej) is similar to inner join (ij), but it lets you specify the columns you want to join on.
Equi join (ej) syntax:
ej[c;t1;t2]
Where:
cis a symbol vector of column names (or a single column name symbol)t1andt2are tables
The result has one combined record for each row in t2 that matches t1 on columns c.
Tables being joined need not have the same schema, and any of the tables being joined may be keyed or unkeyed.
Example:
q)show s:([]sym:`IBM`MSFT`GOOG;ex:`N`CME`CME;mc:1000 250 500)
sym ex mc
-------------
IBM N 1000
MSFT CME 250
GOOG CME 500
q)show t:([]sym:`IBM`FDP`FDP`FDP`IBM`MSFT;price:0.7029677 0.08378167 0.06046216 0.658985 0.2608152 0.5433888)
sym price
---------------
IBM 0.7029677
FDP 0.08378167
FDP 0.06046216
FDP 0.658985
IBM 0.2608152
MSFT 0.5433888
q)ej[`sym;s;t]
sym ex mc price
-----------------------
IBM N 1000 0.7029677
IBM N 1000 0.2608152
MSFT CME 250 0.5433888
Where there are common columns in both tables, values are overwritten from t2:
q)show t1:([]k:1 2 3 4;c:10 20 30 40;s:`a`j`h`y)
k c s
------
1 10 a
2 20 j
3 30 h
4 40 y
q)show t2:([]k:2 2 3 4 5;c:200 222 300 400 500;v:2.2 22.22 3.3 4.4 5.5)
k c v
-----------
2 200 2.2
2 222 22.22
3 300 3.3
4 400 4.4
5 500 5.5
q)ej[`k;t1;t2]
k c s v
-------------
2 200 j 2.2
2 222 j 22.22
3 300 h 3.3
4 400 y 4.4
Plus join
Plus join (pj) is similar to left join (lj), except instead of updating t1 with values from t2, values get added together.
Plus join (pj) syntax:
t1 pj t2
For each record in t1, if there is a matching record in t2, it is added to the t1 record column by column. Common columns must be of appropriate types for addition.
If there is no matching record in t2, common column values are unchanged and values in new columns are set to zero. New columns are only allowed to be of numeric type.
t2 needs to be keyed for pj to work, t1 can be either keyed or unkeyed. Tables being joined need not have the same schema, but the keyed columns of t2 must also exist in t1.
Example:
q)show x:([]a:1 2 3;b:`x`y`z;c:10 20 30)
a b c
------
1 x 10
2 y 20
3 z 30
q)show y:([a:1 3;b:`x`z]c:1 2;d:10 20)
a b| c d
---| ----
1 x| 1 10
3 z| 2 20
q)x pj y / values from columns 'c' are added together
a b c d
---------
1 x 11 10
2 y 20 0
3 z 32 20
Comma join
Comma join (,) will join two tables row-wise. It is similar to union join (uj).
Both tables must have the same schema for comma join (,) to work, otherwise a 'mismatch error will be thrown.
Both tables must also either be keyed or unkeyed. If both tables are unkeyed, rows are simply appended (including duplicates). When both tables are keyed, rows with matching keys are merged and the second table's values overwrite the first.
Unkeyed table example:
q)show t1:([]a:1 2 3;b:`a`b`c)
a b
---
1 a
2 b
3 c
q)show s1:([]a:2 11;b:`d`e)
a b
----
2 d
11 e
q)t1,s1
a b
----
1 a
2 b
3 c
2 d
11 e
q)t1,([]c:`d`f`g;d:1 2 3)
'mismatch
[0] t1,([]c:`d`f`g;d:1 2 3)
^
Keyed table example:
q)show t2:([a:1 2 3];b:`a`b`c)
a| b
-| -
1| a
2| b
3| c
q)show s2:([a:2 11];b:`d`e)
a | b
--| -
2 | d
11| e
q)t2,s2
a | b
--| -
1 | a
2 | d / value from s2 replaces value t2
3 | c
11| e
Union join
Union join (uj) joins tables row-wise. It is similar to comma join (,), but it generalizes the comma join operator and can be used when tables have different schemas.
Union join (uj) syntax:
t1 uj t2
Both tables must either be keyed or unkeyed for uj to work. If both tables are unkeyed, rows are simply appended (including duplicates). When both tables are keyed, rows with matching keys are merged and the second table's values overwrite the first.
Unkeyed table example:
q)show s:([]a:1 2;b:2 3;c:5 7)
a b c
-----
1 2 5
2 3 7
q)show t:([]a:1 2 3;b:2 3 7;c:10 20 30;d:"ABC")
a b c d
--------
1 2 10 A
2 3 20 B
3 7 30 C
q)s,t / tables do not conform for comma join
'mismatch
q)s uj t / second table is inserted
a b c d
--------
1 2 5
2 3 7
1 2 10 A
2 3 20 B
3 7 30 C
Keyed table example:
q)show s:([a:1 2;b:2 3];c:5 7)
a b| c
---| -
1 2| 5
2 3| 7
q)show t:([a:1 2 3;b:2 3 7];c:10 20 30;d:"ABC")
a b| c d
---| ----
1 2| 10 A
2 3| 20 B
3 7| 30 C
q)s uj t / matching records are updated
a b| c d
---| ----
1 2| 10 A
2 3| 20 B
3 7| 30 C
Note
The operation of uj is expensive so it should only be used when necessary. In particular, when the tables conform, use comma join (,) or raze. Be especially cautious when low latency is paramount.
Coalesce
Coalesce (^) will merge two keyed tables, ignoring nulls.
Coalesce (^) syntax:
t1^t2
Where:
t1andt2are keyed tables- With no nulls in
t2, the result is the same as for comma join (,) - When
t2has null column values, the column values oft1are updated only with non-null values oft2
The performance of coalesce (^) is slower than that of comma join (,) since each column value of t2 must be checked for nulls.
Unlike comma join (,), both tables need not have the same schema for coalesce (^) to work.
Example:
q)show kt1:([k:1 2 3] c1:10 20 30;c2:`a`b`c)
k| c1 c2
-| -----
1| 10 a
2| 20 b
3| 30 c
q)show kt2:([k:3 4 5] c1:300 400 500;c2:`cc`dd`ee)
k| c1 c2
-| ------
3| 300 cc
4| 400 dd
5| 500 ee
q)kt1^kt2
k| c1 c2
-| ------
1| 10 a
2| 20 b
3| 300 cc
4| 400 dd
5| 500 ee
q)(kt1^kt2)~kt1,kt2
1b
q)show kt3:([k:2 3] c1:0N 3000;c2:`bbb`)
k| c1 c2
-| --------
2| bbb
3| 3000
q)kt1,kt3
k| c1 c2
-| --------
1| 10 a
2| bbb
3| 3000
q)kt1^kt3
k| c1 c2
-| --------
1| 10 a
2| 20 bbb
3| 3000 c
q)(kt1^kt3)~kt1,kt3
0b
As-of joins
As-of join
As-of join is most often used to join tables on time columns, to obtain a value in one table that is current as of a time in another table.
As-of join syntax:
aj[c1...cn;t1;t2]
aj0[c1...cn;t1;t2]
Where:
t1is a tablet2is an unkeyed tablec1...cnis a symbol list of n column names, common tot1andt2, and of matching type- column
cnis a column of sortable type (typically time)
For each record in t1, the result has one record with the items in t1, and if there are matching records in t2, the items of the last matching record are appended to those of t1. Otherwise the remaining columns are null.
The time column from t1 is used in aj. The time column from t2 is used in aj0.
There is no requirement for any of the join columns to be keys but the join is faster on keys. For more details on performance considerations with aj, please refer to here.
Example:
q)show t:([]time:10:01:01 10:01:03 10:01:04 10:01:09;sym:`msft`ibm`ge`msft;qty:100 200 150 300)
time sym qty
-----------------
10:01:01 msft 100
10:01:03 ibm 200
10:01:04 ge 150
10:01:09 msft 300
q)show q:([]time:10:01:00 10:01:00 10:01:00 10:01:02;sym:`ibm`msft`msft`ibm;px:100 99 101 98)
time sym px
-----------------
10:01:00 ibm 100
10:01:00 msft 99
10:01:00 msft 101
10:01:02 ibm 98
q)aj[`sym`time;t;q] / The time column from t is used
time sym qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm 200 98
10:01:04 ge 150
10:01:09 msft 300 101
q)aj0[`sym`time;t;q] / The time column from q is used
time sym qty px
---------------------
10:01:00 msft 100 101
10:01:02 ibm 200 98
10:01:04 ge 150
10:01:00 msft 300 101
For an older variation of aj, use ajf, ajf0.
For a simpler form of aj, use asof.
Window join
Window join (wj) is a generalisation of as-of join (aj), handling aggregations as well. Instead of joining the last value as of a given time, it will perform an aggregation over data within specified intervals.
Window join (wj) syntax:
wj[w;c;t;(q;(f0;c0);(f1;c1))]
wj1[w;c;t;(q;(f0;c0);(f1;c1))]
Where:
tandqare simple tables to be joinedqshould be sorted on`sym`timewith`p#onsymwis a pair of lists of times/timestamps, that is, of windowscare the names of the common columns, which must have integral typesf0,f1are aggregation functions applied to values in q columnsc0,c1over the windows
For each record in t, wj returns a record with additional columns c0 and c1, which are the results of the aggregation functions applied to values over the matching intervals in w.
A wj might typically look something like this:
wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]
To see all the values in each window, pass the identity function (::) in place of the aggregates:
wj[w;`sym`time;trade;(quote;(::;c0);(::;c1))]
wj takes the prevailing quote at the start of each window to be valid for aggregations. If you want to only aggregate what is strictly within your windows use wj1, whose syntax is identical to wj.
Example:
q)show t:([]sym:3#`ibm;time:09:30:01 09:30:04 09:30:08;price:100 103 101)
sym time price
------------------
ibm 09:30:01 100
ibm 09:30:04 103
ibm 09:30:08 101
q)show q:([]sym:8#`ibm;time:09:30:01+(til 5),7 8 9;ask:101 103 103 104 104 103 102 100;bid:98 99 102 103 103 100 100 99)
sym time ask bid
--------------------
ibm 09:30:01 101 98
ibm 09:30:02 103 99
ibm 09:30:03 103 102
ibm 09:30:04 104 103
ibm 09:30:05 104 103
ibm 09:30:08 103 100
ibm 09:30:09 102 100
ibm 09:30:10 100 99
q)show w:-2 1+\:t.time
09:29:59 09:30:02 09:30:06
09:30:02 09:30:05 09:30:09
q)wj[w;`sym`time;t;(q;(max;`ask);(min;`bid))]
sym time price ask bid
--------------------------
ibm 09:30:01 100 103 98 / max ask, min bid within (09:29:59;09:30:02)
ibm 09:30:04 103 104 99 / max ask, min bid within (09:30:02;09:30:05)
ibm 09:30:08 101 104 100 / max ask, min bid within (09:30:05;09:30:09)
q)wj[w;`sym`time;t;(q;(::;`ask);(::;`bid))]
sym time price ask bid
-------------------------------------------------
ibm 09:30:01 100 101 103 98 99
ibm 09:30:04 103 103 103 104 104 99 102 103 103
ibm 09:30:08 101 104 103 102 103 100 100
q)wj1[w;`sym`time;t;(q;(max;`ask);(min;`bid))]
sym time price ask bid
--------------------------
ibm 09:30:01 100 103 98
ibm 09:30:04 103 104 99
ibm 09:30:08 101 103 100
q)wj1[w;`sym`time;t;(q;(::;`ask);(::;`bid))]
sym time price ask bid
-------------------------------------------------
ibm 09:30:01 100 101 103 98 99
ibm 09:30:04 103 103 103 104 104 99 102 103 103
ibm 09:30:08 101 103 102 100 100
Summary
In this guide, you:
- Learned how to join tables using keyed joins
- Learned how to join tables using as-of joins
You now have the essential skills to join tables in KDB-X, and you should be well equipped to handle real world data joining and manipulation.