Foreign keys
Introduction
Tables in a database define a relationship between different types of data, whether that relationship is static, dynamic (i.e. fluctuating as part of a time series) or a mixture of both. In general, it is regularly the case that database queries will require data from multiple tables for enrichment and aggregation purposes and so a key aspect of database design is developing ways in which data from several tables is mapped together quickly and efficiently. Although KDB-X contains a very rich set of functions for joining tables in real time, if permanent and well-defined relationships between different tables can be established in advance then data-retrieval latency and related memory usage may be significantly reduced.
This document will discuss foreign keys and linked columns in a KDB-X context, two ways whereby table structure and organization can be optimized to successfully retrieve and store data in large-scale time-series databases.
Tests performed using KDB-X 0.1.2 (2025.10.18).
Overview
- Simple foreign keys
- Compound foreign keys
- Removing foreign keys
- Contrasting foreign keys and joins
- Summary
Simple foreign keys
The concept behind a foreign key is analogous to that of an enumerated list. While enumerating a list involves separating it into its distinct elements and their associated indexes within the list, here we take an arbitrary table column and enumerate it across a keyed column, which may be either in the same table as itself or a different table in the same database. Behind the scenes, a pointer to the associated key column replaces the enumerated column’s values essentially creating a parent-child relationship if they are in the same table or a data link if they are in different tables.
Reference: Enumerate $,
Enumeration !,
Enum Extend ?
Basics: Enumerations
In the case of a single key enumeration, creating a foreign key is very straightforward and is specified either within the initial table definition or on the fly through an update statement:
//Keyed table that will be used for enumeration; values in the keyed
//column must completely encapsulate the values in the column being enumerated
financials:(
[sym:`A`B`C]
earningsPerShare:1.2 2.3 1.5;
bookValPerShare:2.1 2.5 3.2 )
//Schema of trade table prior to enumeration
trade:([]time:`time$();sym:`$();price:`float$())
//Use the ‘$’ operator to create an enumerated list within the trade
//table by casting to the keyed table ‘financials’
update sym:`financials$sym from `trade
//Alternatively the foreign key may be defined at the outset
q)trade:([]time:`time$();sym:`financials$();price:`float$())
q)`trade insert (.z.T;`A;20.3)
,0
We can see that the sym column in the trade table is indeed an enumerated list with respect to the keyed table:
q)exec sym from trade
`financials$,`A
When we inserted a row into the trade table above, KDB-X performed a lookup on the keyed table to see what row of the table this entry will map to (in this case row 0) and rather than placing the sym into the table directly, a pointer to this key value was inserted instead. Given that this entry is a reference and not a value, any alterations to the referenced key column will directly influence the trade table and indeed any other tables referencing this key. It is therefore vitally important that great care be taken when managing data that is being referenced elsewhere since modifying, rearranging or deleting this data will have unwanted knock-on effects:
q)delete from `financials where sym=`A
`financials
//trade table still has a reference to the sym entry in row 0 which is now 'B'
q)trade
time sym price
----------------------
09:06:24.849 B 20.3
On the other hand – as is the case with a regular enumerated list – if an enumeration attempt is made where the referencing column value does not exist then the lookup fails and a cast error is returned:
q)`trade insert (.z.T;`D;12.1)
'cast
Inserting the relevant mapping data into the keyed table will fix this problem:
q)`financials insert (`D;1.3;4.0)
,2
q)`trade insert (.z.T;`D;12.1)
,1
The above example demonstrates a key benefit of using foreign keys, it ensures that trade data will always have relevant referential data available for queries and lookups, thus identifying missing or corrupt data and improving data integrity. Another benefit is that since we have created a link from the sym column in the trade table to various rows in the financials table, we can use this mapping to reference other rows as well using dot notation, just as if the referenced table columns were in the original:
//Display the Price-Earnings and Price-Book ratios by sym
q)select priceEarningsRatio:last price%sym.earningsPerShare,priceBookRatio:last price%sym.bookValPerShare by sym from trade
sym| priceEarningsRatio priceBookRatio
---| ---------------------------------
B | 8.826087 8.12
D | 9.307692 3.025
Similar to the case of KDB-X column attributes only a single foreign key can be referenced by a column at any one time; establishing a second foreign key will automatically delete the link to the first. Links to more than one table using a single column may be created by linking table 1 to table 2, table 2 to table 3 and so forth:
//Create a new table holding exchange information
q)exchange:([id:101 102 103 104];ex:`LSE`NDQ`NYSE`AMEX)
q)update exchangeID:`exchange$101 101 102 from `financials
`financials
//Compound dot notation
q)select time,sym,sym.exchangeID.ex from trade
time sym ex
--------------------
09:06:24.849 B LSE
09:07:44.282 D NDQ
Compound foreign keys
A foreign key link across two or more columns is possible in KDB-X. In this case to allow the usage of dot notation an extra column is appended to the referencing table storing the index link of the table being referenced:
q)t1:([sym:`A`B`C;ex:`NYSE`NYSE`NDQ];sharesInIssue:3?1000)
q)t2:([]time:2?.z.T;sym:`A`B;exchange:`NYSE`NYSE;price:2?10.)
//Append columns together using Each
q)update t1fkey:`t1$(t2[`sym],'t2[`exchange]) from `t2
q)t2
time sym exchange price t1fkey
-----------------------------------------
02:31:39.330 A NYSE 7.043314 0
04:25:17.604 B NYSE 9.441671 1
q)select sym, marketCap:price*t1fkey.sharesInIssue from t2
sym marketCap
-------------
A 401.0
B 880.5
All future inserts into t2 must enumerate across t1 as below to avoid an error:
q)`t2 insert (.z.T;`C;`NDQ;4.05;`t1$`C`NDQ)
,2
q)t2
time sym exchange price t1fkey
-----------------------------------------
02:31:39.330 A NYSE 7.043314 0
04:25:17.604 B NYSE 9.441671 1
20:08:25.689 C NDQ 4.05 2
Alternatively, a complex foreign key may be initialized along with the table itself. The following notation is required:
q) t2:([]time:`time$();sym:`$();exchange:`$();price:`float$(); t1fkey:`t1$())
q)`t2 insert (.z.T;`C;`NDQ;4.05;`t1$`C`NDQ)
,0
Note that since the enumerated column stores the row index lookup value and not the actual value. The column type is converted to an integer and not a symbol list. Once again all inserts must enumerate the foreign key values:
q)meta t2
c | t f a
------| ------
time | t
t1fkey| i t1
price | f
q)t2
time sym exchange price t1fkey
--------------------------------------
09:17:22.771 C NDQ 4.05 2
Removing foreign keys
To remove a foreign key from a table for simple foreign keys the keyword value is used:
q)update sym:value sym from `trade
`trade
If a table has a large number of foreign keys then the following function may be used which looks up the index of each column containing a foreign key and applies the value function to each one:
removeKeys:{![x;();0b;tr!value,/:tr:?[meta x;enlist(<>;`f;(),`);();`c]]}
q)meta removeKeys t2
c | t f a
------| -----
time | t
t1fkey| i
price | f
Calling the value function on a complex foreign key column will remove the table mapping but will leave the previously enumerated column intact as a list of integers.
Contrasting foreign keys and joins
We start with a fresh trade table and another table, exInfo, which maps each symbol to its traded exchange:
trade:([]time:`time$();sym:`$();price:`float$();size:`int$());
exInfo:([sym:`$()]exID:`int$();exSym:`$();location:`$())
//One million row entries
n:1000000
//Start and end time
st:08:00:00.000
et:17:00:00.000
//100 random syms of length 3
syms:-100?`3
//Exchange information
exdata:(
syms;
count[syms]#101 102 103 104;
count[syms]#`LSE`NDQ`HKSE`TSE;
count[syms]#`GB`US`HK`JP )
insert[`exInfo;exdata]
//Trade data
tdata:(asc st+n?et-st;n?syms;n?100f;n?1000)
insert[`trade;tdata]
Simple select statements from the database take much longer using a left join and require more memory since the table mappings must be built up from scratch and the entire lookup table must be expanded to match the length of the source table prior to the output columns being specified:
q)\ts select time,sym,exSym from trade lj exInfo
16 29362016
q)update sym:`exInfo$sym from `trade
q)\ts select time,sym,sym.exSym from trade
2 8389168
The difference above is even more evident in higher dimensions:
q)//Remove the existing foreign key from the trade table and add the
q)//exchange ID for joining across two columns
q)update sym:value sym from `trade
q)update exID:exInfo[;`exID] each sym from `trade
q)//Re-key exInfo to key on exchange ID as well as sym
q)exInfo:`sym`exID xkey 0!exInfo
q)//Left join on two columns, takes almost twenty times longer
q)\ts select time,sym,exSym from trade lj exInfo
32 25167056
q)//Now create a complex foreign key
q)update exfKey:`exInfo$(trade[`sym],'trade[`exID]) from `trade
q)//Same results as above in simple case
q)\ts select time,sym,exfKey.exSym from trade
2 8389168
Summary
This document described the usage of foreign keys in KDB-X tables. Foreign keys are important building blocks when considering making permanent connections between tables. We have read about
- how to create simple and compound foreign keys
- how to remove them
- why they can be better from a performance standpoint as opposed to joining at the time of the query
Foreign keys, though, cannot be persisted with splayed tables. If we need that, we will have to use linking columns.