14. Introduction to kdb+
14.0 Overview
In this chapter we provide an introduction to kdb+, which is the database offering from KX. Roughly speaking, kdb+ is what happens when q tables are persisted and then mapped back into memory for operations. Of course, things are more complicated than that.
In this chapter we presume familiarity with the contents of the previous chapters and will write native q code with impunity.
Note
Due to public outcry, in contrast to previous versions of this tutorial, we have made all database paths relative. Assuming you have installed q in the standard setup, the examples should work without change. Even old dogs can learn new tricks. Woof!
Due to space limitations on the display page, some input lines in this chapter wrap on the printed page. You can copy/paste these into your q session without issue. Bring your own beatbox. For clarity, on the page we sometimes suppress the result of a function application with _ when it is not germane to the presentation.
Late breaking: In q4.1 there is an alternative to \l for "loading" that optionally does not change the current working directory to the root and optionally does not load scripts it finds. It can be substituted for any instance of \l you encounter in the chapter. See 14.5.9 for .Q.lo.
14.1 Tables in Memory and Serialization
Since tables are first-class entities in q it is possible to maintain a table entirely in memory, provided you have enough physical memory to hold it. This is often done with real-time data in which every microsecond matters. There is one problem with this from a database perspective:
- An in-memory table is ephemeral – meaning that you have to persist it to storage to preserve it for history. Moreover, all information is lost if the program or q process dies.
One solution is to serialize the table periodically to persistent storage using set or similar mechanisms. In this section we
recapitulate material from previous chapters from this perspective.
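For instance, one minimal sketch of periodic persistence hooks the built-in timer to snapshot the table every 60,000 ms (one minute); the table t, its schema and the path `:data/t here are purely illustrative.
q)t:([] ti:`time$(); sym:`symbol$(); px:`float$())
q).z.ts:{[now] `:data/t set t}   / timer callback snapshots the current state of t
q)\t 60000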
Note
Those having passing familiarity with q can use this section as a review.
14.1.1 Tables and Keyed Tables
A table is fundamentally the (flip) of a column dictionary, in which address slots are reversed but no data is moved during the transpose. For example, here is a table with two simple list columns in fundamental form.
q)flip `c1`c2!(`a`b`c; 10 20 30)
c1 c2
-----
a 10
b 20
c 30
Table definition syntax permits tables to be defined in more readable format.
q)([] c1:`a`b`c; c2:10 20 30)~flip `c1`c2!(`a`b`c; 10 20 30)
1b
The schema of a table has the same form but with empty columns – i.e., no rows.
q)([] c1:`$(); c2:`long$())~ 0#([] c1:`a`b`c; c2:10 20 30)
1b
It is good practice to prototype fully the empty lists in a schema. Unfortunately this is not possible if the corresponding columns are not simple lists since they have only general type.
q).Q.s1 0#([] s:`a`b; cc:(10 20; 30 40 50))
"+`s`cc!(`symbol$();())"
The type of a (non-keyed) table is 98h and the function (meta) summarizes the column names, types and attributes in a result keyed table.
q)meta ([] s:`$(); v:`long$())
c| t f a
-| -----
s| s
v| j
A keyed table is a dictionary that establishes positional correspondence between a table of (presumably unique) keys and a table of values. As such, a keyed table has type 99h.
q)show kt:([] id:1001 1002 1003)!([] s:`a`b`c; v:100 200 300)
id | s v
----| -----
1001| a 100
1002| b 200
1003| c 300
Table definition syntax is more compact.
q)kt~([id:1001 1002 1003] s:`a`b`c; v:100 200 300)
1b
The type of a keyed table is 99h since it is a dictionary and meta applies exactly as with tables.
q)meta kt
c | t f a
--| -----
id| j
s | s
v | j
14.1.2 Foreign Keys and Link Columns
A foreign key is one or more table columns that are enumerated over the key component of a keyed table. For example, the column id in the table below is a foreign key over kt. Note that the foreign key column is tagged with the name of its target table in the f column of (meta).
q)kt:([id:1001 1002 1003] c1:`a`b`c; c2:100 200 300)
q)t:([] id:`kt$1002 1001 1003 1001; iq:100 101 102 103)
q)meta t
c | t f a
--| ------
id| j kt
iq| j
A query on a table having a foreign key can access columns in the related keyed table via dot notation. This represents an implicit inner join.
q)select id.c2, iq from t
c2 iq
-------
200 100
100 101
300 102
100 103
A link column is similar to a foreign key, in that its entries are indices of rows in a table, but you must create the indices by performing the lookup manually. See 8.6 for more details.
Here is the previous foreign key example redone with a link column against a table.
q)tk:([] id:1001 1002 1003; c1:`a`b`c; c2:100 200 300)
q)t:([] id:`tk!(exec id from tk)?1002 1001 1003 1001; iq:100 101 102 103)
q)meta t
c | t f a
--| ------
id| j tk
iq| j
14.1.3 Serializing Tables
It is possible to persist any table (or keyed table) using the general q serialization/deserialization capability of set and get. There is no restriction on table or column types.
q)`:data/t set ([] c1:`a`b`c; c2:100 200 300)
`:data/t
q)\\
>q
q)show t:get `:data/t
c1 c2
------
a 100
…
You can serialize foreign keys and link columns and bring them back into memory.
q)kt:([id:1001 1002 1003] c1:`a`b`c; c2:100 200 300)
q)`:data/kt set kt
_
q)tk:([] id:1001 1002 1003; c1:`a`b`c; c2:100 200 300)
q)`:data/tk set tk
_
q)`:data/t1 set ([] id:`kt$1002 1001 1003 1001; iq:100 101 102 103)
_
q)`:data/t2 set ([] id:`tk!(exec id from tk)?1002 1001 1003 1001; iq:100 101 102 103)
_
q)\\
>q
q)kt:get `:data/kt
q)tk:get `:data/tk
q)t1:get `:data/t1
q)t2:get `:data/t2
q)select id.c1, iq from t1
_
q)select id.c1, iq from t2
_
14.1.4 Operating on Serialized Tables
You can operate on a serialized table by loading it into memory with get or \l.
q)`:data/t set ([] c1:`a`b`c; c2:100 200 300)
`:data/t
q)\l data/t
`t
q)select from t
c1 c2
------
a 100
…
q)t1: get `:data/t
q)select from t1
c1 c2
------
a 100
…
Alternatively, you can perform a query on a serialized table by specifying its file handle as the table name.
q)select from `:data/t where c1 in `a`c
c1 c2
------
a 100
c 300
q)`:data/t upsert (`x;42)
`:data/t
q).[`:data/t;();,;([] c1:`y`z; c2:400 500)]
`:data/t
q)select from `:data/t where c1 in `x`y`z
c1 c2
------
x 42
y 400
z 500
Similar operations are available on keyed tables.
q)`:data/kt set ([k:`a`b`c] v:10 20 30)
`:data/kt
q)`:data/kt upsert ([k:enlist `b] v:enlist 200)
`:data/kt
q)`:data/kt upsert (`x;42)
`:data/kt
q)select from `:data/kt
k| v
-| ---
a| 10
b| 200
c| 30
x| 42
Note
The limitation to using a serialized table or keyed table is that, behind the scenes, the operations load it into memory and write it back out. In addition to the time for loading, this means it must fit into memory in its entirety.
14.1.5 The Database View
Persisting a table via serialization creates a single file in the OS file system. Let's raise our perspective to the 50,000-foot level. From this height, we see the table as a single point.
Big Picture (0): We think of a serialized table as a zero-dimensional persisted form since it is stored in a single file and is not decomposed.
file
table ▪
Here the ▪ represents the contents of the table as an opaque entity. This contrasts with higher-dimensional forms we shall encounter later.
14.2 Splayed Tables
In the previous section, we saw that it is possible to persist tables using serialization. From a database perspective there are (at least) two issues with operating on serialized tables due to the fact that the entire table is loaded into memory.
- The entire table must fit into memory on each user's machine.
- Operations against a large persisted table will be slow due to reloading the entire table each time.
14.2.0 Splaying a Table
When a table is too large to fit into memory as a single entity, we can decompose it into columns within a directory. This is called splaying the table because the table is pulled apart into its constituent pieces (it is an interesting exercise to look up the derivation of this word in English).
Splaying solves the memory/reload issue because a splayed table is mapped into memory rather than loaded; individual columns are loaded on demand then released when no longer needed. Tables with many columns especially benefit from splaying since most queries refer to only a handful of columns and so only those columns will actually be loaded.
A splayed table corresponds to a directory whose name is the table name. Each column list of the table is serialized into a file whose name is the column name.
Tip
When you load a splayed table, its name is taken from the directory name; you do not get to choose the name at load time.
A list of the symbolic column names is also serialized to the hidden file .d in the directory to record column order. This is the only metadata stored by kdb+; all other metadata is read from directory and file names.
Customarily the splayed directory is created one level down from a directory that serves as the root of the database.
/root
/tablename <- splayed table directory
.d <- file with column names
column1name <- column data file
column2name <- column data file
…
From the 50,000-foot level, we see the following on the file system.
Big Picture (1): We think of a splayed table as a persisted form that is cut vertically along columns.
| column1 | column2 | column3 | … | |
|---|---|---|---|---|
| table | ▪ | ▪ | ▪ | … |
Geometrically, the persisted table is 1-dimensional – there is a point for each column in the persisted image of the table.
14.2.1 Creating Splayed Tables
We start with a very simple example of creating a splayed table. Make sure you include the trailing / in the file handle; otherwise, you will serialize the table into a single file.
q)`:db/t/ set ([] v1:10 20 30; v2:1.1 2.2 3.3)
`:db/t/
q)\ls -a db/t
,"."
".."
".d"
"v1"
"v2"
q)get `:db/t
v1 v2
------
10 1.1
20 2.2
30 3.3
Note
Do not include the trailing / in the corresponding get.
It is also possible to create a splayed table with upsert, or with the equivalent generalized Amend, using the file handle as the table name. When the file does not exist, these act like set.
q)`:db/t2/ upsert ([] v1:10 20 30; v2:1.1 2.2 3.3)
`:db/t2/
q).[`:db/t3/; (); ,; ([] v1:10 20 30; v2:1.1 2.2 3.3)]
`:db/t3/
Reading the constituents of the splayed directory with get demonstrates that they are simply serialized q entities. Observe that you do not include the trailing / in the directory name with get.
q)get `:db/t/v1
10 20 30
q)get `:db/t/v2
1.1 2.2 3.3
q)get `:db/t/.d
`v1`v2
If your notion of fashion is a hair shirt, you can splay your table manually. In a fresh db directory:
q)t:([] v1:10 20 30; v2:1.1 2.2 3.3)
q)cs:cols t
q){[cname] (hsym `$"db/",string cname) set t cname} each cs
`:db/v1`:db/v2
q)`:db/.d set cs
`:db/.d
There are restrictions on what can be splayed.
- Tables can be splayed. Keyed tables cannot. This might seem to preclude relational capabilities since it eliminates persisting a foreign key relation. But the day is saved by link columns, which can be persisted. See 14.1.2.
- Before q3.6 only columns that were simple lists or compound lists could be splayed. See 14.2.3 for more on nested columns.
- All symbol columns must be enumerated. This restriction might seem to complicate life but there are conventions and utilities that eliminate most of the pain. See 14.2.2.
14.2.2 Splayed Tables with Symbol Columns
The convention in splayed (and partitioned) tables is that all symbol columns in all tables are enumerated over the list sym, which is serialized into the root directory.
For ease of understanding, here we use a projected form of the utility .Q.en for enumerating symbols. Its first parameter is the file handle of the root directory (for the location of the sym file) and its second parameter is the table whose symbol columns you wish enumerated. Here is a simple example. Assuming the target directory db does not exist.
q)`:db/t/ set .Q.en[`:db;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
`:db/t/
q)\ls db
"sym"
,"t"
q)\ls -a db/t
,"."
".."
".d"
"s1"
"s2"
,"v"
q)\\
>q
q)\l db
q)select from t
s1 v s2
--------
a 10 x
…
Be mindful of the following observations on the action of .Q.en in this example:
- If there is a sym list in memory, it is overwritten.
- If there is a sym list on disk, it is locked and then loaded into memory.
- If no sym list exists in memory or on disk, an empty one is created.
- All symbols in all symbol columns of the table are conditionally enumerated over the sym list in memory.
- Once the enumeration is complete, the sym list in memory is serialized to the root and the file is unlocked.
Once again, you could don your hair shirt and do symbol enumeration manually. We do not recommend doing this except to impress someone on a q test. Assuming db is empty:
q)t:([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
q)sym:()
q)`:db/t/ set @[t;exec c from meta t where "s"=t;`sym?]
`:db/t/
q)sym
`a`b`c`x`y`z
q)`:db/sym set sym
`:db/sym
q)\l db
q)select from t
s1 v s2
--------
a 10 x
14.2.3 Splayed Tables with Non-Simple Columns
In previous versions of q the only nested columns that could be splayed were what we call compound lists – i.e., lists of simple lists of uniform type. The most common example is a list of strings, which is a list of lists of char. A compound column is indicated by an upper-case letter in the result of (meta). These columns are almost as fast as simple columns in retrieval.
For example, here is a table having two compound columns, which can be splayed in a fresh directory db.
q)meta ([] ci:(1 2 3; enlist 4; 5 6); cstr:("abc";enlist"d";"ef"))
c | t f a
----| -----
ci | J
cstr| C
q)`:db/tcomp/ set ([] ci:(1 2 3; enlist 4; 5 6); cstr:("abc";enlist"d";"ef"))
`:db/tcomp/
By contrast, the following table could not formerly be splayed but now succeeds. Columns of mixed type will not be as fast as simple or compound columns on retrieval.
q)meta ([] c:(1;1,1;`1))
c| t f a
-| -----
c|
q)`:db/tmix/ set ([] c:(1;1,1;`1))
`:db/tmix/
Important
You might reasonably think that you can use the result of (meta) to determine whether a table column is simple or compound. You would be wrong. Since (meta) only examines the initial item of a column list it improperly reports some mixed columns as compound.
q)meta ([] c:(1 2 3 ;1,1;`1))
c| t f a
-| -----
c| J
When you splay a table with compound columns, q creates not one but two files for each compound column. For example, we see the following.
q)`:db/tstr/ set ([] c:("abc";enlist "d";"ef"))
`:db/tstr/
q)\ls -a db/tstr
,"."
".."
".d"
,"c"
"c#"
Observe that there are two files associated with the compound column – namely, c and c#.
Interestingly, get reconstitutes the list when applied to the non-sharp file. This suggests the logic for handling the mapping is built into get.
q)get hsym `$"db/tstr/c"
"abc"
,"d"
"ef"
Roughly speaking, the c file holds "pointers" to the flattened data in c#. The purpose of the multiple files is to speed up operations when the splayed column is mapped into memory. The processing won't be as fast as for a simple column, but it is still plenty fast for most purposes.
When you splay a table with mixed column(s) – i.e., a general list – q creates not just one but potentially three files for each mixed column. The two additional files have suffixes # and ##. If there are any symbols found in the column, a ## file will be created to hold them. Again the purpose is to facilitate fast operations on the mapped column. For example, we see the following:
q)\ls -a db/tmix
,"."
".."
".d"
,"c"
"c#"
"c##"
Once again get reconstitutes the column under the covers.
q)get hsym `$"db/tmix/c"
1
1 1
`1
In contrast to symbols, strings do not pollute the sym list with one-off instances and are reasonably fast to retrieve. The disadvantage is that they are not first class and you must revert to teenage speak, like, by using (like) to match them.
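For instance (a throwaway in-memory illustration, not part of the database built above):
q)tx:([] sym:`ibm`msft`ibm; note:("buy more";"sold all";"buy the dip"))
q)select from tx where note like "buy*"
sym note
---------------
ibm buy more
ibm buy the dip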
Recommendation
Only make text columns into symbols when the fields will be drawn from a small, reasonably stable domain and there is significant repetition in their use. Unless you are completely sure of this, start with a string column. It is much easier to convert a string column to symbols than it is to remove symbols from a polluted sym list.
A text column that is drawn from a small fixed list or a lookup table is an ideal candidate for symbols. So are invariant keys, provided the key domain is small and will not grow unreasonably. On the other hand, fields such as comments or notes should always be strings.
14.2.4 Basic Operations on Splayed Tables
We create a splayed table in a fresh db.
q)`:db/t/ set .Q.en[`:db;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
`:db/t/
To operate on a splayed table you can map it into memory in one of two ways.
Tip
Note the lack of a trailing / in both techniques of mapping the table below.
You can specify a root directory containing the sym file and splayed table on the q startup command immediately after the q executable.
$ q db
q)select from t
s1 v s2
--------
a 10 x
...
Alternatively, you can use \l to map the root directory.
Tip
Confusingly, many folks say they are "loading" the table, even though none of the table data is actually brought into memory in this step.
q)\l db
q)select from t
s1 v s2
--------
a 10 x
...
Important
When you "load" a root database directory as above, the current working directory for the OS is set to that directory.
The illusion that the table is actually in memory after it is mapped is convincing. Many fundamental table operations work on splayed tables.
q)\a
,`t
q)meta t
c | t f a
--| -----
s1| s
v | j
s2| s
q)cols t
`s1`v`s2
q)count t
3
You can index records.
q)t[0]
s1| `sym$`a
v | 10
s2| `sym$`x
You cannot use dot notation to extract columns from a splayed table but you can extract a column with symbol indexing.
q)t.s1
's1
q)t `s1
`sym$`a`b`c
You can use both select and exec templates on a splayed table.
q)select v from t where s1=`c
v
--
30
q)exec s1 from t
`sym$`a`b`c
Tip
This contrasts with partitioned tables where you can only use select.
14.2.5 Operations on a Splayed Directory
As of this writing (July 2025), the table operations available against the file handle of a splayed table are: select, exec, upsert, xasc (and xdesc), and `attr# (apply an attribute).
In a fresh q session with t splayed as above but not loaded. Because the sym list is not in memory, the enumerated symbol columns display as their underlying indices.
q)select from `:db/t
s1 v s2
--------
0 10 3
…
q)exec v from `:db/t
10 20 30
q)`v xdesc `:db/t / sort in place on disk
`:db/t
q)@[`:db/t;`s1;`p#] / apply attribute in place
`:db/t
q)\l db
q)meta t
c | t f a
--| -----
s1| s p
v | j
s2| s
We point out one source of confusion to qbies. Specifically, the behavior of update on a splayed table that has been mapped into
memory. Starting with a fresh directory db, create a splayed table, map it into memory and then update it.
q)`:db/t/ set .Q.en[`:db;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
`:db/t/
q)\l db
q)update v:300 from `t where s1=`c
`t
q)select from t
s1 v s2
---------
a 10 x
b 20 y
c 300 z
But a rude surprise awaits when we remap the table.
q)\l .
q)select from t
s1 v s2
--------
a 10 x
b 20 y
c 30 z
Note
Updates applied to a mapped table are only visible in the workspace and are not reflected on disk. There may be scenarios where this is useful but we advise avoiding it.
And now we arrive at a fundamental limitation of kdb+.
Fact of Life with Splayed Tables: It is not possible to use built-in operations to update data in persisted splayed (or partitioned) tables.
You read that correctly. kdb+ is intended to store data that is not updated or deleted once it has been written. We shall see in the next section how to append to a splayed table, which makes it possible to process updates and deletes in a bi-temporal fashion, but this capability is not available out of the box.
14.2.6 Appending to a Splayed Table
Since upsert acts as insert on regular (non-keyed) tables and only non-keyed tables can be splayed, we use upsert with the splayed directory name in order to append records to a splayed table on disk. One reason to deprecate (insert) in your code is it doesn't work on splayed tables.
In a fresh db,
q)`:db/t/ set .Q.en[`:db;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
`:db/t/
q)`:db/t/ upsert .Q.en[`:db;] ([] s1:`d`e; v:40 50;s2:`u`v)
`:db/t/
q)`:db/t upsert .Q.en[`:db;] enlist `s1`v`s2!(`f;60;`t)
`:db/t
q)`:db/t upsert .Q.en[`:db;] flip cols[`:db/t]!flip ((`g;70;`r);(`h;80;`s))
`:db/t
q)select from `:db/t
s1 v s2
--------
a 10 x
b 20 y
…
In the last upsert expression, note that converting rows into a conforming table makes it easy to enumerate symbols on the fly.
You can use upsert in this fashion to build large splayed tables incrementally. The following example can be enhanced to create a splayed table step-by-step as batches of new records arrive. In a fresh db:
q)batch:{[rt;tn;recs] hsym[`$rt,"/",tn,"/"] upsert .Q.en[hsym `$rt;] recs}
q)dayrecs:{([] dt:x; ti:asc 100?24:00:00; sym:100?`ibm`aapl; qty:100*1+100?1000)}
q)appday:batch["db";"t";]
q)appday dayrecs 2025.01.01
`:db/t/
q)appday dayrecs 2025.01.02
`:db/t/
q)appday dayrecs 2025.01.03
`:db/t/
q)select count i from `:db/t
x
---
300
14.2.7 Manual Operations on a Splayed Directory
Although there are no built-in operations to update splayed tables on disk, in a crunch you can perform such operations by manipulating the serialized files.
Important
The examples shown here should be used with caution, as none of the operations are atomic; they are simply file system manipulation. Even read-only users could see inconsistent data, so things are best done when no other users are accessing the database.
Here is an example of how to simulate an (update .. where) on a single column. In a fresh db:
q)`:db/t/ set ([] ti:09:30:00 09:31:00; p:101.5 33.5)
_
q)`:db/t/p set @[get `:db/t/p; where 09:31:00=get `:db/t/ti; :; 42.0]
_
q)select from `:db/t
ti p
--------------
09:30:00 101.5
09:31:00 42
To add a new column of nulls to an existing splayed table, we first get the row count from any column file. Then we revise the .d file with the new column name appended. This example is easily modified to place arbitrary data in the new column. For simplicity we are bypassing the enumeration of symbols here; if you follow this path with a real symbol column, you should include the enumeration.
q)`:db/t/s set (count get `:db/t/ti)#`
`:db/t/s
q)`:db/t/.d set get[`:db/t/.d] union `s
`:db/t/.d
q)meta select[1] from `:db/t
c | t f a
--| -----
ti| v
p | f
s | s
To remove a column from a splayed table, delete its column file and revise the .d file to reflect its absence.
q)system "rm db/t/s"
q)`:db/t/.d set get[`:db/t/.d] except `s
`:db/t/.d
q)meta select[1] from `:db/t
c | t f a
--| -----
ti| v
p | f
s | s
While you can sort a splayed table on disk with xasc, suppose you want to implement your own custom sort. We show how to (re)create the standard sort; you can go from there.
First create the appropriate sort index with (iasc) or (idesc) and then use this to re-index all the column files. Here we perform a descending sort by the ti column.
q)cs:system "ls db/t"
q)I:idesc get `:db/t/ti
q){path set get[path:hsym `$"db/t/",x] I} each cs
`:db/t/p`:db/t/ti
q)select from `:db/t
ti p
--------------
09:31:00 42
09:30:00 101.5
14.2.8 Working with sym Files
On occasion you may need to perform operations involving the sym files. For example, you may want to move a table from one database to another. Or you may find that your sym domain has become polluted due to symbol columns that should have been strings. Or you may have unwisely chosen to have too many different enumeration domains and you wish to consolidate them into one. All these situations can be resolved by careful operations on the sym file(s).
Tip
Updating sym files is delicate and should be done with great care, only after you have made a complete backup of the root directory. Corrupting the sym file will almost certainly render your database useless. Ask me how I know.
In our first example, we show how to merge a table from one database into another. Suppose we have root directories db1 and db2, both with the standard sym file in the root and with splayed tables t1 and t2, respectively. In fresh directories db1 and db2,
q)`:db1/t1/ set .Q.en[`:db1;] ([] s:`a`b`c; v:10 20 30)
`:db1/t1/
q)`:db2/t2/ set .Q.en[`:db2;] ([] s:`c`d`e; v:300 400 500)
`:db2/t2/
q)get `:db1/sym
`a`b`c
q)get `:db2/sym
`c`d`e
In order to copy table t2 to the db1 root, we un-enumerate all its symbol columns with (value) and then re-enumerate them in the target database.
q)scols2:exec c from meta `:db2/t2 where t="s"
q)t2noenum:@[select from `:db2/t2;scols2; value]
q)`:db1/t2/ set .Q.en[`:db1] t2noenum
`:db1/t2/
q)get `:db1/sym
`a`b`c`d`e
In another example, suppose that we mistakenly made a comment column into symbols instead of strings, with the result being a pollution of the sym domain with many single-use items. Starting in a fresh db, we save t1 with a valid symbol column and then screw up with a symbol comment field in t2.
q)`:db/t1/ set .Q.en[`:db;] ([] s:`a`b`c; v:10 20 30)
`:db/t1/
q)`:db/t2/ set .Q.en[`:db;] ([] s:`a`b`c; comment:`$("abc";"de";"fg"))
`:db/t2/
q)show sym:get `:db/sym
`a`b`c`abc`de`fg
First we remove the offending comments from the sym list we retrieved above.
q)show sym:sym except exec comment from `:db/t2
`a`b`c
Next, we re-enumerate the re-typed culprit table over the clean sym list and re-splay it. We use (`sym?) for the enumeration since most of the symbols are presumably already in the sym list.
q)reenum:{@[x;exec c from meta x where t="s";`sym?]}
q)`:db/t2/ set reenum ([] s:`a`b`c; comment:("abc";"de";"fg"))
`:db/t2/
For safety we also re-enumerate the symbol columns of every other splayed table in the root, since they were enumerated over the original, longer sym list.
q)ts:hsym `$-1_'(system "ls -d db/*/") except enlist "db/t2/"
q)unenum:{@[select from x; exec c from meta x where t="s";value]}
q){(hsym `$string[x],"/") set reenum unenum x} each ts
,`:db/t1/
Finally we overwrite the sym file with the correct sym list that is now in memory and reload the database to check our handiwork.
q)`:db/sym set sym
q)\l db
q)meta t2
c | t f a
-------| -----
s | s
comment| C
14.2.9 Splayed Tables with Link Columns
We previously pointed out that you cannot splay a keyed table, and therefore cannot have a foreign key relation between splayed tables. However, you can splay a link column and then use dot notation just as you would with a foreign key. You must do the work of creating the index yourself, just as with link columns with tables in memory.
In our first example, we create the link just before we splay the tables. This is the same as creating the link on in-memory tables and then splaying them. In a fresh db,
q)t1:([] c1:`c`b`a; c2: 10 20 30)
q)t2:([] c3:`a`b`a`c; c4: 1. 2. 3. 4.)
q)update t1lnk:`t1!t1[`c1]?t2[`c3] from `t2
`t2
q)`:db/t1/ set .Q.en[`:db;] t1
`:db/t1/
q)`:db/t2/ set .Q.en[`:db;] t2
`:db/t2/
q)\\
>q
q)\l db
q)meta t2
c | t f a
-----| ------
c3 | s
c4 | f
t1lnk| j t1
q)select c3,t1lnk.c2,c4 from t2
c3 c2 c4
--------
a 30 1
…
In our second example we create the link column directly on disk against tables that have already been splayed, afterwards appending the link name to the .d file for t2. In a fresh db:
q)`:db/t1/ set .Q.en[`:db;] ([] c1:`c`b`a; c2: 10 20 30)
`:db/t1/
q)`:db/t2/ set .Q.en[`:db;] ([] c3:`a`b`a`c; c4: 1. 2. 3. 4.)
`:db/t2/
q)`:db/t2/t1link set `t1!(get `:db/t1/c1)?get `:db/t2/c3
`:db/t2/t1link
q).[`:db/t2/.d;();,;`t1link]
`:db/t2/.d
q)\l db
q)meta t2
c | t f a
------| ------
c3 | s
c4 | f
t1link| j t1
14.2.10 Query Execution on Splayed Tables
As we noted in the introduction of the chapter, splayed tables are a memory win, in that the table is mapped into memory and columns are actually loaded only as needed. Since a typical query requires only a handful of columns, this can greatly reduce the amount of memory required to process a table with many columns.
In addition to the memory win, if the same columns are referenced again soon, there is a good chance that their data is cached, either in memory or by the storage controller. This can be a significant performance win.
There is also a more subtle performance win in how the where phrase is implemented. As mentioned in Chapter 11, each where subphrase produces a boolean vector that acts as a mask, determining which positions are examined by subsequent subphrases and by the grouping and aggregation. When the table is mapped, this can also significantly reduce the amount of data that is loaded into memory.
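As a rough in-memory illustration of the masking (a sketch of the idea, not the actual implementation), later subphrases are evaluated only on rows that survived the earlier ones:
q)tm:([] s:`a`b`a`c; v:10 20 30 40)
q)i1:where tm[`s]=`a                / rows passing the first subphrase
q)i2:i1 where tm[`v][i1]>15         / second subphrase examined only on the survivors
q)(tm i2)~select from tm where s=`a, v>15
1b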
14.3 Partitioned Tables
To recap the previous section, a splayed table is split vertically along its columns when stored to disk and is subsequently mapped back into memory. When a query is executed against the mapped table, column data is loaded on demand and is (presumably) cached for subsequent access. This reduces a table's memory footprint, especially for tables with many columns that are not accessed simultaneously.
Some time series data is so large that even the individual columns may not fit into memory – for example, daily trades and quotes for an entire exchange. In this case, we can further decompose the table by slicing horizontally – called partitioning in kdb+. For example, the typical solution for trades and quotes is to slice into daily portions. The result is a collection of daily splayed directories, one for each day for which data exists.
Tip
All partitioned tables are splayed but not all splayed tables are partitioned.
14.3.1 Partitions
A partitioned table is a splayed table that is further decomposed by grouping records having common values along a virtual column of integral type. The allowable virtual column types have underlying integer values: date, month, year and int.
The slice of records grouped by a given value is splayed into a directory, called a partition, whose name is that common value. In the canonical Finance example, historical trades (or quotes) are stored in daily partition directories – remember a q date is an integer under the covers.
Big Picture (2): We consider a partitioned table to be a two-dimensional persisted form since it is cut in two directions: vertically by splaying along columns and horizontally by slicing into partitions.
| column1 | column2 | column3 | … | |
|---|---|---|---|---|
| partition1 | ▪ | ▪ | ▪ | … |
| partition2 | ▪ | ▪ | ▪ | … |
| … | … | … | … | … |
Here each ▪ represents a (partial) column of a partition slice as an opaque entity.
As with splaying, kdb+ implements partitioning using the OS directory structure. Each partition must be stored in a directory immediately under the database root directory. The directory name is the common column value of the virtual field for all the rows in the partition.
Each partition directory in turn contains a subdirectory holding a splayed table with the records for that slice. The two-dimensional decomposition is reflected in two levels of nested directories under the root.
Since each splayed table subdirectory name is the name of the table, these names must be consistent across all partitions. Likewise, as portions of a single table, all splayed slices should have identical schemas. This means that each slice directory contains column files with the same names and internal data types, as well as identical .d files.
/root
/partitionvalue1
/tablename
.d
column1name
column2name
…
/partitionvalue2
/tablename
.d
column1name
column2name
…
…
Important
Since a partition directory name factors out the common value for all records in its slice, do not include a column holding the virtual partition value when you splay a partition slice – e.g., do not include a date column for a daily partitioned table. Instead, kdb+ infers the name, value and type from the partition directory name and recreates a virtual column from this information. The name of the virtual column is set by q and cannot be modified.
14.3.2 Partition Domain
We call the type of the virtual column for the partition the partition domain. As noted previously, the partition domain must have an underlying integral value.
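For example, the integral values underlying date and month are easy to see with a cast.
q)`long$2025.01.01    / days since 2000.01.01
9132
q)`long$2025.01m      / months since 2000.01
300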
Tip
You cannot use a symbol column as a partition domain, even if the symbols are enumerated. You can, however, use it to segment the database.
For example, suppose we have daily time series data. With daily partitioning the partition directories are named 2025.01.01, 2025.01.02, … and the virtual column is date.
/db
/2025.01.01
/t <- trade records for 2025.01.01
/2025.01.02
/t <- trade records for 2025.01.02
…
Important
A kdb+ database can only have a single partition domain. This means that you must create separate databases if you need partitions of different granularity. For example, you cannot have daily and monthly partitions in one database.
There is no requirement that there be entries for every value in the partition domain. For example, there may be no trades for holidays.
/db
/2024.12.31
/t
/2025.01.02
/t
…
If we partition by month, the partition directories are named 2025.01, 2025.02, … and the virtual column is month. Do not put the type indicator m on the directory name.
For daily data that requires frequent multi-year queries, partitioning by year may be better. The partition directories are named 2020, 2021, … and the virtual column is year, even though there is no year type in q. A directory name is recognized as a valid year if it comprises exactly four numeric digits.
You can also partition by an arbitrary int such as a bin number. The partition directories are named 0, 1, … and the virtual column is int.
14.3.3 Creating Partitioned Tables
For simplicity, our first example has no symbols. In a fresh db,
q)`:db/2025.01.01/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)
`:db/2025.01.01/t/
q)`:db/2025.01.02/t/ set ([] ti:09:30:00 09:31:00; p:101.5 102.5)
`:db/2025.01.02/t/
q)\l db
q)t / never do this at home!!!
date ti p
-------------------------
2025.01.01 09:30:00 101
2025.01.01 09:31:00 102
2025.01.02 09:30:00 101.5
2025.01.02 09:31:00 102.5
The table appears to be in the workspace, along with the virtual date column, but this is an illusion. It is actually mapped into memory. The request to display t forces all columns for all days to be loaded into memory. Never do this with a real database!
Important
Always qualify the partition column in the first where subphrase in any query against a partitioned table. If you do not, you will cause all partitions to be loaded into memory and will probably live-lock the server. Well before this completes, your colleagues will be at your desk with pitchforks and burning torches.
A more responsible way to create the previous display is:
q)select from t where date within 2025.01.01 2025.01.02
date ti p
-------------------------
2025.01.01 09:30:00 101
…
Here we partition tables with symbols in a fresh db. Note that we wrap the line in our display and suppress the response with _.
q)`:db/2025.01.01/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`msft; p:101 33f)
_
q)`:db/2025.01.02/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`msft; p:101.5 33.5)
_
q)\l db
q)select from t where date within 2025.01.01 2025.01.02
date ti s p
------------------------------
2025.01.01 09:30:00 ibm 101
14.3.4 Working with Partitioned Tables
Continuing with the last example of the previous section, we can apply basic operations on the partitioned table once it has been mapped. Observe that the virtual partition column is included in the results of (meta) and (cols).
q)count t
4
q)meta t
c | t f a
----| -----
date| d
ti | v
s | s
p | f
q)cols t
`date`ti`s`p
q)type t
98h
However, none of the following work on partitioned tables, even though they work on splayed tables.
t[0j]
t[;`p]
0#t
exec from t
select[1] from t
`p xasc t
Tip
The fact that (exec) doesn't work on partitioned tables is annoying but the workaround is:
exec ... from select ... from ...
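For example, to pull a plain list of one day's times from the table t mapped above:
q)exec ti from select ti from t where date=2025.01.01
09:30:00 09:31:00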
The select template, or the equivalent functional form, is the way to access data for a partitioned table. We have already seen how to retrieve the records for consecutive days. Here is a very common query to retrieve a day's slice.
q)select from t where date=2025.01.01
date ti s p
----------------------------
2025.01.01 09:30:00 ibm 101
…
Here is how to retrieve the first or last day without hardcoding the date.
q)select from t where date=first date
date ti s p
----------------------------
2025.01.01 09:30:00 ibm 101
…
q)select from t where date=last date
date ti s p
------------------------------
2025.01.02 09:30:00 ibm 101.5
…
Always place the partition column constraint first.
q)select from t where date=2025.01.01, ti<09:30:30
date ti s p
---------------------------
2025.01.01 09:30:00 ibm 101
...
You can group by the partition column. Note we have wrapped the line on the display.
q)select hi:max p, lo:min p by date from t where date within 2025.01.01
2025.01.02
date | hi lo
----------| ----------
2025.01.01| 101 33
2025.01.02| 101.5 33.5
14.3.5 The Virtual Column i in Partitioned Tables
In a partitioned table, the virtual column i does not refer to absolute row number as it does with in-memory and splayed tables. Instead, it refers to the relative row number within a partition. Thus, a constraint on i alone would apply across all partitions and the result will contain that row in each partition slice – probably not what you want and likely a bad idea (colleagues with pitchforks again).
Continuing with the previous example, the following query retrieves the initial record from each of the specified partitions.
q)select from t where date in 2025.01.01 2025.01.02, i=0
date ti s p
-----------------------------
2025.01.01 09:30:00 ibm 101
2025.01.02 09:30:00 ibm 101.5
The following queries retrieve the first and last records in the table, respectively.
q)select from t where date=first date, i=0
date ti s p
-----------------------------
2025.01.01 09:30:00 ibm 101
q)select from t where date=max date, i=max i
date ti s p
-----------------------------
2025.01.02 09:31:00 msft 33.5
See 14.4.5 for using .Q.ind to index by absolute row number across partitions.
14.3.6 Query Execution on Partitioned Tables
Query execution against a partitioned table is more complex than execution against a splayed table. Understanding how it works is necessary to achieve good design and performance with kdb+ databases.
Recall that the motivation for partitions was to avoid loading entire columns into memory. Behind the scenes, kdb+ achieves this as follows:
- Analyze the where phrase to determine which partition slices are targeted by the query.
- Process the remaining where subphrases to determine the column sub-domains that must be loaded.
- Process the query separately against the requisite partition slices to obtain partial results.
- Combine the partial results to obtain the final result.
Reading only partial column slices of a table having many partitions is clearly a big memory and performance win. But things are even more interesting with regard to the third point above. If you have started q without secondary threads, the query is executed sequentially against the partitions and the time to complete is (roughly) the sum of the time against each of the partition slices.
By contrast, if you have started q with secondary threads, the query will be executed concurrently, one slice per secondary thread. Thus, you can see significant speedup provided the query is compute-bound and the secondary threads are running on separate cores. On the other hand, if the query is I/O bound and you have only a single I/O channel to the data, you may see no speedup.
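For example, assuming the machine has cores to spare, a historical database process might be started with secondary threads as follows (the count 4 is arbitrary).
$ q db -s 4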
Next we discuss how kdb+ executes some types of queries concurrently.
14.3.7 Map-Reduce
We start with a brief introduction to map-reduce. Suppose we want to compute some aggregation – say the sum or the average – of a large list of numbers and have at our disposal a crop of interns armed with smartphones. Our strategy is to decompose the overall task into two stages.
In the first stage, we split the list into roughly (but not necessarily exactly) equal size sublists, distributing one to each intern along with instructions on what to compute. In the second step, we collect the (partial) result from each intern and combine them to get the overall result. This is what good managers do, after all.
Think about how to do this for sum before continuing. For the sum, you could instruct each intern to compute the sum of her list and then you would compute the sum of the individual sums. The same simple approach works for count or maximum.
For average, things aren't quite so simple. If you simply have each intern compute an average of her list, you will need to weight each partial average by the size of the sublist to get the final average. Thus you need the count too. You quickly realize that you can distribute the average computation more effectively. Instead of asking each intern to return the average and count for her list, ask her to compute the sum and the count. In the reduction step, you obtain the overall average as the sum of the partial sums divided by the sum of the partial counts.
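Here is a minimal q sketch of that decomposition over three arbitrary sublists.
q)subs:(1 2 3f; 4 5f; 6 7 8 9f)
q)m:{(sum x; count x)} each subs     / map: partial sum and count per sublist
q)r:(sum m[;0]) % sum m[;1]          / reduce: total sum over total count
q)r~avg raze subs
1b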
More generally, map-reduce decomposes an operation on a (presumably large) list into two sub-operations, opmap and opreduce:
- In the first step, called map, opmap is performed on each sublist to obtain a list of partial results.
- In the second step, called reduce, the partial result lists are combined with opreduce to obtain the final result.
A good exercise (and interview question) is to express sorting a list using map-reduce. In greater generality, map-reduce may apply the two steps recursively – i.e., the reduce step may itself involve a map-reduce, etc.
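One hedged answer to the exercise, favoring clarity over efficiency: sort each sublist in the map step and merge the sorted partials in the reduce step.
q)subs:(3 1 2; 9 7; 5 4 8 6)
q)m:asc each subs                    / map: sort each sublist
q)merge:{$[0=count x;y; 0=count y;x; x[0]<=y[0]; (enlist x[0]),.z.s[1_x;y]; (enlist y[0]),.z.s[x;1_y]]}
q)(merge/) m                         / reduce: pairwise merge of the sorted sublists
1 2 3 4 5 6 7 8 9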
It is easy to see how map-reduce applies to a query against a partitioned table, since the table is a list of records sliced into sub-lists by the partitioning. The challenge is to decompose the query into a map step and a reduce step. The solution depends on whether the query involves aggregation.
If there is no aggregation, the result of the query on each partition is simply the computed columns for the list of the records in the partition slice matching the constraint. In other words, produce a partial result table by computing the columns of the query across partitions. Because all the partial result tables conform, union the partial result tables in order of their virtual partition column values. In summary: fan the query across the partitions and union the ordered results.
Things are more interesting when the query contains aggregation. For aggregates that kdb+ recognizes as map-reducible, it applies the appropriate map operation across partition slices to obtain partial results. It then applies the reduce operation across the partial result tables to obtain the final result table.
At the time of this writing (July 2025), the aggregates that kdb+ can decompose with map-reduce are: all, any, avg, cor, count, cov, dev, distinct, first, last, max, min, prd, scov, sdev, sum, svar, var, wavg, and wsum.
To make this more concrete, we re-examine computing an average against a daily partitioned trades table.
select avg size from trade where date within ...
Because kdb+ recognizes avg as an aggregate it can decompose with map-reduce, it performs the calculation in two steps. Specifically, opmap computes the sum and record count for each date, while opreduce divides the sum of the daily sums by the sum of the daily counts to give the average. The user gets the expected result and is none the wiser unless q was started with secondary threads, in which case the result may be faster than linear in the size of the table.
14.3.8 Multiple Partitioned Tables
Recall that there can be only one partition domain in a given kdb+ root – i.e., daily, monthly, yearly or int. However, multiple tables can share this partitioning.
For example, quotes and trades can coexist in a daily partition. That layout would look something like,
/db
/2025.01.01
/trade
/quote
/2025.01.02
/trade
/quote
Important
Although not all potential partition values need be populated, any value that is populated must contain slices for all tables.
The following layout is in error:
/db
/2025.01.01 <- this is a bad partition!
/quote
/2025.01.02
/trade
/quote
We demonstrate how to create a simplified database with tables standing for trades and quotes in a fresh db directory. Notice that we wrap the lines that do not fit within this page margin and we suppress non-essential return values. In a fresh q session and fresh db.
q)`:db/2025.01.01/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101 33f)
_
q)`:db/2025.01.02/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101.5 33.5)
_
q)`:db/2025.01.01/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;b:100.75 32.75; a:101.25 33.25f)
_
q)`:db/2025.01.02/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:30:00; sym:`ibm`msft;b:101.25 33.25; a:101.75 33.75)
_
Mapping the root, we now find both are available:
q)\l db
q)select from t where date within 2025.01.01 2025.01.02
date ti sym p
------------------------------
2025.01.01 09:30:00 ibm 101
…
q)select from q where date within 2025.01.01 2025.01.02
date ti sym b a
--------------------------------------
2025.01.01 09:30:00 ibm 100.75 101.25
…
In a fresh q session we add a historical slice for q on 2024.12.31 but neglect to add the corresponding slice for t. Things seem fine when we map the root and query q on the newly added date.
q)`:db/2024.12.31/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft; b:101. 33.; a:101.5 33.5f)
`:db/2024.12.31/q/
q)\l db / reload
q)select from q where date=2024.12.31
date ti sym b a
----------------------------------
2024.12.31 09:30:00 ibm 101 101.5
But we get a nasty surprise when we query t on the missing date.
q)select from t where date=2024.12.31
'./2024.12.31/t/ti. OS reports: No such file or directory
[0] select from t where date=2024.12.31
^
We could remedy this by manually splaying an empty copy of t on that date. Instead we use the utility .Q.chk that fills all missing slices with empty tables from the most recent partition. We remap and find things are fine. In a fresh q session,
q).Q.chk `:db
()
()
,`:db/2024.12.31
q)\l db
q)select from t where date=2024.12.31
date ti sym p
-------------
Tip
If you neglect to place a table slice in the most recent partition, the table will effectively disappear from your database since kdb+ inspects only that partition to determine which tables are present.
Continuing with our previous (repaired) example we add a trade slice for 2025.01.03 but neglect to add a quotes slice. When we remap, the table q is nowhere to be found. In a fresh q session,
q)`:db/2025.01.03/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101 33f)
`:db/2025.01.03/t/
q)\l db
q)\a
,`t
Here is one way to fill an empty slice manually by querying an existing splayed slice. In a fresh q session:
q)`:db/2025.01.03/q/ set 0#select from `:db/2025.01.02/q
`:db/2025.01.03/q/
q)\l db
q)\a
`q`t
14.3.9 Examples of Other Partition Domain Types
We present simple examples of partitions with the other permissible types. For simplicity, we do not include symbol columns; to do so, merely add the call to .Q.en as in previous examples.
Yearly partitioning can be a good solution when you regularly need to retrieve multiple years of daily data that isn't too large. Note that we explicitly include the date column but not the year column, which is virtual. In a fresh db:
q)`:db/2025/t/ set ([] date:2025.01.01 2025.01.02; p:101 102f)
`:db/2025/t/
q)`:db/2024/t/ set ([] date:2024.01.01 2024.01.02; p:101.5 102.5)
`:db/2024/t/
q)\l db
q)select from t where year within 2024 2025
year date p
---------------------
2024 2024.01.01 101.5
2024 2024.01.02 102.5
2025 2025.01.01 101
2025 2025.01.02 102
Monthly partitioning can be a good compromise between daily and yearly when you have both single day and year-to-date queries. We explicitly include the date column but not the month column, which is virtual.
Tip
Do not include a trailing m in the partition directory name for the monthly slice but do include it in the queries.
In a fresh db,
q)`:db/2025.01/t/ set ([] date:2025.01.01 2025.01.02; p:101 102f)
`:db/2025.01/t/
q)`:db/2025.02/t/ set ([] date:2025.02.01 2025.02.02; p:101.5 102.5)
`:db/2025.02/t/
q)\l db
q)select from t where month within 2025.01 2025.02m
month date p
------------------------
2025.01 2025.01.01 101
2025.01 2025.01.02 102
2025.02 2025.02.01 101.5
2025.02 2025.02.02 102.5
You can partition by an int to slice into arbitrary bins. The data need not be daily. In a fresh db (the bin contents here are illustrative),
q)`:db/1/t/ set ([] id:101 102; v:1.1 2.2)
`:db/1/t/
q)`:db/2/t/ set ([] id:103 104; v:3.3 4.4)
`:db/2/t/
q)\l db
q)select from t where int in 1 2
int id  v
----------
1   101 1.1
1   102 2.2
2   103 3.3
2   104 4.4
14.3.10 Partitioned Tables with Links
You can create links between tables in the same partitioned database with the following restriction.
Note
A link column in the slice of a partitioned table must be intra-partition – i.e., it must refer to another table in the same slice. In particular, you cannot link across days in a daily partitioned database.
Since the daily slices are splayed tables, the mechanics of creating links for partitioned tables are the same as for splayed tables. We first demonstrate how to create a link as the slices are created in memory and splayed. Again we avoid symbols for simplicity; to include them, add a call to .Q.en as in previous examples.
We create two days of linked tables in a fresh db.
q)t1:([] id:101 102 103 104 105; v:1.1 2.2 3.3 4.4 5.5)
q)`:db/2025.01.01/t1/ set t1
_
q)`:db/2025.01.01/t2/ set ([] id:`t1!t1[`id]?103 101 101 102; n:10 20 30 40)
_
q)`:db/2025.01.02/t1/ set t1
_
q)`:db/2025.01.02/t2/ set ([] id:`t1!t1[`id]?105 104 104; n:50 60 70)
_
Now we load the root and use dot notation on the link column.
q)\l db
q)select date,n,id.v from t2 where date in 2025.01.01 2025.01.02
date n v
-----------------
2025.01.01 10 3.3
…
Adding a link column to persisted partitions is only a bit more complicated. Here we create a link on (enumerated) symbol columns. In a fresh db we first create the partitioned tables without the link.
q)`:db/2025.01.01/t1/ set .Q.en[`:db;] ([] id:`c`b`a; c1: 10 20 30)
_
q)`:db/2025.01.02/t1/ set .Q.en[`:db;] ([] id:`x`a; c1: 40 50)
_
q)`:db/2025.01.01/t2/ set .Q.en[`:db;] ([]id:`a`b`a`c; c2: 1 2 3 4.)
_
q)`:db/2025.01.02/t2/ set .Q.en[`:db;] ([] id:`x`a`x; c2:5 6 7.)
_
Presumably at a later time, we add the link column to t2 in each partition, making sure to update the .d files.
q)`:db/2025.01.01/t2/t1lnk set `t1!get[`:db/2025.01.01/t1/id]?get[`:db/2025.01.01/t2/id]
_
q)`:db/2025.01.01/t2/.d set get[`:db/2025.01.01/t2/.d],`t1lnk
_
q)`:db/2025.01.02/t2/t1lnk set `t1!get[`:db/2025.01.02/t1/id]?get[`:db/2025.01.02/t2/id]
_
q)`:db/2025.01.02/t2/.d set get[`:db/2025.01.02/t2/.d],`t1lnk
_
Now we can map the root and use dot notation.
q)select date,id,t1lnk.c1,c2 from t2 where date<=2025.01.02
date id c1 c2
-------------------
2025.01.01 a 30 1
…
14.4 Segmented Tables
We saw in the previous section that queries against partitioned tables execute separately across the partitions. When q is started without secondary threads (secondaries) this is a memory win but not a performance win, except for the reduced amount of data manipulation required. Starting q with secondaries can offer significant performance enhancement for compute-bound queries.
In large time series databases, the kdb queries are often I/O-bound. In this case, the multiple secondaries for a partitioned query will mostly be waiting on I/O (or Godot). The solution requires multiple I/O channels so that data retrieval and processing can occur in parallel. Kdb+ provides another level of data decomposition to enable parallel processing in this scenario.
Traditional kdb+ parlance did not distinguish between this level and the basic partitioning we encountered in the last section. We find this confusing and, at the suggestion of Simon Garland formerly of Kx, have introduced the terminology of segmentation.
14.4.1 Segments
Segmentation is an additional level of structure on top of partitioning. Segmentation spreads a partitioned table’s records across multiple directories that have the same structure as the root directory in a partitioned database. Each pseudo-root, called a segment, is thus a directory that contains a collection of partition directories. The segment directories are presumably on independent I/O channels so that data retrieval can occur in parallel. This used to be rare and expensive with spinning media but with the prevalence of SSD now it is commonplace.
You can use any criteria to decompose partition slices, as long as the results are conforming record subsets that are disjoint and complete – i.e., they reconstitute the original table with no omissions or duplication. The decomposition can be along rows, along partitions or by some combination thereof, but it cannot occur only along columns since all records must conform across the decomposition.
Important
You must ensure that the segments conform and are complete and disjoint, since kdb+ will not check this when you write the data files or when you load the database. In particular, overlapping segments will result in duplicate records in query results and an incomplete decomposition will result in missing records.
Big Picture (3): We view a segmented table as a three-dimensional persisted form: the table is cut vertically by splaying, sliced horizontally by partitions and is additionally segmented across physical locations. The primary purpose of the third dimension is to allow operations against the tables to take advantage of parallel I/O and parallel processing. Following is an abstract representation of segmentation.
segment1
| column1 | column2 | … | |
|---|---|---|---|
| partition* | ▪ | ▪ | … |
| partition* | ▪ | ▪ | … |
| … | … | … | … |
segment2
| column1 | column2 | … | |
|---|---|---|---|
| partition* | ▪ | ▪ | … |
| partition* | ▪ | ▪ | … |
| … | … | … | … |
Here each ▪ represents a (partial) column of a partition slice as an opaque entity.
Important
In contrast to the partitioned table layout in which partitions reside under the root, the segment directories must not reside under the root. The only portion of a segmented table (other than the sym file for enumerated symbol columns) that lives in the root is a file called par.txt containing the paths of the physical locations of the segments, one segment path per line. At the time of this writing (July 2025) the example of segmented tables on the KX site shows a directory with the same name as the root in each segment that houses the partition directories. This is not necessary as our examples below show.
The example discussed above would be laid out on the file system as follows. Here a leading / indicates that the entry is a directory, not that the paths are absolute.
/db
[sym]
par.txt
=============== <- channel 1
/segment1
/partition*
/table*
/table*
…
/partition*
/table*
/table*
…
=============== <- channel 2
/segment2
/partition*
/table*
/table*
…
/partition*
/table*
/table*
…
=============== …
To make this more concrete, we demonstrate how to segment daily trades (and eventually quotes) in several useful ways. To begin, we create segments by bucketing trades into alternating days of the week.
/1 <- drive 1
/2025.01.01
/t <- day's trades
/2025.01.03
/t <- day's trades
…
=============
/2 <- drive 2
/2025.01.02
/t <- day's trades
/2025.01.04
/t <- day's trades
…
This segmentation represents grouping of partitions by parity (even/odd), so it is orthogonal to the partitioning. It is clearly complete and disjoint and is easily generalizable to round-robining every n business days.
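A hedged sketch of routing a date to its segment under this parity scheme (segment names 1 and 2 as above):
q)seg:{1+(`long$x) mod 2}            / even day count -> segment 1, odd -> segment 2
q)seg 2025.01.01 2025.01.02 2025.01.03 2025.01.04
1 2 1 2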
We could alternatively create segments by splitting the daily slices into records with symbols starting with a-m and those starting with n-z. Here we decompose based on the values in a symbol column, which we could not do with simple partitioning.
/am <- channel 1
/2025.01.01
/t <- day’s trades for syms a-m
/2025.01.02
/t <- day’s trades for syms a-m
…
=============
/nz <- channel 2
/2025.01.01
/t <- day’s trades for syms n-z
/2025.01.02
/t <- day’s trades for syms n-z
…
This segmentation clearly results in complete, disjoint subsets. It is not orthogonal to the partitioning because a single day's trades span multiple segments. It is easily generalizable to n segments by splitting the alphabet into n roughly equal portions.
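A hedged sketch of splitting one (in-memory) day's slice by the leading letter of the symbol:
q)day:([] sym:`ibm`aapl`nvda`t; p:101 184 520 17f)
q)am:select from day where (first each string sym) within "am"
q)nz:select from day where (first each string sym) within "nz"
q)count each (am;nz)
2 2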
Note
The previous example demonstrates that while you cannot partition a table by a symbol column, you can segment by a symbol decomposition.
Alternately, we can create segments by splitting the daily slices into trades from NYSE and trades from NASDAQ.
/nyse <- channel 1
/2025.01.01
/t <- day's trades for nyse
/2025.01.02
/t <- day's trades for nyse
…
=============
/nasd <- channel 2
/2025.01.01
/t <- day's trades for nasd
/2025.01.02
/t <- day's trades for nasd
…
The segmentation is also not orthogonal to the partitioning since a single day's trades span multiple segments. It is clearly complete and disjoint and is easily generalizable to multiple exchanges – e.g., all Asian exchanges.
Finally, we provide an example of a non-uniform segmentation, in which some partitions span segments and others do not. The A segment contains trades from the beginning of day 1 until lunchtime of day 2. The B segment contains the trades from lunchtime of day 2 through the end of day 3. The C segment contains all trades from day 4. We rely on our understanding of the business logic to know that segmentation is disjoint and complete.
/seg A <- channel 1
/2025.01.01
/t <- entire day’s trades
/2025.01.02
/t <- morning trades
=============
/seg B <- channel 2
/2025.01.02
/t <- afternoon trades
/2025.01.03
/t <- entire day’s trades
=============
/seg C <- channel 3
/2025.01.04
/t <- entire day’s trades
This sort of segmentation can be useful if your query patterns are not uniform across days.
14.4.2 Segmentation vs. Partitions
We collect the following into tabular form so that you can use it for meditation.
| Category | Partitioned Table | Segmented Table |
|---|---|---|
| Record location | All partitions (and hence all records) reside under the root directory. | None of the segments (and hence no records) reside under the root. |
| I/O channels | All partitions (and hence all records) reside on a single I/O channel. | The segments (and hence the records) should reside on multiple I/O channels. |
| Processing | Partitions loaded and processed sequentially in aggregation queries. | Given appropriate secondary threads and cores, aggregate queries load segments in parallel and process them concurrently. |
| Decomposition | Partition by grouping rows on the values of a virtual column of underlying integral type. | Segment by any criteria yielding disjoint and complete decomposition. |
| Symbols | Cannot partition on a symbol column. | Can segment along a symbol column. |
| Virtual Column | Partition column not stored. Virtual column values inferred from directory names. | No special column associated with segmentation (virtual column from underlying partition still present). |
14.4.3 Creating Segmented Tables
There is no one-size-fits-all utility to create segments. Instead, you write a q program that places a subset of each partition slice into a segment. You can create segments and partitions at the same time by including logic in your data load script to extract slice subsets and place them in the appropriate directory on the appropriate drive.
Along with creating the partitions, you must also specify the locations of the segments in an ASCII text file par.txt located in the root. Each line of par.txt contains the path of one segment directory; symlinks are acceptable.
Important
The segment paths must not be under the root. Qbies who make this mistake will beg for Vogon poetry.
We illustrate the process by creating segmented tables for each example in 14.4.1. Our intent is to demonstrate how to create the segments and observe how execution of the query pieces the segment results back together. Initially we create the segments without concern for placing them on multiple drives and running multiple secondaries, as we are not (yet) considering performance. That follows shortly.
We begin with the example in which daily slices are segmented alternately across two bins. This will result in roughly even access and will give good performance assuming the queries are uniformly distributed across days. Using relative paths, the storage layout is:
db/
sym
par.txt
1/
2025.01.01/
t/
2025.01.03/
t/
2/
2025.01.02/
t/
2025.01.04/
t/
The corresponding entries in par.txt point to:
1
2
With fresh directories 1, 2 and db we construct the segments and write the par.txt file.
q)`:1/2025.01.01/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`t; p:101 17f)
_
q)`:2/2025.01.02/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`t; p:101.5 17.5)
_
q)`:1/2025.01.03/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`t; p:103 16.5f)
_
q)`:2/2025.01.04/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`t; p:102 17f)
_
q)`:db/par.txt 0: ("../1"; "../2")
`:db/par.txt
q)\l db
q)select from t where date within 2025.01.01 2025.01.04
date ti s p
---------------------------
2025.01.01 09:30:00 ibm 101
2025.01.01 09:31:00 t   17
…
Tip
We have to account in par.txt for the fact that \l changes the working directory to its target, which it does before par.txt is read.
We can query the segments in the loaded table with .Q.P and we can ask whether each partition is uniquely found in one segment with .Q.u. Of course, the utilities for partitioned tables also work on segmented tables; they now display partition results across segments.
q).Q.P
`:./../1`:./../2
q).Q.u
1b
q).Q.P!.Q.D
:./../1| 2025.01.01 2025.01.03
:./../2| 2025.01.02 2025.01.04
q).Q.pt / partitioned tables
`t
q).Q.qp t / is t partitioned
1b
q).Q.pf / partition field
`date
q).Q.D / partitions across segments
2025.01.01 2025.01.03
2025.01.02 2025.01.04
q).Q.PD /location of each partition across segments
`:./../1`:./../2`:./../1`:./../2
q).Q.PV / partition column values across segments
2025.01.01 2025.01.02 2025.01.03 2025.01.04
q).Q.pn / count records in each partition across segments
t| 2 2 2 2
Next we segment trades by symbol range. Storage layout relative to the current working directory looks like:
am
/2025.01.01
/2025.01.02
nz
/2025.01.01
/2025.01.02
The corresponding par.txt points to:
am
nz
We begin with fresh directories am, nz, and db and a utility function to extract records for symbols within a range. Then we create the segments and write out the par.txt file.
q)extr:{[t;r] select from t where (`$1#'string sym) within r}
q)t1:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`t; p:101 17f)
q)`:am/2025.01.01/t/ set extr[t1;`a`m]
_
q)`:nz/2025.01.01/t/ set extr[t1;`n`z]
_
q)t2:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`t; p:101.5 17.5)
q)`:am/2025.01.02/t/ set extr[t2;`a`m]
_
q)`:nz/2025.01.02/t/ set extr[t2;`n`z]
_
q)`:db/par.txt 0: ("../am"; "../nz")
`:db/par.txt
q)\l db
q)select from t where date within 2025.01.01 2025.01.02
date ti sym p
-----------------------------
2025.01.01 09:30:00 ibm 101
2025.01.01 09:31:00 t 17
…
q).Q.u
0b
q).Q.P!.Q.D
:./../am| 2025.01.01 2025.01.02
:./../nz| 2025.01.01 2025.01.02
Next we segment trades by exchange. The storage layout looks like:
nyse
/2025.01.01
/t
/2025.01.02
/t
nasd
/2025.01.01
/t
/2025.01.02
/t
The corresponding par.txt points to:
nyse
nasd
Segment construction is completely analogous to that of the previous example. We begin with fresh directories nyse, nasd, db and a utility to extract records by exchange:
q)extr:{[t;e] select from t where ex=e}
q)t1:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`ibm`aapl; p:101 17f;ex:`n`o)
q)`:nyse/2025.01.01/t/ set extr[t1;`n]
_
q)`:nasd/2025.01.01/t/ set extr[t1;`o]
_
q)t2:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; s:`aapl`ibm; p:143 102f; ex:`o`n)
q)`:nyse/2025.01.02/t/ set extr[t2;`n]
_
q)`:nasd/2025.01.02/t/ set extr[t2;`o]
_
q)`:db/par.txt 0: ("../nyse"; "../nasd")
_
q)\l db
q)select from t where date within 2025.01.01 2025.01.02
date ti s p ex
-------------------------------
2025.01.01 09:30:00 ibm 101 n
2025.01.01 09:31:00 aapl 17 o
…
q).Q.u
0b
q).Q.P!.Q.D
:./../nyse| 2025.01.01 2025.01.02
:./../nasd| 2025.01.01 2025.01.02
Finally we distribute a day across segments. The storage layout looks like:
seg A
/2025.01.01
/2025.01.02
seg B
/2025.01.02
/2025.01.03
seg C
/2025.01.04
The corresponding par.txt points to:
A
B
C
In fresh directories A, B, C and db,
q) t1:.Q.en[`:db;] ([] ti:09:30:00 12:31:00; s:`ibm`t; p:101 17f)
q)`:A/2025.01.01/t/ set t1
_
q)t2:.Q.en[`:db;] ([] ti:09:31:00 12:32:00; s:`ibm`t; p:102 18f)
q)`:A/2025.01.02/t/ set select from t2 where ti<=12:00:00
_
q)`:B/2025.01.02/t/ set select from t2 where ti>12:00:00
_
q)t3:.Q.en[`:db;] ([] ti:09:33:00 12:33:00; s:`ibm`t; p:103 19f)
q)`:B/2025.01.03/t/ set t3
_
q)t4:.Q.en[`:db;] ([] ti:09:34:00 12:35:00; s:`ibm`t; p:104 20f)
q)`:C/2025.01.04/t/ set t4
_
q)`:db/par.txt 0: ("../A";"../B";"../C")
_
q)\l db
q)select from t where date within 2025.01.01 2025.01.04
date ti s p
---------------------------
2025.01.01 09:30:00 ibm 101
2025.01.01 12:31:00 t 17
…
q).Q.u
0b
q).Q.P!.Q.D
:./../A| 2025.01.01 2025.01.02
:./../B| 2025.01.02 2025.01.03
:./../C| ,2025.01.04
14.4.4 Multiple Segmented Tables
Multiple tables that share a partition can also be segmented. While there is no requirement that the tables be distributed similarly across the segmentation, they should be if you expect to use links or joins between them. In this section, we create (vastly simplified) trades and quotes that share a segmentation in order to do joins.
Our actual segmentation layout is designed to allow parallel retrieval for (aj). We recall that it will perform an equijoin on the sym and date columns, while it performs a non-equijoin on the time column. Thus we want all the time values for a given symbol and date in one segment. Consequently, segmentation by symbol range will allow parallelization of (aj) across symbols.
Here is a simple version of this scheme.
a_m <- segment for first portion of alphabet
/2025.01.01 <- day 1
/t <- day 1 trades for symbols a-m
/q <- day 1 quotes for symbols a-m
/2025.01.02 <- day 2
/t <- day 2 trades for symbols a-m
/q <- day 2 quotes for symbols a-m
=================
n_z <- segment for second portion of alphabet
/2025.01.01 <- day 1
/t <- day 1 trades for symbols n-z
/q <- day 1 quotes for symbols n-z
/2025.01.02 <- day 2
/t <- day 2 trades for symbols n-z
/q <- day 2 quotes for symbols n-z
=================
The corresponding par.txt points to:
a_m
n_z
Setting up the segments and partitions is a matter of getting the details right. In our simplified example, we assume that the directories am and nz reside on different drives. In fresh directories am, nz and db:
q)extr:{[t;r] select from t where (`$1#'string sym) within r}
q)/ day 1
q)t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`t; p:101 17f)
q)q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00; sym:`ibm`t`ibm; p:100.5 17 101)
q)`:am/2025.01.01/t/ set extr[t;`a`m]
_
q)`:nz/2025.01.01/t/ set extr[t;`n`z]
_
q)`:am/2025.01.01/q/ set extr[q;`a`m]
_
q)`:nz/2025.01.01/q/ set extr[q;`n`z]
_
q)/ day 2
q)t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`t`ibm; p:17.1 100.9)
q)q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00; sym:`t`ibm`t;p:17 100.8 17.1)
q)`:am/2025.01.02/t/ set extr[t;`a`m]
_
q)`:nz/2025.01.02/t/ set extr[t;`n`z]
_
q)`:am/2025.01.02/q/ set extr[q;`a`m]
_
q)`:nz/2025.01.02/q/ set extr[q;`n`z]
_
q)`:db/par.txt 0: ("../am"; "../nz")
_
q)\l db
q)dr:2025.01.01 2025.01.02
q)select from t where date within dr
date ti sym p
-----------------------------
2025.01.01 09:30:00 ibm 101
…
q)select from q where date within dr
date ti sym p
-----------------------------
2025.01.01 09:29:59 ibm 100.5
…
q).Q.pt
`s#`q`t
q).Q.P!.Q.D
:./../am| 2025.01.01 2025.01.02
:./../nz| 2025.01.01 2025.01.02
Now let’s do the aj. We could be naïve and pull all requisite data into the workspace, but this would be inefficient in memory and slow, even assuming it would fit.
q)aj[`date`sym`ti;select from t where date within dr;
select from q where date within dr]
_
Instead we start q with secondary threads and use peach to run things in parallel across the segmentation.
$q -s 2
KDB+ ...
q)aj99:{[d] aj[`sym`ti;select from t where date=d; select from q where date=d]}
q)\l db
q)raze aj99 peach 2025.01.01 2025.01.02
date ti sym p
-----------------------------
2025.01.01 09:30:00 ibm 101
2025.01.01 09:31:00 t 17
2025.01.02 09:31:00 ibm 100.8
2025.01.02 09:30:00 t 17.1
14.4.5 Query Execution against Segmented Tables
The performance goal is to scale out by taking advantage of parallel I/O and parallel processing. We would ideally like to achieve 100% saturation of the I/O channels and 100% utilization of each core. How do we approach these levels on a kdb+ server? The key insight in kdb+ design and tuning is that a vector calculation on in-memory data is much faster than retrieving data from storage. This suggests our first two design principles.
- Maximize the number of independent I/O channels to retrieve data in parallel.
- Maximize server memory in order to allocate each secondary thread as much memory as it needs.
We can satisfy these two objectives by having n independent I/O channels on the server – for example, nearly all machines now have SSD storage that appears as multiple independent I/O channels. In this scenario, we are led to:
- Create n segments to spread data retrieval across the n channels in order to maximize I/O parallelization.
To ensure that data can be processed from all n channels simultaneously and that no two threads contend for data, we are led to our final objective.
- Have (at least) n secondary threads.
The precise form of segmentation and number of secondaries will depend on the actual data and queries.
Now assuming such an environment, how does kdb+ execute a query against a segmented table in our scenario of n segments and n secondaries? Essentially, it decomposes a qualifying query into two steps via map-reduce:
- Map: a revised form of the original query that executes on each segment
- Reduce: aggregate the segment results
The use of map-reduce allows kdb+ to perform preliminary calculations as close to the data as possible and to perform aggregation centrally at the last step.
To begin, kdb+ compares the query’s requisite partitions to the segment layout in par.txt and determines the footprint of the target partitions on each segment. The result is a nested list, each item being the partition list for one segment.
To execute the map step, kdb+ creates a revised query containing the map sub-operation from the original query, which it dispatches to all n secondaries via peach. Each secondary is provided the partition list for one segment and computes the revised query for its segment. For example, the revised query for avg is:
Compute the sum and count of the sublist
Now we stand back and examine execution across the secondaries by squinting to make partition detail disappear. At this level, the original query’s map step has n secondaries retrieving segment data in parallel and calculating segment results. Once all secondaries complete, the nested list of segment results is flattened and reordered by partition value.
Finally, kdb+ employs the original query reduce step to combine the full list of ordered partition results into the query result table. Whew!
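To make the map-reduce decomposition concrete, here is a hand-rolled sketch of avg computed map-reduce style over per-segment sublists. This is purely illustrative q, not the internal implementation; the sublists are made up.
q)parts:(2 4 6f;8 10f)              / pretend per-segment price sublists
q)m:{(sum x;count x)} each parts    / map: sum and count of each sublist
q)(sum m[;0])%sum m[;1]             / reduce: total sum over total count
6f
q)avg raze parts                    / agrees with avg over all the data
6f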
Note
Kdb+ treats a vanilla partitioned table – i.e., one without a par.txt – as having a single segment. The astute reader will realize that the description in 14.3.3 is actually the degenerate case of this section.
14.4.6 Balancing Secondaries and Cores
In our quest to reach 100% saturation of I/O and CPU, we consider how to optimize the use of secondaries and cores. As seen in the previous section, the query via map-reduce provides good progress toward our original objective of I/O saturation. The secondaries can load data from all n channels in parallel without contention.
We now investigate channel and core utilization. Since kdb+ will only use as many secondaries to process a query as there are segments in the query footprint, we consider two cases:
I/O-bound: Assuming that the query has light calculation compared to data retrieval (common in kdb+), having n secondaries on n cores is close to optimal: most of the time, all n secondaries will be waiting for data. When a partition load completes, there will be a brief burst of computation, followed by another long load. So we conclude for this scenario:
n channels => n segments => n secondaries => n cores
Balanced I/O-compute: Consider the scenario in which both the I/O and calculation are intensive. While one secondary is waiting on data, another secondary on the same core could be crunching; conversely, while one secondary is crunching another secondary on that core could be loading data. Thus to maximize channel and core utilization, we actually want 2n secondaries on n cores. We conclude that in this scenario we should have 2n segments, two per channel. On average, there will be one secondary per core loading data and one secondary per core crunching the data it has just loaded.
n channels => 2n segments => 2n secondaries => n cores
These conclusions rely on many implicit assumptions that we have glossed over. In practice, you should view them as guidelines, with the goal of feeding data to kdb+ as fast as possible. The optimum configuration for your situation will depend on your particular query mix. For example, queries that do VWAP calculations are closer to the first scenario, whereas queries doing regression analysis are closer to the second.
A good strategy is to construct your initial configuration using one of the above scenarios. Then load a good approximation of your anticipated data and query mix, and simulate a realistic user load. Observe the I/O saturation and CPU utilization and adjust the number of secondaries and cores allocated to the q process accordingly.
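For instance, on a hypothetical box with 4 independent I/O channels and 4 cores, the balanced scenario above corresponds to 8 segments (two per channel) and a q process started with 8 secondary threads:
$q db -s 8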
14.4.7 Sample Performance Data
Our examples are based on simulated trade data randomly generated to match one month of US equity data spanning August and September 2014. In preparing this revision of the tutorial we decided that the analysis was still appropriate and did not warrant the substantial effort to update it to current dates. Even though the internal processing logic has changed, this still gives an idea of how kdb+ distributes queries. Those who are truly interested can create a similarly instrumented copy of q and run their own queries.
The data is in the table trade, whose most significant columns time, sym, tp and ts represent respectively the arrival time, instrument symbol, trade price and trade size. The trade data is partitioned by date.
The following output is generated from a version of kdb+ that has been instrumented to show the intermediate steps during query execution. The first example query shows how the simple aggregate avg tp is decomposed into the pair sum and count in the map step, followed by division of the sums of sums by the sum of counts for the reduce step. Here the data is not segmented – i.e., there is no par.txt file.
The added instrumentation exposes 3 stages of execution of a query.
-
Analyze query:
- Decompose query into map and reduce components (if appropriate)
- Determine and apply partition constraints
- Map query onto segments and partitions (query plan)
-
Execute map step, if appropriate
- Compute final result (reduce step)
q)select avg tp from trade where date in -3#date
"--- map/reduce: input aggregates, map query, reduce query ---" (,`tp)!,(avg;`tp)
`0`1!((sum;($["f"];`tp));(#:;`i))
(,`tp)!,(%;(sum;`0);(sum;`1))
"--- query plan: segments/partitions ---" (`trade;();()!();`0`1!((sum;($["f"];`tp));(#:;`i)))
2014.09.24 2014.09.25 2014.09.28
"--- partition query: query, segment, partition, result ---" (`trade;();()!();`0`1!((sum;($["f"];`tp));(#:;`i)))
`:.
2014.09.24
+`0`1!(,1.419538e+10;,27914938)
"--- partition query: query, segment, partition, result ---" (`trade;();()!();`0`1!((sum;($["f"];`tp));(#:;`i)))
`:. 2014.09.25 +`0`1!(,1.419318e+10;,24485503)
"--- partition query: query, segment, partition, result ---" (`trade;();()!();`0`1!((sum;($["f"];`tp));(#:;`i)))
`:. 2014.09.28 +`0`1!(,1.388645e+10;,20162485)
"--- query plan result ---" (+`0`1!(,1.419538e+10;,27914938);+`0`1!(,1.419318e+10;,24485503); +`0`1!(,1.388645e+10;,20162485))
"--- final result ---" tp -------- 582.5979
First note that `0 and `1 are used as intermediate columns. Observe that avg tp is decomposed into a map step
`0:sum "f"$tp and `1:count i
and a reduce step
tp:(sum `0)%sum `1.
Also observe that expressions are displayed in their parse tree format and that count is expressed as its k equivalent: the unary form (#:).
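For instance, applying the k unary (#:) directly confirms that it behaves exactly like count:
q)(#:) 1 2 3
3
q)count 1 2 3
3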
The next example uses par.txt to segment the data into four segments: (/d/d1/data, /d/d2/data, /d/d3/data, /d/d4/data). It also uses a by date clause to sidestep the need to break up the query into map and reduce components (our data is partitioned by date). For clarity, we execute the query only on the last 16 partitions as well as on a subset of symbols. Observe that the "query plan" contains more information than that of the previous example:
- the query
- a list of all partitions
- a list of all segments
- a nested list of partitions belonging to segments.
q)select avg tp by date from trade where date in -16#date,sym in syms
"--- query plan: segments/partitions ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) 2014.08.18 2014.08.19 2014.08.20 2014.08.21 2014.08.24 2014.08.25 2014.08.26 2014.08.27 2014.08.28 2014.08.31 2014.09.01 2014.09.02 2014.09.03 2014.09.04 2014.09.08 2014.09.09 `:/d/d1/data`:/d/d2/data`:/d/d3/data`:/d/d4/data ((`:/d/d1/data;2014.08.21 2014.08.25 2014.09.02); (`:/d/d2/data;2014.08.18 2014.08.26 2014.09.03); (`:/d/d3/data;2014.08.19 2014.08.27 2014.08.31 2014.09.04 2014.09.08); (`:/d/d4/data;2014.08.20 2014.08.24 2014.08.28 2014.09.01 2014.09.09))
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d1/data
2014.08.21 +(,`tp)!,,15.42632
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d1/data
2014.08.25
+(,`tp)!,,15.04996
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d1/data
2014.09.02
+(,`tp)!,,14.16648
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d2/data 2014.08.18
+(,`tp)!,,14.16883
... (some output removed for brevity) ...
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
2014.09.08
+(,`tp)!,,15.59198
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data 2014.08.20
+(,`tp)!,,15.2657
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data
2014.08.24 +(,`tp)!,,14.75603
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data 2014.08.28
+(,`tp)!,,14.37194
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data 2014.09.01
+(,`tp)!,,13.25797
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data 2014.09.09
+(,`tp)!,,14.98316
"--- query plan result ---" (+(,`tp)!,,14.16883;+(,`tp)!,,15.05272;+(,`tp)!,,15.2657;+(,`tp)!,,15.42632; +(,`tp)!,,14.75603;+(,`tp)!,,15.04996;+(,`tp)!,,15.69218;+(,`tp)!,,15.53095; +(,`tp)!,,14.37194;+(,`tp)!,,14.32488;+(,`tp)!,,13.25797;+(,`tp)!,,14.16648; +(,`tp)!,,15.58938;+(,`tp)!,,16.1427;+(,`tp)!,,15.59198;+(,`tp)!,,14.98316)
"--- final result ---"
date | tp
----------| --------
2014.08.18| 14.16883
2014.08.19| 15.05272
2014.08.20| 15.2657
2014.08.21| 15.42632
2014.08.24| 14.75603
2014.08.25| 15.04996
2014.08.26| 15.69218
2014.08.27| 15.53095
2014.08.28| 14.37194
2014.08.31| 14.32488
2014.09.01| 13.25797
2014.09.02| 14.16648
2014.09.03| 15.58938
2014.09.04| 16.1427
2014.09.08| 15.59198
2014.09.09| 14.98316
The previous example was run with no secondaries – i.e., on a single-threaded q process. Consequently, the queries run sequentially across segments and partitions.
Now observe what happens when we match the number of secondaries to the number of segments in our database by starting q with -s 4.
q)select avg tp by date from trade where date in -16#date,sym in syms
"--- query plan: segments/partitions ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) 2014.08.18 2014.08.19 2014.08.20 2014.08.21 2014.08.24 2014.08.25 2014.08.26 2014.08.27 2014.08.28 2014.08.31 2014.09.01 2014.09.02 2014.09.03 2014.09.04 2014.09.08 2014.09.09 `:/d/d1/data`:/d/d2/data`:/d/d3/data`:/d/d4/data ((`:/d/d1/data;2014.08.21 2014.08.25 2014.09.02); (`:/d/d2/data;2014.08.18 2014.08.26 2014.09.03); (`:/d/d3/data;2014.08.19 2014.08.27 2014.08.31 2014.09.04 2014.09.08); (`:/d/d4/data;2014.08.20 2014.08.24 2014.08.28 2014.09.01 2014.09.09))
"--- partition query: query, segment, partition, result ---"
"--- partition query: query, segment, partition, result ---"
"--- partition query: query, segment, partition, result ---"
(`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d1/data
2014.08.21
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data
2014.08.20 (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
2014.08.19 (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d2/data
2014.08.18
+(,`tp)!,,15.55121
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
2014.08.27
+(,`tp)!,,15.47055 +(,`tp)!,,15.21819
"--- partition query: query, segment, partition, result ---"
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d2/data
2014.08.26 (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data
2014.08.24
+(,`tp)!,,14.81711
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d1/data
2014.08.25
+(,`tp)!,,14.92875
"--- partition query: query, segment, partition, result ---" +(,`tp)!,,16.07275
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
2014.08.31
+(,`tp)!,,15.55499 (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data
2014.08.28
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d2/data
2014.09.03
+(,`tp)!,,13.43061 +(,`tp)!,,15.29159 +(,`tp)!,,12.64993
"--- partition query: query, segment, partition, result ---"
"--- partition query: query, segment, partition, result ---"
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
`:/d/d4/data
2014.09.04
2014.09.01
+(,`tp)!,,176.6311
`:/d/d1/data
2014.09.02
+(,`tp)!,,151.7784
+(,`tp)!,,13.67089
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d4/data
2014.09.09
+(,`tp)!,,179.799
"--- partition query: query, segment, partition, result ---" (`trade;,(in;`sym;`syms);(`symbol$())!`symbol$();(,`tp)!,(avg;`tp)) `:/d/d3/data
2014.09.08
+(,`tp)!,,193.7031
+(,`tp)!,,48.75286
"--- query plan result ---" (+(,`tp)!,,15.47055;+(,`tp)!,,15.55121;+(,`tp)!,,15.21819;+(,`tp)!,,14.81711; +(,`tp)!,,16.07275;+(,`tp)!,,15.29159;+(,`tp)!,,15.55499;+(,`tp)!,,14.92875; +(,`tp)!,,12.64993;+(,`tp)!,,13.43061;+(,`tp)!,,13.67089;+(,`tp)!,,151.7784; +(,`tp)!,,176.6311;+(,`tp)!,,179.799;+(,`tp)!,,193.7031;+(,`tp)!,,48.75286)
"--- final result ---"
date | tp
----------| --------
2014.08.18| 15.47055
2014.08.19| 15.55121
2014.08.20| 15.21819
2014.08.21| 14.81711
2014.08.24| 16.07275
2014.08.25| 15.29159
2014.08.26| 15.55499
2014.08.27| 14.92875
2014.08.28| 12.64993
2014.08.31| 13.43061
2014.09.01| 13.67089
2014.09.02| 151.7784
2014.09.03| 176.6311
2014.09.04| 179.799
2014.09.08| 193.7031
2014.09.09| 48.75286
Here the output from four secondaries running concurrently is interleaved. One secondary executes for each segment, with each secondary executing the query on its segment sequentially across its partitions. The result demonstrates how an appropriately configured server can take advantage of available I/O bandwidth to speed up query execution using segments and secondaries.
14.5 .Q Utilities for Splayed and Partitioned Tables
The .Q namespace contains many useful functions for creating and maintaining splayed and partitioned tables. Since nearly everyone uses them and they have become de facto standard, KX now supports their use, provided the function is documented on their site. In this section, we describe the more commonly used entries and demonstrate their use. See the .Q entry on the KX site for the full list.
14.5.1 .Q.bv .Q.bvi
In partitioned databases, construct the dictionary .Q.vp of table schemas for tables with missing partitions. .Q.bvi scans only new partitions loaded in the HDB since the last time .Q.bv or .Q.bvi was run. See KX site for details.
14.5.2 .Q.chk
The unary .Q.chk takes as argument the symbolic file handle of a root directory. It examines each partition subdirectory in the root and writes an empty splayed slice of the appropriate form wherever a table is missing in a partition.
We reconstruct our trades and quotes example, this time using .Q.dpft. In a fresh db:
q)t:([] ti:09:30:00 09:31:00; sym:`ibm`msft; p:101 33f)
q).Q.dpft[`:db;2025.01.01;`sym;`t]
_
q)t:([] ti:09:30:00 09:31:00; sym:`ibm`msft; p:101.5 33.5)
q).Q.dpft[`:db;2025.01.02;`sym;`t]
_
q)q:([] ti:09:30:00 09:31:00; sym:`ibm`msft;b:100.75 32.75; a:101.25 33.25f)
q).Q.dpft[`:db;2025.01.01;`sym;`q]
_
q)q:([] ti:09:30:00 09:31:00; sym:`ibm`msft;b:101.25 33.25; a:101.75 33.75)
q).Q.dpft[`:db;2025.01.02;`sym;`q]
_
Now we introduce two new partitions, each containing a slice for just one table.
q)q:([] ti:09:30:00 09:31:00; sym:`ibm`msft;b:101. 33.; a:101.5 34f)
q).Q.dpft[`:db;2025.01.03;`sym;`q]
_
q)t:([] ti:09:30:00 09:31:00; sym:`ibm`msft; p:102 34f)
q).Q.dpft[`:db;2025.01.04;`sym;`t]
_
In a fresh q session started on the database root:
q)\a
,`t
q)select from t where date=2025.01.03
'./2025.01.03/t/sym. OS reports: No such file or directory
[0] select from t where date=2025.01.03
^
The table q has disappeared since it is not in the most recent partition, and the query for t fails on 2025.01.03 because t has no slice there. We use .Q.chk to fix things:
q).Q.chk `:../db
,`:../db/2025.01.04
,`:../db/2025.01.03
()
q)\l .
q)select from t where date=2025.01.03
date sym ti p
-------------
q)select from q where date=2025.01.04
date sym ti b a
---------------
Notice that .Q.chk tells you which partitions it is fixing but not which tables.
14.5.3 .Q.dpft write partition slice
The utility .Q.dpft assists in creating partitioned and segmented tables by incorporating the functionality of .Q.en at a higher level. It is convenient when partitions are loaded and written out iteratively.
.Q.dpft[d;p;f;t]
- The first parameter d is the symbolic file handle of the database root directory.
- The second parameter p is the q data value that will become the name of the partition subdirectory.
- The third parameter f is the name of the field to which the p# attribute is applied (usually sym for trades and quotes).
- The last parameter t is the table name.
The .Q.dpft function rearranges (a copy of) the named table so that the partition column is first and the parted column is second, and then splays it into the specified partition in the specified directory. When appropriate, it enumerates all symbol columns over sym in memory and saves the sym list in the root. The result is the symbolic table name if the operation is successful.
Note
- Because the final parameter is a table name, this function cannot be applied to a local variable.
- The table columns must be simple or compound lists.
- The source table cannot be a keyed table.
We recycle our favorite example, using .Q.dpft to write it as a partitioned table into a fresh /db:
q)t:([] ti:09:30:00 09:31:00; sym:`ibm`msft; p:101.5 33.5)
q).Q.dpft[`:db;2025.01.01;`sym;`t]
_
q)t:([] ti:09:31:00 09:32:00; sym:`ibm`msft; p:101 33f)
q).Q.dpft[`:db;2025.01.02;`sym;`t]
_
q)\l db
q)select from t where date in 2025.01.01 2025.01.02
date sym ti p
------------------------------
2025.01.01 ibm 09:30:00 101.5
…
Note
There are other similarly-named helper functions: .Q.dpfts, .Q.dpt, .Q.dpts, .Q.dsftg, .Q.hdpf. See the .Q entry on the KX site for details.
14.5.4 .Q.en enumerate symbols
The binary .Q.en takes a symbolic file handle of a root directory as its first argument and a table as its second argument. As a side effect, it creates an in-memory list sym comprising the unique items across all symbol columns and writes it to an eponymous file in the specified directory. It returns a table obtained from the original table by enumerating its symbol column(s) over sym.
In a fresh /db,
q)sym
'sym
q).Q.en[`:db;] ([] s:`a`b`v;v:10 20 30)
s v
----
a 10
b 20
v 30
q)sym
`a`b`v
q)get `:db/sym
`a`b`v
Here is more detail on the actual sequence of operations for .Q.en:
- The variable sym is created (in memory) by loading the file sym from the specified root, should such exist, or as the empty symbol list if not. An existing sym variable in the workspace is overwritten in this step.
- All symbol columns of (a copy of) the table are conditionally enumerated over the sym list.
- The sym variable is serialized to the specified root directory.
- The enumerated table is returned from the function application.
Note
In contrast to earlier versions of q, .Q.en now enumerates arbitrarily nested lists of symbols.
The sibling .Q.ens takes an extra parameter that is a symbol specifying the name of the enumeration domain – i.e., something other than sym. See the .Q.en entry on the KX site for details.
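For example, to enumerate over a domain named mysym instead of sym – a quick sketch in which the table and domain name are arbitrary:
q).Q.ens[`:db; ([] s:`a`b; v:1 2); `mysym]  / writes the domain to db/mysym
s v
---
a 1
b 2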
14.5.5 .Q.fs file streaming
In many cases, splayed tables (as well as partitioned or segmented slices) are too large to fit into memory. Thus, we face the dilemma of how to create such tables, since splaying writes the table from memory. The utility .Q.fs comes to the rescue by allowing us to process text files in "chunks."
Note
.Q.fs is a projection of .Q.fsn with the chunk size set to 131000 bytes. If this does not provide adequate performance, you may wish to use .Q.fsn, which exposes the chunk size as an additional parameter. The optimal chunk size will vary for each application.
Loosely speaking, .Q.fs applies a function to the records of a text file in 'chunks' instead of processing the entire file in one gulp. The intent is to iterate the function processing a number of bite-sized lists of records. As a trivial warm-up, we create a text file containing the first 100000 non-negative integers, one per line and we chunk the calculation of their squares. (The call to 0N! enables us to see the result of each chunk as it is processed.)
q)`:L 0: string til 100000
_
q).Q.fs[{0N!v*v:"J"$x};`:L]
0 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 40 ..
560979225 561026596 561073969 561121344 561168721 561216100 56126348 ..
2071888324 2071979361 2072070400 2072161441 2072252484 2072343529 ..
4536157201 4536291904 4536426609 4536561316 4536696025 4536830736 ..
7953785856 7953964225 7954142596 7954320969 7954499344 7954677721 ..
588890
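Should the default chunk size prove inadequate, the same computation can be pushed through .Q.fsn with an explicit chunk size. Here is a sketch reusing the file `:L just created; the 1000000-byte chunk size is arbitrary.
q).Q.fsn[{sum v*v:"J"$x};`:L;1000000]
_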
While chunking provides no material benefit in this trivial example, it is easily adaptable to larger problems. Assume we have a CSV file containing one day’s pseudo-trade records (here we manufacture one):
q)ct:00:00:00.000+(10*(til 100000))mod `int$24:00:00.000
q)ix:100000?til 2
q)cs:(`ibm`ms) ix
q)cp:(115 25@ix)+-1.+100000?1.
q)trade:([] ti:ct; sym:cs; px:cp)
q)save `:data/trade.csv
_
q)read0 `:data/trade.csv
"ti,sym,px"
"00:00:00.000,ibm,114.767"
"00:00:00.010,ibm,114.7566"
…
We intend to save this data into a kdb+ daily partition slice, but suppose that loading it all at once results in a WSFULL error because our q process has insufficient memory. We demonstrate how to read the CSV file and write out to the partition slice in chunks.
For convenience, we wrap the read and symbol enumeration into a function ldchunk. Then we use .Q.fs to iterate ldchunk over the chunks, upserting each partial result to the partition directory. Again we instrument with 0N! to see each chunk result.
q)ldchunk:{.Q.en[`:db] flip `time`sym`price!("TSF";",") 0: x}
q).Q.fs[{0N!.[`:db/2025.01.01/t/;();,;ldchunk x]}] `:data/trade.csv
`:db/2025.01.01/t/
`:db/2025.01.01/t/
…
2638673j
We verify that the partition is properly written. (Where does the extra one come from?)
q)\l db
q)count select from t where date=2025.01.01
100001
Because performance matters in the real world, we would proceed to sort the completed partition slice by time within sym and apply the `p# attribute to the sym column. In a fresh q session,
q)`sym`time xasc `:db/2025.01.01/t
_
q)@[`:db/2025.01.01/t; `sym; `p#]
_
q)\l db
q)meta t
c | t f a
-----| -----
date | d
time | t
sym | s p
price| f
14.5.6 .Q.ind absolute index
As we saw in 14.3, the virtual column i reflects the relative row number in a partitioned table. That is, the value of i is the offset of the records within the partition slice. How do we retrieve records by absolute row number across a partitioned or segmented table?
A fundamentalist could use a q expression to determine the partition and relative row for the absolute row number. It is less painful to use the binary .Q.ind, whose first argument is a partitioned table and whose second argument is a list of long values representing absolute row numbers. The result is a table in memory. You must enlist a single index value.
In a fresh /db,
q)`:db/2025.01.01/t/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)`:db/2025.01.03/t/ set ([] c1:4 5; c2:4.4 5.5)
_
q)\l db
q)select from t where date within 2025.01.01 2025.01.03
date c1 c2
-----------------
2025.01.01 1 1.1
2025.01.01 2 2.2
2025.01.01 3 3.3
2025.01.03 4 4.4
2025.01.03 5 5.5
q).Q.ind[t;1 3]
date c1 c2
-----------------
2025.01.01 2 2.2
2025.01.03 4 4.4
q).Q.ind[t;enlist 2]
date c1 c2
-----------------
2025.01.01 3 3.3
14.5.7 .Q.li
In the current HDB, maps any partition(s) which are both in the list supplied and on disk. See the .Q.li entry on the KX site for details.
14.5.8 .Q.lo
Has the form:
.Q.lo[`:database;cd;scripts]
where database is an hsym or symbol atom, cd is a boolean flag indicating whether to cd to the database directory and scripts is a boolean flag indicating whether to execute any scripts in the database directory.
Compare .Q.lo with normal \l; the difference in working-directory behavior is demonstrated at the end of 14.5.11 below.
14.5.9 .Q.pv partition values
The variable .Q.pv is a list containing the values of the partition domain – i.e., the values corresponding to the slice directories actually found in the root. This is useful if you need to iterate explicitly over partitions.
In a fresh /db,
q)`:db/2025.01.01/t/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)`:db/2025.01.03/t/ set ([] c1:4 5; c2:4.4 5.5)
_
q)\l db
q).Q.pv
2025.01.01 2025.01.03
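For example, to count the rows of t in each partition by explicit iteration – a sketch; .Q.pn reports the same information directly:
q){count select from t where date=x} each .Q.pv
3 2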
14.5.10 .Q.qp partition status
The unary .Q.qp reports the residency of a table. It returns 1b if its argument is a partitioned table mapped into memory, 0b if it is splayed and 0 for anything else. In a fresh /db,
q)`:db/tser set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)`:db/tsplay/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)`:db/2025.01.01/tpart/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)\l db
q).Q.qp tser
0
q).Q.qp tsplay
0b
q).Q.qp tpart
1b
14.5.11 .Q.view
The unary .Q.view is handy when you are executing queries against partitioned or segmented tables. Recall that multiple tables can share the partitioning. The argument of .Q.view is a list of partition values that acts as a filter for all queries against any partitioned table in the database. Otherwise put, the practical effect of applying .Q.view is to add its argument as a constraint in the first sub-phrase of the where clause of every query. This can guard against run-away queries that ask for all historical data, for example.
In the following example, we create a daily partitioned table with three days of data in a fresh db. By using .Q.view to restrict the view to the last two days, we rein in the unqualified query.
q)`:db/2025.01.01/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)
_
q)`:db/2025.01.02/t/ set ([] ti:09:30:00 09:31:00; p:100 100f)
_
q)`:db/2025.01.03/t/ set ([] ti:09:30:00 09:31:00; p:103 104f)
_
q)\l db
q).Q.view 2025.01.02 2025.01.03
q)select from t
date ti p
-----------------------
2025.01.02 09:30:00 100
2025.01.02 09:31:00 100
2025.01.03 09:30:00 103
2025.01.03 09:31:00 104
This is especially useful when you are testing or when you expose large historical databases to users who are prone to forget that they must qualify their queries by date. To reset the default view to all partitions, invoke .Q.view as nullary. You can use .Q.pv to access partition values programmatically.
q).Q.view[]
q)select from t / yikes!! Gets everything
date ti p
-----------------------
2025.01.01 09:30:00 101
2025.01.01 09:31:00 102
2025.01.02 09:30:00 100
2025.01.02 09:31:00 100
2025.01.03 09:30:00 103
2025.01.03 09:31:00 104
q).Q.view -2#.Q.pv
q)select from t
date ti p
-----------------------
2025.01.02 09:30:00 100
…
In a fresh q session started in the directory containing db, we can contrast the effect of \l and .Q.lo (14.5.8) on the current working directory:
q)\cd
"/Q4M"
q)\l db
q)\cd
"/Q4M/db"
q)\cd ..
q)\cd
"/Q4M"
q).Q.lo[`:db;0b;0b]
q)\cd
"/Q4M"
14.6 Kdb+ Database
Many kdb+ newcomers find the notion of a kdb+ database shrouded in mystery. In this section, we answer common qbie questions:
- What is a kdb+ database?
- How is it different from an RDBMS?
- How do I create a kdb+ database?
- How do I start up a kdb+ database?
- What happens at startup?
14.6.1 Comparing kdb+ to an RDBMS
While q-sql makes certain aspects of a kdb+ database seem familiar to a SQL programmer, the two are quite different under the covers. Folks coming into the world of q and kdb+ from a traditional RDBMS environment may feel as if they have entered the Matrix after taking the red pill from Morpheus. The fundamental differences are:
- Kdb+ is based on lists, which are ordered collections allowing duplicates, whereas SQL is based on sets, which are unordered collections of distinct elements.
- Kdb+ stores data as contiguous items in column lists, whereas an RDBMS stores data as fields within non-contiguous rows. Neo says in kdb+, “There are no rows.”
- Kdb+ table operations are vector operations on columns, whereas SQL operates on individual fields and rows.
We summarize the major differences between an RDBMS and kdb+ in the following chart.
| | Traditional RDBMS | Kdb+ Database |
|---|---|---|
| Table Creation | Tables defined declaratively using DDL and created on disk. | Tables created functionally in the q language. |
| Data Persistence | Tables and related metadata held in an opaque repository. Tables are stored by row. | Serialized q entities stored in the O/S file system. No separate table metadata. Tables are stored by column. |
| Data Access | Access to stored information is through DDL for metadata and SQL for data. Must retrieve through a query into a program. | Data directly accessible in q. q-sql provides query forms for table manipulation. |
| Memory Residency | Tables reside on disk; query result sets reside in program memory. | Tables live in memory but can be persisted to disk. Column subsets are page-faulted into memory for mapped tables. |
| Data Format | Based on sets, which are unordered collections of distinct items. Data is stored in fields within rows, which are not contiguous. | Based on lists, which are ordered collections allowing duplicates. Data is stored as contiguous items in column lists. |
| Data Modification | Persisted tables modifiable through SQL (INSERT, UPDATE, etc.). | Memory-resident tables modifiable through q and q-sql. Persisted tables modifiable only with append (upsert). |
| Data Programming | SQL is declarative relational. Programs, called stored procedures, written in a proprietary procedural language. | Programs written in the integrated vector functional language q. Tables are first-class entities in q. |
| Transactions | Support for transactions through COMMIT and ROLLBACK. | No built-in transaction support. |
14.6.2 The Physical Layout of a kdb+ Database
Here is a Zen koan: What is a kdb+ database? At the most basic level, a kdb+ database is a file system directory (and subdirectories) holding q entities. This directory is the root of the database. The constituents of the database are simply q entities saved in files or text scripts. Database entities either reside at some level under the root or are pointed to from par.txt, itself in the root.
We describe the various kdb+ database components in this section and show examples in the following section. In our descriptions we use /db as our root, but this is arbitrary.
14.6.2.1 The sym File
The sym file is an optional serialized q data file containing a list of unique symbols used as the domain for symbol enumeration. Placing the sym file in the root guarantees that it will be loaded into memory at q startup.
/db
sym …
There is no strict requirement that there be only one enumeration domain, but conventionally all symbol columns from all tables are enumerated over a single domain sym. Many of the .Q utilities that handle symbol enumeration assume this but others have an optional parameter to specify the enumeration domain.
With the switch to 64-bit enumerations there is less need for multiple enumeration domains. If you choose to have multiple enumeration domains, be aware that symbols in the resulting enumerated values will act as expected under (=) but not with (~) due to the different domains. You can resolve enumerated values with (value) if this is a problem.
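Here is a quick illustration of that behavior using two hypothetical in-memory domains d1 and d2:
q)d1:`a`b
q)d2:`a`b
q)(`d1$`a)=`d2$`a    / equality resolves the underlying values
1b
q)(`d1$`a)~`d2$`a    / match sees the different domains
0b
q)value `d1$`a       / resolve back to a plain symbol
`a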
Important
Pay very careful attention to the sym file, as it is a single point of failure in kdb+. Corrupting or losing it will result in all your symbol columns being unresolvable. Good practice is to use conditional enumeration – e.g., `:db/sym? – when you enumerate symbols manually or use .Q utilities that do this for you. Always back up your database before any manual operations on the sym file.
The sym file will not normally be an issue for data integrity when loading historical data from an external source into a kdb+ database because conditional enumeration (and the .Q utilities) uses efficient file locking to mediate concurrent updates.
Alternately, if the symbol domain can be determined in advance, you can load the sym universe into memory and use non-conditional enumeration – i.e., `sym$ – which will be very fast. For example, one approach is to create a preprocessing utility in q using .Q.fsn to extract all symbols from the source data, import this as a list in q and place the distinct items in sym. Then create the historical partitions running concurrent processes that use unconditional enumerations. Be mindful that unconditional enumeration fails for a symbol not in the domain. If such symbols are few, it may be cheaper to capture them with error trapping and fix them with post-processing.
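Here is a minimal sketch contrasting the two styles, assuming a sym file already exists in db; the symbols are hypothetical.
q)`:db/sym?`ibm`newco   / conditional (enum extend): appends to db/sym as needed
`sym$`ibm`newco
q)`sym$`ibm             / unconditional: fast, but the symbol must already be in sym
`sym$`ibm
q)`sym$`absent          / a symbol missing from the domain signals an error
'cast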
14.6.2.2 Other Serialized Files in Root
While it is seldom done in practice (perhaps because few kdb+ programmers are aware of the possibility), it is possible to place any serialized q entity into the root and have it loaded into an eponymous variable. This can be more efficient than using a script to initialize complex variables in the workspace.
Tip
If you use this, don’t go overboard.
/db
[sym]
var1
var2
…
One type of serialized file that can be initialized in this fashion is a reference table (or keyed table) of modest size (up to a few million rows). Such a table will be loaded into memory in its entirety, so lookups or joins on it will be quite fast.
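For instance, if a modest keyed table lookup has been loaded from the root – the table here is a hypothetical stand-in – joining against it is an ordinary fast in-memory operation:
q)lookup:([s:`a`b`c] v:1 2 3)   / stand-in for a keyed table loaded from the root
q)([] s:`c`a`b) lj lookup
s v
---
c 3
a 1
b 2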
It is also possible to (re)load the entire state of the root context automatically by serializing its dictionary and loading it at startup. This is a form of checkpoint.
14.6.2.3 Scripts
A script in the root can hold arbitrary q code that will be loaded upon startup. For example, functions defined in such a script can be "stored procedures" for the database. While it is possible to have multiple q scripts in the root, you probably want precise control over the order in which the scripts are executed. Good practice is to place in the root one startup script that loads scripts residing in libraries elsewhere.
/db
…
init.q
14.6.2.4 Splayed Tables
Tables of intermediate size (up to tens of millions of rows) can be splayed under the root. Splayed table directories must be immediately under the root.
/db
[sym]
…
/splay1
/splay2
…
Because splayed tables are mapped and columns are page-faulted into memory as needed, no more than a few columns will be simultaneously resident in the workspace. Assuming that the server has adequate memory available to kdb+, q should expect to find recently accessed columns cached by the OS. In this scenario, performance of lookups against splayed tables can be close to that of in-memory tables.
14.6.2.5 Partitioned Tables
The partition directories of non-segmented tables must be immediately under the root. Every partition directory contains a subdirectory for each table sharing the partition; in each subdirectory lives the splayed slice for that table's partition value. For example, for daily partitioned trades and quotes, a day’s directory would contain a splayed directory for that day’s trades and another splayed directory for that day’s quotes.
/db
[sym]
…
/part1
/trade
/quote
/part2
/trade
/quote
…
There need not be a directory for every partition value but the partition directories must have a uniform structure. That is, a partition value having any entry must contain splayed directories for all tables in the partition. Splay an empty schema for a table having no records for a partition value. The utility .Q.chk will do this for you automatically.
14.6.2.6 Segmented Tables
In contrast to normal partitioned tables (i.e., non-segmented tables) in which all records live under the root, no records of a segmented table can reside under the root. Instead, the root contains a text file par.txt having one entry per line. Each entry represents an OS path for a segment directory not under the root containing the data in that segment.
Each segment directory contains partition directories whose structure is completely analogous to that of a regular partitioned table. There is one directory per partition value and beneath that, one splayed subdirectory per table. Note that depending on the nature of the segmentation, multiple segment directories may contain slices for the same partition value. The previous restriction about consistency across partition directories applies to segments. Here we use segi as shorthand for data located on independent I/O channel i.
/db
[sym]
…
par.txt
…
== drive 1 =============
/seg1
/part1
/part2
…
== drive 2 =============
/seg2
/parta
/partb
…
== … =============
14.6.3 Creating and Populating a kdb+ Database
Here we demonstrate how to create and populate various kdb+ database configurations. Although the examples are simplistic, they cover most common situations.
There are no wizards for laying out and populating a kdb+ database. Instead, you must write each component in the correct form to the correct location. There are utilities to help with the process but it is your responsibility to ensure that everything is valid. Improperly structured directories or incorrectly written data files will generate run-time errors only when q code attempts to access them.
Once you have created a kdb+ database, start a q process pointing at the root directory of the database to attach it to the session. Alternately, in a running q instance you can issue \l on the root directory.
14.6.3.1 What Happens at Startup?
This question has a deceptively simple answer. When you point q at a directory or file, it applies a single rule:
Whatever q recognizes as its own, it does.
Thus, a serialized q entity is loaded; a splayed, partitioned or segmented table is mapped; a script is executed.
Important
Almost anything that q does not recognize causes it to abort that portion of startup. In particular, if q discovers foreign or undecipherable files in splayed, partitioned or segmented directories, it will not map the tables contained there. Even unexpected hidden files (e.g., those written by the Mac OS Finder) will abort the map. The exception is files whose name ends in $, which are ignored by the loader.
When you point q startup or \l at a directory, that directory becomes the root directory for the kdb+ database and also the current working directory for the OS. We shall refer to this scenario as kdb+ startup to distinguish it from an arbitrary q session. We shall cover the items that kdb+ startup finds in the order that it handles them:
- Serialized q entities
- Splayed tables
- Partitioned or segmented tables
- Scripts
Note
File handles that are not fully qualified – i.e., relative paths – are interpreted relative to the current working directory at the time.
14.6.3.2 Serialized q Entities
When kdb+ startup recognizes a serialized q data file, it loads that data into an eponymous variable. The canonical example is the sym file in the root, containing the serialized list of (unique) symbols in the enumeration domain for (all) symbol columns. Multiple serialized entities are loaded in the order in which the OS file system lists them.
In our first example, we manually write out a sym file, exit q and then start kdb+ pointing at the root.
q)`:db/sym set `a`b`c`z`x`y
`:db/sym
q)\\
$q db
q)sym
`a`b`c`z`x`y
Starting with a fresh /db, we place a defined "constant", a utility function and a serialized keyed table in the root.
q)`:db/LIFE set 42
`:db/LIFE
q)`:db/f set {x*y}
`:db/f
q)`:db/lookup set ([s:`a`b`c] v:1 2 3)
`:db/lookup
q)\\
$q db
q)LIFE
42
q)f
{x*y}
q)lookup
s| v
-| -
a| 1
b| 2
c| 3
Tip
As of this writing (July 2025), q will not look for serialized data in directories other than the root. Nor will it automatically load serialized data whose file names have extensions. However, you can load such data files manually using \l or get.
14.6.3.3 Splayed Tables
When kdb+ startup finds a (sub)directory immediately beneath the root that it recognizes as a splayed table, it maps the table into memory. All symbol columns must be enumerated for a table to be splayed. Ensure that the sym file is in the root directory if you do not use a built-in utility.
Starting with a fresh db, we splay two tables under the root. The first does not have symbol columns; the second does.
q)`:db/tref/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
_
q)`:db/cust/ set .Q.en[`:db;] ([] sym:`ibm`msft`goog; name:`IBM`Microsoft`Alphabet)
_
q)\\
$q db
q)select from tref
c1 c2
------
1 1.1
…
q)select from cust
sym name
--------------
ibm IBM
…
14.6.3.4 Partitioned Tables
If kdb+ startup finds subdirectories immediately beneath the root whose names constitute valid partition values, it examines them for splayed tables comprising partition slices. Valid partitioned tables are then mapped. The presence of partitioned tables is independent of the presence of (plain) splayed tables. Partitioned tables and segmented tables are mutually exclusive.
Note
All symbol columns in every partition slice must be enumerated, customarily over a file sym in the root. Ensure that the sym file is in the root directory if you do not use the built-in utilities.
We recreate a previous example in which multiple tables share a daily partition in a fresh db.
q)`:db/2025.01.01/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft; p:101 33f)
_
q)`:db/2025.01.02/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00;sym:`ibm`msft; p:101.5 33.5)
_
q)`:db/2025.01.01/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00;sym:`ibm`msft; b:100.75 32.75; a:101.25 33.25)
_
q)`:db/2025.01.02/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:30:00;sym:`ibm`msft; b:101.25 33.25; a:101.75 33.75)
_
q)\\
$ q db
q)select from t where date within 2025.01.01 2025.01.02
date ti sym p
------------------------------
2025.01.01 09:30:00 ibm 101
…
q)select from q where date within 2025.01.01 2025.01.02
date ti sym b a
--------------------------------------
2025.01.01 09:30:00 ibm 100.75 101.25
…
14.6.3.5 Segmented Tables
When kdb+ startup finds a par.txt file in the root, it interprets each
line as a segment location and it examines the locations for valid
segments of partitioned tables. Valid tables are then mapped.
Important
Ensure that the segment directories are not located under the root; otherwise you will relive the table scene in the first Alien movie. The entries in par.txt can be symlinks under the root pointing to locations not under the root. In this case, end the name of the symlink with $ so it is bypassed during the initial portion of the load.
The presence of segmented tables is independent of the presence of plain splayed tables. Segmented tables and partitioned tables are mutually exclusive. All symbol columns in every segment must be enumerated. Ensure that the (common) sym file is in the root directory.
We recreate the previous example from 14.4 in which multiple tables share segmentation by symbol range. In fresh directories /am, /nz and /db,
q)extr:{[t;r] select from t where (`$1#'string sym) within r}
q)/ day 1
q)t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`t; p:101 17f)
q)q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00; sym:`ibm`t`ibm; p:100.5 17 101)
q)`:am/2025.01.01/t/ set extr[t;`a`m]
_
q)`:nz/2025.01.01/t/ set extr[t;`n`z]
_
q)`:am/2025.01.01/q/ set extr[q;`a`m]
_
q)`:nz/2025.01.01/q/ set extr[q;`n`z]
_
q)/ day 2
q)t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`t`ibm; p:17.1 100.9)
q)q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00; sym:`t`ibm`t;p:17 100.8 17.1)
q)`:am/2025.01.02/t/ set extr[t;`a`m]
_
q)`:nz/2025.01.02/t/ set extr[t;`n`z]
_
q)`:am/2025.01.02/q/ set extr[q;`a`m]
_
q)`:nz/2025.01.02/q/ set extr[q;`n`z]
_
q)`:db/par.txt 0: ("../am"; "../nz")
_
q)\\
$q db
q)select from t where date within 2025.01.01 2025.01.02
date ti sym p
-----------------------------
2025.01.01 09:30:00 ibm 101
…
q)select from q where date within 2025.01.01 2025.01.02
date ti sym p
-----------------------------
2025.01.01 09:29:59 ibm 100.5
…
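For a segmented database, .Q.P and .Q.D hold the segment directories and the partition values found in each. The actual paths depend on where your segments live, so the display below is illustrative.
q).Q.P                / segment directories resolved from par.txt
…
q).Q.D                / partition values present in each segment
2025.01.01 2025.01.02
2025.01.01 2025.01.02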
14.6.3.6 Scripts
Files with extension .q are interpreted as q scripts; those with extension .k as k scripts. Any other extension results in an error unless the relevant DSL has been defined – e.g.,
q).p.e:value
would enable .p extensions. Scripts found in the root are loaded and executed in alphabetical order. You will probably want better control over the order of execution, so it is best to place in the root a single script that loads the scripts located elsewhere in the desired sequence.
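For instance, a minimal root loader could look like the following; the script names and locations are hypothetical.
/ db/init.q (hypothetical): load supporting scripts in a controlled order
\l ../scripts/util.q
\l ../scripts/schema.q
\l ../scripts/reports.q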
Note
An invalid expression in a script causes the entire script load to abort. Locating the invalid entry can be non-trivial for long scripts, since by default console display is suppressed during load. Judicious use of (0N!) can be helpful. Sometimes a binary search using block comments is an effective way to locate the offending line. Bottom line: keep scripts short!
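To illustrate the 0N! technique, the fragment below (purely illustrative) echoes checkpoints to the console even though normal echo is suppressed during the load, so the last checkpoint displayed brackets the offending line.
/ illustrative fragment of a long script
0N!"defining utilities"
f:{x*y}
0N!"defining schemas"
trade:([] ti:`time$(); sym:`symbol$(); p:`float$())
0N!"script loaded"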
In our first example, we initialize the workspace using a startup script db/init.q that we create from q (because we can). You could do it in any text editor.
q)`:db/init.q 0: ("LIFE:42";"f:{x*y}";"lookup:([s:`a`b`c] v:1 2 3)")
`:db/init.q
q)\\
$q db
q)LIFE
42
q)f
{x*y}
q)lookup
s| v
-| -
a| 1
b| 2
c| 3
Since scripts are loaded as the last step in kdb+ startup, any loaded script can rely on serialized variables and mapped tables being present. For example, in a fresh /db we create a splayed table and a startup script from within q and then run the script at startup.
q)`:db/cust/ set .Q.en[`:db;] ([] sym:`ibm`msft`goog; name:`IBM`Microsoft`Alphabet)
_
q)`:db/init.q 0: ("sym"; "show select from cust")
_
q)\\
$q db
`ibm`msft`goog`IBM`Microsoft`Alphabet
sym name
--------------
ibm IBM
msft Microsoft
goog Alphabet
q)
Now we demonstrate a script that loads other scripts. Start with a fresh db. We write out a script having a library function (you could do it in a text editor) and then load that script from init.q at startup.
q)`:scripts/expr.q 0: enlist ".jab.lib.expr.eval:{x@y}"
_
q)`:db/init.q 0: enlist "\\l ../scripts/expr.q"
_
q)\\
$q db
q).jab.lib.expr.eval[exp; 1]
2.718282
14.7 Putting It All Together
In this section, we combine the various constituents of a kdb+ database demonstrated in the previous sections to give two archetypal examples. The first has partitioned tables; the second has segmented tables.
14.7.1 Partitioned Database
To construct our partitioned database, we combine the steps in previous sections to obtain the following meta-script mkdb1.q. (We assume that the scripts in /lib already exist.)
/ create serialized variables
`:db/LIFE set 42
`:db/f set {x*y}
`:db/lookup set ([s:`a`b`c] v:1 2 3)
/ create splayed tables
`:db/tref/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
`:db/cust/ set .Q.en[`:db;] ([] sym:`ibm`msft`goog;
name:`IBM`Microsoft`Google)
/create partitioned tables
`:db/2025.01.01/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00;
sym:`ibm`msft; p:101 33f)
`:db/2025.01.02/t/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00;
sym:`ibm`msft; p:101.5 33.5)
`:db/2025.01.01/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:31:00;
sym:`ibm`msft; b:100.75 32.75; a:101.25 33.25f)
`:db/2025.01.02/q/ set .Q.en[`:db;] ([] ti:09:30:00 09:30:00;
sym:`ibm`msft; b:101.25 33.25; a:101.75 33.75)
/ create startup script - it will be automatically executed
/ ensure file /lib/math.q exists
`:db/init.q 0: ("TheUniverse:42";"\\l ../lib/math.q")
Now execute this script to create the database.
q)\l ../scripts/mkdb1.q
`:db/LIFE
`:db/f
`:db/lookup
`:db/tref/
`:db/cust/
`:db/2025.01.01/t/
`:db/2025.01.02/t/
`:db/2025.01.01/q/
`:db/2025.01.02/q/
`:db/init.q
The resulting partitioned database has the following layout.
/db
    sym
    LIFE
    f
    lookup
    init.q
    /tref
        .d
        c1
        c2
    /cust
        .d
        sym
        name
    /2025.01.01
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
    /2025.01.02
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
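As a quick sanity check (illustrative, and assuming /lib/math.q exists as the startup script requires), start q on the root and confirm a few of the constituents.
$q db
q)(LIFE; TheUniverse)        / serialized variable and init.q assignment
42 42
q)count select from t where date=2025.01.01
2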
14.7.2 Segmented Database
To construct our segmented database, we combine the steps in previous sections to obtain the following meta-script mkdb2.q. (We again assume that the scripts in /lib already exist.)
/ create serialized variables
`:db/LIFE set 42
`:db/f set {x*y}
`:db/lookup set ([s:`a`b`c] v:1 2 3)
/ create splayed tables
`:db/tref/ set ([] c1:1 2 3; c2:1.1 2.2 3.3)
`:db/cust/ set .Q.en[`:db;] ([] sym:`ibm`msft`t;
name:`IBM`Microsoft`ATT)
/ create segmented tables
extr:{[t;r] select from t where (`$1#'string sym) within r}
t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`ibm`t; p:101 17f)
q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00;
sym:`ibm`t`ibm; b:100.75 16.9 100.8;a:101.25 17.1 101.1)
`:am/2025.01.01/t/ set extr[t;`a`m]
`:nz/2025.01.01/t/ set extr[t;`n`z]
`:am/2025.01.01/q/ set extr[q;`a`m]
`:nz/2025.01.01/q/ set extr[q;`n`z]
t:.Q.en[`:db;] ([] ti:09:30:00 09:31:00; sym:`t`ibm; p:17.1 100.9)
q:.Q.en[`:db;] ([] ti:09:29:59 09:29:59 09:30:00;
sym:`t`ibm`t; b:17 100.7 17.1;a:17.2 101.25 17.25)
`:am/2025.01.02/t/ set extr[t;`a`m]
`:nz/2025.01.02/t/ set extr[t;`n`z]
`:am/2025.01.02/q/ set extr[q;`a`m]
`:nz/2025.01.02/q/ set extr[q;`n`z]
`:db/par.txt 0: ("../am"; "../nz")
/ create startup script - it will be automatically executed
/ ensure file /lib/math.q exists
`:db/init.q 0: ("TheUniverse:42";"\\l ../lib/math.q")
Now execute this script to create the database.
q)\l ../scripts/mkdb2.q
`:db/LIFE
`:db/f
`:db/lookup
`:db/tref/
`:db/cust/
`:am/2025.01.01/t/
`:nz/2025.01.01/t/
`:am/2025.01.01/q/
`:nz/2025.01.01/q/
`:am/2025.01.02/t/
`:nz/2025.01.02/t/
`:am/2025.01.02/q/
`:nz/2025.01.02/q/
`:db/par.txt
`:db/init.q
The resulting segmented database has the following layout:
/db
    sym
    LIFE
    f
    lookup
    /tref
        .d
        c1
        c2
    /cust
        .d
        sym
        name
    par.txt
    init.q
== drive 1 ==
/am
    /2025.01.01
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
    /2025.01.02
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
== drive 2 ==
/nz
    /2025.01.01
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
    /2025.01.02
        /t
            .d
            ti
            sym
            p
        /q
            .d
            ti
            sym
            b
            a
14.8 There's No Place Like QHOME
We discuss how kdb+ interacts with the OS environment when it looks for files.
14.8.1 The Environment Variables
Three environment variables QHOME, QLIC and QINIT are used by kdb+ at startup.
The environment variable QHOME specifies the directory where kdb+ expects to find the bootstrap file q.k. By default, it also looks there for the license file k4.lic. If QHOME is not defined, kdb+ falls back to $HOME/q for Unix-based systems and c:\q for Windows.
The environment variable QLIC overrides the default location for the license file. If QLIC is not defined, kdb+ falls back to QHOME (or its fallback).
The environment variable QINIT specifies the name of the file that is executed immediately after the load of q.k. If QINIT is not defined, kdb+ attempts to load the file q.q from QHOME. If QHOME is not defined or q.q is not found, no error is reported.
Note
The file in QINIT is executed in the root context (i.e., \d .).
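You can inspect these variables from inside a q session with getenv; the values shown below are, of course, installation-specific.
q)getenv `QHOME
"/home/user/q"
q)getenv `QLIC          / an empty string means the variable is not defined
""
q)getenv `QINIT
""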
14.8.2 q in da hood
Upon startup, the current directory is the directory from which the q session was started or the database root if one was specified on the command line.
Loading a script does not in itself change the current directory. Of course, if that script specifies a startup database root or executes a \cd then the current directory may change. In particular, the current directory is not temporarily changed to the location of the target script.
Loading a database, either as part of the startup invocation or via a \l command, changes the current directory to the database root. All subsequent relative file access resolves from there. This is especially important when starting q with -u, since the root determines which part of the file hierarchy is visible.
When asked to load a file whose name is not fully qualified, q first searches the current directory. If the file is not found, it then searches QHOME, if it is defined, and $HOME otherwise.
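The following illustrative session (the paths are hypothetical) shows the effect of loading a database root on the current directory, and hence on subsequent relative access.
q)\cd                   / the directory from which the session was started
"/home/user"
q)\l db                 / loading a database root changes the current directory
q)\cd
"/home/user/db"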