Skip to content

Splayed tables

Medium-sized tables (up to 100 million rows) are best stored on disk splayed: each column is stored as a separate file, rather than using a single file for the whole table.

quotes/
├── .d
├── price
├── sym
└── time

Hidden file .d lists the columns in the order they appear in the table.

Tables that have many columns are good candidates for splaying, as most queries access only a small subset of those columns.

To save a table splayed, use set to save it not to a file

q)`:filename set table
`:filename

but to a directory

q)`:dirname/ set table
`:dirname/

The table must be

  • fully enumerated; i.e. no repeated symbols
  • simple, not keyed

Example:

q)m1:1000000 / 1 million
q)show q:value[countries]`code / country dialling codes
86 91 81 55
q)edpn:{10000000+x?80000000} / eight-digit phone numbers

q) a million calls with times, origin and destination codes and numbers
q)3#calls:([]tim:m1?23:59; occ:m1?q; ono:edpn m1; dcc:m1?q; dno:edpn m1)
tim   occ ono      dcc dno
-------------------------------
09:30 91  56809216 55  10241715
18:16 81  31860713 81  19766096
19:46 55  22477903 55  19899746

q)`:calls/ set calls
`:calls/
calls/
├── dcc
├── dno
├── occ
├── ano
└── tim
❯ ls -a calls
.   ..  .d  dcc dno occ ono tim

Nested columns

Nested columns contain items that are not atoms. A nested column can be splayed if and only if it is a compound list: its items are all vectors, i.e. simple lists of the same type:

("quick";"brown";"fox")
3 4#til 12

A compound column in a splayed table is represented in the filesystem by two files. One bears the name of the column; the other is the same name suffixed with a #.

A common question in designing a table schema is whether to represent text as strings (compound) or symbols (vector). Where many values are repeated, as in stock-exchange codes, symbols have important advantages:

  • atomic semantics in code
  • compact storage
  • fast execution

With fewer repeated values, these advantages dwindle, but the penalty of a bloated sym list remains. Fields such as comments or notes should always be strings.

Enumerating symbol columns

If a table contains columns of type symbol with repeated items (i.e. the table is not fully enumerated), trying to save it splayed will result in a type error.

q)tr
date       open  high  low   close volume    sym
-------------------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 19087300  AMD
2020.10.03 27.37 27.48 27.21 27.37 39386200  MSFT
2020.10.04 24.1  25.1  23.95 25.03 17869600  AMD
2020.10.04 27.39 27.96 27.37 27.94 82191200  MSFT
..
q)`:tr/ set tr
'type

Tables splayed across a directory must be fully enumerated and not keyed.

The solution is to enumerate symbol columns before saving the table splayed. This is done with the function .Q.en.

q).Q.en[`:dir] tr
date       open  high  low   close volume    sym
-------------------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 19087300  AMD
2020.10.03 27.37 27.48 27.21 27.37 39386200  MSFT
2020.10.04 24.1  25.1  23.95 25.03 17869600  AMD
2020.10.04 27.39 27.96 27.37 27.94 82191200  MSFT
..

This assigns to the variable sym the list of unique symbols in the table:

q)sym
`AMD`MSFT

It also creates the directory dir with a file in it, named sym, with the same contents:

q)\ls dir
"sym"
q)value `:dir/sym
`AMD`MSFT

Finally, it returns a table where columns of symbols are enumerated.

q)trenum: .Q.en[`:dir] tr
q)trenum.sym
`sym$`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT

The value of this column differs from that of the original table tr:

q)tr.sym
`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT

The enumerated table can now be saved splayed.

q)`:dir/tr/ set trenum
`:dir/tr/

The columns are saved separately, one per file:

q)\ls -a dir/tr
,"."
".."
".d"
"close"
"date"
"high"
"low"
"open"
"sym"
"volume"

This can also be done in a single step, without saving the enumerated table into a variable:

q)`:dir/tr/ set .Q.en[`:dir] tr
`:dir/tr/

Notice that the sym list is stored separately from the table. If the symbols are common to several tables it may be convenient to write the sym list in their common parent directory.

db/
├── sym
├── quotes/
└── tr/

Since table tr has a column sym, represented by file db/tr/sym, one place you cannot write the sym list is in db/tr.

Working with sym files

Enumerating nested symbol columns prior to V3.4

Since V3.0 2011.11.25 a splayed table column can hold nested sym vectors.

Prior to V3.4, .Q.en did not enumerate nested sym lists. It had to be done manually, e.g.

`:/db/2013.01.10/tt/ set .Q.en[`:/db] update c3:{`:sym?raze x;`sym$'x}c3 from t

Loading splayed tables

There are various ways to load or read a splayed table.

Start q with the directory:

$ q calls
KDB+ 4.0 2020.10.02 Copyright (C) 1993-2020 Kx Systems
m64/ 12()core 65536MB sjt mackenzie.local 127.0.0.1 EXPIRE..
q).z.f
`calls
q)calls
tim   occ ono      dcc dno
-------------------------------
17:26 55  13464625 55  47918725
10:26 86  15538442 91  72388386
23:23 81  50209480 91  88260320
..

Load the directory:

q)\l calls
`calls
q)system"l calls"
`calls

q)load`calls
`calls

‘Loading’ the table in fact maps it to memory. None of it is written into memory by the load itself.

Get the table:

q)get`:calls
tim   occ ono      dcc dno
-------------------------------
17:26 55  13464625 55  47918725
10:26 86  15538442 91  72388386
23:23 81  50209480 91  88260320
..

Symbol columns in a splayed table are stored as enumerations of a list sym, stored separately from the table.

Retrieving the table without the sym list leaves its symbol columns as bare enumerations.

q)get `:db/tr
date       time  vol   sym price
-----------------------------------
2020.07.03 11:23 12378 0   23.57101
2020.07.02 11:36 63682 1   24.06836
..
q)load `:db/sym
`sym
q)get `:db/tr
date       time  vol   sym  price
------------------------------------
2020.07.03 11:23 12378 APPL 23.57101
2020.07.02 11:36 63682 MSFT 24.06836
..

When q loads a directory it loads everything in it that represents a kdb+ object. So it is convenient to store a shared sym list in the root directory of a database.

db/
├── sym
├── quotes/
└── tr/
$ q db
KDB+ 4.0 2020.10.02 Copyright (C) 1993-2020 Kx Systems
m64/ 12()core 65536MB sjt mackenzie.local 127.0.0.1 EXPIRE ..

q)sym
`APPL`MSFT`AMD`IBM
q)tr
date       time  vol   inst price
------------------------------------
2020.07.03 11:23 12378 APPL 23.57101
2020.07.02 11:36 63682 MSFT 24.06836
..

Changing the schema of splayed tables

Some keywords do not work with splayed tables, in particular those that change the schema. Here is how to write queries that change the schema.

Consider the following table:

q)tr
date       open  high  low   close volume   sym
------------------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 19087300 AMD
2020.10.03 27.37 27.48 27.21 27.37 39386200 MSFT
..

Removing a column is a simple matter:

q)delete volume from tr
date       open  high  low   close sym
---------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 AMD
2020.10.03 27.37 27.48 27.21 27.37 MSFT
..

But that does not work with a splayed table:

q)trade
date       open  high  low   close volume   sym
------------------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 19087300 AMD
2020.10.03 27.37 27.48 27.21 27.37 39386200 MSFT
..
q)delete volume from `trade
'splay
q)trade: delete volume from trade
'splay

The .d file contains the schema of a splayed database:

q)value `:trade/.d
`date`open`high`low`close`volume`sym

To remove a column from the schema, we can simply remove the column name from this file:

q).[`:trade/.d;();:;`date`open`high`low`close`sym]
`:trade/.d

For the changes to take effect we need to reload the splayed table.

$ q dir
KDB+ 4.0 2020.10.02 Copyright (C) 1993-2020 Kx Systems
m64/ 12()core 65536MB sjt mackenzie.local 127.0.0.1 EXPIRE..
q)\v
`s#`sym`trade
q)trade
date       open  high  low   close sym
---------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 AMD
..

Notice that the file with the volume column has not been deleted. It is just not being used.

q)\ls trade
"close"
"date"
"high"
"low"
"open"
"sym"
"volume"

Adding a new column is similar: save the column contents as a file in the directory of the splayed table and update the file .d.

q)@[`:trade;`newcol;:;til 8]
`:trade
q).[`:trade/.d;();,;`newcol]
`:trade/.d

Verify the changes by loading the splayed table into a q session.

$ q dir
KDB+ 4.0 2020.10.02 Copyright (C) 1993-2020 Kx Systems
m64/ 12()core 65536MB sjt mackenzie.local 127.0.0.1 EXPIRE ..

q)trade
date       open  high  low   close sym  newcol
----------------------------------------------
2020.10.03 24.5  24.51 23.79 24.13 AMD  0
2020.10.03 27.37 27.48 27.21 27.37 MSFT 1
2020.10.04 24.1  25.1  23.95 25.03 AMD  2
..


Splaying large files
Partitioned tables
Q for Mortals §11.3 Splayed Tables
Q for Mortals §14.2 Splayed Tables