Partitioning tables across directories

A partitioned table is a splayed table that is further decomposed by grouping records having common values along a column of special type. The allowable special column types have underlying integer values: date, month, year and long.

If a database has multiple partitioned tables, the directory structure is going to contain the partitioning column at the top level (which means this must be uniform across the database), then each table will have its own directory for each partition. These directories effectively each contain a splayed table. The sym file for the all the tables is also located directly in the root directory of the entire database.

db
├── 2020.10.04
│   ├── quotes
│   │   ├── .d
│   │   ├── price
│   │   ├── sym
│   │   └── time
│   └── trades
│       ├── .d
│       ├── price
│       ├── sym
│       ├── time
│       └── vol
├── 2020.10.06
│   ├── quotes
..
└── sym

Partition data correctly: data for a particular date must reside in the partition for that date.

.Q.par

Creating partitioned tables

To create a partitioned table, we have to create the splayed tables in the directory system as above. To do that, we just need to see which part of our table goes to which partition, and to create that partition itself in the correct place.

Suppose we have the following table:

q)t:([]date:4#2020.10.04 2020.10.05;sym:`a`b`b`a;v:3 4 5 6)

The only new helper function we need is .Q.par which is going to construct the filepaths for the partitions of our table. Using this, we can do the following:

q){sv[`;.Q.par[`:db/;x;`t],`] set .Q.en[`:db/;delete date from select from t where date=x]}each exec distinct date from t
`:db//2020.10.04/t/`:db//2020.10.05/t/

This line of code has created all the necessary files for us in their correct locations. (We've created the splayed table for each date as well as the necessary sym file.)

Most of the time, we don't need to create the entire table from scratch, we only add the splayed table for the latest date. Because that is a common task, there's a helper function that does exactly that which is called .Q.dpft (named after its arguments). The function will save the table in the correct place and also do the enumeration, so that we only have to call it and do nothing else:

q)t:([]sym:`c;v:6 7)
q).Q.dpft[`:.;2020.10.06;`sym;`t]
`t
q)\l .     /reloading the current directory so that t becomes the table that's on disk instead of the current one in memory
q)t        /don't actually do this! partitioned tables might be very large and this can quickly cause issues.
date       sym v
----------------
2020.10.04 a   3
2020.10.04 b   5
2020.10.05 b   4
2020.10.05 a   6
2020.10.06 c   6
2020.10.06 c   7

Note

Notice that in both cases, at the point of writing the table down, it didn't have a date column. That's because the date column (along with its name, date) is automatically generated from the partition name. Writing down a table that has a date column date-partitioned will have unintended consequences.

Loading partitioned tables

To load a partitioned table, we should simply load the directory that contains all the partitions. We lose the ability to get the table which still worked with splayed tables (getting the directory that contains the entire database does "see" everything the directory contains but will not map the tables into memory).

Upserting into partitioned tables

Upserting works along the same logic with partitioned tables as with splayed ones: we can use the same functions we used when originally saving the table, but instead of set, we'll use upsert:

q)tn:([]sym:`a`b;v:2 5)
q)sv[`;.Q.par[`:db/;2020.10.06;`t],`] upsert .Q.en[`:db/;tn]
q)\l db
q)t
date       sym v
----------------
2020.10.04 a   3
2020.10.04 b   5
2020.10.05 b   4
2020.10.05 a   6
2020.10.06 c   6
2020.10.06 c   7
2020.10.06 a   2
2020.10.06 b   5

Table counts

For partitioned databases, q caches the count for a table, and this count cannot be updated from within a reval expression or from a secondary thread.

To avoid noupdate errors on queries on partitioned tables, put count table in your startup script.

Restricted actions on partitioned tables

There are certain keywords that work with segmented tables, but don't work with partitioned ones. For a full breakdown of these differences, refer to

Use case

Partition a table if any of the following conditions apply:

it has over 100 million records
it has a column that cannot fit in memory
it grows
many queries can be limited to a range of values of the partitioning column

count, maps, peach, reval, select

Errors, Parallel execution

Q for Mortals §14.3 Partitioned Tables

Flat tables
Splayed tables
Segmented databases