par.txt defines a top-level partitioning of a database into directories. Each row of par.txt is a directory path. Each such directory is itself partitioned in the usual way, typically by date. The directories should not be empty. The par.txt file should be created in the main database directory.
par.txt is used to unify partitions of a database, presenting them as a single database for querying.
This is particularly useful in combination with multithreading, that is, when the kdb+ process is started with secondary threads (see command-line option -s) and each partition in par.txt is on a separate local disk:
- when the q process is started with secondary threads, the partitions in par.txt are allocated to secondary threads on a round-robin basis, i.e. if kdb+ is started with n secondary threads, then partition p is given to secondary thread p mod n. This gives maximum parallelization for queries over date ranges.
- if, in addition, the partitions in par.txt are on separate disks, each thread gets its own disk or disks, and there should be no disk contention (i.e. no more than one thread issuing commands to any one disk). Ideally, there should be one disk per thread. Note that this works best where the disks have fully independent access paths (CPU to disk controller to disk), but may be of little use with shared access due to disk contention, e.g. with SAN/RAID.
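The round-robin rule above (partition p goes to secondary thread p mod n) can be modelled in a few lines. This is only an illustrative Python sketch of the arithmetic, not kdb+ code; the function name allocate_round_robin and the sample paths are assumptions made for the example.

```python
from collections import defaultdict


def allocate_round_robin(partitions, n_threads):
    """Map the partition at index p to secondary thread p mod n_threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    allocation = defaultdict(list)
    for p, path in enumerate(partitions):
        allocation[p % n_threads].append(path)
    return dict(allocation)


# Hypothetical setup: four par.txt entries and two secondary threads.
segments = ["/0/db", "/1/db", "/2/db", "/3/db"]
print(allocate_round_robin(segments, 2))
# {0: ['/0/db', '/2/db'], 1: ['/1/db', '/3/db']}
```

With four threads instead of two, each thread in this model receives exactly one directory, which corresponds to the one-disk-per-thread ideal described above.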
For example, par.txt might be:

    /0/db
    /1/db
    /2/db
    /3/db

with directories:

    ~$ ls /0/db
    2009.06.01 2009.06.05 2009.06.11 ...
    ~$ ls /1/db
    2009.06.02 2009.06.06 2009.06.12 ...
    ...
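To inspect a layout like this from outside q, a small script can read par.txt and list what each directory holds. The following is a hedged Python sketch: read_segments is a hypothetical helper, skipping blank lines is a choice of the sketch rather than documented kdb+ behaviour, and the demo builds a throwaway directory tree instead of touching real disks.

```python
import os
import tempfile


def read_segments(db_root):
    """Return {directory from par.txt: sorted partition names} for db_root."""
    layout = {}
    with open(os.path.join(db_root, "par.txt")) as f:
        for line in f:
            segment = line.strip()
            if not segment:
                continue  # skip blank lines (a choice of this sketch)
            partitions = sorted(os.listdir(segment))
            if not partitions:
                # The directories listed in par.txt should not be empty.
                raise ValueError(f"directory is empty: {segment}")
            layout[segment] = partitions
    return layout


# Demo on a temporary tree shaped like the example (hypothetical paths).
with tempfile.TemporaryDirectory() as tmp:
    sample = {"0": ["2009.06.01", "2009.06.05"], "1": ["2009.06.02", "2009.06.06"]}
    paths = []
    for name, dates in sample.items():
        seg = os.path.join(tmp, name, "db")
        for d in dates:
            os.makedirs(os.path.join(seg, d))
        paths.append(seg)
    root = os.path.join(tmp, "main")
    os.makedirs(root)
    with open(os.path.join(root, "par.txt"), "w") as f:
        f.write("\n".join(paths) + "\n")
    for seg, dates in read_segments(root).items():
        print(seg, dates)
```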
For partitioned databases, q caches the count for a table, and this count cannot be updated from within a reval expression or from a secondary thread. To avoid noupdate errors on queries on partitioned tables, put count table in your startup script.
- the data should be partitioned correctly across the partitions – i.e. data for a particular date should reside in the partition for that date.
- the secondary/directory partitioning is for both read and write.
- the directories pointed to in par.txt may only contain appropriate database subdirectories. Any other content (file or directory) will give an error.
- the same subdirectory name may be in multiple par.txt partitions. For example, this would allow symbols to be split, as in A-M on /0/db, N-Z on /1/db (e.g. to work around the 2-billion row limit). Aggregations are handled correctly, as long as data is properly split (not duplicated). Note that in this case, the same day would appear on multiple partitions.
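The last point, that aggregations stay correct only when data is split rather than duplicated, can be seen in a toy model. The Python sketch below is not how kdb+ computes aggregates; combine_counts and the row counts are invented purely for illustration.

```python
from collections import Counter


def combine_counts(*segment_counts):
    """Sum the per-date row counts reported by each directory."""
    total = Counter()
    for counts in segment_counts:
        total.update(counts)  # adds counts for matching dates
    return dict(total)


# Hypothetical row counts: symbols A-M on /0/db, N-Z on /1/db, same dates on both.
a_to_m = {"2009.06.01": 120, "2009.06.02": 95}
n_to_z = {"2009.06.01": 80, "2009.06.02": 105}
print(combine_counts(a_to_m, n_to_z))
# {'2009.06.01': 200, '2009.06.02': 200}
```

If the same rows were present in both directories, the combined count in this model would include them twice, which is why the split must not duplicate data.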
See Q for Mortals §14.4 Segmented Tables for an extended discussion of this topic.