KdbplusForMortals/partitioned tables

1.3 Partitioned Tables

1.3.0 Overview

To recap the previous section, a splayed table is split vertically along its columns and is subsequently mapped into memory. When a query is executed against the mapped table, column data is loaded as needed and is cached by the O/S. This reduces the table’s memory footprint, especially for tables with many columns.

A single day’s trades and quotes work well as splayed tables, but several years of trades and quotes would result in very large column lists. When a table is so large that even the individual columns may not fit into memory, we can further decompose it by slicing horizontally. The solution for trades and quotes is to slice into daily portions.

1.3.1 Partitions

A partitioned table is a splayed table that is further decomposed by grouping records having common values along a column of special type. The allowable special column types have the property that the underlying value is an integer. The slice of records having a given value is splayed into a directory, called a partition, whose name is that value. In the canonical example, historical trades (or quotes) are stored in daily partition directories; remember that a q date is an integer under the covers.

Big Picture (2): We think of a partitioned table as a two-dimensional persisted form since it is cut in two directions: vertically by splaying along columns and horizontally by slicing into partitions.

               column1  column2  column3
    partition1    ▪        ▪        ▪
    partition2    ▪        ▪        ▪

Here each ▪ represents a (partial) column of a partition slice as an opaque entity.

To map a partitioned table into memory, point q at the root directory, either at startup or with \l. When q recognizes a valid partitioning under the root, it creates a map for the partitions and columns. The mapped table appears whole to users due to magic happening behind the scenes. Recall that columns of mapped splayed tables are loaded into memory only as needed. With mapped partitioned tables, only the required column slices are loaded.

1.3.1.1 Partition Layout

As with splaying, kdb+ implements partitioning via the OS directory structure. Each partition must be stored in a directory immediately under the database root directory. The directory name is the common column value for all the rows in the partition. Each partition directory in turn contains a subdirectory holding a splayed table with the records for that slice. This two-dimensional decomposition is reflected in two levels of nested directories under the root.

    /root
      /partitionvalue1
        /tablename
          ...
      /partitionvalue2
        /tablename
          ...
    …

Since each splayed table subdirectory name is the name of the table, these names must be consistent across all partitions. Likewise, as portions of a single table, all splayed slices must have identical schemas. This means that each slice directory contains column files with the same names and internal data types, as well as identical .d files.

    /root
      /partitionvalue1
        /tablename
          .d
          column1name
          column2name
          …
      /partitionvalue2
        /tablename
          .d
          column1name
          column2name
          …
      …

Important: Since a partition directory name factors out the common value for all records in its slice, do not include the partition column when you splay a partition slice. Instead, kdb+ infers the name, value and type from the partition directory name and creates a virtual column from this information. The name of the virtual column is set by q and cannot be changed.

1.3.1.2 Partition Examples

For our first example, we consider a table t of tick data with daily partitions.

    2009.01.01, 2009.01.02, …

The virtual column is date in this case. Under each partition directory, there is a subdirectory t with that day’s slice of records.

    /db
      /2009.01.01
        /t           <- trade records for 2009.01.01
      /2009.01.02
        /t           <- trade records for 2009.01.02
      …

Execution of the query,

    select from t where date within 2009.01.01 2009.01.02

results in only two daily partitions being read. This is clearly a big win compared to reading the entire year’s data.

Alternatively, if we partition by month,

    2009.01m, 2009.02m, …

the virtual column name is month. A query such as:

    select from t where month within 2009.01m 2009.03m

results in three monthly partitions being read.

For daily data, partitioning by year may be better.

    2009, 2008, …

In this case, the virtual column is year even though there is no year type in q. (The values are recognized as valid q years). The query,

    select from t where year>=2006

results in four yearly partitions being read. This is a reasonable chunk size to load into memory, as it comprises approximately 1000 records.

We can also partition by an arbitrary int such as a slot number:

    0, 1, …

This results in a virtual column int.

1.3.1.3 Partition Domain

We call the type of the virtual column for the partition the partition domain. As noted previously, the partition domain must have an underlying integral value.

Important: You can not use a symbol column as a partition domain, even if the symbols are enumerated.

A kdb+ database can only have a single partition domain. This means that you must create separate databases if you need partitions of different granularity. For example, you cannot have daily and monthly partitions in one database.

There is no requirement that there be entries for every value in the partition domain; for example, there may be no trades for holidays:

    /db
      /2008.12.31
        /t
      /2009.01.02
        /t

1.3.2 Creating Partitions

Creating a partition is as simple as writing out the splayed slice to the appropriate directory. There is no one-size-fits-all utility to define and create the partitions and splayed slices for you. You write a custom program (usually in q, perhaps with some Perl pre-processing) to load and populate your database.

In practice, a partitioned history database is usually loaded one partition at a time across multiple tables. For example, a daily kdb+ historical database might be back-filled with a script that extracts one day's historical trade and quote data from another data source and then splays it into a kdb+ partition. Once all historical data is captured, you may manufacture subsequent partitions by capturing intra-day trades and quotes and splaying them into the current day's partition during end-of-day processing.
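A minimal sketch of such a back-fill loop follows. The staging path, file names and column layout here are assumptions for illustration only; adapt them to your actual source data.

```q
/ hypothetical back-fill: one day per iteration
/ assumes CSV files /staging/trade.2009.01.01.csv etc. with columns ti,s,p
days:2009.01.01+til 5
load1:{[d]
  t:("VSF";enlist",") 0: `$":/staging/trade.",string[d],".csv";
  t:update s:`:/db/sym?s from t;              / enumerate symbols against root
  (` sv `:/db,(`$string d),`t,`) set t }      / splay into daily partition
load1 each days;
```

The trailing empty symbol in the ` sv join yields the trailing slash that tells set to splay.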

1.3.2.1 Partitioning Tables without Symbols

As our first example, we consider a simple daily-partitioned table with no symbol columns. In a fresh root directory /db we will partition a table t having the schema:

([] ti:`second$(); p:`float$())

Observe that this schema does not contain a date column, as the common date value for the records in a slice will be captured in the partition directory name.

We manually create the first two daily partitions of the year:

    `:/db/2009.01.01/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)
`:/db/2009.01.01/t/
    `:/db/2009.01.02/t/ set ([] ti:09:30:00 09:31:00; p:101.5 102.5)
`:/db/2009.01.02/t/

When we point kdb+ at /db, the partitioned table appears to be a normal table, but this is an illusion created by memory mapping.

    \l /db
    t            / this is unsafe as it will load all of t!
date       ti       p
-------------------------
2009.01.01 09:30:00 101
2009.01.01 09:31:00 102
2009.01.02 09:30:00 101.5
2009.01.02 09:31:00 102.5

1.3.2.2 Partitioning Tables with Symbols

We enhance the previous example to include a symbol column, again starting with a fresh /db.

    `:/db/2009.01.01/t/ set
    ([] ti:09:30:00 09:31:00; s:`:/db/sym?`ibm`msft; p:101 33f)
`:/db/2009.01.01/t/
    `:/db/2009.01.02/t/ set
    ([] ti:09:30:00 09:31:00; s:`:/db/sym?`ibm`msft; p:101.5 33.5)
`:/db/2009.01.02/t/

Because `:/db/sym? creates the file on first use, the enumeration domain is correctly written into the file sym in the root directory.

We load /db and find that all is well:

    \l /db
    sym
`ibm`msft
    select from t where date within 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

1.3.3 Working with Partitioned Tables

Continuing with the example of the previous section, we discover that we can perform basic table operations on it.

    count t
4j
    cols t
`date`ti`s`p
    meta t
c   | t f a
----| -----
date| d
ti  | v
s   | s
p   | f

Other primitive operations do not work on partitioned tables and generate errors. These include:

/ All the following generate errors for partitioned table t
/ although they work for splayed tables

    t[0j]
    t[;`p]
    0#t
    select[1] from t
    `p xasc t

1.3.3.1 The select Template

The select template, or the equivalent functional form, is the way to access data for a partitioned table. The other templates—exec, update and delete—do not work on partitioned tables. If you need exec, apply it against the result of the corresponding select.
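For example, to obtain the price vector for one day of our sample table, apply exec to the result of a select (a sketch; any exec expression can be handled the same way):

```q
/ exec fails directly against partitioned t; apply it to a select result
exec p from select p from t where date=2009.01.01
```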

Important: The first sub-phrase of a where clause in a query against a partitioned table must constrain either the virtual column or a column having an index. If you fail to do this, the entire database will be scanned. Long before the query completes, your colleagues will show up at your workstation wielding pitchforks.

For example, to retrieve all records for one day:

    select from t where date=2009.01.01
date       ti       s    p
----------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33

When you issue a select query against a partitioned table, the query is applied sequentially against the partition slices. The partial results are unioned in the order of the virtual column values to give the result table.

    select from t where date within 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

When you combine the virtual column constraint with other constraints, always place it first.

    select from t where date=2009.01.01, ti<09:30:30
date       ti       s   p
---------------------------
2009.01.01 09:30:00 ibm 101

To retrieve data from the earliest or most recent partition without explicitly specifying the partition, you can use first and last or min and max.

    select from t where date=first date
date       ti       s    p
----------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33

    select from t where date=max date
date       ti       s    p
------------------------------
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

You can group by the partition virtual column:

    select hi:max p, lo:min p by date from t where date within
      2009.01.01 2009.01.02
date      | hi    lo
----------| ----------
2009.01.01| 101   33
2009.01.02| 101.5 33.5

1.3.3.2 The Virtual Column i in Partitioned Tables

In a partitioned table, the virtual column i does not refer to absolute row number. Instead, it refers to the relative row number within a partition—otherwise put, i refers to the absolute row number in the slice. Thus, a constraint on i alone would apply across all partitions and the result will contain that row in each partition slice. Probably not what you want and almost certainly a bad idea (colleagues with pitchforks again).

The following query retrieves the initial record from each partition:

    select from t where date in 2009.01.01 2009.01.02, i=0
date       ti       s   p
-----------------------------
2009.01.01 09:30:00 ibm 101
2009.01.02 09:30:00 ibm 101.5

Similarly, the following queries retrieve the first and last records in the table:

    select from t where date=first date, i=0
date       ti       s   p
---------------------------
2009.01.01 09:30:00 ibm 101

    select from t where date=max date, i=max i
date       ti       s    p
-----------------------------
2009.01.02 09:31:00 msft 33.5

Use the utility .Q.ind discussed in §1.5.3 to access rows in a partitioned table based on an absolute index.
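For example, against the two-day sample table above, the third and fourth rows overall can be retrieved as:

```q
.Q.ind[t;2 3j]    / absolute row numbers counted across all partitions
```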

1.3.4 Query Execution on Partitioned Tables (Advanced)

Query execution against a partitioned table is more complex than execution against a splayed table. Understanding how it works is necessary to achieve good design and performance with kdb+ databases.

1.3.4.0 Overview

Recall that the motivation for partitions was to avoid loading entire columns into memory. Behind the scenes, kdb+ achieves this by executing the query against one partition slice at a time, reading only the column slices the query actually references in the partitions selected by the virtual column constraint.

Reading only partial column slices of a table having many partitions is a big memory and performance win.

We can immediately deduce the necessity of constraining the virtual partition column. If you constrain the virtual column, only partial columns in the target slices are actually read. If you do not, kdb+ may be required to read data from all slices into memory sequentially. While this may not exhaust memory, it will likely put your server into live-lock.

The tradeoff for the decrease in memory footprint provided by partitioning is complexity: the query executes against each partition and combines the partial results. Kdb+ does this with a form of map-reduce, when appropriate; map applies against the partitions to obtain partial results, which are then reduced.

1.3.4.1 Map-Reduce

We start with a brief introduction to map-reduce. Suppose we want to compute some aggregation—say the maximum or the average—of a large list of numbers and have at our disposal a crop of interns armed with netbooks. Our strategy is to decompose the overall computation into two stages. In the first stage, we split the list into sublists, handing one to each intern together with instructions on what to compute. In the second step, we collect the results from each intern and combine them to get the overall result.

Think about how to do this for a moment before continuing. For the maximum, you could instruct each intern to compute the maximum of her list and then take the maximum of the individual maximums. Similarly for count or sum.

For average, things aren't quite so simple. If you simply had each intern compute a partial average, you would need the count as well, since you need to weight each partial average to get the final average. You quickly realize that you can decompose the average computation more effectively. Instead of asking each intern to return the average and count for her list, ask her to compute the sum and the count. In the reduce step, you then obtain the overall average as the sum of the partial sums divided by the sum of the partial counts.
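The decomposition of the average can be sketched in q on plain lists (the sublists here are arbitrary sample data):

```q
s:(1 2 3f;4 5f;6 7 8 9f)            / sublists handed to the interns
m:{(sum x;count x)} each s          / map: each intern returns (sum;count)
(sum m[;0])%sum m[;1]               / reduce: total sum % total count
avg raze s                          / agrees with the direct average
```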

[TBA: Insert Diagram here]

More generally, map-reduce decomposes an operation on a (presumably large) list into two suboperations, opmap and opreduce. In the first step, called "map", opmap is performed on each sublist to obtain a list of partial results. In the second step, called "reduce", the partial result lists are combined with opreduce to obtain the final result. A good exercise (and interview question) is to express sorting a list using map-reduce.

1.3.4.2 Map-Reduce in Queries

It is easy to see how map-reduce applies to a query against a partitioned table, since the table is a large list of records sliced into sub-lists by the partitioning. The challenge is to decompose the query into a map step and a reduce step. The solution depends on whether the query involves aggregation.

If there is no aggregation, the result of the query on each partition is simply the computed columns for the records in the partition slice matching the constraint. In other words, kdb+ produces a partial result table by computing the columns of the query sequentially across partitions. Because all the partial result tables conform, it combines them by join in order of their virtual partition column values. In summary: fan the query across the partitions and union the ordered results.

Things are more interesting when the query contains aggregation. To begin, kdb+ determines if the aggregation is one for which it has a map-reduce decomposition—e.g., avg. For aggregates that it recognizes, kdb+ applies the map sub-operation sequentially across partition slices to obtain partial results. It then applies the reduce sub-operation across the partial result tables to obtain the final result table.

The aggregates that q can decompose with map-reduce are:

avg, cor, count, cov, dev, distinct, first, last, max, med, min, prd, sum, var, wavg, wsum

1.3.4.3 Map-Reduce Aggregates vs. Brute Force

To make this more concrete, we consider two approaches to computing an average against our daily partitioned trades table. In the first scenario, we write the naive query a user would write against the same table in memory:

    select avg size from trade where date within …

Because kdb+ recognizes avg as an aggregate it can decompose with map-reduce, it performs the calculation in two steps. Specifically, opmap computes the sum of size and a record count for each date, while opreduce divides the sum of the daily sums by the sum of the daily counts to give the average. The user gets the expected result and is none the wiser.

In our second scenario, we mimic the same computation without the map/reduce. Kdb+ would have to retrieve the date and size columns from all relevant partitions into memory simultaneously:

    select date,size from trade where date within …

and then do the brute force average calculation:

    exec (sum size)%count i from
      select date,size from trade where date within …

That whooshing sound is the memory advantage of partitioning going out the window.

1.3.5 Modifying Partitioned Tables

Unlike with splayed tables, attempting to apply update to a mapped partitioned table yields an error.

    update p:42. from `t where date=2009.01.02,s=`ibm,ti=09:30:00
'par

Modifying the persisted image of a partitioned table on disk is possible provided you know what you are doing and are careful (make backups). Since the table is essentially a collection of splayed slices, you can manipulate the slices using the techniques of §1.2. You must apply schema changes across all partitions, whereas you can make data updates to individual slices.

Recall the mapped daily-partitioned example of the previous section:

    select from t where date in 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

1.3.5.1 Appending Records within a Partition

Since each partition slice is a splayed table, we follow the example in §1.2.4 to append records to the slice. Here we append a record to the end of the 2009.01.02 slice of t. Observe that we must remap to see the new record.

    select from t where date in 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

    `:/db/2009.01.02/t/ upsert (09:33:00;`:/db/sym?`ibm; 103.)
`:/db/2009.01.02/t/

    \l /db
    select from t where date in 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5
2009.01.02 09:33:00 ibm  103

1.3.5.2 Modifying Data in a Partition

Now we want to modify the price for msft at 09:31:00 on the first day. There is no primitive operation for this. We put the relative row number into i, read the saved column into ptmp, modify the column and write it back out. Observe that a remap is not necessary when we modify an existing "field" and do not change the structure of the data.

    i:exec x from select i from t where date=2009.01.01,ti=09:31:00,s=`msft
    ptmp:exec p from select p from t where date=2009.01.01
    @[`:/db/2009.01.01/t;`p;:;@[ptmp;i;:;33.33]]
`:/db/2009.01.01/t
    select from t where date in 2009.01.01 2009.01.02
date       ti       s    p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33.33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5
2009.01.02 09:33:00 ibm  103

Important: As the previous example demonstrates, a field update requires you to read, modify and then write the entire column in the slice. To modify the whole row—e.g., to delete it—you have to read, modify and write all the data in the slice. It requires the same amount of I/O to update a single row or every record! This leads to the oft-heard sentiment that a partitioned kdb+ table is effectively read-only except for appends.

In order to remove the column p, we apply the technique of §1.2.5 across all partitions.

    d:exec date from select distinct date from t
    f:`:/db

    {.[f;x,`t`.d;:;`ti`s]; hdel ` sv f,x,`t`p} each `$string d
`:/db/2009.01.01/t/p`:/db/2009.01.02/t/p

    \l /db
    select from t where date in 2009.01.01 2009.01.02
date       ti       s
------------------------
2009.01.01 09:30:00 ibm
2009.01.01 09:31:00 msft
2009.01.02 09:30:00 ibm
2009.01.02 09:31:00 msft
2009.01.02 09:33:00 ibm

We leave it as an exercise for the reader to generalize this example to remove multiple columns from an arbitrary partitioned table.

Next we demonstrate how to add a column by resurrecting the just-removed p as a column of nulls. We first create a dictionary cs whose keys are symbols with the partition names and whose values are the corresponding table counts. A lambda that creates an appropriately sized p column and augments .d is applied over cs.

    cs:exec first x by date from select count i by `$string date from t
    f:`:/db

    {.[f;(pnm:`$string x),`t`p;:;(cs x)#0n]; .[f;pnm,`t`.d;,;`p]} each key cs
`:/db`:/db

    \l /db
    select from t where date in 2009.01.01 2009.01.02
date       ti       s    p
--------------------------
2009.01.01 09:30:00 ibm
2009.01.01 09:31:00 msft
2009.01.02 09:30:00 ibm
2009.01.02 09:31:00 msft
2009.01.02 09:33:00 ibm

1.3.6 Multiple Partitioned Tables

A partition domain can slice more than one table. Each table has its own splayed subdirectory in each partition directory, holding its slice of records. We think of such tables as partitioned in parallel.

For example, we place trade and quote slices in the same daily partition:

  /db
    /2009.01.01
      /trade
      /quote
    /2009.01.02
      /trade
      /quote

Note: Any non-empty partition directory must have an entry for all tables. If the slice for a table is empty, splay the table schema—i.e., an empty table.

  /db
    /2009.01.01
      /quote  <- this is a bad partition!
    /2009.01.02
      /trade
      /quote

1.3.6.1 Daily trades and quotes

Here we partition simplified trades and quotes. To begin, in a fresh /db we reconstruct the example from §1.3.2.2:

    `:/db/2009.01.01/t/ set  ([] ti:09:30:00 09:31:00; sym:`:/db/sym?`ibm`msft;
      p:101 33f)
`:/db/2009.01.01/t/

    `:/db/2009.01.02/t/ set ([] ti:09:30:00 09:31:00; sym:`:/db/sym?`ibm`msft;
      p:101.5 33.5)
`:/db/2009.01.02/t/

Now we add a second table q to the existing partitions using the same enumeration domain.

    `:/db/2009.01.01/q/ set ([] ti:09:30:00 09:31:00; sym:`:/db/sym?`ibm`msft;
       b:100.75 32.75; a:101.25 33.25f)
`:/db/2009.01.01/q/

    `:/db/2009.01.02/q/ set ([] ti:09:30:00 09:30:00; sym:`:/db/sym?`ibm`msft;
      b:101.25 33.25; a:101.75 33.75)
`:/db/2009.01.02/q/

Loading /db we find both t and q are mapped.

    \l /db
    select from t where date in 2009.01.01 2009.01.02
date       ti       sym  p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

    select from q where date in 2009.01.01 2009.01.02
date       ti       sym  b      a
--------------------------------------
2009.01.01 09:30:00 ibm  100.75 101.25
2009.01.01 09:31:00 msft 32.75  33.25
2009.01.02 09:30:00 ibm  101.25 101.75
2009.01.02 09:30:00 msft 33.25  33.75

1.3.6.2 Missing Table in Partition

Now we create an additional daily slice for q but (erroneously) omit the t slice.

    `:/db/2009.01.03/q/ set ([] ti:09:30:00 09:31:00;
      sym:`:/db/sym?`ibm`msft; b:101. 33.; a:101.5 33.5f)
`:/db/2009.01.03/q/

We find that the load fails due to the missing slice of t.

    \l /db
t
k){+(!+. x)!`/:dd[y;z],x}
'./2009.01.03/t: No such file or directory
.:
`:./2009.01.03/t

To rectify this, we splay an empty slice of t to the offending date partition and the mapping is successful.

    t3:([] ti:`second$(); sym:`sym$(); p:`float$())
    `:/db/2009.01.03/t/ set t3
`:/db/2009.01.03/t/

    \l /db
    select from t where date within 2009.01.01 2009.01.03
date       ti       sym  p
------------------------------
2009.01.01 09:30:00 ibm  101
2009.01.01 09:31:00 msft 33
2009.01.02 09:30:00 ibm  101.5
2009.01.02 09:31:00 msft 33.5

    select from q where date within 2009.01.01 2009.01.03
date       ti       sym  b      a
--------------------------------------
2009.01.01 09:30:00 ibm  100.75 101.25
2009.01.01 09:31:00 msft 32.75  33.25
2009.01.02 09:30:00 ibm  101.25 101.75
2009.01.02 09:30:00 msft 33.25  33.75
2009.01.03 09:30:00 ibm  101    101.5
2009.01.03 09:31:00 msft 33     33.5

The utility .Q.chk discussed in §1.5.6 automates finding and filling missing partition slices.
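For example, pointed at the root, .Q.chk writes an empty splayed slice for each table missing from a partition:

```q
.Q.chk `:/db      / fills any missing table slices under /db
```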

1.3.7 Examples of Partition Domains

Although we have used daily partitions in our examples thus far, other types having an underlying integral value are permissible for the partition domain. Besides date, the others are month, year and int.

Note: You cannot partition based on a symbol column, even with enumerated symbols. However, you can segment based on a symbol. See §1.4 below.

1.3.7.1 Yearly

For our first example, in a fresh /db we save daily price data into yearly partitions.

    `:/db/2008/t/ set ([] dt:2008.01.01 2008.01.02; p:101 102f)
`:/db/2008/t/
    `:/db/2009/t/ set ([] dt:2009.01.01 2009.01.02; p:101.5 102.5)
`:/db/2009/t/
    \l /db
    select from t where year in 2008 2009
year dt         p
---------------------
2008 2008.01.01 101
2008 2008.01.02 102
2009 2009.01.01 101.5
2009 2009.01.02 102.5

1.3.7.2 Monthly

We create monthly partitions in a fresh /db.

    `:/db/2009.01/t/ set ([] dt:2009.01.01 2009.01.02; p:101 102f)
`:/db/2009.01/t/
    `:/db/2009.02/t/ set ([] dt:2009.02.01 2009.02.02; p:101.5 102.5)
`:/db/2009.02/t/
    \l /db
    select from t where month in 2009.01 2009.02
month   dt         p
------------------------
2009.01 2009.01.01 101
2009.01 2009.01.02 102
2009.02 2009.02.01 101.5
2009.02 2009.02.02 102.5

1.3.7.3 int

We can also create integral slot partitions in a fresh /db.

    `:/db/1/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)
`:/db/1/t/
    `:/db/2/t/ set ([] ti:09:30:00 09:31:00; p:101.5 102.5)
`:/db/2/t/
    \l /db
    select from t where int in 1 2
int ti       p
------------------
1   09:30:00 101
1   09:31:00 102
2   09:30:00 101.5
2   09:31:00 102.5

1.3.8 Links and Partitioned Tables

Partitioned tables can have link columns as long as the links do not span partitions. For example, you can create intra-day links for a daily-partitioned table, but the links cannot cross days.

Because links are intra-partition, the process of creating a link for a partitioned table reduces to that of §1.2.8 on each partition. If the target table of the link already exists, you can create the link column when you splay the rest of the columns in the slice; otherwise, you can create it on disk later.

1.3.8.1 Linking to a Column in Memory

In our first example, we create a link between non-symbol columns in partitioned tables t1 and t2. We create the first day’s tables, treating the link column just as we would a foreign key (but with different notation), then we save them to a virtual partition. Note that we assume the target table is in memory when we create the link in the corresponding t2 slice.

    `:/db/2009.01.01/t1/ set t1sl:([] id:101 102 103; v:1.1 2.2 3.3)
`:/db/2009.01.01/t1/
    `:/db/2009.01.01/t2/ set ([] id:`t1!t1sl.id?103 101 101 102; n:10 20 30 40)
`:/db/2009.01.01/t2/

We do the same for the second day.

    `:/db/2009.01.02/t1/ set t1sl:([] id:104 105; v:4.4 5.5)
`:/db/2009.01.02/t1/
    `:/db/2009.01.02/t2/ set ([] id:`t1!t1sl.id?105 104 104; n:50 60 70)
`:/db/2009.01.02/t2/

Finally, map the tables and execute a query across the link.

    \l /db
    select date,n,id.v from t2 where date in 2009.01.01 2009.01.02
date       n  v
-----------------
2009.01.01 10 3.3
2009.01.01 20 1.1
2009.01.01 30 1.1
2009.01.01 40 2.2
2009.01.02 50 5.5
2009.01.02 60 4.4
2009.01.02 70 4.4

1.3.8.2 Linking to a Column on Disk

This example creates a link from an enumerated symbol column in t2 to the target table t1 on disk. We first create all slices of t1. Then we fetch each appropriate slice of the link’s target column into memory while creating the corresponding slice of t2.

    `:/db/2009.01.01/t1/ set ([] id:`:/db/sym?`c`b`a; c1: 10 20 30)
`:/db/2009.01.01/t1/
    `:/db/2009.01.02/t1/ set ([] id:`:/db/sym?`x`a; c1: 40 50)
`:/db/2009.01.02/t1/
    t1id:get `:/db/2009.01.01/t1/id
    `:/db/2009.01.01/t2/ set ([]fid:`t1!t1id?`a`b`a`c; c2: 1 2 3 4.)
`:/db/2009.01.01/t2/
    t1id:get `:/db/2009.01.02/t1/id
    `:/db/2009.01.02/t2/ set ([] fid:`t1!t1id?`x`a`x; c2:5 6 7.)
`:/db/2009.01.02/t2/
    \l /db
    select date,fid.id,fid.c1,c2 from t2 where date<=2009.01.02
date       id c1 c2
-------------------
2009.01.01 a  30 1
2009.01.01 b  20 2
2009.01.01 a  30 3
2009.01.01 c  10 4
2009.01.02 x  40 5
2009.01.02 a  50 6
2009.01.02 x  40 7

We leave it as an exercise to the reader to generalize the previous example into a function that works for an arbitrary number of partitions.


Prev: 1.2 Splayed Tables Next: 1.4 Segments
