QforMortals/tables

From Kx Wiki
Jump to: navigation, search

Contents

Tables

Overview

Tables form the basis for kdb+. A table is a collection of named columns implemented as a dictionary. Consequently, q tables are column-oriented, in contrast to row-oriented tables in relational databases. Moreover, column contents in q are ordered lists as opposed to SQL, in which the order of rows is undefined. The fact that q tables comprise ordered column lists makes kdb+ very efficient at storing, retrieving and manipulating time-sequenced data.

Kdb+ handles relational and time series data in the unified environment of q tables. There is no separate data definition language, no separate stored procedure language and no need to map internal representations to a separate form for persistence. Just q tables and q expressions.

Tables are built from dictionaries, so it behooves the cursory reader to review dictionaries before proceeding.

Tables

Table is the flip of Column Dictionary

You undoubtedly realized that a table is implemented as a column dictionary that has been flipped (i.e., transposed). The only affect of flipping the column dictionary is to reverse the order of its indices and no data is rearranged under the covers.

For example,

	d:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
	d[`iq;]
42 98 126
	d[;2]
`name`iq!(`prefect;126)
	d[`iq; 2]
126
	t: flip `name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
	t[;`iq]
42 98 126
	t[2;]
`name`iq!(prefect;126)
	t[2;`iq]
126

For accessing items in a table t created by flipping a column dictionary d, simply reverse the order of the arguments in the projections of d. We also reverse the roles of i and j compared to our examples from dictionaries to make things more natural from the table perspective.

	t[i;]        / dictionary mapping column names to values in row i
	t[i]         / same as previous
	t[;cj]       / vector of column values for column cj

This validates the implementation of a table as a flipped column dictionary since retrieving rows and columns conforms to conventional matrix notation.

Table Definition Syntax

Table definition can also be accomplished using syntax that manifests the columns,

  ([] c1:L1;...;cn:Ln)

Here ci is a symbol containing a column name and Li is the corresponding list of column values. The Li are lists of equal count. The purpose of the square brackets is to specify a primary key.

In our example, we can define t as,

	t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
	t[;`iq]
42 98 126
	t[2;]
`name`iq!(prefect;126)
	t[2;`iq]
126

Defining t syntactically yields the same result as creating the column dictionary and flipping it. It is arguably simpler and clearer.

Table Display

It is often desirable to examine a table in a tabular form. The function show provides such a display.

	show t
name       iq
--------------
Dent       42
Beeblebrox 98
Prefect    126

Table Information

The column names of a table can be retrieved by using the unary cols.

	cols t
`name`iq

Recall that it is possible to retrieve the column values in a column dictionary using dot notation. This is also true after it is flipped to a table. For a table t and a column c, the expression t.c retrieves the value list for column c. In our example,

	t.name
`Dent`Beeblebrox`Prefect
	t.iq
42 98 126

The dot effectively disassociates a column's values from its name.

Information.png The function meta can be applied to a table t to retrieve its meta data. The result is a keyed table with one record for each column in t. The key column c of the result contains the column names. The column t contains a symbol denoting the type of the column. The column f contains the domains of any foreign keys. The column a contains any attribute associated with the column.
	show meta t
c   | t f a
----| -----
name| s
iq  | i

Records

We observe that count returns the number of rows in the table since each row is an item in the list. In our example,

	count t
3

Now let's inspect the sequence of dictionaries that constitute the rows.

	t[0]
	`name`iq!(`Dent;42)
	show t[0]
name| Dent
iq  | 42
	t[1]
`name`iq!(`Beebelbrox;98)
	show t[1]
name| Beeblebrox
iq  | 98

The dictionary in each row maps the common domain list of column names to the column values of the row. This motivates calling a row dictionary a record in the table.

Information.png A table is a sequentially ordered collection of records. Each record is an association of column names with one row's values.

Sometimes it is useful to separate a record's values from its column names. In this context, we shall refer to the list of row values. The list of row values in the ith row of a table is obtained by retrieving the ith item of each of the column vectors. This is simply the range of the record dictionary,

	value t[1]
(`Beeblebrox;98)

Flipped Column Dictionary vs. List of Records

Is a table a flipped column dictionary or a list of records? Logically it is both, but physically it is stored as a column dictionary with a flipped indicator.

To verify this, we create a list of records, each of which is a dictionary that maps (common) column names to a row's values.

	lrows:(`name`iq!(`Dent;42); `name`iq!(`Beeblebrox;98))

While this list is apparently different from the equivalent column dictionary, observe the curious result when you display the list of rows,

	lrows
+`name`iq!(`Dent`Beeblebrox;42 98)

The q interpreter has recognized that this list conforms to the requirements for a list of records of a table—i.e., the domain lists of all the dictionaries are the same, the range lists have common count, and the types of the range lists are consistent by position. It has converted the list of dictionaries to a flipped column dictionary and the values that we specified record-by-record have been reorganized into column vectors.

Information.png In general, column retrieval and manipulation on a simple column dictionary will be significantly faster than operations on rows. The values in a simple column are stored contiguously, whereas the values in a row must be retrieved by indexing into each column.

Be mindful that deletion of a row is an expensive operation because all the column lists must be compressed to close the resulting gap. This can result in large amounts of data being moved in a table with many rows.

Empty Tables

We saw in the previous section that a table can be defined and populated in one step using table syntax,

	t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)

This is infrequently done in practice, other than for small tests or when the columns are extracted from another table. Often values are deferred to run-time or the value lists may be prohibitively long.

It is useful to create an empty table initially and then populate it later. The empty parentheses here signify the empty list.

	t:([] name:(); iq:())

The table can then be populated, for example, by reading the values from a file.

When an empty table is created, the columns are lists of general type, so data of any type can initially be loaded. The type of each column will be determined by the type of the first item placed in it. Thereafter, type checking is enforced for all inserts and updates, with no type promotion performed.

It is possible to fix the type of any column in an empty table definition by specifying a null list of the appropriate type.

	t:([] name:`symbol$(); iq:`int$())

Equivalently, and arguably simpler,

	t:([] name:0#`; iq:0#0N)

Basic select

Syntax

We shall cover select expressions in depth later, but we provide an introduction here in order to extract and display data in our examples. The basic select expression takes the form,

	select cols from table

where table is either a table or a keyed table and cols is a comma separated list of columns from table. This expression results in a list of all records for the specified columns in table.

The simplest form of select is,

	select from  table

which corresponds to the SQL statement,

#!sql
	SELECT * FROM  table

In q there is no need to write the wildcard character when you want all columns in the table.

Information.png The basic select expression may look familiar from SQL, but it should seem odd to the q newbie who has finally become accustomed to parsing expressions right-to-left. Neither select nor from represent functions that can stand alone. Instead, they are part of a template and always appear together.

Q has a host of extensions to the basic select template whose elements appear between the select and from or after the table element. It is even possible to convert any select template to a purely functional form, although this form isn't particularly friendly to the q newbie.

Displaying the Result

Since the result of select is a list of records, it too is a table.

	select from t
+`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

Thus, we can use show to provide a tabular display of the result.

	show select from t
name       iq
--------------
Dent       42
Beeblebrox 98
Prefect    126

We shall use this method of display in what follows unless we need to see the structure of the underlying column dictionary.

Selecting Columns

To select specific columns, list them in the desired order, comma-separated between select and from.

	show select fname from t
fname
------
Arthur
Zaphod
	show select iq,fname from t
iq fname
---------
42 Arthur
98 Zaphod

Primary Keys and Keyed Tables

Keyed Table

In SQL, it is possible to declare column(s) of a table as a primary key. Amongst other things, this means that the values in the column(s) are unique and it possible to retrieve a row via its key value. These two features motivate how q implements a primary key.

We begin with a simple key—i.e., the key is a single column. The idea is to place the key column in a separate table parallel to a table containing the remaining columns. How to associate each key with its corresponding value record? Simple: set up a dictionary mapping between the key records and the associated value records.

A keyed table is a dictionary that maps each row in a table of unique keys to a corresponding row in a table of values.

Simple Example

Let's see how this works for our previous example. Viewing the data table as a flipped dictionary of rows will make things explicit.

	values:flip `name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

Now say we want to add a key column named 'eid' containing employee identifiers. We place the identifiers in a separate table

	k:flip (enlist `eid)!enlist 1001 1002 1003

Observe that we must enlist both the column name and the range list; otherwise, the domain would be an atom (rather than a list) and the range would be a simple list (and not a list of column vectors).

Now we establish the mapping between the two tables.

	kt:k!values

Voila, a keyed table! The console display of the keyed table shows a dictionary mapping from the flipped dictionary of keys to the flipped dictionary of values,

	kt
(+(,`eid)!,1001 1002 1003)!+`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

A more intuitive display is provided by show, which lists the key column(s) on the left, separated by a vertical bar from the value columns on the right.

show kt
eid | name       iq
----| --------------
1001| Dent       42
1002| Beeblebrox 98
1003| Prefect    126
Information.png The key mapping assumes that the key rows and value records are in corresponding order since the dictionary associates a key with the data row in the same position.
Information.png The keys should be unique. As we have already noted, dictionary creation does not enforce uniqueness, but a value row associated with a repeat key is not be accessible via key lookup. It can, however, be retrieved via a select on the key column.

Keyed Table Specification

The console display of a keyed table demonstrates how to define it in one step as a dictionary of flipped dictionaries,

	kt:( flip (enlist `eid)!enlist 1001 1002 1003)!
flip `name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

Unless you are constructing the keyed table from its constituents, it is simpler to use table syntax. The key column goes between the square brackets and the value columns to the right.

	kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)

To define an empty keyed table, use empty key and value columns.

	ktempty:([eid:()] name:(); iq:())

The empty columns can be typed with either of the following,

	ktempty:([eid:`int$()] `symbol$name:(); iq:`int$())
ktempty:([eid:0#0] name:0#`; iq:0#0)

Accessing Records of a Keyed Table

Since a keyed table is a dictionary mapping, it allows access to records in the value table via key lookup. Remember that the records in the key table and value table are both dictionary mappings for their rows.

	kt[`eid!1002]
`name`iq!(`Beeblebrox;98)

You can abbreviate the full dictionary specification of a key record to the key value. Our example reduces to,

	kt[1002]
`name`iq!(`Beeblebrox;98)

An individual column in the value record can be accessed via repeated indexing or indexing at depth,

	kt[1002][`iq]
98
	kt[1002;`iq]
98
Warning.png The net effect of placing a key on a table is to convert item indexing of the rows to generalized indexing via key value. Otherwise put, the first index is converted from positional retrieval to key lookup.

Retrieving Multiple Records

Given that it is possible to lookup a single record in a keyed table by the key value,

	kt[1001]

you might think it is possible to retrieve multiple records from a keyed table via a simple list of keys, but you would be wrong.

kt[1001 1002]
`length

To lookup multiple key values in a keyed table, you must use a list of enlisted keys.

	show kt[(enlist 1001; enlist 1002)]
name       iq
-------------
Dent       42
Beeblebrox 98

Another way to lookup multiple keys is to index using a table having a single column with the name of the primary key and value list of the desired keys. In our example,

	show kt[([] eid:1001 1002)]
name       iq
-------------
Dent       42
Beeblebrox 98

Components of a Keyed Table

Since a keyed table is a dictionary mapping between the table of keys and the table of values, the functions key and value provide a convenient way to retrieve the two constituent tables.

	show key kt
eid
----
1001
1002
1003
	show value kt
name       iq
--------------
Dent       42
Beeblebrox 98
Prefect    126

A list containing the names of the key column(s) can be retrieved with the function keys.

	keys kt
,`eid

Observe that cols retrieves both the key and value column names for a keyed table,

	cols kt
`eid`name`iq

Tables vs. Keyed Tables

It is sometimes convenient to convert between a regular table having a column of (presumably) unique values and the corresponding keyed table.

The dyadic primitive xkey converts a table with a column of unique values to a keyed table. The right argument of xkey is the table and the left operand is a symbol with the name of the column to be used as the key.

	t:([] eid:1001 1002 1003; name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
	show t
eid  name       iq
-------------------
1001 Dent       42
1002 Beeblebrox 98
1003 Prefect    126
	show `eid xkey t
eid | name       iq
----| --------------
1001| Dent       42
1002| Beeblebrox 98
1003| Prefect    126

Conversely, to convert a keyed table to a regular table, the primitive 0! is used.

	kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
show kt
eid | name       iq
----| --------------
1001| Dent       42
1002| Beeblebrox 98
1003| Prefect    126
	show 0!kt
eid  name       iq
-------------------
1001 Dent       42
1002 Beeblebrox 98
1003 Prefect    126
Information.png The conversion expressions above do not affect the original table. You must refer to the table by name to modify the original.
	`eid xkey `t
`t
 	show t
eid | name       iq
----| --------------
1001| Dent       42
1002| Beeblebrox 98
1003| Prefect    126

	0!`kt
`kt
 	show kt
eid  name       iq
-------------------
1001 Dent       42
1002 Beeblebrox 98
1003 Prefect    126
Warning.png It is possible to apply xkey against a column that does not contain unique values. The result is a keyed table that does not have a primary key. Any duplicate key values will not be accessible via key lookup
	t:([] eid:1001 1002 1003 1001; name:`Dent`Beeblebrox`Prefect`Dup )
	show t
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
1001 Dup
	show ktdup:`eid xkey t
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
1001| Dup
	show ktdup 1001
name| Dent

Compound Primary Key

We understand that the implementation of a table with a simple key is actually a dictionary mapping between a pair of tables in which the first table has a single key column. This has a straightforward extension to a compound key.

Recall that a compound key in SQL is a collection of two or more columns that together provide a unique value for each row. To implement a compound key in q, we generalize the key table from a single column to multiple columns by requiring that for each record in the key table, the combination of column values is unique.

Here is our example redone to replace the employee id with a compound key comprising the last and first names,

	ktc:([lname:`Dent`Beeblebrox`Prefect;
	fname:`Arthur`Zaphod`Ford];
	iq:42 98 126)

Observe that show displays a compound keyed table with the key columns on the left separated by a vertical bar | from the value columns to the right.

	show ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 42
Beeblebrox Zaphod| 98
Prefect    Ford  | 126

As in the case of a simple primary key, we can abbreviate the full key record to the key value for retrieval,

	ktc[`Dent`Arthur]
(,`iq)!,42

Retrieval of multiple records via a compound primary key is actually easier than with a simple key, since each compound key value is already a list.

	show ktc (`Dent`Arthur; `Prefect`Ford)
iq
---
42
126

Here is how to initialize our example as an empty table,

	ktc:([lname:();fname:()] iq:())

The empty keyed table can also be type with either of,

	ktc:([lname:`symbol$();fname:`symbol$()] iq:`int$())
	ktc:([lname:0#`;fname:0#`] iq:0#0)
Information.png For the fundamentalists, here is the same compound key table built from its constituent pair of tables
	ktc:(flip `lname`fname!(`Dent`Beeblebrox;`Arthur`Zaphod))!
	flip (enlist `iq)!enlist 42 98 126

And here is retrieval by full key record,

	ktc[`lname`fname!`Beeblebrox`Zaphod]
(,`iq)!,98

Most will agree that the table definition syntax and abbreviated key value retrieval is simpler.

Foreign Keys and Virtual Columns

A foreign key in SQL is a column in one table whose values are members of a primary key column in another table. Foreign keys provide the mechanism for establishing relations between tables.

One of the important features of a foreign key is that the RDBMS enforces referential integrity, meaning that the value in a foreign key column is required to be in the referred primary key column. If you want to insert a foreign key value that is not in the primary key column, it must first be inserted into the primary key column.

Definition of Foreign Key

Q has the notion of a foreign key that also provides referential integrity. Extra credit to the reader who realizes that a foreign key is implemented using an enumeration.

In our introduction to enumerations, we saw that an enumeration domain can be any list with unique items. A keyed table meets the criterion of a unique domain, since the key records in the dictionary domain are unique.

A foreign key is a table column defined as an enumerated value over a keyed table. As an enumeration, a foreign key does indeed provide referential integrity by restricting values in the foreign key column to be in the list of primary key values.

Example of Simple Foreign Key

An enumeration over a keyed table domain acts just like our simple enumerations. Let's return to our previous example,

	kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)

To enumerate over the primary key of kt, use a symbol containing the keyed table name as the domain in the enumeration,

	`kt$

The primary key table records provide the unique set of values for enumerating records,

	`kt$`eid!1001
	`kt$1001

As usual, q saves us the trouble of being so explicit and allows the enumeration to be applied to items in the value list for the primary key dictionary—that is, the primary key values.

	e1:`kt$1002 1001 1001 1003 1002 1003
	e1 = 1003
000101b

As with any enumeration, attempting to enumerate a key value that is not in the domain causes an error,

	`kt$1004
`cast

We can use table definition syntax to define a table with a foreign key over kt.

	tdetails:([] eid:`kt$1003 1001 1002 1001 1002 1001; sc:126 36 92 39 98 42)

The foreign key column is simply defined as an enumeration over the keyed table.

We see the foreign key table when we invoke meta on the table.

	show meta tdetails
c  | t f  a
---| ------
eid| i kt
sc | i

Foreign Keys and Relations

In SQL, an inner join is used to splice back together data that has been normalized. The splice is usually done along a foreign key, which establishes a relation to the keyed table via the primary key. In the join, columns from both tables are available using dot notation.

The same effect can be achieved using foreign keys in q without explicitly creating the joined table. The notation is similar but different enough to warrant close attention.

Let tf be a table having a foreign key f whose enumeration domain is the keyed table kt. All columns in kt are available via dot notation in any select expression whose from domain is tf. A column c in kt that is accessed in this way is called a virtual column and is notated with dot notation f.c in the select expression.

For example, given t as above, we create a details table that contains individual test results for each person. We name the foreign key in the details table the same as the primary key it refers to, but this is not required,

	tdetails:([] eid:`kt$1003 1001 1002 1001 1002 1001; sc:126 36 92 39 98 42)

Now we can access columns in t via a select on tdetails,

	show select eid.name,sc from tdetails
name       sc
--------------
Prefect    126
Dent       36
Beeblebrox 92
Dent       39
Beeblebrox 98
Dent       42

The case in which the enumeration domain of a foreign key has a compound primary key is slightly more complicated and will be covered later.

Working with Tables and Keyed Tables

In this section, we use the following examples.

	t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
	kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)

First and Last Records

Because a table is a list of records, the functions first and last retrieve the first and last records, respectively.

	first t
`name`iq!(`Dent;42)
	last t
`name`iq!(`Prefect;126)
	first kt
`name`iq!(`Dent;42)
	last kt
`name`iq!(`Prefect;126)

These functions are useful in select expressions, especially with grouping and aggregation.

Information.png Every table in kdb+ has a first and last record since it is an ordered list of records. Moreover, the result of a select template is a table and so is ordered. Contrast this with SQL in which tables and result sets are unordered. You must use ORDER BY to impose an order.

You can retrieve the first or last n records of a table or keyed table using the take (#) operator.

	show 2#t
name       iq
-------------
Dent       42
Beeblebrox 98
	show -3#kt
eid | name       iq
----| --------------
1001| Dent       42
1002| Beeblebrox 98
1003| Prefect    126

Find

The find (?) primitive used with a table performs a reverse lookup of a record and returns the corresponding row number. With t as above,

	t?`name`iq!(`Dent;42)
0

As usual, the record can be abbreviated to a list of row values,

	t?(`Dent;42)
0

You can reverse lookup a list of multiple row values

	t?((`Dent;42);(`Prefect;126))
0 2

For a keyed table, find performs a reverse lookup of a value record and returns the key record.

	kt?`name`iq!(`Dent;42)
(,`eid)!,1001
	kt?(`Dent;42)
(,`eid)!,1001

In the case of find on a table with a single column, each list of row values must be a singleton list,

	t1:([] eid:1001 1002 1003)
	t1?(enlist 1001; enlist 1002)
0 1

The list of singletons can be created by either of the following expressions, although the first executes faster for long lists.

	flip enlist 1001 1002
(,1001;,1002)
	enlist each 1001 1002
(,1001;,1002)

Primitive Joins

The join (,) operator is defined for tables and keyed tables.

Only tables having exactly the same list of column names can be joined. Since a table is a list of records, the result is obtained by appending the rows of the right operand to those of the left operand.

	t1:([] a:1 2 3; b:100 200 300)
	t2:([] a:3 4 5; b:300 400 500)
	show t1,t2
a b
-----
1 100
2 200
3 300
3 300
4 400
5 500

Note that common rows are duplicated in the result.

Two tables with the same columns in different order cannot be joined with , because the order of columns in records is significant in q,

	t3:([]b:1001 2001 3001; a:101 201 301)
	show t1,t3
'mismatch

Two keyed tables with the same key and value columns can be joined. Because a keyed table is a dictionary, the result has upsert semantics just as join on dictionaries. Keys in the right operand that are not in the left operand are treated as inserts, whereas the right operand acts as an update for common keys.

	t4:([a:1 2 3] b:10 20 30)
	t5:([a:3 4 5] b:300 400 500)
	show t4,t5
a| b
-| ---
1| 10
2| 20
3| 300
4| 400
5| 500

Column Joins

Two tables with the same number of rows can be combined with join-each (,') to form a sideways, or column, join in which the columns are aligned in parallel.

	t1:([] a:1 2 3)
	t2:([] b:100 200 300)
	show t1,'t2
a b
-----
1 100
2 200
3 300

When the column lists of the tables are not disjoint, the operation on the common columns has upsert semantics because each column is a dictionary,

	t3:([] a:10 20 30; b:100 200 300)
	show t1,'t3
a  b
------
10 100
20 200
30 300

Because keyed tables are dictionaries, they can only be sideways joined if they have identical key columns. In such a situation, we can deduce the behavior by recalling from Operations on Dictionaries that any operation on a dictionary is applied on the common elements of the merged domains and is extended to the non-common domain elements with appropriate nulls.

Thus, a sideways join on keyed tables with identical key columns has simple upsert semantics for common data columns. More interesting are the non-common data columns, where the result becomes a column join spliced along common key values.

	t4:([a:1 2 3] x:100 200 300)
	show t4
a| x
-| ---
1| 100
2| 200
3| 300
	t5:([a:3 4 5] y:1000 2000 3000)
	show t5
a| y
-| ----
3| 1000
4| 2000
5| 3000
	show t4,'t5
a| x   y
-| --------
1| 100
2| 200
3| 300 1000
4|     2000
5|     3000

Complex Column Data

Simple Example

Recall from the definition of a column dictionary that there is no restriction that the column vectors must be lists of simple types. We have heretofore worked with examples having atomic values in each column because they correspond to familiar SQL tables, but there is no need to limit ourselves to simple columns.

Suppose we want to keep track of a pair of daily observations, say a low temperature and a high temperature. We can do this by storing the low and high values in separate columns,

	t1:([] d:2006.01.01 2006.01.02; l:67.9 72.8; h:82.1 88.4)
	show t1
d          l    h
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
	t1[0]
`d`l`h!(2006.01.01;67.9;82.1)
	t1.l
67.9 72.8
	t1.h
82.1 88.4

We can also store pairs in a single column,

	t2:([] d:2006.01.01 2006.01.02; lh:(67.9 82.10; 72.8 88.4))
	show t2
d          lh
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
	t2[0]
`d`lh!(2006.01.01;67.9 82.1)
	t2.lh
(67.9 82.1;72.8 88.4)
	t2.lh[;0]
67.9 72.8
	t2.lh[;1]
82.1 88.4

The first form is arguably more natural if you intend to manipulate the low and high values separately.

Compound Foreign Key

Storing multiple values in a column is how to make a foreign key on a compound primary key. Let's return to the example using last name and first name as the primary key,

	ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford]; iq:42 98 126)

We create a details table with a foreign key enumeration over ktc by placing the names in the foreign key column,

	tdetails:([] name:`ktc$(`Dent`Arthur;`Prefect`Ford;`Dent`Arthur); sc:36 126 42)

The columns of ktc are available as virtual columns from tdetails,

	show select name.lname,name.iq,sc from tdetails
lname   iq  sc
---------------
Dent    42  36
Prefect 126 126
Dent    42  42

The previous example of storing two values in a column can easily be generalized to the situation of n-tuples. In this case, storing multiple values in a single column has a definite advantage since defining and populating n columns is unwieldy when n is not known in advance. Storing and retrieving n-tuples to/from a single column is a simple operation in q. A useful example in finance is storing daily values for a yield curve.

Operations on Complex Column Data

We generalize the above example to the case of storing a set of repeated observations in which the number of observations is not fixed—i.e., varies with each occurrence. Say we want to perform a statistical analysis on the weekly gross revenues for movies and we don't care about the specific titles. Since there will be a different number of movies in release each week, the number of observations will not be constant. An oversimplified version of this might look something like,

	t3:([] wk:2006.01.01 2006.01.08; gr:( 38.92 67.34; 16.99 5.14 128.23 31.69))
	show t3
w          gr
----------------------------------
2006.01.01 38.92 67.34
2006.01.08 16.99 5.14 128.23 31.69

Handling the situation in which the number of column values is not known in advance, or is variable, is cumbersome in SQL. You normalize the data into a master-detail pair of tables, but you cannot re-assemble the details into separate columns via a join. Instead, for each master record you get a collection of records that must be iterated over via some sort of cursor/loop. In verbose programming, this results in many lines of code that are slow and prone to error on edge cases.

By storing complex values in a single column in a q table, sophisticated operations can be performed in a single expression that executes fast. In the following q-sql examples, don't worry about the details of the syntax, and remember to read individual expressions from right to left. Observe that because there are no stinking loops, we never need to know the number of detail records.

Using our movie data, we can produce the sorted gross, the average and high gross for each week in one expression,

	show select wk, srt:desc each gr, avgr:avg each gr, hi:max each gr from t3
wk         srt                     avgr    hi
-------------------------------------------------
2006.01.01 67.34 38.92             53.13   67.34
2006.01.08 128.23 31.69 16.99 5.14 45.5125 128.23

While sorts and aggregates such as Max and Avg are standard SQL, think of how you'd produce the sorted sublist and the aggregates together. In your favorite verbose programming environment, you'll soon discover that you have a sordid list of rows requiring a loop to unravel into a single output line

Now think about what you'd do to compute the percentage drops between successive gross numbers within each week. Because the sorted detail items are rows in SQL, this requires another loop. In q,

	show select wk,drp:neg 1_'deltas each desc each gr,avgr:avg each gr,hi:max each gr from t3
wk         drp              avgr    hi
------------------------------------------
2006.01.01 28.42            53.13   67.34
2006.01.08 96.54 14.7 11.85 45.5125 128.23

Attributes

Attributes are metadata that apply to lists of special form. They are often used on a dictionary domain or a table column to reduce storage requirements or speed retrieval.

Important: Attributes are descriptive rather than prescriptive. A modification that respects the form specified by the attribute leaves the attribute intact. A modification that does not respect the organization is not prohibited and the attribute is lost.

The syntax for applying an attribute looks like the verb # with a left operand containing the symbol for the attribute and the list as the right operand. However, this use of # is not functional.

Advanced: You will probably not see significant benefit from any attribute for less than a million items. For parted, this number is a billion

Sorted (`s#)

Applying the sorted attribute to a list indicates that the items of the list are in sorted order. When a list has the sorted attribute, the default linear search used in lookups is replaced with binary search. The following fragments show situations in which this applies,

x?v
... where x = v, ...
... where x in v, ...
... where x within v, ...

The sorted attribute can be applied to a simple list,

        L:`s#1 2 2 4 8
        L
`s#1 2 2 4 8

        L,:16		/ respects sort
        L
`s#1 2 2 4 8 16
	
        L,:0			/  attribute lost
        L
1 2 2 4 8 16 0

or a column of a table,

        t:([]`s#t:04:02:42.001 04:02:42.003;v:101.05 100.95)

The sorted attribute can be applied to a dictionary, which makes the dictionary into a step function,

        ds:`s#1 2 3 4 5!`a`b`c`d`e
        ds
`s#`s#1 2 3 4 5!`a`b`c`d`e

Applying the sorted attribute to a table implies binary search on the table and also that the first column is sorted,

        ts:`s#([]t:04:02:42.001 04:02:42.003;v:101.05 100.95)
        ts
`s#+`t`v!(`s#04:02:42.001 04:02:42.003;101.05 100.95)

Applying the sorted attribute to a keyed table means that the dictionary, its key table and its value table are all sorted,

        kt:`s#([k:1 2 3 4] v:`d`c`b`a)
        kt
`s#(`s#+(,`k)!,`s#1 2 3 4)!+(,`v)!,`d`c`b`a

Unique (`u#)

Applying the unique attribute to a list indicates that the items of the list are distinct. Operations on the list must preserve uniqueness or the attribute is list.

        LU:`u#4 2 6 18 1
        LU
`u#4 2 6 18 1

        LU,:0		/ uniqueness preserved
        LU
`u#4 2 6 18 1 0

        LU,:2		/  attribute lost
        LU
4 2 6 18 1 0 2

The unique attribute can be applied to the domain of a dictionary dictionaries, a column of a table, of the key column of a keyed table. It cannot be applied to a dictionary, a table or a keyed table directly

Grouped (`g#)

Consider grouping when there is significant repetition. This will replace linear search with hashtable lookup.

	L:`g#1 2 3 2 3 4 3 4 5  2 3 4 5 4 3 5 6

Applying the grouped attribute to a table column corresponds to a SQL index.

	t:([]`g#c1:1 2 3 2 3 4 3 4; c2:`a`b`a`c`a`d`b`c)

Parted (`p#)

Parted applies when the values are distributed over a sequence of values in a domain. Domains can be int or time-related, such as years, months, days, etc. Parted results in a physical segmentation across the domain values. It should be considered when the number of entities reaches a billion.


Prev: Dictionaries, Next: Queries: q-sql

©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox