QforMortals2/tables

From Kx Wiki
Jump to: navigation, search

Contents

Tables

Overview

Tables form the basis for kdb+. A table is a collection of named columns implemented as a dictionary. Consequently, q tables are column-oriented, in contrast to row-oriented tables in relational databases. Moreover, a column's values in q comprise an ordered list; this contrasts to SQL, in which the order of rows is undefined. The fact that q tables comprise ordered column lists makes kdb+ very efficient at storing, retrieving and manipulating sequenced data. One important example is data that arrives in time sequence.

Kdb+ handles relational and time series data in the unified environment of q tables. There is no separate data definition language, no separate stored procedure language and no need to map internal representations to a separate form for persistence. Just q tables, expressions and functions.

Tables are built from dictionaries, so it behooves the cursory reader to review Dictionaries before proceeding.

Table Definition

Table is the flip of Column Dictionary

You undoubtedly realized at the end of Dictionaries that a table is implemented as a column dictionary that has been flipped (i.e., transposed). The only effect of flipping the column dictionary is to reverse the order of its indices; no data is rearranged under the covers.

Warning.png Note: All tables have type 98h.

For example,

        d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        d[`iq;]
98 42 126

        d[;2]
name| `Prefect
iq      | 126

        d[`iq; 2]
126

        t: flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        t[;`iq]
98 42 126

        t[2;]
name| `Prefect
iq     | 126

        t[2;`iq]
126

To access items in a table t created by flipping a column dictionary d, simply reverse the order of the arguments in the projections of d. We also reverse the roles of i and j compared to dictionaries to make things more natural from the table perspective.

t[i;] / row i is dictionary mapping column names to values
t[i] / ith element of list t...same as previous
t[;cj] / vector of column values for column cj

This validates the implementation of a table as a flipped column dictionary. Retrieving rows and columns conforms to conventional matrix notation in which the first index denotes the row and the second index the column.

Table Display

Observe that rows and columns of a table display are indeed the transpose of the dictionary display, even though the internal data layout is the same.

	d
name| Dent Beeblebrox Prefect
iq  | 98   42         126

        t
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

Table Definition Syntax

Table definition can also be accomplished using a syntax that manifests the columns,

([] c1:L1;...;cn:Ln)

Here c1 is a symbol containing a column name and L1 is the corresponding list of column values. The L1 are lists of equal count, but in some circumstances can be atoms. The purpose of the square brackets is to specify a primary key and will be explained in Basic Select.

Warning.png Note: For readability, we shall normally include optional whitespace after the closing square bracket and to the right of semicolon separators.

In our example, we can define t as,

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        t[;`iq]
98 42 126

        t[2;]
name| `Prefect
iq     | 126

        t[2;`iq]
126

Defining t syntactically yields the same result as creating the column dictionary and flipping it. It is arguably simpler and clearer.

The value columns can be stored in variables, which is useful for programmatic table definition.

        c1:`Dent`Beeblebrox`Prefect
        c2:98 42 126
        t:([]c1;c2)
        t
c1         c2
--------------
Dent       98
Beeblebrox 42
Prefect    126
Warning.png Note: When all Li are singleton lists - that is, you are defining a table with a single row - they must be enlisted.
        tt:([]c1:`a;c2:100)
'type
        tt:([]c1:enlist `a; c2:enlist 100)
Warning.png Note: When at least one column is a list and one or more columns are atoms, each atom column is extended into a list whose count matches the other columns. This can be used to assign a default value.
        tdef:([]c1:`a`b`c; c2:42; c3:1.1 2.2 3.3)
        tdef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3
Information.png Advanced: If you create a table as the flip of a column dictionary, item-wise extension of an atom column is not performed on the dictionary definition but it is performed when the column dictionary is flipped into a table.
        ddef:`c1`c2`c3!(`a`b`c;42;1.1 2.2 3.3)
        ddef
c1| `a`b`c
c2| 42
c3| 1.1 2.2 3.3

        flip ddef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3

Table Metadata

The column names of a table can be retrieved by using the unary cols.

        cols t
`name`iq

Recall that it is possible to retrieve the column values in a column dictionary using dot notation. This is also true after it is flipped to a table. For a table t and a column c, the expression t.c retrieves the value list for column c. In our example,

        t.name
`Dent`Beeblebrox`Prefect
        t.iq
98 42 126

The dot effectively disassociates a column's values from its name.

The function meta can be applied to a table t to retrieve its metadata. The result is a keyed table with one record for each column in t. The key column c of the result contains the column names. The column t contains a symbol denoting the type of the column. The column f contains the domains of any foreign keys. The column a contains any attribute associated with the column.

        meta t
c | t f a
--| -----
c1| s
c2| i
Information.png Advanced: If the result of meta displays an upper case type char for a column, this indicates that column is a non-simple list in which each item is a list of the corresponding type. Such tables arise, for example, when you group without aggregating in a select.
        t:([] sc:1 2 3; nsc:(1 2; 3 4; 5 6 7))
        t
sc nsc
--------
1  1 2
2  3 4
3  5 6 7
Information.png Advanced: The function tables XE "tables (function)" takes a symbol representing a context (see workspace organization) and returns a sorted list of symbol names of the tables in that context. For example,
tables `.
`s#`t`tt

lists all the tables in the default context. Alternatively, the command \a provides tha same result. If no parameter is provided, it returns the result for the current context.

Records

We observe that count returns the number of rows in the table since each row is an item in the list. In our example,

        count t
3

Now let's inspect the sequence of dictionaries that comprise the rows.

        t[0]
c1| `Dent
c2| 98

        t[1]
name| Beeblebrox
iq  | 98

The dictionary in each row maps the common domain list of column names to the column values of the row. This motivates calling each row dictionary a record in the table.

Warning.png Important: A table is a sequentially ordered list of records. Each record is an association of column names with one row's values.

Sometimes it is useful to separate a record's values from its column names. In this context, we shall refer to the row value list. The row value list for the ith row of a table is obtained by retrieving the ith item of each of the column vectors. This is simply the range of the record dictionary.

        value t[1]
`Beeblebrox
42

Flipped Column Dictionary vs. List of Records

Is a table a flipped column dictionary or a list of records? Logically it is both, but physically it is stored as a column dictionary with a flipped indicator.

To verify this, we create a list of records, each of which is a dictionary that maps (common) column names to a row's values.

        lrows:(`name`iq!(`Dent;98); `name`iq!(`Beeblebrox;42))

While this list is apparently different from the equivalent column dictionary, observe the curious result when you display the list of rows,

        lrows
name       iq
-------------
Dent       98
Beeblebrox 42

The q interpreter has recognized that this list conforms to the requirements for a list of records of a table - i.e., the domain lists of all the dictionaries are the same, the range lists have common count, and the types of the range lists are consistent by position. It has converted the list of dictionaries to a flipped column dictionary by reorganizing the values that we specified record-by-record into column vectors.

Warning.png Advanced: In general, column retrieval and manipulation on a simple column dictionary will be significantly faster than operations on rows. The values in a simple column are stored contiguously, whereas the values in each row must be retrieved by indexing into all columns.

Be mindful that deletion of a row is an expensive operation because all the column lists must be compressed to close the resulting gap. This can result in large amounts of data being moved in a table with many rows.

Empty Tables and Schema

We saw in the previous section that a table can be defined and populated in one step using table syntax.

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

This is infrequently done with individual values in practice, other than for small tests. Often values are deferred to run-time or the value lists may be prohibitively long.

In these circumstances, it is useful to create an empty table initially and then populate it later. The empty parentheses here signify the empty list.

        t:([] name:(); iq:())

The table will then be populated, for example, by reading the values from a file.

When an empty table is created as above, the columns are lists of general type, so data of any type can initially be loaded. The type of each column will be determined by the type of the first item placed in it. Thereafter, type checking is enforced for all inserts and updates, with no type promotion performed.

It is possible to fix the type of any column in an empty table definition by specifying a null list of the appropriate type.

        t:([] name:`symbol$(); iq:`int$())

Shorter, and arguably less obvious,

        t:([] name:0#`; iq:0#0N)
Warning.png Note: Either of the previous two forms of empty table definition is the q version of the table's schema.

Basic select

We shall use the following definition in this section,

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

Syntax

We shall cover select expressions in depth in q-sql, but we provide an introduction here in order to extract and display data in our examples. The basic select expression takes the form,

select cols from table

where table is either a table or a keyed table and cols is a comma separated list of columns from table. This expression results in a list of all records for the specified columns in table.

The simplest form of select is,

select from table

which corresponds to the SQL statement,

SELECT * FROM table

In q you do not need to write the wildcard character when you want all columns in the table.

Warning.png Note: The basic select expression may look familiar from SQL, but it should seem odd to the q newbie who is finally becoming accustomed to parsing expressions right-to-left. Neither select nor from represent functions that can stand alone. Instead, they are part of a template and always appear together.

Q has a host of extensions to the basic select template whose elements appear between the select and from or after the table element. As we shall see in q-sql, it is possible to convert any select template to a purely functional form, although this form isn't particularly friendly to the q newbie.

Displaying the Result

Since the result of select is a list of records, it too is a table.

       select from t
name       iq
--------------
Dent       42
Beeblebrox 98
Prefect    126

We shall use this method of display in what follows unless we need to see the structure of the underlying column dictionary.

Selecting Columns

To select specific columns, list them in the desired order, comma-separated, between select and from.

        select name from t
name
------
Dent
Beeblebrox
Prefect

        select iq,name from t
iq  name
--------------
98  Dent
42  Beeblebrox
126 Prefect

Basic update

The syntax of basic update is similar to select, but named columns represent replacement by the specified values. In our example,

        show update iq:iq%100 from t
name       iq
---------------
Dent       0.98
Beeblebrox 0.42
Prefect    1.26

Primary Keys and Keyed Tables

Keyed Table

In SQL, it is possible to declare column(s) of a table as a primary key. Amongst other things, this means that the values in the column(s) are unique, making it possible to retrieve a row via its key value. These two features motivate how q implements a primary key.

We begin with a simple key - i.e., the key is a single column. The idea is to place the key column in a separate table parallel to a table containing the remaining columns. How to associate each key with its corresponding value record? Simple: set up a dictionary mapping between the key records and the associated value records.

A keyed table is a dictionary that maps each row in a table of unique keys to a corresponding row in a table of values.

Simple Example

Let's see how this works for our previous example. Viewing the data table as a flipped dictionary of rows will make things explicit.

        values:flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)

Now say we want to add a key column named eid containing employee identifiers. We place the identifiers in a separate table. Recall from Column Dictionary with a Single Column that we must enlist both the column name and the value list for a column dictionary having a single column.

        k:flip (enlist `eid)!enlist 1001 1002 1003

Now we establish the mapping between the two tables.

        kt:k!values

Voilà!—a keyed table. The console display of a keyed table lists the key column(s) on the left, separated by a vertical bar from the value columns on the right.

        kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
Warning.png Note: The key mapping assumes that the key rows and value records are in corresponding order since the dictionary associates a key with the data row in the same position.
Warning.png Note: The keys should be unique. As we have already noted, dictionary creation does not enforce uniqueness, but a value row associated with a repeat key is not be accessible via key lookup. It can be retrieved via a select on the key column.

Keyed Table Specification

The console display of a keyed table demonstrates how to define it in one step as a dictionary of flipped dictionaries,

        kt:(flip (enlist `eid)!enlist 1001 1002 1003)!flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)

Unless you are constructing the keyed table from its constituents, it is simpler to use table syntax. The key column goes between the square brackets and the value columns to the right as in a normal table definition.

 #!q
       kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

To define an empty keyed table, use empty key and value columns.

        ktempty:([eid:()] name:(); iq:())

The empty columns can be typed with either of the following constructs,

        ktempty:([eid:`int$()] `symbol$name:(); iq:`int$())
        ktempty:([eid:0#0] name:0#`; iq:0#0)

Accessing Records of a Keyed Table

Since a keyed table is a dictionary mapping, it provides access to records in the value table via key lookup. Remember that the records in the key table and value table are both dictionary mappings for their rows.

        kt[`eid!1002]
name| `Beeblebrox
iq  | 42

You can abbreviate the full dictionary specification of a key record to its key value. Our example reduces to,

        kt[1002]
name| `Beeblebrox
iq  | 42

An individual column in the value record can be accessed via repeated indexing or indexing at depth.

        kt[1002][`iq]
42
        kt[1002;`iq]
42
Warning.png Important: The net effect of placing a key on a table is to convert item indexing of the rows to generalized indexing via key value. Otherwise put, the first index is converted from positional retrieval to key lookup.

Retrieving Multiple Records

Given that it is possible to lookup a single record in a keyed table by the key value,

        kt[1001]

you might think it is possible to retrieve multiple records from a keyed table via a simple list of keys. You would be wrong.

        kt[1001 1002]
`length

To lookup multiple key values in a keyed table, you must use a list of enlisted keys.

        kt[(enlist 1001; enlist 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42

A fast way to do this is,

         kt[flip enlist 1001 1002]
name       iq
-------------
Dent       98
Beeblebrox 42

Another convenient way to lookup multiple keys is to index using a table having a single column with the name of the primary key and value list of the desired keys. In our example,

        kt[([] eid:1001 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42

This works because the records of the inner table are in the domain of the keyed table dictionary. See Operations on Dictionaries for details.

If you want to retrieve the full entries of the keyed table instead of just the value records, use the # operator.

        ([]eid:1001 1002)#kt
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42

Reverse Lookup

Because a keyed table is a dictionary, it is possible to perform reverse lookup from a value to a key. In a simple example,

        kts:[eid:1001 1002 1003]
        name:`Dent`Beeblebrox`Prefect)
        kts
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect

        kts?`Prefect
eid| 1003

Components of a Keyed Table

Since a keyed table is a dictionary mapping between the table of keys and the table of values, the functions key and value provide a convenient way to retrieve the two constituent tables.

        key kt
eid
----
1001
1002
1003

        value kt
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

A list containing the names of the key column(s) can be retrieved with the function keys.

        keys kt
,`eid

Observe that cols retrieves both the key and value column names for a keyed table.

        cols kt
`eid`name`iq

Tables vs. Keyed Tables

It is sometimes convenient to convert between a regular table having a column of (presumably) unique values and the corresponding keyed table.

The dyadic primitive xkey converts a table with a column of unique values to a keyed table. The right argument of xkey is the table and the left operand is a symbol (or list of symbols) with the name of the column(s) to be used as the key(s).

        t:([] eid:1001 1002 1003; name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        t
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126

       `eid xkey t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

Conversely, to convert a keyed table to a regular table, use xkey with an empty table as the left operand.

        kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

        () xkey kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
Warning.png Note: The conversion expressions above do not affect the original table. You must refer to the table by name to modify the original.
        `eid xkey `t
`t
        t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

        () xkey `kt
`kt

        kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
Warning.png Advanced: It is possible to apply xkey against a column that does not contain unique values. The result is a keyed table that does not have a primary key.
        t:([] eid:1001 1002 1003 1001; name:`Dent`Beeblebrox`Prefect`Dup )
        t
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
1001 Dup

        ktdup:`eid xkey t
        ktdup
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
1001| Dup

Duplicate key values are not accessible via key lookup,

        ktdup 1001
name| Dent

but they are accessible via select.

        select from ktdup where eid=1001
eid | name
----| ----
1001| Dent
1001| Dup

Compound Primary Key

We understand that the q implementation of a SQL table with a simple key is actually a dictionary mapping between a pair of tables in which the first table has a single key column. This has a straightforward extension to a compound key.

Recall that a compound key in SQL is a collection of two or more columns that together provide a unique value for each row. To implement a compound key in q, we generalize the key table from a single column to multiple columns by requiring that each record in the key table has a unique combination of column values.

Here is our example redone to replace the employee id with a compound key comprising the last and first names.

          ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford]; iq:98 42 126)

Observe that the console displays a compound keyed table with the key columns on the left separated by a vertical bar | from the value columns to the right.

          ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 98
Beeblebrox Zaphod| 42
Prefect    Ford  | 126

As in the case of a simple primary key, we can abbreviate the full key record to the key value for retrieval.

        ktc[`Dent`Arthur]
iq| 98

Here is how to initialize our example as an empty table,

        ktc:([lname:();fname:()] iq:())

The empty keyed table can be typed with either of the following,

        ktc:([lname:`symbol$();fname:`symbol$()] iq:`int$())

        ktc:([lname:0#`;fname:0#`] iq:0#0)

We shall see in Insert into Keyed Tables how to fill both key columns and data tables in a keyed table simultaneously.

For the fundamentalist, here is the same compound keyed table built from its constituent pair of tables

        ktc:(flip `lname`fname!(`Dent`Beeblebrox;`Arthur`Zaphod))!
        flip (enlist `iq)!enlist 98 42 126

And here is retrieval by full key record,

        ktc[`lname`fname!`Beeblebrox`Zaphod]
iq| 42

Most will agree that the table definition syntax and abbreviated key value retrieval is simpler.

7.4.10 Retrieving Records with a Compound Primary Key

Retrieval of multiple records via a compound primary key is actually easier than with a simple key, since each compound key value is already a list.

        ktc (`Dent`Arthur; `Prefect`Ford)
iq
---
98
126

As was the case with a keyed table having a simple key, retrieval can be performed via a table whose columns and values match the key columns.

        K:([] lname:`Dent`Prefect; fname:`Arthur`Ford)
        ktc[K]
iq
---
98
126

        ktc K            /use juxtaposition
iq
---
98
126

As in the case of a simple key, you can use # to retrieve the full entities of the keyed table instead of just the value records.

        K#ktc
lname   fname | iq
--------------| ---
Dent    Arthur| 98
Prefect Ford  | 126

Key Lookup with txf

Looking up keys in a keyed table is complicated by the different formats for simple and compound keys. The triadic function txf provides a uniform way to perform such key lookup. The first argument is a keyed table (target). The second argument is a list of key values, either simple or compound. The third argument is a list of symbol column names in the value table of target. The result is a list comprising the matching row values from the specified columns of the value table of target.

In the following example using a simple key, observe the column order of the result.

        kts:([k:101 102 103] c1:`a`b`c; c2:1.1 2.2 3.3)
        txf[kts;101 103;`c2`c1]
1.1 `a
3.3 `c

With a compound key, the values to be looked up must be listed in columns.

        ktc:([k1:`a`b`a; k2:`x`y`z] c1:100 200 300; c2:1.1 2.2 3.3)
        txf[ktc;(`a`b;`z`y);`c1`c2]
300 3.3
200 2.2

Foreign Keys and Virtual Columns

A foreign key in SQL is a column in one table whose values are members of a primary key column in another table. Foreign keys are the mechanism for establishing relations between tables.

One of the important features of a foreign key is that the RDBMS enforces referential integrity, meaning that the value in a foreign key column is required to be in the referenced primary key column. To insert a foreign key value that is not in the primary key column, it must first be inserted into the primary key column.

Definition of Foreign Key

Q has the notion of a foreign key that also provides referential integrity. Extra credit to the reader who has guessed that a foreign key is implemented using an enumeration. In our introduction to enumerations, we saw that an enumeration domain can be any list with unique items. A keyed table meets the criterion of a unique domain, since the key records in the dictionary domain are unique.

A foreign key is a table column defined as an enumerated value over a keyed table. As an enumeration, a foreign key indeed provides referential integrity by restricting values in the foreign key column to be in the list of primary key values.

Example of Simple Foreign Key

An enumeration over a keyed table domain acts just like our simple enumeration examples. Let's return to a previous example.

        kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

To enumerate over the primary key of kt , use a symbol containing the keyed table name as the domain in the enumeration.

       `kt$

The primary key table records provide the unique set of values for enumerating records.

       `kt$`eid!1001
`kt$1001

As usual, q saves us the trouble of being so explicit and allows the enumeration to be applied to items in the value list for the primary key dictionary - that is, the primary key values.

        e1:`kt$1002 1001 1001 1003 1002 1003
        e1 = 1003
000101b

As with any enumeration, attempting to enumerate a key value that is not in the domain causes an error.

        `kt$1004
`cast

We can use table definition syntax to define a table with a foreign key over kt.

        tdetails:([] eid:`kt$1003 1001 1002 1001 1002 1001; sc:126 36 92 39 98 42)

The foreign key column has simply been defined as an enumeration over the keyed table.

We see the foreign key table in the f column when we invoke meta on the table.

        meta tdetails
c  | t f  a
---| ------
eid| i kt
sc | i

As of release 2.4, the built-in function fkeys returns a dictionary in which each foreign key column name is mapped to its key domain—its primary key table name.

        treport:([] eid:`kt$1001 1002 1003; mgrid:`kt$1002 0N 1002)
        fkeys treport
eid  | kt
mgrid| kt

Resolving a Foreign Key

There are occasions when you wish to resolve a foreign key, by which we mean substitute the actual values in place of the enumerated values. As with an ordinary enumeration, this is done by applying the value function to the foreign key column.

        update eid:value eid from tdetails
eid  sc
--------
1003 126
1001 36
1002 92
1001 39
1002 98
1001 42

Foreign Keys and Relations

In SQL, an inner join is used to splice back together data that has been normalized via relations. The splice is usually done along a foreign key, which establishes a relation to the keyed table via the primary key. In the join, columns from both tables are available using dot notation.

In q the same effect is achieved using foreign keys without explicitly creating the joined table. The notation is similar, but different enough to warrant close attention.

Let tf be a table having a foreign key f whose enumeration domain is the keyed table kt. All columns in kt are available via dot notation in any select expression whose from domain is tf. A column c in kt that is accessed in this way is called a virtual column and is specified with dot notation f.c in the select expression.

For example, given t as above, we create a details table that contains individual test results for each person. We name the foreign key in the details table the same as the primary key it refers to, but this is not required,

      tdetails:([] eid:`kt$1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)

Now we can access columns in t via a select on tdetails.

        select eid.name, sc from tdetails
name       sc
--------------
Prefect    126
Beeblebrox 36
Dent       92
Beeblebrox 39
Dent       98
Beeblebrox 42

The case in which the enumeration domain of a foreign key has a compound primary key is slightly more complicated. We cover this in Operations on Compound Column Data

Working with Tables and Keyed Tables

In this section, we use the following examples.

    t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

    kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

First and Last Records

Because a table is a list of records, the functions first and last retrieve the first and last records, respectively.

         first t
name| `Dent
iq  | 98

         last t
name| `Prefect
iq  | 126

         first kt
name| `Dent
iq  | 98

         last kt
name| `Prefect
iq  | 126

These functions are useful in select expressions, especially with grouping and aggregation.

Warning.png Note: Every table in kdb+ has a first and last record since it is an ordered list of records. Moreover, the result of a select template is a table and so is also ordered. Contrast this with SQL, in which tables and result sets are unordered, and you must use ORDER BY to impose an order.

You can retrieve the first or last n records of a table or keyed table using the take operator ( # ).

        2#t
name       iq
-------------
Dent       98
Beeblebrox 42

        -3#kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

See Appendix A for more on using take. Also see select[n] for another way to achieve this result using select[n].

Find

The find operator ( ? ) used with a table performs a reverse lookup of a record and returns the corresponding row number. With t as above,

        t?`name`iq!(`Dent;98)
0

As usual, the record can be abbreviated to a list of row values.

        t?(`Dent;98)
0

You can reverse-lookup a list of multiple row values.

        t?((`Dent;98);(`Prefect;126))
0 2

Since a keyed table is a dictionary, find performs a reverse lookup of a value record and returns the key record.

        kt?`name`iq!(`Dent;98)
eid| 1001
        kt?(`Dent;98)
eid| 1001

In the case of find on a table with a single column, each list of row values must be a singleton list.

        t1:([] eid:1001 1002 1003)
        t1?(enlist 1001; enlist 1002)
0 1

The list of singletons can be created by the following expressions, although the first executes faster, especially for long lists.

        flip enlist 1001 1002
1001
1002

        enlist each 1001 1002
1001
1002

Primitive Join (,)

The join operator ( , ) is defined for tables and keyed tables.

You can use join to append a record to a table.

        t:([]c1:`a`b;c2:10 20)
        t,`c1`c2!(`c;30)
c1 c2
-----
a  10
b  20
c  30

This join is one situation in which you cannot use a list of row values.

        t,(`a;30)
`c1`c2!(`a;10)
`c1`c2!(`b;20)
`a
30

You can, however, use a list of row values to amend the original table.

        t,:(`a;30)
        t
c1 c2
-----
a  10
b  20
c  30

Only tables having exactly the same list of column names and compatible column types can be joined. Since a table is a list of records, the result is obtained by appending the rows of the right operand to those of the left operand.

        t1:([] a:1 2 3; b:100 200 300)
        t2:([] a:3 4 5; b:300 400 500)
        t1,t2
a b
-----
1 100
2 200
3 300
3 300
4 400
5 500

Note that common rows are duplicated in the result.

Two tables with the same columns in different order cannot be joined with , because the order of columns in records is significant in q,

        t3:([]b:1001 2001 3001; a:101 201 301)
        t1,t3
'mismatch

Two keyed tables with the same key and value columns can be joined. Because a keyed table is a dictionary, the result has upsert semantics, as we saw in Join Keys in the right operand that are not in the left operand are treated as inserts, whereas the right operand acts as an update for common key values.

        kt1:([k:1 2 3] c:10 20 30)
        kt2:([k:3 4 5] c:300 400 500)
        kt1,kt2
k| c
-| ---
1| 10
2| 20
3| 300
4| 400
5| 500

Coalesce (^)

The coalesce operator ( ^ ) is defined for keyed tables and differs from primitive join ( , ) in its treatment of null column items in the value tables.

When two keyed tables have the same key and value columns and the column values of both keyed tables are non-null atoms, ^ behaves the same as primitive join ( , ).

        kt1:([k:1 2 3] c1:10 20 30;c2:`a`b`c)
        kt2:([k:3 4 5] c1:300 400 500;c2:`cc`dd`ee)
        kt1,kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee

        kt1^kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee

When the right operand has null column values, the column values of left operand are only updated with non-null values of the right operand.

        kt3:([k:2 3] c1:0N 3000;c2:`bbb`)
        kt3
k| c1   c2
-| --------
2|      bbb
3| 3000

        kt1,kt3
k| c1   c2
-| --------
1| 10   a
2|      bbb
3| 3000

        kt1^kt3
k| c1   c2
-| --------
1| 10   a
2| 20   bbb
3| 3000 c
Information.png Note: The performance of ^ is slower than that of , since each column value of the right operand must be checked for null.

Column Join

Two tables with the same number of rows can be combined with join-each ( ,' ) to form a sideways, or column, join in which the columns are aligned in parallel.

        t1:([] a:1 2 3)
        t2:([] b:100 200 300)
        t1,'t2
a b
-----
1 100
2 200
3 300

When the column lists of the tables are not disjoint, the operation on the common columns has upsert semantics because each column is a dictionary.

        t3:([] a:10 20 30; b:100 200 300)
        t1,'t3
a  b
------
10 100
20 200
30 300

Because keyed tables are dictionaries, they can only be sideways joined if they have identical key columns. In such a situation, we can deduce the behavior by recalling from Removing Entries that any operation on a dictionary is applied on the common elements of the merged domains and is extended to the non-common domain elements with appropriate nulls.

Thus, a sideways join on keyed tables with identical key columns has simple upsert semantics for common data columns. More interesting are the non-common data columns, where the result becomes a column join spliced along common key values.

        t4:([a:1 2 3] x:100 200 300)
        t4
a| x
-| ---
1| 100
2| 200
3| 300

        t5:([a:3 4 5] y:1000 2000 3000)
        t5
a| y
-| ----
3| 1000
4| 2000
5| 3000

        t4,'t5
a| x   y
-| --------
1| 100
2| 200
3| 300 1000
4|     2000
5|     3000

Complex Column Data

Simple Example

Recall from the definition of a column dictionary in Dictionary vs. List that there is no restriction that the column vectors must be lists of simple type. We have heretofore worked with examples having homogenous atomic values in each column because they correspond to familiar SQL tables, but there is no need to limit ourselves to simple columns.

Suppose we want to keep track of a pair of daily observations, say a low temperature and a high temperature. We can do this by storing the low and high values in separate columns.

        t1:([] d:2006.01.01 2006.01.02; l:67.9 72.8; h:82.1 88.4)
        t1
d          l    h
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4

        t1[0]
d| 2006.01.01
l| 67.9
h| 82.1

        t1.l
67.9 72.8

        t1.h
82.1 88.4

We can also store pairs in a single column.

        t2:([] d:2006.01.01 2006.01.02; lh:(67.9 82.10; 72.8 88.4))
        t2
d          lh
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4

        t2[0]
d | 2006.01.01
lh| 67.9 82.1

        t2.lh
67.9 82.1
72.8 88.4

        t2.lh[;0]
67.9 72.8

        t2.lh[;1]
82.1 88.4

The first form is arguably more natural if you intend to manipulate the low and high values separately. This example can easily be generalized to the situation of n-tuples. In this case, storing multiple values in a single column has a definite advantage since defining and populating n columns is unwieldy when n is not known in advance. Storing and retrieving n-tuples to/from a single column is a simple operation in q. A useful example in finance is storing daily values for a yield curve.

Operations on Compound Column Data

We generalize the above example to the case of storing a set of repeated observations in which the number of observations is not fixed - i.e., varies with each occurrence. Say we want to perform a statistical analysis on the weekly gross revenues for movies and we don't care about the specific titles. Since there will be a different number of movies in release each week, the number of observations will not be constant. An oversimplified version of this might look something like,

        t3:([] wk:2006.01.01 2006.01.08; gr:( 38.92 67.34; 16.99 5.14 128.23 31.69))
        t3
wk         gr
----------------------------------
2006.01.01 38.92 67.34
2006.01.08 16.99 5.14 128.23 31.69

Handling the situation in which the number of column values is not known in advance, or is variable, is cumbersome in SQL. You normalize the data into a master-detail pair of tables, but you cannot re-assemble the details into separate columns via a join. Instead, for each master record you get a collection of records that must be iterated over via some sort of cursor/loop. In verbose programming, this results in many lines of code that are slow and prone to error on edge cases.

By storing complex values in a single column in a q table, sophisticated operations can be performed in a single expression that executes fast. In the following q-sql examples, don't worry about the details of the syntax, and remember to read individual expressions from right to left. Observe that because there are no stinking loops, we never need to know the number of detail records.

Using our movie data, we can produce the sorted gross, the average and high gross for each week in one expression.

        select wk, srt:desc each gr, avgr:avg each gr, hi:max each gr from t3
wk         srt                     avgr    hi
-------------------------------------------------
2006.01.01 67.34 38.92             53.13   67.34
2006.01.08 128.23 31.69 16.99 5.14 45.5125 128.23

While sorts and aggregates such as Max and Avg are standard SQL, think of how you'd produce the sorted sublist and the aggregates together. In your favorite verbose programming environment, you'll soon discover that you have a sordid list of rows requiring a loop to unravel into a single output line.

Now think about what you'd do to compute the percentage drops between successive gross numbers within each week. Because the sorted detail items are rows in SQL, this requires another loop. In q,

        select wk,drp:neg 1_'deltas each desc each gr,avgr:avg each gr,hi:max each gr from t3
wk         drp              avgr    hi
------------------------------------------
2006.01.01 ,28.42           53.13   67.34
2006.01.08 96.54 14.7 11.85 45.5125 128.23

Compound Foreign Key

Storing multiple values in a column is how to make a foreign key on a compound primary key. We return to the example using last name and first name as the primary key.

        ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford]; iq:98 42 126)

We create a details table with a foreign key enumeration over ktc by placing the names in the foreign key column.

        tdetails:([] name:`ktc$(`Beeblebrox`Zaphod;`Prefect`Ford;`Beeblebrox`Zaphod); sc:36 126 42)

The columns of ktc are available as virtual columns from tdetails.

        select name.lname,name.iq,sc from tdetails
lname      iq  sc
------------------
Beeblebrox 42  36
Prefect    126 126
Beeblebrox 42  42

Attributes

Attributes are metadata applied to lists of special form. They are used on a dictionary domain or a table column to reduce storage requirements and/or speed retrieval. When it sees an attribute, the q interpreter can make certain optimizations based on the structure of the list.

Warning.png Important: Attributes are descriptive rather than prescriptive. Consequently, applying an attribute (other than `g#) to a list will not make it so. Moreover, a modification that respects the form specified by the attribute leaves the attribute intact (other than `p#), while a modification that breaks the form is permitted but the attribute is lost on the result.

The syntax for applying an attribute looks like the verb # with a left operand containing the symbol for the attribute and the list as the right operand. However, this use of # is not functional.

Warning.png Note: You will not see significant benefit from a attribute for less than a million items. This is why attributes are not automatically applied in mundane situations such as the result of til or distinct. You should test your particular situation to see whether applying an attribute actually provides performance benefit.

Sorted (`s#)

Applying the sorted attribute (`s#) to a list indicates that the items of the list are sorted in ascending order.

Warning.png Note: As of this writing (Jun 2007) there is no way to indicate a descending sort.

When a list has the sorted attribute, the default linear search used in lookups is replaced with binary search. Sorted also makes certain operations much faster — for example min and max.

The following fragments show situations in which this applies.

x?v
... where x = v, ...
... where x in v, ...
... where x within v, ...

The sorted attribute can be applied to a simple list,

        L:`s#1 2 2 4 8
        L
`s#1 2 2 4 8

        L,:16                      / respects sort
        L
`s#1 2 2 4 8 16

        L,:0                        /  does not, attribute lost
        L
1 2 2 4 8 16 0

or a column of a table,

        t:([]`s#t:04:02:42.001 04:02:42.003;v:101.05 100.95)

The sorted attribute can be applied to a dictionary, which makes the dictionary into a step function.

        ds:`s#1 2 3 4 5!`a`b`c`d`e
        ds
1| a
2| b
3| c
4| d
5| e

Applying the sorted attribute to a table implies binary search on the table and also that the first column is sorted.

         ts:`s#([]t:04:02:42.001 04:02:42.003;v:101.05 100.95)
         ts
t            v
-------------------
04:02:42.001 101.05
04:02:42.003 100.95

Applying the sorted attribute to a keyed table means that the dictionary, its key table and its key column(s) are all sorted.

        kt:`s#([k:1 2 3 4] v:`d`c`b`a)
        kt
k| v
-| -
1| d
2| c
3| b
4| a

Unique (`u#)

Applying the unique attribute (`u#) to a list indicates that the items of the list are distinct. Knowing that the elements of a list are unique dramatically speeds up distinct and allows q to exit some comparisons early.

Operations on the list must preserve uniqueness or the attribute is lost.

        LU:`u#4 2 6 18 1
        LU
`u#4 2 6 18 1

        LU,:0                / uniqueness preserved
        LU
`u#4 2 6 18 1 0

        LU,:2                /  attribute lost
        LU
4 2 6 18 1 0 2

The unique attribute can be applied to the domain of a dictionary, a column of a table, or the key column of a keyed table. It cannot be applied to a dictionary, a table or a keyed table directly.

Parted (`p#)

The parted attribute (`#p) indicates that the list represents a step function in which all occurrences of a particular output value are adjacent. The range is an int or temporal type that has an underlying int value, such as years, months, days, etc. You can also partition over a symbol provided it is enumerated.

Information.png Advanced': Applying the parted attribute causes the creation of an index dictionary that maps each unique output value to the position of its first occurrence.

When a list is parted, lookup is much faster since linear search is replaced by hashtable lookup.

Sorting in ascending or descending order is one way to produce the partitioned structure, but list need not be in sorted order. For example,

        L:`p#2 2 2 1 1 4 4 4 4 3 3
        L,:3
        L
2 2 2 1 1 4 4 4 4 3 3 3

The parted attribute is not preserved under an operation on the list, even if the operation preserves the partitioning.

Warning.png Note: The parted attribute should be considered when the number of entities reaches a billion and most of the partitions of of substantial size—i.e., there is significant repetition.

Grouped (`g#)

The grouped attribute (`g#) differs from other attributes in that it imposes additional structure on the list by causing q to create and maintain an index. Grouping can be applied to a list when no other assumptions about its structure can be made.

Applying the grouped attribute to a table column roughly corresponds to placing a SQL index on a column. For example, if you wish to query a table via a symbol column sym, applying the grouped attribute to the column drastically speeds up queries such as,

        select[-100] ... where sym=`xyz

Here we are retrieving the last 100 records matching a sym value.

Information.png Advanced: The index is a dictionary that maps each unique output value to the a list of the positions of all its occurrences. This speeds lookups and some operations (e.g., distinct). The tradeoff is significant storage overhead.

For example,

        L:`g#1 2 3 2 3 4 3 4 5  2 3 4 5 4 3 5 6
        L
`g#1 2 3 2 3 4 3 4 5 2 3 4 5 4 3 5 6
Warning.png Note: The grouped attribute is preserved for both inserts and upserts.

Applying the grouped attribute to a table column,

        t:([]`g#c1:1 2 3 2 3 4 3 4; c2:`a`b`a`c`a`d`b`c)
Warning.png Note: As of this writing (Jun 2007), the maximum number of `g# attributes that can be placed on a single table is 99.

Prev: Dictionaries Next: Queries: q-sql

Table of Contents

©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox