KdbplusForMortals/loading tables from stored data

From Kx Wiki
Jump to: navigation, search


1.1 Loading Tables from Stored Data

1.1.0 Overview

It is often desirable to persist data that is produced or consumed by applications written in q or other languages. A table of reasonable size can be persisted in its entirety, meaning that its data is represented in a single file on external storage. In this section, we discuss the built-in storage formats. Other formats can be handled with custom q code.

1.1.1 Serialized Tables

Any q data entity can be serialized in binary form to a self-describing q data file. Serialization is an effective way to persist a reference table of reasonable size—i.e., up to millions of rows, as opposed to hundreds of millions of rows. To access a serialized table, you must deserialize it by reading it into a workspace in its entirety

Big Picture (0): We think of a serialized table as a zero-dimensional persisted form since it is stored in a single file and has not been split apart in any way.

table      ▪

Here the ▪ represents the contents of the table as an opaque entity. This contrasts with higher-dimensional persisted table forms we shall encounter later.

Since serialization and deserialization basically just write and read byte streams, they are quite fast. There are no provided APIs for other languages to read serialized q tables but it is possible with some work.

The size of a serialized table that a q process can read is limited by the amount of memory available to the process. Reserve multiples of the table’s actual size if you intend to copy it or pass it by value.

Use set with a file handle to serialize any q entity into a file and get or value to read a.

    `:/data/t set ([] s:`a`b`c; v:100 200 300)

    get `:/data/t
s v
a 100
b 200
c 300

Important: Observe the lack of a trailing / in the file handle, which indicates that we are serializing the table to a single file. Adding a trailing / causes the table to be splayed. See §1.2 for more on splayed tables.

Alternatively, you can use the functional form of . or @ with the : operator to serialize the table.

    .[`:/data/t;();:;([] s:`a`b`c; v:100 200 300)]

Here the file handle can be thought of as an external variable name.

A serialized table can also be read into the workspace using the \l command, in which case the retrieved table is assigned to a global variable with the same name as the file. Continuing the preceding example, loading the saved file creates the variable t.

    \l /data/t

s v
a 100
b 200
c 300

You can append records directly to a serialized table using upsert; alternately, you can use the functional form of . or @ with the , operator.

    .[`:/data/t;();,;] ([] s:`d`e; v:400 500)
    `:/data/t upsert `s`v!(`f;600)
    get `:/data/t
s v
a 100
b 200
c 300
d 400
e 500
f 600

Note: It is not possible to append a value list (i.e., a “row”) to a serialized table with insert as it is with a table in memory.

    `:/data/t insert(`h;800)

Similar operations apply to serialized keyed tables, with the expected difference that updating a persisted keyed table results in upsert semantics.

    `:/data/kt set ([k:`a`b`c] v:10 20 30)

    `:/data/kt upsert ([k:`c`d] v:100 200)

    get `:/data/kt
k| v
-| ---
a| 10
b| 20
c| 100
d| 200

1.1.2 Table from Newline-Delimited Text File

You can save a list of q strings as a newline-delimited text file using dyadic 0: with left operand comprising a file handle and right operand comprising a list of strings to be written as records.

    `:/data/notes.txt 0: ("A comment";"Another";"Audit note: Changed on 11/08/2007")

Read newline-delimited text using monadic read0, which returns a list of strings, one for each record. In the following snippet, we read the data written in the previous snippet, placing it in a single-column table that is then searched with like.

    t:([] notes:read0 `:/data/notes.txt)
    exec from t where notes like "Audit*"
"Audit note: Changed on 11/08/2007"

1.1.3 Table from Fixed Record Text File

A file containing fixed-format text is parsed into column lists using the dyadic form of 0: with field specifications in the left operand. The result can be turned into a column dictionary by adding column names and thence flipped to get a table.

For example, consider a file closepx.txt having the form:

1001DBT12345678 98.61002EQT98765432 24.571003CCR00000011 21.23

We read the file into a table:

     flip `id`sym`px!("ISF";4 11 6) 0: `:/data/closepx.txt
id   sym         px
1001 DBT12345678 98.6
1002 EQT98765432 24.57
1003 CCR00000011 121.23

See §1.5.5 to read such files in chunks when the entire table is too large to fit into memory.

1.1.4 Table from Delimited Record Text File

Read variable length delimited text files with:

(Lt;D) 0: f

where Lt is a simple list of char containing one type letter per field and D is either a char representing the delimiting character or the enlist of same.

For example, say we have a csv file Px.csv having records:


Reading with a delimiter char results in a list of column lists, whence we create a table.

    flip `id`sym`px!("ISF"; ",") 0: `:/data/Px.csv
id   sym         px
1001 DBT12345678 98.6
1002 EQT98765432 24.57
1003 CCR00000001 121.23

If the csv file has column names in the first row, enlisting the delimiter directs 0: to create the corresponding table. For example, suppose the file pxtitles.csv has records:


Reading with an enlisted delimiter:

    ("ISF";enlist ",") 0: `:/data/pxtitles.csv
Seq  Sym         Px
1001 DBT12345678 98.6
1002 EQT98765432 24.75
1004 CCR00000001 121.23

See §1.5.5 to read such files in chunks when the entire table is too large to fit into memory.

1.1.5 Links

A link column is closely related to a foreign key column, in that it provides linkage from column values in one table to column values in a second table. Whereas a foreign key column is an enumeration over the key column of a keyed table, a link column comprises indices into an arbitrary column of an arbitrary table or keyed table.

A link column is useful in situations where a key column is not available. For example, a table (keyed or non-keyed) can contain a link to itself in order to create a parent-child relationship. You can also use links to create “foreign key” relationships between splayed tables, where foreign keys are not possible (a keyed table cannot be splayed).

We create two tables t1 and t2 with the property that the values of the fid column in t2 are all drawn from the values of the column id in t1.

    t1:([] id: 1 2 3; p:1.1 2.2 3.3)
    t2:([] fid:2 1 3 2; v:20 10 30 40)

We use ? to create a vector of indices of fid in id and we use ! to create the link.

    update clink:`t1!t1.id?t2.fid from `t2
    meta t2
c    | t f  a
-----| ------
fid  | i
v    | i
clink| i t1

We can use the link to access t1 columns in queries against t2, just as we would with a foreign key.

    select clink.p, v from t2
p   v
2.2 20
1.1 10
3.3 30
2.2 40

Prev: Table of Contents Next: 1.2 Splayed Tables

Table of Contents

Personal tools