Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.
Introduction to Lists
A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let’s start with the simplest case in which the list comprises only atoms.
List Definition and Assignment
The notation for a general list encloses its items within matching parentheses and separates them with semicolons.
(1;2;3) ("a";"b";"c";"d") (`Life;`the;`Universe;`and;`Everything) (-10.0; 3.1415e; 1b; `abc; "z")
In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not scalars of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.
The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.
Lists can be assigned to variables exactly like atoms.
L1:(1;2;3) L2:("z";"a";"p";"h";"o";"d") L3:(`Life;`the;`Universe;`and;`Everything) L4:(0b;1b;0b;1b;1b;0b) L5:(-10.0;3.1415e;1b;`abc;"z")
The number of items in a list is its count. You can obtain the count of a list as follows,
count L1 3
This is our first example of a function. For now, we need only understand that count returns an int value equal to the number of items in a list.
Observe that the count of any atom is 1.
count 42 1 count `abcd 1
Simple lists of homogeneous type have a simplified notation. Of course, you can use general list notation, but the q console always displays simple lists in simplified form.
Simple Integer Lists
A simple list of any numeric type omits the enclosing parentheses and replaces the separating semi-colons with blanks. The following two expressions for a simple list of int are equivalent,
(100;200;300) 100 200 300
This is confirmed by the console display,
L:(100:200;300) L 100 200 300
Similar notation is used for simple lists of short and long with the addition of the type indicator.
H:(1h;2h;255h) H 1 2 255h
We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.
G:(1; 2; 255h) G (1;2;255h)
Simple Floating Point Lists
Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(es) to the right of the decimal.
F:(123.4567;9876.543;99.0) F 123.4567 9876.543 99
This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.
FF:1.0 2.0 3.0 FF 1 2 3f
Simple Binary Lists
The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value,
bits:(0b;1b;0b;1b;1b) bits 01011b
whereas the indicator for byte leads,
bytes:(0x20;0xa1;0xff) bytes 0x20a1ff
A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above stores its values in 5 bytes of storage.
Simple Symbol Lists
The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.
symbols:(`Life;`the;`Universe;`and;`Everything) symbols `Life`the`Universe`and`Everything
Inserting spaces between the atoms causes an error.
bad:`This `is `wrong 'is
Simple char Lists and Strings
The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed list of characters enclosed in double quotes.
chars:("s";"o";" ";"l";"o";"n";"g") chars "so long"
Entering Simple Lists
Lists can be defined using simplified notation,
L:100 200 300 H:1 2 255h F:123.4567 9876.543 99.99 bits:01011b bytes:0x20a1ff symbols:`Life`the`Universe`and`Everything chars:"so long"
Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.
1 2.0 3 1 2 3f
Empty and Singleton Lists
Lists with one and zero elements merit special consideration.
The General Empty List
It is useful to have a list with no items. The empty list is denoted by a pair of parentheses with nothing (except possibly whitespace) between.
L:( ) L ()
We shall see that it is possible to define an empty list with a specific type when we look at Creating Typed Empty Lists.
Lists with a Single Item
There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,
42 1b 0x2a `beeblebrox "z"
We also recognize the following are all lists with two elements,
(42;6) 01b `zaphod`beeblebrox "zb" (40;`two)
How to create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn’t work since the result is an atom.
singleton:(42) singleton 42
The reason for this is that parentheses are used for multiple purposes. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions—that is, to isolate the result of the expression inside. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.
The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.
singleton:enlist 42 singleton ,42
Note that the console displays a singleton list with a leading comma.
signum type 42 -1 signum type enlist 42 1
The comma in the display is a vestigial feature from k (the precursor to and implementation language of q) in which monadic , means enlist. You might ask why this notation isn’t used to create the singleton list in q. The answer is that comma is used to denote the "join" operator, so using it for enlist would overload its meaning and operator overloading by valence is not done in q.
As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,
singleton:enlist 1 2 3 count singleton 1
Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item is has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.
Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,
L:(-10.0;3.1415e;1b;`abc;"z") L -10.0 L 3.1415e L 1b L `abc L "z"
Providing an invalid data type for the index will result in an error.
If you attempt to index outside of the bounds of the list, the result will not be an error. Rather, you get a null value. If the list is simple, this will be the null for the type of atoms in the list; otherwise, it will be 0n.
One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you’re not used to indexing relative to 0.
Items in a list can also be assigned via item indexing. Thus,
L1:1 2 3 L1:42 L1 1 2 42
Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.
L:100 200 300 L:42h 'type f:100.0 200.0 300.0 f 100 200 300f f:400 'type
This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.
Lists as Maps
Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[i] to the input value i. Succinctly, the I/O association for the list is,
i — > L[i]
Here are the I/O tables for some basic lists
101 102 103 104
(`a; 123.45; 1b)
(1 2; 3 4)
The first two examples demonstrate ranges of a collection of atoms. The last example has a range comprised of lists.
From this perspective, a list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things.
Complexity is built by using lists as items of lists.
Now that we’re comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.
The notation of complex lists reflects nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always display lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.
Following is a list of depth 2 that has three items, the first two being atoms and the last a list.
L1: (1;2;(100;200)) count L1 3
Following is the simplified notation for the inner list,
L1 (1;2;100 200)
The general notation makes the levels of nesting explicit with parentheses, whereas the simplified notation reduces notational density. This is made more explicit by the function show,
show L1 1 2 100 200
Following is a list of depth two having two elements, each of which is a simple list,
L2:((1;2;3);(`ab;`c)) L2 (1 2 3; `ab`c) count L2 2 show L2 1 2 3 `ab`c
Following is a list of depth two having three elements, each of which is a general list,
L3:((1;2h;3j);("a";`bc);(1.23;4.56e)) L3 ((1;2h;3j);("a";`bc);(1.23;4.56e)) count L3 3 show L3 (1;2h;3j) ("a";`bc) (1.23;4.56e)
Following is a list of depth 2 having one item that is a simple list,
L4:enlist 1 2 3 4 count L4 1 L4 1 2 3 4 show L4 1 2 3 4
Following is list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.
L5:(1;(100;200;(1000;2000;3000;4000))) L5 (1;(100;200;1000 2000 3000 4000)) count L5 2 count L5 3 show L5 1 (100;200;1000 2000 3000 4000)
Following is a “rectangular” list that can be thought of as a 3x4 matrix
m:((11;12;13;14);(21;22;23;24);(31;32;33;34)) m (11 12 13 14;21 22 23 24;31 32 33 34) show m 11 12 13 14 21 22 23 24 31 32 33 34
Lists from Variables
Lists can be created from variables.
L1 (1;2;100 200) L2 (1 2 3; `ab`c) L6:(L1;L2) L6 ((1;2;100 200); (1 2 3; `ab`c))
Indexing at Depth
It is possible to index into the items of a nested list.
Repeated Item Indexing
Retrieving an item via a single index always retrieves an uppermost item from a nested list.
L:(1;(100;200;(1000;2000;3000;4000))) L 1 L (100;200;1000 2000 3000 4000)
Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,
- Retrieve the item at index 1 from L
Alternatively, reading it functionally as left-of-right,
- Retrieve from L the item at index 1
Since the result L is itself a list, we can retrieve its elements using a single index as well,
L 1000 2000 300
Read this as:
- Retrieve the item at index 2 from the item at index 1 in L
- Retrieve the item at index 1 from L, and from it retrieve the item at index 2
We can repeat single indexing once more to retrieve an item from the innermost nested list.
Read this as,
- Retrieve the item from index 0 from the item at index 2 in the item at index
- 1 in L
- Retrieve the item at index 1 from L, and from it retrieve the item at index
- 2, and from it retrieve the item at index 0
Notation for Indexing at Depth
There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,
Retrieving inner items for a nested list with this notation is called indexing at depth.
Assignment via index also works at depth.
L:(1;(100;200;(1000 2000 3000 4000))) L[1;2;0]:999 L (1;(100;200;(999 2000 3000 4000)))
To verify that the notation for indexing at depth is reasonable, we return to our matrix example,
m:((11;12;13;14);(21;22;23;24);(31;32;33;34)) m[0;2] 13 m 13
The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.
A list of positions can be used to index a list.
Retrieving Multiple Items
In this section, we begin to see the power of q for manipulating lists. We start with,
L1:100 200 300 400
We know how to index single items of the list
L1 100 L1 300
By extension, we can retrieve a list of multiple items via multiple indices,
L1[0 2] 100 300
The indices can be in any order, and the corresponding items are retrieved,
L1[3 2 0 1] 400 300 100 200
An index can be repeated,
L1[0 2 0] 100 300 100
Some more examples,
bits:01101011b bits[0 2 4] 011b chars:"beeblebrox" chars[0 8] "bo"
This explains why including the semi-colon separators are essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
Indexing via a Simple List
You have no doubt noticed that retrieving items via multiple indices looks just like we’ve substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,
I:3 2 0 L1[I] 400 300 100 L2:(-10.0;3.1415e;1b;`abc;"z") L2[I] (`abc;1b;-10.0) L3:(1;(100;200;(1000;2000;3000;4000));5;(600 700)) L3 (1; (100 200;(1000 2000 3000 4000));5;600 700) I:2 1 0 L3[I] (5;(100 200;(1000 2000 3000 4000));1)
Indexing via a General List
Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as the index list. In particular, the retrieved list has the same shape as the index list. This suggests the behavior with an index that is a non-simple list.
L1:100 200 300 400 L1[0 1; 2 3] (100 200; 300 400) I:(1;(0;(3 2))) L1[I] (200;(100;400 300))
To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.
More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
Assignment with List Indexing
Recall that a list item can be assigned via item indexing,
L:100 200 300 400 L:1000 L 1000 200 300 400
Assignment via index extends to indexing via a simple list.
L[1 2 3]:2000 3000 4000 L 100 2000 3000 4000
L[3 2 1]:999 888 777
is equivalent to,
L:999 L:888 L:777
Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.
L:100 200 300 400 L[0 1 0 3]:1000 2000 3000 4000 L 3000 2000 300 4000
You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.
L:100 200 300 400 L[1 3]:999 L 100 999 300 999
Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.
L:100 200 300 400 L 100 L 0 100 L[2 1] 300 200 L 2 1 300 200 I:2 1 L[I] 300 200 L I 300 200
Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.
The dyadic primitive find (?) returns the index of the right operand in the left operand list.
1001 1002 1003?1002 1
Performing find on a list is the inverse to positional indexing because it maps an item to its position.
If you try to find an item that is not in the list, the result is an int equal to the count of the list.
1001 1002 1003?1004 3
The way to think of this result is that the position of an item that is not in the list is one past the end of the list.
Of course, find extends to lists of items.
1001 1002 1003?1003 1001 2 0
Eliding Indices for a Matrix List
We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.
m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)
Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works,
m[1;] 100 200 300 400 m[;3] 4 400 4000
Observe that eliding the last index reduces to item indexing at the top level,
m[1;] 100 200 300 400 m 100 200 300 400
The situation of eliding other than the first index is more interesting. The way to read m[;3] above is
- Retrieve the items in the third position from all items at the top level of m
Eliding Indices for a General List
Let's tackle another level of nesting.
L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the")) L[;1;] (4 5 6 7;`z`y`x`w`v;"is") L[;;2] (3 6;`c`x`2;"w em")
Interpret L[;1;] as
- Retrieve all items in the second position of each list at the top level
Interpret L[;;2] as
- Retrieve the items in the third position for each list at the second level
Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w em" of the result.
As the final exam for this section, let’s combine an elided index with indexing by simple arrays. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices,
L[0 2;;0 1] ((1 2;4 5);("no";"is";"th"))
Rectangular Lists and Matrices
In this section, we further investigate the lists that looked like matrices in the previous section. A rectangular list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but is not a matrix,
L:(1 2 3; (10 20; 100 200; 1000 2000))
In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval,
r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400)) r[0;] `a`b`c r[;1] (`b;10 20 30 40)
A rectangular list can be transposed with flip, meaning that that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is
flip L ((1;10 20);(2;100 200);(3;1000 2000))
Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called it size. In some contexts, a one-dimensional matrix is called a vector, its count length, and an atom is a scalar. Some examples,
v1:1 2 3 v2:98.60 99.72 100.34 101.93 v3:`so`long`and`thanks`for`all`the`fish
For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1 all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.
Two and Three Dimensional Matrices
Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the ith row of m can be obtained via item indexing as m[i]. Equivalently, we can use an elided index with indexing at depth to obtain the ith row as m[i;].
By laying out the rows of m in tabular form, we realize that the list m[;j] is the jth column of m. Note that the expressions m[i][j] and m[i;j] both retrieve the same item—namely, the element in row i and column j.
Following is an example of a two dimensional matrix of int having size 4x3,
m:(1 2 3;10 20 30;100 200 300;1000 2000 3000) m 1 2 3 m[0;] 1 2 3 m[;2] 3 30 300 3000 m 3 m[0;2] 3
The specification of m demonstrates that our approach to matrix definition treats m as a collection of rows—i.e., m is in row order. Since each row is a simple list, the elements of a row are in fact stored in contiguous memory. This makes retrieval of an entire row very fast, but retrieval of a column will be slower since its elements are not contiguous. This choice was made so that list indexing would result in the conventional matrix notation.
It is equally valid to consider a one-dimensional array as a column and a two dimensional array as a collection of column vectors. This would make column retrieval very fast, but indexing order would be transposed from conventional notation. As we shall see, a table is in fact a collection of columns that are notationally transposed for convenience. Since the constraints and calculations of q-sql operate on column vectors, they are fast. In particular, a simple time series can be represented by two parallel ordered columns, one holding the datetimes and the second holding the associated values. Retrieving and manipulating the points stored in time sequence is faster by orders of magnitude than performing the same operations in an RDBMS that stores data by row with undefined row order.
For completeness, here is an example of a three dimensional 2x3x3 matrix—i.e., each item of mm is a 3x3 matrix,
mm:((1 2 3;4 5 6;7 8 9);(10 20 30; 40 50 60; 70 80 90)) mm (1 2 3;4 5 6;7 8 9) mm[1;2] 70 80 90 mm[1;;2] 30 60 90
We have seen that matrices in q look and act like their mathematical counterparts. However, they have additional features not available in simple mathematical notation or in many verbose languages. First, we have seen that a matrix can be viewed and manipulated both as a multi-dimensional array (i.e., indexing at depth) and as an array of arrays (repeated item indexing). In addition, we can extend individual item indexing with indexing via a simple list. With m as above
m[0 2] (1 2 3;100 200 300)
©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.