Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.
Introduction to Lists
A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let's start with the simplest case in which the list comprises only atoms.
List Definition and Assignment
The notation for a general list encloses its items within matching parentheses and separates them with semicolons. For readability, optional whitespace is used after the semicolon separators in the last example.
(1;2;3) ("a";"b";"c";"d") (`Life;`the;`Universe;`and;`Everything) (-10.0; 3.1415e; 1b; `abc; "z")
In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not atoms of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.
Important: The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.
Lists can be assigned to variables exactly like atoms.
L1:(1;2;3) L2:("z";"a";"p";"h";"o";"d") L3:(`Life;`the;`Universe;`and;`Everything) L4:(0b;1b;0b;1b;1b;0b) L5:(-10.0;3.1415e;1b;`abc;"z")
The number of items in a list is its count. You can obtain the count of a list as follows,
count L1 3
This is our first example of a function, which we will learn about in Functions. For now, we need only understand that count returns an int value equal to the number of items in a list to its right.
Observe that the count of any atom is 1.
count 42 1 count `abcd 1
A simple list - that is, a list of atoms of a uniform type - corresponds to the mathematical notion of a vector. Such lists are treated specially in q. They have a simplified notation, take less storage and compute faster than general lists. Of course, you can use general list notation for a vector, but q converts a general list to a vector whenever feasible.
Simple Integer Lists
A simple list of any numeric type omits the enclosing parentheses and replaces the separating semi-colons with blanks. The following two expressions for a simple list of int are equivalent,
(100;200;300) 100 200 300
This is confirmed by the console display,
(100;200;300) 100 200 300
Similar notation is used for simple lists of short and long with the addition of the type indicator.
H:(1h;2h;255h) H 1 2 255h
We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.
G:(1; 2; 255h) G 1 2 255h
Simple Floating Point Lists
Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(s) to the right of the decimal, but the value is not an int.
F:(123.4567;9876.543;99.0) F 123.4567 9876.543 99
This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.
FF:1.0 2.0 3.0 FF 1 2 3f
Simple Binary Lists
The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value.
bits:(0b;1b;0b;1b;1b) bits 01011b
The indicator for byte leads,
bytes:(0x20;0xa1;0xff) bytes 0x20a1ff
Note: A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above holds its values in 5 bytes of storage.
Simple Symbol Lists
The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.
symbols:(`Life;`the;`Universe;`and;`Everything) symbols `Life`the`Universe`and`Everything
Inserting spaces between the atoms causes an error.
bad:`This `is `wrong 'is
Simple char Lists and Strings
The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed sequence of characters enclosed in double quotes.
chars:("s";"o";" ";"l";"o";"n";"g") chars "so long"
Entering Simple Lists
Lists can be defined using simplified notation,
L:100 200 300 H:1 2 255h F:123.4567 9876.543 99.99 bits:01011b bytes:0x20a1ff symbols:`Life`the`Universe`and`Everything chars:"so long"
Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.
1 2.0 3 1 2 3f
Lists of Temporal Data
Specifying a list of mixed temporal types has a different behavior from that of a list of mixed numeric types. In this case, the list takes the type of the first item in the list; other items are widened or narrowed to match.
12:34 01:02:03 12:34 01:02 01:02:03 12:34 01:02:03 12:34:00
To force the type of a mixed list of temporal values, append a type specifier.
01:02:03 12:34 11:59:59.999u 01:02 12:34 11:59
Empty and Singleton Lists
Lists with one or no items merit special consideration.
The General Empty List
It is useful to have lists with no items. A pair of parentheses with nothing (except possibly whitespace) between denotes the empty list.
L:( ) L -
We shall see in Creating Typed Empty Lists that it is possible to define an empty list with a specific type.
Lists with a Single Item
There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,
42 1b 0x2a `beeblebrox "z"
We also recognize the following are all lists with two elements,
(42;6) 01b `zaphod`beeblebrox "zb" (40;`two)
How to create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn't work since the result is an atom.
singleton:(42) singleton 42
The reason for this is that parentheses are used for multiple purposes in q. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions - that is, to isolate the result of the expression inside the parentheses. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.
The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.
singleton:enlist 42 singleton ,42
- To distinguish between an atom and the equivalent singleton, examine the sign of their types.
signum type 42 -1 signum type enlist 42 1
As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,
singleton:enlist 1 2 3 count singleton 1
Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item is has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.
Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,
L:(-10.0;3.1415e;1b;`abc;"z") L -10f L 3.1415e L 1b L `abc L "z"
Items in a list can also be assigned via item indexing. Thus,
L1:1 2 3 L1:42 L1 1 2 42
Important: Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.
L:100 200 300 L:42h 'type f:100.0 200.0 300.0 f 100 200 300f f:400 'type
This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.
Providing an invalid data type for the index results in an error.
L:(-10.0;3.1415e;1b;`abc;"z") L[`1] 'type
If you attempt to index outside of the bounds of the list, the result is not an error. Rather, you get a null value. If the list is simple, this is the null for the type of atoms in the list. For general lists, the result is 0n.
One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you're not used to indexing relative to 0.
Empty Index and Null Item
An empty index returns the entire list.
L -10f 3.1415e 1b `abc "z"
Note: An empty index is not the same as indexing with an empty list. The latter returns an empty list.
The syntactic form double-colon ( :: ) denotes the null item, which allows explicit notation or programmatic generation of an empty index.
L[::] -10f 3.1415e 1b `abc "z"
Advanced: The type of the null item is undefined; in particular, its type does not match that of any normal item in a list. As a consequence, inclusion of the null item in a list forces the list to be general.
L:(1;2;3;::) L 1 2 3 :: type L 0h
This can be used to avoid a nasty surprise when q is too clever. To see how, consider the general list,
L:(1;2;3;`a) type L 0h
Now, reassign the last item to an int and note what happens to the list.
L:4 L 1 2 3 4 type L 6h
The list has been converted to a simple list of int! A subsequent attempt to reassign the last item back to its original value fails with a type error.
This can be circumvented by placing a null item in the list, forcing it to remain general.
L:(1;2;3;`a;::) L:4 L 1 2 3 4 :: type L 0h L:`a L 1 2 3 `a ::
Lists from Variables
Lists can be created from variables.
L1:(1;2;100 200) L2:(1 2 3;‘ab`c) L6:(L1;L2) L6 1 2 100 200 1 2 3 `ab `c
We scoop our presentation on operations in the next chapter to describe an important operation on lists. Probably the most common operation on two lists is to join them together to form a larger list. More precisely, the join oerator (,) appends its right operand to the end of the left operand and returns the result. It accepts an atom in either argument.
1 2,3 4 5 1 2 3 4 5 1,2 3 4 1 2 3 4 1 2 3,4 1 2 3 4
Observe that if the arguments are not of uniform type, the result is a general list.
1 2 3,4.4 5.5 1 2 3 4.4 5.5 1 2 3,"ab" 1 2 3 "a" "b"
which always yields a list with the content of x.
Lists as Maps
Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[ i] to the input value i. Succinctly, the I/O association for the list is,
i ——> L[ i]
Here are the I/O tables for some basic lists:
101 102 103 104
(`a; 123.45; 1b)
(1 2; 3 4)
The first two examples demonstrate ranges of a collection of atoms. The last example has a range comprised of lists.
A list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things. We shall see in Primitive Operations that a nested list can be viewed as a multivalent map whose range is atoms.
From the perspective of list as map, the fact that indexing outside the bounds of a list returns null means the map is implicitly extended to the domain of all integers with null values outside the list items.
Data complexity is built by using lists as items of lists.
Now that we're comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.
The notation of complex lists reflects their nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always display lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.
Following is a list of depth 2 that has three items, the first two being atoms and the last a list.
L1: (1;2;(100;200)) count L1 3
Following is the simplified notation for the inner list,
L1:(1;2;100 200) L1 1 2 100 200
We present a pictorial representation that may help in visualizing levels of nesting. An atom is represented as a circle containing its value. A list is represented as a box containing its items. A general list is a box containing boxes and atoms.
Following is a list of depth two having two elements, each of which is a simple list,
L2:((1;2;3);(`ab;`c)) L2 1 2 3 `ab`c count L2 2
Following is a list of depth two having three elements, each of which is a general list,
L3:((1;2h;3j);("a";`bc);(1.23;4.56e)) L3 (1;2h;3j) ("a";`bc) (1.23;4.55999994278e) count L3 3
Following is a list of depth two having one item that is a simple list,
L4:enlist 1 2 3 4 L4 1 2 3 4 count L4 1 L4 1 2 3 4
Following is list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.
L5:(1;(100;200;(1000;2000;3000;4000))) L5 1 (100;200;1000 2000 3000 4000) count L5 2 count L5 3
Following is a "rectangular" list that can be thought of as a 3x4 matrix,
m:((11;12;13;14);(21;22;23;24);(31;32;33;34)) m 11 12 13 14 21 22 23 24 31 32 33 34
Indexing at Depth
It is possible to index directly into the items of a nested list.
Repeated Item Indexing
Retrieving an item via a single index always retrieves an uppermost item from a nested list.
L:(1;(100;200;(1000;2000;3000;4000))) L 1 L 100 200 1000 2000 3000 4000
Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,
- Retrieve the item at index 1 from L
Alternatively, reading it functionally as left-of-right,
- Retrieve from L the item at index 1
Since the result L is itself a list, we can retrieve its elements using a single index.
L 1000 2000 3000 4000
Read this as:
- Retrieve the item at index 2 from the item at index 1 in L
- Retrieve the item at index 1 from L, and from it retrieve the item at index 2
We can repeat single indexing once more to retrieve an item from the innermost nested list.
Read this as,
- Retrieve the item from index 0 from the item at index 2 in the item at index 1 in L
- Retrieve the item at index 1 from L, and from it retrieve the item at index 2, and from it retrieve the item at index 0
Notation for Indexing at Depth
There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,
Retrieving inner items for a nested list with this notation is called indexing at depth.
Assignment via index also works at depth.
L:(1;(100;200;(1000 2000 3000 4000))) L[1;2;0]:999 L 1 (100;200; 999 2000 3000 4000)
To verify that the notation for indexing at depth is reasonable, we return to our matrix example,
m:((11;12;13;14);(21;22;23;24);(31;32;33;34)) m[0;2] 13 m 13
The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.
A list of positions can be used to index a list.
Retrieving Multiple Items
In this section, we begin to see the power of q for manipulating lists. We start with,
L1:100 200 300 400
We know how to index single items of the list
L1 100 L1 300
By extension, we can retrieve a list of multiple items via multiple indices,
L1[0 2] 100 300
The indices can be in any order, and the corresponding items are retrieved,
L1[3 2 0 1] 400 300 100 200
An index can be repeated,
L1[0 2 0] 100 300 100
Some more examples,
bits:01101011b bits[0 2 4] 011b chars:"beeblebrox" chars[0 7 8] "bro"
This explains why including the semi-colon separators is essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
Indexing via a Simple List
You have no doubt noticed that retrieving items via multiple indices looks just like we've substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,
I:3 2 0 L1[I] 400 300 100 L2:(-10.0;3.1415e;1b;`abc;"z") L2[I] `abc 1b -10f L3:(1;(100;200;(1000;2000;3000;4000));5;(600 700)) L3 1 (100 200; 1000 2000 3000 4000) 5 600 700 J:2 1 0 L3[J] 5 (100 200; 1000 2000 3000 4000) 1
Indexing via a General List
Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as the index list. In particular, the retrieved list has the same shape as the index list. This suggests the behavior with an index that is a non-simple list.
L1:100 200 300 400 L1[(0 1; 2 3)] 100 200 300 400 I:(1;(0;(3 2))) L1[I] 200 (100;400 300)
To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.
Advanced: More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
Assignment with List Indexing
Recall that a list item can be assigned via item indexing,
L:100 200 300 400 L:1000 L 1000 200 300 400
Assignment via index extends to indexing via a simple list.
L:100 200 300 400 L[1 2 3]:2000 3000 4000 L 100 2000 3000 4000
Note: Assignment via a simple index list is processed in index order - i.e., from left-to-right. Thus,
L[3 2 1]:999 888 777
is equivalent to,
L:999 L:888 L:777
Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.
L:100 200 300 400 L[0 1 0 3]:1000 2000 3000 4000 L 3000 2000 300 4000
You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.
L:100 200 300 400 L[1 3]:999 L 100 999 300 999
Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.
L:100 200 300 400 L 100 L 0 100 L[2 1] 300 200 L 2 1 300 200 I:2 1 L[I] 300 200 L I 300 200 L[::] 100 200 300 400 L :: 100 200 300 400
Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.
The dyadic primitive find ( ? ) returns the index of the right operand in the left operand list.
1001 1002 1003?1002 1
Performing find on a list is the inverse to positional indexing because it maps an item to its position.
If you try to find an item that is not in the list, the result is an int equal to the count of the list.
1001 1002 1003?1004 3
The way to think of this result is that the position of an item that is not in the list is one past the end of the list, which is where it would be if you were to append it to the list.
Of course, find extends to lists of items.
1001 1002 1003?1003 1001 2 0
Eliding Indices for a Matrix List
We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.
m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)
Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works.
m[1;] 100 200 300 400 m[;3] 4 400 4000
Observe that eliding the last index reduces to item indexing at the top level.
m[1;] 100 200 300 400 m 100 200 300 400
Note: In the previous example, the two syntactic forms have the same result, but the first more clearly connotes the situation.
The situation of eliding other than the first index is more interesting. The way to read m[;3] above is,
- Retrieve the items in the third position from all items at the top level of m
Eliding Indices for a General List
Let's tackle another level of nesting.
L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the")) L (1 2 3;4 5 6 7) (`a`b`c`d;`z`y`x`;`0`1`2) ("now";"is";"the") L[;1;] 4 5 6 7 `z`y`x` "is" L[;;2] 3 6 `c`x`2 "w e"
Interpret L[;1;] as,
- Retrieve all items in the second position of each list at the top level
Interpret L[;;2] as,
- Retrieve the items in the third position for each list at the second level
Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w e" of the result.
Recommendation: In general, it will make things more evident if you do not omit trailing semi-colons when eliding indices. For example, with L as above,
L[ ;;] / instead of L L[1;;] / instead of L L[;1;] / instead of L[;1]
As the final exam for this section, let's combine an elided index with indexing by simple arrays. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices.
L[0 2;;0 1] (1 2;4 5) ("no";"is";"th")
Interpret this as,
- Retrieve the items from positions 0 and 1 from all columns in rows 0 and 2
Rectangular Lists and Matrices
In this section, we further investigate the matrix-like lists from the previous section. A "rectangular" list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but is not a matrix.
L:(1 2 3; (10 20; 100 200; 1000 2000)) L 1 2 3 10 20 100 200 1000 2000
In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval.
r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400)) r[0;] `a`b`c r[;1] `b 10 20 30 40
Advanced: A rectangular list can be transposed with flip (see flip), meaning that that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is,
flip L 1 10 20 2 100 200 3 1000 2000
Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called it size. In some contexts, a simple one-dimensional matrix is called a vector, its count length, and an atom is a scalar. Some examples.
v1:1 2 3 v2:98.60 99.72 100.34 101.93 v3:`so`long`and`thanks`for`all`the`fish
For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1 all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.
Two and Three Dimensional Matrices
Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the ith row of m can be obtained via item indexing as m[i]. Equivalently, we can use an elided index with indexing at depth to obtain the ith row as m[i;].
By laying out the rows of m in tabular form, we realize that the list m[;j] is the jth column of m. Note that the expressions m[i][j] and m[i;j] both retrieve the same item - namely, the element in row i and column j.
Following is an example of a two dimensional matrix of int, having size 4x3,
m:(1 2 3;10 20 30;100 200 300;1000 2000 3000) m 1 2 3 m[0;] 1 2 3 m[;2] 3 30 300 3000 m 3 m[0;2] 3
The specification of m demonstrates that our approach to matrix definition treats m as a collection of rows - i.e., m is in row order. Since each row is a simple list, the elements of a row are in fact stored in contiguous memory. This makes retrieval of an entire row very fast, but retrieval of a column will be slower since its elements are not contiguous. This choice was made so that list indexing would result in the conventional matrix notation.
Advanced: It is equally valid to consider a one-dimensional array as a column and a two dimensional array as a collection of column vectors. This would make column retrieval very fast, but index order would be transposed from conventional notation. As we shall see in Tables, a table is in fact a collection of columns that are notationally transposed for convenience. The constraints and calculations of q-sql operate on columns, so they are fast, especially when the columns are vectors (i.e., simple lists). In particular, a simple time series can be represented by two parallel ordered columns, one holding the datetimes and the second holding the associated values. Retrieving and manipulating the points stored in time sequence is faster by orders of magnitude than performing the same operations in an RDBMS that stores data by row with undefined row order.
For completeness, here is an example of a three dimensional 2x3x3 matrix - i.e., each item of mm is a 3x3 matrix,
mm:((1 2 3;4 5 6;7 8 9);(10 20 30; 40 50 60; 70 80 90)) mm 1 2 3 4 5 6 7 8 9 mm[1;2] 70 80 90 mm[1;;2] 30 60 90
We have seen that matrices in q look and act like their mathematical counterparts. However, they have additional features not available in simple mathematical notation or in many verbose languages. We have seen that a matrix can be viewed and manipulated both as a multi-dimensional array (i.e., indexing at depth) and as an array of arrays (repeated item indexing). In addition, we can extend individual item indexing with indexing via a simple list. With m as above,
m[0 2] 1 2 3 100 200 300
©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.