QforMortals/dictionaries
Contents |
Dictionaries
Overview
Dictionaries are a generalization of lists and provide the foundation for tables. A dictionary is a (mathematical) mapping defined by an explicit I/O association between a domain list and range list. The two lists must have the same count. While general lists can be used to create a dictionary, the most useful dictionaries involve lists of special forms. The domain list should be a unique collection, and is frequently a collection of symbols representing names. As we shall see, a dictionary that satisfies these two requirements and whose range is a list of simple lists corresponds to a SQL table.
Dictionary Basics
A dictionary is an ordered collection of key-value pairs that is roughly equivalent to a hashtable in verbose languages.
Definition
A dictionary, also called an association, is a mapping defined by an explicit I/O association between a domain list and a range list via positional correspondence. The creation of a dictionary uses the "xkey" primitive (!)
ListOfDomain ! ListOfRange
Recall from the Mathematical Functions Refresher that we can view a map's I/O table as a pair of input and output columns. Dictionary notation is simply the map's I/O table turned on its side for ease of entry and compactness of display.
The domain list comprises the keys of the dictionary and the range list its values. The keys of a dictionary are retrieved by the unary primitive key and the values by the unary primitive value. The count of the dictionary is the (common) count of its keys and values.
Although q does not enforce the requirement that the key items are unique, a dictionary does provide a unique output value for each input value, thus guaranteeing a well-defined mathematical map. See below for details.
The most basic dictionary maps a simple list to a simple list. The following I/O table represents a mapping of three symbols containing names to the corresponding individual's intelligent quotient,
I | O |
`Dent | 42 |
`Beeblebrox | 98 |
`Prefect | 126 |
This mapping is defined compactly as a dictionary,
d1:`Dent`Beeblebrox`Prefect!42 98 126 count d1 3 key d1 `Dent`Beeblebrox`Prefect value d1 42 98 126
The function cols also returns the domain of d1,
cols d1 `Dent`Beeblebrox`Prefect
The order of the items in the domain and range lists is significant, just as for lists. Although the I/O assignments and the associated mappings are the same regardless of order, differently ordered dictionaries are not considered identical.
d11:`Prefect`Beeblebrox`Dent!126 98 42 d1~d11 0b
Lookup
Finding the dictionary output corresponding to an input value is called looking up the input. This actually is achieved via a hash-table lookup under the covers. As for functions and lists, both d[x] and d x lookup the output value for x.
d1[`Beeblebrox] 98 d1 `Beeblebrox 98
As with item indexing, lookup of a key not in the domain of a dictionary results in a null value, not an error.
d1[`Slaartibartfast] 0N
As with lists and functions, key lookup in a dictionary is extended item-wise to a simple list of keys.
d1[`Dent`Prefect] 42 126
{{{1}}}
d[K][j] = d[K[j]]
Using one of our examples,
d:`Dent`Beeblebrox`Prefect!42 98 126 K:`Dent`Prefect d[K][1] 126 d[K[1]] 126
Dictionary vs. List
A dictionary is not a list. A dictionary is a generalization of a list in which item indexing has been extended to a non-integral domain. Although a dictionary domain is explicitly specified in an ordered list, a dictionary cannot be indexed implicitly via position. Attempting this on any dictionary generates an error.
d:"abcde"!1.1 2.2 3.3 4.4 6.5 d["c"] 3.3 d[0] `type
We can define a dictionary whose lookup emulates the mapping of list item indexing.
L3:`one`two`three L3[1] `two d3:0 1 2!`one`two`three d3[1] `two
When we ask q to compare the two entities for equality, it obliges by considering the list as a dictionary with implicit domain. It then tests item-wise.
L3=d3 0 1 2!111b
However, a dictionary so-specified is not the same as the list.
L3~d3 0b
Although retrieving items from a list-like dictionary is notationally identical to item indexing, it is not the same. Item indexing is positional, whereas dictionary retrieval is a lookup. They are implemented differently under the covers.
Uniqueness of Keys
We noted earlier that q does not enforce uniqueness in a dictionary domain list. In the event of a repeated domain item, only the output value associated with the first occurrence in left-to-right order is accessible via lookup. This guarantees that a dictionary provides a unique output for each input value and is thus a well-defined mathematical map. For example,
ddup:8 4 8 2 3 1!`one`two`three`four`five`six ddup[8] `one
Observe that the range values of a dictionary are not required to be atoms. The range can be a general list that contains nested lists,
dgv:(1;2h;3.3;"4")!(`one;2 3;"456";(7;8 9)) dgv["4"] (7;8 9)
Nor are keys are required to be atoms,
dgk:(0 1; 2 3)!`first`second dgk[0 1] `first dgk[2 3] `second
If the keys are not a list of items of uniform shape, lookup does not work in a useful way,
dweird:(0 1; 2; 3)!`first`second`third dweird[0 1] `first dweird[2] dweird[3]
The observed behavior is that key lookup fails at the first key of different shape.
Operations on Dictionaries
Amend
As with lists, the items of a dictionary can be modified via indexed assignment.
d:10 20 30!"abc" d[30]:"x" d 10 20 30!"abx"
In contrast to lists, dictionaries can be extended via index assignment. For example,
d[40]:"y" d 10 20 30 40!"abxy" L:"abc" L[3]:"x" 'length
Let's examine this capability to modify or extend a list via index assignment more closely. Let d be a dictionary, c be an atom whose type matches the domain of d, and x an item whose type is compatible with the range of d. The assignment,
d[c]:x
updates the existing range value if c is in the domain of d, but inserts a new entry at the end of the dictionary if c is not in the domain of d.
This insert/update behavior is called upsert semantics. Because tables are essentially dictionaries, upsert semantics carry through to tables.
Reverse Lookup with Find (?)
Recall that the dyadic primitive find (?) returns the index of the right operand in a list,
1001 1002 1003?1002 1
Extending this concept to dictionaries means reversing the domain-to-range mapping. We expect ? to perform reverse lookup by mapping a range element to its domain element.
d:`a`b`c!1001 1002 1003 d?1002 `b
The result of find on an entity not in the range is a null whose type matches the domain list. For simple lists, the null matches the type of the list; for general lists, the null is 0N.
d?1004 / the result is the null symbol ` ` dg:(1;`a;"z")!10 20 30 dg?50 0N
For a non-unique range element, find returns the first item mapping to it from the domain list.
d:`a`b`c`d!1001 1002 1003 1002 d?1002 `b
Removing Entries
To remove a single entry from a dictionary by key value, you can use the binary operation delete (_). The left operand of delete is the dictionary (target) and the right operand is a key value whose type matches that of target.
Whitespace is required on both sides of _.
For example,
d:1 2 3!`a`b`c d _ 2 1 3!`a`c d _ 42 1 2 3!`a`b`c
Observe that attempting to remove a key that does not exist has no effect. Also, deleting all the elements in a dictionary leaves a dictionary with empty domain and range lists of the appropriate types.
((d _ 1) _ 2) _ 3 (`int$())!`symbol$()
It is possible to remove multiple entries from a dictionary using the binary cut, also denoted by an underscore (_) . The left operand of cut is a list of key values whose type matches that of the dictionary and the right operand is the dictionary (target). The result is a dictionary obtained by removing the specified key-value pairs from target.
Whitespace is required on both sides of _.
Since the left operand is required to be a list, a single key value must be enlisted.
For example,
d:1 2 3!`a`b`c (enlist 2) _ d 1 3!`a`c 1 3 _ d (,2)!,`b (enlist 42) _ d 1 2 3!`a`b`c 1 2 3 _ d (`int$())!`symbol$()
As with delete, attempting to remove a key that does not exist has no effect and deleting all the elements in a dictionary leaves a dictionary with empty domain and range of the appropriate types.
You can spell out the verb _ as cut. This arguably makes things more readable.
(enlist 2) cut d 1 3!`a`c
Primitive Operations
Because dictionaries are maps, it is possible to compose their mappings with function mappings to perform operations on dictionaries. Of course, this assumes that the range of each dictionary is in the domain of the indicated operation, so that the operation makes sense.
d1:`a`b`c!1 2 3 2*d1 `a`b`c!2 4 6 d1=2 `a`b`c!010b
When the domains of two dictionaries are identical, performing binary operations is easy. For example, to add two dictionaries with a common domain, add their corresponding range elements,
d2:`a`b`c!10 20 30 d1+d2 `a`b`c!11 22 33
How do we combine two dictionaries whose domains are not identical? First, the domain of the resulting dictionary is the union of the domains of its operands. For items in the intersection of the domain lists, clearly we should simply apply the indicated operation on the corresponding range items.
The real question is what to do on non-common domain items? The answer: do what makes sense for the operation. We start with joining two dictionaries.
Join
In the simple case of joining two disjoint dictionaries, the result should be the merge.
d3:`e`f`g!100 200 300 d1,d3 `a`b`c`e`f`g!1 2 3 100 200 300 d3,d1 `e`f`g`a`b`c!100 200 300 1 2 3
Observe that although the mappings arising from opposite-order joins have equivalent input-output assignments, the dictionaries are not identical because order is significant.
We examine another simple example of joining dictionaries with a special form. The particular dictionaries map symbols to lists of simple lists. We are interested in merging when the two are disjoint. For example,
dc1:`a`b!(1 2 3; 10 20 40) dc2:(enlist `c)!enlist 10 20 30 dc1,dc2 `a`b`c!(1 2 3;10 20 40;10 20 30)
Observe that the join simply appends the domains and ranges in the obvious way.
Now we tackle the case of non-disjoint dictionaries. The issue is how to merge items that are common to both dictionary domains, since these elements each have two I/O assignments.
In a join of dictionaries, the right operand's I/O assignment prevails for common domain elements.
The result is another illustration of upsert semantics. Each I/O assignment of the right operand is applied as an update if the domain element is assigned in the left operand, or as an insert if the domain element is not already assigned.
With d1 as above,
d3:`c`d!33 44 d1,d3 `a`b!`c`d!!1 2 33 44
Observe that upsert is not commutative, even over a common domain. Join order matters.
d4:`a`b`c!300 400 500 d1,d4 `a`b`c!300 400 500 d4,d1 `a`b`c!1 2 3
Arithmetic Operations
Now that we understand how to join two dictionaries, we examine other operations. When arithmetic and comparison operations are performed on dictionaries, the indicated operation is performed on the common domain elements and the dictionaries are merged elsewhere,
d5:`c`x`y!1000 2000 3000 d1+d5 `a`b`c`x`y!1 2 1003 2000 3000 d1*d5 `a`b`c`x`y!1 2 3000 2000 3000 d1|d5 `a`b`c`x`y!1 2 1000 2000 3000
When a relational operation is performed on two dictionaries, the indicated operation is performed over the entire union domain. Effectively, each dictionary is extended to the union domain with (type-matched) nulls. Otherwise put, for non-common domain items, the operation is performed on a pair of items in which a null whose type matches the provided range item is substituted for the missing range item.
In the following examples, observe that operations on d1 and d6 are equivalent to the corresponding operations on d11 and d66,
d1:`a`b`c!1 2 3 d6:`b`c`d`e!22 3 44 55 d1=d6 `a`b`c`d`e!00100b d1 < d6 `a`b`c`d`e!01011b d6 < d1 `b`c`d`e`a!00001b d1 > d6 `b`c`d`e`a!00001b d11:`a`b`c`d`e!1 2 3 0N 0N d66:`a`b`c`d`e!0N 22 3 44 55 d11=d66 `a`b`c`d`e!00100b d11 < d66 `a`b`c`d`e!01011b d66 < d11 `b`c`d`e`a!00001b d11 > d66 `b`c`d`e`a!00001b
The > operation is evidently converted to the equivalent < operation with reversed operands.
Column Dictionaries
Column dictionaries are the foundation for tables.
Definition and Terminology
A very useful type of dictionary is one that maps a list of symbols to a rectangular list of lists. Such a dictionary has the form,
c_{1}...c_{n}!(v_{1};...;v_{n})
where each c_{i} is a symbol, each v_{i} is a list with (common) count. This dictionary associates the symbol c_{i} with the vector of values v_{i}.
Interpreting each symbol as a column name and the corresponding vector as the column values, we call such a list a column dictionary. The type of column named by c_{i} is the type of its value vector v_{i}. For many column dictionaries, all the vectors v_{i} are simple lists, meaning that each column comprises atoms of a uniform type. We call this a simple column dictionary.
Example
Let's reorganize the example of the previous section as a simple column dictionary,
scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
In this dictionary, the values for the name column are,
scores[`name] `Dent`Beeblebrox`Prefect
It is possible to retrieve the values for a column using dot notation,
scores.name `Dent`Beeblebrox`Prefect
The value in row 1 of the name column is,
scores[`name][1] `Beeblebrox
Similarly, the value in row 2 of the iq column is,
scores[`iq][2] 126
Accessing Values
For a general column dictionary defined as,
dcols:c1...cn!(v1;...;vn)
the i^{th} element of column cj is retrieved by,
dcols[cj][i]
What should we make of the following notation?
dcols[cj;i]
We can interpret it in three ways:
- Indexing at depth in the dictionary
- 2. A generalization of a two dimensional matrix in which item indexing in the first dimension has become lookup into the list of column names
- 3. A dyadic mapping
All interpretations are all equivalent and give the same result
dcols[cj][i]
In our example,
scores[`iq; 2] 126
Rows and Columns
Viewing the dictionary as a dyadic function, we can project onto its first argument by fixing it to obtain the monadic function—i.e., dcols[cj;]. This projected form yields item indexing into the column vector.
In simple terms, projecting onto the first argument retrieves a vector of column values from a column dictionary,
scores[`iq;] 42 98 126
Analogously, we would expect projection onto the second argument to retrieve a "row" corresponding to the values in the i^{th} position of each column vector. What form does such a row take?
Observe that the projection of the dyadic function onto its second argument by fixing the item index,
dcols[;i]
is a monadic function corresponding to generalized indexing by column name—i.e., dictionary lookup. Thus, we expect the i^{th} row to be a dictionary that maps each column name to the value in that column's i^{th} row.
This is exactly what we find,
scores[;2] `name`iq!(`prefect;126)
Notational differences aside, this resembles the result of retrieving a record from a table using a SQL query: we get the column names and the associated row values.
A column dictionary seems to be the perfect data structure to serve as the basis for a table: a generalized matrix with indexed rows and named columns. But you no doubt notice the fly in the ointment... the indices are in the wrong order. It is unnatural to retrieve a column in the first index and a row in the second.
Flipping a Dictionary
Transpose of a Column Dictionary
A a column dictionary can be viewed as a generalized rectangular matrix. Let d be a column dictionary defined as,
d:c1...cn!(v1;...;vn)
where ci is a symbol and the vi have common count, say m. We can index at depth into d for each ci and each j,
d[ci;j] = vi[j]
All the vi have count m. By analogy with matrices, it makes sense to define the transpose t of d by the formula,
t[j;ci] = d[ci;j]
Exactly what is t that is defined by this formula? The answer comes from realizing that indexing at depth into t should be the same as repeated indexing,
t[j;ci] = t[j][ci]
The right hand side of this equation makes explicit that t is a list of n items t[j], for 0<=j<n.
What is each item in the list t? Combining the three previous equations, we see that that
t[j][ci] = vi[j]
Now fix j in this equation. We see that t[j] is a dictionary with the same domain as d, meaning the list of ci. The dictionary assigns to each item ci the output value vi[j]. Thus, the range of the dictionary is the collection of values v1[1],...,vn[j].
We summarize our findings,
- The transpose of a column dictionary is a list of dictionaries. The dictionaries in the transpose have as common domain the column names of the original dictionary. The dictionary in the j^{th} item of the transpose maps the column names to the j^{th} row of values across the column vectors.
flip on a Column Dictionary
As in the case of lists, the transpose of a dictionary is obtained by applying the unary flip operator,
flip d
When flip is applied to a column dictionary, no data is actually rearranged. The console display confirms this with a + prepended to the dictionary, indicating that it has been transposed.
d:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126) flip d +`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126) show flip d name iq -------------- Dent 42 Beeblebrox 98 Prefect 126
The net effect of flipping a column dictionary is simply reversing the order of the indices. This is logically equivalent to transposing rows and columns.
Flip of a Flipped Column Dictionary
If you transpose a dictionary twice, you obtain the original dictionary,
d~flip flip d 1b
Consequently, if you are given the transpose of a column dictionary and you flip it, you obtain a column dictionary.
t:flip d flip t `name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
As of this writing (Oct 2006), flip has been implemented in q for dictionaries of columns, although the operation makes sense for any rectangular dictionary. In the event that flip is implemented for a general rectangular dictionary (i.e., any dictionary in which the range is a list of lists all having the same count) we would find the following:
- The transpose of a rectangular dictionary is a list of dictionaries. The dictionaries in the transpose have a common domain that is the domain of the original dictionary. The j^{th} dictionary of the transpose maps the original domain to the j^{th} row of values across the range list. In this case, data likely will be rearranged.
Prev: Casting and Enumerations, Next: Tables
©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.