QforMortals2/dictionaries
Contents |
Dictionaries
Overview
Dictionaries are a generalization of lists and provide the foundation for tables. A dictionary is a (mathematical) mapping defined by an explicit I/O association between a domain list and range list. The two lists must have the same count and the domain list should be a unique collection. While general lists can be used to create a dictionary, many useful dictionaries involve lists of special forms. The domain is frequently a collection of symbols representing names. As we shall see, a dictionary whose domain is a unique list of symbols and whose range is rectangular corresponds to a table.
Dictionary Basics
A dictionary is an ordered collection of key-value pair - that is, a hashtable in verbose languages.
Definition
A dictionary, also called an association, is a mapping defined by an explicit I/O association between a domain list and a range list via positional correspondence. The creation of a dictionary uses the xkey primitive ( ! ),
- L_{domain}! L_{range}
Recall from Mathematical Functions Refresher the view of a map's I/O table as a pair of input and output columns. Dictionary notation is simply the map's I/O table turned on its side for ease of entry and compactness of display.
Note: All dictionaries have type 99h.
The domain list comprises the keys of the dictionary and the range list its values. The keys of a dictionary are retrieved by the unary primitive key and the values by the unary primitive value. The count of the dictionary is the (common) count of its keys and values.
Note: Although q does not enforce the requirement that the key items are unique, a dictionary does provide a unique output value for each input value, thus guaranteeing a well-defined mathematical map. See below for details.
The most basic dictionary maps a simple list to a simple list. The following I/O table represents a mapping of three symbols containing names to the corresponding individual's intelligent quotient,
I | O |
`Dent | 98 |
`Beeblebrox | 42 |
`Prefect | 126 |
This mapping is defined compactly as a dictionary.
d1:`Dent`Beeblebrox`Prefect!98 42 126 count d 3 key d `Dent`Beeblebrox`Prefect value d 98 42 126
The console displays a dictionary I/O table in columnar form.
d Dent | 98 Beeblebrox | 42 Prefect | 126
The function cols also returns the domain.
cols d1 `Dent`Beeblebrox`Prefect
Note: The order of the items in the domain and range lists is significant, just as positional order is significant for lists. Although the I/O assignments and the associated mappings are equivalent regardless of order, differently ordered dictionaries are not identical.
d1:`Prefect`Beeblebrox`Dent!126 42 98 d~d1 0b
Lookup
Finding the dictionary output value corresponding to an input value is called looking up the input. This actually is achieved via a hash-table lookup under the covers. Similar to functions and lists, both d[x] and d x lookup the output value for x.
d[`Beeblebrox] 42 d `Beeblebrox 42
As with item indexing, lookup of a key not in the domain of a dictionary results in an appropriately typed null value, not an error.
d[`Slartibartfast] 0N
As with lists and functions, key lookup in a dictionary is extended item-wise to a simple list of keys.
d[`Dent`Prefect] 98 126
Advanced: We can interpret key list lookup as the composition of the key lookup map with the item indexing map. Symbolically, let d be a dictionary and K a key list in the domain of d. Then for 0 ≤j < count K,
d[K][j] = d[K[j]]
Using one of our examples,
d:`Dent`Beeblebrox`Prefect! 98 42 126 K:`Dent`Prefect d[K][1] 126 d[K[1]] 126
Or, using the entire index list,
d K 98 126 d[K] 98 126
Dictionary vs. List
A dictionary a generalization of a list in which item indexing has been extended to a non-integral domain. In particular, a dictionary cannot be indexed implicitly via position. Attempting this on any dictionary generates an error.
d:"abcde"!1.1 2.2 3.3 4.4 6.5 d["c"] 3.3 d[0] `type
We can define a dictionary whose lookup emulates the mapping of list item indexing.
L3:`one`two`three L3[1] `two d3:0 1 2!`one`two`three d3[1] `two
When we ask q to compare the two entities for equality, it obliges by considering both as mappings with integral domain. It then tests the assignments item-wise.
L3=d3 0| 1 1| 1 2| 1
However, the dictionary so-specified is not the same as the list.
L3~d3 0b
Although retrieving items from a list-like dictionary is notationally identical to item indexing, it is not the same. Item indexing is a positional offset, whereas dictionary retrieval is a lookup. They are implemented differently under the covers.
Lookup with Verb @
Recall that indexing into a list can be achieved with verb @.
L:100 200 300 L[1] 200 L@1 200
The same syntax works for dictionary lookup.
d:`a`b`c!10 20 30 d[`b] 20 d@`b 20
Uniqueness of Keys
We noted earlier that q does not enforce uniqueness in a dictionary domain list. In the event of a repeated domain item, only the output value associated with the first occurrence in left-to-right order is accessible via lookup. This guarantees that a dictionary provides a unique output for each input value and is thus a well-defined mathematical map.
For example,
ddup:8 4 8 2 3 1!`one`two`three`four`five`six ddup[8] `one
Advanced: Reverse lookup works properly for a non-unique domain.
ddup?`three 8
Non-simple Domain or Range
The range values of a dictionary are not required to be atoms. The range can be a general list that contains nested lists.
dgv:(1;2h;3.3;"4")!(`one;2 3;"456";(7;8 9)) dgv["4"] 7 8 9
Nor are keys are required to be atoms.
dgk:(0 1; 2 3)!`first`second dgk[0 1] `first dgk[2 3] `second
Advanced: If the keys are not a list of items of uniform shape, lookup does not work in a useful way.
dweird:(0 1; 2; 3)!`first`second`third dweird[0 1] `first dweird[2] ` dweird[3] `
The observed behavior is that key lookup fails at the first key of different shape.
Extracting a Sub-Dictionary by Key
Dictionary lookup on a key or a list of keys returns the associated values. It is also possible to extract the key-value associations using the take operator (#). The left operand is a list of keys, the right operand is the source dictionary and the result is a new dictionary whose mapping is that of the original restricted to the specified keys.
(enlist `c)#d c| 30 `a`c#d a| 10 c| 30
This works when the keys are not simple.
dns:(1 2; 3 4; 5 6)!("onetwo"; "threefour"; "fivesix") (1 2; 5 6)#dns 1 2| "onetwo" 5 6| "fivesix"
Operations on Dictionaries
Amend and Upsert
As with lists, the items of a dictionary can be modified via indexed assignment.
d:10 20 30!"abc" d[30]:"x" d 10| a 20| b 30| x
Important: In contrast to lists, dictionaries can be extended via index assignment. For example,
d[40]:"y" d 10| a 20| b 30| x 40| y L:"abc" L[3]:"x" 'length
Let's examine this capability to modify or extend a dictionary via index assignment more closely. Let d be a dictionary, c be an atom whose type matches the domain of d, and x an item whose type is compatible with the range of d. The assignment,
d[c]:x
updates the existing range value if c is in the domain of d, but inserts a new entry at the end of the dictionary if c is not in the domain of d.
This insert/update behavior is called upsert semantics. Because tables are essentially dictionaries, upsert semantics carry through to tables.
Reverse Lookup with Find (?)
Recall that the dyadic primitive find ( ? ) returns the index of the right operand in a list.
1001 1002 1003?1002 1
Extending this concept to dictionaries means reversing the domain-to-range mapping. We expect ? to perform reverse lookup by mapping a range element to its domain element.
d:`a`b`c!1001 1002 1003 d?1002 `b
The result of find on an entity not in the range is a null whose type matches the domain list. For simple lists, the null matches the type of the list; for general lists, the null is 0N.
d?1004 / the result is the null symbol ` ` dg:(1;`a;"z")!10 20 30 dg?50 0N
Note: For a non-unique range element, find returns the first item mapping to it from the domain list.
d:`a`b`c`d!1001 1002 1003 1002 d?1002 `b
Removing Entries
The binary operation delete (_) returns the result of removing an entry from a dictionary by key value. The left operand of delete is the dictionary (target) and the right operand is a key value whose type matches that of target.
Note: Whitespace is required to the left of _ if the first operand is a variable.
For example,
d:1 2 3!`a`b`c d _2 1| a 3| c
Observe that attempting to remove a key that does not exist has no effect.
d _42 1| a 2| b 3| c
The binary delete, also denoted by an underscore ( _ ), returns the result of removing multiple entries from a dictionary. The left operand of delete is a list of key values whose type matches that of the dictionary and the right operand is the dictionary (target). The result is a dictionary obtained by removing the specified key-value pairs from target.
Note: Whitespace is also required to the left of _ if the first operand is a variable.
Note: Since the left operand is required to be a list, a single key value must be enlisted.
For example,
d:1 2 3!`a`b`c (enlist 2)_d 1| a 3| c 1 3_d 2| b (enlist 42)_d 1| a 2| b 3| c
Attempting to remove a key that does not exist has no effect.
4 5_d 1| a 2| b 3| c
Observe that removing all the entries in a dictionary leaves a dictionary with empty domain and range lists of the appropriate types.
1 2 3_d
There binary operator cut is the same as ( _ ) on a dictionary.
(enlist 2) cut d 1| a 3| c
Primitive Operations
Because dictionaries are maps, it is possible to compose their mappings with function mappings to perform operations on dictionaries. Of course, this assumes that the range of each dictionary is in the domain of the indicated operation, so that the operation makes sense. The application of a unary operator is straight-forward.
d1:`a`b`c!1 2 3 neg d1 a| -1 b| -2 c| -3 2*d1 a| 2 b| 4 c| 6 d1=2 a| 0 b| 1 c| 0
When the domains of two dictionaries are identical, performing binary operations is straightforward. For example, to add two dictionaries with a common domain, add their corresponding range elements,
d2:`a`b`c!10 20 30 d1+d2 a| 11 b| 22 c| 33
How do we combine two dictionaries whose domains are not identical? First, the domain of the resulting dictionary is the union of the domains of its operands. For items in the intersection of the domain lists, clearly we should simply apply the indicated operation on the corresponding range items.
The real question is, what to do on non-common domain items? The answer: do what makes sense for the operation. We start with joining two dictionaries.
Join
In the simple case of joining two disjoint dictionaries, the result should be the merge.
d3:`e`f`g!100 200 300 d1,d3 a| 1 b| 2 c| 3 e| 100 f| 200 g| 300 d3,d1 e| 100 f| 200 g| 300 a| 1 b| 2 c| 3
Observe that although the mappings arising from opposite-order joins have equivalent input-output assignments, the dictionaries are not identical because order is significant.
We examine another simple example of joining dictionaries with a special form. The particular dictionaries map symbols to lists of simple lists. When the two are disjoint the result should again be the merge. For example,
dc1:`a`b!(1 2 3; 10 20 40) dc2:(enlist `c)!enlist 10 20 30 dc1,dc2 a| 1 2 3 b| 10 20 40 c| 10 20 30
As in the previous example, join simply appends the domains and ranges in the obvious way. We shall refer to this case later.
Now we tackle the case of non-disjoint dictionaries. The issue is how to merge items that are common to both dictionary domains, since these elements each have two I/O assignments.
Important: In a join of dictionaries, the right operand's I/O assignment prevails for common domain elements.
The result is another illustration of upsert semantics. Each I/O assignment of the right operand is applied as an update if the domain element is assigned in the left operand, or as an insert if the domain element is not already assigned.
With d1 as above,
d3:`c`d!33 44 d1,d3 a| 1 b| 2 c| 33 d| 44
Observe that upsert is not commutative, even over a common domain. Join order matters.
d4:`a`b`c!300 400 500 d1,d4 a| 300 b| 400 c| 500 d4,d1 a| 1 b| 2 c| 3
Arithmetic Operations
Now that we understand how to join two dictionaries, we examine other operations. When arithmetic and comparison operations are performed on dictionaries, the indicated operation is performed on the common domain elements and the dictionaries are merged elsewhere,
d5:`c`x`y!1000 2000 3000 d1+d5 a| 1 b| 2 c| 1003 x| 2000 y| 3000 d1*d5 a| 1 b| 2 c| 3000 x| 2000 y| 3000 d1|d5 a| 1 b| 2 c| 1000 x| 2000 y| 3000
When a relational operation is performed on two dictionaries, the indicated operation is performed over the entire union domain. Effectively, each dictionary is extended to the union domain with (type-matched) nulls. Otherwise put, for non-common domain items, the operation is performed on a pair of items in which a null whose type matches the provided range item is substituted for the missing range item.
In the following examples, observe that operations on d1 and d6 are equivalent to the corresponding operations on d11 and d66,
d1:`a`b`c!1 2 3 d6:`b`c`d`e!22 3 44 55 d1=d6 a| 0 b| 0 c| 1 d| 0 e| 0 d1<d6 a| 0 b| 1 c| 0 d| 1 e| 1 d6<d1 b| 0 c| 0 d| 0 e| 0 a| 1 d1>d6 b| 0 c| 0 d| 0 e| 0 a| 1 d11:`a`b`c`d`e!1 2 3 0N 0N d66:`a`b`c`d`e!0N 22 3 44 55 d11=d66 a| 0 b| 0 c| 1 d| 0 e| 0 d11<d66 a| 0 b| 1 c| 0 d| 1 e| 1 d66<d11 a| 1 b| 0 c| 0 d| 0 e| 0 d11>d66 a| 1 b| 0 c| 0 d| 0 e| 0
Note: The > operation is evidently converted to the equivalent < operation with reversed operands.
Column Dictionaries
Column dictionaries are the foundation for tables.
Definition and Terminology
A very useful type of dictionary is one that maps a list of symbols to a rectangular list of lists. Such a dictionary has the form,
- c_{1} ... c_{n} !(v_{1} ;... ; v_{n})
where each c_{i} is a symbol and the v_{i} are lists with common count. Such a dictionary associates the symbol c_{i} with the list of values v_{i}.
Interpreting each symbol as a column name and the corresponding vector as the column values, we call such a list a column dictionary. The type of column named by c_{i} is the type of its value list v_{i}. For many column dictionaries, the v_{i} are all simple lists, meaning that each column is a vector of atoms of uniform type. We call this a simple column dictionary.
Simple Example
Let's reorganize the example of the previous section as a simple column dictionary.
scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
In this dictionary, the values for the name column are,
scores[`name] `Dent`Beeblebrox`Prefect
It is possible to retrieve the values for a column in a column dictionary using dot notation.
scores.name `Dent`Beeblebrox`Prefect
The value in row 1 of the name column is,
scores[`name][1] `Beeblebrox
Similarly, the value in row 2 of the iq column is,
scores[`iq][2] 126
The dictionary console shows the mapping clearly.
scores name| Dent Beeblebrox Prefect iq | 42 98 126
Accessing Values
For a general column dictionary defined as,
- d_{cols}:c_{1} ...c_{n}!(v_{1};...;v_{n})
the i^{th} element of column c_{j} is retrieved by,
d_{cols} [c_{j}][i]
What should we make of the following notation?
d_{cols} [c_{j};i]
We can interpret it in three ways:
- Indexing at depth in the dictionary
- A generalization of a two dimensional matrix in which item indexing in the first dimension has become lookup into the list of column names
- A dyadic mapping
All interpretations are all equivalent and give the same result,
- d_{cols} [c_{j}][i]
In our example,
scores[`iq][2] 126 scores[`iq; 2] 126
Rows and Columns
Viewing the dictionary as a dyadic function, we can project onto its first argument by fixing it to obtain the monadic function - i.e., d_{cols}[c_{j};]. This projected form yields item indexing into the column list.
In simple terms, projecting onto the first argument retrieves a vector of column values from a column dictionary.
scores[`iq;] 42 98 126
Analogously, we would expect projection onto the second argument to retrieve a "row" corresponding to the values in the i^{th} position of each column vector. What form does such a row take?
Observe that the projection of the dyadic function onto its second argument by fixing the item index,
- d_{cols}[;i])
is a monadic function corresponding to generalized indexing by column name - i.e., dictionary lookup. Thus, we expect the i^{th} row to be a dictionary that maps each column name to the value in that column's i^{th} row.
This is exactly what we find.
scores[;2] name| `Prefect iq | 126
Notational differences aside, this resembles the result of retrieving a record from a table using a SQL query: we get the column names and the associated row values.
A column dictionary seems to be the perfect data structure to serve as the basis for a table: a generalized matrix with indexed rows and named columns. But you no doubt notice the fly in the ointment: the indices are in the wrong order. It is unnatural to retrieve a column in the first index and a row in the second.
Column Dictionary with a Single Column
The domain of a column dictionary must always be a list of symbols and the range must be a list of column vectors. Consequently, when there is only one column you must enlist the domain and range. The following is a valid column dictionary (the parentheses are necessary),
ds:(enlist `c)!enlist 100 200 300
The following dictionary that maps a symbol to a list is not a valid column dictionary,
dnot:`c!1 2 3
Flipping a Dictionary
Transpose of a Column Dictionary
A column dictionary can be viewed as a generalized rectangular matrix. Let d be a column dictionary defined as,
- d:c_{1} ... c_{n}!(v_{1};...;v_{n})
where c_{i} is a symbol and the v_{i} have common count, say m. We can index at depth into d for each c_{i} and each j,
- d[c_{i};j] = v_{i}[j]
Since all the v_{i} have count m, in analogy with matrices, it makes sense to define the transpose t of d by the formula,
- t[j;c_{i}] = d[c_{i};j]
Exactly what is t that so defined? The answer comes from realizing that indexing at depth into t should be the same as repeated indexing,
- t[j;c_{i}] = t[j][c_{i}]
The right hand side of this equation makes explicit that t is a list of n items t[j] , for 0≤j<n.
What is each item in the list t? Combining the three previous equations, we see that that,
- t[j][c_{i}] = v_{i}[j]
Now fix j in this equation. We see that t[j] is a dictionary with the same domain as d, meaning the list of c_{i}. This dictionary assigns to each item c_{i} the output value v_{i}[j]. Thus, the range of the dictionary is the collection of values v_{1}![1],..., v_{n}[j].
We summarize our findings,
- The transpose of a column dictionary is a list of dictionaries.
- The dictionaries in the transpose have as common domain the column names of the original dictionary.
- The dictionary in the j^{th} item of the transpose maps the column names to the j^{th} row of values across the column vectors.
Flip of a Column Dictionary
As in the case of lists, the transpose of a dictionary is obtained by applying the unary flip operator,
flip d
When flip is applied to a column dictionary, no data is actually rearranged. The console display confirms the transposition of rows and columns.
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126) flip d name iq ------------------- Dent 98 Beeblebrox 42 Prefect 126
The net effect of flipping a column dictionary is simply reversing the order of the indices. This is logically equivalent to transposing rows and columns.
Flip of a Flipped Column Dictionary
If you transpose a dictionary twice, you obtain the original dictionary,
d~flip flip d / true for any column dictionary d 1b
Consequently, if you are given t the transpose of a column dictionary and you flip it, you obtain a column dictionary.
t:flip d / pretend you didn't see this step flip t name| Dent Beeblebrox Prefect iq | 98 42 126
Advanced: As of this writing (Jan 2007), flip has been implemented in q for dictionaries of columns, although the operation makes sense for any rectangular dictionary. In the event that flip is implemented for a general rectangular dictionary (i.e., any dictionary in which the range is a list of lists all having the same count) we would find the following:In this case, data likely will be rearranged.
- The transpose of a rectangular dictionary is a list of dictionaries. The dictionaries in the transpose have a common domain that is the domain of the original dictionary. The j^{th} dictionary of the transpose maps the original domain to the j^{th} row of values across the range list.
Prev: Casting and Enumerations Next: Tables
©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.