QforMortals2/dictionaries

From Kx Wiki
Jump to: navigation, search

Contents

Dictionaries

Overview

Dictionaries are a generalization of lists and provide the foundation for tables. A dictionary is a (mathematical) mapping defined by an explicit I/O association between a domain list and range list. The two lists must have the same count and the domain list should be a unique collection. While general lists can be used to create a dictionary, many useful dictionaries involve lists of special forms. The domain is frequently a collection of symbols representing names. As we shall see, a dictionary whose domain is a unique list of symbols and whose range is rectangular corresponds to a table.

Dictionary Basics

A dictionary is an ordered collection of key-value pair - that is, a hashtable in verbose languages.

Definition

A dictionary, also called an association, is a mapping defined by an explicit I/O association between a domain list and a range list via positional correspondence. The creation of a dictionary uses the xkey primitive ( ! ),

Ldomain! Lrange

Recall from Mathematical Functions Refresher the view of a map's I/O table as a pair of input and output columns. Dictionary notation is simply the map's I/O table turned on its side for ease of entry and compactness of display.

Warning.png Note: All dictionaries have type 99h.

The domain list comprises the keys of the dictionary and the range list its values. The keys of a dictionary are retrieved by the unary primitive key and the values by the unary primitive value. The count of the dictionary is the (common) count of its keys and values.

Warning.png Note: Although q does not enforce the requirement that the key items are unique, a dictionary does provide a unique output value for each input value, thus guaranteeing a well-defined mathematical map. See below for details.

The most basic dictionary maps a simple list to a simple list. The following I/O table represents a mapping of three symbols containing names to the corresponding individual's intelligent quotient,

I O
`Dent 98
`Beeblebrox 42
`Prefect 126

This mapping is defined compactly as a dictionary.

        d1:`Dent`Beeblebrox`Prefect!98 42 126
        count d
3
        key d
`Dent`Beeblebrox`Prefect

        value d
98 42 126

The console displays a dictionary I/O table in columnar form.

        d
Dent           | 98
Beeblebrox | 42
Prefect       | 126

The function cols also returns the domain.

        cols d1
`Dent`Beeblebrox`Prefect
Warning.png Note: The order of the items in the domain and range lists is significant, just as positional order is significant for lists. Although the I/O assignments and the associated mappings are equivalent regardless of order, differently ordered dictionaries are not identical.
        d1:`Prefect`Beeblebrox`Dent!126 42 98
        d~d1
0b

Lookup

Finding the dictionary output value corresponding to an input value is called looking up the input. This actually is achieved via a hash-table lookup under the covers. Similar to functions and lists, both d[x] and d x lookup the output value for x.

        d[`Beeblebrox]
42
        d `Beeblebrox
42

As with item indexing, lookup of a key not in the domain of a dictionary results in an appropriately typed null value, not an error.

        d[`Slartibartfast]
0N

As with lists and functions, key lookup in a dictionary is extended item-wise to a simple list of keys.

        d[`Dent`Prefect]
98 126
Information.png Advanced: We can interpret key list lookup as the composition of the key lookup map with the item indexing map. Symbolically, let d be a dictionary and K a key list in the domain of d. Then for 0 ≤j < count K,
        d[K][j] = d[K[j]]

Using one of our examples,

        d:`Dent`Beeblebrox`Prefect! 98 42 126
        K:`Dent`Prefect
        d[K][1]
126
        d[K[1]]
126

Or, using the entire index list,

        d K
98 126
        d[K]
98 126

Dictionary vs. List

A dictionary a generalization of a list in which item indexing has been extended to a non-integral domain. In particular, a dictionary cannot be indexed implicitly via position. Attempting this on any dictionary generates an error.

        d:"abcde"!1.1 2.2 3.3 4.4 6.5
        d["c"]
3.3
        d[0]
`type

We can define a dictionary whose lookup emulates the mapping of list item indexing.

        L3:`one`two`three
        L3[1]
`two
        d3:0 1 2!`one`two`three
        d3[1]
`two

When we ask q to compare the two entities for equality, it obliges by considering both as mappings with integral domain. It then tests the assignments item-wise.

        L3=d3
0| 1
1| 1
2| 1

However, the dictionary so-specified is not the same as the list.

        L3~d3
0b

Although retrieving items from a list-like dictionary is notationally identical to item indexing, it is not the same. Item indexing is a positional offset, whereas dictionary retrieval is a lookup. They are implemented differently under the covers.

Lookup with Verb @

Recall that indexing into a list can be achieved with verb @.

        L:100 200 300
        L[1]
200
        L@1
200

The same syntax works for dictionary lookup.

        d:`a`b`c!10 20 30
        d[`b]
20
        d@`b
20

Uniqueness of Keys

We noted earlier that q does not enforce uniqueness in a dictionary domain list. In the event of a repeated domain item, only the output value associated with the first occurrence in left-to-right order is accessible via lookup. This guarantees that a dictionary provides a unique output for each input value and is thus a well-defined mathematical map.

For example,

        ddup:8 4 8 2 3 1!`one`two`three`four`five`six
        ddup[8]
`one
Information.png Advanced: Reverse lookup works properly for a non-unique domain.
        ddup?`three
8

Non-simple Domain or Range

The range values of a dictionary are not required to be atoms. The range can be a general list that contains nested lists.

        dgv:(1;2h;3.3;"4")!(`one;2 3;"456";(7;8 9))
        dgv["4"]
7
8 9

Nor are keys are required to be atoms.

        dgk:(0 1; 2 3)!`first`second
        dgk[0 1]
`first
        dgk[2 3]
`second
Information.png Advanced: If the keys are not a list of items of uniform shape, lookup does not work in a useful way.
        dweird:(0 1; 2; 3)!`first`second`third
        dweird[0 1]
`first
        dweird[2]
`
        dweird[3]
`

The observed behavior is that key lookup fails at the first key of different shape.

Extracting a Sub-Dictionary by Key

Dictionary lookup on a key or a list of keys returns the associated values. It is also possible to extract the key-value associations using the take operator (#). The left operand is a list of keys, the right operand is the source dictionary and the result is a new dictionary whose mapping is that of the original restricted to the specified keys.

       (enlist `c)#d
c| 30

        `a`c#d
a| 10
c| 30

This works when the keys are not simple.

        dns:(1 2; 3 4; 5 6)!("onetwo"; "threefour"; "fivesix")
        (1 2; 5 6)#dns
1 2| "onetwo"
5 6| "fivesix"

Operations on Dictionaries

Amend and Upsert

As with lists, the items of a dictionary can be modified via indexed assignment.

        d:10 20 30!"abc"
        d[30]:"x"
        d
10| a
20| b
30| x
Warning.png Important: In contrast to lists, dictionaries can be extended via index assignment. For example,
        d[40]:"y"
        d
10| a
20| b
30| x
40| y

        L:"abc"
        L[3]:"x"
'length

Let's examine this capability to modify or extend a dictionary via index assignment more closely. Let d be a dictionary, c be an atom whose type matches the domain of d, and x an item whose type is compatible with the range of d. The assignment,

        d[c]:x

updates the existing range value if c is in the domain of d, but inserts a new entry at the end of the dictionary if c is not in the domain of d.

This insert/update behavior is called upsert semantics. Because tables are essentially dictionaries, upsert semantics carry through to tables.

Reverse Lookup with Find (?)

Recall that the dyadic primitive find ( ? ) returns the index of the right operand in a list.

        1001 1002 1003?1002
1

Extending this concept to dictionaries means reversing the domain-to-range mapping. We expect ? to perform reverse lookup by mapping a range element to its domain element.

        d:`a`b`c!1001 1002 1003
        d?1002
`b

The result of find on an entity not in the range is a null whose type matches the domain list. For simple lists, the null matches the type of the list; for general lists, the null is 0N.

        d?1004        / the result is the null symbol `
`
        dg:(1;`a;"z")!10 20 30
        dg?50
0N
Warning.png Note: For a non-unique range element, find returns the first item mapping to it from the domain list.
        d:`a`b`c`d!1001 1002 1003 1002
        d?1002
`b

Removing Entries

The binary operation delete (_) returns the result of removing an entry from a dictionary by key value. The left operand of delete is the dictionary (target) and the right operand is a key value whose type matches that of target.

Note: Whitespace is required to the left of _ if the first operand is a variable.

For example,

        d:1 2 3!`a`b`c
        d _2
1| a
3| c

Observe that attempting to remove a key that does not exist has no effect.

       d _42
1| a
2| b
3| c

The binary delete, also denoted by an underscore ( _ ), returns the result of removing multiple entries from a dictionary. The left operand of delete is a list of key values whose type matches that of the dictionary and the right operand is the dictionary (target). The result is a dictionary obtained by removing the specified key-value pairs from target.

Warning.png Note: Whitespace is also required to the left of _ if the first operand is a variable.
Warning.png Note: Since the left operand is required to be a list, a single key value must be enlisted.

For example,

        d:1 2 3!`a`b`c
        (enlist 2)_d
1| a
3| c
        1 3_d
2| b
        (enlist 42)_d
1| a
2| b
3| c

Attempting to remove a key that does not exist has no effect.

        4 5_d
1| a
2| b
3| c

Observe that removing all the entries in a dictionary leaves a dictionary with empty domain and range lists of the appropriate types.

        1 2 3_d

There binary operator cut is the same as ( _ ) on a dictionary.

        (enlist 2) cut d
1| a
3| c

Primitive Operations

Because dictionaries are maps, it is possible to compose their mappings with function mappings to perform operations on dictionaries. Of course, this assumes that the range of each dictionary is in the domain of the indicated operation, so that the operation makes sense. The application of a unary operator is straight-forward.

        d1:`a`b`c!1 2 3
        neg d1
a| -1
b| -2
c| -3
        2*d1
a| 2
b| 4
c| 6
        d1=2
a| 0
b| 1
c| 0

When the domains of two dictionaries are identical, performing binary operations is straightforward. For example, to add two dictionaries with a common domain, add their corresponding range elements,

        d2:`a`b`c!10 20 30
        d1+d2
a| 11
b| 22
c| 33

How do we combine two dictionaries whose domains are not identical? First, the domain of the resulting dictionary is the union of the domains of its operands. For items in the intersection of the domain lists, clearly we should simply apply the indicated operation on the corresponding range items.

The real question is, what to do on non-common domain items? The answer: do what makes sense for the operation. We start with joining two dictionaries.

Join

In the simple case of joining two disjoint dictionaries, the result should be the merge.

        d3:`e`f`g!100 200 300
        d1,d3
a| 1
b| 2
c| 3
e| 100
f| 200
g| 300

        d3,d1
e| 100
f| 200
g| 300
a| 1
b| 2
c| 3

Observe that although the mappings arising from opposite-order joins have equivalent input-output assignments, the dictionaries are not identical because order is significant.

We examine another simple example of joining dictionaries with a special form. The particular dictionaries map symbols to lists of simple lists. When the two are disjoint the result should again be the merge. For example,

        dc1:`a`b!(1 2 3; 10 20 40)
        dc2:(enlist `c)!enlist 10 20 30
        dc1,dc2
a| 1  2  3
b| 10 20 40
c| 10 20 30

As in the previous example, join simply appends the domains and ranges in the obvious way. We shall refer to this case later.

Now we tackle the case of non-disjoint dictionaries. The issue is how to merge items that are common to both dictionary domains, since these elements each have two I/O assignments.

Warning.png Important: In a join of dictionaries, the right operand's I/O assignment prevails for common domain elements.

The result is another illustration of upsert semantics. Each I/O assignment of the right operand is applied as an update if the domain element is assigned in the left operand, or as an insert if the domain element is not already assigned.

With d1 as above,

        d3:`c`d!33 44
        d1,d3
a| 1
b| 2
c| 33
d| 44

Observe that upsert is not commutative, even over a common domain. Join order matters.

        d4:`a`b`c!300 400 500
        d1,d4
a| 300
b| 400
c| 500
        d4,d1
a| 1
b| 2
c| 3

Arithmetic Operations

Now that we understand how to join two dictionaries, we examine other operations. When arithmetic and comparison operations are performed on dictionaries, the indicated operation is performed on the common domain elements and the dictionaries are merged elsewhere,

        d5:`c`x`y!1000 2000 3000
        d1+d5
a| 1
b| 2
c| 1003
x| 2000
y| 3000
        d1*d5
a| 1
b| 2
c| 3000
x| 2000
y| 3000
        d1|d5
a| 1
b| 2
c| 1000
x| 2000
y| 3000

When a relational operation is performed on two dictionaries, the indicated operation is performed over the entire union domain. Effectively, each dictionary is extended to the union domain with (type-matched) nulls. Otherwise put, for non-common domain items, the operation is performed on a pair of items in which a null whose type matches the provided range item is substituted for the missing range item.

In the following examples, observe that operations on d1 and d6 are equivalent to the corresponding operations on d11 and d66,

        d1:`a`b`c!1 2 3
        d6:`b`c`d`e!22 3 44 55
        d1=d6
a| 0
b| 0
c| 1
d| 0
e| 0
        d1<d6
a| 0
b| 1
c| 0
d| 1
e| 1
        d6<d1
b| 0
c| 0
d| 0
e| 0
a| 1
        d1>d6
b| 0
c| 0
d| 0
e| 0
a| 1

        d11:`a`b`c`d`e!1 2 3 0N 0N
        d66:`a`b`c`d`e!0N 22 3 44 55
        d11=d66
a| 0
b| 0
c| 1
d| 0
e| 0
         d11<d66
a| 0
b| 1
c| 0
d| 1
e| 1
        d66<d11
a| 1
b| 0
c| 0
d| 0
e| 0
        d11>d66
a| 1
b| 0
c| 0
d| 0
e| 0
Warning.png Note: The > operation is evidently converted to the equivalent < operation with reversed operands.

Column Dictionaries

Column dictionaries are the foundation for tables.

Definition and Terminology

A very useful type of dictionary is one that maps a list of symbols to a rectangular list of lists. Such a dictionary has the form,

c1 ... cn  !(v1 ;... ; vn)

where each ci is a symbol and the vi are lists with common count. Such a dictionary associates the symbol ci with the list of values vi.

Interpreting each symbol as a column name and the corresponding vector as the column values, we call such a list a column dictionary. The type of column named by ci is the type of its value list vi. For many column dictionaries, the vi are all simple lists, meaning that each column is a vector of atoms of uniform type. We call this a simple column dictionary.

Simple Example

Let's reorganize the example of the previous section as a simple column dictionary.

        scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

In this dictionary, the values for the name column are,

        scores[`name]
`Dent`Beeblebrox`Prefect

It is possible to retrieve the values for a column in a column dictionary using dot notation.

        scores.name
`Dent`Beeblebrox`Prefect

The value in row 1 of the name column is,

        scores[`name][1]
`Beeblebrox

Similarly, the value in row 2 of the iq column is,

        scores[`iq][2]
126

The dictionary console shows the mapping clearly.

        scores
name| Dent Beeblebrox Prefect
iq      | 42     98              126

Accessing Values

For a general column dictionary defined as,

dcols:c1 ...cn!(v1;...;vn)

the ith element of column cj is retrieved by,

dcols [cj][i]

What should we make of the following notation?

dcols [cj;i]

We can interpret it in three ways:

All interpretations are all equivalent and give the same result,

dcols [cj][i]

In our example,

        scores[`iq][2]
126
        scores[`iq; 2]
126

Rows and Columns

Viewing the dictionary as a dyadic function, we can project onto its first argument by fixing it to obtain the monadic function - i.e., dcols[cj;]. This projected form yields item indexing into the column list.

In simple terms, projecting onto the first argument retrieves a vector of column values from a column dictionary.

        scores[`iq;]
42 98 126

Analogously, we would expect projection onto the second argument to retrieve a "row" corresponding to the values in the ith position of each column vector. What form does such a row take?

Observe that the projection of the dyadic function onto its second argument by fixing the item index,

dcols[;i])

is a monadic function corresponding to generalized indexing by column name - i.e., dictionary lookup. Thus, we expect the ith row to be a dictionary that maps each column name to the value in that column's ith row.

This is exactly what we find.

        scores[;2]
name| `Prefect
iq      | 126

Notational differences aside, this resembles the result of retrieving a record from a table using a SQL query: we get the column names and the associated row values.

A column dictionary seems to be the perfect data structure to serve as the basis for a table: a generalized matrix with indexed rows and named columns. But you no doubt notice the fly in the ointment: the indices are in the wrong order. It is unnatural to retrieve a column in the first index and a row in the second.

Column Dictionary with a Single Column

The domain of a column dictionary must always be a list of symbols and the range must be a list of column vectors. Consequently, when there is only one column you must enlist the domain and range. The following is a valid column dictionary (the parentheses are necessary),

        ds:(enlist `c)!enlist 100 200 300

The following dictionary that maps a symbol to a list is not a valid column dictionary,

	dnot:`c!1 2 3

Flipping a Dictionary

Transpose of a Column Dictionary

A column dictionary can be viewed as a generalized rectangular matrix. Let d be a column dictionary defined as,

d:c1 ... cn!(v1;...;vn)

where ci is a symbol and the vi have common count, say m. We can index at depth into d for each ci and each j,

d[ci;j] = vi[j]

Since all the vi have count m, in analogy with matrices, it makes sense to define the transpose t of d by the formula,

t[j;ci] = d[ci;j]

Exactly what is t that so defined? The answer comes from realizing that indexing at depth into t should be the same as repeated indexing,

t[j;ci] = t[j][ci]

The right hand side of this equation makes explicit that t is a list of n items t[j] , for 0≤j<n.

What is each item in the list t? Combining the three previous equations, we see that that,

t[j][ci] = vi[j]

Now fix j in this equation. We see that t[j] is a dictionary with the same domain as d, meaning the list of ci. This dictionary assigns to each item ci the output value vi[j]. Thus, the range of the dictionary is the collection of values v1![1],..., vn[j].

We summarize our findings,

Flip of a Column Dictionary

As in the case of lists, the transpose of a dictionary is obtained by applying the unary flip operator,

        flip d

When flip is applied to a column dictionary, no data is actually rearranged. The console display confirms the transposition of rows and columns.

        d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        flip d
name       iq
-------------------
Dent            98
Beeblebrox 42
Prefect       126

The net effect of flipping a column dictionary is simply reversing the order of the indices. This is logically equivalent to transposing rows and columns.

Flip of a Flipped Column Dictionary

If you transpose a dictionary twice, you obtain the original dictionary,

        d~flip flip d        / true for any column dictionary d
1b

Consequently, if you are given t the transpose of a column dictionary and you flip it, you obtain a column dictionary.

        t:flip d        / pretend you didn't see this step
        flip t
name| Dent Beeblebrox Prefect
iq      | 98     42              126
Information.png Advanced: As of this writing (Jan 2007), flip has been implemented in q for dictionaries of columns, although the operation makes sense for any rectangular dictionary. In the event that flip is implemented for a general rectangular dictionary (i.e., any dictionary in which the range is a list of lists all having the same count) we would find the following:
The transpose of a rectangular dictionary is a list of dictionaries. The dictionaries in the transpose have a common domain that is the domain of the original dictionary. The jth dictionary of the transpose maps the original domain to the jth row of values across the range list.
In this case, data likely will be rearranged.

Prev: Casting and Enumerations Next: Tables

Table of Contents

©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox