Casting and Enumerations
Types and Cast
Basic Types
Every atom has an associated data type, which is identified both by a char and by a number. For convenience we repeat the data types table from the overview of atoms.
type | char type | num type |
boolean | b | 1 |
byte | x | 4 |
short | h | 5 |
int | i | 6 |
long | j | 7 |
real | e | 8 |
float | f | 9 |
char | c | 10 |
symbol | s | 11 |
month | m | 13 |
date | d | 14 |
datetime | z | 15 |
minute | u | 17 |
second | v | 18 |
time | t | 19 |
enumeration | * | 20 and up |
type
The monadic function type can be applied to any entity in q to find its numeric data type. It is a quirk of q that the data type of an atom is a short containing the negative of the value in the third column (num type) above.
q)type 42
-6h
q)type 1b
-1h
q)type 4.2
-9h
q)type 4h
-5h
q)type `42
-11h
q)type "4"
-10h
q)type 2007.04.02
-14h
The type of a simple list is a short containing the positive value of the type of its constituent atoms.
q)type 1 2 3
6h
q)type "abc"
10h
q)type 1 2 3f
9h
The type of a general list is 0.
q)type (1;2h;3j)
0h
q)type (1;2;(3 4))
0h
q)type (`1;"1";3)
0h
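Because the sign of the type separates atoms, simple lists and general lists, a few one-line predicates follow directly. The names isAtom, isSimpleList and isGeneralList below are hypothetical helpers for illustration, not part of q. Note that dictionaries, tables and functions also have positive types, so isSimpleList only separates the cases covered in this section.

```q
/ hypothetical helpers that classify an entity by the sign of its type
isAtom:{0h>type x}           / atoms have negative types
isSimpleList:{0h<type x}     / simple lists have positive types (so do dictionaries, tables, functions)
isGeneralList:{0h=type x}    / general lists have type 0h

isAtom 42                    / 1b
isSimpleList 1 2 3           / 1b
isGeneralList (1;`a;"x")     / 1b
isAtom 1 2 3                 / 0b
```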
Type of a Variable
The type of a variable may be confusing to programmers new to q. In most typed languages, the variable's type must be defined before the variable is assigned a value—that is, when it is declared. In q, a variable is assigned without declaration. The variable can subsequently be reassigned to a new value of a different type.
q)a:42
q)type a
-6h
q)a:98.6
q)type a
-9h
This can be understood by considering that q thinks of a variable as a name associated with a value. The association is made upon assignment. A variable has the type of the value associated with its name.
In the example at hand, a variable with name `a is created when the initial assignment is made. Since this is the first time that the name `a is assigned, the q interpreter creates an entry for a in its dictionary of names and associates the value 42 with it. On the second assignment, there is already an entry for a in the dictionary of names, so the name is simply re-associated with the value 98.6.
When you ask for the type of a variable, q returns the type of the value associated with the variable's name. Thus, when you reassign the variable, the type of the variable reflects the type of the new value.
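One way to observe this name/value association is to look the name up explicitly. This sketch assumes the variable lives in the default root namespace, whose names are returned by key `., and uses value to fetch whatever value is currently associated with a name.

```q
a:42
`a in key `.        / 1b: the assignment created an entry for the name a
a:98.6              / the same name is re-associated with a float value
type value `a       / -9h: the type is that of the value now associated with a
```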
Cast ($)
As in verbose languages, it is possible to cast an entity from one type to another, provided the underlying values are compatible. The q cast operator, denoted $, is a binary atomic verb whose right argument is the source value and whose left argument is the target type. The target type can be given either as the type's (positive) numeric short value or as its char type. First, some examples using the numeric type,
q)5h$42
42h
q)6h$4.2
4
It is arguably more readable to use the type's char in a cast.
q)"i"$4.2
4
q)"x"$42
0x2a
q)"d"$2004.04.02T04:02:24.042
2004.04.02
The result of casting between superficially distinct types can be derived by considering the underlying numeric values. Chars correspond to their ASCII codes, dates to their offset in days from Jan 1, 2000, and times to their count of milliseconds since midnight.
q)"c"$0x42
"B"
q)"d"$42
2000.02.12
Because cast is atomic in its right operand, it extends item-wise to lists, including nested lists,
q)"x"$(10 20 30;255)
(0x0a141e;0xff)
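Casting in the other direction exposes the underlying numeric values just described. A short sketch, assuming the ASCII encoding of chars and the 2000.01.01 date origin stated above (newer versions of q may display int atoms with an i suffix):

```q
"i"$"A"                   / 65: a char casts to its ASCII code
"i"$2000.02.12            / 42: a date casts to its day offset from 2000.01.01
"i"$12:00:00.000          / 43200000: a time casts to its count of milliseconds
"d"$"i"$2000.02.12        / 2000.02.12: casting back recovers the original date
```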
Creating Symbols from Strings
Casting from a string (i.e., a list of char) to a symbol is a convenient way to create symbols. It is the preferred way to create symbols with embedded blanks. To cast a char or a string to a symbol, use the empty symbol (`) as the target domain.
q)`$"z"
`z
q)`$"Zaphod"
`Zaphod
q)`$"Zaphod Beeblebrox"
`Zaphod Beeblebrox
q)`$("Life";"the";"Universe";"and";"Everything")
`Life`the`Universe`and`Everything
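The inverse conversion uses string, which turns a symbol back into a string. A minimal sketch showing that the round trip preserves an embedded blank:

```q
s:`$"Zaphod Beeblebrox"    / symbol with an embedded blank
string s                   / "Zaphod Beeblebrox"
s~`$string s               / 1b: casting the string back gives the same symbol
```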
Parsing Strings to Data
Cast can also be used to parse data from a string by using an upper case char in the left argument,
q)"I"$"4267"
4267
q)"T"$"23:59:59.999"
23:59:59.999
Parsing date strings is flexible with respect to the format of the date,
q)"D"$"2007-04-24"
2007.04.24
q)"D"$"12/25/2006"
2006.12.25
q)"D"$"07/04/06"
2006.07.04
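The parse form also accepts a list of strings, and a string that cannot be parsed yields a null rather than an error. A short sketch:

```q
"I"$("1";"22";"333")   / 1 22 333: each string is parsed separately
"F"$"3.14"             / 3.14
null "I"$"abc"         / 1b: text that does not parse gives a null int
```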
Coercing Types
Casting can be used to make assignment type-safe. Recall that assignment into a simple list must strictly match the type.
q)c:10 20 30 40
q)c[1]:42h
'type
This situation can arise when the list and the assignment value are created dynamically. The following expression coerces the type,
q)c[1]:(type c)$42h
q)c
10 42 30 40
q)c[0 1 3]:(type c)$(1.1;42j;0x2a)
q)c
1 42 30 42
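When the same coercion is needed in several places, it can be wrapped in a function. The helper setAt below is hypothetical, a sketch that assumes its first argument is a simple list (a general list has type 0h, which is not a useful cast target).

```q
/ hypothetical helper: amend a copy of simple list L at index i, casting x to L's type first
setAt:{[L;i;x] L[i]:(type L)$x; L}
setAt[10 20 30 40;1;42h]              / 10 42 30 40
setAt[10 20 30 40;0 3;(1.1;0x2a)]     / 1 20 30 42
```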
Creating Typed Empty Lists
We have already met the empty list. Observe that it has type 0h, meaning that it is a general list whose elements have no specific type,
q)type ()
0h
The empty list can be considered as the degenerate case of a general list, so we call it the general empty list. In situations where type enforcement is desired, it is necessary to have an empty list with a specific type. Cast the general empty list using a symbol with the name of the desired type as the target domain.
q)L1:`int$()
q)type L1
6h
q)L2:`float$()
q)type L2
9h
q)L3:`symbol$()
q)type L3
11h
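A typed empty list is the degenerate case of a simple list of the specified type. This is useful because type matching is enforced when you append items in place with ,: whereas an ordinary join with , simply returns a new list of whatever type results.
q)L1,4.0
,4f
q)L1,:4.0
'type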
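Conversely, in-place appends (,:) of the matching type grow a typed empty list while preserving its type. A brief sketch:

```q
L:`float$()     / typed empty float list
L,:1.5          / in-place append of a float succeeds
L,:2.5
L               / 1.5 2.5
type L          / 9h: still a simple float list
```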
Enumerations
We have seen that the binary casting operator ($) transforms its right operand into a conforming entity of type specified by the left operand. In the basic operation, the left operand can be a char type abbreviation, a type number, or a symbol containing the name of one of the primitive data types. In this section, casting is extended to user-defined target domains, providing a functional version of enumerated types.
Traditional Enumerations
To begin, recall that in some verbose languages, an enumerated type is a way of associating a series of names with a corresponding set of integral values. Often the sequence of numbers is consecutive and begins with 0. The specific set of names/values is called the domain of the enumerated type and its name identifies the enumeration.
A traditional enumerated type serves multiple purposes.
- It allows a descriptive name to be used instead of an arbitrary number—e.g., 'blue' instead of 3.
- It permits strong type checking to ensure that only permissible values are supplied—i.e., choosing a named color from a list instead of remembering a number is less prone to error.
- It can provide name spaces, meaning the same name can be reused in different domains without fear of confusion (e.g., color.blue and mood.blue).
There is a subtler, more powerful use: an enumeration normalizes data.
Data Normalization
Say you know that you will have a list—in either the colloquial or q sense—of text entries taken from a fixed and reasonably short set of values. Storing a long list of such strings verbatim presents two problems.
- Values of variable length complicate storage management for the list.
- There is potentially much duplication of data in the list arising from repeated values.
An enumeration solves both problems.
To see how, we start with the case of a q list v containing arbitrary symbols representing character values. Let u be the unique values in v. This is achieved with the distinct function.
u:distinct v
Let's try a simple example,
q)v:`c`b`a`c`c`b`a`b`a`a`a`c
q)u:distinct v
q)u
`c`b`a
Observe that the order of the items in u is the order of their first appearances in v.
Now consider a new list k that represents the positions in u of each of the items in v. This is achieved with the find (?) operator.
q)k:u?v
q)k
0 1 2 0 0 1 2 1 2 2 2 0
Then we have,
q)u[k]
`c`b`a`c`c`b`a`b`a`a`a`c
q)v~u[k]
1b
This indeed normalizes the data of v. In general, v will have many repetitions of each of the underlying values, but u stores each value once. Changing an underlying value requires only one operation in the normalized version but potentially many updates to the non-unique list.
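The claim that changing an underlying value needs only one operation can be checked directly: update the unique list u once and re-index with k. A sketch reusing the values from this example:

```q
u:`c`b`a
k:0 1 2 0 0 1 2 1 2 2 2 0
u[2]:`z          / a single update to the unique list
u[k]             / `c`b`z`c`c`b`z`b`z`z`z`c: every former `a now reads `z
```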
Extra credit for recognizing that v is simply the composite u◦k. Effectively we have factored the non-unique list v through the unique list u via the index map k.
v = u◦k
Why would we want to do this? Easy: compactness and speed.
- Let's say that the count of u is a and the maximum width (in the colloquial sense) of the symbols in u is b. For a list v of variable count x, the amount of storage required is potentially b*x. For the factored form, the storage is known to be a*b+4*x, representing the fixed amount of storage for u plus the variable amount of storage for the simple integer list k. If a is small and b is even moderately large, the factorization is significantly smaller.
  This can be seen by comparing the sizes of v, u and k in a slightly modified version of our example.
q)v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
q)u:distinct v
q)u
`ccccccc`bbbbbbb`aaaaaaa
q)k:u?v
q)k
0 1 2 0 0 1
  Now imagine v and k to be much longer.
- Reading and writing the factored index list from/to disk is a block operation that will be very fast.
- Assuming that items of v are symbols stored in a hash-table, item indexing in the un-factored list requires looking up each symbol. Indexing into the factored list can be done directly via position since it is a uniform list of integers.
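A rough way to compare the storage of the two forms is to count the bytes in their serialized representations. This sketch assumes a kdb+ version providing -8! (serialize to bytes); exact counts vary by version, so no figures are given. The cast to int keeps 4 bytes per index, matching the 4*x term above.

```q
v:1000000#`ccccccc`bbbbbbb`aaaaaaa   / long list drawn from three 7-char symbols
u:distinct v
k:"i"$u?v                            / int index list, 4 bytes per item
count -8!v                           / serialized size of the symbol list
(count -8!u)+count -8!k              / serialized size of domain plus indices
v~u k                                / 1b: the factored form still reproduces v
```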
Enumerations
Enumeration encapsulates the above factorization of an arbitrary list through a list of unique values. An enumeration uses the binary cast operator ($) and is a generalization of the basic cast between types.
The general form of an enumerated value is,
`u$v
where u is a variable holding a simple list of unique values and v is either an atom occurring in u or a list of such atoms. The projection `u$ is the enumeration, the list u is its domain, and `u$v represents the enumerated value(s).
Under the covers, applying the enumeration `u$ to a vector v actually factors v through u as in the previous section. The resulting index list k is stored internally.
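The strong type checking mentioned for traditional enumerations carries over: enumerating a value that is not in the domain signals a 'cast error. The sketch uses protected evaluation, @[f;x;handler], so the error is returned as a string instead of halting.

```q
u:`c`b`a
`u$`a`c                        / `u$`a`c
@[{`u$x};`d;{x}]               / "cast": `d is not in the domain u
```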
Working with an Enumeration
Recasting our factorization example as an enumeration,
q)u:`c`b`a
q)v:`c`b`a`c`c`b`a`b`a`a`a`c
q)ev:`u$v
q)ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c
While the display of the enumeration ev shows that it represents the values of v within the domain u, only the implicit int index list is actually stored.
The enumeration ev acts just like the original v.
q)v[3]
`c
q)ev[3]
`c
q)v[3]:`b
q)v
`c`b`a`b`c`b`a`b`a`a`a`c
q)ev[3]:`b
q)ev
`u$`c`b`a`b`c`b`a`b`a`a`a`c
q)v=`a
001000101110b
q)ev=`a
001000101110b
q)v in `a`b
011101111110b
q)ev in `a`b
011101111110b
While the enumeration is item-wise equal to—and can be freely substituted for—the original, they are not identical.
q)v=ev
111111111111b
q)v~ev
0b
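To recover a plain list from an enumeration, apply value, which looks each stored index up in the domain; the result then matches the original. Similarly, key returns the name of the domain. A sketch restating this section's example so it runs on its own:

```q
u:`c`b`a
v:`c`b`a`c`c`b`a`b`a`a`a`c
ev:`u$v
key ev          / `u: the name of the domain
value ev        / `c`b`a`c`c`b`a`b`a`a`a`c
v~value ev      / 1b
```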
Type of an Enumeration
Each enumeration is assigned a new numeric data type, beginning with 20h. If you start a new q session and load no script files, you will observe the following.
q)u1:`c`b`a
q)u2:`2`4`6`8
q)u3:`a`b`c
q)u4:`c`b`a
q)type `u1$`c`a`c`b`b`a
20h
q)type `u1$`a`a`b`b`c`c
20h
q)type `u2$`8`8`4`2`6`4
21h
q)type `u3$`c`a`c`b`b`a
22h
q)type `u4$`c`a`c`b`b`a
23h
Enumerations over differently named domains are distinct, even when the domain lists match.
q)u1~u4
1b
q)v:`c`a`c`b`b`a
q)(`u1$v)~`u4$v
0b
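Values drawn from different enumerations can still be compared by first stripping the enumeration with value. A sketch restating u1, u4 and v from the example above so it runs on its own:

```q
u1:`c`b`a
u4:`c`b`a
v:`c`a`c`b`b`a
(`u1$v)~`u4$v                   / 0b in the session shown above: distinct enumerations
(value `u1$v)~value `u4$v       / 1b: the same underlying symbols
```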
©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.