QforMortals/casting and enumerations

From Kx Wiki

Casting and Enumerations

Types and Cast

Basic Types

Every atom has an associated data type, which can be written either as a number or as a char. For convenience we repeat this from the data types table in the overview of atoms.

type         char type  num type
boolean      b          1
byte         x          4
short        h          5
int          i          6
long         j          7
real         e          8
float        f          9
char         c          10
symbol       s          11
month        m          13
date         d          14
datetime     z          15
minute       u          17
second       v          18
time         t          19
enumeration  *

type

The monadic function type can be applied to any entity in q to find its numeric data type. It is a quirk of q that the data type of an atom is reported as a short containing the negative of the value in the num type column above.

	type 42
-6h
	type 1b
-1h
	type 4.2
-9h
	type 4h
-5h
	type `42
-11h
	type "4"
-10h
	type 2007.04.02
-14h

The type of a simple list is a short containing the positive value of the type of its constituent atoms.

	type 1 2 3
6h
	type "abc"
10h
	type 1 2 3f
9h

The type of a general list is 0.

	type (1;2h;3j)
0h
	type (1;2;(3 4))
0h
	type (`1;"1";3)
0h
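
For readers who think more easily in a conventional language, the sign convention above can be modeled in a few lines of Python. This is only a sketch: the type numbers come from the table above, while the mapping from Python types to q types (bool to 1, int to 6, float to 9) and the function name q_type are assumptions made for illustration.

```python
# Toy model of the sign convention reported by q's type function.
# The codes come from the table above; the Python-to-q mapping is an assumption.
Q_CODES = {bool: 1, int: 6, float: 9}

def q_type(x):
    """Return a q-style type number for a Python value (illustrative only)."""
    if isinstance(x, list):
        kinds = {type(item) for item in x}
        # A simple list: every item is an atom of one known type.
        if len(kinds) == 1 and next(iter(kinds)) in Q_CODES:
            return Q_CODES[next(iter(kinds))]
        return 0  # general list (empty, mixed, or nested items)
    return -Q_CODES[type(x)]  # atoms report the negated code

atom_code = q_type(42)                    # atom
simple_code = q_type([1, 2, 3])           # simple list
general_code = q_type([1, 2.0, [3, 4]])   # general list
```

The three cases mirror the three rules: negative for an atom, positive for a simple list, and 0 for anything else.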

Type of a Variable

The type of a variable may be confusing to programmers new to q. In most typed languages, the variable's type must be defined before the variable is assigned a value—that is, when it is declared. In q, a variable is assigned without declaration. The variable can subsequently be reassigned to a new value of a different type.

	a:42
	type a
-6h
	a:98.6
	type a
-9h

This can be understood by considering that q thinks of a variable as a name associated with a value. The association is made upon assignment. A variable has the type of the value associated with its name.

In the example at hand, a variable with name `a is created when the initial assignment is made. Since this is the first time that the name `a is assigned, the q interpreter creates an entry for a in its dictionary of names and it associates the value 42 with it. On the second assignment, there is already an entry for a in the names table, so this name is simply re-associated to the value 98.6.

When you ask for the type of a variable, q returns the type of the value associated with the variable's name. Thus, when you reassign the variable, the type of the variable reflects the type of the new value.
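
The name-to-value association described above can be pictured with a small Python model. The dictionary names and the helpers assign and type_of are hypothetical; they stand in for the interpreter's internal table and are not how q is actually implemented.

```python
# Hypothetical model of an interpreter's name table: names map to values,
# and the "type of a variable" is read from whatever value is current.
names = {}

def assign(name, value):
    # Creates the entry on first use, re-associates it afterwards.
    names[name] = value

def type_of(name):
    # The variable has no type of its own; we inspect the associated value.
    return type(names[name]).__name__

assign("a", 42)
first = type_of("a")
assign("a", 98.6)
second = type_of("a")
```

After the second assignment there is still a single entry for a, and its reported type has followed the new value.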

Cast ($)

As in verbose languages, it is possible to cast an entity from one type to another, provided the underlying values are compatible. The q cast operator, denoted $, is a binary atomic verb whose right argument is the source value and left argument is the target type. The target type can be represented either as the type's (positive) numeric short value or as a char type value. First, examples using the numeric type,

	5h$42
42h
	6h$4.2
4

It is arguably more readable to use the type's char in a cast.

	"i"$4.2
4
	"x"$42
0x2a
	"d"$2004.04.02T04:02:24.042
2004.04.02

The result of casting between superficially distinct types can be derived by considering the underlying numeric values. Chars correspond to their underlying ASCII codes; dates to their offset in days from Jan 1, 2000; and times to their count of milliseconds since midnight.

	"c"$0x42
"B"
	"d"$42
2000.02.12
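
The arithmetic behind these two results can be checked independently. The Python sketch below assumes only what the text states: a date is a day offset from Jan 1, 2000 and a char is its ASCII code. The helper names are made up for illustration.

```python
from datetime import date, timedelta

Q_EPOCH = date(2000, 1, 1)  # day zero for q dates, per the text above

def int_to_date(days):
    """Interpret an integer as a day offset from the q epoch."""
    return Q_EPOCH + timedelta(days=days)

def byte_to_char(b):
    """Interpret a byte value as the char with that ASCII code."""
    return chr(b)

d = int_to_date(42)       # 31 days reach Feb 1, 11 more reach Feb 12
ch = byte_to_char(0x42)   # 0x42 is 66, the ASCII code of "B"
```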

Because cast is atomic in its right operand, it is extended item-wise to a list,

	"x"$(10 20 30;255)
(0x0a141e;0xff)

Creating Symbols from Strings

Casting from a string (i.e., a list of char) to a symbol is a convenient way to create symbols. It is the preferred way to create symbols with embedded blanks. To cast a char or a string to a symbol, use the empty symbol (`) as the target domain.

	`$"z"
`z
	`$"Zaphod"
`Zaphod
	`$"Zaphod Beeblebrox"
`Zaphod Beeblebrox
	`$("Life";"the";"Universe";"and";"Everything")
`Life`the`Universe`and`Everything

Parsing Strings to Data

Cast can also be used to parse data from a string by using an upper case char in the left argument,

	"I"$"4267"
4267
	"T"$"23:59:59.999"
23:59:59.999

Parsing date strings is flexible with respect to the format of the date,

	"D"$"2007-04-24"
2007.04.24
	"D"$"12/25/2006"
2006.12.25
	"D"$"07/04/06"
2006.07.04
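
One way to picture format-tolerant parsing is to try a series of candidate layouts in turn. The Python sketch below does exactly that; it is not a description of how q parses dates, and both the list of layouts and the reading of two-digit years as mm/dd/yy are assumptions.

```python
from datetime import datetime

# Candidate layouts, tried in order. This list is an assumption for
# illustration; it is not taken from q.
FORMATS = ["%Y-%m-%d", "%Y.%m.%d", "%m/%d/%Y", "%m/%d/%y"]

def parse_date(text):
    """Return the first successful parse as a date, or None if none match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # layout did not fit, try the next one
    return None

iso_style = parse_date("2007-04-24")
us_style = parse_date("12/25/2006")
```

Returning None for unparseable input plays the role of a null result here; that choice is also an assumption of the sketch.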

Coercing Types

Casting can be used to coerce type-safe assignment. Recall that assignment into a simple list must strictly match the type.

	c: 10 20 30 40
	c[1]:42h
'type

This situation can arise when the list and the assignment value are created dynamically. The following expression coerces the type,

	c[1]:(type c)$42h
	c
10 42 30 40
	c[0 1 3]:(type c)$(1.1; 42j; 0x2a)
	c
1 42 30 42

Creating Typed Empty Lists

We have already met the empty list. Observe that it has type 0h, meaning that it is a general list whose elements have no specific type,

	type ()
0h

The empty list can be considered as the degenerate case of a general list, so we call it the general empty list. In situations where type enforcement is desired, it is necessary to have an empty list with a specific type. Cast the general empty list using a symbol with the name of the desired type as the target domain.

	L1:`int$()
	type L1
6h
	L2:`float$()
	type L2
9h
	L3:`symbol$()
	type L3
11h

A typed empty list is the degenerate case of a simple list of the specified type. This is useful because type matching is enforced when you append items,

	L1,4.0
,4f

Enumerations

We have seen that the binary casting operator ($) transforms its right operand into a conforming entity of type specified by the left operand. In the basic operation, the left operand can be a char type abbreviation, a type number, or a symbol containing the name of one of the primitive data types. In this section, casting is extended to user-defined target domains, providing a functional version of enumerated types.

Traditional Enumerations

To begin, recall that in some verbose languages, an enumerated type is a way of associating a series of names with a corresponding set of integral values. Often the sequence of numbers is consecutive and begins with 0. The specific set of names/values is called the domain of the enumerated type and its name identifies the enumeration.

A traditional enumerated type serves multiple purposes.

There is a subtler, more powerful use: an enumeration normalizes data.

Data Normalization

Say you know that you will have a list—in either the colloquial or q sense—of text entries taken from a fixed and reasonably short set of values. Storing a long list of such strings verbatim presents two problems.

  1. Values of variable length complicate storage management for the list.
  2. There is potentially much duplication of data in the list arising from repeated values.

An enumeration solves both problems.

To see how, we start with the case of a q list v containing arbitrary symbols representing character values. Let u be the unique values in v. This is achieved with the distinct function.

	u:distinct v

Let's try a simple example,

	v:`c`b`a`c`c`b`a`b`a`a`a`c
	u:distinct v
	u
`c`b`a

Observe that order of the items in u is the order of their first appearances in v.

Now consider a new list k that represents the positions in u of each of the items in v. This is achieved with the find (?) operator.

	k:u?v
	k
0 1 2 0 0 1 2 1 2 2 2 0

Then we have,

	u[k]
`c`b`a`c`c`b`a`b`a`a`a`c
	v~u[k]
1b

This indeed normalizes the data of v. In general, v will have many repetitions of each of the underlying values, but u stores each value once. Changing an underlying value requires only one operation in the normalized version but potentially many updates to the non-unique list.
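
The same factorization can be written out in Python for comparison. This is a minimal sketch: distinct and find are stand-ins written for this example, and unlike the q operator this find raises an error for a value that is missing from the domain.

```python
def distinct(items):
    """Unique items in order of first appearance."""
    return list(dict.fromkeys(items))

def find(domain, items):
    """Position within domain of each item (KeyError if an item is absent)."""
    position = {value: i for i, value in enumerate(domain)}
    return [position[x] for x in items]

v = list("cbaccbabaaac")      # the symbols of v, one letter each
u = distinct(v)               # unique values
k = find(u, v)                # index of each item of v within u
rebuilt = [u[i] for i in k]   # indexing u by k recovers v
```

Renaming a value then means changing one entry of u, after which every position that refers to it through k sees the new value.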

Extra credit for recognizing that v is simply the composite u◦k. Effectively we have factored the non-unique list v through the unique list u via the index map k.

  v = u◦k

Why would we want to do this? Easy: compactness and speed.

This can be seen by comparing the sizes of v, u and k in a slightly modified version of our example.

	v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
	u:distinct v
	u
`ccccccc`bbbbbbb`aaaaaaa

	k:u?v
	k
0 1 2 0 0 1

Now imagine v and k to be much longer.
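
To put rough numbers on this, the Python sketch below repeats the six values a thousand times and counts characters. It is a back-of-the-envelope count, not a measurement of how q lays out memory, and the 4 bytes per index is an assumption.

```python
# The six values from the example, repeated to simulate a long list.
v = ["ccccccc", "bbbbbbb", "aaaaaaa", "ccccccc", "ccccccc", "bbbbbbb"] * 1000

u = list(dict.fromkeys(v))                     # unique values, first-seen order
index_of = {s: i for i, s in enumerate(u)}
k = [index_of[s] for s in v]                   # one small integer per item

INDEX_BYTES = 4  # assumption: each index is stored as a 32-bit int

raw_chars = sum(len(s) for s in v)                         # text stored verbatim
factored = sum(len(s) for s in u) + INDEX_BYTES * len(k)   # domain plus indices
```

The saving grows with the length of the repeated values, since k costs the same per item regardless of how long each value is.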

Enumerations

Enumeration encapsulates the above factorization of an arbitrary list through a list of unique values. An enumeration uses the binary cast operator ($) and is a generalization of the basic cast between types.

The general form of an enumerated value is,

	`u$v

where u is a simple list of unique values and v is either an atom in u or a list of such. The projection `u$ is the enumeration, u is the domain of the enumeration and `u$v represents the enumerated value(s).

Under the covers, applying the enumeration `u$ to a vector v actually factors v through u as in the previous section. The resulting index list k is stored internally.

Working with an Enumeration

Recasting our factorization example as an enumeration,

	u:`c`b`a
	v:`c`b`a`c`c`b`a`b`a`a`a`c
	ev:`u$v
	ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c

While the display of the enumeration ev shows that it represents the values of v within the domain u, only the implicit int index list is actually stored.

The enumeration ev acts just like the original v.

	v[3]
`c
	ev[3]
`c
	v[3]:`b
	v
`c`b`a`b`c`b`a`b`a`a`a`c
	ev[3]:`b
	ev
`u$`c`b`a`b`c`b`a`b`a`a`a`c
	v=`a
001000101110b
	ev=`a
001000101110b
	v in `a`b
011101111110b
	ev in `a`b
011101111110b

While the enumeration is item-wise equal to the original, and can be freely substituted for it, the two are not identical.

	v=ev
111111111111b
	v~ev
0b
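
The behavior shown in this section can be imitated with a small Python class that stores only the index list and looks values up in the domain on demand. The class Enumeration and its method names are hypothetical; this is a sketch of the idea, not of q's internals.

```python
class Enumeration:
    """Toy enumerated list: keeps the domain's name and an index list only."""

    def __init__(self, domain_name, domain, values):
        self.domain_name = domain_name
        self.domain = domain  # shared with the caller, not copied
        # list.index raises ValueError for a value outside the domain.
        self.indices = [domain.index(x) for x in values]

    def __getitem__(self, i):
        return self.domain[self.indices[i]]

    def __setitem__(self, i, value):
        self.indices[i] = self.domain.index(value)

    def __len__(self):
        return len(self.indices)

    def items_equal(self, other):
        """Item-wise comparison, in the spirit of = in q."""
        return [a == b for a, b in zip(self, other)]

u = ["c", "b", "a"]
v = list("cbaccbabaaac")
ev = Enumeration("u", u, v)

v[3] = "b"    # update the plain list
ev[3] = "b"   # the same update, stored as an index change

same_items = ev.items_equal(v)   # every position agrees
identical = ev == v              # False: an Enumeration is not a plain list
```

Indexing, assignment and item-wise comparison all behave as they do for v, while the final comparison plays the role of the failed match.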

Type of an Enumeration

Each enumeration is assigned a new numeric data type, beginning with 20h. If you start a new q session and load no script files, you will observe the following.

	u1:`c`b`a
	u2:`2`4`6`8
	u3:`a`b`c
	u4:`c`b`a

	type `u1$`c`a`c`b`b`a
20h
	type `u1$`a`a`b`b`c`c
20h
	type `u2$`8`8`4`2`6`4
21h
	type `u3$`c`a`c`b`b`a
22h
	type `u4$`c`a`c`b`b`a
23h

Enumerations over different domain variables are distinct, even when the domains have identical contents.

	u1~u4
1b
	v:`c`a`c`b`b`a
	(`u1$v)~`u4$v
0b
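
One way to see why the match fails is to treat an enumerated value as a pair: the name of its domain plus the index list. The Python sketch below makes that assumption explicit. The class Enumerated and the helper enumerate_in are hypothetical names, and the real internal representation in q may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Enumerated:
    domain_name: str   # name of the variable holding the domain
    indices: tuple     # positions of the values within that domain

def enumerate_in(domain_name, domain, values):
    """Build an enumerated value over the named domain."""
    return Enumerated(domain_name, tuple(domain.index(x) for x in values))

u1 = ["c", "b", "a"]
u4 = ["c", "b", "a"]   # same contents as u1, different name
v = ["c", "a", "c", "b", "b", "a"]

e1 = enumerate_in("u1", u1, v)
e4 = enumerate_in("u4", u4, v)

domains_match = u1 == u4                # the domains have equal contents
indices_match = e1.indices == e4.indices
values_match = e1 == e4                 # False: the domain names differ
```

Equality of the dataclass compares both fields, so two values with identical indices still differ when they belong to differently named domains.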

©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.
