QforMortals2/atoms

From Kx Wiki


1. Atoms

Overview

All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL, and where appropriate Java and C#. We cover enumerations in Casting and Enumerations.

q            SQL        Java        C#
boolean      boolean    Boolean     Boolean
byte         byte       Byte        Byte
short        smallint   Short       Int16
int          int        Integer     Int32
long         bigint     Long        Int64
real         real       Float       Single
float        float      Double      Double
char         char(1)    Character   Char
symbol       varchar    (String)    (String)
date         date       Date        -
datetime     datetime   Timestamp   DateTime
minute       -          -           -
second       -          -           -
time         time       Time        TimeSpan
enumeration  -          -           -

A dash means there is no direct equivalent.
Note: The words boolean, short, int, etc. are not keywords in q, so they are not displayed in a special font in this text. They do have special meaning when used as symbol arguments to some operators, as in the cast `int$x. You should avoid using them as names.

The next table collects the important information about each of the q data types. We shall refer to this in subsequent sections.

type         size  char type  num type  notation                      null value
boolean      1     b          1         1b                            0b
byte         1     x          4         0x26                          0x00
short        2     h          5         42h                           0Nh
int          4     i          6         42                            0N
long         8     j          7         42j                           0Nj
real         4     e          8         4.2e                          0Ne
float        8     f          9         4.2                           0n
char         1     c          10        "z"                           " "
symbol       *     s          11        `zaphod                       `
month        4     m          13        2006.07m                      0Nm
date         4     d          14        2006.07.21                    0Nd
datetime     8     z          15        2006.07.21T09:13:39           0Nz
minute       4     u          17        23:59                         0Nu
second       4     v          18        23:59:59                      0Nv
time         4     t          19        09:01:02.042                  0Nt
enumeration  *                          `u$v
dictionary                    99        `a`b`c!10 20 30
table                         98        ([] c1:`a`b`c; c2:10 20 30)
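The num type column is what q's built-in type function reports. For an atom, type returns the num type negated, as a short; the positive values belong to lists, covered in the next chapter. The sketch below assumes a q session and uses literals from the notation column. The int row is deliberately left out: later versions of q (3.0 onward) read an unsuffixed integer such as 42 as a long, so its reported type depends on the version.

```q
/ type returns the negated num type of an atom, as a short
type 1b                     / -1h   boolean
type 0x26                   / -4h   byte
type 42h                    / -5h   short
type 42j                    / -7h   long
type 4.2e                   / -8h   real
type 4.2                    / -9h   float
type "z"                    / -10h  char
type `zaphod                / -11h  symbol
type 2006.07m               / -13h  month
type 2006.07.21             / -14h  date
type 2006.07.21T09:13:39    / -15h  datetime
type 23:59                  / -17h  minute
type 23:59:59               / -18h  second
type 09:01:02.042           / -19h  time
```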

Integer Data

The basic integer data type is common to nearly all programming environments.

int

An int is a signed four-byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value:

        42

short and long

The other two integer data types are short and long. The short type represents a two-byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,

        b:-123h
        b
-123h

Similarly, the long type represents an eight-byte signed integer denoted by a trailing 'j' after optionally signed numeric digits.

        c:1234567890j
        c
1234567890j
Important: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented (e.g., an int is expected and a short is presented), the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.
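A minimal sketch of both halves of this rule, assuming a q session and reusing b and c from above. The names L and r are hypothetical, and the simple list syntax is explained in the chapter on lists. The failing assignment is wrapped in protected evaluation (@[value;...;{x}]) so that the error text is returned instead of interrupting the session.

```q
b:-123h                     / short
c:1234567890j               / long
type b+c                    / -7h: b is promoted to long for the addition
L:1 2 3j                    / hypothetical simple list of long
L[0]:c                      / succeeds: a long replaces a long
r:@[value;"L[0]:b";{x}]     / storing a short fails; the trap returns the error text
r                           / "type"
L[0]:`long$b                / an explicit cast makes the assignment valid
L                           / now -123 2 3
```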

Floating Point Data

Single and double precision floating point data types are supported.

float

The float type represents an IEEE standard eight-byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits that contain a decimal point, carry a trailing 'f', or both. A float can hold at least 15 decimal digits of precision.

For example,

        pi:3.14159265

        float1:1f

real

The real type represents a four-byte floating point number and is denoted by optionally signed numeric digits with a trailing 'e'; the decimal point is optional. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus

        r:1.4142e
        r
1.4142e

is a valid real number.
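The same check for real, assuming a q session. It also shows automatic promotion at work: an operation that mixes a real with a float produces a float.

```q
r:1.4142e
type r                      / -8h
type r*2e                   / -8h: real with real stays real
type r+1.0                  / -9h: real with float is promoted to float
```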

Note: The q console abbreviates the display of float or real values that have only zeros to the right of the decimal point.
        2.0
2f
        4.00e
4e

The behavior of substituting floating point types of different widths is analogous to the case of integer types.

Scientific Notation

Both float and real values can be specified in IEEE standard scientific notation for floating point values.

        f:1.23456789e-10
        r:1.2345678e-10e

By default, the q console displays float and real values to seven significant digits, rounding the display at the seventh digit.

        f
1.234568e-10
        r
1.234568e-10e

You can change this by using the \P command (note upper case) to specify a display width up to 16 digits.

        f12:1.23456789012
        f16:1.234567890123456

        \P 12
        f12
1.23456789012
        f16
1.23456789012

        \P 16
        f12
1.23456789012
        f16
1.234567890123456

Binary Data

Binary data can be represented as bit or byte values.

boolean

The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.

        bit:0b
        bit
0b

byte

The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by two hexadecimal digits:

        byte:0x2a

Binary Data is Numeric

In handling binary data, q is more like C than its descendants, in that both binary types are considered to be unsigned integers that can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate logical operators. For example, with byte and pi as defined above,

        a:42
        bit:1b
        a+bit
43

is an int and

        byte+pi
45.14159

is a float. Observe that type promotion has been performed automatically.
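The sketch below, assuming a q session, shows booleans and bytes behaving as small unsigned integers. It also shows how q gets by without separate logical operators: & returns the minimum of its arguments and | the maximum, which on 0b and 1b amount to logical and and logical or.

```q
bit:1b
byte:0x2a
bit+bit                     / 2: booleans add as integers
byte*2                      / 84
byte=42                     / 1b: compared by numeric value
1b&0b                       / 0b: & is minimum, so it acts as logical and
1b|0b                       / 1b: | is maximum, so it acts as logical or
3&5                         / 3: the same operator applied to ints
```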

Character Data

There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.

char

A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.

        ch:"q"
        ch
"q"

Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding back-slash ( \ ). While the console display also includes the escape, these are actually single characters.

        ch:"\""                        / double-quote
        ch                              / console also displays the escape "\""
        ch:"\\"                        / back-slash
        ch:"\n"                        / newline
        ch:"\r"                        / return
        ch:"\t"                         / horizontal tab
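To confirm that each escape sequence denotes a single character, count it or cast it to its underlying numeric code. A sketch assuming a q session; `int$ casts a char to its ASCII code.

```q
count "\""                  / 1: the escaped double-quote is a single char
count "\\"                  / 1: so is the escaped back-slash
`int$"\n"                   / 10: the ASCII code of newline
`int$"\t"                   / 9: the ASCII code of horizontal tab
```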

You can also escape a character with an underlying numeric value expressed as three octal digits.

        "\142"
"b"

symbol

A symbol holds a sequence of characters as a single unit. A symbol is denoted by a leading back-quote ( ` ), also called "back tick" in q circles.

        s1:`q
        s2:`zaphod

A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.

Important: A symbol is not a string. We shall see in lists that there is an analogue of strings in q, namely a list of char. While a list of char is a kissing cousin to a symbol, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same. The char "q" and the symbol `kdb are both atomic entities.
Advanced: You may ask whether a symbol can include embedded blanks and special characters such as back-tick. The answer is yes. You create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.
        `$"A symbol with `backtick"
`A symbol with `backtick
Note: A symbol is somewhat akin to a SQL VARCHAR, in that it can hold an arbitrary number of characters. It is different in that it is atomic.
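The difference between a symbol and a char, and the round trip between a symbol and a list of char, can be seen directly in a q session. This sketch uses the built-in match operator (~), the symbol cast `$ shown above, and the built-in string function.

```q
type `a                     / -11h: a symbol atom
type "a"                    / -10h: a char atom
`a~"a"                      / 0b: a symbol never matches a char
`$"kdb"                     / `kdb: cast a list of char to a symbol
string `kdb                 / "kdb": and back again
```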

Temporal Data

A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java's date library and its use of time zones). We begin with the equivalents to SQL temporal types. The additional temporal types in q deal with constituents of a date or time.

date

A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day. A date value stores the count of days from Jan 1, 2000.

        d:2006.07.04
        d
2006.07.04
Important: Months and days begin at 1 (not zero) so January is '01'.

Leading zeroes in months and days are required; their omission causes an error.

        bday:2007.1.1
'2007.1.1
Advanced: The underlying day count can be obtained by casting to int.
        `int$2000.02.01
31
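Because a date is a day count, ordinary integer arithmetic applies to it. A sketch assuming a q session; 2376 is the number of days from 2000.01.01 to 2006.07.04 (2192 days to the start of 2006, plus 184).

```q
d:2006.07.04
`int$d                      / 2376: days since 2000.01.01
d-2006.01.01                / 184: the difference of two dates is a day count
d+1                         / 2006.07.05: adding an int adds days
2000.01.01+2376             / 2006.07.04
```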

time

A time is stored in four bytes and is denoted by hh:mm:ss.uuu where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds. A time value stores the count of milliseconds from midnight.

        t:09:04:59.000
        t
09:04:59.000

Again, leading zeros are required in all constituents of a time.

Advanced: The underlying millisecond count can be obtained by casting to int.
        `int$12:34:56.789
45296789
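Likewise, a time is a millisecond count, so adding an int moves it by that many milliseconds. A sketch assuming a q session:

```q
t:09:04:59.000
`int$t                      / 32699000 = ((9*60 + 4)*60 + 59)*1000
t+1000                      / 09:05:00.000: adding an int adds milliseconds
```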

datetime

A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format. A datetime value stores the fractional day count from midnight Jan 1, 2000.

        dt:2006.07.04T09:04:59.000
        dt
2006.07.04T09:04:59.000
Advanced: The underlying fractional day count can be obtained by casting to float.
        `float$2000.02.01T12:00:00.000
31.5
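Since the underlying value of a datetime is a float day count, adding a float shifts it by fractional days, and casting a float with `datetime$ goes the other way. A sketch assuming a q session; the whole part of `float$dt equals the day count of the date 2006.07.04, and the fractional part is the portion of the day elapsed by 09:04:59.

```q
dt:2006.07.04T09:04:59.000
floor `float$dt             / 2376: whole days since 2000.01.01
dt+0.5                      / 2006.07.04T21:04:59.000: adding 0.5 adds half a day
`datetime$31.5              / 2000.02.01T12:00:00.000
```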

month

The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'. A month value stores the count of months since January 2000.

        mon:2006.07m
        mon
2006.07m
Advanced: The underlying month offset can be obtained by casting to int.
        `int$2000.04m
3
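The month offset counts from January 2000, so 2006.07m has offset (2006 - 2000)*12 + (7 - 1) = 78. A sketch of month arithmetic, assuming a q session:

```q
mon:2006.07m
`int$mon                    / 78
mon-2006.01m                / 6: the difference of two months is a month count
mon+6                       / 2007.01m: adding an int adds months
```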

minute

The minute type uses four bytes and is denoted by hh:mm. A minute value stores the count of minutes from midnight.

        mm:09:04
        mm
09:04
Note: We did not use min for the variable name because min is a reserved name in q.
Advanced: The underlying minute offset can be obtained by casting to int.
        `int$01:23
83
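A sketch of minute arithmetic, assuming a q session and the variable mm as above:

```q
mm:09:04
`int$mm                     / 544 = 9*60 + 4
mm+60                       / 10:04: adding an int adds minutes
```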

second

The second type uses four bytes and is denoted by hh:mm:ss. A second value stores a count of seconds from midnight.

        sec:09:04:59
        sec
09:04:59

The representation of the second type makes it look like an everyday time value. However, a q time value is a count of milliseconds from midnight, so the underlying values are different.

Advanced: The underlying values can be obtained by casting to int. This shows why the two types do not compare as equal.
        `int$12:34:56
45296
        `int$12:34:56.000
45296000
        12:34:56=12:34:56.789
0b
0b
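To compare a second with a time on equal terms, cast one to the other's type first. A sketch assuming a q session; s is a hypothetical name.

```q
s:12:34:56                  / hypothetical second value
`time$s                     / 12:34:56.000: cast a second to a time
`int$`time$s                / 45296000, the same as `int$12:34:56.000
(`time$s)=12:34:56.000      / 1b once both sides are times
```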

Constituents and Dot Notation

The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.

        d:2006.07.04
        d.year
2006
        d.mm
7
        d.dd
4

Similarly, the field values of time are 'hh', 'mm', 'ss'.

        t:12:45:59.876
        t.hh
12
        t.mm
45
        t.ss
59
Note: At the time of this writing (Jun 2007) there is no syntax to retrieve the millisecond constituent. Use the construct:
        t mod 1000
876

In addition to the individual field values, you can also extract higher-order constituents.

        d.month
2006.07m
        t.minute
12:45
        t.second
12:45:59

Of course, this works for a datetime as well.

        dt:2006.07.04T12:45:59.876
        dt.date
2006.07.04
        dt.time
12:45:59.876
        dt.month
2006.07m
        dt.mm
7
        dt.minute
12:45
Advanced: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments. For example,
        fmm:{[x] x.mm}
        fmm 2006.09.15
{[x] x.mm}
'x.mm

Instead, cast to the constituent type,

        fmm:{[x] `mm$x}
        fmm 2006.09.15
9
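The same cast technique extends to the other date constituents. The helper ymd below is hypothetical: a sketch, assuming a q session, that applies the `year and `dd casts in the same way as `mm above.

```q
/ hypothetical helper: year, month and day of a date, via casts
ymd:{[x] (`year$x; `mm$x; `dd$x)}
ymd 2006.09.15              / 2006 9 15
```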

Infinities and NaN

In addition to the regular numeric and temporal values, special values represent infinities, whose absolute values are greater than any “normal” numeric or temporal value.

Token   Value
0w      Positive float infinity
0W      Positive int infinity
0Wh     Positive short infinity
0Wj     Positive long infinity
0Wd     Positive date infinity
0Wt     Positive time infinity
0Wz     Positive datetime infinity
0n      NaN, or not a number
Important: Observe the distinction between lower case 'w' and upper case 'W'.

The result of dividing any positive (or unsigned) non-zero value by any zero value is positive float infinity, denoted 0w. Dividing a negative value by zero results in negative float infinity, denoted by -0w. The way to remember these is that 'w' looks like the infinity symbol ∞.

The integral infinities cannot be produced by dividing normal int values, since the result of division in q is always a float.

The result of dividing zero by zero is undefined, so q represents it as the floating point null 0n.

The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a special float value rather than an exception. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting cumbersome exception trapping. We shall see more about this in Primitive Operations.
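A sketch of these division rules, assuming a q session; % is q's division operator.

```q
1%0                         / 0w
-1%0                        / -0w
0%0                         / 0n
7%2                         / 3.5: % returns a float even for two ints
0w>1e308                    / 1b: infinity exceeds any finite float
```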

Advanced: While infinities can participate in arithmetic operations, infinite arithmetic is not implemented. Instead, q performs the operation on the underlying bit patterns. Math propeller heads (including the author) find the following disconcerting.
        0W-2
2147483645

        2*0W
-2

Null Values

Overview of Nulls

The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.

In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an uninitialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.

The NULL value in SQL indicates that the data value is inapplicable or missing. The NULL value is distinct from any value that can actually be contained in a field and does not have '=' semantics. That is, you cannot test a field for being null with = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, Boolean fields actually have three states: 0, 1 and NULL.

In q, the situation is more interesting. While most types have distinct null values, some types have no designated way of representing a null value.

The following table summarizes the way nulls are handled.

type       null
boolean    0b
byte       0x00
short      0Nh
int        0N
long       0Nj
real       0Ne
float      0n
char       " "
symbol     `
month      0Nm
date       0Nd
datetime   0Nz
minute     0Nu
second     0Nv
time       0Nt

Binary Nulls

Let's start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Consequently, you cannot distinguish between a missing boolean value and the value that represents false.

In practice, this isn't an issue, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.

Numeric and Temporal Nulls

Next, observe that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose underlying value is zero. The difference from SQL is that there is no universal null value.

The advantage of the q approach is that the null values have equals semantics. The tradeoff is that you must use the correct null value in type-checked situations.
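A sketch of equals semantics for nulls, assuming a q session; x is a hypothetical name. The built-in null function gives a test that works regardless of type.

```q
x:0Nh
x=0Nh                       / 1b: a null equals the null of its own type
null x                      / 1b: null tests for missing data of any type
null 0h                     / 0b: zero is a value, not missing data
null 0Nd                    / 1b
null 2006.07.04             / 0b
```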

Character Nulls

Finally, we consider the character types. Viewing a symbol as a variable-length collection of characters explains why the symbol null value is the empty symbol, designated by a back-tick ( ` ).

In contrast, the null value for the char type is the char consisting of the blank character ( " " ). As with binary data, you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.

Note: The value "" is not the char null. Instead, it is the empty list of char.
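A sketch, assuming a q session, contrasting the char and symbol nulls with the empty list of char:

```q
null " "                    / 1b: the blank char is the char null
null `                      / 1b: the empty symbol is the symbol null
type " "                    / -10h: a char atom
type ""                     / 10h: a list of char, here empty
count ""                    / 0
```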



©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.
