QforMortals/atoms

Atoms

Overview

All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL, with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL and, where appropriate, in Java and C#. We will cover enumerations in Casting and Enumerations.

Q            SQL        Java        C#
boolean      boolean    Boolean     Boolean
byte         byte       Byte        Byte
short        smallint   Short       Int16
int          int        Integer     Int32
long         bigint     Long        Int64
real         real       Float       Single
float        float      Double      Double
char         char(1)    Character   Char
symbol       varchar    (String)    (String)
date         date       Date
datetime     datetime   Timestamp   DateTime
minute
second
time         time       Time        TimeSpan
enumeration

The next table collects the important information about each of the q data types. We shall refer to this in subsequent sections.

type         size  char type  num type  notation              null value
boolean      1     b          1         1b
byte         1     x          4         0x26
short        2     h          5         42h                   0Nh
int          4     i          6         42                    0N
long         8     j          7         42j                   0Nj
real         4     e          8         4.2e                  0Ne
float        8     f          9         4.2                   0n
char         1     c          10        "z"                   " "
symbol       *     s          11        `zaphod               `
month        4     m          13        2006.07m              0Nm
date         4     d          14        2006.07.21            0Nd
datetime     8     z          15        2006.07.21T09:13:39   0Nz
minute       4     u          17        23:59                 0Nu
second       4     v          18        23:59:59              0Nv
time         4     t          19        09:01:02.042          0Nt
enumeration  *                          `u$v
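
As a quick way to check the "num type" column, the following sketch applies the built-in type function to a literal of each kind. For an atom, type reports the type number as a negative short. A bare 42 is left out on purpose, since its type depends on which integer width the installed release treats as the default.

```q
type 1b                    / -1h   boolean
type 0x26                  / -4h   byte
type 42h                   / -5h   short
type 42j                   / -7h   long
type 4.2e                  / -8h   real
type 4.2                   / -9h   float
type "z"                   / -10h  char
type `zaphod               / -11h  symbol
type 2006.07m              / -13h  month
type 2006.07.21            / -14h  date
type 2006.07.21T09:13:39   / -15h  datetime
type 23:59                 / -17h  minute
type 23:59:59              / -18h  second
type 09:01:02.042          / -19h  time
```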

Integer Data

The basic integer data type is common to nearly all programming environments.

int

An int is a signed four byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value,

	42

short and long

The other two integer data types are short and long. The short type represents a two byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,

	b:-123h
	b
-123h

Similarly, the long type represents an eight byte signed integer and is denoted by a trailing 'j' after optionally signed numeric digits.

	c:1234567890j
	c
1234567890j

Warning: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented (e.g., an int is expected and a short is presented), the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.
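
The contrast between promotion in arithmetic and the lack of promotion in list assignment can be sketched as follows. The name L is purely illustrative, and a list of long is used so that the literals mean the same thing regardless of the default integer width.

```q
type 1h+2j          / -7h: short and long promote to long in arithmetic
L:10 20 30j         / a simple list of long
L[1]:`long$42h      / succeeds: the short is cast explicitly first
/ L[1]:42h          / would signal 'type: the short is not promoted
```

The last line is left commented out so that the fragment loads cleanly as a script; entering it at the console should show the type error.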

Floating Point Data

Single and double precision floating point data types are standard in q.

float

The float type represents an IEEE standard eight byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits containing a decimal point with an optional trailing 'f'. A floating point number can hold at least 15 decimal digits of precision.

For example,

	pi:3.14159265
	sqrt2:1.41421f

real

The real type represents a four byte floating point number and is denoted by numeric digits containing a decimal point and a trailing 'e'. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus

	r:1.4142e
	r
1.4142e

is a valid real number.

Note: The q console abbreviates the display of float or real values having zeros to the right of the decimal.

	2.0
2f
	4.00e
4e

The behavior of substituting floating point types of different widths is analogous to the case of integer types.

Scientific Notation

Both float and real values can be specified in IEEE standard scientific notation for floating point values.

	f:1.23456789e-10
	r:1.2345678e-10e

By default, the q console displays only seven decimal digits of accuracy for float and real values by rounding the display in the seventh significant digit.

	f
1.234568e-010
	r
1.234568e-010e

You can change this by using the \P command (note upper case) to specify a display width up to 16 digits.

	f12:1.23456789012
	f16:1.234567890123456

	\P 12
	f12
1.23456789012
	f16
1.23456789012

	\P 16
	f12
1.23456789012
	f16
1.234567890123456

Binary Data

Binary data can be represented as bit or byte values.

boolean

The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.

	bit:0b
	bit
0b

byte

The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by a hexadecimal value,

	byte:0x2a

Binary Data is Numeric

In handling binary data, q is more like C than its descendants, in that both binary types are unsigned integers and can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate operators for logical operations. With pi and byte as defined above,

	a:42
	bit:1b
	a+bit
43

is an int and

	byte+pi
45.14159

is a float. Observe that type promotion has been performed automatically.

Infinity and NaN

We have now met all the numeric types. In addition to the regular numeric values, there are three special floating point values that represent the result of dividing a numeric value by a numeric zero.

Symbol Value
0w Positive infinity
-0w Negative infinity
0n NaN, or not a number

The result of dividing any positive (or unsigned) non-zero value by zero is positive infinity, denoted 0w.

	42%0
0w

Dividing a negative value by zero results in negative infinity, denoted by -0w.

	-42%0
-0w

The way to remember these is that ‘w’ looks like the infinity symbol ∞.

Dividing any numeric zero by any numeric zero is undefined mathematically. It results in the float null value 0n, which can be thought of as NaN in this context.

	0%0
0n

The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a value rather than an exception. Furthermore, infinities and nulls act reasonably in arithmetic and comparisons. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting all sorts of exception trapping. Instead, nulls or division by zero will be propagated through the results in a consistent and meaningful way. We shall learn more about this when we look at Primitive Operations.
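
A minimal sketch of this propagation follows. The names inf and nan are illustrative only; the comments give the values we would expect at the console under IEEE floating point rules.

```q
inf:42%0      / 0w
inf+1         / 0w: adding to infinity leaves infinity
inf>1e300     / 1b: infinity compares greater than any finite float
inf-inf       / 0n: infinity minus infinity is undefined
nan:0%0       / 0n
nan+1         / 0n: the null propagates through arithmetic
```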

Character Data

There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.

char

A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.

	ch:"q"
	ch
"q"

Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding '\'. The console display also includes the escape, but these are actually single characters.

	ch:"\""                        / double-quote
	ch
"\""
	ch:"\\"                        / back-slash
	ch:"\n"                        / newline
	ch:"\b"                        / back-space
	ch:"\t"                        / horizontal tab

You can also specify a character by its underlying numeric value, written as a backslash followed by three octal digits.

	"\142"
"b"

symbol

A symbol holds a sequence of characters as a single unit. This corresponds to a SQL VARCHAR. A symbol is denoted by a leading back-quote (also read "back tick").

	s1:`q
	s2:`zaphod

A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.

Warning: A symbol is not a string. There is an analogue of strings in q, namely a list of char. Moreover, a list of char is a kissing cousin to a symbol. However, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same. The char "q" and the symbol `abc are both atomic elements.

You may ask whether a symbol can include embedded blanks and special characters such as '`'. The answer is yes, and you create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.
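
As a brief preview, and only a sketch, the cast `$ applied to a list of char yields a symbol, and string goes the other way. The name s is illustrative.

```q
s:`$"zaphod beeblebrox"   / a symbol containing an embedded blank
type s                    / -11h: still a single symbol atom
string s                  / "zaphod beeblebrox": back to a list of char
`a~"a"                    / 0b: a symbol and a char are not the same
```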

Temporal Data

A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java’s date library and its use of time zones). We begin with the SQL temporal types. The additional temporal types in q deal with constituents of a date or time.

date

A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day.

	d:2006.07.04
	d
2006.07.04

Months and days begin at 1 (not zero) so January is '01'.

Note: Leading zeroes in months and days are required and their omission causes an error. Other layouts, such as the slash-separated form, are not recognized as dates either.

	bday:7/7/1776
'/

time

A time is stored in four bytes and is denoted by hh:mm:ss.uuu where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds.

	t:09:04:59.000
	t
09:04:59.000

Again, leading zeroes are required in all constituents of a time.

datetime

A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format.

	dt:2006.07.04T09:04:59.000
	dt
2006.07.04T09:04:59.000

month

The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'.

	mon:2006.07m
	mon
2006.07m

minute

The minute type uses four bytes and is denoted by hh:mm.

	min:09:04
	min
09:04

second

The second type uses four bytes and is denoted by hh:mm:ss.

	sec:09:04:59
	sec
09:04:59
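
The temporal arithmetic mentioned at the start of this section can be sketched with the types just introduced. The comments show the values we would expect at the console. The day count is an int, which more recent releases that default to long integers display with a trailing i.

```q
2006.07.04+1             / 2006.07.05: adding an integer advances by days
2006.07.04-2006.01.01    / 184: subtracting dates counts the days between them
2006.07m+6               / 2007.01m: months advance by whole months
09:04:59+1               / 09:05:00: seconds roll over into minutes
09:04+60                 / 10:04: minutes roll over into hours
```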

Constituents and Dot Notation

The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.

	d:2006.07.04
	d.year
2006
	d.mm
7
	d.dd
4

Similarly, the field values of time are 'hh', 'mm', 'ss'.

	t:12:45:59.876
	t.hh
12
	t.mm
45
	t.ss
59

In addition to the individual field values, you can also extract higher-level constituents.

	d.month
2006.07m
	t.minute
12:45
	t.second
12:45:59

Of course, this works for a datetime as well.

	dt:2006.07.04T12:45:59.876
	dt.date
2006.07.04
	dt.time
12:45:59.876
	dt.month
2006.07m
	dt.mm
7

Warning: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments.

For example,

	fmm:{[x] x.mm}
	fmm 2006.09.15
{[x] x.mm}
'x.mm

Instead, cast to the constituent type,

	fmm:{[x] `mm$x}
	fmm 2006.09.15
9
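
The same cast pattern should carry over to the other constituents named earlier. The following is a sketch with illustrative values; integer results are shown without any display suffix, which may differ between releases.

```q
d:2006.09.15
t:12:45:59.876
`year$d       / 2006
`dd$d         / 15
`month$d      / 2006.09m
`hh$t         / 12
`minute$t     / 12:45
`second$t     / 12:45:59
```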

Null Values

The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.

In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an uninitialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.

The NULL value in SQL represents missing data for any data type. The NULL value is distinct from any value that can actually be contained in a field and does not have ‘=’ semantics. That is, you cannot test a field for being null by asking if it = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, for example, Boolean fields actually have three states: 0, 1 and NULL.

Overview of Nulls

In q, the situation is more interesting. Some types have no designated way of representing a null value, other types have their own designated null values, and one type uses a regular value for null. The final column in the data type table from the overview of atoms summarizes the way nulls are handled. We reproduce it here for ease of reference.

type null
boolean 0b
byte 0x00
short 0Nh
int 0N
long 0Nj
real 0Ne
float 0n
char " "
symbol `
month 0Nm
date 0Nd
datetime 0Nz
minute 0Nu
second 0Nv
time 0Nt

Binary Nulls

Let’s start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Thus, you cannot distinguish between a missing boolean value and the value that represents false.

In practice, this isn't a serious problem, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.

Numeric and Temporal Nulls

Next, you'll notice that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose value is zero. The difference from SQL is that there is not a universal null value.

The advantage of the q approach is that the null values have equals semantics: you test for a null by comparing with '=', just as for any other value. The disadvantage is that you must use the correct null value in type-checked situations.
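
Both points can be sketched briefly. The name dates is illustrative, and the commented-out line describes the failure we would expect rather than a captured session.

```q
0N=0N                          / 1b: a typed null equals itself
0Nd=0Nd                        / 1b
dates:2006.07.04 2006.07.05    / a simple list of date
dates[0]:0Nd                   / succeeds: 0Nd is the date null
/ dates[0]:0N                  / would signal 'type: 0N is not a date
```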

Character Nulls

Finally, we consider the character types. If you think of a symbol as a VARCHAR, it makes sense that the symbol null value should be the empty symbol, designated by a back-quote (`).

On the other hand, the null value for the char type is the char consisting of the blank character (" "). Similar to the situation with binary data, this means that you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.
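
One way to see these distinctions, assuming the built-in null function is available in your release, is to apply it to each candidate value:

```q
null `         / 1b: the empty symbol is the symbol null
null " "       / 1b: a blank char is treated as null
null 0b        / 0b: a boolean is never reported as null
null 0x00      / 0b: nor is a byte
```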



©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.
