Overview
The Evolution of q
The q programming language and its database kdb+ were developed in 2003 by Arthur Whitney of Kx Systems, Inc. The primary design objectives of q are expressiveness, speed and efficiency. In these, it is beyond compare. The design tradeoff is a terseness that can be disconcerting to programmers coming from more verbose database programming environments—e.g., C++, Java or C#, combined with SQL. While the q programming gods revel in programs that resemble an ASCII core dump, this manual is for the rest of us.
Q evolved from APL (A Programming Language), which was first invented as a mathematical notation by Kenneth Iverson at Harvard University in the 1950s. APL later became one of the first computer languages. IBM introduced it as a vector programming language, meaning that it was able to process lists of numbers in a single operation, and it became successful in finance and other industries that required heavy number crunching.
Since q is a vector processing language by birth, it is well suited to performing complex calculations quickly on large volumes of data. What's new in q is that it can process time series data very efficiently in the relational paradigm. Its syntax allows select expressions that are similar to SQL 92, and its collection of built-in functions forms a rich superset of those in SQL 92.
There is also some LISP in q's genes. In fact, the fundamental data construct of q is a list. The notation and terminology are different, but the functionality is there and is arguably simpler. For those so inclined, writing compilers is a snap in q.
The DNA sequencing of q also shows the influence of functional computing. While q is not purely functional on the surface, it is arguably as functional as C++, Java and C# are object-oriented.
Philosophy
A proficient q developer thinks differently than in conventional RDBMS programming environments such as C++, Java and C#, henceforth referred to as "verbose" programming. In order to get you into the correct mindset, we summarize some of the potential discontinuities for the q newbie.
There are three major issues in verbose database programming:
1. Business objects must be mapped to a completely different representation (e.g., tables) for persistence. It takes considerable effort to get the object-relational transfer correct; witness the complexity of EJB.
2. Business objects must be mapped to another representation for transport, usually some binary or XML form that flattens reference chains.
3. Performing data manipulations such as selection and aggregation is best done in stored procedures on the database server. Complex numeric calculations are best done away from the database on an application server.
Much of verbose programming design is spent getting the various representations correct, and much of verbose programming code is spent marshalling resources and synchronizing the different representations. These issues disappear in q.
In Memory Database
One way to think of kdb+ is as an in-memory database with persistent backing. The form in which entities are held in memory is virtually identical to the way they are stored on disk and transported. Since data manipulation is performed in memory with q, there is no separate stored procedure language. This is somewhat akin to disconnected record sets in ADO.NET, but there is no separation between the language used to construct the table objects (C#) and that used to manipulate the data in the table objects (SQL).
Interpreted
Q is interpreted instead of compiled. During execution, data and functions live in an in-memory workspace. Iterations of the development cycle tend to be quick because all information needed to debug is available in the workspace. Q programs are stored and executed as scripts. In addition, q functions can be created as strings and executed dynamically, so it is possible to write self-modifying code.
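As a minimal sketch of dynamic execution, the built-in value evaluates a string of q source. The names src and sq are hypothetical, chosen only for this illustration.

```q
/ Evaluate q source held in strings (src and sq are illustrative names)
value "6*7"      / 42
src:"{x*x}"      / the text of a function, held as a string
sq:value src     / evaluating the string yields the function itself
sq 5             / 25
```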
Ordered Lists
Classical SQL is based on sets, which are unordered, so the order of rows in a table is not defined. In q, ordered lists are the foundation of all non-trivial data structures, so table rows have an order.
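A small sketch of what ordering buys you: items of a list, and rows of a table, can be retrieved by position. The table t and its columns sym and px are hypothetical; tables are covered properly in later chapters.

```q
L:30 10 20
L 0                            / 30: the first item, by position
t:([] sym:`c`a`b; px:3 1 2)    / rows keep the order in which they were given
t[0]`sym                       / `c: the sym field of the first row
t`px                           / 3 1 2: a column is an ordered list
```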
Evaluation Order
While q is written left-to-right, expressions are evaluated right-to-left or, as the q gods prefer, left of right, meaning that the function or operator to the left executes on what is to the right of it. There is no operator precedence, so parentheses are never needed to sort out precedence; they serve only to override the right-to-left order.
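A few one-liners may make the rule concrete. Each operator takes everything to its right as its right operand, which gives results that differ from conventional arithmetic precedence.

```q
2*3+4      / 14: 3+4 is evaluated first, then multiplied by 2
(2*3)+4    / 10: parentheses force 2*3 to be evaluated first
10-2-3     / 11: 2-3 is -1, and 10 minus -1 is 11
```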
No Objects
Give up objects, ye who enter here. In contrast to the languages mentioned above, q does not implement such concepts of object-oriented programming as classes, inheritance and virtual methods. Instead, q builds complexity through the construction and mapping of ordered lists, which are actually sequences or vectors in mathematical parlance. The higher-level constructs for data manipulation in q are dictionaries and tables. A function in q can be named globally in the workspace, named within another function, or anonymous within an expression. Variables can be global or local to a function.
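The three ways of naming a function might look as follows. This is only a sketch; sq, hyp and sq2 are hypothetical names, and function definition is explained in detail later.

```q
sq:{x*x}                              / named globally in the workspace
hyp:{sq2:{x*x}; sqrt sq2[x]+sq2 y}    / sq2 is named only within hyp
sq 4                                  / 16
hyp[3;4]                              / 5f: square root of 9+16
{x*x} 4                               / 16: anonymous within an expression
```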
Types
Q is a strongly typed, dynamically checked language, but its typing is less cumbersome than that of many typed languages. Each variable has a value of well-defined type, and type promotion for operations is automatic. A variable's type is not explicitly declared; instead, the type of a variable name reflects the value assigned to it. Lists that hold items of a single (homogeneous) data type will not accept or promote other types.
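The following sketch illustrates both halves of that statement: promotion happens automatically in an operation, but a homogeneous list refuses an item of another type. The name L is hypothetical.

```q
1+2.5         / 3.5: the integer 1 is promoted to float for the addition
type 1+2.5    / -9h, the type code of a float atom
L:1 2 3       / a homogeneous list of integers
L[0]:2.5      / signals 'type: L will not accept a float item
```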
Null Values
In classical SQL, the value NULL represents missing data for a field of any type. In q, types have separate null values. Infinite and null values can participate in arithmetic and other operations with reasonable results.
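As a brief illustration, here are the integer null 0N, the float null 0n and the float infinity 0w taking part in ordinary operations. The built-in null tests whether a value is null.

```q
null 0N    / 1b: 0N is the integer null
null 0n    / 1b: 0n is the float null
1+0N       / 0N: arithmetic on a null yields a null
1+0w       / 0w: infinity plus one is still infinity
-0w<0w     / 1b: negative infinity is less than positive infinity
```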
Integrated I/O
I/O is done through handles that act as windows to the outside world. Once such a handle is set up, retrieving the handle's value results in a read and passing a value to the handle is a write.
Mathematical Functions Refresher
In order to understand q, it is important to have a clear grasp of the basic concepts and terminology of mathematical functions. There is no shortcut. In fact, nearly all the constructs of q can be understood as function mappings. The following refresher may help those who are unfamiliar or rusty with mathematical functions.
In mathematics, a function associates a unique output value with each input value. The collection of all input values is the domain of the function and the range is the collection from which the output values are chosen. A function is also called a map (or mapping) from the domain to the range.
The output value that a function f associates to an input value x is read "f of x." More verbosely, we say that the output is the result of applying f to the input parameter(s), or that the output value is f evaluated at x. In mathematics and most programming languages, the output value of a function is represented with the function name to the left of its arguments. The arguments are usually enclosed in matching parentheses or brackets and are separated by commas or semicolons.
There are two basic ways to define a function: an algorithm or a graph. You can specify an algorithm (formulas) that performs a sequence of operations on an input value to arrive at the corresponding output value. For example, we define the squaring function, over the domain and range of real numbers, to assign as output value the input value times itself. Alternatively, you can define a function by explicitly listing all the input-output associations. The collection of associated inputs and outputs is the graph of the function.
As you will no doubt recall from many bucolic hours in high-school math class, a function defined by formula can always be converted to a graph by feeding in input values, finding the associated outputs, and collecting the results into a table. But in general, there is no explicit formula to calculate the values in an input-output graph. If it is possible to define a function via a formula, this is usually the preferred way, since it will be compact, though there is no guarantee that the formula will be easy or quick to compute.
Here are the two forms for the squaring function over the domain of non-negative integers 0 through 3, as you might recall them from high school:
f(x) = x^{2}
I | O |
0 | 0 |
1 | 1 |
2 | 4 |
3 | 9 |
When graphing a function, we normally think of the I/O table as a list of (x,y) pairs
(0, 0) (1, 1) (2, 4) ...
However, it can also be viewed as a pair of columns in which there is a positional correspondence between the input column and the output column.
0 --> 0
1 --> 1
2 --> 4
...
The latter perspective will prove very useful.
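To preview why, here is a sketch of the squaring map in both forms in q. The formula is a function; the graph is a dictionary, which pairs a list of inputs positionally with a list of outputs. The name g is hypothetical, and dictionaries are introduced properly in a later chapter.

```q
f:{x*x}              / formula: multiply the input by itself
f 3                  / 9
f 0 1 2 3            / 0 1 4 9: applied to the whole input column at once
g:0 1 2 3!0 1 4 9    / graph: input column ! output column
g 3                  / 9: look up the output paired with input 3
g 0 1 2 3            / 0 1 4 9
```

On this domain the two forms are interchangeable: f and g return the same output for every input from 0 through 3.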
The number of arguments to a function is called its valence. Some types of functions are important enough to have their own terminology. A function of valence 1 (i.e., defined by an algorithm that has one parameter) is said to be monadic. An example is neg(x) that takes a number and returns the negative of the number (i.e., -1 times the number). A function of valence 2 (i.e., two parameters) is said to be dyadic. An example is sum(x, y) that takes two numbers and adds them to get the result. A function with no parameters is niladic. For example, a function returning the constant 3 is niladic.
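In q these three valences might be sketched as follows. The names add2 and three are hypothetical; neg is a q built-in.

```q
neg 5          / -5: monadic, one argument
add2:{x+y}     / dyadic, with implicit parameters x and y
add2[2;3]      / 5
three:{[] 3}   / niladic: an empty parameter list
three[]        / 3
```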
Given functions f and g for which the range of g is (contained in) the domain of f, the composite of f and g, denoted f◦g, is the function obtained by chaining the output of g into f. That is, the composite assigns to an input x the output value f(g(x)). Pictorially, we can see that the composite chains the output of g into the input of f,
   g        f
x --> g(x) --> f(g(x))
The domain of the composite is the domain of g and its range is the range of f.
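Because q evaluates right to left, a composite reads just as it does in mathematics. A minimal sketch, with hypothetical maps f and g and a hypothetical name fg for the composite:

```q
g:{x+1}       / g adds one
f:{x*x}       / f squares
f g 3         / 16: g 3 is 4, then f 4 is 16
fg:{f g x}    / the composite as a function in its own right
fg 3          / 16
```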
A recursive function is a function over the domain of non-negative integers whose definition has a special form. Namely, it is defined explicitly for the input value 0 (this is called the initial case) and its value for any n>0 is specified in terms of its values up to n-1. Often, but not always, the value for n is defined in terms of its value for n-1 only. In some situations the initial case will correspond to 1 instead of 0. Many definitions and operations on lists in q will be presented recursively.
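The factorial is the classic example: its initial case is 0, and its value at n is n times its value at n-1. A sketch in q, using the conditional $[condition;then;else] and the hypothetical name fact:

```q
fact:{$[x=0;1;x*fact x-1]}    / initial case 0 gives 1; otherwise n times fact of n-1
fact 0                        / 1
fact 5                        / 120
```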
In the remainder of this document, we shall use the term map, or mapping, to refer to a mathematical function and will always mean a q function when we write "function" without a modifier.
We hope this trip down mathematics memory lane is not new territory for you. If it is, we strongly advise that you linger here until you're comfortable with the material before proceeding. There is no escaping the fact that q is a language whose foundation is mathematical functions. If you build on shaky ground, your understanding will certainly collapse under the weight of what is to come.
Getting Started
Starting q
The installation places the q executable in $HOME/q (or $QHOME) on Unix-based systems, or in the q directory on the c drive on Windows.
Start a q session by typing 'q' on the command line. You should see a new window with the Kx Systems copyright notice, followed by a q command line with a leading q) prompt. Type '6*7' and press Enter to see the result.
q)6*7
42
q)
In this manual, to increase readability we shall omit the q prompt in all our snippets, showing the input you type as indented and the response as left justified.
    6*7
42
_
Here the '_' stands for the blinking cursor.
Variables
Declaring a variable and assigning its value are done in a single step with ‘:’, read amend (or assign). Note that assignment does not misuse ‘=’ as many languages do. To assign a variable with an integer value 42 write,
a:42
A variable name comprises alphanumeric characters with a leading alpha. Some folks read the assignment operation succinctly as "gets."
Whitespace
In general, q permits, but does not require, whitespace around operators, separators, brackets, braces, etc. You could also write the above expression as
a : 42
or
a: 42
Because the q gods prefer compact code, you will often see programs with no superfluous whitespace…none, zilch, zip, nada. In order to help you get accustomed to this terseness, we will normally follow this convention here, but you should feel free to add whitespace for readability. We will point out where whitespace is required or is not permitted.
The q Console
Once you type your preferred version of the above assignment into the q console (which you should do now) the only response you will see is the cursor awaiting input on the next line. To see the value of a, type its name and press Enter.
    a:42
    a
42
You may wonder why the q console does not echo the value of a specification. This is simply a design feature of the q console.
Comments
In q, the forward-slash character (/) is used to indicate the beginning of a comment. Otherwise put, / instructs the interpreter to ignore anything to the end of the line.
Note: At least one white space character is required to the left of /.
In the following example, no definition of b is processed, so an error occurs,
    a: 42 / nothing here counts b:6*7
    b
'b
And the following generates an error,
    a:42/ intended to be a comment
'
Rant: The q gods have no need for explanatory error messages or comments since their q code is always correct and self-documenting. Mortals spend many hours poring over cryptic q error messages such as the one above that indicates b is undefined. Moreover, many mortals eschew comments in misguided misanthropic coding macho. Don't.
Assignment value
A variable is not explicitly declared or typed. Instead, the value assigned to a variable carries the type. In our example, the expression to the right of the assignment is syntactically an int value, so the name 'a' is associated with a value of type int.
The fact that variables are not declared before assignment means that an assignment can be interpreted either as the initial assignment or as a re-assignment, depending on the context. It is perfectly permissible to reassign a variable with a value of different type. Once this is done, the name will reflect the new type of the value assigned to it.
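A short sketch of reassignment with a different type, using the built-in type to inspect the result. The values are arbitrary; the type codes shown are those of a float atom and a symbol atom.

```q
a:4.2
type a     / -9h: a currently holds a float
a:`xyz     / reassign with a symbol; no declaration is needed
type a     / -11h: the name now reflects the symbol type
```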
Warning: You can unintentionally change the type of a variable with a wayward assignment. Or you can inadvertently reuse a variable name and wipe out any data in the variable. An undetected typo can result in data being sent to a black hole. Be careful to enter variable names correctly.
Some verbose languages permit only a variable name to the left of an assignment. In q, as in C, an assignment carries the value being assigned and can be used as part of a larger expression. So we find,
    1+a:42
43
Or
    b:1+a:42
    b
43
Order of Parsing
The interpreter evaluates the above specification of b by parsing the expression from right-to-left. Put verbosely,
    The integer 42 is assigned to a variable named a, then this value is added to the integer 1, then this result is assigned to a variable named b
Because the interpreter always parses expressions from right-to-left, programmers can read q expressions from left-to-right,
    The variable b gets the value of the integer 1 plus the value assigned to the variable a, which gets the integer 42
The ability to use the results of assignments in expressions permits a single line of q code to perform the work of an entire verbose program. Such an expression may execute more quickly than an equivalent version with the assignments split onto multiple statements, but the tradeoff is a reduction in readability and maintainability. Some programming in q carries terseness to the extreme. This choice of programming style should be carefully considered.
Sample Q Program
Now that we know how q works and how to start it up, let's examine some real code that shows the power of q. The following program reads a csv file of time-stamped symbols and prices, places the data into a table and computes the maximum price for each day. It then opens a socket connection to a q process on another machine and retrieves a similar daily aggregate. Finally, it merges the two intermediate tables and appends the result to an existing file.
sample:{
  t:("DSF";enlist ",") 0: `:c:/q/data/px.csv;
  tmpx:select mpx:max Price by Date,Sym from t;
  h:hopen `:aerowing:5042;
  rtmpx:h "select mpx:max Price by Date,Sym from tpx";
  hclose h;
  .[`:c:/q/data/tpx.dat;();,;rtmpx,tmpx]}
Most people have two immediate reactions upon seeing q code for the first time. First, they are amazed at how much can be done with so little code. Second, they wonder if they will ever be able to read it! We promise that by the end of this tutorial, this program will be easy.
©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.