11. I/O

11.0 Overview

I/O in q is one of the most powerful and succinct features of the language. The names and behavior of the functions are idiosyncratic but the economy of expression is unrivaled.

I/O is realized via handles, which are symbolic names of resources such as files or machines on a network. One-and-done operations can be performed directly on the symbolic handle – e.g., you can read a file into memory in a single operation. For continuing operations, you open the symbolic handle to obtain an open handle. The open handle is a function that is applied to perform operations. When you have completed the desired operations, you close the open handle to free any allocated resources.

11.1 Binary Data

In q, files come in two flavors: text and binary. Routines to process text data have ‘0’ in their names, whereas routines to process binary data have ‘1’. A text file is considered to be a list of strings – i.e., a list of char lists – and a binary file is a list of byte lists. While all text files can also be processed as binary data, not all binary data represents text. As mentioned above, file operations use handles.

11.1.1 File Handles

A file handle is a symbol that represents the name of a directory or file on persistent storage. A symbolic file handle starts with a colon : and has the form,

`:[path]name

where the bracketed expression represents an optional path and name is a file or directory name. The combination should be recognized as valid by the underlying operating system.

Some q operations require that you append a trailing slash / to indicate that you mean a directory. We will point these out.

It is generally easier to work with paths and names as strings so that blanks and other special characters can be handled easily. While `$ converts a string to a symbol, it can be awkward to include the leading : required in the symbolic handle. The keyword hsym, which inserts a leading colon into a symbol, serves this purpose.

q)hsym `$"/data/file name.csv"
`:/data/file name.csv

Note that q always represents separators in paths by the forward slash /, even when running on Windows. If you run q on Windows, you can type either / or \ but q will always display / in its response.

Tip

To make life easier when you are generating paths dynamically, hsym is idempotent, meaning that it will accept its own output and pass it through.


q)hsym hsym `$"/data/file name.csv"
_
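
A minimal sketch of building a handle from string parts, which is the situation hsym is meant for. The names dir, name, fh and path are hypothetical and used only for illustration.

```q
dir:"/data"                 / hypothetical directory, held as a string
name:"file name.csv"        / hypothetical file name containing a blank
fh:hsym `$dir,"/",name      / join, cast to symbol, prepend the colon
fh
path:1_string fh            / back to a string, dropping the leading colon
path
```

Keeping the pieces as strings until the last step means blanks and other special characters never have to appear in a symbol literal.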

11.1.2 `hcount` and `hdel`

The first one-and-done operation that works directly on a symbolic file handle is hcount, which returns a long representing the size of the file in bytes as reported by the OS.

q)hcount `:/data/solong.txt
35

The next one-and-done is hdel, which instructs the OS to remove the file specified by its symbolic handle operand.

q)hdel `:/data/solong.txt
`:/data/solong.txt

Some notes.

The return value of the symbolic file handle itself indicates that the deletion was successful. It should not be confused with an error message, which starts with a tick rather than a backtick.
You will get an error message if the file does not exist or if the delete cannot be performed.
You will not be prompted for confirmation. Back up any files that are important.

11.1.3 Serializing and Deserializing q Entities

Every q entity can be serialized and persisted to storage. Unlike traditional languages, where you must instantiate serializers and writers, things are simple and direct in q. This is because q data is self-describing, so that its internal representation can be written out as a sequence of bytes and then read directly back into memory. This is as close to the Star Trek transporter as we are likely to get.

The magic is done by (an overload of) the binary set, whose left operand is a file handle and right operand is the entity to be written. The result is the symbolic handle of the written file. The file is automatically closed once the write is complete.

q)`:/data/a set 42
`:/data/a
q)`:/data/L set 10 20 30
_
q)`:/data/t set ([] c1:`a`b`c; c2:10 20 30)
_

The behavior of set is to create the file if it does not exist and overwrite it if it does. It will also create the directory path if it does not exist.

A serialized q data file can be read using (an overload of) the unary get, whose argument is a symbolic file handle and whose result is the q entity contained in the data file.

q)get `:/data/a
42
q)get `:/data/L
_
q)get `:/data/t
_

An equivalent way to read a data file is with (an overload of) value.

q)value `:/data/t
_
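
As a quick check that serialization round-trips, the following sketch writes a dictionary and compares what comes back with the original using match ~. The scratch path /tmp/qio_demo is an assumption; substitute any writable location.

```q
d:`a`b`c!10 20 30            / any q entity will do; here a dictionary
f:`:/tmp/qio_demo/d          / hypothetical scratch file
f set d                      / creates the directory path if needed
d ~ get f                    / 1b when the value read equals the value written
d ~ value f                  / value reads the same file
```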

Alternatively, you can use the command \l to load a data file into memory and assign it to a variable with the same name as the file. Here you do not use a file handle; rather, specify the path to the file without any decoration. In a fresh q session,

q)t
't
q)\l /data/t
`t
q)t
_

11.1.4 Binary Data Files

As with traditional languages, for continuing operations on a q data file, you open the file, perform the operation(s) and then close it. Unlike traditional languages, opening a symbolic handle returns a function, called an open handle, that is used to perform operations.

As mentioned previously, q files come in two flavors, binary and text. Serialized q data persisted with set is written in binary form with a header at the beginning of the file. You can read it as raw binary data to inspect its internals.

Open a data file handle with hopen, whose result is a function called the open handle. This function should be stored in a variable, traditionally h, which is functionally applied to data to write it to the file. We will explain the result of applying the open handle shortly. We begin with a file containing serialized q data and show how to append to it.

q)`:/data/L set 10 20 30
`:/data/L
q)h:hopen `:/data/L
q)h[42]
3i
q)h 100 200
3i

Always apply hclose to the open handle to close it and flush any data that might be buffered.

Failure to do so may cause your program to run out of file handles unnecessarily.

We verify that the appends have been made.

q)hclose h
q)get `:/data/L
10 20 30 42 100 200

We can also create a new file and write raw binary data to it.

q)h:hopen `:/data/raw
q)h[42]
3i
q)h 10 20 30
3i
q)hclose h

Now, what is the deal with the 3i return value of applying the open handle?

q)h:hopen `:/data/raw
q)h 43
3i

In fact, the return value is the value of the open handle itself.

q)h
3i

Surely, you say, we can’t use an int as a function to write data. But you would be wrong.

q)h:hopen `:/data/new
q)h
3i
q)3i[100 200 300]
3i
q)hclose 3i
q)get `:/data/new
_

The last expression above signals an error. get requires a file to be initialized by set. But appending data to a file initialized by set is neither reliable nor recommended. Ed.

Tip

Apparently q assigns an int to each open file and keeps track of which int values are valid handles. This accounts for the cryptic error message when you attempt to use variables with simple list notation.

 q)a:42
 q)b:43
 q)a b
 ': Bad file descriptor

11.1.5 Writing and Reading Binary

Apply read1 on a file handle to read any file into q as a list of bytes. For example, we can read the previously serialized value L as bytes.

q)read1 `:/data/L set 10 20 30
0xfe2007000000000003000000000000000a0000000000000014000000000000001e..

This shows the internal representation of the serialized q entity. How cool is that?

If you want to write raw binary data, as opposed to the internal representation of a q entity containing the data, use the infelicitously named 1:. It takes a symbolic file handle as its left argument and a simple byte list as its right argument. Bytes in the right operand are essentially streamed to the file.

q)`:/data/answer.bin 1: 0x06072a
`:/data/answer.bin
q)read1 `:/data/answer.bin
0x06072a
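
The difference from set can be seen by checking the file size: a sketch, assuming a hypothetical scratch directory /tmp/qio_demo. Because 1: writes no header, the file holds exactly the bytes supplied, and bytes that happen to be text can be read back with read0.

```q
f:`:/tmp/qio_demo/answer.bin   / hypothetical scratch path
f 1: 0x06072a                  / stream three raw bytes to the file
hcount f                       / 3, just the bytes with no header
"x"$"abc"                      / chars cast to bytes: 0x616263
g:`:/tmp/qio_demo/abc.txt      / second hypothetical scratch path
g 1: "x"$"abc\n"               / raw bytes that happen to be text
read0 g                        / a one-item list holding "abc"
```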

11.1.6 Using Apply Amend

Fundamentalists can use Apply Amend in place of set to serialize q entities to files. To write the file, or overwrite an existing file, use assign :.

q).[`:/data/raw; (); :; 1001 1002 1003]
`:/data/raw
q)get `:/data/raw
1001 1002 1003

To append to an existing file use ,.

q).[`:/data/raw; (); ,; 42]
`:/data/raw
q)get `:/data/raw
1001 1002 1003 42

11.2 Save and Load on Tables

We have already seen that it is easy to write and read tables to/from persistent storage.

q)`:/data/t set ([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
`:/data/t
q)get `:/data/t
_

The save and load functions make this even easier.

In its simplest form, save serializes a table in a global variable to a binary file having the same name as the variable. It overwrites an existing file.

q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)save `:/data/t
`:/data/t
q)get `:/data/t
_

This is equivalent to using set above with the table name as file name.

As you might expect, load is the inverse of save, meaning that it reads a serialized table from a file into a variable with the same name as the file. It creates the variable in the workspace or overwrites it if it already exists.

In a fresh q session after t has been saved as above,

q)t / t doesn't exist
't
q)load `:/data/t
`t
q)t / now it does
_

You can also use save to write a table to a text file. You determine the format of the text with the file extension in the file handle.

All the following versions of save can also be performed with the more general 0: – see §11.5.

Save the table with .txt extension to obtain tab-delimited records. There is no corresponding load but you can parse the text file – see §11.5.1.

q)save `:data/t.txt
`:data/t.txt

The resulting file is

c1\tc2\tc3
a\t10\t1.1
b\t20\t2.2
c\t30\t3.3

Save the table with .csv extension to obtain comma-separated values. There is no corresponding load but you can parse the CSV file – see §11.5.2.

q)save `:data/t.csv
`:data/t.csv

The resulting file is

c1,c2,c3
a,10,1.1
b,20,2.2
c,30,3.3

Save the table with .xml extension to obtain XML records. There is no direct way to read XML into q although libraries have been contributed – see code.kx.com.

q)save `:data/t.xml
`:data/t.xml

The resulting file is

<R>
<r><c1>a</c1><c2>10</c2><c3>1.1</c3></r>
<r><c1>b</c1><c2>20</c2><c3>2.2</c3></r>
<r><c1>c</c1><c2>30</c2><c3>3.3</c3></r>
</R>

Save the table with .xls extension to obtain an Excel spreadsheet. This file can be loaded by Excel work-alikes.

q)save `:data/t.xls
`:data/t.xls

11.3 Splayed Tables

We have already seen how to persist a table to a file using set. There are no restrictions on the types of columns in the table or the file name in this scenario.

q)`:/data/t set ([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
`:/data/t
q)get `:/data/t
_

This creates a single file, as the OS verifies.

>ls -l /data/t
-rw-r--r-- 1 jeffry wheel 98 Mar 6 08:22 /data/t

For larger tables that may not fit into memory on all machines, you can ask q to serialize each column of the table to its own file in a specified directory. A table persisted in this form is called a splayed table. The advantage is that when querying a splayed table, only the columns referred to in the query will be loaded into memory. This is a substantial memory win for a table having many columns.

It is worthwhile looking up the origin of the English word “splay”. Also, please don’t spay your tables.

To splay a table, use set and specify a directory as the target location indicated by a trailing slash / in the left operand.

q)`:/data/tsplay/ set ([] c1:10 20 30; c2:1.1 2.2 3.3)
`:/data/tsplay/

List the directory in the OS and you will see a directory tsplay that contains two column files, one for each column in the original table, as well as a hidden .d file.

>ls -l -d /data/tsplay
drwxr-xr-x 5 jeffry wheel 170 Mar 6 08:36 /data/tsplay
>ls -l -a /data/tsplay
total 24
drwxr-xr-x 5 jeffry wheel 170 Mar 6 08:36 .
drwxr-xr-x 9 jeffry wheel 306 Mar 6 08:36 ..
-rw-r--r-- 1 jeffry wheel 14 Mar 6 08:36 .d
-rw-r--r-- 1 jeffry wheel 40 Mar 6 08:36 c1
-rw-r--r-- 1 jeffry wheel 40 Mar 6 08:36 c2

Nearly all the metadata regarding the splayed table can be read from the file system – i.e., the name of the table from the directory and the names of the columns from the files. The one missing bit is the order of the columns, which is stored as a serialized list in the hidden .d file.

q)get hsym `$"/data/tsplay/.d"
`c1`c2
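
The same inspection can be done from within q. This is a sketch that writes its own small splayed table to a hypothetical scratch directory under /tmp/qio_demo; it uses key on a directory handle to list the files and get on an individual column file.

```q
d:`:/tmp/qio_demo/tsplay/               / hypothetical scratch directory, note the trailing slash
d set ([] c1:10 20 30; c2:1.1 2.2 3.3)  / splay a table with no symbol columns
key `:/tmp/qio_demo/tsplay              / names of the files in the directory, as symbols
get `:/tmp/qio_demo/tsplay/c1           / one column file, deserialized on its own
cols get d                              / column order, as recorded in .d
```

Each column file is an ordinary serialized list, which is why a query can load only the columns it needs.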

Important

There are restrictions on tables that can be splayed.

All columns must be simple or compound lists. The latter means a list of simple lists of uniform type. An arbitrary general list column cannot be splayed.
Symbol columns must be enumerated.

Thus the following succeed.

q)`:/data/tok/ set ([] c1:2000.01.01+til 3; c2:1 2 3)
`:/data/tok/
q)`:/data/tok/ set ([] c1:1 2 3; c2:(1.1 2.2; enlist 3.3; 4.4 5.5))
`:/data/tok/

And the following fail.

q)`:/data/toops/ set ([] c1:1 2 3; c2:(1;`1;"a"))
k){$[@x;.[x;();:;y];-19!((,y),x)]}
'type
q)`:/data/toops/ set ([] c1:`a`b`c; c2:10 20 30)
k){$[@x;.[x;();:;y];-19!((,y),x)]}
'type

The first set above works in later versions of kdb+. [Ed.]

The convention for enumerating symbols in splayed tables is to enumerate all symbol columns in all tables over the domain sym and store the resulting sym list in the root directory – i.e., one level above the directory holding the splayed table. You can do this manually but practically no one does.

q)`:/db/tsplay/ set ([] c1:`sym?`a`b`c; c2:10 20 30)
`:/db/tsplay/
q)sym
`a`b`c
q)`:/db/sym set sym
`:/db/sym

Normally folks use one of the .Q utilities, in spite of the official KX admonition not to use them. For example, here we use .Q.en.

q)`:/db/tsplay/ set .Q.en[`:/db; ([] c1:`a`b`c; c2:10 20 30)]
`:/db/tsplay/

Only unofficially documented, .Q.en prepares a qualified table for splaying by enumerating all its symbol columns. The first argument is the symbolic file handle of the root directory for the persistent residence of the enumeration domain sym (no choice in the name). The second argument is a table. See §14.5.2 for more detail on its behavior.

Update: .Q is now documented at code.kx.com. Ed.

11.4 Text Data

We have seen that q views a record in a binary data file as a list of bytes. Similarly, a record in a text file is viewed as a list of char – i.e., a string. Thus reading a text file results in a list of strings and you pass a list of strings to write to a text file.

11.4.1 Reading and Writing Text Files

Read a text file with the unary read0 that takes a symbolic file handle argument. The result is a list of strings, one for each line in the file. For the file /data/solong.txt with content,

So long
and thanks
for all the fish

we find,

q)read0 `:/data/solong.txt
"So long"
"and thanks"
"for all the fish"

You can see the underlying binary values of the text by using read1 or casting the result of read0 to bytes.

q)read1 `:/data/solong.txt
_
q)"x"$read0 `:/data/solong.txt
0x4c696665
0x54686520556e697665727365
0x416e642045766572797468696e67

Or you can read the data as binary and cast the result to char. Observe that the data is a simple list of char so the newline character does not cause line breaks in the console display.

q)"c"$read1 `:/data/solong.txt
"Life\nThe Universe\nAnd Everything\n"

To write strings as text, use the (infelicitously named) binary 0:, which takes a file handle in the left operand and a list of strings in the right operand. It creates the directory path if necessary and overwrites the file if it already exists.

q)`:/data/solong.txt 0: ("Life"; "The Universe"; "And Everything")
`:/data/solong.txt
q)read0 `:/data/solong.txt
_

11.4.2 Using `hopen` and `hclose`

Just as with a binary data file, a symbolic text file handle can be opened with hopen. The result is again an int that is conventionally stored in the variable h and is used with function application syntax to write data. The difference is that instead of using plain h to write binary data, you use neg[h] to write strings as text. Seriously.

q)h:hopen `:/data/new.txt
q)neg[h] enlist "This"
-3i
q)neg[h] ("and"; "that")
-3i
q)hclose h
q)read0 `:/data/new.txt
_

Observe that you apply hclose to h, not to neg[h].

If the file already exists, opening with hopen and applying the open handle will append rather than overwrite.

q)h:hopen `:/data/new.txt
q)neg[h] ("and"; "more")
-3i
q)hclose h
q)read0 `:/data/new.txt
_

11.4.3 Preparing Text

We saw the built-in functions for saving tables as text files in §11.2. When you need to control the filename, you can write the table yourself with 0:, but then you must prepare the table columns as formatted text. A separate overload of 0: is available for this purpose. A confusing naming convention, to say the least.

In this use, 0: has as left operand a char delimiter and as right operand a table or list of columns. Observe the use of the pre-defined constant csv, which is simply ",".

q)t:([] c1:`a`b`c; c2:1 2 3)
q)"\t" 0: t
"c1\tc2"
"a\t1"
"b\t2"
"c\t3"
q)"|" 0: t
_
q)csv
","
q)csv 0: t
_
q)`:/data/t.csv 0: csv 0: t
_

In the last snippet we applied 0: with two different meanings: to prepare and then write text. We hope you’ve grown fond of this name, since §11.5 will introduce yet another version of 0: for parsing text records.

11.5 Parsing Records

Binary forms of 0: and 1: parse individual fields according to data type from text or binary records. Field parsing is based on the following field types.

0	1	Type	Width(1)	Format(0)
B	b	boolean	1	[1tTyY]
X	x	byte	1
H	h	short	2	[0-9a-fA-F][0-9a-fA-F]
I	i	int	4
J	j	long	8
E	e	real	4
F	f	float	8
C	c	char	1
S	s	symbol	n
P	p	timestamp	8	date?timespan
M	m	month	4	[yy]yy[?]mm
D	d	date	4	[yy]yy[?]mm[?]dd or [m]m/[d]d/[yy]yy
Z	z	datetime	8	date?time
N	n	timespan	8	hh[:]mm[:]ss[[.]ddddddddd]
U	u	minute	4	hh[:]mm
V	v	second	4	hh[:]mm[:]ss
T	t	time	4	hh[:]mm[:]ss[[.]ddd]
blank	skip
*				literal chars

The column labeled ‘0’ contains the (upper case) field type char for text data. The (lower case) char in column ‘1’ is for binary data. The column labeled ‘Width(1)’ contains the number of bytes that will be parsed for a binary read. The column labeled ‘Format(0)’ displays the format(s) that are accepted in a text read.

The parsed records are returned in column form rather than row form to make it easy to associate a list of symbol names with ! and then flip into a table.

11.5.1 Fixed-Width Records

The binary form of 0: and 1: for reading fixed length files is,

(L_t;L_w) 0:f

(L_t;L_w) 1:f

The left operand is a nested list containing two items: L_t is a simple list of char containing one letter per field; L_w is a simple list of int containing one integer width per field. The sum of the field widths in L_w should equal the width of the record. The result of the function is a list of lists, one list arising from each field.

We demonstrate 0: here since it is more commonly used; 1: works analogously. The simplest form of the right operand f is a symbolic file handle. For example, suppose we have a file with records of the form,

1001  98.000ABCDEF1234Garbage2015.01.01
1002  42.001GHUJKL0123Garbage2015.01.02
1003  44.123nopqrs9876Garbage2015.01.03

We could parse the records of the file with,

q)("JFS D";4 8 10 7 10) 0: `:/data/Fixed.txt
1001 1002 1003
98 42.001 44.123
ABCDEF1234 GHUJKL0123 nopqrs9876
2015.01.01 2015.01.02 2015.01.03

This reads a text file containing fixed length records of width 39. The first field is a long occupying 4 positions; the second field is a float occupying 8 positions; the third field consists of a symbol occupying 10 positions; the fourth slot of 7 positions is ignored; the fifth field is a date occupying 10 positions.

You might think that the widths are superfluous, but they are not. The actual data width can be narrower than the normal size due to small values, as in our case of the long field. Or you may need to specify a width larger than that required by the corresponding data type due to whitespace in the fields, as in the case of our float field.

Observe how easy it is to make a table from the result.

q)flip `c1`c2`c3`c4!("JFS D";4 8 10 7 10) 0: `:/data/Fixed.txt
c1   c2     c3         c4
---------------------------------
1001 98     ABCDEF1234 2015.01.01
1002 42.001 GHUJKL0123 2015.01.02
1003 44.123 nopqrs9876 2015.01.03

Also note that it is possible to parse a list of strings using the same format, since they represent text records in memory.

q)fixed: read0 `:/data/Fixed.txt
q)("JFS D";4 8 10 7 10) 0: fixed
_

The more general form for the right operand f is,

(h_file;i;n)

where h_file is a symbolic file handle, i is the offset into the file to begin reading and n is the number of bytes to read. This is useful for sampling a file or for large files that cannot be read into memory in a single gulp.

A read operation should begin and end on record boundaries or you will get meaningless results.

In our trivial example, the following reads just the second and third records,

q)("JFS D";4 8 10 7 10) 0: (`:/data/Fixed.txt; 40; 80)
_
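
The offsets above follow from the record length: 39 characters plus a line terminator. A sketch of reading records in chunks, assuming a one-byte newline terminator and a hypothetical scratch copy of the file under /tmp/qio_demo; the names recs, recLen, fmt and readRecs are illustrative only.

```q
f:`:/tmp/qio_demo/Fixed.txt      / hypothetical scratch copy of the records above
recs:("1001  98.000ABCDEF1234Garbage2015.01.01";"1002  42.001GHUJKL0123Garbage2015.01.02";"1003  44.123nopqrs9876Garbage2015.01.03")
f 0: recs                        / write the three records as text
recLen:1+count first read0 f     / 39 chars plus an assumed one-byte newline
fmt:("JFS D";4 8 10 7 10)        / field types and widths, as in the text
/ read n records starting at zero-based record k
readRecs:{[k;n] fmt 0: (f; recLen*k; recLen*n)}
readRecs[1;2]                    / second and third records only
flip `c1`c2`c3`c4!readRecs[0;1]  / first record as a one-row table
```

Computing the offset and length as multiples of recLen keeps every read on a record boundary. On a platform that writes two-byte line terminators, recLen would need adjusting.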

11.5.2 Variable Length Records

The binary form of 0: and 1: for reading variable length, delimited files is

(L_t;D) 0:f

(L_t;D) 1:f

The left operand is a list comprising two lists. L_t is a simple list of char containing one type letter per corresponding field. D is either a char representing the delimiting character or an enlisted char.

Specify D as a delimiter char when the first record of the file does not contain column names. In this case, the result of the parse is a list of column lists, each of which contains items of type specified by L_t. The simplest form of the right operand f is a symbolic file handle.

For example, say we have a comma-separated file /data/Simple.csv having records

1001,DBT12345678,98.6
1002,EQT98765432,24.7
1004,CCR00000001,121.23

Parsing with a delimiter char "," results in a list of column lists. As with parsing fixed format records, it is easy to make the result into a table.

q)("JSF"; ",") 0: read0 `:/data/Simple.csv
1001        1002        1004
DBT12345678 EQT98765432 CCR00000001
98.6        24.7        121.23
q)flip `c1`c2`c3!("JSF"; ",") 0: read0 `:/data/Simple.csv
_

Observe that it is possible to retrieve the second field as a string instead of a symbol using "*" as the data type specifier,

q)("J*F"; ",") 0: read0 `:/data/Simple.csv
1001          1002          1004
"DBT12345678" "EQT98765432" "CCR00000001"
98.6          24.7          121.23

Specify D as an enlisted char when the first record contains a separated list of names. Subsequent records are read as data specified by the types in L_t. The result is a table in which the column names are taken from the first record.

Say we have a comma-separated file /data/Titles.csv having records,

id,ticker,price
1001,DBT12345678,98.6
1002,EQT98765432,24.7
1004,CCR00000001,121.23

Reading with an enlisted "," delimiter results in a table.

q)("JSF"; enlist ",") 0: `:/data/Titles.csv
id   ticker      price
-----------------------
1001 DBT12345678 98.6
1002 EQT98765432 24.7
1004 CCR00000001 121.23
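
Preparing, writing and parsing fit together as a round trip. The sketch below writes a table as CSV and parses it back with an enlisted delimiter; the scratch path /tmp/qio_demo and the names f and t2 are assumptions. The final match is expected to hold for these values but can fail for floats that lose digits when formatted as text.

```q
t:([] id:1001 1002 1004; ticker:`DBT12345678`EQT98765432`CCR00000001; price:98.6 24.7 121.23)
f:`:/tmp/qio_demo/Titles.csv       / hypothetical scratch file
f 0: csv 0: t                      / prepare as text, then write
read0 f                            / header line followed by three records
t2:("JSF"; enlist ",") 0: f        / column names come from the first record
t ~ t2                             / compare the parsed table with the original
```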

11.5.3 Key-Value Records

The operator 0: can also be used to process text representing key-value pairs. In this situation, the left operand is a three-character string P_f that specifies the pair format. The first char of P_f can be "S" to indicate the key is a symbol or "I" to indicate the key is an integer. The second char indicates the key-value separator. The third char indicates the pair delimiter.

The following examples illustrate various combinations in P_f.

q)"S=;" 0: "one=1;two=2;three=3"
one two three
,"1" ,"2" ,"3"
q)"S:/" 0: "one:1/two:2/three:3"
_
q)"I=;" 0: "1=one;2=two;3=three"
_

Again it is easy to make the result into a table.

q)flip `k`v!"I=;" 0: "1=one;2=two;3=three"
k v
---------
1 "one"
2 "two"
3 "three"
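
Since the result is a pair of lists, keys then values, it can equally be turned into a dictionary for lookup. A small sketch; the name d is illustrative.

```q
kv:"S=;" 0: "one=1;two=2;three=3"   / two-item list: symbol keys, string values
d:(!) . kv                          / apply ! to the pair to form a dictionary
d `two                              / the value for key two, a one-char string
key d                               / the parsed keys
```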

11.6 Interprocess Communication

The ease with which a q process can communicate with another q process residing on the network is one of the most impressive features of q. We shall cover all the basics of interprocess communication (IPC) so that you can follow the section on callbacks in Chapter 1 – Q Shock and Awe.

We shall use the following terminology. The process that initiates the communication is called the client, while the process receiving and processing requests is the server. The server process can be on the same machine, the same network, a different network or on the Internet, so long as it is accessible. The communication can be synchronous (wait for a result to be returned) or asynchronous (don’t wait and no result returned).

The only way to learn IPC is to do it, and the easiest way to do this is to set up two processes on the same machine. We recommend you use the machine running your q sessions for this tutorial, provided it will allow a port to be opened. In what follows, we shall assume that a server q process has been started on a machine with an open port.

>q -p 5042
q)

The client process is a separate q process running on the same machine.

>q
q)

11.6.1 Communication Handle

Symbolic communication handles look similar to file handles but they specify resources on the network. A communication handle has the form,

`:[server]:port

Here the bracketed expression represents an optional server machine identifier and port is a port number. An omitted server specification, or one of the form localhost, refers to the machine on which the originating q session lives. The following both refer to port 5042 on the same machine as the q session in which they are entered.

q)`::5042
_
q)`:localhost:5042
_

You can refer to a machine on the network by name. For example, on the author’s laptop the following is equivalent to the two previous network handles.

q)`:aerowing:5042
_

You can use the IP address of a machine.

q)`:198.162.0.2:5042
_

Finally, you can also use a URL.

q)`:www.myurl.com:5042
_

11.6.2 Opening a Connection Handle

As with a file handle, apply hopen to a communication handle to obtain an open connection handle that is used as a function. As before, the value is an int that is traditionally stored in the variable h. Also as with file I/O, the behavior of this function differs between using the original positive handle or its negation.

Let’s see how this works with our two sessions. (You did start them, didn’t you?) Remember, the session that opened port 5042 is the server; the other session is the client. In the client session, open a handle to the server and store it in h, then apply h to the string as shown. Finally close the connection handle.

q)h:hopen `::5042
q)h "a:6*7"
q)h "a"
42
q)hclose h

Whitespace between h and the quoted string is optional, as this is simply prefix syntax. We include it for readability.

As you have no doubt realized, the application of h sent the string to the server to be evaluated. On the server, we see,

q)a
42

How cool is that?

11.6.3 Remote Execution

We have seen that when you open a connection to a q process, you have the full capability of that process available remotely. Apply the connection handle to any q expression in a string and it will be evaluated on the server. As you contemplate the IPC Zen, a dark cloud passes over your tranquility. You realize that, by default, the server is wide open.

Allowing quoted q strings to be executed on a server makes the server susceptible to all manner of breaches.

Good practice does not permit this on a production server. You can mitigate this by having your server process accept only requests whose first item is a symbol (see below), which you should verify is the name of a function you have decided to expose.

An alternative format for remote execution is to apply the connection handle to a list of the form

(f;arg₁;arg₂;...)

Here f is a client-side expression that evaluates to a map that will be applied on the server. It can be:

The value of, or variable associated to, a map on the client
The symbolic name of a map on the server.

We use the term map here to be any q expression that can be evaluated as function application – e.g., a list on an index, a dictionary on a key or a function on an argument. Most commonly f is a function.

The remaining items arg₁, arg₂, … are optional values sent along to the server for the evaluation. These are arguments when f is a function, indices when it is a list, or keys when it is a dictionary.

Application of the connection handle to such a list sends the list to the server where it is evaluated. Any result is sent back to the client, where it is presented as the result of the connection handle application. By simply applying the naked handle, this sequence of steps is synchronous, meaning that execution of the q session on the client blocks until the result of the server evaluation is returned.

Our examples will cover the case when f is of function type since that is most common. We first consider the first case when f is a map on the client side. In this situation the function (list, dictionary, etc.) is actually transported to the server along with the supplied arguments, where it is applied.

On the client in our two-session setup:

q)h:hopen`::5042 / client
q)h ({x*y}; 6; 7)
42
q)f:{x*y}
q)h (f; 6; 7)
42

Before you get too enamored of this form, we point out the limitations that disqualify it from production use. First, global variables referred to in the transported function will need to be present remotely in the exact contexts in effect when the function was defined. This can be avoided by restricting f to be a pure function that does not refer to any global entities. More damning is:

Allowing a function to be sent to the server for remote execution is as dangerous as sending quoted q strings.

The function can access resources on the server and instigate an attack. Good practice does not permit this in production environments.

The remaining format for remote execution can be made safe for production environments. The function to be executed remotely must already be defined on the server and you pass its name and arguments via the connection handle.

On the server,

q)g:{x*y} / server

On the client,

q)h (`g; 6; 7) / client
42
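
The mitigation mentioned earlier, accepting only requests whose first item is the name of an exposed function, might be sketched on the server as follows. It assumes the synchronous message handler .z.pg, whose default behavior is to apply value to the incoming message; the names allowed and ok are hypothetical. Asynchronous requests arrive through .z.ps and would need the same treatment.

```q
g:{x*y}                      / server: the function we choose to expose
allowed:enlist `g            / hypothetical list of callable names
/ a request passes only if it is a list whose first item is an allowed symbol
ok:{[req] $[not type[req] in 0 11h; 0b; -11h<>type first req; 0b; first[req] in allowed]}
/ evaluate permitted requests as usual; signal an error for everything else
.z.pg:{[req] $[ok req; value req; '"rejected"]}
```

With this in place, h (`g; 6; 7) from the client should still return 42, while h "a:6*7" and h ({x*y}; 6; 7) should signal 'rejected on the client. This is a starting point rather than a complete security policy.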

Now consider the case when the remote function performs an operation on a table and returns the result. This is the q analogue of a remote stored procedure. For example, suppose t and f are defined on the server as,

q)t:([] c1:`a`b`c; c2:1 2 3) / server
q)f:{[x] select c2 from t where c1=x}

Now “call” the function f remotely from the client.

q)h (`f; `b) / client
c2
--
2

The difference from SQL stored procedures is that the remote procedure can be any q function on the server, making the full power of q available remotely.

11.6.4 Synchronous and Asynchronous Messages

The IPC in the previous sections was synchronous, meaning that upon application of the connection handle, the client process blocks, waiting for a result from the server before proceeding. The value returned from the server becomes the return value of the open handle application.

Under the covers, IPC is implemented as messages passed over an open connection between q processes. When the positive open handle is applied to an argument, the message passing is synchronous, meaning that the following steps occur in sequence.

The client sends a message containing the argument(s) of the handle application to the server and waits for a return message.
The server receives the message, interprets it as the appropriate function application and obtains the result.
The server sends a message containing the result back to the client.
The client receives the result and resumes execution from the point it left off.

When a client sends multiple messages to a server in synchronous message passing, the next message is not sent until the result of the previous message is received. Consequently the messages always arrive at the server in the order in which they are sent. Also, the results from the server arrive back at the client in the order in which the original messages were sent.

It is also possible to perform asynchronous IPC in q. In this case the message is sent to the server and execution on the client continues immediately. In particular, there is no return value from the server. This is useful to initiate a task on the server when you don’t care about the result. For example, you could initiate a long running operation, or you could send a message that the server will route to other processes.

Use the negation of the open connection handle to send an asynchronous message to the server. Let’s define an instrumented function on the server to demonstrate what is happening.

q)sq:{0N!x*x} / server

Now invoke sq asynchronously from the client.

q)neg[h] (`sq; 5) / client
q)

You will observe 25 displayed on the server console. Also, the client session returns immediately with no return value. The expression on the console actually has a nil value :: that is suppressed by the console display.

When sending asynchronous messages, always send an empty “chaser” message immediately before applying hclose to the open handle.

If you do not do this, buffered messages may not be sent when the connection is closed.
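The following is a minimal sketch of that shutdown sequence, not a definitive recipe. It assumes h is an open connection handle on the client and that sq is defined on the server as above. The empty string sent synchronously plays the role of the chaser; the preceding flush with neg[h][] is an extra precaution.

```q
/ client; assumes h is an open connection handle and sq exists on the server
neg[h] (`sq; 5)   / asynchronous message, possibly still buffered locally
neg[h][]          / block until pending outgoing messages have been written to the socket
h ""              / empty synchronous chaser: returns after the server has handled what came before it
hclose h          / now it is safe to close
```

Because the server processes messages on a connection in order, the synchronous chaser cannot return before the earlier asynchronous message has been handled.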

In order to convince ourselves that the client actually does return immediately without waiting for a return from the server, we wrap the client expression in a function. Observe that the client continues with the next statement.

q){neg[h] (`sq; 5); 42}[] / client
42

Because a q session is single-threaded by default, the server will process messages in the order in which they are received. However, in asynchronous messaging there is no guarantee that the messages arrive at the server in the order in which they are sent. It can be difficult to observe indeterminacy in simple examples, but you must assume that it will occur in practice.

11.6.5 Processing Messages¶

Assuming that you have passed the server either a function from the client side or the name of a function on the server side, the appropriate function is evaluated on the server. During evaluation, the communication handle of the remote process is available in the system variable .z.w (“who” called). For an asynchronous call, this can be used to send messages back to the client during the function application on the server.

Both the client and the server have connection handles when a connection between them is opened.

However, these handles are assigned independently and their int values are not equal in general.

Here is a simple example showing how to use .z.w to send a message back to the client. On the server, we define a function that displays its received parameter and then asynchronously calls mycallback with the passed argument incremented.

q)f:{show "Received ",string x; neg[.z.w] (`mycallback; x+1)}

On the client we define mycallback to display its parameter on the console. Then we make an asynchronous call to the function f on the server with an argument of 42.

q)mycallback:{show "Returned ",string x;}
q)neg[h] (`f; 42)
q)"Returned 43"

The result is that "Received 42" is displayed on the server console and "Returned 43" is displayed on the client console. Congratulations! We have just invented callbacks in q.

When performing asynchronous messaging, always use neg[.z.w] to ensure that all messages are asynchronous.

Otherwise you will get a deadlock as each process waits for the other.

You can override the default behavior of message processing in q by assigning your own handler(s) to the appropriate system variables. Assign your function to the variable .z.pg to trap and process synchronous messages and to .z.ps for asynchronous messages. The names end in ‘g’ and ‘s’ because synchronous processing has "get" semantics and asynchronous processing has "set" semantics.

In the following we set the asynchronous handler to a trivial function, essentially ignoring asynchronous calls.

On the server,

q).z.ps:{show "ignore"} / server

On the client send an asynchronous message.

q)neg[h] "6*7" / client

This results in "ignore" being displayed on the server console.

Now we set the synchronous handler to a function that only accepts “safe” remote calls by function name. It then performs a protected evaluation on the function with the arguments passed, thus ensuring that a failed application does not hang the server.

On the server,

q).z.pg:{$[-11h=type first x; .[value first x; 1_x; ::]; `unsupported]}

Now send synchronous messages from the client.

q)h (`sq; 5) / client
25
q)h (`sq; `5)
"type"
q)h "6*7"
`unsupported
q)h ({x*y};6;7)
`unsupported
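The same mechanism can be used for less restrictive purposes, such as logging. The sketch below assumes that the default synchronous behavior amounts to applying value to the incoming message; logPg is a hypothetical name introduced only for illustration. It prints a timestamp, the caller's handle and the message, then evaluates the message as usual. The last line removes the custom handler so that the default behavior is restored.

```q
/ server: log each synchronous request, then evaluate it as the default handler would
logPg:{[msg] 0N!(.z.p; .z.w; msg); value msg}
.z.pg:logPg

/ later: expunge the custom handler to restore the default
\x .z.pg
```

Unlike the “safe” handler above, this one still evaluates arbitrary strings, so it is suitable only for trusted clients.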

You can also specify handlers to be called upon connection open and close by assigning functions to the system variables .z.po and .z.pc, respectively. The connection handle of the sending process is passed as the lone argument to the functions assigned to .z.po and to .z.pc.

Here is a simple example that tracks connections and allows client processes to register callbacks with the server. Start a fresh q session on the server and open port 5042. Create a keyed table called Registry and define a function that can be invoked remotely to register a callback. Attach a handler to .z.po that initializes a dummy entry in Registry for the connection being opened and attach a handler to .z.pc to remove the record when a connection is closed.

q)Registry:([zw:`int$()] callback:`symbol$())
q)register:{[cb] `Registry upsert (.z.w; cb);}
q).z.po:{`Registry upsert (x; `unregistered);}
q).z.pc:{delete from `Registry where zw=x;}

Start a fresh q session on the client and connect to the server.

q)h:hopen`::5042 / client

We check that an item has been entered into Registry on the server.

q)Registry / server
zw| callback
--| ------------
6 | unregistered

Next we register the name of a callback function from the client. Note the asynchronous message.

q)neg[h] (`register; `mycallback) / client

Again we check Registry on the server and observe that our callback name has indeed been registered.

q)Registry / server
zw| callback
--| ----------
6 | mycallback

Finally, we close the connection on the client.

q)hclose h / client

And observe that the client has been automatically unregistered.

q)Registry / server
zw| callback
--| --------
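Once callbacks are registered, the server can use Registry to push a value to every registered client. The following is a rough sketch under stated assumptions: publish is a hypothetical function not defined in the text, Registry is the keyed table created above, and each client has defined the callback it registered.

```q
/ server; assumes Registry exists as defined above
publish:{[x]
  t:select from (0!Registry) where callback<>`unregistered;   / skip connections that never registered
  {[zw;cb;x] neg[zw] (cb; x)}[;;x]'[t`zw; t`callback];         / async call of each client's callback
  }
```

For example, publish 43 on the server would asynchronously invoke mycallback with 43 on each client that registered it. The messages are sent with the negated handle so that the server never blocks on a client.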

11.6.6 Remote Queries¶

In this section, we demonstrate how to execute q-sql queries against a remote server. First, we splay a table to stand for a time-series database. We use the mktrades script that we created in §9.3.1 to create a trades table with 1,000,000 rows and then splay it to disk.

q)trade:mktrades[`aapl`goog`ibm; 1000000]
q)`:/db/trade/ set .Q.en[`:/db; trade]
_

Now start a fresh q process (the server), open a port, say 5042, and map the splayed trade table into memory. Check that the mapping succeeded by running a query.

q)\p 5042 / server
q)\l /db
q)select from trade where dt=2015.01.01,sym=`ibm
dt         tm           sym qty  px
---------------------------------------
2015.01.01 00:00:01.796 ibm 7080 218.74
2015.01.01 00:00:10.581 ibm 3250 206.88
..

Leave the server process running and start another fresh process (the client), open a connection to the server and send the same query to the server for remote execution.

q)h:hopen`::5042 / client
q)h "select from trade where dt=2015.01.01,sym=`ibm"
dt         tm           sym qty  px
---------------------------------------
2015.01.01 00:00:01.796 ibm 7080 218.74
2015.01.01 00:00:10.581 ibm 3250 206.88
..

We have already pointed out that allowing remote execution of arbitrary strings is bad practice because it exposes the server to injection attack. So here is a simplistic example of a “safe” function that can be used as a stored procedure. It takes a symbolic table name, a list of symbolic column names for the result and a date range for the where phrase. Enter on the server:

q)extract:{[tn;cnms;dtrng] ?[tn;enlist (within;`dt; dtrng);0b;cnms!cnms]}

Now on the client we (synchronously) call the stored procedure by name with appropriate arguments.

q)h (`extract;`trade;`dt`tm`sym`qty`px;2015.01.01 2015.01.02)
dt         tm           sym   qty  px
----------------------------------------
2015.01.01 00:00:01.194 aapl  6770 94.62
2015.01.01 00:00:01.796 ibm   7080 218.74
..

In an actual application you would validate the input parameters and wrap the core evaluation in protected evaluation to trap unanticipated errors. You would also want to implement an entitlements system based on LDAP.
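As a hedged illustration of that advice, here is one way the validation and protected evaluation might look. The wrapper safeExtract and the symbols it returns on failure are hypothetical names chosen for this sketch; extract is repeated from above so that the example stands on its own.

```q
extract:{[tn;cnms;dtrng] ?[tn;enlist (within;`dt; dtrng);0b;cnms!cnms]}

/ hypothetical wrapper: validate the arguments, then apply extract under protected evaluation
safeExtract:{[tn;cnms;dtrng]
  if[not -11h=type tn; :`badTableName];                         / table name must be a symbol atom
  if[not tn in tables[]; :`unknownTable];                       / table must exist in the root context
  if[not all cnms in cols tn; :`unknownColumn];                 / every requested column must exist
  if[not (14h=type dtrng) and 2=count dtrng; :`badDateRange];   / a pair of dates is required
  .[extract; (tn;cnms;dtrng); {[e] (`error; e)}]                / trap anything unanticipated
  }
```

The client would then call h (`safeExtract;`trade;`dt`sym`px;2015.01.01 2015.01.02) and receive either a table or a symbol (or an error pair) describing what went wrong.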

11.7 HTTP and Web sockets¶

11.7.1 HTTP Connections¶

When you open a port in a q session, by default that session serves HTTP requests. To demonstrate this, start a q session and open a port, say 5042. Then bring up a relatively recent browser on the same machine (the author uses Chrome) and enter the following URL

http://localhost:5042/?6%2A7

You should see 42 in the browser page display.

You can trap HTTP GET and POST traffic by assigning functions to the system variables .z.ph and .z.pp respectively. The default handler for .z.ph is to evaluate the content of the first item of the passed argument.

There is no default handler for .z.pp.

Here is a simple example that duplicates the default GET processing and shows the two items of its list argument. Define the following handler on the server process opened previously. It displays the two items of the input list then executes the first after removing the leading ? and then returns the result as a string.

q).z.ph:{show x 0; show x 1; string value 1_x 0} / server

Now enter the following from a browser on the same machine.

http://localhost:5042/?6*7

The server will display,

q)"?6*7"
Host            | "localhost:5042"
Connection      | "keep-alive"
Cache-Control   | "max-age=0"
Accept          | "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
User-Agent      | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36"
Accept-Encoding | "gzip, deflate, sdch"
Accept-Language | "en-US,en;q=0.8"

And the browser page displays “42”.
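The handler above returns a bare string and relies on the browser being lenient. A slightly more careful variant, sketched here under the assumption that the utilities .h.uh (URL unescape), .h.hy (build an HTTP response for a content type) and .Q.s (format as console text) behave as documented, decodes the request, traps evaluation errors and replies with an explicit text/plain response. The name httpEval is hypothetical.

```q
/ server: evaluate the query string under protected evaluation and wrap the result in an HTTP response
httpEval:{[x]
  expr:.h.uh 1_first x;                           / drop the leading ? and decode escapes such as %2A
  res:@[{.Q.s value x}; expr; {"error: ",x}];     / format the result as text, or report the error
  .h.hy[`txt; res]                                / reply with content type text/plain
  }
.z.ph:httpEval
```

This still evaluates arbitrary expressions sent by the browser, so the earlier warning about exposing a server to injection applies here as well.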

11.7.2 Basic WebSockets¶

WebSockets is a network protocol that upgrades an initial HTTP handshake into a TCP/IP socket connection. It was initially used to enhance communication capability between browsers and web servers but it can be used for general client-server applications. Once the WebSocket connection is established, either the client or server can message the other; in particular, this provides the capability for the server to push data to the client.

As of this writing (Sep 2015) q implements only asynchronous messaging in WebSockets.

In this section we show the basic mechanism for establishing a WebSocket connection between a browser and a q process acting as the server. We use Chrome for the examples but recent versions of Internet Explorer are now WebSockets-capable and should work similarly.

In the examples of this section we assume basic familiarity with HTML5 and JavaScript.

We begin with an extremely simple HTML page with a button that, when clicked, displays the answer to life, the universe and everything. Save the following as a text file sample0.html in a location accessible to your browser.

<!doctype html>
<html>
<head>
<script>
  function sayN(n) {
    document.getElementById('answer').textContent = n;
  }
</script>
</head>
<body>
  <h1 style='font-size:200px' id='answer'></h1>
  <button onclick='sayN(42)'>get the answer</button>
</body>
</html>

In our case we saved the file to /pages/sample0.html on the local drive, so we enter the following URL in the browser:

file:///pages/sample0.html

You should see a page with a single button labeled “get the answer”. Click the button and you will see the answer in a very large font.

Now we enhance this basic page to connect to a q process via WebSockets and retrieve the answer from q. Save the following script as sample1.html. We explain it below.

For simplicity in the example, we have placed a copy of c.js in the pages directory. You should modify this to reflect its location in your installation.

<!doctype html>
<html>
<head>
<script src="c.js"></script>
<script>
var serverurl = "//localhost:5042/",
  c = connect(),
  ws;
function connect() {
    if ("WebSocket" in window) {
        ws = new WebSocket("ws:" + serverurl);
        ws.binaryType="arraybuffer";
        ws.onopen=function(e){
            ws.send(serialize({ payload: "What is the meaning of life?" }));
        };
        ws.onclose=function(e){
        };
        ws.onmessage=function(e){
            sayN(deserialize(e.data));
        };
        ws.onerror=function(e) {window.alert("WS Error") };
    } else alert("WebSockets not supported on your browser.");
}
function sayN(n) {
    document.getElementById('answer').textContent = n;
}
</script>
</head>
<body>
    <h1 style='font-size:200px' id='answer'></h1>
</body>
</html>

This page first loads the script c.js, which provides the serialize and deserialize functions required for exchanging q data over WebSockets.

The script then defines JavaScript variables

serverurl to hold the URL of our q service
c to hold the connection object returned by the connect function
ws to hold a WebSocket object.

The function connect() is where the WebSocket action happens.

It first tests to see if WebSocket is in the window, meaning that the browser supports WebSockets. If so, it makes the connection to the server; otherwise it displays an error alert.
The first step in the connection is to create a WebSocket object by connecting to the specified server URL, and storing the result in ws.
Then set the binaryType field in ws to the value needed by the q sockets code.

Now we assign handlers for the main WebSockets events.

The open handler serializes (into q form) a JavaScript object with a payload field and then sends it to the server. Consequently when a connection is opened, we immediately ask the server the meaning of life.
The close handler is empty.
The message handler deserializes the data field of the parameter e and applies the sayN function to display the result on the page.
The error handler displays an alert page with the error message.

The sayN function locates the answer field on the page and places the text of its argument there. Finally, the script defines a simple HTML element answer.

In contrast, the server side q code is blissfully short. Start a fresh q session, open port 5042 and set the WebSockets handler .z.ws to a function that will be invoked to handle WebSockets messages.

q)\p 5042
q).z.ws:{0N!-9!x; neg[.z.w] -8!42}

The handler first deserializes its parameter and displays it to the console for debugging, at which point we have no further use for it in this example. Then it serializes the answer to the question asked by the browser and asynchronously sends it back to the browser. That’s all there is to it!

Now point the browser to

file:///pages/sample1.html

and you will see the answer displayed on the page. At this point you are equipped to follow §1.19 in Q Shock and Awe.
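The handler above assumes that the browser sends and receives q-serialized binary data via c.js. As a hedged alternative, the sketch below also accepts JSON text, on the assumption that q delivers text frames to .z.ws as char lists and binary frames as byte lists, and that a string sent over the handle goes out as a text frame. The helper names decode and encode are hypothetical; the browser side would then need to use JSON instead of serialize and deserialize, which is not shown here.

```q
/ server: accept either JSON text or q-serialized bytes and answer in the same form
decode:{[msg] $[10h=type msg; .j.k msg; -9!msg]}          / char list is parsed as JSON, bytes are deserialized
encode:{[asText;x] $[asText; .j.j x; -8!x]}               / JSON text or serialized bytes
.z.ws:{[msg] 0N!decode msg; neg[.z.w] encode[10h=type msg; 42];}
```

Keeping the decoding and encoding in separate functions makes them easy to try at the console without a browser, for example -9!encode[0b;42].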

11.7.3 Pushing Data to the Browser¶

In ordinary Web applications, the browser initiates interaction with the server. It sends a request to a specific URL on the server and the server replies with the requested page or data. Each such interaction is self-contained and is synchronous in that the browser waits for the server response.

In WebSockets the browser initiates the connection, but once the WebSocket request for protocol upgrade is successful, the browser – i.e., client – and the server are on equal footing. Either side can send messages. Moreover, in the current q implementation of WebSockets all interaction is asynchronous. Given that most current browsers and the default q session are both single-threaded, you don’t have to worry about races and deadlocks but you do have to set up callbacks.

In this section we demonstrate how the q server can push data to the browser, beginning with the browser script. Actually this script is a simplification of sample1.html in that we remove the initial call to the server upon open; everything else remains the same. The key point is that the onmessage handler will be called every time data is received, resulting in the data being displayed on the screen. Save the following as sample2.html.

<!doctype html>
<html>
<head>
<script src="c.js"></script>
<script>
var serverurl = "//localhost:4242/",
    c = connect(),
    ws;
function connect() {
    if ("WebSocket" in window) {
        ws = new WebSocket("ws:" + serverurl);
        ws.binaryType="arraybuffer";
        ws.onopen=function(e){
        };
        ws.onclose=function(e){
        };
        ws.onmessage=function(e){
            sayN(deserialize(e.data));
        };
        ws.onerror=function(e) {window.alert("WS Error") };
    } else alert("WebSockets not supported on your browser.");
}
function toQ(x) { ws.send(serialize({ payload: x })); }
function sayN(n) {
    document.getElementById('answer').textContent = n;
}
</script>
</head>
<body>
    <h1 style='font-size:200px' id='answer'></h1>
</body>
</html>

And now for the q side. You can enter the following in the console of a fresh q session; or you can save it as a script and load it with \l.

q)\p 4242
q)answer:42
q).z.po:{`requestor set x; system "t 1000";}
q).z.ts:{neg[requestor] -8!answer; answer+:1;}

Here is what’s happening in the q code.

First we open the port and initialize the answer variable.
Then we set the connection open handler to store its parameter, the handle of the connecting client, in the global requestor and to start the system timer firing every 1000 milliseconds. Note that this only happens after the browser initiates a connection.
Finally, we set the timer handler to send an asynchronous message containing the serialized value of answer and then increment answer.

Change since q3.2

“The .z.wo and .z.wc message handlers were introduced in kdb+ version 3.3 (2014.11.26) to be evaluated whenever a WebSocket connection is opened (.z.wo) or closed (.z.wc). Prior to this version, .z.pc and .z.po provide an alternative solution however, these handle the opening and closing of all connections over a port and don’t distinguish WebSocket connections.” — Whitepaper Kdb+ and WebSockets

See Reference: System and callbacks

Now point the browser to

file:///pages/sample2.html

and you will see the answer ticking every second on the page.
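The version above remembers only the most recent connection and never stops the timer. Using the .z.wo and .z.wc handlers mentioned in the note, a variant that tracks every connected browser might look like the following. This is a sketch under stated assumptions rather than a definitive implementation: subs and push are hypothetical names, and both handlers are assumed to receive the connection handle as their argument.

```q
\p 4242
answer:42
subs:`int$()                                                       / handles of connected WebSocket clients
.z.wo:{[h] subs::distinct subs,h; system "t 1000";}                / remember the client, start the timer
.z.wc:{[h] subs::subs except h; if[0=count subs; system "t 0"];}   / forget the client, stop when none remain
push:{[h;msg] neg[h] msg}                                          / asynchronous send to one client
.z.ts:{push[;-8!answer] each subs; answer::answer+1;}              / tick: send to all, then increment
```

With this in place, several browser tabs pointed at sample2.html should all show the same ticking value, and the timer stops once the last tab is closed.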
