QforMortals2/i o


I/O in q is achieved using handles, which are symbols whose values are file names. The handle acts as a mapping to an I/O stream, in the sense that retrieving a value from the handle results in a read and passing a value to the handle is a write.

Data Files

All q entities are automatically serializable to disk. The persistent form is a self-describing version of the in-memory form. A data file comprises a q entity written to disk.

File Handle

A file handle is a symbol that starts with a colon ( : ) and has the form,

        `:[path]fname

where the bracketed expression represents an optional path and fname is a file name. Both path and fname must be valid names as recognized by the underlying operating system.

Important: The one caveat is that separators in q paths are always represented by the forward slash ( / ), even on Windows.

Using hcount and hdel

Use hcount with a file handle to determine the size of the file in bytes. The result is a long.

        hcount `:c:/q/Life.txt

Use hdel with a file handle to delete a file from the file system of the underlying operating system. A return value of the file handle indicates that the deletion was successful. You will get an error message if the file does not exist or if the delete cannot be performed.

        hdel `:c:/q/Life.txt
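
As a minimal sketch, assuming the file may or may not be present, the error from hdel can be avoided by testing for the file first. The helper name exists is hypothetical; it relies on key returning an empty list when nothing is found at the path.

```q
f:`:c:/q/Life.txt
exists:{not ()~key x}                 / hypothetical helper: key returns () for a missing path
if[exists f; show hcount f; hdel f]   / report the size, then delete, only when the file is there
```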

Using set and get

A data file is created and a q entity written to it in a single step using binary set . The left operand is a file handle, the right operand is the entity to be written and the result is the handle of the written file. The file is closed once the write is complete.

        `:/q/qdata.dat set 101 102 103

Note: The behavior of set is to create the file if it does not exist and overwrite it if it does.

A data file can be read using unary get, whose argument is a file handle and whose result is the q entity contained in the data file.

	get `:/q/qdata.dat
101 102 103
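
Since any q entity can be written this way, the same pair of calls works for a dictionary. The following sketch assumes the /q directory exists; the file name ddata.dat is hypothetical.

```q
d:`a`b`c!1 2 3            / a dictionary rather than a simple list
`:/q/ddata.dat set d      / hypothetical file name
d~get `:/q/ddata.dat      / 1b when the entity read back matches the original
```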

An alternate way to read a data file is with value,

       value `:/q/qdata.dat
101 102 103

Using hopen and hclose

A data file handle is opened with hopen. The result of hopen is an int file handle that acts like a function for writing to the file once assigned to a variable.

        h:hopen `:/q/qdata.dat

        h[42]                        / handle used as function

        h 1 2 3 4                   / juxtaposition notation

If the file already exists, opening it with hopen appends to it rather than overwriting it.

To close the handle, issue hclose on the result of hopen. This flushes any data that might be buffered.

        hclose h

After the operations above, we find,

	get `:/q/qdata.dat
101 102 103 42 1 2 3 4

Using Dot Amend

Fundamentalists can use dot amend to write to data files. To overwrite the file if it exists, use assign ( : ).

        .[`:/q/qdata.dat;();:;1001 1002 1003]

        get `:/q/qdata.dat
1001 1002 1003

To append to the file if it exists, use join ( , ).

        .[`:/q/qdata.dat;();,;42 43]

         get `:/q/qdata.dat
1001 1002 1003 42 43

Writing Splayed Tables

Writing a table to a data file using the above methods puts it into a single file. For example,

        t:([] c1:101 102 103; c2:1.1 2.2 3.3)
        `:/q/data/t.dat set t

creates a single file in the data subdirectory of the q directory. List the directory on your disk now to verify this.

You can write each column of the table to its own file in the directory specified in the handle; this is especially useful for large tables. A table written in this form is called a splayed table.

To splay a table, specify the path as a directory - that is, with a trailing slash (/) and no file name.

        `:/q/data/t/ set t

If you list the directory in the OS, you will see a new subdirectory named 't'. It contains three files: one for each of the two columns in the original table, plus a '.d' file containing q metadata. The latter describes how to put the columns back together.

Important: For a table to be splayed, each column must be of uniform width. Consequently a splayed table cannot contain any symbol or non-simple columns. A table with symbol column(s) can effectively be splayed by enumerating the symbols.

Thus, the following fails,

        ts:([]c1:`a`b`c`a;c2:10 20 30 40)
        `:/q/data/ts/ set ts

Enumerate the symbol column and the write succeeds.

        syms:distinct ts.c1
        update c1:`syms$c1 from `ts
`ts
        ts
c1 c2
a  10
b  20
c  30
a  40

        `:/q/data/ts/ set ts
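
The manual enumeration above can also be delegated to the utility .Q.en, which enumerates every symbol column of a table against a sym file kept in the given directory. The following is only a sketch under that assumption, starting again from the unenumerated table and reusing the /q/data directory from above.

```q
ts:([] c1:`a`b`c`a; c2:10 20 30 40)      / start from plain symbols again
`:/q/data/ts/ set .Q.en[`:/q/data; ts]   / enumerate c1 against /q/data/sym, then splay
get `:/q/data/ts/                        / read the splayed table back
```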

Save and Load on Tables

The save and load functions simplify the process of writing and reading tables to/from disk files.

In its simplest form, save writes a table to a file with the same name as the table. The form,

	save `:path/tname

in which path is an optional path name and tname is the name of a table in the workspace, is equivalent to,

        `:path/tname set tname

For example,

	save `:/q/trade

writes the trade table to a file named trade in the q directory.

The form,

	save `:path/tname/

splays the table within the directory tname.

As you would expect, load is the inverse of save, in that it reads a table from a file into a variable with the same name as the file. In other words,

	load `:path/tname

is equivalent to,

	tname:get `:path/tname

Thus, the expression,

	load `:/q/trade

creates a table variable trade and populates it from the file data.

As before, appending a / indicates that the table has been splayed. So,

	load `:path/tname/

populates a table tname from the directory tname.

You can also use save to write a table as delimited text simply by appending an appropriate file extension. The expression,

	save `:path/tname.txt

writes the table as text records. The expression,

	save `:path/tname.csv

writes the table as csv records. The expression,

	save `:path/tname.xml

writes the table as xml records.

Note: Tables written as .txt or .csv can be read as text files.

As an example, we take the simple table,

	tsimp:([] c1:`a`b`c; c2:10 20 30)

We save it,

	save `:/q/tsimp

Then reload it,

        load `:/q/tsimp
`tsimp
        tsimp
c1 c2
a  10
b  20
c  30

Next we save it in delimited text formats,

	save `:/q/tsimp.txt
	save `:/q/tsimp.csv
	save `:/q/tsimp.xml

Now we inspect the files with a text editor. In tsimp.txt, we find,

c1	c2
a	10
b	20
c	30

In tsimp.csv we have,

c1,c2
a,10
b,20
c,30

In tsimp.xml, we have,

<R>
<r><c1>a</c1><c2>10</c2></r>
<r><c1>b</c1><c2>20</c2></r>
<r><c1>c</c1><c2>30</c2></r>
</R>

Text Files

Importing and exporting data often involves reading and writing text files. The mechanism for doing this in q differs from processing q data files.

Writing (0:) and Reading (read0)

The q primitive verb denoted 0: takes a file handle as its left argument and a list of q strings as its right argument. It writes each string as a line of text in the specified file.

         `:/q/Life.txt 0: ("So";"Long")

Opening the file Life.txt in a text editor will show a file with two lines.

Read a text file with read0. The result is a list of strings, one for each line in the file.

        read0 `:/q/Life.txt
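
The strings passed to 0: can themselves be produced by 0:. With a delimiter char as the left operand and a table as the right operand, it returns the table as delimited text lines. A small sketch, in which the file name tsimp2.csv is hypothetical:

```q
tsimp:([] c1:`a`b`c; c2:10 20 30)
lines:"," 0: tsimp           / list of strings: a header line, then one line per row
`:/q/tsimp2.csv 0: lines     / hypothetical file name
read0 `:/q/tsimp2.csv        / the same lines read back from disk
```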

Using hopen and hclose

A text file handle can be opened with hopen. The result of hopen is a positive int whose negative is a file handle that can be used to write text to the file.

        h:hopen `:/q/Life.txt
        (neg h)["and"]
        (neg h) ("Thanks";"for";"all";"the";"Fish")

If the file already exists, opening it with hopen will append to it rather than overwriting it.

To close the handle, issue hclose on the int result of hopen. This flushes any data that might be buffered.

        hclose h
        read0 `:/q/Life.txt
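
The difference between the handle and its negative can be seen with a scratch file. This is a sketch, assuming the behavior of current q versions in which the positive handle appends exactly the chars it is given, while the negated handle writes its argument as a terminated line. The file name scratch.txt is hypothetical.

```q
h:hopen `:/q/scratch.txt     / hypothetical scratch file
h "no line break, "          / positive handle: chars are appended as given
(neg h) "line break added"   / negated handle: the string is written as a line
hclose h
read0 `:/q/scratch.txt
```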

Binary Files

It is also useful to read and write data from/to binary files. The mechanism for doing this is similar to that for processing text files. In q, a binary record is simply a list of byte values.

Writing (1:) and Reading (read1)

The q primitive verb denoted 1: takes a file handle as its left argument and a simple byte list as its right argument. It writes each byte in the list as a byte in the specified file.

        `:/q/answer.bin 1: 0x2a0607

Opening the file answer.bin in an editor that displays binary data will show a file with three bytes.

Read a binary file with read1. The result is a list of bytes.

        read1 `:/q/answer.bin

Using hopen and hclose

A binary file handle can be opened with hopen. The result of hopen is a positive file handle int that can be used to write a list of bytes to the file. Close the file by issuing hclose on the handle.

        h:hopen `:/q/answer.bin

        h 0x020304

        hclose h
        read1 `:/q/answer.bin

Reading Text Files as Binary

A text file can also be read as binary data by using read1. With Life.txt as above,

        read0 `:/q/Life.txt
        read1 `:c:/q/Life.txt

To convert this binary data to char, cast the binary. On a Windows machine, this looks as follows,

        "c"$read1 `:c:/q/Life.txt

Parsing File Records

Binary forms of 0: and 1: parse individual fields of a text or binary record according to data type. Field parsing is based on the following field types.

0      1  Type      Width(1)  Format(0)
B      b  boolean   1         [1tTyY]
X      x  byte      1
H      h  short     2         [0-9a-fA-F][0-9a-fA-F]
I      i  int       4
J      j  long      8
E      e  real      4
F      f  float     8
C      c  char      1
S      s  symbol    n
M      m  month     4         [yy]yy[?]mm
D      d  date      4         [yy]yy[?]mm[?]dd or [m]m/[d]d/[yy]yy
Z      z  datetime  8         date?time
U      u  minute    4         hh[:]mm
V      v  second    4         hh[:]mm[:]ss
T      t  time      4         hh[:]mm[:]ss[[.]ddd]
blank     skip
*         literal chars

The column labeled '0' contains the (upper case) field type char for text data. The (lower case) char in column '1' is for binary data. The column labeled 'Width(1)' contains the number of bytes that will be parsed for a binary read. The column labeled 'Format(0)' displays the format(s) that are accepted in a text read.

Note: The parsed records are presented in column form rather than in row form because q considers a table to be a collection of columns.
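
The upper case type chars are the same ones q uses to parse a single string with $, which gives a quick way to try a text format before reading a whole file. A brief sketch with made-up field values:

```q
"I"$"1001"           / parses to an int
"F"$"98.6"           / parses to a float
"S"$"DBT12345678"    / parses to a symbol
"D"$"2007.01.15"     / parses to a date
"I"$"abc"            / text that does not match the type parses to a null
```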

Fixed Length Records

The binary form of 0: and 1: for reading fixed length files is,

(Lt;Lw) 0: f

(Lt;Lw) 1: f

The left operand is a (general) list containing two sublists: Lt is a simple list of char containing one letter per field; Lw is a simple list of int containing one int width per field. The sum of the field widths in Lw must equal the width of the record. The result of the function in all cases is a (general) list of lists with an item for each field.

The simplest form of the right operand f is a symbol representing a file handle. For example,

        ("IFC D";4 8 10 6 4) 0: `:/q/Fixed.txt

reads a text file containing fixed length records of width 32. The first field is an int of width 4; the second field is a float of width 8; the third field consists of 10 char; the fourth slot of 6 positions is skipped; the fifth field is a date of width 4.

You might think that the widths are superfluous, but they are not. The actual width can be narrower than the default for small values. Alternatively, you may wish to specify a width larger than that required by the corresponding data type to indicate blanks between fields. If the file in the previous example were rewritten with one additional blank character between fields, the proper left operand to read it would be,

        ("IFC D"; 5 9 11 6 4)

For example, we take a file c:/q/data/Px.txt having the form,

        1001DBT12345678  98.61002EQT98765432 24.751004CCR00000001121.23

The read is,

        ("ISF";4 11 6) 0: `:/q/data/Px.txt
1001        1002        1004
DBT12345678 EQT98765432 CCR00000001
98.6        24.75       121.23
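
Because the result is a list of column lists, it can be turned into a table by pairing it with column names and flipping the resulting dictionary. In this sketch the names Seq, Sym and Px are chosen for illustration only.

```q
c:`Seq`Sym`Px                                 / illustrative column names
flip c!("ISF";4 11 6) 0: `:/q/data/Px.txt     / column dictionary flipped into a table
```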

The second form of the right operand f is,

        (hfile;i;n)

where hfile is a symbol containing a file name, i is the offset into the file to begin reading and n is the number of bytes to read. This is useful for large files that cannot be read into memory in one operation.

Note: A read operation must begin and end on a record boundary.

In our trivial example, the following reads the second and third records,

        ("ISF";4 11 6) 0: (`:/q/data/Px.txt; 21; 42)
1002        1004
EQT98765432 CCR00000001
24.75       121.23
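
The offset and length follow from the record width, so they need not be worked out by hand. The sketch below assumes the same Px.txt file; the helper rd and the variable names are hypothetical.

```q
f:`:/q/data/Px.txt
w:sum 4 11 6                                  / record width in bytes: 21
rd:{[k;m] ("ISF";4 11 6) 0: (f; w*k; w*m)}    / hypothetical helper: m records starting at record k
hcount[f] div w                               / number of whole records in the file
rd[1;2]                                       / the second and third records, as above
```

Keeping both the offset and the length as multiples of w ensures that each read begins and ends on a record boundary.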

Variable Length Records

The binary form of 0: and 1: for reading variable length delimited files is,

(Lt;D) 0: f

(Lt;D) 1: f

The left operand is a (general) list comprising two items: Lt is a simple list of char containing one type letter per field; D is either a char representing the delimiting character or that char enlisted.

If D is a delimiter char, the result is a general list of lists. Each list in the result is made up of items of type specified by Lt. The simplest form of the right operand f is a symbol representing a file handle.

For example, say we have a csv file /q/data/Px.csv having records,

1001,DBT12345678,98.6
1002,EQT98765432,24.75
1004,CCR00000001,121.23

Reading with a simple delimiter char results in a list of column lists,

        ("ISF";",") 0: `:c:/q/data/Px.csv
1001        1002        1004
DBT12345678 EQT98765432 CCR00000001
98.6        24.75       121.23

If D is the enlist of a delimiter char, the first record is taken to be a list of column names. Subsequent records are read as data specified by the types in Lt. The result is a table in which each record is formed from a file record.

Say we have a csv file /q/data/pxtitles.csv having records,

Seq,Sym,Px
1001,DBT12345678,98.6
1002,EQT98765432,24.75
1004,CCR00000001,121.23

Reading with an enlisted delimiter results in a table,

        ("ISF";enlist ",") 0: `:/q/data/pxtitles.csv
Seq  Sym         Px
1001 DBT12345678 98.6
1002 EQT98765432 24.75
1004 CCR00000001 121.23
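
The blank type char from the field type table applies to delimited files as well. As a sketch against the same pxtitles.csv, a blank in the second position leaves out the Sym column, so the result should be a table with only the columns Seq and Px.

```q
("I F";enlist ",") 0: `:/q/data/pxtitles.csv   / the blank type char skips the second field
```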

You can also read this file with an atomic delimiter. The result is a list of lists with nulls in the positions where the header records do not match the specified types.

        ("ISF";",") 0: `:c:/q/data/pxtitles.csv
    1001        1002        1004
Sym DBT12345678 EQT98765432 CCR00000001
    98.6        24.75       121.23

Saving and Loading Contexts

It is possible to save or restore all the entities in a q context in one operation. This is useful to restore the state of a system to its initial condition or from a checkpoint.

Saving a Context

Recall that a context is actually a dictionary. You can write an entire context, with all its entities, to a single data file by writing the dictionary.

For example, to write out the default context,

        `:currentws set value `.

Loading a Context

To retrieve a saved context, use get with the file handle,

        dc:get `:currentws

Use set with a symbol containing the context name to replace the context,

        `. set dc

Important: Overlaying the root context replaces all its entities. This is convenient for re-initialization, but be sure of your intent.

Interprocess Communication

A q process can communicate with another q process residing anywhere on the network, provided that process is accessible. The process that initiates the communication is the client, while the process receiving and processing the request is the server. The server process can be on the same machine, the same network, a different network or on the internet. The communication can be synchronous (wait for a result to be returned) or asynchronous (don't wait and no result returned).

The easiest way to examine interprocess communication (IPC) is to start another q process on the same machine running your current q session. Make sure it is listening on a different port (the default port is 5000). In what follows we shall assume that a server q process has been started on the same machine with the command,

        q -p 5042

This means it is listening on port 5042.

Communication Handle

A communication handle is similar to a file handle. It is a symbol that starts with a colon (:) and has the form,

        `:[server]:port

where the bracketed expression represents an optional server machine identifier and port is a port number.

If the server process is running on the same machine as the client process, you can omit the server identifier. In our case, the communication handle is,

        `::5042

If the server is on the same network as your machine, you can use its machine name. In our case,


You can use the IP address of the server,


If the server is running on the internet, you can use a url,


Connection Handle

Use a communication handle as the argument of hopen to open a connection to the server process. Store the int result of hopen, called the connection handle, in a variable. You issue commands to the server by treating this variable as if it were a function.

For example, if the server process is running on the same machine and is listening on port 5042, the following q code opens a connection to the server process. It assigns the value 42 to the variable a on the server and then retrieves the value of a from the server. Finally, the connection is closed.

        h:hopen `::5042
        h "a:42"
        h "a"
        hclose h

Note: Whitespace between h and the quoted string is optional, as it is in function juxtaposition. We include it for readability.
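
If no process is listening on the port, hopen signals an error rather than returning a handle. The following sketch guards against that with protected evaluation and uses the form of hopen that takes a timeout in milliseconds. The 2000 millisecond timeout and the message text are arbitrary choices.

```q
/ returns a connection handle on success, or 0 after reporting the failure
h:@[hopen; (`::5042;2000); {[err] -1 "connect failed: ",err; 0}]
if[h>0; show h "1+1"; hclose h]     / use the connection only when it was opened
```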

Message Format

The general message format for interprocess communication is a list,

(f; arg1; arg2; ...)

Here f is a symbol or string representing an expression to be evaluated on the server. It can be an expression containing q operators or it can be a function, dictionary or list. The remaining items arg1, arg2 ... are optional parameters for the map. The parameters are arguments when f is a function, indices when f is a list, or domain items when f is a dictionary. Message execution returns the result of the server's evaluation.
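
Two small instances of the list format may make this concrete. They are a sketch assuming the server from above is listening on port 5042 with the default message handling, and they use only built-in operations so that nothing needs to be defined on either side.

```q
h:hopen `::5042
h (+;1;2)            / the operator + is sent along with two arguments; returns 3
h ("{x*y}";3;4)      / the string is evaluated on the server, then applied to 3 and 4; returns 12
hclose h
```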

This form of remote call is very powerful, in that it can send a mapping to a remote q instance for evaluation. In particular, the lambda of a function is transported. In a simple example, say we already have an open handle h to a server. If f is defined on the client as,


then executing the following expression on the client,

	h (f;2)

results in f being sent to the server with the argument 2 and then evaluated there. The result is,

        h (f;2)

Important: Exercise caution when sending entities to a remote server. A trivial mistake could place the server into a non-responding state. It is safer to define a function on the server and screen its input internally.

A special case of the general message format, which we used previously, is a string in which f is a q expression to be executed on the server and there are no args. For example,

        "select avg price from t where date>2006.01.01"

This format can be used to execute a function that has been defined on the server. For example, suppose g is defined on the server as,


Executing the following on the client sends the string "g 2" to the server where it is evaluated. The result is,

	h "g 2"

Compare this with the example above where f is defined on the client.

Note: If the expression in the execution string contains special characters, they must be escaped. For example, to define a string on the server, you must escape the double quotes in the message string.
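
A short sketch of such an escaped message, assuming h is an open connection handle; the variable name s and its content are made up for illustration.

```q
h "s:\"So long\""    / the inner double quotes are escaped; defines the string s on the server
h "s"                / retrieves the string from the server
```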

When the remote function performs an operation on a table, it can be viewed as a remote stored procedure. For example, suppose t and f are defined on the server as,

        t:([]c1:`a`b`c;c2:1 2 3)
        f:{[x] select c2 from t where c1=x}

The following expression on the client executes f on the server, selecting rows that match the value `b in c1,

        h "f `b"

The equivalent of dynamic SQL can be achieved by passing a function definition.

        h ({[x] select c2 from t where c1=x};`b)
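
A third variant, sketched here on the assumption that f is defined on the server as above, names the server function by symbol and passes the argument as data. No q text has to be assembled or escaped on the client.

```q
h (`f;`b)            / the server looks up its own f and applies it to `b
```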

Synchronous Messages

The messages sent in the previous sections were synchronous, meaning that the sending client process waits for a result from the server before proceeding. The result of the operation on the server becomes the return value of the remote call that uses the connection handle.

To send a synchronous message, use the original positive int value of the connection handle as if it were a function. A typical example of sending a synchronous message is executing a select expression on the server. In this case, you surely want to wait for the result to return.

For example, suppose a table has been defined on the server as,

        t:([]c1:`a`b`c;c2:1 2 3)

The following message executes a query against t, assuming h is an open connection handle to the server.

        h "select from t where c1=`b"
c1 c2
b  2

Note: The previous example demonstrates how to perform the equivalent of dynamic SQL against the server process.

As another example, send an insert synchronously if you want confirmation of the operation.

       h "`t insert (`x;42)"
       h "t"
c1 c2
a  1
b  2
c  3
x  42

Asynchronous Messages

It is also possible to send messages asynchronously, meaning that the client does not wait and there is no result containing a return value. You would typically send an asynchronous message to kick off a long-running operation on the server. You might also send an asynchronous message if the operation does not have a meaningful result, or if you simply don't care to wait for the result.

To send an asynchronous message, use the negative of the int connection handle returned by hopen. For example, the insert that was sent synchronously in the previous example can also be sent asynchronously,

        (neg h) "`t insert (`y;43)"
        h "t"
c1 c2
a  1
b  2
c  3
x  42
y  43

Observe that there is no return value from the first message.

Advanced: In the previous example, because the first message is asynchronous, it is possible that the second message will be sent from the client before the insert has completed on the server. However, the second message will not execute on the server until the first has completed.

Message Handlers

When a q process receives a message via interprocess communication, the default behavior is to evaluate the message, effectively executing the message content. If the message is synchronous, the result is returned to the client.

During message processing on the server, the server connection handle is automatically placed in .z.w. This can be used to manage connections on the server. See below for a simple example.

Note: The connection handle on the client side and the connection handle on the server side are assigned independently by their respective q processes. In general, they are not equal.

The default message processing can be overridden using message filters. Message filters are event-handling functions in the .z context. The .z.pg message filter processes synchronous requests and .z.ps processes asynchronous requests.

Advanced: The names end in 'g' and 's' because synchronous processing has "get" semantics and asynchronous processing has "set" semantics.

The following two assignments on the server recreate the default message processing behavior.

        .z.ps:{value x}
        .z.pg:{value x}

Message filtering can be used for a variety of purposes. For example, suppose the connection allows a user on the client side to execute dynamic q-sql against the server. You could improve on the default processing by enclosing the evaluation in protected execution.

        .z.pg:{@[value; x; errHandler x]}

Here errHandler is a function that recovers from an unexpected error.
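
One possible shape for errHandler is a function of two arguments, so that errHandler x is a projection still waiting for the error text. The definition below is a hypothetical sketch and not the only way to recover.

```q
errHandler:{[msg;err] "request failed: ",err}   / hypothetical: msg is the request, err the error text
.z.pg:{@[value; x; errHandler x]}               / errHandler x is a projection awaiting err
```

With this in place, a failing request such as h "1+`a" from a client should return the string built by errHandler instead of signalling an error on the client.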

A more interesting example is a server that keeps track of the clients connected to it. A simplistic way to do this is to maintain a dictionary of connection handles mapped to client names. The following function on the server registers a new client connection by upserting it to the global dictionary cp. Remember, .z.w has the connection handle.

        cp:()!()                                                / server

        regConn:{cp[.z.w]::x}                          / server

The client could pass its machine name,

        h:hopen `::5042                                / client
        h                                                        / client
        h "regConn `",string .z.h                    / client

After this call, cp will contain an entry that reflects the specific handle assigned to the connection on the server. For example,

        cp                                                        / server
4| macpro.local

As additional connections are made to the server, cp will contain one entry for each connection.

Handling Close

An open connection can be closed by either the client or the server. The close can be deliberate, meaning it occurs under user or program control, or it can be unanticipated due to a process terminating unexpectedly.

The close handler .z.pc can be used to perform processing whenever a connection is closed from the other end. While it will be invoked on any close, it does not know how the close was initiated.

In our example above, we use a close handler to remove the information about a connection once it is closed. Specifically, we create a handler to remove the appropriate entry from cp.

        .z.pc:{cp::cp _ x}                                / server

When the client issues an hclose on its connection handle,

        hclose h                                                / client

the dictionary cp no longer shows the connection,

        cp                                                        / server

Now that we have established basic close handling on the server, we turn our attention to the client. We want the client to reconnect automatically in the event the server disconnects for any reason. The easiest way to do this is with the timer.

We create a close handler that resets the global connection handle to 0 and issues a command that sets the timer to fire every 2 seconds (2000 milliseconds).

        .z.pc:{h::0; value"\\t 2000"}

The timer handler attempts to re-open the connection, trapping the error that hopen signals when the server is unavailable. Upon success, it issues a command that turns the timer off.

        .z.ts:{h::@[hopen;`::5042;0]; if[h>0;value"\\t 0"]}

Note: In practice, you should restrict the number of connection retries rather than try forever.

Http Connection Handler

There is also a message handler for http connections, named .z.ph. Since http communication is always synchronous, there is only one handler. In contrast to other system handlers, there is a default handler for http, which is used for the q web viewer.

The default handler allows a q process to be accessed programmatically over the web, similar to a servlet. The ambitious reader could replace this with a handler that processes SOAP, thus enabling q to be a web service. (Such a handler would be the object of derision from those who decry SOAP as unnecessary and wasteful.)


©2006-2007 Kx Systems, Inc. and Continuux LLC. All rights reserved.
