QforMortals/i o

I/O

Overview

I/O in q is achieved using handles, which are symbols whose values are file names. The handle acts as a mapping to an I/O stream, in the sense that retrieving a value from the handle results in a read and passing a value to the handle is a write.

Data Files

All q entities are automatically serializable to disk. The persistent form is a self-describing version of the in-memory form. A data file comprises a q entity written to disk.

File Handle

A data file handle is a symbol that starts with a colon (:) and has the form,

	`:[path]fname

where the bracketed expression represents an optional path and fname is a file name. Both path and fname must be valid names as recognized by the underlying operating system.

Warning: The one caveat is that separators in q paths are always represented by the forward slash (/), even for Windows.

Using set and get

A data file is created and a q entity written to it in a single step using binary set. The left operand is a file handle, the right operand is the entity to be written and the result is the handle of the written file. The file is closed once the write is complete.

	`:c:/qdata.dat set 101 102 103
`:c:/qdata.dat

Note: The behavior of set is to create the file if it does not exist and overwrite it if it does.

A data file can be read using unary get, whose argument is a file handle and whose result is the q entity contained in the data file.

	get `:c:/qdata.dat
101 102 103

An alternate way to read a data file is with value,

	value `:c:/qdata.dat
101 102 103

Using hopen and hclose

A data file handle can be opened with hopen. The result of hopen is an int that acts like a function for writing to the file.

	h:hopen `:c:/qdata.dat
	h[42]
	h 1 2 3 4

If the file already exists, opening it with hopen appends to it rather than overwriting it.

To close the handle, issue hclose on the result of hopen.

	hclose h
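
As a quick check of the append behavior, the following sketch recreates the file, appends to it through an open handle, and reads it back. The relative path `:qdata.dat is an assumption made for portability; substitute the path used above.

```q
`:qdata.dat set 101 102 103   / create (or overwrite) the data file
h:hopen `:qdata.dat           / open the existing file; writes now append
h[42]                         / append a single item
h 1 2 3 4                     / append a list of items
hclose h                      / close the handle
get `:qdata.dat               / the text above implies 101 102 103 42 1 2 3 4
```

Reading the file only after hclose keeps the example independent of any buffering on the open handle.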

Using Dot Amend

Fundamentalists can use dot amend to write to data files. To overwrite the file if it exists, use assign (:).

	.[`:c:/qdata.dat;();:;1001 1002 1003]
`:c:/qdata.dat
	get `:c:/qdata.dat
1001 1002 1003

To append to the file if it exists, use join (,).

	.[`:c:/qdata.dat;();,;42 43]
`:c:/qdata.dat
	get `:c:/qdata.dat
1001 1002 1003 42 43

Writing Splayed Tables

Writing a table to a data file using the above methods puts it into a single file. For example,

	t:([] c1:101 102 103; c2:1.1 2.2 3.3)
	`:c:/q/data/t.dat set t
`:c:/q/data/t.dat

creates a single file in the data subdirectory of the q directory. List the directory on your disk now to verify this.

For large tables, you can write each column of the table to its own file in the directory specified in the handle. A table written in this form is called a splayed table.

Warning: In order for a table to be splayed, it must not contain any symbol columns. If the table has symbol columns, enumerate them.

To splay a table, specify the path as a directory, that is, with a trailing slash and no file name.

	`:c:/q/data/t/ set t
`:c:/q/data/t/

If you now list the directory, you will see a new subdirectory named 't'. It contains three files: one for each of the two columns in the original table, plus a file named .d containing q directory information. The latter describes how to put the columns back together.
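
A sketch of enumerating symbol columns before splaying, assuming a kdb+ version that provides the .Q.en utility and using a hypothetical relative directory db. .Q.en enumerates every symbol column against a sym file in the given directory and returns a table that can be splayed.

```q
t:([] c1:`a`b`c; c2:1.1 2.2 3.3)   / c1 is a symbol column
`:db/t/ set .Q.en[`:db] t          / enumerate against db/sym, then splay into db/t/
key `:db/t                         / directory listing: .d plus one file per column
get `:db/t/.d                      / column order recorded in .d: `c1`c2
get `:db/t/                        / read the splayed table back
```

Keeping the sym file in the parent directory lets several splayed tables share one enumeration domain.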

Text Files

To import and export data with kdb+, it is often necessary to read and write text files. The mechanism for doing this in q differs from processing q data files.

Writing (0:) and Reading (read0)

The q primitive verb denoted 0: takes a file handle as its left argument and a list of q strings as its right argument. It writes each string as a line of text in the specified file.

	`:c:/q/Life.txt 0: ("So";"Long")
`:c:/q/Life.txt

Opening the file Life.txt in a text editor will show a file with two lines.

Read a text file with read0. The result is a list of strings, one for each line in the file.

	read0 `:c:/q/Life.txt
("So";"Long")

Using hopen and hclose

A text file handle can be opened with hopen. The result of hopen is a positive int whose negative can be used to write text to the file.

	h:hopen `:c:/q/Life.txt
	(neg h)["and"]
-152
	(neg h) ("Thanks";"for";"all";"the";"Fish")
-152

If the file already exists, opening it with hopen will append to it rather than overwriting it.

To close the handle, issue hclose on the int result of hopen.

	hclose h
	read0 `:c:/q/Life.txt
("So";"Long";"and";"Thanks";"for";"all";"the";"Fish")

Binary Files

It is also useful to read and write data from/to binary files. The mechanism for doing this is similar to that for processing text files. In q, a binary record is simply a list of byte values.

Writing (1:) and Reading (read1)

The q primitive verb denoted 1: takes a file handle as its left argument and a simple byte list as its right argument. It writes each byte in the list as a byte in the specified file.

	`:c:/q/answer.bin 1: 0x2a0607
`:c:/q/answer.bin

Opening the file answer.bin in an editor that displays binary data will show a file with three bytes.

Read a binary file with read1. The result is a list of bytes.

	read1 `:c:/q/answer.bin
0x2a0607

Using hopen and hclose

A binary file handle can be opened with hopen. The result of hopen is a positive int that can be used to write a list of bytes to the file.

	h:hopen `:c:/q/answer.bin
	h[0x01]
152
	h 0x020304
152
	hclose h

	read1 `:c:/q/answer.bin
0x2a060701020304

Reading Text Files as Binary

A text file can also be read as binary data by using read1. With Life.txt as above,

	read0 `:c:/q/Life.txt
("So";"Long";"and";"Thanks";"for";"all";"the";"Fish")
	read1 `:c:/q/Life.txt
0x536f0d0a4c6f6e670d0a616e640d0a5468616e6b730d0a666f720d0...

If you prefer to see the text file as char, cast the binary,

	"c"$read1 `:c:/q/Life.txt
"So\r\nLong\r\nand\r\nThanks\r\nfor\r\nall\r\nthe\r\nFish\r\n"

Parsing File Records

There are binary forms of 0: and 1: that parse individual fields of a text or binary record according to data type. Field parsing is based on the following field types,

0  1  Type      Width(1)  Format(0)
B  b  boolean   1         [1tTyY]
X  x  byte      1
H  h  short     2         [0-9a-fA-F][0-9a-fA-F]
I  i  int       4
J  j  long      8
E  e  real      4
F  f  float     8
C  c  char      1
S  s  symbol    n
M  m  month     4         [yy]yy[?]mm
D  d  date      4         [yy]yy[?]mm[?]dd or [m]m/[d]d/[yy]yy
Z  z  datetime  8         date?time
U  u  minute    4         hh[:]mm
V  v  second    4         hh[:]mm[:]ss
T  t  time      4         hh[:]mm[:]ss[[.]ddd]

A space character causes parsing to skip the field and * denotes literal chars.

The column labeled '0' contains the (upper case) field type char for text data. The (lower case) char in column '1' is for binary data. The column labeled 'Width(1)' contains the number of bytes that will be parsed for a binary read. The column labeled 'Format(0)' displays the format(s) that are accepted in a text read.

Warning: The parsed records are presented in column form rather than in row form. This is because q considers a table to be a collection of column values.

Fixed Length Records

The binary form of 0: and 1: for reading fixed length files is,

(Lt;Lw) 0: f
(Lt;Lw) 1: f

The left operand is a (general) list containing two lists: Lt is a simple list of char containing one letter per field; Lw is a simple list of int containing one int width per field. The sum of the field widths in Lw must equal the width of the record. The result of the function in all cases is a (general) list of lists with an item for each field.

The simplest form of the right operand f is a symbol representing a file handle. For example,

	("IFC D";4 8 10 6 4) 0: `:c:/q/Fixed.txt

reads a text file containing fixed length records of width 32. The first field is an int of width 4; the second field is a float of width 8; the third field consists of 10 char; the fourth slot of 6 positions is skipped; the fifth field is a date of width 4.

You might think that the widths are superfluous, but they are not. The width can be narrower than the default for small values. Alternatively, specify a width larger than that required by the corresponding data type to indicate blanks between fields. If the file in the previous example were rewritten with one additional blank character between fields, the left operand to read it would be,

	("IFC D"; 5 9 11 6 4)

For a simple example, let's say we have a file c:/q/data/Px.txt having the form,

	1001DBT12345678  98.61002EQT98765432 24.751004CCR00000001121.23

The read is,

	("ISF";4 11 6) 0: `:c:/q/data/Px.txt
(1001 1002 1004;`DBT12345678`EQT98765432`CCR00000001;98.6 24.75 121.23)

The second form of the right operand f is,

(hfile;i;n)

where hfile is a symbol containing a file name, i is the offset into the file to begin reading and n is the number of bytes to read. This is useful for large files that cannot be read into memory in one operation.

Warning: A read operation must begin and end on a record boundary.

In our trivial example, the following reads the second and third records,

	("ISF";4 11 6) 0: (`:c:/q/data/Px.txt; 21; 42)
(1002 1004;`EQT98765432`CCR00000001;24.75 121.23)
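
The offset arithmetic can be wrapped in a small helper. This is a sketch only: the helper name readRecs and the relative path `:Px.txt are assumptions, and the sample file is written with 1: so that no line terminator is added.

```q
/ write the three 21-byte records as one unbroken run of bytes
`:Px.txt 1: "x"$"1001DBT12345678  98.61002EQT98765432 24.751004CCR00000001121.23"
w:sum 4 11 6                          / record width in bytes: 21
/ read n records starting at zero-based record i
readRecs:{[i;n] ("ISF";4 11 6) 0: (`:Px.txt; w*i; w*n)}
readRecs[1;2]                         / second and third records
(hcount `:Px.txt) div w               / number of records in the file: 3
```

Expressing both offset and length as multiples of w keeps every read on a record boundary.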

Variable Length Records

The binary form of 0: and 1: for reading variable length delimited files is,

(Lt;D) 0: f
(Lt;D) 1: f

The left operand is a (general) list comprising two items: Lt is a simple list of char containing one type letter per field; D is either a char representing the delimiting character or an enlisted such.

If D is a delimiter char, the result is a general list of lists. Each list in the result is made up of items of type specified by Lt. The simplest form of the right operand f is a symbol representing a file handle.

For example, say we have a csv file c:/q/data/Px.csv having records,

1001,"DBT12345678",98.6
1002,"EQT98765432",24.75
1004,"CCR00000001",121.23

Reading with a simple delimiter char results in a list of column lists,

	("ISF";",") 0: `:c:/q/data/Px.csv
(1001 1002 1004;`DBT12345678`EQT98765432`CCR00000001;98.6 24.75 121.23)

If D is the enlist of a delimiter char, the first record is taken to be a list of column names. Subsequent records are read as data specified by the types in Lt. The result is a table in which each record is formed from a file record.

Say we have a csv file c:/q/data/PxTitles.csv having records,

"Seq","Sym","Px"
1001,"DBT12345678",98.6
1002,"EQT98765432",24.75
1004,"CCR00000001",121.23

Reading with an enlisted delimiter results in a table,

	show ("ISF";enlist ",") 0: `:c:/q/data/PxTitles.csv
Seq  Sym         Px
-----------------------
1001 DBT12345678 98.6
1002 EQT98765432 24.75
1004 CCR00000001 121.23

You can also read this file with an atomic delimiter. The result is a list of lists with nulls in the positions where the header records do not match the specified types.

	("ISF";",") 0: `:c:/q/data/PxTitles.csv
(0N 1001 1002 1004;`Sym`DBT12345678`EQT98765432`CCR00000001;0n 98.6 24.75 121
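
For a round trip, a table can be written as delimited text and read back. The sketch below assumes the built-in csv constant and the form of 0: that formats a table as text (delimiter on the left, table on the right), with a hypothetical relative path. The column types in t are chosen to match the type letters used in the read.

```q
t:([] Seq:1001 1002 1004i; Sym:`DBT12345678`EQT98765432`CCR00000001; Px:98.6 24.75 121.23)
`:PxTitles.csv 0: csv 0: t                  / format t as csv lines, header first, and write them
r:("ISF";enlist ",") 0: `:PxTitles.csv      / enlisted delimiter: first line supplies column names
r~t                                         / 1b when the round trip preserves the table
meta r                                      / column types: i, s, f
```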

Saving and Loading Contexts

It is possible to save or restore all the entities in a q context in one operation. This is useful to restore the state of a system to its initial condition or from a checkpoint.

Saving a Context

Recall that a context is actually a dictionary. You can write an entire context, with all its entities, to a single data file by writing the dictionary.

For example, to write out the default context,

	`:currentws set value `.
`:currentws

Loading a Context

To retrieve a saved context, use get with the file handle,

	dc:get `:currentws

Use set with a symbol containing the context name to replace the context,

	`. set dc

Warning: Overlaying the root context replaces all entities. This is convenient for re-initialization, but make sure it is what you intend.

Interprocess Communication

A q process can communicate with another q process residing anywhere on the network, provided that process is accessible. The process that initiates the communication is the client, while the process receiving and processing the request is the server. The server process can be on the same machine, the same network, a different network or on the internet. The communication can be synchronous (wait for a result to be returned) or asynchronous (don't wait and no result returned).

The easiest way to examine interprocess communication is to start another q process on the same machine running your current q session. Make sure it is listening on a different port (the default port is 5000). In what follows we shall assume that a server q process has been started on the same machine with the command,

q -p 5042

This means it is listening on port 5042.

Communication Handle

A communication handle is similar to a file handle. It is a symbol that starts with a colon (:) and has the form,

`:[server]:port

where the bracketed expression represents an optional server machine identifier and port is a port number.

If the server process is running on the same machine as the client process, you can omit the server identifier. In our case, the communication handle is,

	`::5042

If the server is on the same network as your machine, you can use its machine name. In our case,

	`:aerowing:5042

You can use the IP address of the server,

	`:198.162.0.2:5042

If the server is running on the internet, you can use its url,

	`:www.yourco.com:5042

Connection Handle

Use a communication handle as the argument to hopen to open a connection to the server process. Store the int result of hopen, called the connection handle, in a variable. You issue commands to the server by treating the variable as if it were a function.

For example, if the server process is running on the same machine and is listening on port 5042, the following q code opens a connection to the server process. It assigns the value 42 to the variable a on the server and then retrieves the value of a from the server. Finally, the connection is closed.

	h:hopen `::5042
	h "a:42"
	h "a"
42
	hclose h

Note: Whitespace between h and the quoted string is optional, as it is in function juxtaposition. We include it for readability.
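
hopen signals an error when the connection cannot be made, so client code often wraps it in protected evaluation. A minimal sketch, assuming the server above on port 5042; the helper name connect and the 500 millisecond timeout are illustrative choices.

```q
/ return a connection handle, or a null int if the connection fails
connect:{[hp] @[hopen; (hp;500); {[e] -2 "connect failed: ",e; 0Ni}]}
h:connect `::5042
if[not null h; show h "6*7"; hclose h]      / shows 42 when the server is reachable
```

Returning a null handle rather than signalling lets the caller decide whether to retry or give up.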

Message Formats

There are two message formats for inter-process communication. The first, which we used above, is simply a string containing any q expression to be executed on the server. For example,

	"a:6*7"
	"select avg price from t where date>2006.01.01"

Warning: If the expression in the execution string contains special characters, they must be escaped. For example, to define a string on the server, you must escape the double quotes.

	"str:\"abc\""

The second message format for inter-process communication is a list,

(f; arg1; arg2; ...)

Here f is a symbol or string containing the name of a mapping on the server representing a function, dictionary or list. The remaining items, arg1, arg2, etc., comprise parameters if f is a function, indices if f is a list, or domain elements if f is a dictionary. The message returns the result of applying the mapping to the args.

When f is a function that performs an operation on a table, it becomes a remote stored procedure. For example, suppose t and f are defined on the server as,

	t:([]c1:`a`b`c;c2:1 2 3)
	f:{[x] select c2 from t where c1=x}

The following expression on the client executes f on the server, selecting rows that match the value `b in c1,

	h (`f;`b)
+(,`c2)!,,2
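
The list format also avoids building and escaping strings on the client. A sketch, assuming the server above is listening on port 5042; the function name add is hypothetical.

```q
h:hopen `::5042
h "add:{x+y}"        / define a function on the server using the string format
h (`add; 2; 3)       / call it by name with the list format; returns 5
h ({x*y}; 6; 7)      / a function defined on the client can be sent too; returns 42
hclose h
```

Because the arguments travel as q data rather than text, values such as strings and symbols need no quoting on the way.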

Synchronous Messages

The messages sent in the previous sections were synchronous, meaning that the sending client process waits for a result from the server before proceeding. The result of the operation on the server becomes the return value of the remote call using the connection handle.

To send a synchronous message, use the original positive int value of the connection handle as if it were a function. A common example of sending a synchronous message is executing a select expression on the server. In this case, you surely want to wait for the result to return.

For example, suppose a table has been defined on the server,

	t:([]c1:`a`b`c;c2:1 2 3)

The following message executes a query against t, assuming h is an open connection handle to the server,

	h "select from t where c1=`b"
+`c1`c2!(,`b;,2)

Note: The previous example demonstrates how to perform the equivalent of dynamic SQL against the server process.

As another example, you can send an insert synchronously if you want confirmation of the operation.

	h "`t insert (`x;42)"
,3
	show h "t"
c1 c2
-----
a  1
b  2
c  3
x  42

Asynchronous Messages

It is also possible to send messages asynchronously, meaning that the client does not wait and there is no result containing a return value. You would typically send an asynchronous message to kick off a long-running operation on the server. You might also send an asynchronous message if the operation does not have a meaningful result, or if you just don't care to wait for the result.

To send an asynchronous message, use the negative of the int connection handle returned by hopen. For example, the insert that was sent synchronously in the previous example can also be sent asynchronously,

	(neg h) "`t insert (`y;43)"
	show h "t"
c1 c2
-----
a  1
b  2
c  3
x  42
y  43

Observe that there is no return value from the first message.

Note: In the previous example, because the first message is asynchronous, it is possible that the second message will be sent from the client before the insert has completed on the server. However, the second message will not execute on the server until the first has completed.
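
Two idioms are commonly used to control when an asynchronous message has left the client or has been processed. A sketch, assuming h is an open connection handle to the server above and t is defined there:

```q
(neg h) "`t insert (`z;44)"   / queue an asynchronous message
(neg h)[]                     / flush: block until queued messages have been written to the socket
h ""                          / sync chaser: returns only after the server has processed earlier messages
```

The flush guarantees only that the message has been sent; the synchronous chaser is the one that confirms the server has worked through it.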

Message Handlers

When a q process receives a message via inter-process communication, the default behavior is to evaluate the message, effectively executing the message content. If the message is synchronous, the result is returned to the client.

During message processing on the server, the server connection handle is automatically placed in .z.w. This can be used to manage connections on the server. See below for a simple example.

Warning: The connection handle on the client side and the connection handle on the server side are assigned independently by their respective q processes. In general, they are not equal.

The default message processing can be overridden using message filters. Message filters are event-handling functions in the .z context. The .z.pg message filter processes synchronous requests and .z.ps processes asynchronous requests.

Note: The names end in 'g' and 's' because synchronous processing has "get" semantics and asynchronous processing has "set" semantics.

The following two assignments on the server recreate the default message processing behavior,

	.z.ps:{value x}
	.z.pg:{value x}

Message filtering can be used for a variety of purposes. For example, suppose the connection allows a user on the client side to execute dynamic q-sql against the server. You could improve on the default processing by enclosing the evaluation in protected execution.

	.z.pg:{@[value; x; errHandler x]}

Here errHandler is a function that recovers from an unexpected error.
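
One possible shape for errHandler, shown as a sketch rather than the author's definition. Because errHandler x is evaluated before the trap fires, errHandler here takes two parameters so that errHandler x is a projection awaiting the error text. The result dictionary is an arbitrary choice.

```q
/ msg is the original request, err is the error string supplied by the trap
errHandler:{[msg;err] -2 "request failed: ",err; `ok`error!(0b;err)}
.z.pg:{@[value; x; errHandler x]}
.z.pg "6*7"       / 42
.z.pg "1+`a"      / `ok`error!(0b;"type")
```

Calling .z.pg directly, as in the last two lines, is a convenient way to exercise a handler without a client connection.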

A more interesting example is a server that keeps track of the clients connected to it. A simplistic way to do this is to maintain a dictionary of connection handles mapped to client names. The following function on the server registers a new client connection by upserting it to the global dictionary cp. Remember, .z.w has the connection handle.

	cp:()!()                                     / server
	regConn:{cp[.z.w]::x}                        / server

The client could register with its machine name,

	h:hopen `::5042                              / client
	h                                            / client
224
	h "regConn `",string .z.h                    / client

After this call, cp will contain an entry that reflects the specific handle assigned to the connection on the server. For example,

	cp                                           / server
(,240)!,`aerowing

As additional connections are made to the server, cp will contain one entry for each connection.

Handling Close

An open connection can be closed by either the client or the server. The close can be deliberate, meaning it occurs under program control, or it can be unanticipated due to one of the processes terminating unexpectedly.

The close handler .z.pc can be used to perform processing whenever a connection is closed from the other end. While it will be invoked on any close, it does not know how the close was initiated.

In our example above, we use a close handler to remove the information about a connection once it is closed. Specifically, we create a handler to remove the appropriate entry from cp.

	.z.pc:{cp::cp _ x}                           / server

When the client issues an hclose on its connection handle,

	hclose h                                     / client

the dictionary cp no longer shows the connection,

	cp                                           / server
(`int$())!`symbol$()

Now that we have established basic close handling on the server, we turn our attention to the client. We want the client to reconnect automatically in the event the server disconnects for any reason. The easiest way to do this is with the timer.

We create a close handler that resets the global connection handle to 0 and issues a command that sets the timer to fire every 2 seconds (2000 milliseconds).

	.z.pc:{h::0;value"\\t 2000"}

The timer handler attempts to re-open the connection. Upon success, it issues a command that turns the timer off.

	.z.ts:{h::hopen`::5042;if[h>0;value"\\t 0"]}

Warning: In practice, you probably should restrict the number of connection retries rather than try forever.
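
A sketch of a bounded retry, intended for a script file since the handler spans several lines. The counter tries and the limit maxTries are hypothetical names, and hopen is wrapped in protected evaluation so that a failed attempt yields 0i instead of signalling an error.

```q
h:0i; tries:0; maxTries:5                  / hypothetical retry budget
.z.pc:{[w] if[w=h; h::0i; tries::0; system "t 2000"]}
.z.ts:{[ts]
  tries+::1;
  h::@[hopen; `::5042; 0i];                / 0i when the server is still unreachable
  if[(h>0) or tries>=maxTries; system "t 0"]}
```

The timer is switched off either on a successful reconnect or once the budget is spent, so the client never retries indefinitely.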

Http Connection Handler

There is also a message handler for HTTP connections, named .z.ph. Since HTTP communication is always synchronous, there is only one handler. In contrast to other system handlers, there is a default handler for HTTP, which is used for the q web viewer.

The default handler allows a q process to be accessed programmatically over the web similar to a servlet. The ambitious reader could replace this with a handler that processes SOAP, thus enabling q to be a web service. Such a handler would be the object of derision from those who decry SOAP as unnecessary and wasteful.
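
A minimal sketch of a custom handler, assuming the .h.hy helper that wraps a string in an HTTP response with the given content type. The handler receives a two-item list of request text and header dictionary; here it simply echoes the request text. Note that assigning .z.ph replaces the default web viewer.

```q
.z.ph:{[x] .h.hy[`txt; "you asked for: ",x 0]}
.z.ph ("a?b=1"; ()!())     / returns the full response text, starting with the HTTP status line
```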



©2006 Kx Systems, Inc. and Continuux LLC. All rights reserved.
