Tables¶
cols column names ungroup normalize meta metadata xasc sort ascending xcol rename cols xdesc sort descending xcols re-order cols xgroup group by values in selected cols insert insert records xkey sset cols as primary keys upsert add/insert records xdesc sort descending ! Enkey, Unkey add/remove keys
Tables are first-class objects in q.
A table is an ordered list of its rows.
Relations in SQL are sets. There are no duplicate rows, and rows are not ordered. It is possible to define a cursor on a result set and then manipulate the cursor rows in order. (Not in ANSI SQL.)
kdb+ tables are ordered and may contain duplicates. The order allows a class of very useful aggregates that are unavailable to the relational database programmer without the cumbersome and poorly-performing temporal extensions.
Because kdb+ tables are ordered, for a table trade
in timestamp order, the functions first
and last
give open and close prices:
q)first trade
stock| `ibm
price| 121.3
amt | 1000
time | 09:03:06.000
q)last trade
stock| `msft
price| 78.52
amt | 200
time | 16:59:39.000
Construction¶
Flip a column dictionary¶
Here is a matrix: a list of four vectors.
q)show mat:(`Jack`Jill`Janet;`brown`black`fair;`blue`green`hazel;12 9 14)
Jack Jill Janet
brown black fair
blue green hazel
12 9 14
The same information as a column dictionary: each value is a 3-vector.
q)`name`hair`eye`age!mat
name| Jack Jill Janet
hair| brown black fair
eye | blue green hazel
age | 12 9 14
Flipped, it is a table.
q)show t:flip`name`hair`eye`age!mat
name hair eye age
---------------------
Jack brown blue 12
Jill black green 9
Janet fair hazel 14
Each row is a dictionary.
q)first t
name| `Jack
hair| `brown
eye | `blue
age | 12
A table is an ordered list of same-key dictionaries.
Like any list, its length – the number of its records – is given by count
.
q)count t
3
Table notation¶
The same table can be defined with table notation:
q)t~([]name:`Jack`Jill`Janet;hair:`brown`black`fair;eye:`blue`green`hazel;age:12 9 14)
1b
By default, column names are taken from corresponding variable names.
q)stock:`ibm`bac`usb
q)price:121.3 5.76 8.19
q)amount:1000 500 800
q)time:09:03:06.000 09:03:23.000 09:04:01.000
q)show trade:([]stock;price;amt:amount;time)
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 800 09:04:01.000
Columns get the datatypes of the values assigned them.
q)meta trade
c | t f a
-----| -----
stock| s
price| f
amt | j
time | t
Unlike SQL, there is no need to first declare the types of the columns.
Dict Each Right¶
If your records are in a row-order matrix, Dict Each Right (!/:
) will make each row a like dictionary – and a table is a list of like dictionaries.
q)mat
`ibm 121.3 1000 09:03:06.000
`bac 5.76 500 09:03:23.000
`usb 8.19 800 09:04:01.000
q)`stock`price`amt`time !/: mat
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 800 09:04:01.000
Read from disk¶
Use Load CSV to load a CSV file as a table; more generally, File Text for reading data from text files.
For reading tables persisted to the filesystem, see Database: persisting tables in the filesystem
Compound columns¶
Any list of the right length can become a column of a table, and its items can be any kdb+ objects – including tables.
Most table columns are vectors: simple lists of uniform type. This is generally most efficient for storage and query execution.
A table column of uniform type but with vector items is called a compound list. A book index is a familiar example.
q)term:`$("analytical engine";"Babbage, Charles";"difference engine")
q)pages:(5 17 324;1 5 17;17 359)
q)([]term;pages)
term pages
--------------------------
analytical engine 5 17 324
Babbage, Charles 1 5 17
difference engine 17 359
Above, the term
column is a symbol vector and the pages
column is a compound list.
In the result of meta
its compound nature is indicated by its type code in upper case.
q)meta([]term;pages)
c | t f a
-----| -----
term | s
pages| J
meta
does not read the entire table.
The meta
keyword samples only the top of each column.
You cannot rely on it to determine whether a column is simple, compound or mixed.
Table schema¶
An empty table can be created by initializing each column as the general empty list.
q)meta trade:([] stock:(); price:(); amount:(); time:())
c | t f a
------| -----
stock |
price |
amount|
time |
In this table, the datatype of each column is mixed. The first record then inserted into the table sets the datatypes of the columns. Subsequent inserts may only insert values of the same type; otherwise a type
error is signalled, which can be trapped and handled.
This requires the table to start with the correct column types, so it is often better to initialize a table with empty columns of the correct type.
q)meta trade:([] stock:`$(); price:`float$(); `long$amount:(); `time$time:())
c | t f a
------| -----
stock | s
price | f
amount| j
time | t
Keyed tables¶
A kdb+ table is a list and it can contain duplicates.
A kdb+ keyed table is a dictionary.
- Its key is a table of the key column/s.
- Its value is a table of the non-key columns.
In table notation, write the key field/s inside the square brackets. For a simple table of financial markets and their addresses:
q)market:([name:`symbol$()] address:())
When constructing a table key, ensure its items are unique.
To protect performance, kdb+ does not ensure key items are unique.
But there is no use case for duplicate key items, which make operation results unpredictable.
If market names are not unique, the country name could be part of the primary key.
q)market:([name:`symbol$(); country:`symbol$()] address:())
An alternative to using multiple columns as a primary key is a column of unique integer IDs
keys
, xkey
get, set key columns of a table
Foreign keys¶
Foreign keys in SQL provide referential integrity: an attempt to insert a foreign key value that is not in the primary key will fail. This is also true in q.
Suppose we want to record in our trades
table the market (NYSE, LSE, etc) where each trade has been executed.
We make the primary key of the markets
table a foreign key in the trades
table.
q)name:`$("Stock Exchange";"Boersen";"NYSE")
q)country:`$("United Kingdom";"Denmark";"United States")
q)city:`$("London";"Copenhagen";"New York")
q)id:1001 1002 1003
q)show market:([id]name;country;city)
id | name country city
----| ----------------------------------------
1001| Stock Exchange United Kingdom London
1002| Boersen Denmark Copenhagen
1003| NYSE United States New York
q)trade:([]
stock:`symbol$();
market:`market$();
price:`float$();
amount:`int$();
time:`time$() )
Only primary keys of keyed tables can be used as a foreign key. But there are other ways to link table columns.
Linking columns
The application of foreign keys and linked columns in kdb+
Indexing a table¶
Column indexing¶
If a table is a flipped dictionary, it is unsurprising we can also index the table by column names.
q)trade `stock`amt
ibm bac usb ibm bac usb
1000 500 800 1000 500 800
Row indexing¶
Selecting the ith row in a table is complex in SQL, but easy in q.
The pseudo column i
represents the row index.
It can be used in queries.
The select
expression returns a table with a single row.
q)select from trade where i=5
stock price amt time
----------------------------
usb 8.19 800 09:04:01.000
Indexing a table with an atom returns a dictionary.
q)trade[5]
stock| `usb
price| 8.19
amt | 800
time | 09:04:01.000
Row indexing cannot be used on a keyed table, which is a dictionary.
It is also easy to access, say, the second-to-last row.
q)trade[(count trade) - 2]
stock| `bac
price| 5.76
amt | 500
time | 09:03:23.000
We see at last the dual nature of a table. It is both
- a list of named same-length columns
- a list of like (same-key) dictionaries
And we can index it either way – or both, which is indexing at depth.
q)trade 1 / index by row
stock| `bac
price| 5.76
amt | 500
time | 09:03:23.000
q)trade `price / index by column
121.3 5.76 8.19
q)trade[1 0;`stock`amt] / index at depth
`bac 500
`ibm 1000
Its items are dictionaries and, as a table is a list of like dictionaries, any sublist of the table is – also a table.
q)trade 1 0
stock price amt time
-----------------------------
bac 5.76 500 09:03:23.000
ibm 121.3 1000 09:03:06.000
Indexing at depth¶
Indexing at depth can be used to read a column within a specific row.
trade[5;`stock]
`usb
Also useful for updates.
trade[5;`amt]:15
Index out of range¶
If we use select
, the result is a table with no rows.
q)select from trade where i = 300000
stock price amount time
---------------------------------------
If we use indexing, the result is a dictionary containing null values.
q)trade 30000
stock| `
price| 0n
amt | 0N
time | 0Nt
Indexing a keyed table¶
There are two ways to index a keyed table.
First, with a single row from its key, returning a dictionary.
q)flip{x cols x}key kt
dick
jane
jack
jill
john
q)kt `john
dob| 1990.11.16
sex| `m
q)flip{x cols x}key ku
Tom NYC
Jo LA
Tom Lagos
q)ku `Tom`Lagos
eye| brown
sex| m
Second, with a sublist from its key, returning a list of dictionaries, which is a table.
q)ku ([]city:`LA`Lagos; name:`Jo`Tom)
eye sex
---------
blue f
brown m
qSQL and Functional SQL
The foregoing describes dictionaries and tables in terms of lists and indexes.
Functional SQL extends the concepts through the query operators ?
and !
.
If you are familiar with SQL, you will find qSQL queries more readable.
Functional SQL,
qSQL
Q for Mortals
§8. Tables
List operations¶
A table is an ordered list. Many list operators work.
Take and Drop¶
For the head and tail of a table use the Take operator #
.
q)2#trade / take first two records
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
q)-2#trade / take last two records
stock price amt time
----------------------------
bac 5.76 500 09:03:23.000
usb 8.19 15 09:04:01.000
q)-2 _ trade / drop last two records
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 800 09:04:01.000
ibm 121.3 1000 09:03:06.000
Note that Take treats a list as circular if the number of items to take is longer than the list.
q)7#2 3 5
2 3 5 2 3 5 2
An alternative is to use sublist
, which takes only as many rows as are available.
q)count trade
6
q)count 3 sublist trade
3
q)count 30 sublist trade
6
You can also take selected columns.
q)`price`amt#trade
price amt
----------
121.3 1000
5.76 500
8.19 800
121.3 1000
5.76 500
8.19 15
Join and Join Each¶
The Join operator ,
catenates two lists – and tables are lists.
The table columns need not be in the same order.
q)trade,([]time:10:32:17.000 10:35:45.000;stock:`msft`aapl;amt:1500 750;price:17.5 103.2)
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 800 09:04:01.000
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 15 09:04:01.000
msft 17.5 1500 10:32:17.000
aapl 103.2 750 10:35:45.000
Join Each joins pairs of dictionaries and so has upsert semantics.
q)trade,'([]year:2019+til 6;exch:6?`NYSE`LSE;amt:999)
stock price amt time year exch
--------------------------------------
ibm 121.3 999 09:03:06.000 2019 NYSE
bac 5.76 999 09:03:23.000 2020 LSE
usb 8.19 999 09:04:01.000 2021 NYSE
ibm 121.3 999 09:03:06.000 2022 LSE
bac 5.76 999 09:03:23.000 2023 NYSE
usb 8.19 999 09:04:01.000 2024 LSE
There are many other join keywords.
List alternatives to queries¶
Many qSQL queries are equivalent to simple list operations.
select from trade where i=5 / trade[5]
select stock,amt from trade / `stock`amt#trade
select from trade where stock=`ibm / trade where `ibm=trade`stock
Amending a table¶
Inserting records¶
`trade insert (`ibm; 1001; 122.5; 500; 09:04:59:000)
Alternative syntax:
insert [`trade] (`ibm; 1001; 122.5; 500; 09:04:59:000)
insert [`trade; (`ibm; 1001; 122.5; 500; 09:04:59:000)]
q)table:([stock:()] price:())
q)insert[`table; (`intel; enlist (123.2; 120.4; 131.0))]
q)table
stock| price
-----| ---------------
intel| 123.2 120.4 131
Bulk insert¶
The right argument to insert
above is a list. It can also be a table having the same column names as the first argument.
q)`trade insert trade
Upsert¶
q)`trade upsert (`ibm; 122.5; 50; 09:04:59:000)
For a simple table (not keyed) the above is equivalent to an insert
.
For a keyed table, it is an update
if the key exists in the table and an insert
otherwise.
An alternative syntax for upsert
is to use the operator ,:
q)trade ,: (`ibm; 122.5; 50; 09:04:59:000)
upsert
can also take a table as an argument.
trade ,: trade
Delete rows¶
trade: delete from trade where stock=`ibm
delete
returns a table, but does not modify the trade
table in place. The assignment accomplishes that.
An alternative that updates the table in place:
delete from `trade where stock=`ibm
Update values¶
In SQL:
UPDATE trade SET amount=42+amount WHERE stock='ibm'
In q:
trade: update amount:42+amount from trade where stock=`ibm
update
returns a table, but does not modify the underlying table. The assignment accomplishes that. Alternatively:
update amount+42 from `trade where stock=`ibm
update
modifies the table in place much like delete
deletes in place if a symbol is given as the table name. Also note the default column names.
Replace null values¶
Use the Fill operator ^
.
For example, the following replaces all nulls in column amount
by zeroes.
trade.amount: 0^trade.amount
Table to set¶
Use distinct
to remove duplicate records, and count
to count them.
distinct trade
count distinct trade
qSQL query templates
Functional qSQL
Database: persisting tables to the filesystem
Q for Mortals
§8. Tables