Tutorials/Tables II

From Kx Wiki
Jump to: navigation, search

Prerequisites: The Basics, Datatypes, Lists, Functions, Dictionaries, Tables I.

Tables form the basis of a kdb+ database and are consequently one of the most important data structures in q. Tables I deals with creating and manipulating tables, including inserts and upserts. This tutorial is about queries and joins.

Contents

Queries

The simplest way of accessing data contained in a table is using a select statement, which looks something like this:

select <columns> from <table> where <condition>
select <columns> by <columns> from <table> where <condition>

Using the table family (from Tables I), here are some examples of select statements.

q)family
name    age hair   eyes
------------------------
John    52  brown  blue
Mary    49  black  brown
David   18  blonde blue
Jessica 13  black  blue
Avery   30  black  green
q)select from family where hair=`black
name    age hair  eyes
-----------------------
Mary    49  black brown
Jessica 13  black blue
Avery   30  black green
q)select name, hair from family where age>30
name hair
----------
John brown
Mary black

The by clause allows you to perform aggregations or group the results based on a particular column or columns.

q)select avg age by hair from family
hair  | age
------| --------
black | 30.66667
blonde| 18
brown | 52

Select statements always return a table. It can be useful to return a list or dictionary instead – for this we use exec.

q)exec name from family where eyes=`blue
`John`David`Jessica
q)exec name, eyes from family where hair=`black
name| Mary  Jessica Avery
eyes| brown blue    green

Notice that the result is a list if only one column is specified, but a dictionary (with column names as keys) when multiple columns are specified.

Updates and deletes

There are two keywords, update and delete, that work similarly to select and exec. Each of them returns a table modified in some way:

q)delete eyes from family
name    age hair
------------------
John    52  brown
Mary    49  black
David   18  blonde
Jessica 13  black
Avery   30  black
q)update age:age+1 from family where name in `John`Mary
name    age hair   eyes
------------------------
John    53  brown  blue
Mary    50  black  brown
David   18  blonde blue
Jessica 13  black  blue
Avery   30  black  green

Display family after these queries – what do you notice? The queries have not modified the table, but have simply returned a result.

To modify the table, we instead pass a reference to the table. The expression is modified only very slightly: a backtick precedes the table name. You might remember from the previous tutorial that this was also necessary when modifying a table using insert or upsert.

q)update surname:`Quill from `family
`family
q)delete from `family where name in `David`Jessica
`family

What does family look like now? It should have 3 rows and 4 columns!

Tip: Be very careful with deletes, you don't want accidentally to delete your entire in-memory database!

Joins

There are many ways of joining tables in q/kdb+. This tutorial deals with only some!

Let's say we've got the following tables.

q)students:([] id:140 265 204 212 367 197 329 242; class:`green`blue`green`orange`green`blue`green`green)
q)students
id  class
----------
140 green
265 blue
204 green
212 orange
367 green
197 blue
329 green
242 green
q)mentors:([] class:`green`blue`violet; mentor_name:("Julia Johnson";"Teddy Rowles";"Gerald Carlier"))
q)mentors
class  mentor_name
-----------------------
green  "Julia Johnson"
blue   "Teddy Rowles"
violet "Gerald Carlier"

The simplest kind of join just pastes the rows of one table onto the end of the other. If the schemas (lists of column names) are identical, we join them using a simple comma. Try typing out each of the following lines.

q)mentors,mentors
q)mentors,students

The second line fails, as the two tables do not have matching schemas.

We are going to look at three more joins: the left join lj, inner join ij and union join uj.

Left join and inner join each take two tables as inputs: the second table must be keyed, and the first table must contain those keyed columns.

<table> lj <keyed-table>
<table> ij <keyed-table>

We can join students and mentors on the class column. Type out the following lines and compare the results.

students lj 1!mentors
students ij 1!mentors

The left join contains all of the rows from students, whereas the inner join contains only those rows matching the class column in mentors.

Finally, to join the two tables without matching rows, we use the union join:

q)students uj mentors
id  class  mentor_name
---------------------------
140 green  ""
265 blue   ""
204 green  ""
212 orange ""
367 green  ""
197 blue   ""
329 green  ""
242 green  ""
    green  "Julia Johnson"
    blue   "Teddy Rowles"
    violet "Gerald Carlier"

For the union join either neither table will be keyed, or the tables will have the same keys.

See Reference/joins for more information on these and other joins.

Exercises

1. Create a table cities with columns city and state, keyed to city.

2. Create an unkeyed table sales with columns city and sale.

As an alternative to typing in the raw content of sales, see if you can construct it using rand and the city column of cities.

3. Join the tables to show the state for each sale. Your result will be an unkeyed table with columns city, sale, and state.

4. Report the total sales, by state. Your result will be a table with columns state and sale, keyed to state, with a row for each unique state in the result from (3).

Solutions

Personal tools
Namespaces
Variants
Actions
Navigation
Print/export
Toolbox