How to work with tables¶

This page introduces tables in q and demonstrates how to create and manipulate them efficiently.

Tables are first-class objects in q, meaning they are data structures that live in memory just like lists or dictionaries.

A table can be viewed from two perspectives:

A list of rows/records: where each record is a dictionary with identical keys.
A list of columns: where each column is a named list of equal length.

Internally, q implements a table as a flipped dictionary. Because operations on columns are vectorized, this structure allows for extreme performance when performing calculations across millions of records. A downside of this design is that row deletion is relatively expensive, as all the column lists must be compacted to close the resulting gap.

The column elements are called items or fields.

Creating a table¶

There are three primary ways to initialize a table in memory.

1. From a list of dictionaries¶

A list of conforming dictionaries (sharing the same keys) is automatically interpreted as a table.

q

q)d1: ([name: `Alice; phone: "555-0101"; age: 23])
q)d2: ([name: `Bob; phone: "555-0723"; age: 32])
q)d3: ([name: `Mike; phone: "555-6666"; age: 22])
q)(d1; d2; d3)
name  phone      age
--------------------
Alice "555-0101" 23
Bob   "555-0723" 32
Mike  "555-6666" 22

pandas

>>> d1 = {'name': 'Alice', 'phone': '555-0101', 'age': 23}
>>> d2 = {'name': 'Bob', 'phone': '555-0723', 'age': 32}
>>> d3 = {'name': 'Mike', 'phone': '555-6666', 'age': 22}
>>> pd.DataFrame([d1, d2, d3])
    name     phone  age
0  Alice  555-0101   23
1    Bob  555-0723   32
2   Mike  555-6666   22

2. By flipping a column dictionary¶

Since a table is a flipped dictionary, you can create one by applying flip to a dictionary of lists of the same size:

q

q)flip ([name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22])
name  phone      age
--------------------
Alice "555-0101" 23
Bob   "555-0723" 32
Mike  "555-6666" 22

pandas

>>> pd.DataFrame({
...     'name': ['Alice', 'Bob', 'Mike'],
...     'phone': ['555-0101', '555-0723', '555-6666'],
...     'age': [23, 32, 22]
... })
    name     phone  age
0  Alice  555-0101   23
1    Bob  555-0723   32
2   Mike  555-6666   22

3. Table definition syntax (Recommended)¶

This is the most concise and idiomatic way to define a table. The [] brackets denote an unkeyed (simple) table.

q

q)([] name: `Alice`Bob`Mike; age: 23 32 22; city:`Dublin)
name  age city
----------------
Alice 23  Dublin
Bob   32  Dublin
Mike  22  Dublin

pandas

>>> pd.DataFrame({
...     'name': ['Alice', 'Bob', 'Mike'],
...     'age': [23, 32, 22],
...     'city': 'Dublin'
... })
    name  age    city
0  Alice   23  Dublin
1    Bob   32  Dublin
2   Mike   22  Dublin

Note

If a column is provided as an atom (like city:`Dublin above), q automatically extends it to match the length of the other columns.

Implicit column names¶

Table definition also makes it convenient to create a table from variables.

q)name: `Alice`Bob`Mike
q)phone: ("555-0101"; "555-0723"; "555-6666")
q)age: 23 32 22
q)([] name; phone; age)
name  phone      age
--------------------
Alice "555-0101" 23
Bob   "555-0723" 32
Mike  "555-6666" 22

Default column names are assigned when column names are omitted.

q)([] 1 2; `a`b)
x x1
----
1 a
2 b

Empty tables¶

Empty tables can be created from empty lists:

q)([] name: `$(); phone: (); age: `long$())
name phone age
--------------

It is recommended to specify the type of each column in an empty table. You can use the each-left iterator and type characters to avoid repeating empty list specifications for each column:

q

q)flip `name`phone`age!"s*j"$\:()
name phone age
--------------

pandas

>>> pd.DataFrame({
...     'name': pd.Series(dtype='str'),
...     'phone': pd.Series(dtype='object'),
...     'age': pd.Series(dtype='int64')
... })
Empty DataFrame
Columns: [name, phone, age]
Index: []

For wider tables, it is more convenient to define a schema dictionary:

q

q)schema: ([name:"s"; phone:"*"; age: "j"])
q)flip schema$\:()
name phone age
--------------

pandas

>>> schema = {'name': 'str', 'phone': 'object', 'age': 'int64'}
>>> pd.DataFrame({
...     col: pd.Series(dtype=dt)
...     for col, dt in schema.items()
... })
Empty DataFrame
Columns: [name, phone, age]
Index: []

Table metadata¶

Understanding the structure of your data is critical for debugging and optimization.

cols: Returns the column names as a list of symbols.
meta: Returns a keyed table describing column types, foreign keys, and attributes.
tables: Lists all tables defined in the namespace provided.

q

q)meta t
c    | t f a
-----| -----
name | s
phone| C
age  | j

pandas

>>> t.dtypes
name     object
phone    object
age       int64
dtype: object

Column	Description
c	column names
t	data type character (lowercase for atoms, uppercase for nested lists)
f	foreign key link (if any)
a	applied attributes (e.g., `s` for sorted, `u` for unique)

You can rename and reorder columns using xcol and xcols, respectively:

q

q)([name: `nickname]) xcol t
nickname phone      age
-----------------------
Alice    "555-0101" 23
Bob      "555-0723" 32
Mike     "555-6666" 22
q)`name`age xcols t
name  age phone
--------------------
Alice 23  "555-0101"
Bob   32  "555-0723"
Mike  22  "555-6666"

pandas

>>> t.rename(columns={'name': 'nickname'})
  nickname     phone  age
0    Alice  555-0101   23
1      Bob  555-0723   32
2     Mike  555-6666   22
>>> t[['name', 'age'] + [ c for c in t.columns if c not in ['name', 'age']]]
    name  age     phone
0  Alice   23  555-0101
1    Bob   32  555-0723
2   Mike   22  555-6666

Appending Records¶

Functions insert and upsert append a single record or a batch of records to a table in-place. insert throws an error if the table does not exist.

q

q)t,:(`John; "555-5382"; 40)  / append a single row
q)`t insert (`George`David; ("555-4000"; "555-4001"); 18 19)  / in-place batch append
4 5
q)`t upsert ((`Phil;"555-2341"; 44); (`Ryan;"555-6267";55))
`t
q)t
name   phone      age
---------------------
Alice  "555-0101" 23
Bob    "555-0723" 32
Mike   "555-6666" 22
George "555-4000" 18
David  "555-4001" 19
Phil   "555-2341" 44
Ryan   "555-6267" 55

pandas

>>> t = pd.concat([t, pd.DataFrame(dict(zip(t.columns, [['George', 'David'], ['555-4000', '555-4001'], [18, 19]])))], ignore_index=True)

>>> t = pd.concat([t, pd.DataFrame([['Phil', '555-2341', 44], ['Ryan', '555-6267', 55]], columns=t.columns)], ignore_index=True)

>>> t
     name     phone  age
0   Alice  555-0101   23
1     Bob  555-0723   32
2    Mike  555-6666   22
4  George  555-4000   18
5   David  555-4001   19
6    Phil  555-2341   44
7    Ryan  555-6267   55

Because tables are lists, you can also append data using list concatenation (,) — the simplest form of a join operation — combined with variable assignment:

q

q)t,:(`John; "555-5382"; 40)  / append a single row

pandas

>>> t = pd.concat([t, pd.DataFrame([['John', '555-5382', 40]], columns=t.columns)], ignore_index=True)

List operations¶

Since tables are lists, most list operations apply to tables. For a full reference, see list operations; only the most frequently used operations are listed here.

Number of rows: `count`¶

The count function returns the number of rows in a table:

q

q)t:([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
q)count t
3

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'phone': ['555-0101', '555-0723', '555-6666'], 'age': [23, 32, 22]})
>>> len(t)
3

Indexing¶

Tables support both positional and label-based indexing:

q

q)t:([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
q)t 2         / get the third record
name | `Mike
phone| "555-6666"
age  | 22

q)t `name     / get the 'name' column
`Alice`Bob`Mike



q)t[2; `name]   / get the 'name' value of the third row
`Mike
q)t . (2;`name) / index at depth by a list
`Mike
q)t 2 0         / index by list
name  phone      age
--------------------
Mike  "555-6666" 22
Alice "555-0101" 23
q)first t
name | `Alice
phone| "555-0101"
age  | 23

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'phone': ['555-0101', '555-0723', '555-6666'], 'age': [23, 32, 22]})
>>> t.iloc[2]
name         Mike
phone    555-6666
age            22
Name: 2, dtype: object
>>> t['name']
0    Alice
1      Bob
2     Mike
Name: name, dtype: object
>>> t.at[2, 'name']
'Mike'


>>> t.iloc[[2, 0]]
    name     phone  age
2   Mike  555-6666   22
0  Alice  555-0101   23

>>> t.iloc[0]
name     Alice
phone    555-0101
age            23
Name: 0, dtype: object

Indexed Assignment¶

You can overwrite any item by indexing:

q

q)t:([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
q)t[1]: (`John; "555-5382"; 40)
q)t
name  phone      age
--------------------
Alice "555-0101" 23
John  "555-5382" 40
Mike  "555-6666" 22

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'phone': ['555-0101', '555-0723', '555-6666'], 'age': [23, 32, 22]})
>>> t.iloc[1] = ['John', '555-5382', 40]
>>> t
    name     phone  age
0  Alice  555-0101   23
1   John  555-5382   40
2   Mike  555-6666   22

or by using general amend at:

q)@[t;2;:;(`Phil;"555-2341"; 44)]
name  phone      age
--------------------
Alice "555-0101" 23
John  "555-5382" 40
Phil  "555-2341" 44

Typically, an update statement is used to modify (or get a new copy of) a table. Combining an update with a where clause provides a convenient and readable solution.

Sub-tables¶

You can extract sub-tables using the sublist function and the take operator. The inverse operator is _:

q

q)t:([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
q)2 sublist t       / sub-table
name  phone      age
--------------------
Alice "555-0101" 23
Bob   "555-0723" 32
q)-2 sublist t      / last two records
name phone      age
-------------------
Bob  "555-0723" 32
Mike "555-6666" 22
q)2 _ t             / delete first two records
name phone      age
-------------------
Mike "555-6666" 22
q)`name`age cut t   / delete columns
phone
----------
"555-0101"
"555-0723"
"555-6666"
q)t _ 1             / delete records at index 1
name  phone      age
--------------------
Alice "555-0101" 23
Mike  "555-6666" 22

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'phone': ['555-0101', '555-0723', '555-6666'], 'age': [23, 32, 22]})
>>> t.head(2)
    name     phone  age
0  Alice  555-0101   23
1    Bob  555-0723   32

>>> t.tail(2)
   name     phone  age
1   Bob  555-0723   32
2  Mike  555-6666   22

>>> t.iloc[2:]
   name     phone  age
2  Mike  555-6666   22

>>> t.drop(columns=['name', 'age'])
      phone
0  555-0101
1  555-0723
2  555-6666

>>> t.drop(index=1)
    name     phone  age
0  Alice  555-0101   23
2   Mike  555-6666   22

Delete statements are another convenient way to delete rows from a table.

The take function is similar to sublist, but wraps around if the requested count exceeds the table length:

q)5#t
name  phone      age
--------------------
Alice "555-0101" 23
Bob   "555-0723" 32
Mike  "555-6666" 22
Alice "555-0101" 23
Bob   "555-0723" 32

You can also take selected columns:

q

q)`name`age#t
name  age
---------
Alice 23
Bob   32
Mike  22

pandas

>>> t[['name', 'age']]
    name  age
0  Alice   23
1    Bob   32
2   Mike   22

Arithmetic operations¶

All arithmetic operations apply to tables containing numeric columns. Operators are applied element-wise.

q

q)show t1: flip `a`b!3 cut til 6
a b
---
0 3
1 4
2 5
q)2*t1
a b
----
0 6
2 8
4 10
q)show t2: flip `a`b!3 cut 6?20
a  b
-----
12 1
8  9
10 11
q)t1 + t2
a  b
-----
12 4
9  13
12 16
q)neg t1
a  b
-----
0  -3
-1 -4
-2 -5

pandas

>>> t1 = pd.DataFrame(np.arange(6).reshape(2, 3).T, columns=['a', 'b'])
>>> t1
   a  b
0  0  3
1  1  4
2  2  5
>>> 2 * t1
   a   b
0  0   6
1  2   8
2  4  10

>>> t2 = pd.DataFrame(np.random.randint(0, 20, size=(2, 3)).T, columns=['a', 'b'])
>>> t2
   a   b
0 12   1
1  8   9
2 10  11
>>> t1 + t2
    a   b
0  12   4
1   9  13
2  12  16

>>> -t1
   a  b
0  0 -3
1 -1 -4
2 -2 -5

Agregations are executed column-wise and result normal dictionaries:

q

q)avg t1
a| 1
b| 4

pandas

>>> t1.mean()
a    1.0
b    4.0
dtype: float64

Replace null values: Coalesce (`^`)¶

The coalesce operator ^ is related to , in that it employs upsert semantics to merge two tables with right prevailing over left. The difference is that null values in the right operand do not prevail over those in the left.

q

q)show t:([] a: 1 0N; b: 0N 4)
a b
---
1
  4
q)0^t
a b
---
1 0
0 4
q)([] a: 10 2; b: 3 10) ^ t
a b
---
1 3
2 4

pandas

>>> t = pd.DataFrame({'a': [1, None], 'b': [None, 4]}).astype('Int64')
>>> t
      a     b
0     1  <NA>
1  <NA>     4
>>> t.fillna(0)
   a  b
0  1  0
1  0  4

>>> t.combine_first(pd.DataFrame({'a': [10, 2], 'b': [3, 10]}))
   a  b
0  1  3
1  2  4

Reordering¶

Tables are lists, and there are multiple ways to reorder a list:

q

q)t:([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
q)reverse t
name  phone      age
--------------------
Mike  "555-6666" 22
Bob   "555-0723" 32
Alice "555-0101" 23
q)0N?t    / random permutation
name  phone      age
--------------------
Bob   "555-0723" 32
Alice "555-0101" 23
Mike  "555-6666" 22

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'phone': ['555-0101', '555-0723', '555-6666'], 'age': [23, 32, 22]})
>>> t.iloc[::-1].reset_index(drop=True)
    name     phone  age
0   Mike  555-6666   22
1    Bob  555-0723   32
2  Alice  555-0101   23

>>> t.sample(frac=1).reset_index(drop=True)
    name     phone  age
0    Bob  555-0723   32
1  Alice  555-0101   23
2   Mike  555-6666   22

Functions asc and desc sort lexicographically by all columns:

q

q)asc ([]a:4 3 4; b:`s`a`d)
a b
---
3 a
4 d
4 s

pandas

>>> pd.DataFrame({'a': [4, 3, 4], 'b': ['s', 'a', 'd']}).sort_values(by=['a', 'b'])
   a  b
1  3  a
2  4  d
0  4  s

To sort by selected columns, use xasc and xdesc:

q

q)`a xasc ([]a:4 3 4; b:`s`a`d)
a b
---
3 a
4 s
4 d

pandas

>>> pd.DataFrame({'a': [4, 3, 4], 'b': ['s', 'a', 'd']}).sort_values(by='a', kind='stable')
   a  b
1  3  a
0  4  s
2  4  d

The sort in q is stable: it preserves the relative order of equal elements.

Miscellaneous¶

Grouping by a field or multiple fields is a basic operation in data analysis:

q

q)t:([] name: `Alice`Bob`Mike; kids: (1#`Bill; (); `Maggie`Jack`Ellie))
q)show nt: ungroup t     / convert to normal table, "flatten" list fields
name  kids
------------
Alice Bill
Mike  Maggie
Mike  Jack
Mike  Ellie
q)`name xgroup nt       / group by name
name | kids
-----| ------------------
Alice| ,`Bill
Mike | `Maggie`Jack`Ellie

pandas

>>> t = pd.DataFrame({'name': ['Alice', 'Bob', 'Mike'], 'kids': [['Bill'], [], ['Maggie', 'Jack', 'Ellie']]})
>>> nt = t.explode('kids').dropna(subset=['kids'])
>>> nt
    name    kids
0  Alice    Bill
2   Mike  Maggie
2   Mike    Jack
2   Mike   Ellie
>>> nt.groupby('name').agg(list)
                        kids
name
Alice                 [Bill]
Mike   [Maggie, Jack, Ellie]

Set operations like inter, union, except and distinct handle tables as expected - the set items are table records. Furthermore, there are multiple ways to join tables.

Querying¶

q supports an SQL-like syntax for querying tables. See section Guides → Querying for examples and more details.

Next Steps¶

For a deeper dive into tables, check out Q for Mortals §8. Tables.
Learn how to create keyed tables from two tables.
To load a CSV as a table, refer to Load CSV.
Move your in-memory data to the filesystem - see Database: persisting tables in the filesystem.
Improve the performance of list operations by applying attributes.

How to work with tables¶

Creating a table¶

1. From a list of dictionaries¶

2. By flipping a column dictionary¶

3. Table definition syntax (Recommended)¶

Implicit column names¶

Empty tables¶

Table metadata¶

Appending Records¶

List operations¶

Number of rows: count¶

Indexing¶

Indexed Assignment¶

Sub-tables¶

Arithmetic operations¶

Replace null values: Coalesce (^)¶

Reordering¶

Miscellaneous¶

Querying¶

Next Steps¶

Number of rows: `count`¶

Replace null values: Coalesce (`^`)¶