How to Work with Data

This guide shows you how to create, save, load, and manage data in KDB-X.

In this guide, you will learn how to:

Create in-memory tables, both keyed and unkeyed
Save and load tables in different formats (single file, text, splayed, partitioned, segmented)
Understand how q stores and handles datatypes such as timestamp, symbol, long, and float
Use linked columns to reference values between tables
Apply database utilities in the .Q namespace to manage partitions and missing tables

By the end, you will know how to persist tables to disk, manage large datasets efficiently, and prepare data structures that support growth over time.

Overview

Create table
Save data (single files, splayed tables, partitioned tables)
Segmented databases
Linked columns
Database utilities

Create table

Datatypes

The examples in this section use four basic datatypes: timestamp (nanosecond precision), symbol (internal character data), long (8-byte signed integer), and float (double precision floating point number). For a full list, including specific types for temporal data, refer to q datatypes.

Define a table using the following syntax:

q)tab:([]colexp1;..;colexpN)

To create a keyed table, specify one or more column expressions within the square brackets:

q)keytab:([keyexp1;..;keyexpM]colexp1;..;colexpN)

Initialize a table with data by defining columns as lists in the keyexp or colexp expressions:

q)trade:([]time:asc 1000?.z.P;sym:1000?`AAPL`IBM`MSFT;size:100*1+1000?10;price:1000?100f)

Alternatively, create an empty schema by defining columns as (optionally typed) empty lists:

q)trade:([]time:`timestamp$();sym:`symbol$();size:`long$();price:`float$())

Character data

q stores character-based data (strings) in two main datatypes: symbols and character vectors. q stores symbols in an internal hash table and represents each instance as a reference to that table. This design speeds up comparisons (no string comparison is required) and reduces the memory and disk space needed to store repeated values.

However, because the symbol pool never shrinks in memory or on disk, avoid using symbols for non-repeating data. Use symbols for values that repeat, such as stock tickers or product IDs. For values that do not repeat (like order IDs or freeform text), store them as nested lists of character vectors.

Save data

The tables you created above live only in memory. To persist them, you need to save them to disk.

You can choose from three on-disk formats. The right format depends on the size of the table and how you expect it to grow:

Single file – Use for smaller tables. When you load this type of table, q loads the entire table into memory. Only use this format if the available memory can hold the full table.
Splayed table – Use for medium-sized tables, typically up to around 100 million rows, when you do not expect the table to grow significantly larger.
Partitioned table – Use for the largest tables, typically hundreds of millions of rows, especially when you expect the table to grow over time.

Single files

The simplest way to save a table is as a single file. Use the save keyword with the table name as the argument:

q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104)
q)save`cities
`:cities

You can then load the table into a new process using the load keyword:

q)cities                / not present yet
'cities
[0]  cities
q)load`cities
q)cities
city     pop
-----------------
Tokyo    37435191
Delhi    29399141
Shanghai 26317104

Save in text formats

The save keyword also supports text formats such as .csv, tab-delimited, or .xls. Add the appropriate file extension:

q)save`cities.csv
`:cities.csv
q)read0`:cities.csv     / read back the saved text
"city,pop"
"Tokyo,37435191"
"Delhi,29399141"
"Shanghai,26317104"

Instead of save and load, you can also use the more general set and get:

set takes 2 arguments: the first is the file handle, the second is the object to save.
get takes 1 argument: the file handle to load.

q)`:foo/bar/bigcities set cities
`:foo/bar/bigcities
q)newcities:get`:foo/bar/bigcities
q)newcities
city     pop
-----------------
Tokyo    37435191
Delhi    29399141
Shanghai 26317104

Splayed tables

A splayed table is used for medium-sized tables (around 100 million rows). Each column is stored in a separate file, and q infers the table name from the directory name.

Save a splayed table by using the set keyword with a trailing slash / to indicate a directory:

q)n:1000000
q)trade:`sym`time xasc ([]time:asc n?.z.P;sym:n?`AAPL`IBM`MSDT;size:100*1+n?10;price:n?100f)
q)`:trade/ set select time, size, price from trade
`:trade/
q)key`:trade/                   / list the directory contents
`s#`.d`price`size`time
q)select from `:trade/          / we can select directly from the directory

You will also notice a hidden file named .d inside the directory, which contains the list of columns in the table.

Enumerate symbol columns

q does not save the sym column because it is of type symbol and you must enumerate this type of column before saving it to a splayed or partitioned table. The function .Q.en enumerates symbol columns against a vector called sym and saves the enumeration on disk.

q)`:trade/ set trade            / error as there are unenumerated symbol type columns
'type
  [0]  `:trade/ set trade
                ^
q)`:trade/ set .Q.en[`:.]trade  / enumerate using .Q.en and save as before
`:trade/

Set attributes

You can set attributes such as s (sorted), u (unique), p (parted), or g (grouped) on a column of a splayed table on disk. Use amend to apply these attributes:

q)@[`:trade;`sym;`p#]
`:trade

Important limitation

You can save only unkeyed tables as splayed or partitioned tables.

Partitioned tables

A partitioned table is a splayed table split into groups of rows with the same value in a partition field. You can use date, month, year, or int as partition fields.

Saving data in partitioned tables works the same way as with splayed tables, but each partition (splay) is saved in a subdirectory named after the partition value:

/ save 2 copies of trade as separate partitions
q)`:db/2025.06.01/trade/ set .Q.en[`:db]trade
`:db/2025.06.01/trade/
q)`:db/2025.06.02/trade/ set .Q.en[`:db]trade
`:db/2025.06.02/trade/

This results in a directory structure like:

db
├── 2025.06.01
│   └── trade
│       ├── .d
│       ├── price
│       ├── size
│       ├── sym
│       └── time
├── 2025.06.02
│   └── trade
│       ├── .d
│       ├── price
│       ├── size
│       ├── sym
│       └── time
└── sym

Load the database with \l to see the partitions:

q)\l db
q)date
2025.06.01 2025.06.02
q)trade
date        time                    sym  size price
------------------------------------------------------------
2025.06.01 2000.01.01D00:21:48.836608570 AAPL 100  9.055363
2025.06.01 2000.01.01D00:25:37.859626281 AAPL 500  72.43675
2025.06.01 2000.01.01D00:51:29.565415399 AAPL 1000 58.5308
...

Date columns in partitioned tables

When you reload a partitioned database, q creates a global variable date that lists the partitions. The trade table also includes a virtual date column. q materializes the values in this column only when you query them.

Segmented databases

You can distribute partitioned tables across multiple storage devices or directory locations to:

Give them more space
Support parallelization
Reduce the size of individual partitions, potentially reducing database ingest and maintenance time

At the top level, a segmented database contains only:

Single file tables and objects
Enumeration files
Splayed tables
A text file par.txt that lists the directories containing the data for the partitions

Here’s the structure of a segmented database with 2 partitions split across 2 segments, each containing a single table:

├── db
│   ├── par.txt
│   └── sym
├── db1
│   └── 2025.06.01
│       └── trade
│           ├── .d
│           ├── price
│           ├── size
│           ├── sym
│           └── time
└── db2
    └── 2025.06.02
        └── trade
            ├── .d
            ├── price
            ├── size
            ├── sym
            └── time

To create this segmented database, run the following q code in a designated folder:

q)n:1000000
q)trade:`sym`time xasc ([]time:asc n?.z.P;sym:n?`AAPL`IBM`MSDT;size:100*1+n?10;price:n?100f)
q)`:db1/2025.06.01/trade/ set .Q.en[`:db]trade
`:db1/2025.06.01/trade/
q)`:db2/2025.06.02/trade/ set .Q.en[`:db]trade
`:db2/2025.06.02/trade/
q)`:db/par.txt 0: ("../db1";"../db2")
`:db/par.txt
q)\l db

The root of this database is db. The directories db1 and db2 are the segments. The par.txt file contains paths to the segments, either absolute or relative to its location:

../db1
../db2

Save data to segmented databases

Saving data to a segmented database works much like saving to a non-segmented one. The difference is that you must distribute the data across the available segments.

q provides the function .Q.par to round-robin partitions over the segments listed in par.txt. For example:

q)\l db
q).Q.par[`:.;2025.06.03;`$"trade/"]
`:../db2/2025.06.03/trade/
q).Q.par[`:.;2025.06.04;`$"trade/"]
`:../db1/2025.06.04/trade/

.Q.par simply returns a directory location. You must still write the data to this location using set, just as you do with partitioned tables:

q)n:1000000
q)trade:`sym`time xasc ([]time:asc n?.z.P;sym:n?`AAPL`IBM`MSDT;size:100*1+n?10;price:n?100f)
q).Q.par[`:.;2025.06.03;`$"trade/"]set .Q.en[`:.]trade
`:../db2/2025.06.03/trade/
q).Q.par[`:.;2025.06.04;`$"trade/"]set .Q.en[`:.]trade
`:../db1/2025.06.04/trade/

Linked columns

Linked columns store indices into another table. This lets you reference values from the second table easily when you write queries. Linked columns do not enforce referential integrity — they simply act as indices into a second table along with a label. For example:

q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104;country:`japan`india`china)
q)countries:([]country:`india`japan`china;pop:1463865525 123103479 1416096094)
q)update country:`countries!countries.country?country from `cities;
q)select city, pop, country.country, countrypop:country.pop from cities
city     pop      country countrypop
------------------------------------
Tokyo    37435191 japan   123103479
Delhi    29399141 india   1463865525
Shanghai 26317104 china   1416096094

Linked columns play an important role in on-disk data. q does not require the target table of a link to be a keyed table, and in fact you cannot key splayed or partitioned tables.

Database utilities

Each partitioned table is expected to appear in every partition of a database, even when some partitions contain no data. If a table is missing from a partition, q may return errors. This situation often arises when you add a new table but have not yet backfilled it across the database. q provides utilities in the .Q namespace to help manage this.

`.Q.chk` fill database

Use .Q.chk to scan the database for tables that are missing from partitions. The function takes a handle to the database as its single argument.

.Q.chk checks the most recent partition to determine the complete list of tables. When it finds a missing table in other partitions, it places an empty prototype in that partition.

Because .Q.chk writes to the database, you must run it as a user with write permissions to the database directory on disk.

`.Q.bv` build virtual prototypes

Use .Q.bv to scan for missing tables without writing to disk. Instead, it builds a map of missing table/partition pairs and creates an empty prototype for each missing table.

At runtime, q supplies these prototypes to queries that attempt to access missing data. This lets you continue querying the database without errors, even if you have not backfilled the new table.

Summary

In this guide, you:

Defined unkeyed and keyed tables, and initialized tables with data or empty schemas.
Saved and loaded tables as single files, in text formats, or with set/get.
Stored data in splayed tables for medium-sized datasets, including symbol enumeration and column attributes.
Used partitioned tables for large and growing datasets, and reloaded them with \l to access partitions and virtual columns.
Distributed partitions across multiple devices using segmented databases.
Linked columns between tables for efficient lookups in on-disk data.
Applied .Q.chk to fill missing tables with empty prototypes and .Q.bv to build virtual prototypes at runtime.

You now have the essential skills to create, save, and manage data in KDB-X across multiple storage formats and database structures.