Performance Tips and Tricks

This page provides guidelines for writing high-performance q code. It covers efficient use of built-in keywords, utility functions, datatypes, and attributes, along with best practices for loops, operators, and partitioned queries.

q is a programming language that lets us achieve extremely high performance compared to other solutions. However, we also need to know what patterns and setups to use and what to avoid to get the most out of our code.

Overview

Use built-in keywords

For common tasks, there is probably already a keyword or operator in q that does the job. Reimplementing it from more primitive operators/iterators can incur a significant performance cost.

q)a:1000?100000; b:100000?100000
q)\t a in b
0
q)\t any a~\:/:b
3238
q)(a in b)~any a~\:/:b
1b

However, this doesn’t necessarily mean that a built-in function that appears to do exactly what we need is always the best solution for a given problem. For example, if we want to convert a list of dates formatted with hyphens to use dots instead, we might consider using ssr:

q)ip:@[;4 7;:;"-"]each string 2001.01.01+100000?1000
q)5#ip
"2002-05-21"
"2002-06-23"
"2003-05-28"
"2002-09-26"
"2002-01-31"
q)\t ssr[;"-";"."] each ip
352

But we can exploit the fact that we don't need to search at all: we know exactly which positions we want to change:

q)\t @[;4 7;:;"."] each ip
32

Even better, if the only goal is to convert the data to q dates, we don't need to do any of this: q will parse the dates even with hyphens in them.

q)\t "D"$ssr[;"-";"."] each ip
411
q)\t "D"$@[;4 7;:;"."] each ip
40
q)\t "D"$ip
6

Know the available utility functions and where they are useful

The built-in namespaces contain a number of lesser-known utility functions. One example is .Q.fu, which applies a function to the distinct values of a list only and maps the results back, so the function is called just once per distinct value rather than once per item. This is a clear advantage when the function is computationally intensive and the list contains many repeated values.

q)l:1000000?10
q)f:{1%(1%2)+1%x}
q)\t f l
8
q)\t .Q.fu[f;l]
4

Using it has an overhead of its own, as q must first find the distinct values and work out which item corresponds to which distinct value. If the list is made up mostly of distinct values, .Q.fu should not be used.

q)l:til 1000000
q)\t f l
8
q)\t .Q.fu[f;l]
22

This is far from the only utility function that can increase performance. Please refer to the .Q namespace documentation for further examples.

Avoid explicit loops

q is a functional programming language and therefore usually handles iteration through iterators and maps instead of loops. Utilizing these correctly and efficiently is not only a question of programming style but of performance as well.

Let's look at an example written in the kind of imperative, loop-based style common in many other programming languages:

fLoop:{
  c:count x;
  i:0;
  toRet:();
  while[i<c;
    if[9990<x i; toRet,:i];
    i+:1
  ];
  toRet
 }

This kind of loop is not needed in q. Instead, there are vectorized declarative statements we can use:

fVec:{
  where 9990<x
 }

The latter version is more readable, and it is also faster by orders of magnitude:

q)l:1000000?10000f
q)10#l
6335.327 9706.507 247.894 6494.84 4588.717 469.1395 3089.312 8390.561 245.4883 5496.135
q)\t fLoop l
1029
q)\t fVec l
2

However, there might be certain cases where it is faster to use non-vectorized logic:

fLoop:{
  c:count x;
  i:0;
  while[i<c;
    if[9990<x i; :i];
    i+:1
  ];
  0N
 }; 
fVec:{
  first where 9990<x
 } 

Because the non-vectorized version exits the lambda when it finds the element it's looking for, the whole function will usually terminate faster than the vectorized version even though the individual operations are slow compared to the vectorized computation.

q)fLoop l
2247
// notice that the non-vectorized version has to do 2247 comparisons,
// while the vectorized one has to do the whole million!
q)\t fLoop l
1
q)\t fVec l
3

Know which operators are atomic

Knowing which functions are atomic is not only a matter of simplicity but of performance as well. Adding an unnecessary extra iterator throws away the function's innate vector-computing capabilities.

q)l:100000?100
q)\t 3+l
0
q)\t (3+)each l
12

Use the right datatypes

Choosing the most efficient datatypes, both for storage and for certain operations, is a matter of memory and speed. For storage, as in other programming languages, choose the smallest datatype that can hold all your data. This mainly concerns numeric values, as integers and floating-point numbers are the categories that come in several widths (for example short, int, and long).
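
As a quick illustration (outputs are omitted, since the exact byte counts depend on your machine and version), the space figure reported by \ts shows how much less memory the same values occupy in narrower types:

q)n:1000000
q)\ts a:n?100h        / 2-byte shorts
q)\ts b:n?100i        / 4-byte ints
q)\ts c:n?100         / 8-byte longs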

Datatypes also matter for certain operations. Some keywords can work with multiple datatypes, which can make a conversion unnecessary. For example, like accepts symbols as its left argument, not just strings:

q)l:100000?`3
q)10#l
`ccm`mgj`inl`pam`nhj`bib`loa`hdg`mca`hgd
q)\t where string[l] like "aa*"
26
q)\t where l like "aa*"
3

Be familiar with attributes

Attributes play an important role in optimizing the overall time and memory usage. An existing attribute can make an operation complete faster.

q)n:1000000
q)t:([]time:2001.01.01D10:10:00+1000000*til n;id:n?`4)
q)3#t
time                          id
----------------------------------
2001.01.01D10:10:00.000000000 onnk
2001.01.01D10:10:00.001000000 gdmb
2001.01.01D10:10:00.002000000 bkem
q)\t select id from t where time within 2001.01.01D10:12:01.367 2001.01.01D10:12:02.371
3
q)update `s#time from `t
`t
q)\t select id from t where time within 2001.01.01D10:12:01.367 2001.01.01D10:12:02.371
0

Attributes also play a role in operations performed on tables stored on disk. For example, the as-of join performs best with different attributes depending on whether the tables are in memory or on disk. The documentation page provides detailed guidance on the attributes necessary for optimal aj performance, but let us look at an on-disk example here.

q)nt:10000000
q)nq:1000
q)t:([]sym:nt?`3; time:2025.01.01D0+nt?nt; qty:nt?1000)
q)q:([]sym:nq?exec distinct sym from t; time:2025.01.01D0+nq?nt; px:nq?1000)
q)`:t/ set .Q.en[`:.] t        / splay t to disk
`:t/
q)`:q/ set .Q.en[`:.] q
`:q/
q)qs:update `p#sym from `sym`time xasc q        / sort by sym,time, then apply parted attribute to sym
q)`:qs/ set .Q.en[`:.] qs
`:qs/
q)\l .        / load tables from disk and override in-memory versions
q)\ts aj[`sym`time; select from t; select from q]
6629 436209648
q)\ts aj[`sym`time; select from t; select from qs]
1308 436209648

The example shows that aj performs significantly better when the columns used by the join (sym and time here) are sorted and sym carries the parted attribute, just as described in the documentation.

Attributes, however, can impose a maintenance need on the variable. For instance, if the rows for a table with a sorted column aren't already sorted by that column, we need to sort them before applying the attribute. This can prove time-consuming if the records are arriving in real time. Refer to the attributes documentation for more information, as well as this whitepaper for a thorough test of queries on databases with attributes.
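
As a small illustration, applying the sorted attribute to data that is not actually sorted fails, and appending an out-of-order value to a sorted list silently drops the attribute, so the data would have to be re-sorted and the attribute re-applied:

q)`s#2 1 3
's-fail
q)v:`s#1 2 3
q)v,:0            / out-of-order append: q keeps the data but drops the attribute
q)attr v
`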

Handle partitioned tables appropriately

Partitioned tables are a frequent storage solution in q, but we have to know how to write efficient queries for them.

The most important aspect of a query is that the first constraint in the where clause must filter on the partitioning column. Otherwise, all the partitions are loaded into memory at the same time, which will usually crash the process (as there is usually not enough memory to do that). The remaining constraints should be ordered so that each step filters out the largest possible number of rows.

q)n:10000000
q)t:([]date:n?2020.10.01+til 30;sym:n?`a`b`c`d;v:n?8;g:n?0ng)
q){sv[`;.Q.par[`:db/;x;`t],`] set .Q.en[`:db/;delete date from select from t where date=x]}each exec distinct date from t
`:db//2020.10.17/t/`:db//2020.10.13/t/..
q)\l db

q)\t select from t where date=2020.10.06, v=2, sym<>`a
6
q)\t select from t where date=2020.10.06, sym<>`a, v=2
8
q)\t select from t where sym<>`a, v=2, date=2020.10.06
126

Parallel processing

By default, KDB-X uses a single thread. However, there are several ways to perform operations in parallel to improve performance.

First, we use the command line argument -s to start q with multiple secondary threads or processes:

$ q -s 4
KDB-X 5.0.20251113 2025.11.13 Copyright (C) 1993-2025 Kx Systems
...

With secondary threads available, there are several ways they can be utilized. The first is peach, short for parallel each: the parallelized version of the each iterator. Its output is exactly the same as that of each, but instead of running the function over the input list item by item in the main thread, it distributes the computation across the secondary threads/processes.

q)v:1000000?1000
q)f:{sqrt 1000000 - x*x}
q)f each v
820.2536 622.0217 659.1631 705.2085 983.4831 999.902..
q)f peach v
820.2536 622.0217 659.1631 705.2085 983.4831 999.902..

Another approach is to recognize that, because q is a vector language and all the operators/keywords used here are atomic, we don't need each at all. This is not just a syntactic convenience: it also uses the secondary threads/processes natively, because these operators/keywords aren't only atomic, they are also multithreaded primitives.

q)f v
820.2536 622.0217 659.1631 705.2085 983.4831 999.902..

A final parallelization function with the same result is .Q.fc. It works by cutting the input list into equal-sized chunks that it distributes among the secondary threads/processes for computation. (This also works because f is atomic; otherwise we would need to write .Q.fc[f each] instead, which is usually not a better-performing solution than the alternatives.)

q).Q.fc[f] v
820.2536 622.0217 659.1631 705.2085 983.4831 999.902..

The different approaches work best in different situations. Let's see how fast the different calls complete:

q)\ts f each v
310 32777456
q)\ts f peach v
276 25165552
q)\ts f v
14 25166176
q)\ts .Q.fc[f] v
22 8389296

In this case, the multithreaded primitive approach works the fastest. This is the usual case: if a call is made up of only multithreaded primitives, it is probably a good idea to make the simplest call.

If that's not the case, we have to consider when the different methods work best.

Multithreading has the inherent cost of sending the input data to the secondary threads/processes. If the function runs very quickly compared to the time it takes to transfer that data, the single-threaded approach may be best:

q)los:" "vs 10000000?" ",.Q.a
q)\ts count each los
10 14330128
q)\ts count peach los
37 12583648

The .Q.fc function works best when calling the function repeatedly is expensive (making peach suboptimal), but not splitting the work among the secondary threads/processes at all (i.e. simply using f v) also performs poorly. This can happen, for example, when data must be fetched over an HTTP connection and slow calculations are then performed on what comes back.
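
As an illustrative sketch (g here is an arbitrary non-atomic function invented for demonstration; which call wins depends on the workload and the -s setting), the candidates can be compared like this:

q)g:{sqrt sum x*til 1000}        / not atomic: maps one atom to one atom
q)v:100000?100f
q)\ts g each v                   / one call per item, single-threaded
q)\ts g peach v                  / one call per item, distributed across threads
q)\ts .Q.fc[g each] v            / each thread runs g each on one contiguous chunk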

Finally, peach is preferable when the operations cannot be done in a vectorial way, but the calculations done by the function are heavy enough that they warrant multithreading.

q)ff:{sum 10000000?1f}
q)\ts ff each 100000
150 134218016
q)\ts ff peach 100000
37 134218016

Test everything

Quite a few of the above guidelines contain examples where different situations require different approaches. It is therefore essential that any code for which performance is critical is thoroughly tested on mock data before deploying it in a real-world environment.
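
For example, \ts:n repeats an expression n times, which helps smooth out noise when comparing candidate implementations on generated mock data (the data, expressions, and run count below are arbitrary illustrations):

q)mock:1000000?10000f            / generated mock data
q)\ts:100 mock?max mock          / index of the maximum via linear search, averaged over 100 runs
q)\ts:100 first idesc mock       / same result via a descending grade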

To test the general hardware setup, the throughput.q script and the nano utility are a good starting point for measuring the performance of the systems where KDB-X will be deployed. The results of these preliminary tests can be used to stress-test different CPU, disk and network configurations running KDB-X.

Throughput

This test measures the time to insert a million rows into a table, one at a time, and also as bulk inserts of 10, 100, 1000, and 10000 rows.
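
The general shape of such a measurement looks roughly like the following sketch (the schema and row values here are invented; the actual throughput.q script differs):

q)t:([]time:`timespan$(); sym:`$(); price:`float$(); size:`long$())
q)row:(.z.n;`ibm;100.25;50)
q)\ts:1000000 `t insert row                                  / single inserts
q)bulk:(10000#.z.n;10000#`ibm;10000#100.25;10000#50)
q)\ts:100 `t insert bulk                                     / bulk inserts of 10,000 rows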

To run the test, simply load throughput.q into a q session:

$ q throughput.q

On an AMD Opteron box with 4 GB of RAM, we get

0.672 million inserts per second (single insert)
6.944 million inserts per second (bulk insert 10)
20.408 million inserts per second (bulk insert 100)
24.39 million inserts per second (bulk insert 1000)
25 million inserts per second (bulk insert 10000)

On an AMD Turion64 laptop with 0.5 GB of RAM

0.928 million inserts per second (single insert)
8.065 million inserts per second (bulk insert 10)
16.129 million inserts per second (bulk insert 100)
16.129 million inserts per second (bulk insert 1000)
16.129 million inserts per second (bulk insert 10000)

On a 12-core Mac mini with 64 GB of RAM

2.639 million inserts per second (single insert)
25 million inserts per second (bulk insert 10)
166.667 million inserts per second (bulk insert 100)
333.333 million inserts per second (bulk insert 1000)
333.333 million inserts per second (bulk insert 10000)

Other hardware tests

The nano utility was created to measure a system's CPU, memory and disk input/output. Refer to its readme for instructions on how to use it.

Summary

Most of this document emphasized knowing the q programming language itself, through

  • direct knowledge of the available keywords and operators
  • where and how these work best
  • direct knowledge of the paradigms most often used in q
  • knowing the data and database structure

However, it is still very important to test different approaches, as the right answer can differ depending on the exact situation at hand.