Performance tips
This page provides PyKX performance optimization tips, covering general guidelines, parallelization, secondary q threads, multithreading, and peach.
To get the best performance out of PyKX, follow these guidelines. Note that this page focuses on efficiently interfacing between Python and q, rather than optimizing Python or q individually.
General guidelines
- Avoid Unnecessary Conversions. Avoid converting K objects with their .py/.np/.pd methods unless necessary. Often, the K object itself is sufficient.
- Avoid Nested Columns when converting q tables into Pandas dataframes, as this currently incurs a data copy.
- Do as Little Work as Necessary. Convert only what is needed. For example, instead of converting an entire q table to a dataframe, convert only the required columns into Numpy arrays by indexing into the pykx.Table and calling .np on the columns. Use select statements and indexing to send only the necessary subset of data over an IPC connection.
- Prefer .np and .pd Over .py. Use Numpy/Pandas conversions to avoid data copying where possible. Converting objects with .py always incurs a data copy and may not always be possible (for example, some K objects, such as pykx.Function instances, return themselves when .py is called).
- Use raw=True for Performance. When performance is more important than the richness of the output, use the raw=True keyword argument. This can be more efficient by skipping certain adjustments, such as:
  - Temporal epoch adjustments from 2000-01-01 to 1970-01-01.
  - Converting q GUIDs to Python UUID objects (they will come through as complex numbers instead).
  - Converting bytes into strings.
- Let q do the heavy lifting. When using licensed mode, prefer q code and functions (like q.avg, q.sdev) over pure Python code. This is similar to using Numpy functions for Numpy arrays instead of pure Python.
  - Numpy functions on K vectors converted to Numpy arrays perform well, even with conversion overhead.
  - When using an IPC connection to a remote q process, use q code to pre-process data and reduce the workload on Python.
  - Avoid converting large data from Python to q. Conversions from q to Python (via Numpy) often avoid data copying, but Python to q conversions always copy the data.
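To make the temporal epoch adjustment that raw=True skips more concrete, here is a minimal sketch using only the Python standard library. It assumes that raw q dates arrive as integer days counted from 2000-01-01 and raw q timestamps as integer nanoseconds counted from the same epoch. The helper names are hypothetical and are not part of PyKX.

```python
from datetime import date, datetime, timedelta, timezone

# q counts temporal values from 2000-01-01; Python, Numpy and Pandas count from 1970-01-01.
Q_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Offset between the two epochs, in the units assumed for raw dates and timestamps.
EPOCH_OFFSET_DAYS = (Q_EPOCH - UNIX_EPOCH).days
EPOCH_OFFSET_NS = EPOCH_OFFSET_DAYS * 86_400 * 1_000_000_000


def raw_q_date_to_python(raw_days: int) -> date:
    """Hypothetical helper: raw q date (days since 2000-01-01) to datetime.date."""
    return Q_EPOCH.date() + timedelta(days=raw_days)


def raw_q_timestamp_to_unix_ns(raw_ns: int) -> int:
    """Hypothetical helper: raw q timestamp (ns since 2000-01-01) to ns since 1970-01-01."""
    return raw_ns + EPOCH_OFFSET_NS


print(EPOCH_OFFSET_DAYS)              # 10957
print(raw_q_date_to_python(0))        # 2000-01-01
print(raw_q_timestamp_to_unix_ns(0))  # 946684800000000000
```

If the calling code can work directly with values relative to the q epoch, this shift is not needed at all, which is where the saving from raw=True comes from.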
Parallelization
Parallelization involves distributing computational tasks across multiple threads to improve performance and efficiency.
Use the following methods to let PyKX handle large-scale data processing tasks efficiently with the available computational resources: secondary q threads, multithreading, or peach.
Secondary q threads
PyKX starts embedded q with as many secondary q threads enabled as are available. q automatically uses these threads to parallelize some computations as it deems appropriate. You can use the QARGS environment variable to provide command-line arguments and other startup flags to q/PyKX, including the number of secondary threads:
QARGS='-s 0' python # disable secondary threads
QARGS='-s 12' python # use 12 secondary threads by default
- The value set using -s sets both the default and the maximum available to the process; you can't change it after importing PyKX.
- pykx.q.system.max_num_threads shows the maximum number of threads and cannot be changed.
- pykx.q.system.num_threads shows the current number of threads in use. It starts at the maximum value but can be set to a lower number.
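The same setting can presumably be applied from inside a Python script, as long as it happens before PyKX is imported. The sketch below assumes QARGS is read from the process environment at import time; set_secondary_threads is a hypothetical helper, not a PyKX function, and it overwrites any QARGS value already present.

```python
import os
import sys


def set_secondary_threads(count: int) -> str:
    """Hypothetical helper: request `count` secondary q threads via QARGS.

    Call it before `import pykx`, because the thread count is fixed at import.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if "pykx" in sys.modules:
        raise RuntimeError("PyKX is already imported; set QARGS before importing it")
    os.environ["QARGS"] = f"-s {count}"
    return os.environ["QARGS"]


print(set_secondary_threads(4))  # -s 4
# import pykx as kx  # import only after QARGS has been set
```

The guard on sys.modules turns a silent no-op (setting QARGS too late) into an explicit error.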
Multithreading
By default, PyKX doesn’t support calling q from multiple threads in a Python process due to the Global Interpreter Lock (GIL). Enabling the PYKX_RELEASE_GIL environment variable drops the GIL when calling q, which on its own makes it unsafe to call q from multiple threads. To ensure thread safety, you can also enable the PYKX_Q_LOCK environment variable, which adds a re-entrant lock around q. Learn how to enable multithreaded execution and set up a Python process using PyKX to call into EmbeddedQ from multiple threads.
Peach
Using the peach function in q to call Python is not supported unless you enable the PYKX_RELEASE_GIL setting. Without enabling this setting, the process will hang indefinitely.
For example, calling from Python into q into Python works normally:
>>> kx.q('{x each 1 2 3}', lambda x: range(x))
pykx.List(pykx.q('
,0
0 1
0 1 2
'))
But, by default, using peach to call from Python into q and back into Python hangs:
>>> kx.q('{x peach 1 2 3}', lambda x: range(x)) # Warning: will hang indefinitely
However, if you enable PYKX_RELEASE_GIL, it works:
>>> import os
>>> os.environ['PYKX_RELEASE_GIL'] = '1'
>>> import pykx as kx
>>> kx.q('{x peach 1 2 3}', lambda x: range(x))
pykx.List(pykx.q('
,0
0 1
0 1 2
'))