Performance tips
This page provides PyKX performance optimization tips, with guidance on parallelization, secondary q threads, multithreading, and peach.
To get the best performance out of PyKX, follow these guidelines. Note that this page focuses on efficiently interfacing between Python and q, rather than optimizing Python or q individually.
General guidelines
- Avoid Unnecessary Conversions. Avoid converting K objects with their .py/.np/.pd methods unless necessary. Often, the K object itself is sufficient.
- Avoid Nested Columns when converting q tables into Pandas dataframes, as this currently incurs a data copy.
- Do as Little Work as Necessary. Convert only what is needed. For example, instead of converting an entire q table to a dataframe, convert only the required columns into Numpy arrays by indexing into the pykx.Table and calling .np on the columns. Use select statements and indexing to send only the necessary subset of data over an IPC connection.
- Prefer .np and .pd Over .py. Use Numpy/Pandas conversions to avoid data copying where possible. Converting objects with .py always incurs a data copy and may not always be possible (for example, some K objects, such as pykx.Function instances, return themselves when .py is called).
- Use raw=True for Performance. When performance is more important than the richness of the output, use the raw=True keyword argument. This can be more efficient by skipping certain adjustments, such as:
    - Temporal epoch adjustments from 2000-01-01 to 1970-01-01.
    - Converting q GUIDs to Python UUID objects (they will come through as complex numbers instead).
    - Converting bytes into strings.
- Let q do the heavy lifting. When using licensed mode, prefer q code and functions (like q.avg, q.sdev) over pure Python code. This is similar to using Numpy functions for Numpy arrays instead of pure Python.
    - Numpy functions on K vectors converted to Numpy arrays perform well, even with conversion overhead.
    - When using an IPC connection to a remote q process, use q code to pre-process data and reduce the workload on Python.
- Avoid converting large data from Python to q. Conversions from q to Python (via Numpy) often avoid data copying, but Python to q conversions always copy the data.
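The epoch adjustment that raw=True skips can be applied by hand when it is needed. The following is a minimal sketch using only the Python standard library, not PyKX itself. It assumes the raw values arrive as plain integers: q dates as days since 2000-01-01 and q timestamps as nanoseconds since 2000-01-01. The helper names are hypothetical and chosen for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# q counts temporal values from 2000-01-01; Python/Numpy/Pandas count from 1970-01-01.
Q_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_OFFSET_DAYS = (Q_EPOCH - UNIX_EPOCH).days
EPOCH_OFFSET_NS = EPOCH_OFFSET_DAYS * 86_400 * 10**9


def raw_q_date_to_python(days_since_2000: int) -> date:
    """Hypothetical helper: turn a raw q date (days since 2000-01-01) into a date."""
    return date(2000, 1, 1) + timedelta(days=days_since_2000)


def raw_q_timestamp_to_unix_ns(ns_since_2000: int) -> int:
    """Hypothetical helper: shift a raw q timestamp to nanoseconds since 1970-01-01."""
    return ns_since_2000 + EPOCH_OFFSET_NS


print(EPOCH_OFFSET_DAYS)
print(raw_q_date_to_python(366))
print(raw_q_timestamp_to_unix_ns(0))
```

Deferring this adjustment until a value is actually displayed or compared is one way to keep the speed of raw conversions on large vectors while still producing correct dates at the edges of a program.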
Parallelization
Parallelization involves distributing computational tasks across multiple threads to improve performance and efficiency.
To let PyKX process large-scale data efficiently using the available computational resources, use one of the following methods: secondary q threads, multithreading, or peach.
Secondary q threads
PyKX starts embedded q with as many secondary q threads enabled as are available. q automatically uses these threads to parallelize some computations as it deems appropriate. You can use the QARGS environment variable to provide command-line arguments and other startup flags to q/PyKX, including the number of secondary threads:
QARGS='-s 0' python # disable secondary threads
QARGS='-s 12' python # use 12 secondary threads by default
- The value set using -s is both the default and the maximum available to the process; you can't change it after importing PyKX.
- pykx.q.system.max_num_threads shows the maximum number of threads and cannot be changed.
- pykx.q.system.num_threads shows the current number of threads in use. It starts at the maximum value but can be set to a lower number.
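QARGS can also be set from inside a Python script, provided this happens before PyKX is imported. The sketch below is one way to do that. The helper name configure_secondary_threads is hypothetical, and the helper overwrites any existing QARGS value, which may not be what you want if other startup flags are already set. The PyKX import is left as a comment so the snippet runs without PyKX installed.

```python
import os


def configure_secondary_threads(n: int) -> str:
    """Hypothetical helper: set QARGS so embedded q starts with n secondary threads.

    Must run before `import pykx`, because the -s value is fixed at import time.
    Note: this replaces any QARGS value that is already set.
    """
    if n < 0:
        raise ValueError("number of secondary threads must be non-negative")
    os.environ["QARGS"] = f"-s {n}"
    return os.environ["QARGS"]


configure_secondary_threads(4)

# Import PyKX only after QARGS is in place:
# import pykx as kx
# kx.q.system.max_num_threads  # reports the maximum described above
```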
Multithreading
By default, PyKX doesn't support calling q from multiple threads in a Python process due to the Global Interpreter Lock (GIL). Enabling the PYKX_RELEASE_GIL environment variable drops the GIL when calling q, which on its own makes it unsafe to call q from multiple threads. To ensure thread safety, you can also enable the PYKX_Q_LOCK environment variable, which adds a re-entrant lock around q. Learn how to enable multithreaded execution and set up a Python process using PyKX to call into EmbeddedQ from multiple threads.
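To illustrate what a re-entrant lock provides, here is a minimal sketch using Python's threading.RLock. It is not PyKX's implementation: call_q is a hypothetical stand-in for a call into embedded q, and the point is only that the thread holding the lock can acquire it again without deadlocking, while other threads wait their turn.

```python
import threading

# Re-entrant lock: the thread that holds it may acquire it again.
q_lock = threading.RLock()
results = []


def call_q(expr: str) -> str:
    """Hypothetical stand-in for a call into embedded q, guarded by the lock."""
    with q_lock:
        return f"evaluated {expr}"


def worker(i: int) -> None:
    with q_lock:                      # outer acquisition
        out = call_q(f"til {i}")      # nested acquisition by the same thread
        results.append(out)           # still holding the lock, so this is serialized


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

With a plain threading.Lock, the nested acquisition inside call_q would block forever, which is why re-entrancy matters when a guarded call can trigger another guarded call on the same thread.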
Peach
Using the peach function in q to call Python is not supported unless you enable the PYKX_RELEASE_GIL setting. Without enabling this setting, the process will hang indefinitely.
For example, calling from Python into q into Python works normally:
>>> kx.q('{x each 1 2 3}', lambda x: range(x))
pykx.List(pykx.q('
,0
0 1
0 1 2
'))
But, by default, using peach to call from Python into q and back into Python hangs:
>>> kx.q('{x peach 1 2 3}', lambda x: range(x)) # Warning: will hang indefinitely
However, if you enable PYKX_RELEASE_GIL, it works:
>>> import os
>>> os.environ['PYKX_RELEASE_GIL'] = '1'
>>> import pykx as kx
>>> kx.q('{x peach 1 2 3}', lambda x: range(x))
pykx.List(pykx.q('
,0
0 1
0 1 2
'))