# Performance Considerations

To get the best performance out of PyKX, follow the guidelines explained on this page. Note that this page does not cover tuning Python or q themselves; rather, it focuses on how to interface between the two as efficiently as possible.

- Avoid converting K objects with their `.py`/`.np`/`.pd`/etc. methods. Often the K object itself is sufficient for the task at hand.
- Do as little work as necessary:
    - When conversion is necessary, only convert what is really needed. For instance, instead of converting an entire q table to a dataframe, perhaps only a subset of its columns needs to be converted into NumPy arrays. You can get these columns by indexing into the `pykx.Table`, then calling `.np` on the columns returned.
    - When using an IPC connection, use select statements and indexing so that only the subset of the data you want to process in Python is sent over the connection.
- Prefer `.np` and `.pd` over `.py`. If a conversion must happen, try to stick to the NumPy/Pandas conversions, which avoid copying data where possible. Converting objects with `.py` will always incur a data copy, if the conversion is possible at all (n.b. some K objects, such as `pykx.Function` instances, return themselves when `.py` is called on them).
- Convert with the keyword argument `raw=True` when performance is more important than the richness of the output. A raw conversion can be much more efficient in many cases because it skips some work, such as adjusting the temporal epoch from 2000-01-01 to 1970-01-01, turning q GUIDs into Python `UUID` objects (instead they come through as complex numbers, as that is the only widely available 128-bit type), converting bytes into strings, and more.
- Avoid nested columns when converting q tables into Pandas dataframes, as this currently incurs a data copy.
- Let q do the heavy lifting:
    - When running in licensed mode, use q code and q functions (e.g. `q.avg`, `q.sdev`, etc.) instead of pure Python code, much as you would use NumPy functions to operate on NumPy arrays instead of pure Python loops. Note that the performance of NumPy functions on K vectors that have been converted to NumPy arrays is often comparable, even when including the conversion overhead.
    - When using an IPC connection to a remote q process, consider using q code to offload some of the work to the q process by pre-processing the data in q.
- Avoid converting large amounts of data from Python to q. Conversions from q to Python (via NumPy) can often avoid copying data, but conversions from Python to q always incur a copy of the data.