Realtime UDFs in Python¶
The realtime UDF framework supports Python 3 integration out of the box.
Python UDFs can be defined as either simple functions or as class methods.
Simple function¶
Python function example with argument:
def pyUDF(data):
    return data
This function takes a dataframe and returns it unchanged; it simply demonstrates data flowing through a Python function. The function name must match the file name.
Class method¶
Python method example with argument:
class pyUDF:
    def myMethod(self, data):
        return data
The class name must match the file name. The class should be instantiated either in the initialization function using the ml.q library, or at the bottom of the Python UDF script. Python UDFs must have .p extensions (not .py) to be loaded by embedPy.
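As a minimal sketch of the second option above (instantiating the class at the bottom of the script), the file below is assumed to be saved as pyUDF.p. The instance variable name `udf` and the `calls` counter are illustrative assumptions; the source does not state which instance name, if any, the framework expects.

```python
# Hypothetical file: pyUDF.p (the class name matches the file name).

class pyUDF:
    """Class-based UDF that keeps state between invocations."""

    def __init__(self):
        # Illustrative state: count how many batches have passed through.
        self.calls = 0

    def myMethod(self, data):
        # Pass the data through unchanged while recording the call.
        self.calls += 1
        return data


# Instantiate at the bottom of the script.
# The name `udf` is an assumption, not a documented requirement.
udf = pyUDF()
```

A class-based UDF is mainly useful when something has to persist between invocations, such as a loaded model or a running counter; a stateless transformation is simpler as a plain function.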
Pre- and post-execution functions¶
To map kdb+ tables to formats Python can use easily, Python realtime UDFs support pre- and post-execution functions. These are q functions: the pre-execution function runs on the inbound data before the Python RTUDF executes, and the post-execution function runs on the outbound result afterwards. A basic use case is to convert a q table to a pandas dataframe.
Pre- and post-execution functions are optional. If unconfigured or left blank, no function will be run and input/output will be directly handed between q and Python.
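The optional behaviour described above can be modelled in a few lines. This is a conceptual sketch in Python only: in the framework the pre- and post-execution functions are q functions, and `run_udf`, `rows_to_columns`, and `columns_to_rows` are hypothetical names used here as stand-ins, not part of the framework.

```python
def run_udf(udf, data, pre=None, post=None):
    """Apply optional pre- and post-execution steps around a UDF call."""
    # With no pre-execution step, the input is handed over unchanged.
    if pre is not None:
        data = pre(data)
    result = udf(data)
    # With no post-execution step, the raw UDF result is returned.
    if post is not None:
        result = post(result)
    return result


def rows_to_columns(rows):
    """Stand-in pre-execution step: list of row dicts -> dict of columns."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def columns_to_rows(columns):
    """Stand-in post-execution step: dict of columns -> list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def pyUDF(data):
    return data


rows = [{"sym": "A", "price": 1.5}, {"sym": "B", "price": 2.5}]

# Both steps configured: the UDF sees column-oriented data.
converted = run_udf(pyUDF, rows, pre=rows_to_columns, post=columns_to_rows)

# Neither step configured: the data passes straight through.
untouched = run_udf(pyUDF, rows)
```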
Pre-execution function example:
{[t;d] .ml.tab2df d}
This function makes use of the ml.q utility function to convert a q table to a dataframe.
Post-execution function example:
{[t;d] .ml.df2tab d}
This function makes use of the ml.q utility function to convert a dataframe to a q table.
The configuration parameter for pre- and post-execution functions on Python UDFs is separate from the main realtime config. The parameter is .daas.udf.pythonRTUDFConfig.
| Parameter | Description | Example |
|---|---|---|
| udfName | The name of the Python UDF | pythonUDF |
| preExFunc | Function to run before execution. Leave blank for none. | qToDataframePreExFunc |
| postExFunc | Function to run after execution. Leave blank for none. | dataframeToQPostExFunc |
| method | Method of class to be run, if not using a function. Leave blank if using function | myMethod |
The configuration can be managed via the command-line interface.
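To make the role of the `method` column concrete, the sketch below models one configuration row and shows how a blank or populated `method` could select between a function and a class method. The `PythonUDFConfig` dataclass, the `resolve_callable` helper, and the `namespace` dictionary are illustrative assumptions; they are not the framework's actual storage format or lookup logic.

```python
from dataclasses import dataclass


@dataclass
class PythonUDFConfig:
    """One row of the table above; blank fields are empty strings."""
    udfName: str
    preExFunc: str = ""
    postExFunc: str = ""
    method: str = ""


def resolve_callable(config, namespace):
    """Return the callable that a configuration row refers to.

    `namespace` maps names to the objects defined by the UDF script.
    """
    target = namespace[config.udfName]
    if not config.method:
        # Blank method: udfName refers to a plain function.
        return target
    # Populated method: udfName refers to a class (or an instance of it).
    instance = target() if isinstance(target, type) else target
    return getattr(instance, config.method)


def pythonUDF(data):
    return data


class pyUDF:
    def myMethod(self, data):
        return data


namespace = {"pythonUDF": pythonUDF, "pyUDF": pyUDF}

# Function-based UDF with both conversion steps configured.
func_cfg = PythonUDFConfig(
    udfName="pythonUDF",
    preExFunc="qToDataframePreExFunc",
    postExFunc="dataframeToQPostExFunc",
)

# Class-based UDF with no pre- or post-execution functions.
method_cfg = PythonUDFConfig(udfName="pyUDF", method="myMethod")
```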
Required dependencies¶
- Python 3.6/3.7
- KX [embedPy](https://code.kx.com/q/ml/embedpy/) (installable TGZ)
- multiprocessing (Python standard library module)
Recommended dependencies¶
Recommended dependencies enable smoother conversion between q tables and Python, as in the pre-execution function example above.
Parallel execution of Python UDFs¶
An added benefit of Python UDFs is the ability to parallelize the execution. The framework makes use of the multiprocessing module in Python to have a pool of Python processes running behind each embedPy worker node.
This means Python RTUDFs that operate on exactly the same data set can be run simultaneously. UDFs qualify for simultaneous execution when they share a trigger function, data requirement, and pre-execution function.
The Python multiprocessing pool appears as child processes of the q worker node when viewed through ps -ef.
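The grouping rule described above (shared trigger function, data requirement, and pre-execution function) can be sketched as follows. The record layout, the key names, and the sample values are hypothetical; the framework's internal representation is not documented here.

```python
from collections import defaultdict


def group_for_parallel_run(udfs):
    """Group UDF records that operate on the same prepared data.

    Each record is a dict with the hypothetical keys
    'name', 'trigger', 'data', and 'preExFunc'.
    """
    groups = defaultdict(list)
    for udf in udfs:
        # UDFs sharing all three properties see identical input data.
        key = (udf["trigger"], udf["data"], udf["preExFunc"])
        groups[key].append(udf["name"])
    return dict(groups)


udfs = [
    {"name": "udfA", "trigger": "onTrade", "data": "trade", "preExFunc": "toDf"},
    {"name": "udfB", "trigger": "onTrade", "data": "trade", "preExFunc": "toDf"},
    {"name": "udfC", "trigger": "onQuote", "data": "quote", "preExFunc": "toDf"},
]

groups = group_for_parallel_run(udfs)
```

In this sketch `udfA` and `udfB` fall into one group and could be dispatched to the pool together, while `udfC` forms a group of its own.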