Realtime UDFs in Python

The realtime UDF framework supports Python 3 integration out of the box.

Python UDFs can be defined as either simple functions or as class methods.

Simple function

Python function example with argument:

def pyUDF(data):
    return data

This function takes a data frame and returns it unchanged; it stands in for any processing applied to data as it flows through a Python function. The function name must match the file name.

Class method

Python method example with argument:

class pyUDF:

    def myMethod(self, data):
        return data

The class name must match the file name. The class should be instantiated either in the initialization function using the ml.q library, or at the bottom of the Python UDF script. Python UDFs must have .p extensions (not .py) to be loaded by embedPy.
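
As a minimal sketch of the second option, the script below instantiates the class at the bottom of the file. It assumes a file named pyUDF.p. The calls counter and the instance name udf are hypothetical and included only for illustration; the name the framework expects for the instance is not specified here.

```python
# Hypothetical contents of pyUDF.p: the class name matches the file name.
class pyUDF:

    def __init__(self):
        # Illustrative state that persists between calls (an assumption,
        # not something the framework requires).
        self.calls = 0

    def myMethod(self, data):
        # Count the invocation and pass the data through unchanged.
        self.calls += 1
        return data


# Instantiate at the bottom of the script. The instance name `udf` is
# an assumption made for this sketch.
udf = pyUDF()
```

Keeping the instance at module level means any state set up in __init__ is created once when the script is loaded, rather than on every call.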

Pre- and post-execution functions

To map kdb+ tables to formats Python can use easily, Python realtime UDFs (RTUDFs) support pre- and post-execution functions. These are q functions: the pre-execution function operates on the inbound data before the Python RTUDF runs, and the post-execution function operates on the outbound result after it. A basic use case is converting a q table to a pandas dataframe.

Pre- and post-execution functions are optional. If either is unconfigured or left blank, no function is run at that stage and the data is handed directly between q and Python.
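
The optional hand-off can be pictured with the following Python sketch. It is a conceptual model only, not the framework's code: the real pre- and post-execution functions are q functions, and run_rtudf and the lambda hooks are hypothetical names. A hook left as None is skipped, so data passes straight through at that stage.

```python
def run_rtudf(udf, data, pre_ex=None, post_ex=None):
    """Apply an optional pre-hook, the UDF, then an optional post-hook."""
    if pre_ex is not None:
        data = pre_ex(data)        # e.g. q table -> dataframe
    result = udf(data)
    if post_ex is not None:
        result = post_ex(result)   # e.g. dataframe -> q table
    return result


def pyUDF(data):
    return data


# No hooks configured: input and output are handed over unchanged.
unchanged = run_rtudf(pyUDF, {"price": [1.0, 2.0]})

# Both hooks configured: stand-in conversions wrap the UDF call.
converted = run_rtudf(
    pyUDF,
    {"price": [1.0, 2.0]},
    pre_ex=lambda d: list(d.items()),   # stand-in for a table -> dataframe step
    post_ex=lambda r: dict(r),          # stand-in for a dataframe -> table step
)
```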

Pre-execution function example:

{[t;d] .ml.tab2df d}

This function uses the ml.q utility function .ml.tab2df to convert a q table to a dataframe.

Post-execution function example:

{[t;d] .ml.df2tab d}

This function uses the ml.q utility function .ml.df2tab to convert a dataframe to a q table.

Pre- and post-execution functions for Python UDFs are configured separately from the main realtime config, in the parameter .daas.udf.pythonRTUDFConfig.

Parameter   Description                                                                           Example
udfName     The name of the Python UDF                                                            pythonUDF
preExFunc   Function to run before execution. Leave blank for none.                               qToDataframePreExFunc
postExFunc  Function to run after execution. Leave blank for none.                                dataframeToQPostExFunc
method      Method of class to be run, if not using a function. Leave blank if using a function.  myMethod

The configuration can be managed via the command-line interface.

Required dependencies

  • Python 3.6/3.7
  • KX embedPy (code.kx.com/q/ml/embedpy/), installable TGZ
  • multiprocessing (Python standard-library module)
  • ml.q (KX Machine Learning library)
  • Pandas (Python library)

Of these, ml.q and Pandas are recommended rather than strictly required: they enable smoother conversion between q tables and Python, as in the pre-execution function example above.

Parallel execution of Python UDFs

An added benefit of Python UDFs is that their execution can be parallelized. The framework uses Python's multiprocessing module to keep a pool of Python processes running behind each embedPy worker node.

This means Python RTUDFs that operate on exactly the same data set can be run simultaneously. UDFs qualify when they share a trigger function, data requirement, and pre-execution function.
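
One way to picture that rule is to group UDFs by the three shared settings. The sketch below is an illustration under assumptions, not the framework's scheduler: the dictionary keys trigger, data, and preExFunc, the function group_parallel_udfs, and all the example values are hypothetical.

```python
from collections import defaultdict


def group_parallel_udfs(udfs):
    """Group UDF names by (trigger, data requirement, pre-execution function)."""
    groups = defaultdict(list)
    for udf in udfs:
        key = (udf["trigger"], udf["data"], udf["preExFunc"])
        groups[key].append(udf["udfName"])
    return dict(groups)


# Hypothetical UDF definitions.
udfs = [
    {"udfName": "udfA", "trigger": "onTick", "data": "trades",
     "preExFunc": "qToDataframePreExFunc"},
    {"udfName": "udfB", "trigger": "onTick", "data": "trades",
     "preExFunc": "qToDataframePreExFunc"},
    {"udfName": "udfC", "trigger": "onTick", "data": "quotes",
     "preExFunc": "qToDataframePreExFunc"},
]

groups = group_parallel_udfs(udfs)
# udfA and udfB share all three settings, so they fall into one group;
# udfC has a different data requirement and forms its own group.
```

Each resulting group is a candidate batch whose members could be dispatched to the process pool together, since they all consume the same converted input.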

The Python multiprocessing pool appears as child processes of the q worker node when viewed through ps -ef.