User-Defined Functions

User-Defined Functions (UDFs) are functions written in Python or q that have special meaning within kdb Insights Enterprise. They are used to deploy named functions from a package to a pipeline. UDFs let users define analytics for a streaming context while abstracting away the underlying implementation logic and the language in which the UDF is written. This is particularly useful in organizations with few q or Python developers: experts in one language can define functionality that users of the other language can call.

Within kdb Insights Enterprise, UDFs are currently supported within a pipeline as the input to any of the function nodes (map, filter, merge, split, etc.), allowing a user to associate persisted custom logic with a pipeline.

Defining a UDF

You can define UDFs within packages through the use of comments in q and decorators in Python. These constructs provide an association between the configuration of a UDF and the function linked with the UDF. In each case the following general construct is used:

q

// @udf.*

Python

from kxi.packages.decorators import udf

@udf.*

Where in each case * within the definition @udf.* can be one of the following:

value	description	required	default
`name`	The name by which the underlying UDF will be associated when referenced by Insights APIs.	`yes`	`N/A`
`description`	A user supplied description allowing a user to discern the motivation for the UDF.	`no`	`""`
`tag`	A user-specified tag outlining where in an Insights deployment the UDF is to be used. This information is not acted upon; it is defined to allow segmentation of user code.	`no`	`""`
`category`	A user-specified category or list of categories that defines where, within a tagged section of the Insights deployment, the UDF is to be deployed. For example, `@udf.category(["map", "filter"])` defines usage within the `map` and `filter` nodes of a Pipeline.	`no`	`""`
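
To make the association between configuration and function concrete, the following is a minimal sketch of how stacked decorators of this shape can attach metadata to a function. It does not use `kxi.packages`: the `udf_sketch` class, the `_make_field` helper and the `__udf_meta__` attribute are hypothetical stand-ins invented for illustration, and the real decorators may store their configuration differently.

```python
def _make_field(key):
    """Build a decorator factory that records `key` on the decorated function."""
    def with_value(value):
        def decorate(func):
            # Hypothetical storage location; defaults mirror the table above.
            meta = func.__dict__.setdefault(
                "__udf_meta__", {"description": "", "tag": "", "category": ""}
            )
            meta[key] = value
            return func  # the function itself is returned unchanged
        return decorate
    return with_value


class udf_sketch:
    """Hypothetical stand-in for the `udf` decorator namespace."""
    name = staticmethod(_make_field("name"))
    description = staticmethod(_make_field("description"))
    tag = staticmethod(_make_field("tag"))
    category = staticmethod(_make_field("category"))


@udf_sketch.name("custom_py_map")
@udf_sketch.category(["map", "filter"])
def py_udf(table, params):
    return table


print(py_udf.__udf_meta__)
```

In this sketch the optional fields fall back to empty strings when they are omitted, matching the defaults listed in the table, while `name` is present only once it has been supplied.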

The following provides examples of a number of fully defined UDFs within each language:

q

Fully Specified

// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
  // Functional select: params`column holds a column name, so it cannot be compared directly in qSQL
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }

Minimal-Information

// @udf.name("custom_map")
.test.my_custom_udf:{[table;params]
  // Functional select: params`column holds a column name, so it cannot be compared directly in qSQL
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }

Python

Fully Specified

# Import the decorator from its full module path; an import alias cannot be used in a `from` clause
from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

Minimal-Information

# Import the decorator from its full module path; an import alias cannot be used in a `from` clause
from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

UDF Usage

As noted above, UDFs can currently be used within a Stream Processor pipeline. In kdb Insights Enterprise this is supported in the drag-and-drop pipeline UI or through pipelines defined in the Query window.

Within the context of the Pipeline, UDFs are retrieved using the .qsp.udf and qsp.udf functions in q and Python respectively.

For examples of their usage, see the kdb Insights Enterprise quickstart guide.

UDF Constraints

The definition of your UDFs comes with the following constraints:

A UDF must take at least two parameters; a maximum of eight parameters is supported.
The final parameter of the UDF is reserved (so the maximum number of non-reserved user parameters is seven) and is used to modify the UDF's behavior at execution. When a UDF is loaded within a pipeline, this parameter is automatically populated with an empty dictionary unless otherwise specified.
If defined in q, the function to be registered as a UDF must appear beneath the comment block associated with it and must be assigned using its full namespace, namely:

Supported Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is correct as UDF will be resolved in correct namespace")
.test.user_defined_function:{[data;params]pi*square data}

Undefined Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is incorrect as UDF will not resolve .test namespace")
user_defined_func:{[data;params]pi*square data}
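
The parameter-count constraints above can be illustrated with a short sketch. It uses only the Python standard library; `check_udf_signature` and `call_with_default_params` are hypothetical helpers written for this example and are not part of `kxi.packages`, which may enforce these limits differently.

```python
import inspect

MIN_PARAMS = 2  # at least one user parameter plus the reserved final parameter
MAX_PARAMS = 8  # seven user parameters plus the reserved final parameter


def check_udf_signature(func):
    """Return the parameter names if `func` is within the stated limits, else raise."""
    names = list(inspect.signature(func).parameters)
    if not MIN_PARAMS <= len(names) <= MAX_PARAMS:
        raise ValueError(
            f"{func.__name__} takes {len(names)} parameters; "
            f"expected between {MIN_PARAMS} and {MAX_PARAMS}"
        )
    return names


def call_with_default_params(func, *user_args, params=None):
    """Call `func`, passing an empty dictionary as the final parameter unless one is given."""
    check_udf_signature(func)
    return func(*user_args, {} if params is None else params)


def scale(table, params):
    # The reserved final parameter modifies behavior; here it carries an optional factor.
    factor = params.get("factor", 1)
    return [value * factor for value in table]


print(call_with_default_params(scale, [1, 2]))
print(call_with_default_params(scale, [1, 2], params={"factor": 3}))
```

The first call mirrors the default case, where the reserved parameter arrives as an empty dictionary; the second shows it being specified explicitly.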

Loading files within packages

Adding code to your packages often requires loading code from other files within the same package. Do not load one file from another using relative or absolute file-system paths. Instead, load files internal to your packages with the kxi.packages.packages.load_file and .kxi.packages.file.load functions for Python and q respectively. These functions load files relative to the root of the package being loaded, or of the package within which a UDF is being loaded, so every load can be expressed as a path from the package root.

Examples of their usage within package files are as follows:

Python

from kxi.packages import packages

# Load the file src/example_udf.py
packages.load_file("src/example_udf.py")

q

// Load the file src/example_udf.q
.kxi.packages.file.load["src/example_udf.q"]

Note

To facilitate the use of locked files the loading functionality will, by default, attempt to load the locked version of all files first followed by the loading of unlocked files.
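
The idea of pinning every load to the package root can be sketched as follows. This is not how `kxi.packages` resolves files internally; `resolve_in_package` is a hypothetical helper, and rejecting `..` segments is an assumption made here to keep the resolved path inside the package.

```python
from pathlib import Path, PurePosixPath


def resolve_in_package(package_root, relative_path):
    """Resolve `relative_path` against `package_root` (hypothetical helper)."""
    rel = PurePosixPath(relative_path)
    # Assumption: only paths expressed from the package root are accepted.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"expected a path relative to the package root: {relative_path}")
    return Path(package_root).joinpath(*rel.parts)


print(resolve_in_package("mypkg", "src/example_udf.py"))
```

Because the path is always joined onto the root, the same string identifies the same file no matter which file in the package requests the load.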

Custom UDF definitions

In the above examples, all UDFs, in both q and Python, have been defined using the syntax @udf. udf is the default keyword used to define UDFs. However, it is possible to define UDFs using a custom keyword; for example, @myudf could be used. Here are some examples:

q

// @myudf.name("custom_map")
// @myudf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @myudf.tag("sp")
// @myudf.category("map")
.test.my_custom_udf:{[table;params]
  // Functional select: params`column holds a column name, so it cannot be compared directly in qSQL
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }

Python

import numpy as np
from kxi.packages.decorators import udf as myudf

@myudf.name('custom_py_map')
@myudf.description('Custom Python UDF making use of numpy')
@myudf.tag('sp')
@myudf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

In order to list and load UDFs defined using custom keywords, a udf_sym (or list of such symbols) needs to be passed to the listing functions alongside the path. Further details on this are described in the API sections for Python and q.
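
To illustrate why the keyword matters when listing UDFs, the sketch below scans q source text for comment annotations that use a given keyword. It is a simplified, hypothetical parser written for this example (`find_udf_fields` is not a `kxi.packages` function) and it assumes one annotation per line with a single quoted string argument.

```python
import re


def find_udf_fields(q_source, keyword="udf"):
    """Collect `// @<keyword>.<field>("value")` annotations from q source text."""
    pattern = re.compile(
        r"^\s*//\s*@" + re.escape(keyword) + r"\.(\w+)\((.*)\)\s*$"
    )
    fields = {}
    for line in q_source.splitlines():
        match = pattern.match(line)
        if match:
            # Strip the surrounding double quotes from the annotation value.
            fields[match.group(1)] = match.group(2).strip().strip('"')
    return fields


q_source = """
// @myudf.name("custom_map")
// @myudf.tag("sp")
.test.my_custom_udf:{[table;params] table}
"""

print(find_udf_fields(q_source, keyword="myudf"))
print(find_udf_fields(q_source))
```

Scanning with the default keyword finds nothing in this source, which is the situation the paragraph above describes: annotations written with a custom keyword are only discovered when that keyword is supplied.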

Note

All keywords used to define UDFs within a package must be added to the udfs section of the package's manifest file. This is important for deployment, as any UDFs defined using keywords that are not listed in the manifest file are not retrievable.