Package Content

As discussed here, packages currently contain the following fundamental components:

A manifest file describing the content of a package and its dependencies.
An initialization script defining the file/files to be loaded when a package is loaded.
Arbitrary code written in Python/q which can be used when loaded by a package.
Within your source code User-Defined Functions (UDFs), as described below, can be used to define functions of particular significance, namely those which are intended to be deployed directly to a pipeline.

Package Structure

In its simplest form, a package generated by the CLI has the following structure:

$ kxi package init test-package
$ tree test-package
test-package
├── init.q
└── manifest.yaml

You can extend this to contain arbitrary code in any structure, provided the manifest.yaml file remains in place. For example, the following more complex package contains both q and Python code:

└── ml
    ├── init.q
    ├── manifest.yaml
    ├── ml.q
    └── machine_learning
        ├── preprocessing
        │     └── preproc.q
        ├── model.py
        └── model.q

Using the APIs provided you can then load the code as described within the q and Python API outlines.

Manifest File

The manifest.yaml file which is present when a new package is initialized is centrally important to the use of a package. Without a defined manifest.yaml file, a package cannot be used by kdb Insights Enterprise or the APIs provided for package interaction.

On initialization of a package you are presented with a manifest.yaml file with the following structure:

name: qpackage
version: 0.0.1
license: ''
dependencies: {}
entrypoints:
  default: init.q
metadata:
  description: ''
  authors:
    bob:
      email: ''
  entitlements: {}
system:
  _pakx_version: 1.1.0
databases: {}
pipelines: {}
udfs:
  names:
    - udf

The following table provides a brief description of each configurable section within the manifest.yaml and whether definition of its content is required for the package to be used effectively.

Warning

The keys defined at initialization must not be deleted.

section	description	required
`name`	The name associated with the package by default when building it.	`yes`
`version`	The version associated with the package by default when building it.	`yes`
`entrypoints`	The set of possible methods by which a package can be loaded. More information available here.	`yes`
`license`	The relative path to the license file under which the defined package is intended to be released.	`no`
`metadata`	Information about the package contents and the users who have contributed to it.	`no`
`dependencies`	Any explicit dependencies on additional packages. More information available here.	`no`
`system`	Information about the system conditions under which the package was generated. Presently this includes the version of the `pakx` Python package which was used to create the package.	`no`
`pipelines`	Currently unused, this denotes any Stream Processor pipeline definitions within a package.	`no`
`databases`	Currently unused, this denotes the databases that have been defined within a package.	`no`
`udfs.names`	This denotes the tagged names which are searched when parsing the package text for user-defined functions.	`yes`
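The required sections in the table above can be checked programmatically before a package is built. The following is a minimal sketch only: `missing_required_sections` is a hypothetical helper, not part of the kxi or pakx APIs, and the manifest is represented as a plain dictionary rather than being read from manifest.yaml.

```python
# Hypothetical helper for illustration; not part of the kxi or pakx APIs.
REQUIRED_KEYS = ("name", "version", "entrypoints")


def missing_required_sections(manifest: dict) -> list:
    """Return the required manifest sections that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not manifest.get(key)]
    # `udfs.names` is nested, so it is checked separately.
    if not (manifest.get("udfs") or {}).get("names"):
        missing.append("udfs.names")
    return missing


# Dictionary mirroring the required parts of the generated manifest.yaml.
manifest = {
    "name": "qpackage",
    "version": "0.0.1",
    "entrypoints": {"default": "init.q"},
    "udfs": {"names": ["udf"]},
}

print(missing_required_sections(manifest))
print(missing_required_sections({"name": "qpackage"}))
```

The first call reports nothing missing, while the second lists every required section other than `name`.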

Entrypoints

Entrypoints define the q/Python files which can be used as the initialization script for a package. The entrypoint used when loading a package with no specific entrypoint defined is named default, and it is set to init.q on initialization. You can update this entrypoint to be any file relative to the package root, for example:

entrypoints:
    default: src/init.q

For more advanced usage you can specify multiple entrypoints for a package, allowing sub-sections of a code-base to be loaded independently. This is particularly useful when splitting code based on the area of an application in which it is intended to be used. For example, the following defines separate entrypoints for pipelines, data access processes and aggregators.

entrypoints:
    default: init.q
    sp: src/sp.q
    data-access: src/da.q
    aggregator: src/agg.q

The API which provides you with the ability to load specific entrypoints is defined in q here and in Python here.
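To illustrate how a named entrypoint maps to a file relative to the package root, the sketch below resolves an entrypoint name against the `entrypoints` mapping shown above. `resolve_entrypoint` is a hypothetical helper written for this illustration; it is not the loading API referred to above, and the package root `ml` is an assumed example value.

```python
from pathlib import PurePosixPath


def resolve_entrypoint(package_root: str, entrypoints: dict, name: str = "default") -> str:
    """Return the path of the file a given entrypoint name refers to.

    Hypothetical helper: falls back to the `default` entrypoint when no
    name is supplied, mirroring the behaviour described in the text.
    """
    if name not in entrypoints:
        raise KeyError(f"entrypoint {name!r} is not defined in the manifest")
    return str(PurePosixPath(package_root) / entrypoints[name])


# The `entrypoints` section from the example above, as a dictionary.
entrypoints = {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q",
}

print(resolve_entrypoint("ml", entrypoints))        # default entrypoint
print(resolve_entrypoint("ml", entrypoints, "sp"))  # named entrypoint
```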

Warning

The use of Python entrypoints is currently a beta feature and still in active development. It is supported only when using the Python API independently of kdb Insights Enterprise. In particular, when developing packages for use within kdb Insights Enterprise, entrypoints must at present be defined with a *.q extension.

Dependencies

The dependencies section of the manifest.yaml file lists the external packages on which the package being defined explicitly depends. The expected structure for defining dependencies is as follows:

dependencies:
- name: package
  location: ''
  repo: ''
  version: ''

The keys within this dependency structure relate to the following:

key	description
`name`	The name of the package to be retrieved as a dependency.
`location`	The storage location from which a package is to be retrieved, one of `local`, `github`, `gitlab` or `kx-nexus`.
`repo`	The repository/path location from which the dependency is to be retrieved.
`version`	The version of the package `name` which is to be retrieved as a dependency.
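A dependency entry can be sanity-checked against the table above before retrieval is attempted. The sketch below is illustrative only: `dependency_problems` is a hypothetical helper, not part of the kxi tooling, and it checks only that a name is present and that any location given is one of the four documented values.

```python
# Locations listed in the table above.
ALLOWED_LOCATIONS = {"local", "github", "gitlab", "kx-nexus"}


def dependency_problems(dep: dict) -> list:
    """Return a list of human-readable problems found in one dependency entry."""
    problems = []
    if not dep.get("name"):
        problems.append("missing key: name")
    location = dep.get("location")
    if location is not None and location not in ALLOWED_LOCATIONS:
        problems.append(f"unsupported location: {location}")
    return problems


good = {"name": "test-package", "location": "github",
        "repo": "test_user/test_repo", "version": "1.0.0"}
bad = {"location": "bitbucket", "repo": "test_user/test_repo"}

print(dependency_problems(good))
print(dependency_problems(bad))
```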

For completeness we will outline each location option separately and the underlying structure of the request completed when retrieving the requested dependency.

GitHub

Required environment variables:

GITHUB_TOKEN This token is required to allow a user to download artifacts from GitHub and can be generated by following the instructions outlined here.

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 of the repository github.com/test_user/test_repo.

dependencies:
  - name: test-package
    location: github
    repo: test_user/test_repo
    version: 1.0.0

The underlying URL against which this request is executed is as follows:

https://github.com/{package.repo}/releases/download/{package.version}/{package.name}-{package.version}.kxi

GitLab

Required environment variables:

GITLAB_TOKEN This token is required to allow a user to download artifacts from GitLab and can be generated by following the instructions outlined here.

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 of the repository https://gitlab.com/test_user/test_repo.

dependencies:
  - name: test-package
    location: gitlab
    repo: test_user/test_repo
    version: 1.0.0

The underlying URL against which this request is executed is as follows:

https://gitlab.com/api/v4/projects/{package.repo}/packages/generic/{package.name}/{package.version}/{package.name}-{package.version}.kxi

kx-nexus

Required environment variables:

KX_NEXUS_USER The username associated with a user's access to the KX External Nexus.
KX_NEXUS_PASS The password associated with a user's access to the KX External Nexus.

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 stored at the location test_user/test_repo within the packages store for the KX Nexus.

dependencies:
  - name: test-package
    location: kx-nexus
    repo: test_user/test_repo
    version: 1.0.0

Additional dependency options can be configured in the ~/.insights/pakx-config configuration file, which allows the retrieval locations to be modified and further sources to be added.

Adding Local dependencies

When adding local dependencies there are a few constraints:

Only kxi files can be added as local dependencies.
These are referenced using path, which should be the absolute filepath.

The following is an example request which would find test-package-1.0.0.kxi based on a tagged release 1.0.0 stored at the location path/to/ on the local host.

dependencies:
  - name: test-package
    path: path/to/test-package-1.0.0.kxi

The version in the manifest.yaml takes precedence over the version in the filepath.

User-Defined Functions

User-Defined Functions (UDFs) are functions written in Python or q which have special meaning within kdb Insights Enterprise. They are used in the deployment of named functions from a package to a pipeline. The addition of UDFs is motivated by the need for many users to define analytics in a streaming context while abstracting the underlying implementation logic and language used to define the UDF. This can be particularly useful in organizations with limited numbers of either q or Python developers who wish to make the most of their development resources by allowing experts in these languages to define functionality that can be used by users of the other.

Within kdb Insights Enterprise, UDFs are presently supported for use within a pipeline as the input to any of the function nodes (map, filter, merge, split, etc.), allowing a user to associate persisted custom logic with a pipeline.

Defining a UDF

You can define UDFs within packages through the use of comments in q and decorators in Python. These constructs provide an association between the configuration of a UDF and the function linked with the UDF. In each case the following general construct is used:

q

// @udf.*

Python

from kxi.packages.decorators import udf

@udf.*

Where in each case * within the definition @udf.* can be one of the following:

value	description	required	default
`name`	The name by which the UDF is referenced by Insights APIs.	`yes`	`N/A`
`description`	A user-supplied description allowing a user to discern the motivation for the UDF.	`no`	`""`
`tag`	A user-specified tag outlining where in an Insights deployment the UDF is to be used. This information is not actioned but is defined to allow segmentation of user code.	`no`	`""`
`category`	A user-specified category or list of categories defining where, within a tagged section of the Insights deployment, the UDF is to be deployed. For example, `@udf.category(["map", "filter"])` defines usage within the `map` and `filter` nodes of a pipeline.	`no`	`""`
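To make the decorator form concrete, the sketch below shows one way stacked decorators of this shape could accumulate metadata on a function. It is a toy stand-in under stated assumptions, not the implementation in kxi.packages.decorators: the `_UdfNamespace` class and the `_udf_meta` attribute are hypothetical names invented for this illustration.

```python
# Fields listed in the table above.
UDF_FIELDS = ("name", "description", "tag", "category")


class _UdfNamespace:
    """Toy stand-in for a decorator namespace such as `udf` (hypothetical)."""

    def __getattr__(self, field):
        if field not in UDF_FIELDS:
            raise AttributeError(field)

        def with_value(value):
            def decorate(func):
                # Copy any metadata set by decorators applied earlier,
                # then record this field alongside it.
                meta = dict(getattr(func, "_udf_meta", {}))
                meta[field] = value
                func._udf_meta = meta
                return func
            return decorate
        return with_value


udf = _UdfNamespace()


@udf.name("custom_py_map")
@udf.category(["map", "filter"])
def py_udf(table, params):
    return table


print(py_udf._udf_meta)
```

Each decorator returns the original function unchanged, so the function remains callable as normal while carrying its configuration.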

The following provides examples of a number of fully defined UDFs within each language:

q

Fully-Described

// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
  select from table where params[`column]>params`threshold
  }

Minimal-Information

// @udf.name("custom_map")
.test.my_custom_udf:{[table;params]
  select from table where params[`column]>params`threshold
  }

Python

Fully-Described

from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

Minimal-Information

from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

Usage

As noted above, presently UDFs can be used within a Stream Processor pipeline. This is supported within kdb Insights Enterprise within the drag and drop pipeline UI or via the definition of pipelines in the Query window.

Within the context of the Pipeline, UDFs are retrieved using the .qsp.udf and qsp.udf functions in q and Python respectively.

For examples of their usage see the kdb Insights Enterprise quickstart guide here.

Constraints

The definition of your UDFs comes with the following constraints:

A UDF must take between two and eight parameters.
The final parameter in the UDF is a reserved parameter (thus the maximum number of non-reserved user parameters is seven) used to modify the UDF behavior for execution. When loading a UDF within a pipeline, this parameter is automatically populated as an empty dictionary unless otherwise specified.
If defined in q, the function to be defined as a UDF must appear beneath its associated comment block and be defined with its full namespace, namely:

Supported-Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is correct as UDF will be resolved in correct namespace")
.test.user_defined_function:{[data;params]pi*square data}

Incorrect-Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is incorrect as UDF will not resolve .test namespace")
user_defined_func:{[data;params]pi*square data}
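For Python UDFs, the parameter-count constraint listed above can be checked ahead of deployment with the standard library. This is a minimal sketch: `check_udf_signature` is a hypothetical helper rather than part of the kxi APIs, and it only counts parameters; it does not verify how the reserved final parameter is used.

```python
import inspect

# Bounds taken from the constraints listed above.
MIN_PARAMS, MAX_PARAMS = 2, 8


def check_udf_signature(func) -> int:
    """Return the parameter count, raising ValueError if it is out of bounds."""
    count = len(inspect.signature(func).parameters)
    if not MIN_PARAMS <= count <= MAX_PARAMS:
        raise ValueError(
            f"{func.__name__} takes {count} parameter(s); "
            f"expected between {MIN_PARAMS} and {MAX_PARAMS}"
        )
    return count


def py_udf(table, params):
    return table


def too_few(table):
    return table


print(check_udf_signature(py_udf))
try:
    check_udf_signature(too_few)
except ValueError as err:
    print(err)
```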

Custom UDF definitions

In the above examples, all UDFs, in both q and Python, have been defined using the @udf syntax. Although udf is the default keyword, it is possible to define UDFs using a custom keyword such as @myudf. Here are some examples:

q

// @myudf.name("custom_map")
// @myudf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @myudf.tag("sp")
// @myudf.category("map")
.test.my_custom_udf:{[table;params]
  select from table where params[`column]>params`threshold
  }

Python

from kxi.packages.decorators import udf as myudf

import numpy as np

@myudf.name('custom_py_map')
@myudf.description('Custom Python UDF making use of numpy')
@myudf.tag('sp')
@myudf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

In order to list and load UDFs defined using custom keywords, a udf_sym (or list of such symbols) needs to be passed to the listing functions alongside the path. Further details on this are described in the API sections for Python and q.

Note

All keywords used to define UDFs within a package must be added to the udfs section of the package's manifest file. This is important for deployment, as any UDFs defined using keywords that are not listed in the manifest file are not retrievable.
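The effect of this rule can be illustrated with a simple text scan. The sketch below assumes, for illustration only, that UDF discovery amounts to searching source text for `@<keyword>.name(...)` annotations using the keywords listed under `udfs.names`; the real parser is not described here and may differ. `find_udf_names` is a hypothetical helper.

```python
import re


def find_udf_names(source: str, keywords: list) -> list:
    """Return UDF names annotated with any of the given keywords."""
    if not keywords:
        return []
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    # Matches both `// @udf.name("x")` comments and `@udf.name('x')` decorators.
    pattern = re.compile(
        r"^\s*(?://\s*)?@(?:" + alternatives + r")\.name\(['\"]([^'\"]+)['\"]\)",
        re.MULTILINE,
    )
    return pattern.findall(source)


source = """
// @udf.name("default_map")
.test.a:{[table;params] table}

// @myudf.name("custom_map")
.test.b:{[table;params] table}
"""

# Only keywords listed in the manifest are searched for.
print(find_udf_names(source, ["udf"]))
print(find_udf_names(source, ["udf", "myudf"]))
```

With only `udf` listed, the function annotated with `@myudf` is not found, which mirrors why unlisted keywords leave UDFs unretrievable.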