Package Content

As discussed here, packages contain the following fundamental components presently:

A manifest file describing the content of a package and its dependencies.
An initialisation script defining the file/files to be loaded when a package is loaded.
Arbitrary code written in Python/q which can be used when loaded by a package.
- Within a user's source code, User Defined Functions (UDFs), as described below, can be used to mark functions of particular significance, namely those intended to be deployed directly to a pipeline.

Package Structure

A package in its simplest form generated by the CLI consists of the following structure:

$ kxi package init test_package
$ tree test_package
test_package
├── init.q
└── manifest.json

A user can extend this structure with arbitrary code. For example, the following more complex package organises its code in a user-defined layout:

└── ml
    ├── init.q
    ├── manifest.json
    ├── ml.q
    └── machine_learning
        ├── preprocessing
        │     └── preproc.q
        ├── model.py
        └── model.q

Using the APIs provided, a user can then load the code within this structure, as described in the q and Python API outlines.

Manifest File

The manifest.json file, which is present when a new package is initialised, is central to the use of a package. Without a defined manifest.json file, a package cannot be used by kdb Insights Enterprise or by the APIs provided for package interaction.

On initialisation of a package a user is presented with a manifest.json file with the following structure:

{
    "name": "test_package",
    "version": "0.0.1",
    "entrypoints": {
        "default": "init.q"
    },
    "metadata": {
        "description": "",
        "authors": {
            "username": {
                "email": null
            }
        }
    },
    "dependencies": {}
}

The following table provides a brief description of each configurable section within the manifest.json and whether definition of its content is required for the package to be used effectively.

Note

The keys defined at initialization must not be deleted.

section	description	required
`name`	The name that will be associated with the package by default when building it.	`yes`
`version`	The version that will be associated with the package by default when building it.	`yes`
`entrypoints`	The set of possible methods by which a package can be loaded. More information available here	`yes`
`metadata`	Information about the package contents and the users who have contributed to it	`no`
`dependencies`	Any explicit dependencies on additional packages. More information available here	`no`
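
As a minimal sketch of the table above, the following Python snippet checks that a manifest contains the sections marked as required. The helper name `missing_required_sections` is hypothetical and is not part of any kxi API; it uses only the standard library.

```python
import json

# Sections marked as required in the table above.
REQUIRED_SECTIONS = ("name", "version", "entrypoints")


def missing_required_sections(manifest_text):
    """Return the required sections absent from a manifest.json string."""
    manifest = json.loads(manifest_text)
    return [key for key in REQUIRED_SECTIONS if key not in manifest]


example = '{"name": "test_package", "version": "0.0.1", "dependencies": {}}'
print(missing_required_sections(example))  # ['entrypoints']
```

A check of this kind could be run before building a package to catch an accidentally deleted key early.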

Entrypoints

Entrypoints define the q/Python files which can be used as the initialisation script for a package. The entrypoint used by default when loading a package is named default and is defined as init.q; this file is loaded when a package is loaded with no specific entrypoint defined. A user can update this entrypoint to be any file relative to the package root, for example:

    "entrypoints": {
        "default": "src/init.q"
    },

A user can specify multiple entrypoints for their package, allowing sub-sections of a code-base to be loaded independently. This is particularly useful when splitting code based on the area of an application in which it is intended to be used. For example, the following could define entrypoints specific to the Pipelines and Data Access Processes:

   "entrypoints": {
       "default": "init.q",
       "sp": "src/sp.q",
       "da": "src/da.q",
       "py": "src/py_file.py"
   },

The API which provides the ability to load specific entrypoints is defined in q here and in Python here.
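
To illustrate how a named entrypoint maps to a file, here is a small sketch that looks up an entrypoint in a manifest and joins it to the package root. The function `resolve_entrypoint` is a hypothetical helper written for this example and is not the q or Python loading API; it also assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def resolve_entrypoint(package_root, manifest, name="default"):
    """Return the path of the file registered under an entrypoint name."""
    entrypoints = manifest.get("entrypoints", {})
    if name not in entrypoints:
        raise KeyError(f"entrypoint {name!r} is not defined in the manifest")
    # Entrypoint files are given relative to the package root.
    return str(PurePosixPath(package_root) / entrypoints[name])


manifest = {"entrypoints": {"default": "init.q", "sp": "src/sp.q"}}
print(resolve_entrypoint("test_package", manifest))        # test_package/init.q
print(resolve_entrypoint("test_package", manifest, "sp"))  # test_package/src/sp.q
```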

Warning

The use of Python entrypoints is currently a beta feature and still in active development. It is supported only when using the Python API independently of kdb Insights Enterprise. As such, when developing packages for use within kdb Insights Enterprise, entrypoints must at present be defined with a *.q extension.

Dependencies

The dependencies section of the manifest.json file outlines any external dependencies on which the package being defined is explicitly dependent. The expected structure for defining dependencies is as follows:

    "dependencies": {
        "package": {
            "location" : "",
            "repo": "",
            "version": ""
        }
    }

The keys within this dependency structure relate to the following:

key	description
`package`	The name of the package to be retrieved as a dependency.
`location`	The storage location from which a package is to be retrieved, one of `local`, `github`, `gitlab` or `kx-nexus`.
`repo`	The repository/path location from which the dependency is to be retrieved.
`version`	The version of the package `package` which is to be retrieved as a dependency.

For completeness we will outline each location option separately and the underlying structure of the request completed when retrieving the requested dependency.

GitHub

Required environment variables:

GITHUB_TOKEN: this token is required to allow a user to download artifacts from GitHub and can be generated by following the instructions outlined here.

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 of the repository github.com/test_user/test_repo.

    "test-package": {
        "location": "github",
        "repo": "test_user/test_repo",
        "version": "1.0.0"
    }

The underlying URL against which this request is executed is as follows:

https://github.com/{package.repo}/releases/download/{package.version}/{package.name}-{package.version}.kxi

GitLab

Required environment variables:

GITLAB_TOKEN: this token is required to allow a user to download artifacts from GitLab and can be generated by following the instructions outlined here.

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 of the repository https://gitlab.com/test_user/test_repo.

    "test-package": {
        "location": "gitlab",
        "repo": "test_user/test_repo",
        "version": "1.0.0"
    }

The underlying URL against which this request is executed is as follows:

https://gitlab.com/api/v4/projects/{package.repo}/packages/generic/{package.name}/{package.version}/{package.name}-{package.version}.kxi
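
As a sketch of how the GitLab template above is filled in, the snippet below substitutes the values from one entry of the dependencies section. The function name `gitlab_download_url` is hypothetical, and the values are substituted verbatim; whether the repository path is URL-encoded before the request is made is not stated here, so treat that detail as an open assumption.

```python
GITLAB_TEMPLATE = (
    "https://gitlab.com/api/v4/projects/{repo}/packages/generic/"
    "{name}/{version}/{name}-{version}.kxi"
)


def gitlab_download_url(name, dependency):
    """Fill the URL template from one entry of the dependencies section."""
    return GITLAB_TEMPLATE.format(
        repo=dependency["repo"], name=name, version=dependency["version"]
    )


dep = {"location": "gitlab", "repo": "test_user/test_repo", "version": "1.0.0"}
print(gitlab_download_url("test-package", dep))
```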

kx-nexus

Required environment variables:

KX_NEXUS_USER: The username associated with a user's access to the KX External Nexus
KX_NEXUS_PASS: The password associated with a user's access to the KX External Nexus

The following is an example request which would download the package test-package-1.0.0.kxi based on a tagged release 1.0.0 stored at the location test_user/test_repo within the packages store for the KX Nexus.

    "test-package": {
        "location": "kx-nexus",
        "repo": "test_user/test_repo",
        "version": "1.0.0"
    }
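
Since each remote location relies on its own environment variables, a quick pre-flight check can save a failed download. The sketch below maps each location to the variables listed above and reports any that are unset; the helper `missing_env_vars` is hypothetical and not part of the kxi tooling.

```python
import os

# Environment variables listed above for each remote location.
REQUIRED_ENV = {
    "github": ("GITHUB_TOKEN",),
    "gitlab": ("GITLAB_TOKEN",),
    "kx-nexus": ("KX_NEXUS_USER", "KX_NEXUS_PASS"),
}


def missing_env_vars(location, environ=None):
    """Return the variables required for a location that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [var for var in REQUIRED_ENV.get(location, ()) if not environ.get(var)]


print(missing_env_vars("kx-nexus", {"KX_NEXUS_USER": "alice"}))  # ['KX_NEXUS_PASS']
```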

Adding Local dependencies

When adding local dependencies there are a few constraints:

Only kxi files can be added as local dependencies.
These are referenced using the path key, whose value should be the absolute filepath.

The following is an example request which would find test-package-1.0.0.kxi, a tagged release 1.0.0 stored at the location path/to/ on the local host.

    "test-package": {
        "path": "path/to/test-package-1.0.0.kxi"
    }

Note

The version in the manifest.json will take precedence over the version in the filepath
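
The two constraints on local dependencies can be expressed as a short check. This is only a sketch: `check_local_dependency` is a hypothetical helper, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def check_local_dependency(dependency):
    """Return a list of problems found with a local dependency entry."""
    problems = []
    path = PurePosixPath(dependency.get("path", ""))
    if path.suffix != ".kxi":
        problems.append("path must point to a .kxi file")
    if not path.is_absolute():
        problems.append("path should be an absolute filepath")
    return problems


print(check_local_dependency({"path": "/opt/packages/test-package-1.0.0.kxi"}))  # []
print(check_local_dependency({"path": "packages/test-package.zip"}))
```

The second call reports both problems, because the path is relative and does not end in .kxi.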

User Defined Functions

User Defined Functions (UDFs) are functions written in Python or q which have special meaning within kdb Insights Enterprise and are used in the deployment of named functions from a package to a Pipeline. The addition of UDFs is motivated by the need for users to define analytics in a streaming context while abstracting the underlying implementation logic and language used to define the UDF. This can be particularly useful in organisations with limited numbers of either q or Python developers who wish to make the most out of their development resources by allowing experts in these languages to define functionality that can be used by users of the other.

Within kdb Insights Enterprise, UDFs are presently supported for use within a pipeline as the input to any of the function nodes (map, filter, merge, split, etc.), allowing a user to associate persisted custom logic with a pipeline.

Defining a UDF

UDFs are defined within packages through the use of comments in q and decorators in Python. These constructs provide an association between the configuration of a UDF and the function linked with the UDF. In each case the following general construct is used:

q

// @udf.*

Python

from kxi.packages.decorators import udf

@udf.*

Where in each case * within the definition @udf.* can be one of the following:

value	description	required	default
`name`	The name by which the underlying UDF will be associated when referenced by Insights APIs.	`yes`	`N/A`
`description`	A user supplied description allowing a user to discern the motivation for the UDF.	`no`	`""`
`tag`	A user-specified tag outlining where in an Insights deployment the UDF is to be used. This information is not actioned; it is defined to allow segmentation of user code.	`no`	`""`
`category`	A user-specified category or list of categories defining where, within a tagged section of the Insights deployment, the UDF is to be deployed. For example, `@udf.category(["map", "filter"])` defines usage within the `map` and `filter` nodes of a Pipeline.	`no`	`""`

The following provides examples of a number of fully defined UDFs within each language:

q

Fully-Described

// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
  // Functional select: keep rows where the named column exceeds the threshold
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }

Minimal-Information

// @udf.name("custom_map")
.test.my_custom_udf:{[table;params]
  // Functional select: keep rows where the named column exceeds the threshold
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }
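
To make the link between a comment block and the function beneath it concrete, the following Python sketch scans q source text and pairs each run of `// @udf.*` comments with the next assignment. It is purely illustrative: `collect_udfs` and both regular expressions are assumptions made for this example, and this is not how kdb Insights Enterprise itself discovers UDFs.

```python
import re

# Matches annotation comments such as: // @udf.name("custom_map")
ANNOTATION = re.compile(r'^\s*//\s*@udf\.(\w+)\((.*)\)\s*$')
# Matches the start of a q assignment such as: .test.my_custom_udf:{
ASSIGNMENT = re.compile(r'^\s*([.\w]+)\s*:')


def collect_udfs(source):
    """Pair each block of @udf comments with the function assigned beneath it."""
    udfs, pending = [], {}
    for line in source.splitlines():
        annotation = ANNOTATION.match(line)
        if annotation:
            key, raw = annotation.groups()
            pending[key] = raw.strip().strip('"')
            continue
        assignment = ASSIGNMENT.match(line)
        if pending and assignment:
            udfs.append({"function": assignment.group(1), **pending})
        if line.strip():
            # Any other non-blank line ends the current comment block.
            pending = {}
    return udfs


q_source = '''
// @udf.name("custom_map")
// @udf.tag("sp")
.test.my_custom_udf:{[table;params]
  table
  }
'''
print(collect_udfs(q_source))
# [{'function': '.test.my_custom_udf', 'name': 'custom_map', 'tag': 'sp'}]
```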

Python

Fully-Described

from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

Minimal-Information

from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

Usage

As noted above, UDFs can presently be used within a Pipeline. Within kdb Insights Enterprise this is supported in the drag-and-drop Pipeline UI or via the definition of Pipelines in the Explore window.

Within the context of the Pipeline, UDFs are retrieved using the .qsp.udf and qsp.udf functions in q and Python respectively.

For examples of their usage see the kdb Insights Enterprise quickstart guide here.

Constraints

The definition of UDFs comes with the following constraints:

A UDF must take at least two parameters, with a maximum of eight parameters supported.
The final parameter in the UDF is a reserved parameter used to modify the UDF behaviour for execution. When loading a UDF within a Pipeline, this parameter is auto-populated as an empty dictionary unless otherwise specified.
If defined in q, the function to be registered as a UDF must appear beneath the comment block with which it is associated and must be defined with its full namespace, namely:

Incorrect-Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is incorrect as UDF will not resolve .test namespace")
user_defined_func:{[data;params]pi*square data}

Supported-Behaviour

\d .test

pi:3.14

square:{x wsum x}

// @udf.name("test")
// @udf.description("This is correct as UDF will be resolved in correct namespace")
.test.user_defined_function:{[data;params]pi*square data}
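
For UDFs written in Python, the parameter-count constraint listed above (at least two and at most eight parameters) can be checked before deployment. The sketch below uses the standard library `inspect` module; `check_udf_signature` is a hypothetical helper and not part of the kxi packages API.

```python
import inspect


def check_udf_signature(func):
    """Raise ValueError unless func takes between two and eight parameters."""
    count = len(inspect.signature(func).parameters)
    if not 2 <= count <= 8:
        raise ValueError(f"{func.__name__} takes {count} parameters; expected 2 to 8")
    return count


def py_udf(table, params):
    return table


print(check_udf_signature(py_udf))  # 2
```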