Creating a package

This section explains how to create a package using the kdb Insights CLI, add code to that package, and generate a versioned *.kxi package artifact that can be uploaded to an instance of kdb Insights Enterprise following the instructions outlined here. The API documentation for package interactions via the CLI can be found here; it provides a breakdown of the commands available for interacting with generated packages.

Pre-requisites

Before creating a package, ensure the following pre-requisites are met in the local environment or CI setup where the package is created:

  1. You have installed the kdb Insights CLI following the instructions here.
  2. You have set the following environment variables, which denote the storage locations for installed local packages and for artifacts respectively:

    Environment Variable    Default Value
    KX_PACKAGE_PATH         /tmp/packages/
    KX_ARTIFACT_PATH        /tmp/artifacts/
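
A quick way to confirm which locations are in effect is to read the two variables from a script. The following is a minimal sketch, assuming the CLI falls back to the default values listed above when a variable is unset; the helper name resolve_kx_paths is hypothetical and not part of the kdb Insights CLI.

```python
import os
from pathlib import Path

# Defaults copied from the table above. The assumption that an unset
# variable falls back to these values is ours, not confirmed behaviour.
DEFAULTS = {
    "KX_PACKAGE_PATH": "/tmp/packages/",
    "KX_ARTIFACT_PATH": "/tmp/artifacts/",
}


def resolve_kx_paths(env=None):
    """Return the effective package and artifact directories as Path objects."""
    env = os.environ if env is None else env
    # `or default` also covers a variable that is set but empty.
    return {name: Path(env.get(name) or default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, path in resolve_kx_paths().items():
        status = "exists" if path.is_dir() else "missing"
        print(f"{name} -> {path} ({status})")
```

Running this before any package command in a CI job makes it obvious when a runner is using the defaults rather than the intended locations.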

Initializing a package

Use the kdb Insights CLI to initialize packages; in particular, use the following command:

$ kxi package init --help
Usage: kxi package init [OPTIONS] [PATH_TO_PACKAGE]

  Creates a bare package at the specified target path.

  Note: this will not be saved to your KX_PACKAGE_PATH
  unless `install`  is explicitly run

Options:
  --force  Will overwrite if the dir is not empty
  --help   Show this message and exit.

When you are initializing a package locally it is important to take the following into account:

  • Do not initialize packages within the locations set by the KX_PACKAGE_PATH or KX_ARTIFACT_PATH environment variables. Both locations must maintain a specific structure so that installed packages and artifacts, respectively, can be discovered. Adding development packages to these locations can result in errors.
  • Aside from these restricted locations, packages can be initialized anywhere within your workspace. It is suggested, however, that package development takes place in an isolated and empty directory to avoid polluting repositories/folders with irrelevant packages.
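
The first restriction can be checked before running kxi package init. The following sketch tests whether a target directory lies inside either reserved location; the function names are hypothetical, and the default arguments are the default values from the pre-requisites table rather than values read from your environment.

```python
from pathlib import Path


def is_inside(candidate, parent):
    """True if candidate is parent itself or lies anywhere beneath it."""
    candidate = Path(candidate).resolve()
    parent = Path(parent).resolve()
    return candidate == parent or parent in candidate.parents


def safe_to_init(target, package_path="/tmp/packages/", artifact_path="/tmp/artifacts/"):
    """True if target is outside both the package and the artifact location."""
    return not (is_inside(target, package_path) or is_inside(target, artifact_path))


if __name__ == "__main__":
    for target in ("/tmp/packages/test_pkg", "/home/user/my_packages/test_pkg"):
        print(target, "->", "ok" if safe_to_init(target) else "reserved location")
```

Paths are resolved before comparison, so a relative target such as ../packages/test_pkg is checked against the same absolute locations.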

The following provides you with a basic example of creating a new package and outlines some of the important characteristics of this structure:

  1. Create a new empty directory and move into this location

    $ mkdir my_packages
    $ cd my_packages
    
  2. Initialize a package named test_pkg

    $ kxi package init test_pkg
    Creating package at location: test_pkg
    Writing test_pkg/manifest.json
    

Once the package has been initialized, you can start to inspect its contents further. For more information on this, read the documentation regarding package components here. This provides you with a breakdown of all configuration options available, plus how you can customize usage of your package.

Adding code to a package

Once a package has been created, you can add code to it as you would with any code repository. The following restrictions apply to reserved folder and file names within a package, based on near-term expectations for new functionality:

  • Do not use folders named pipelines, reports, or databases at the root of your package.
  • Do not use the file names udfs.json and manifest.json at the root of your package for your own files. These files are populated through the package API.
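
These naming rules are easy to verify automatically, for example in a pre-commit hook. The sketch below takes the root-level names you intend to add to a package and reports any that collide with the reserved names listed above; the function name is hypothetical, and the reserved set contains only the names mentioned in this section.

```python
# Reserved root-level names, as listed in this section.
RESERVED_ROOT_NAMES = {
    "pipelines",      # reserved folder
    "reports",        # reserved folder
    "databases",      # reserved folder
    "udfs.json",      # populated through the package API
    "manifest.json",  # populated through the package API
}


def reserved_conflicts(new_root_entries):
    """Return the entries that collide with reserved root-level names, sorted."""
    return sorted(set(new_root_entries) & RESERVED_ROOT_NAMES)


if __name__ == "__main__":
    planned = ["src", "reports", "udfs.json", "init.q"]
    print(reserved_conflicts(planned))
```

Only names at the root of the package are covered by the restriction, so a nested folder such as src/reports would not be passed to this check.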

There are three important areas which should be understood when adding code to a package:

  1. Entrypoints
  2. User Defined Functions
  3. Loading files within packages

Entrypoints

As outlined here, entrypoints define the q/Python files which are used as the initialization scripts for a package. When loading a package using q or Python, entrypoints let you specify which sub-sections of your package's code are loaded. For example, assume you have the following entrypoint definition within your package's manifest.json file:

   "entrypoints": {
       "default": "init.q",
       "sp": "src/sp.q",
       "data-access": "src/da.q",
       "aggregator": "src/agg.q"
   },

In the above example, data-access includes all code that is to be loaded within the data access processes of the database, while sp denotes code that is specifically intended to be loaded within the Stream Processor. The default entrypoint is assumed to load all code within the repository. Within the Python and q package APIs it is possible to load these entrypoints separately:

  • In Python, load the default entrypoint for version 1.0.0 of a package named test_pkg

    import kxi.packages as pakx
    pakx.init()
    pakx.packages.load("test_pkg", "1.0.0")
    
  • In Python, load the non-default entrypoint sp for the same package version and name

    import kxi.packages as pakx
    pakx.init()
    pakx.packages.load("test_pkg", "1.0.0", "sp")
    
  • In q, load the default entrypoint for version 1.0.0 of a package named test_pkg

    q).kxi.packages.load["test_pkg";"1.0.0"]
    
  • In q, load the non-default entrypoint sp for the same package version and name

    q).kxi.packages.load["test_pkg";"1.0.0";"sp"]
    

Of particular importance are the entrypoints named data-access and aggregator when querying the kdb Insights Enterprise database using Custom APIs. As outlined within this document, these entrypoints determine the code that is loaded by the data access and aggregator processes respectively when loading Custom Query APIs.
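
The mapping from entrypoint name to file can be inspected without loading the package. The following sketch reads the entrypoints block shown earlier and returns the file behind a given name. It illustrates the mapping only and is not how the package API resolves entrypoints internally; the function name is hypothetical, and the manifest is reduced to the single key used in this section.

```python
import json

# Manifest fragment reduced to the entrypoints block from this section.
MANIFEST_TEXT = """
{
  "entrypoints": {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q"
  }
}
"""


def entrypoint_file(manifest, name="default"):
    """Return the file registered for an entrypoint name in a parsed manifest."""
    entrypoints = manifest.get("entrypoints", {})
    if name not in entrypoints:
        raise KeyError(f"entrypoint {name!r} not defined; available: {sorted(entrypoints)}")
    return entrypoints[name]


manifest = json.loads(MANIFEST_TEXT)

if __name__ == "__main__":
    for name in ("default", "sp", "data-access", "aggregator"):
        print(f"{name} -> {entrypoint_file(manifest, name)}")
```

A check like this can catch a missing data-access or aggregator entry before a package that provides Custom Query APIs is uploaded.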

User Defined Functions

As outlined in depth here, User Defined Functions (UDFs) are functions written in Python or q which have special meaning within kdb Insights Enterprise. You can use them to deploy named, language-agnostic functions within a package to a Stream Processor.

You can define UDFs in Python/q using decorators or comments respectively as outlined here. When loaded, UDFs only load the file within which they are defined. This means that when you are defining UDFs, it is important to ensure that all logic required to execute the UDF is defined within the file. The following outlines some examples of UDF definitions in Python and q:

# Import from the full module path; an alias such as pakx cannot be used in a from-import
from kxi.packages.decorators import udf

import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table

// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
  // Functional select: the column name is supplied at runtime as a symbol in params
  ?[table;enlist(>;params`column;params`threshold);0b;()]
  }

Loading files within packages

Adding code to your packages requires the ability to load code contained in other files within the package. Do not load one file from another using relative or absolute paths. Instead, load files internal to your package using the kxi.packages.packages.load_file and .kxi.packages.file.load functions for Python and q respectively. These functions load files relative to the root of the package being loaded, or of the package within which a UDF is being loaded, so every load can be expressed as a path from the package root.

Examples of their usage within package files are as follows:

from kxi.packages import packages

# Load the file src/example_udf.py
packages.load_file("src/example_udf.py")

// Load the file src/example_udf.q
.kxi.packages.file.load["src/example_udf.q"]

Note

To facilitate the use of locked files the loading functionality will, by default, attempt to load the locked version of all files first followed by the loading of unlocked files.

Creating an uploadable artifact from a package

To generate a package artifact for upload to your kdb Insights Enterprise instance, use the command kxi package packit, which is defined as follows:

Usage: kxi package packit [OPTIONS] SOURCE

  Create a package artifact given source code directory.

Options:
  --keep-unlocked      Whether to keep the original unlocked q files once the
                       locked versions have been created.
  --lock-q-files       Whether to lock the contents of the q files in the
                       package not.
  --all-deps           Packages all deps together.
  --tag                Tag for release, omit the random dev hash suffix.
  --version TEXT       Override the version of the package.
  --package-name TEXT  Override the auto-generated package name.
  --help               Show this message and exit.

These options give you flexibility in how packages are bundled for upload, depending on business requirements. The following sections describe each option in more detail, allowing you to customize the creation of a package artifact locally or within your Continuous Integration (CI) environment.

Generating named, versioned production packages

By default, the name and version of a package are determined by the contents of the manifest.json file contained within the package. For many use-cases this is sufficient. However, when automating package generation within Continuous Integration/Continuous Deployment (CI/CD) processes, having to edit manifest.json can make it difficult to keep versions aligned with release artifacts. The --version option allows you to use the tag version of the code repository defining your package as the version of the artifact. Similarly, --tag allows you to determine, when generating your package, whether it is a production or a development version. The following shows the difference when generating a package without and with the --tag option:

$ kxi package packit .
Package validation was successful - all rules passed. 
Creating package from .
Package created: /tmp/artifacts/test_pkg1-0.0.1_5E8815A6.kxi
test_pkg1-0.0.1_5E8815A6.kxi

$ kxi package packit --tag .
Package validation was successful - all rules passed.
Creating package from .
Package created: /tmp/artifacts/test_pkg1-0.0.1.kxi
test_pkg1-0.0.1.kxi

As shown above, the key difference between the two runs is the omission of the dev hash suffix when using the --tag option. This is useful within a CI/CD workflow as it allows the same package to be generated within the testing and distribution phases of your workflow without requiring a change to the underlying code.

Usage of the --tag, --package-name and --version options results in changes to the manifest.json file contained within a package artifact. Specifically, your manifest.json file is updated to reflect the name and version given during the packit phase of your workflow.
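
In a CI/CD job it can be useful to tell development and tagged artifacts apart from the file name alone. The sketch below parses the two file names shown in the example output above. The pattern is inferred from those two names only (name, a three-part numeric version, and an optional hexadecimal suffix), so it is an assumption rather than a documented naming scheme, and a version supplied through --version in another format would not match.

```python
import re

# Pattern inferred from the two example artifact names in this section.
ARTIFACT_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)(?:_(?P<dev_hash>[0-9A-Fa-f]+))?\.kxi$"
)


def parse_artifact(filename):
    """Split an artifact file name into name, version and optional dev hash."""
    match = ARTIFACT_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised artifact name: {filename!r}")
    return match.groupdict()


def is_tagged(filename):
    """True when the artifact name carries no dev hash suffix."""
    return parse_artifact(filename)["dev_hash"] is None


if __name__ == "__main__":
    for artifact in ("test_pkg1-0.0.1_5E8815A6.kxi", "test_pkg1-0.0.1.kxi"):
        print(artifact, parse_artifact(artifact), "tagged" if is_tagged(artifact) else "dev")
```

A release stage could use is_tagged to refuse to publish an artifact that was built without --tag.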

Including all required information

By default, a package artifact includes only the content of the package you have explicitly defined. Your package may, however, include dependencies as outlined here. kdb Insights Enterprise only provides push access for packages introduced to the system. As such, should your package need dependencies, you are reliant on those dependencies already existing on your instance of kdb Insights Enterprise.

To counteract this, the option --all-deps is available. It allows you to include within your package, as *.kxi files, all dependencies your package requires and their sub-dependencies. This functionality is extremely useful but comes at a cost with respect to the size of your artifacts, which now include all dependent packages.

Locking your code

In certain circumstances, such as the dissemination of sensitive IP, it may be useful for you to be able to lock q code scripts. This process removes the ability to read the content of files and creates a locked representation of functions, which removes the ability to display the function content.

To lock code, you must have PyKX installed and operating in 'licensed' mode.

The following provides you with two examples of its usage.

  • Locking all code and removing the original copies of the q files

$ kxi package packit --lock-q-files --tag .
Package validation was successful - all rules passed.
Creating package from .
Package created: /tmp/artifacts/test_pkg-0.0.1.kxi
test_pkg-0.0.1.kxi

  • Locking all code and keeping the original copies of the q files

$ 
Package validation was successful - all rules passed.
Creating package from .
Package created: /tmp/artifacts/test_pkg-0.0.1.kxi
test_pkg-0.0.1.kxi
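
When packit is driven from a CI script, the option combinations described in this section can be assembled programmatically. The following sketch builds the argument list using only the flags shown in the help output above. The helper name packit_command is hypothetical, and the assumption that these flags can be freely combined is based on their descriptions rather than on tested behaviour.

```python
def packit_command(source=".", tag=False, version=None, package_name=None,
                   all_deps=False, lock_q_files=False, keep_unlocked=False):
    """Build the argument list for `kxi package packit` from keyword options."""
    cmd = ["kxi", "package", "packit"]
    if lock_q_files:
        cmd.append("--lock-q-files")
    if keep_unlocked:
        # Per the help text, this concerns the original unlocked q files once
        # locked versions exist, so it is assumed to accompany --lock-q-files.
        cmd.append("--keep-unlocked")
    if all_deps:
        cmd.append("--all-deps")
    if tag:
        cmd.append("--tag")
    if version is not None:
        cmd += ["--version", str(version)]
    if package_name is not None:
        cmd += ["--package-name", package_name]
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them.
    print(" ".join(packit_command(tag=True, lock_q_files=True)))
    print(" ".join(packit_command(tag=True, version="1.2.0", all_deps=True)))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the kdb Insights CLI is installed.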