Creating a package
This section explains how to create a package using the kdb Insights CLI, add code to it, and generate a versioned *.kxi package artifact that can be uploaded to an instance of kdb Insights Enterprise by following the instructions outlined here. The API documentation for package interactions via the CLI can be found here; it provides a breakdown of the commands available for interacting with generated packages.
Pre-requisites
Before creating a package, make sure the following pre-requisites are met in your local environment or in the CI setup where the package is created:
- You have installed the kdb Insights CLI following the instructions here.
- You have set the following environment variables, which denote the locations used to store installed local packages and generated artifacts, respectively:

    environment variable    default value
    KX_PACKAGE_PATH         /tmp/packages/
    KX_ARTIFACT_PATH        /tmp/artifacts/
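The following is a minimal sketch of how these two locations could be checked before any package command is run. The helper name `resolve_kx_paths` is hypothetical and is not part of the kdb Insights CLI; the sketch assumes that the default values listed above apply whenever a variable is unset or empty.

```python
import os

# Defaults copied from the list above; assumed to apply when a variable is unset.
DEFAULTS = {
    "KX_PACKAGE_PATH": "/tmp/packages/",
    "KX_ARTIFACT_PATH": "/tmp/artifacts/",
}


def resolve_kx_paths(env=None):
    """Return the effective package and artifact locations (hypothetical helper)."""
    env = os.environ if env is None else env
    # Fall back to the documented default when the variable is missing or empty.
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


# Report the locations that would be used in the current environment.
for name, location in resolve_kx_paths().items():
    print(f"{name}={location}")
```

Passing an explicit dictionary instead of relying on `os.environ` makes the helper easy to exercise in a CI job without modifying the real environment.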
Initializing a package
Packages are initialized with the kxi package init command of the kdb Insights CLI:
$ kxi package init --help
Usage: kxi package init [OPTIONS] [PATH_TO_PACKAGE]
Creates a bare package at the specified target path.
Note: this will not be saved to your KX_PACKAGE_PATH
unless `install` is explicitly run
Options:
--force Will overwrite if the dir is not empty
--help Show this message and exit.
When you are initializing a package locally it is important to take the following into account:
- Do not initialize packages within the KX_PACKAGE_PATH or KX_ARTIFACT_PATH locations, which are required to be set as environment variables. Both locations must maintain a specific structure so that packages and artifacts, respectively, can be identified. Adding development packages to these locations can result in errors.
- Aside from the restricted locations above, packages can be initialized anywhere within your workspace. However, package development is best done from an isolated, empty directory to avoid polluting repositories or folders with irrelevant packages.
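As a rough illustration of the first rule, the sketch below checks whether a candidate directory falls inside either restricted location. The function `is_safe_init_location` is a hypothetical helper, not something the CLI provides, and it assumes that an unset variable imposes no restriction.

```python
import os
from pathlib import Path


def is_safe_init_location(target, env=None):
    """Return True if `target` lies outside KX_PACKAGE_PATH and KX_ARTIFACT_PATH.

    Hypothetical helper: variables that are unset or empty are skipped.
    """
    env = os.environ if env is None else env
    target_path = Path(target).resolve()
    for name in ("KX_PACKAGE_PATH", "KX_ARTIFACT_PATH"):
        location = env.get(name)
        if not location:
            continue
        restricted = Path(location).resolve()
        # Reject the restricted directory itself and anything nested below it.
        if target_path == restricted or restricted in target_path.parents:
            return False
    return True
```

A check like this could run before `kxi package init` in a script, so that a development package is never created inside the installed-package or artifact stores by accident.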
The following provides you with a basic example of creating a new package and outlines some of the important characteristics of this structure:
- Create a new empty directory and move into this location:
    $ mkdir my_packages
    $ cd my_packages
- Initialize a package named test_pkg:
    $ kxi package init test_pkg
    Creating package at location: test_pkg
    Writing test_pkg/manifest.json
Once the package has been initialized, you can start to inspect its contents further. For more information on this, read the documentation regarding package components here. This provides you with a breakdown of all configuration options available, plus how you can customize usage of your package.
Adding code to a package
Once a package has been created, you can begin to add code to the package as would be expected of any code repository. The following caveats/restrictions apply with regard to reserved folders/files within a package based on near-term expectations for new functionality:
- Do not use folders named pipelines, reports, or databases at the root of your package.
- Do not use the file names udfs.json and manifest.json at the root of your package. Both can be populated through the package API, so these names should be avoided.
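The folder restriction above can be sketched as a small pre-commit or CI check. The helper `find_reserved_folders` is hypothetical. It deliberately checks only the reserved folder names, because `kxi package init` itself writes a manifest.json at the package root, so flagging that file would always fire.

```python
from pathlib import Path

# Folder names reserved at the package root, as listed above.
RESERVED_FOLDERS = ("pipelines", "reports", "databases")


def find_reserved_folders(package_root):
    """Return the reserved folder names present at the package root (hypothetical helper)."""
    root = Path(package_root)
    # Only direct children of the root are inspected; nested folders are not restricted.
    return [name for name in RESERVED_FOLDERS if (root / name).is_dir()]
```

An empty list means that no reserved folder name is in use at the root of the package.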
There are three important areas which should be understood when adding code to a package:
- Entrypoints
- User Defined Functions
- Loading files within packages
Entrypoints
As outlined here, entrypoints define the q/Python files which are used as the initialization scripts for a package. When loading a package using q or Python, entrypoints let you specify which sub-sections of your package's code are loaded. For example, assume you have the following entrypoints definition within your package's manifest.json file:
"entrypoints": {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q"
},
In the above example, data-access includes all code that is to be loaded within the data access processes of the database, while sp denotes code that is specifically intended to be loaded within the Stream Processor. You can also assume that the default entrypoint is used to load all code within the repository. Within the Python and q package APIs it is possible to load these entrypoints separately:
- Python: load the default entrypoint for version 1.0.0 of a package named test_pkg:
    import kxi.packages as pakx
    pakx.init()
    pakx.packages.load("test_pkg", "1.0.0")
- Python: load the non-default entrypoint sp for the same package name and version:
    import kxi.packages as pakx
    pakx.init()
    pakx.packages.load("test_pkg", "1.0.0", "sp")
- q: load the default entrypoint for version 1.0.0 of a package named test_pkg:
    q).kxi.packages.load["test_pkg";"1.0.0"]
- q: load the non-default entrypoint sp for the same package name and version:
    q).kxi.packages.load["test_pkg";"1.0.0";"sp"]
Of particular importance when dealing with entrypoints is the use of the entrypoints named data-access and aggregator when querying the kdb Insights Enterprise database using Custom APIs. As outlined within this document, these entrypoints determine the code that is loaded by the data access and aggregator processes, respectively, when loading Custom Query APIs.
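To make the name-to-file mapping concrete, the sketch below resolves an entrypoint name against a manifest fragment that mirrors the example in this section. It is an illustration only: `resolve_entrypoint` is a hypothetical helper and does not describe how the kxi.packages loader is implemented.

```python
import json

# Manifest fragment mirroring the entrypoints example in this section.
MANIFEST_TEXT = """
{
  "entrypoints": {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q"
  }
}
"""


def resolve_entrypoint(manifest, name="default"):
    """Return the file registered for an entrypoint name (hypothetical helper)."""
    entrypoints = manifest.get("entrypoints", {})
    if name not in entrypoints:
        raise KeyError(f"entrypoint {name!r} is not defined in the manifest")
    return entrypoints[name]


manifest = json.loads(MANIFEST_TEXT)
print(resolve_entrypoint(manifest))        # init.q
print(resolve_entrypoint(manifest, "sp"))  # src/sp.q
```

Omitting the name falls back to the default entrypoint, which matches the two-argument and three-argument forms of the load calls shown above.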
User Defined Functions
As outlined in depth here, User Defined Functions (UDFs) are functions written in Python or q which have special meaning within kdb Insights Enterprise. You can use them to deploy named, language-agnostic functions within a package to a Stream Processor.
You can define UDFs in Python using decorators, or in q using comments, as outlined here. Loading a UDF loads only the file in which it is defined, so all logic required to execute the UDF must be defined within that file. The following examples show UDF definitions in Python and q:
import numpy as np
# Import from the full module path; an alias such as pakx cannot be used in a from-import
from kxi.packages.decorators import udf

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column))
    return table
// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
    // Functional select: the column name is only known at runtime, so it is passed as a symbol
    ?[table;enlist (>;params`column;params`threshold);0b;()]
    }
Loading files within packages
Adding code to your packages requires the ability to load code contained in other files within the package. Do not load one file from another using relative or absolute paths. Instead, load files internal to your packages with the kxi.packages.packages.load_file and .kxi.packages.file.load functions for Python and q, respectively. These functions load files relative to the root of the package being loaded, or of the package within which a UDF is being loaded, so every load can be expressed as a path from the package root.
Examples of their usage within package files are as follows:
from kxi.packages import packages
# Load the file src/example_udf.py
packages.load_file("src/example_udf.py")
// Load the file src/example_udf.q
.kxi.packages.file.load["src/example_udf.q"]
Note
To facilitate the use of locked files, the loading functionality will, by default, attempt to load the locked version of all files first, followed by the unlocked files.
Creating an uploadable artifact from a package
A package artifact for upload to your kdb Insights Enterprise instance is generated with the command kxi package packit, which is defined as follows:
Usage: kxi package packit [OPTIONS] SOURCE
Create a package artifact given source code directory.
Options:
--keep-unlocked Whether to keep the original unlocked q files once the
locked versions have been created.
--lock-q-files Whether to lock the contents of the q files in the
package not.
--all-deps Packages all deps together.
--tag Tag for release, omit the random dev hash suffix.
--version TEXT Override the version of the package.
--package-name TEXT Override the auto-generated package name.
--help Show this message and exit.
This provides you with flexibility in how to bundle packages for upload depending on business requirements. The following provides some additional information on each of the options available allowing you to customize the creation of a package artifact locally or within your Continuous Integration (CI) environment.
Generating named, versioned production packages
By default, the name and version of a package are determined by the contents of the manifest.json file contained within the package. For many use-cases this is sufficient. However, when automating package generation within Continuous Integration/Continuous Deployment (CI/CD) processes, relying on the manifest alone can make it difficult to keep versions aligned with release artifacts. The --version option in particular allows you to use the tag version of the code repository defining your package as the version of the generated artifact. In a similar vein, --tag lets you determine, when generating your package, whether it is a production or a development version. The following outlines the difference when generating a package without and with the --tag option:
$ kxi package packit .
Package validation was successful - all rules passed.
Creating package from .
Package created: /tmp/artifacts/test_pkg1-0.0.1_5E8815A6.kxi
test_pkg1-0.0.1_5E8815A6.kxi
$ kxi package packit --tag .
Package validation was successful - all rules passed.
Creating package from .
Package created: /tmp/artifacts/test_pkg1-0.0.1.kxi
test_pkg1-0.0.1.kxi
As shown above, the key difference between the two invocations is the omission of the dev_hash suffix when using the --tag option. This is useful within a CI/CD workflow, as it allows the same package to be generated within the testing and distribution phases of your workflow without requiring a change to the underlying code.
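A CI job may need to tell development artifacts from tagged ones by file name alone. The sketch below parses the two file names shown above. The pattern is inferred from those two examples only (a three-part numeric version and a hexadecimal suffix) and is an assumption, not a documented naming format; `parse_artifact_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the two example file names above; an assumption,
# not a documented artifact naming format.
ARTIFACT_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)(?:_(?P<dev_hash>[0-9A-Fa-f]+))?\.kxi$"
)


def parse_artifact_name(filename):
    """Split an artifact file name into name, version and optional dev hash."""
    match = ARTIFACT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized artifact file name: {filename!r}")
    return match.groupdict()


dev = parse_artifact_name("test_pkg1-0.0.1_5E8815A6.kxi")
tagged = parse_artifact_name("test_pkg1-0.0.1.kxi")
print(dev["dev_hash"])     # 5E8815A6
print(tagged["dev_hash"])  # None
```

An artifact whose `dev_hash` is `None` was produced with `--tag`; any other value marks a development build.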
Usage of the --tag, --package-name and --version options results in changes to the manifest.json file contained within a package artifact. Specifically, your manifest.json file is updated to reflect the name and version given during the packit phase of your workflow.
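In a CI/CD job, the packit invocation is often assembled from values such as the repository tag. The sketch below builds the argument list using only the options listed in the help output above. The helper `build_packit_command` and the version value 1.2.0 are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_packit_command(source, version=None, package_name=None, tag=False):
    """Assemble a `kxi package packit` invocation as an argument list (hypothetical helper)."""
    command = ["kxi", "package", "packit"]
    if tag:
        # Omit the random dev hash suffix from the artifact name.
        command.append("--tag")
    if version is not None:
        command += ["--version", version]
    if package_name is not None:
        command += ["--package-name", package_name]
    # SOURCE is the final positional argument.
    command.append(source)
    return command


# Example: a release job building from the repository tag 1.2.0 (hypothetical value).
command = build_packit_command(".", version="1.2.0", tag=True)
print(shlex.join(command))  # kxi package packit --tag --version 1.2.0 .
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run(command, check=True)` without shell quoting concerns.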
Including all required information
By default, a package artifact includes only the content of the package you have explicitly defined. However, your package may include dependencies, as outlined here. kdb Insights Enterprise only provides push access for packages introduced to the system. As such, should your package need dependencies, you are reliant on those dependencies already existing on your instance of kdb Insights Enterprise.
To counteract this, the option --all-deps is available. It allows you to include within your package, as *.kxi files, all dependencies your package requires and their sub-dependencies. This functionality is useful but increases the size of your artifacts, which now include all dependent packages.
Locking your code
In certain circumstances, such as the dissemination of sensitive IP, it may be useful to lock q code scripts. Locking removes the ability to read the content of files and creates a locked representation of functions whose content cannot be displayed.
To lock code, you must have PyKX installed and operating in 'licensed' mode.
The following provides you with two examples of its usage.
- Locking all code and removing the original copies of the q files:
    $ kxi package packit --lock-q-files --tag .
    Package validation was successful - all rules passed.
    Creating package from .
    Package created: /tmp/artifacts/test_pkg-0.0.1.kxi
    test_pkg-0.0.1.kxi
- Locking all code and keeping the original copies of the q files (see the --keep-unlocked option above):
    $
    Package validation was successful - all rules passed.
    Creating package from .
    Package created: /tmp/artifacts/test_pkg-0.0.1.kxi
    test_pkg-0.0.1.kxi