Adding to a Package

We can add a variety of objects/code to a package. For the managed objects, we can check the add subcommand:

kxi package add --help

Usage: kxi package add [OPTIONS] COMMAND [ARGS]...

  Add an entity to the specified package.

Options:
  --to <Package>  Package, to which, the entity will be added.
  --help          Show this message and exit.

Commands:
  database           Add a database to the specified package.
  dep                Add a package dependency to the specified package.
  deployment-config  Add deployment_config to the specified package
  entitlements       Add entitlements to the specified package.
  entrypoint         Add an entrypoint to the specified package.
  patch              Add patch to the specified package
  pipeline           Add a pipeline to the specified package.
  router             Add a router to the specified package.
  table              Add a table to the specified package.

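A few objects are "special" cases where the addition is manual; the Entrypoints and UDF sections below describe how to handle them.

The remaining objects can be added with kxi package add --to mypkg <component>, using one of the commands listed above.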

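Each object controls some behaviour of the overall package, and the objects can be broadly broken into:

- Runtime Context
- Static Context
- Deployment Context

Each object can be added and removed (and, for pipelines, copied) using the kxi package add, kxi package rm and kxi package copy subcommands.

Editing each object must be done manually.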

Package add is destructive!

If you run:

kxi package add --to mypkg database --name mydb
# some editing of the files
kxi package add --to mypkg database --name mydb

The second add will create a clean database and overwrite your existing, edited database with this clean one.
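Because a second add overwrites your edits, it can be worth snapshotting a component before re-running kxi package add. The following is a minimal, hypothetical Python helper (backup_component is not part of the kxi CLI) that copies a component directory such as databases/mydb to a timestamped directory next to the package, assuming the mypkg/databases/mydb layout that the database add command creates.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_component(pkg_dir, rel_path):
    """Copy pkg_dir/rel_path (e.g. databases/mydb) to a timestamped backup directory.

    Hypothetical helper: call it before re-running `kxi package add`, which would
    otherwise replace the component with a clean copy.
    """
    pkg = Path(pkg_dir)
    src = pkg / rel_path
    if not src.exists():
        raise FileNotFoundError(f"nothing to back up at {src}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    # Place the backup next to the package directory, not inside it.
    dest = pkg.parent / f"{pkg.name}-backup-{stamp}" / rel_path
    # copytree creates dest and any missing parent directories.
    shutil.copytree(src, dest)
    return dest
```

For example, backup_component("mypkg", "databases/mydb") returns the path of the copy, which you can diff against or restore from after the second add.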

kxi package rm --help

Usage: kxi package rm [OPTIONS] COMMAND [ARGS]...

  Remove an entity from the specified package.

Options:
  -f, --from <Package>  Package from which the entity will be removed.
  --help                Show this message and exit.

Commands:
  database           Remove a database from the specified package
  dep                Remove a pipeline from the specified package.
  deployment-config  Remove deployment_config form the specified package
  entitlements       Remove entitlements from the specified package.
  entrypoint         Remove entrypoint from the specified package.
  patch              Remove deployment_config form the specified package
  pipeline           Remove a pipeline from the specified package.
  router             Remove a router from the specified package.
  table              Remove a table from the specified package

kxi package copy --help

Usage: kxi package copy [OPTIONS] COMMAND [ARGS]...

  Copy an entity from the specified package.

Options:
  -f, --from <Src-Package>   Package from which the entity will be copied.
  -t, --to <Target-Package>  Package to which the entity will be
                             copied;[default] `from`.
  --help                     Show this message and exit.

Commands:
  pipeline  Copy a pipeline from the specified package.

Runtime Context

The objects below control the package's runtime and load behaviour.

That is, what happens when we load a package (an entrypoint or a UDF) from a q or Python session.

Entrypoints

Entrypoints define the q/Python files which are used as the initialization script for a package. When loading a package using q or Python, entrypoints provide a method by which you can specify the sub-sections of your package's code to be loaded. This can be visualized as follows: assume you have the following entrypoint definition within your package's manifest.yaml file:

entrypoints:
   default: init.q
   sp: src/sp.q
   data-access: src/da.q
   aggregator: src/agg.q

In the above example, data-access includes all code that is to be loaded within the data access processes of the database, aggregator includes the code loaded by the aggregator processes, and sp denotes code that is specifically intended to be loaded within the Stream Processor. You can also assume that the default entrypoint should be used to load all code within the repository. The Python and q package APIs can load each of these entrypoints separately.
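As an illustration of how an entrypoint name maps to a file, here is a minimal Python sketch that looks a name up in the entrypoints mapping from the manifest above and falls back to default when no name is given, as the load calls below do. resolve_entrypoint is a hypothetical helper, not part of the kxi.packages API, and the mapping is written as a plain dict rather than read from manifest.yaml.

```python
from typing import Mapping, Optional

# The entrypoints section of the manifest.yaml example above, as a plain dict.
ENTRYPOINTS = {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q",
}


def resolve_entrypoint(entrypoints: Mapping[str, str], name: Optional[str] = None) -> str:
    """Return the file for an entrypoint name, falling back to 'default'."""
    key = name or "default"
    try:
        return entrypoints[key]
    except KeyError:
        available = ", ".join(sorted(entrypoints))
        raise KeyError(f"entrypoint {key!r} is not defined (available: {available})") from None
```

With this mapping, resolve_entrypoint(ENTRYPOINTS) gives init.q and resolve_entrypoint(ENTRYPOINTS, "sp") gives src/sp.q.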

Manually add Entrypoint

Currently, in order to add/change entrypoints we must modify the manifest file directly (using whichever text editor we prefer). There is no pakx add style command for entrypoints.


Load the default entrypoint for version 1.0.0 of a package named test_pkg

import kxi.packages as pakx
pakx.init()
pakx.packages.load("test_pkg", "1.0.0")

Load the non-default entrypoint sp for the same package version and name

import kxi.packages as pakx
pakx.init()
pakx.packages.load("test_pkg", "1.0.0", "sp")

Load the default entrypoint for version 1.0.0 of a package named test_pkg
```
q).kxi.packages.load["test_pkg";"1.0.0"]
```
Load the non-default entrypoint sp for the same package version and name
```
q).kxi.packages.load["test_pkg";"1.0.0";"sp"]
```

Of particular importance when dealing with entrypoints is the use of entrypoints named data-access and aggregator when querying the kdb Insights Enterprise database using Custom APIs. As outlined within this document, these entrypoints determine the code that is loaded by the data access and aggregator processes, respectively, when loading Custom Query APIs.

UDF

UDFs are functions written in Python or q which have special status in kdb Insights Enterprise.

You can use them to deploy named, language-agnostic functions within a package to a Stream Processor.

UDF scope

When loaded, a UDF only loads the file within which it is defined.

This means that when you are defining UDFs, it is important to ensure that all logic required to execute the UDF is defined within the file.

You can define UDFs in Python/q using decorators or comments respectively.


import kxi.packages as pakx
from pakx.decorators import udf

import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column),)
    return(table)

// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
  ?[table;enlist(>;params`column;params`threshold);0b;()]  / functional select, as the column name is only known at runtime
  }
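To illustrate the scope rule, here is a hypothetical Python UDF in which the helper it calls lives in the same file. The @udf decorators are left out so that the sketch runs without kxi.packages, and the table is modelled as a dict of column lists instead of the table type used in the examples above; both are assumptions made for illustration only.

```python
# Hypothetical file src/scale_udf.py: the UDF and every helper it calls live
# in this one file, because only this file is loaded with the UDF.


def _scale(values, factor):
    """Helper defined alongside the UDF that uses it."""
    return [v * factor for v in values]


def scale_column(table, params):
    """Multiply one column by params['factor'] and return a new table.

    The table is a dict of column lists (an assumption that keeps the
    sketch dependency-free).
    """
    column = params["column"]
    result = dict(table)  # shallow copy, so the input dict itself is not modified
    result[column] = _scale(table[column], params["factor"])
    return result
```

If _scale were moved into another file of the package, loading this UDF on its own would leave it undefined, which is exactly the failure the scope rule warns about.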

How to add UDFs

Currently, UDFs are not added manually. To add or change UDFs, the package framework searches all files for @udf definitions and writes them to a udfs file.

This search can be invoked by running pakx refresh mypackage or by creating an artifact.
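The discovery step can be pictured with the sketch below. It is not the package framework's implementation: scan_udfs is a hypothetical function that walks a package directory, picks up @udf.<field>("...") annotations in .py and .q files (including q comment lines), and records the function defined after them together with the language and file_path values that the UDF Fields table describes as inferred.

```python
import re
from pathlib import Path

# @udf.<field>("value") or @udf.<field>('value'), as in both examples above.
DECORATOR = re.compile(r"""@udf\.(\w+)\(\s*['"](.*?)['"]\s*\)""")
# Function definitions that follow the annotations.
PY_DEF = re.compile(r"^\s*def\s+(\w+)\s*\(")
Q_DEF = re.compile(r"^\s*([.\w]+)\s*:\s*\{")
LANGUAGE_BY_SUFFIX = {".py": "py", ".q": "q"}


def scan_udfs(pkg_dir):
    """Hypothetical sketch: collect @udf-annotated functions under pkg_dir."""
    root = Path(pkg_dir)
    found = []
    for path in sorted(root.rglob("*")):
        language = LANGUAGE_BY_SUFFIX.get(path.suffix)
        if language is None or not path.is_file():
            continue
        func_def = PY_DEF if language == "py" else Q_DEF
        pending = {}
        for line in path.read_text().splitlines():
            annotation = DECORATOR.search(line)
            if annotation:
                # Collect name/description/tag/category until the function appears.
                pending[annotation.group(1)] = annotation.group(2)
                continue
            definition = func_def.match(line) if pending else None
            if definition:
                found.append({
                    **pending,
                    "function": definition.group(1),  # inferred from code
                    "language": language,  # inferred from the file extension
                    "file_path": path.relative_to(root).as_posix(),  # inferred from context
                })
                pending = {}
    return found
```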

UDF Fields

Fields for UDF
=======================
field        required    type    class    description
-----------  ----------  ------  -------  ----------------------------------------------------------
uuid         False       UUID    UUID     None
name         True        str     str      Name of the UDF with no spaces
function     True        str     str      Native function name, inferred from code
language     True        str     str      Language, inferred from the file extension (py or q)
file_path    True        str     str      Location of the filepath within the package, inferred from
                                          context
udf_sym      True        str     str      The udf namespace this function is stored under with no
                                          spaces
description  False       str     str      A description of the udf
category     False       str     str      The categories of the udf, list or str
tag          False       str     str      Tag for the udf e.g. 1.0.0
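The required fields and the no-spaces rule in the table above can be checked with a small validator. validate_udf is a hypothetical helper written for illustration, operating on a plain dict whose keys are assumed to match the field names in the table.

```python
# Field names taken from the UDF Fields table; the required ones follow its
# "required" column.
REQUIRED_FIELDS = ("name", "function", "language", "file_path", "udf_sym")
NO_SPACE_FIELDS = ("name", "udf_sym")
VALID_LANGUAGES = ("py", "q")


def validate_udf(record):
    """Return a list of problems with a UDF record; an empty list means valid."""
    problems = [
        f"missing required field {field!r}"
        for field in REQUIRED_FIELDS
        if not record.get(field)
    ]
    for field in NO_SPACE_FIELDS:
        if " " in (record.get(field) or ""):
            problems.append(f"{field!r} must not contain spaces")
    language = record.get("language")
    if language and language not in VALID_LANGUAGES:
        problems.append("language must be 'py' or 'q'")
    return problems
```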

Static Context

The objects below control the package's relationship with other packages and any internal config mutations that we should make.

Dep

Deps (package dependencies) specify the other packages that our package makes use of.

They are covered in more depth in package dependencies.

To add a new dependency to your package you could do:

kxi package add --to mypkg dep --name mynewdep --version 1.0.0
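If you have several dependencies to add, the same command can be issued once per dependency. The sketch below is a hypothetical Python wrapper that only uses the --to, --name and --version options shown above; by default it just returns the argument lists, and with run=True it calls kxi through subprocess, assuming the kxi CLI is on your PATH.

```python
import subprocess


def add_deps(pkg, deps, run=False):
    """Build, and optionally run, one `kxi package add ... dep` command per dependency.

    deps maps each dependency name to its version. With run=False (the default)
    the argument lists are only returned, so the sketch can be tried without kxi.
    """
    commands = [
        ["kxi", "package", "add", "--to", pkg, "dep", "--name", name, "--version", version]
        for name, version in deps.items()
    ]
    if run:
        for argv in commands:
            # check=True raises CalledProcessError on the first failing command.
            subprocess.run(argv, check=True)
    return commands
```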

Dep Fields

from pakxcli.datamodels.packageDependency import PackageDependency as dep
from pakxcli.utils.datamodelUtils import ptype
ptype(dep, print)

Fields for PackageDependency
=======================
field     required    type    class                       description
--------  ----------  ------  --------------------------  -------------
name      False       str     str                         None
version   False       str     str                         None
repo      False       str     Union[str, Path, NoneType]  None
location  False       str     str                         None
path      False       str     Union[str, Path, NoneType]  None
kxi       False       str     Union[str, Path, NoneType]  None

Patch

Patches are snippets of config that act directly on the Package object.

They are covered in more depth in package overlays

To add a new patch to your package you could do:

kxi package add --to mypkg patch --name my-cool-new-patch

Patch Fields

Fields for Patch
=======================
field    required    type    class    description
-------  ----------  ------  -------  -------------
path     True        str     str      None
target   False       str     str      None

Deployment Context

The objects below control the package's behaviour when being deployed on the Insights platform.

That is, which processes need to be created, how those processes should be orchestrated and what is to be deployed.

All of the objects below are linked to Kubernetes resources and runtime.

Database

A database is required for any data persistence.

A package should only have one database defined

Although multiple database definitions are currently allowed, this will likely be revised in future. Please specify only one database; if multiple are specified, only one will be deployed.

Currently, a database houses a lot of configuration, including:

- schemas
- streams (I/O bus)
- rdb, idb, hdb (Data Access Processes)
- sm (Storage Manager)
- env

To add a new Database to your package you can run:

kxi package add --to mypkg database --name mydb

This will add a new directory with the following structure:

mypkg/databases/mydb/shards/mydb-shard.yaml
mypkg/databases/mydb/schemas.yaml
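After running the add command, a quick sanity check can confirm that the expected files exist and that only one database is defined, in line with the note above. check_databases is a hypothetical helper, and it assumes every database follows the databases/<name>/shards/<name>-shard.yaml and databases/<name>/schemas.yaml pattern shown above.

```python
from pathlib import Path


def check_databases(pkg_dir):
    """Return a list of problems with a package's databases/ directory.

    Hypothetical sanity check based on the layout created by
    `kxi package add --to mypkg database --name mydb`.
    """
    root = Path(pkg_dir)
    db_dirs = sorted(p for p in (root / "databases").glob("*") if p.is_dir())
    problems = []
    if len(db_dirs) > 1:
        # Multiple definitions are allowed, but only one database is deployed.
        names = ", ".join(p.name for p in db_dirs)
        problems.append(f"more than one database defined: {names}")
    for db in db_dirs:
        expected = [db / "shards" / f"{db.name}-shard.yaml", db / "schemas.yaml"]
        for path in expected:
            if not path.is_file():
                problems.append(f"missing {path.relative_to(root).as_posix()}")
    return problems
```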

The generated files can then be modified manually or using patches as required, and they should be deployable "as is". Below are some of the top-level objects that make up a database.

DB Fields

Fields for Database
=======================
field    required    type       class      description
-------  ----------  ---------  ---------  -------------
uuid     False       UUID       UUID       None
name     False       str        str        DB Name
shards   False       Shard      Shard      Shards in DB
tables   False       TableList  TableList  Schemas in DB

Shard Fields

Fields for Shard
=======================
field       required    type    class      description
----------  ----------  ------  ---------  ------------------------
uuid        False       UUID    UUID       None
name        False       str     str        Name identifier of shard
labels      False       str     str        Shard labels
sm          False       Sm      Sm         Storage Manager Object
daps        False       Dap     Dap        Data Access object
sequencers  False       Dict    Sequencer  Message bus into system
mounts      False       Dict    Mounts     PVs used in package

Schema Fields

Deployment-config

Deployment Configs are for defining some "top level" deployment rules when deploying your package.

To add a new deployment-config to your package you can run:

kxi package add --to mypkg deployment-config

This will add a new directory with the following structure:

mypkg/deployment-config/deployment-config.yaml

Deployment Config Fields

Fields for DeploymentConfig
=======================
field             required    type     class            description
----------------  ----------  -------  ---------------  ----------------------------------------------------------
uuid              False       UUID     UUID             None
name              False       str      str              Name of the deployment_config object
attach            False       bool     bool             Enable tty and stdin for each process in the kdb+ insights
                                                        cluster
env               False       List     EnvItem          Env vars for every process in the kdb+ insights cluster
imagePullPolicy   False       str      str              Image pull secret policy
imagePullSecrets  False       List     ImagePullSecret  List of image registry secrets
license           False       License  License          License Object to be shared for all CRs
qlog              False       Qlog     Qlog             Assembly logging configuration

Pipeline

Pipelines are used to enrich and stream data. You can add multiple pipelines to your package.

To add a new pipeline to your package you can run:

kxi package add --to mypkg pipeline --name my-pipeline

This will add a new directory with the following structure:

mypkg/pipelines/my-pipeline.yaml

You are then expected to update the YAML file to point at the correct spec (for example, spec: src/mypipelinecode.q).
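Since the spec must be set by hand, a small script can make the change when you manage many pipelines. set_pipeline_spec below is a hypothetical, dependency-free sketch that assumes spec is a top-level, unindented key in the pipeline YAML; it rewrites that line, or appends one if the key is missing. For anything beyond this simple case, a real YAML parser is the safer choice.

```python
import re

# Matches a top-level (unindented) "spec:" line, e.g. "spec: src/mypipelinecode.q".
SPEC_LINE = re.compile(r"^spec:.*$", flags=re.MULTILINE)


def set_pipeline_spec(yaml_text, spec_path):
    """Return yaml_text with its top-level spec pointing at spec_path."""
    new_line = f"spec: {spec_path}"
    if SPEC_LINE.search(yaml_text):
        # A function replacement stops re from treating backslashes in the path as escapes.
        return SPEC_LINE.sub(lambda _match: new_line, yaml_text, count=1)
    separator = "\n" if yaml_text and not yaml_text.endswith("\n") else ""
    return f"{yaml_text}{separator}{new_line}\n"
```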

Pipeline Fields

Fields for Pipeline
=======================
field                       required    type                        class                       description
--------------------------  ----------  --------------------------  --------------------------  -----------------------------------------------------------
uuid                        False       UUID                        UUID                        None
base                        False       Base                        Base                        Image base
config                      False       Dict                        str                         Additional configuration to be applied to SP Pipeline
                                                                                                Assembly element
configMaps                  False       List                        ConstrainedStrValue         Pre-configured Kubernetes config maps to inject into
                                                                                                pipeline
controller                  False       Controller                  Controller                  Configure Pipeline Controller
destination                 False       ConstrainedStrValue         ConstrainedStrValue         Sequencer Bus to publish to
env                         False       List                        EnvItem8                    Environment Variables
group                       False       ConstrainedStrValue         ConstrainedStrValue         Groups a pipeline into a set of replicas that have a
                                                                                                matching group id
id                          False       str                         str                         SP Pipeline ID
imagePullSecrets            False       List                        ConstrainedStrValue         Pre-configured Kubernetes imagePullSecrets to inject into
                                                                                                pipeline
maxWorkers                  False       ConstrainedIntValue         ConstrainedIntValue         Maximum worker instances
minWorkers                  False       ConstrainedIntValue         ConstrainedIntValue         Minimum worker instances
monitoring                  False       bool                        bool                        Enable monitoring on Pipeline pods
name                        False       ConstrainedStrValue         ConstrainedStrValue         SP Pipeline name
protectedExecution          False       bool                        bool                        Enable Protected Execution
replicaAffinityTopologyKey  False       ReplicaAffinityTopologyKey  ReplicaAffinityTopologyKey  The key of node labels. If two Nodes are labelled with this
                                                                                                key and have identical values for that label, the scheduler
                                                                                                treats both Nodes as being in the same topology. Used for
                                                                                                affinity and anti-affinity rules related to replicas.
replicas                    False       ConstrainedIntValue         ConstrainedIntValue         Number of pipeline replicas
secrets                     False       List                        ConstrainedStrValue         Pre-configured Kubernetes secrets to inject into pipeline
source                      False       ConstrainedStrValue         ConstrainedStrValue         Sequencer Bus to subscribe to
spec                        True        str                         str                         Worker spec
type                        False       Type                        Type                        "graph" or "spec" pipeline deployment
volumes                     False       List                        Volume1                     List of volumes to attach to Pipeline
worker                      False       Worker                      Worker                      Configure Pipeline Worker

Router

Routers define the query environment and resource coordinator used when deploying your package.

To add a new router to your package you can run:

kxi package add --to mypkg router

This will add a new directory with the following structure:

mypkg/router/router.yaml

Router Fields

Fields for Router
=======================
field    required    type              class             description
-------  ----------  ----------------  ----------------  -----------------------------------------
uuid     False       UUID              UUID              None
name     False       str               str               Name of the router
rc       False       Rc                Rc                Resource Coordinator object
agg      False       Agg               Agg               Aggregator object
qe       False       QueryEnvironment  QueryEnvironment  Configure Query Environments for Assembly

Entitlements

Entitlements define read/write permissions for your package; however, they are currently unused.