Adding to a Package
We can add a variety of objects and code to a package. For the managed objects, we can check the add subcommand:
kxi package add --help
Usage: kxi package add [OPTIONS] COMMAND [ARGS]...
Add an entity to the specified package.
Options:
--to <Package> Package, to which, the entity will be added.
--help Show this message and exit.
Commands:
database Add a database to the specified package.
dep Add a package dependency to the specified package.
deployment-config Add deployment_config to the specified package
entitlements Add entitlements to the specified package.
entrypoint Add an entrypoint to the specified package.
patch Add patch to the specified package
pipeline Add a pipeline to the specified package.
router Add a router to the specified package.
table Add a table to the specified package.
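Objects fall into two groups: "special" cases where the addition is manual (entrypoints and UDFs), and cases where we can run kxi package add --to mypkg <component>.
Each object controls some behaviour of the overall package that can be broadly broken into runtime context, static context and deployment context.
Each object can be added, removed and, in the case of pipelines, copied using the kxi package add, rm and copy subcommands, whose help output is shown in this section.
Editing each object must be done manually.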
Package add is destructive!
If you do:
kxi package add --to mypkg database --name mydb
# some editing of the files
kxi package add --to mypkg database --name mydb
the second add overwrites the edits you made to the files.
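Because a repeated add can discard manual edits, it may be worth snapshotting a component directory before re-running the command. The following is a minimal sketch and not part of the kxi tooling: the helper name backup_component and the backups directory are hypothetical, and the demo uses a throwaway directory standing in for mypkg/databases/mydb.

```python
import shutil
import tempfile
from pathlib import Path


def backup_component(component_dir: Path, backup_root: Path) -> Path:
    """Copy component_dir into backup_root and return the path of the copy.

    Hypothetical helper, not part of the kxi CLI. The copy is kept outside
    the package so it is not mistaken for package content.
    """
    target = backup_root / component_dir.name
    if target.exists():
        shutil.rmtree(target)  # keep only the most recent snapshot
    shutil.copytree(component_dir, target)
    return target


# Demo on a throwaway directory standing in for mypkg/databases/mydb.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "mypkg" / "databases" / "mydb"
    db_dir.mkdir(parents=True)
    (db_dir / "schemas.yaml").write_text("# hand-edited schema\n")

    saved = backup_component(db_dir, Path(tmp) / "backups")
    print((saved / "schemas.yaml").read_text(), end="")
```

After re-running the add, the saved copy can be diffed against the regenerated files to restore any edits that were lost.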
kxi package rm --help
Usage: kxi package rm [OPTIONS] COMMAND [ARGS]...
Remove an entity from the specified package.
Options:
-f, --from <Package> Package from which the entity will be removed.
--help Show this message and exit.
Commands:
database Remove a database from the specified package
dep Remove a pipeline from the specified package.
deployment-config Remove deployment_config form the specified package
entitlements Remove entitlements from the specified package.
entrypoint Remove entrypoint from the specified package.
patch Remove deployment_config form the specified package
pipeline Remove a pipeline from the specified package.
router Remove a router from the specified package.
table Remove a table from the specified package
kxi package copy --help
Usage: kxi package copy [OPTIONS] COMMAND [ARGS]...
Copy an entity from the specified package.
Options:
-f, --from <Src-Package> Package from which the entity will be copied.
-t, --to <Target-Package> Package to which the entity will be
copied;[default] `from`.
--help Show this message and exit.
Commands:
pipeline Copy a pipeline from the specified package.
Runtime Context
The objects below control the package's runtime and load behaviour, that is, what happens when we load a package (entrypoint or UDF) from a q or Python session.
Entrypoints
Entrypoints define the q/Python files which are used as the initialization script for a package. When loading a package using q or Python, entrypoints provide a method by which you can specify the sub-sections of your package's code to be loaded. This can be visualized as follows: assume you have the following entrypoint definition within your package's manifest.yaml file:
entrypoints:
    default: init.q
    sp: src/sp.q
    data-access: src/da.q
    aggregator: src/agg.q
In the above example, data-access includes all code that is to be loaded within the data access processes of the database, while sp denotes code that is specifically intended to be loaded within the Stream Processor. You can also assume that the default entrypoint should be used to load all code within the repository. Within the Python and q package APIs it is possible to load these entrypoints separately.
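As a conceptual illustration of the mapping above, the sketch below models the entrypoints section as a plain dictionary and looks up the script for a named entrypoint. It is only a mental model: the function resolve_entrypoint is hypothetical, and how the package APIs resolve entrypoints internally is not described here.

```python
from typing import Dict

# Mirrors the `entrypoints` mapping from the manifest.yaml example above.
ENTRYPOINTS: Dict[str, str] = {
    "default": "init.q",
    "sp": "src/sp.q",
    "data-access": "src/da.q",
    "aggregator": "src/agg.q",
}


def resolve_entrypoint(entrypoints: Dict[str, str], name: str = "default") -> str:
    """Return the script path registered for the entrypoint `name`.

    Hypothetical helper for illustration; not the package API itself.
    """
    try:
        return entrypoints[name]
    except KeyError:
        known = ", ".join(sorted(entrypoints))
        raise KeyError(f"unknown entrypoint {name!r}; known: {known}") from None


print(resolve_entrypoint(ENTRYPOINTS))        # script for the default entrypoint
print(resolve_entrypoint(ENTRYPOINTS, "sp"))  # script for the Stream Processor
```

Raising on an unknown name, rather than silently falling back to default, is a design choice of this sketch only.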
Manually add Entrypoint
Currently, in order to add or change entrypoints we must modify the manifest file directly (using whichever text editor we prefer). There is no pakx add style command for entrypoints.
Python:
- Load the default entrypoint for version 1.0.0 of a package named test_pkg
  import kxi.packages as pakx
  pakx.init()
  pakx.packages.load("test_pkg", "1.0.0")
- Load the non-default entrypoint sp for the same package version and name
  import kxi.packages as pakx
  pakx.init()
  pakx.packages.load("test_pkg", "1.0.0", "sp")
q:
- Load the default entrypoint for version 1.0.0 of a package named test_pkg
  q).kxi.packages.load["test_pkg";"1.0.0"]
- Load the non-default entrypoint sp for the same package version and name
  q).kxi.packages.load["test_pkg";"1.0.0";"sp"]
Of particular importance when dealing with entrypoints is the use of the entrypoints named data-access and aggregator when querying the kdb Insights Enterprise database using Custom APIs. As outlined within this document, these entrypoints determine the code that is loaded by the data access and aggregator processes respectively when loading Custom Query APIs.
UDF
More info on UDFs
UDFs are functions written in Python or q which have special status in kdb Insights Enterprise.
You can use them to deploy named, language-agnostic functions within a package to a Stream Processor.
UDF scope
When loaded, UDFs only load the file within which they are defined.
This means that when you are defining UDFs, it is important to ensure that all logic required to execute the UDF is defined within the file.
You can define UDFs in Python/q using decorators or comments respectively.
import kxi.packages as pakx
from pakx.decorators import udf
import numpy as np

@udf.name('custom_py_map')
@udf.description('Custom Python UDF making use of numpy')
@udf.tag('sp')
@udf.category('map')
def py_udf(table, params):
    mod_column = table[params['column']]
    # Multiply the content of the column to be modified by random values between 0 and 1
    table[params['column']] = mod_column * np.random.random_sample(len(mod_column),)
    return table
// @udf.name("custom_map")
// @udf.description("Custom map function providing filtering against incoming data for a specified column and maximum threshold.")
// @udf.tag("sp")
// @udf.category("map")
.test.my_custom_udf:{[table;params]
    select from table where params[`column]>params`threshold
    }
How to add UDFs
Currently, in order to add or change UDFs, the package framework searches all files for @udf definitions and writes them to a udfs file. There is no manual addition of UDFs.
This search can be invoked by running pakx refresh mypackage or by creating an artifact.
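To make the idea of the search concrete, here is a rough sketch of scanning a package directory for @udf.name annotations in both Python decorators and q comments. This is an illustration only and is not how the package framework is implemented; the function find_udf_names, the regular expression and the file names in the demo are all assumptions.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# Matches Python decorators (@udf.name('x')) and q comments (// @udf.name("x")).
UDF_NAME = re.compile(r"""@udf\.name\(\s*['"]([^'"]+)['"]\s*\)""")


def find_udf_names(package_dir: Path) -> Dict[str, List[str]]:
    """Map each .py/.q file (relative path) to the UDF names annotated in it.

    Hypothetical helper; the real framework's discovery logic may differ.
    """
    found: Dict[str, List[str]] = {}
    for path in sorted(package_dir.rglob("*")):
        if path.is_file() and path.suffix in {".py", ".q"}:
            names = UDF_NAME.findall(path.read_text(encoding="utf-8"))
            if names:
                found[path.relative_to(package_dir).as_posix()] = names
    return found


# Demo with two throwaway source files (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp)
    (pkg / "src").mkdir()
    (pkg / "src" / "udfs.py").write_text(
        "@udf.name('custom_py_map')\ndef py_udf(table, params):\n    return table\n"
    )
    (pkg / "src" / "udfs.q").write_text(
        '// @udf.name("custom_map")\n.test.my_custom_udf:{[table;params] table}\n'
    )
    print(find_udf_names(pkg))
```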
UDF Fields
Fields for UDF
=======================
field        required    type    class    description
-----------  ----------  ------  -------  ----------------------------------------------------------
uuid         False       UUID    UUID     None
name         True        str     str      Name of the UDF with no spaces
function     True        str     str      Native function name, inferred from code
language     True        str     str      Language, inferred from the file extension (py or q)
file_path    True        str     str      Location of the filepath within the package, inferred from
                                          context
udf_sym      True        str     str      The udf namespace this function is stored under with no
                                          spaces
description  False       str     str      A description of the udf
category     False       str     str      The categories of the udf, list or str
tag          False       str     str      Tag for the udf e.g. 1.0.0
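The required flags and the "no spaces" notes in the table can be turned into a quick sanity check. The sketch below is an assumption-laden illustration: udf_record_problems is a hypothetical helper, the record is a plain dictionary rather than the framework's own data model, and the file_path and udf_sym values are made up.

```python
from typing import Any, Dict, List

# Fields marked required=True in the table above.
REQUIRED_UDF_FIELDS = ("name", "function", "language", "file_path", "udf_sym")
# Fields the table describes as having no spaces.
NO_SPACE_FIELDS = ("name", "udf_sym")


def udf_record_problems(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a UDF record (empty if none)."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_UDF_FIELDS
        if not record.get(field)
    ]
    problems += [
        f"{field} must not contain spaces"
        for field in NO_SPACE_FIELDS
        if " " in str(record.get(field, ""))
    ]
    return problems


# Illustrative record; file_path and udf_sym values are hypothetical.
record = {
    "name": "custom_map",
    "function": ".test.my_custom_udf",
    "language": "q",
    "file_path": "src/udfs.q",
    "udf_sym": "custom_map",
}
print(udf_record_problems(record))
```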
Static Context
The objects below control the package's relationship with other packages and any internal config mutations that we should make.
Dep
Deps or Package Dependencies specify dependencies that our package makes use of.
They are covered in more depth in package dependencies.
To add a new dependency to your package you could do:
kxi package add --to mypkg dep --name mynewdep --version 1.0.0
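When several dependencies need adding, the same command can be generated in a loop. The sketch below only builds and prints the command lines, using the --to, --name and --version options shown above; the helper dep_add_command and the dependency names are hypothetical.

```python
import shlex
from typing import Dict, List


def dep_add_command(package: str, name: str, version: str) -> List[str]:
    """Build the argument list for `kxi package add --to <package> dep ...`."""
    return ["kxi", "package", "add", "--to", package,
            "dep", "--name", name, "--version", version]


# Hypothetical dependencies, used only for illustration.
deps: Dict[str, str] = {"mynewdep": "1.0.0", "anotherdep": "2.1.0"}

for dep_name, dep_version in deps.items():
    cmd = dep_add_command("mypkg", dep_name, dep_version)
    # Printed rather than executed; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Building an argument list rather than a shell string avoids quoting problems if a package or dependency name contains unusual characters.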
Dep Fields
from pakxcli.datamodels.packageDependency import PackageDependency as dep
from pakxcli.utils.datamodelUtils import ptype
ptype(dep, print)
Fields for PackageDependency
=======================
field     required    type    class                       description
--------  ----------  ------  --------------------------  -------------
name      False       str     str                         None
version   False       str     str                         None
repo      False       str     Union[str, Path, NoneType]  None
location  False       str     str                         None
path      False       str     Union[str, Path, NoneType]  None
kxi       False       str     Union[str, Path, NoneType]  None
Patch
Patches are snippets of config that act directly on the Package object.
They are covered in more depth in package overlays.
To add a new patch to your package you could do:
kxi package add --to mypkg patch --name my-cool-new-patch
Patch Fields
Fields for Patch
=======================
field    required    type    class    description
-------  ----------  ------  -------  -------------
path     True        str     str      None
target   False       str     str      None
Deployment Context
The objects below control the package's behaviour when being deployed on the Insights platform, that is, which processes need to be created, how we should orchestrate those processes and what is to be deployed.
All of the below are linked to Kubernetes resources/runtime.
Database
Databases are required for all data persistence requirements.
A package should define only one DB
Although multiple DB definitions are currently allowed, this will likely be revised in future. Please specify only one DB; if multiple are specified, only one will be deployed.
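A quick local check can flag a package that defines more than one database before it is deployed. This sketch assumes each database lives in its own directory under the package's databases directory, which matches the layout shown on this page; the helper names are hypothetical and nothing here calls the kxi tooling.

```python
import tempfile
from pathlib import Path
from typing import List


def database_names(package_dir: Path) -> List[str]:
    """List the database directories found under <package_dir>/databases."""
    db_root = package_dir / "databases"
    if not db_root.is_dir():
        return []
    return sorted(p.name for p in db_root.iterdir() if p.is_dir())


def check_single_database(package_dir: Path) -> None:
    """Raise ValueError if the package defines more than one database."""
    names = database_names(package_dir)
    if len(names) > 1:
        raise ValueError(
            f"expected at most one database, found {len(names)}: {', '.join(names)}"
        )


# Demo on a throwaway package with a single database directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"
    (pkg / "databases" / "mydb").mkdir(parents=True)
    check_single_database(pkg)  # passes silently
    print(database_names(pkg))
```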
Currently they house quite a lot of configuration including:
- schemas
- streams (I/O bus)
- rdb, idb, hdb (Data Access Processes)
- sm (Storage Manager)
- env
To add a new Database to your package you can run:
kxi package add --to mypkg database --name mydb
This will add a new directory with the following structure:
mypkg/databases/mydb/shards/mydb-shard.yaml
mypkg/databases/mydb/schemas.yaml
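The expected layout can be verified after running the command. The sketch below checks for the two files listed above and reports any that are absent; missing_database_files is a hypothetical helper, and the demo builds the layout by hand in a temporary directory instead of invoking kxi.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_database_files(package_dir: Path, db_name: str) -> List[str]:
    """Return the expected database files (relative paths) that are absent."""
    db_dir = package_dir / "databases" / db_name
    expected = [
        db_dir / "shards" / f"{db_name}-shard.yaml",
        db_dir / "schemas.yaml",
    ]
    return [
        p.relative_to(package_dir).as_posix() for p in expected if not p.is_file()
    ]


# Demo: create only the shard file, then report what is still missing.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"
    shards = pkg / "databases" / "mydb" / "shards"
    shards.mkdir(parents=True)
    (shards / "mydb-shard.yaml").write_text("# shard config\n")
    print(missing_database_files(pkg, "mydb"))
```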
DB Fields
Fields for Database
=======================
field    required    type       class      description
-------  ----------  ---------  ---------  -------------
uuid     False       UUID       UUID       None
name     False       str        str        DB Name
shards   False       Shard      Shard      Shards in DB
tables   False       TableList  TableList  Schemas in DB
Shard Fields
Fields for Shard
=======================
field       required    type    class      description
----------  ----------  ------  ---------  ------------------------
uuid        False       UUID    UUID       None
name        False       str     str        Name identifier of shard
labels      False       str     str        Shard labels
sm          False       Sm      Sm         Storage Manager Object
daps        False       Dap     Dap        Data Access object
sequencers  False       Dict    Sequencer  Message bus into system
mounts      False       Dict    Mounts     PVs used in package
Deployment-config
Deployment configs define "top level" deployment rules that apply when deploying your package.
To add a new deployment-config to your package you can run:
kxi package add --to mypkg deployment-config
This will add a new directory with the following structure:
mypkg/databases/mydb/deployment-config/deployment-config.yaml
Deployment Config Fields
Fields for DeploymentConfig
=======================
field             required    type     class            description
----------------  ----------  -------  ---------------  ----------------------------------------------------------
uuid              False       UUID     UUID             None
name              False       str      str              Name of the deployment_config object
attach            False       bool     bool             Enable tty and stdin for each process in the kdb+ insights
                                                        cluster
env               False       List     EnvItem          Env vars for every process in the kdb+ insights cluster
imagePullPolicy   False       str      str              Image pull secret policy
imagePullSecrets  False       List     ImagePullSecret  List of image registry secrets
license           False       License  License          License Object to be shared for all CRs
qlog              False       Qlog     Qlog             Assembly logging configuration
Pipeline
Pipelines are used to enrich and stream data. You can add multiple pipelines to your package.
To add a new pipeline to your package you can run:
kxi package add --to mypkg pipeline --name my-pipeline
This will add a new directory with the following structure:
mypkg/databases/mydb/pipelines/my-pipeline.yaml
A user is expected to update the YAML to point at the correct spec (e.g. spec: src/mypipelinecode.q).
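Since the spec value has to be edited by hand, a small check that it points at a real file can catch typos early. The sketch below makes two assumptions that are not confirmed by this page: that spec is a top-level key on its own line, and that its value is a path relative to the package root. It reads the key with a regular expression instead of a YAML parser to stay dependency-free; the helper names are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Top-level `spec: <path>` line; spaces or tabs only, so it cannot span lines.
SPEC_LINE = re.compile(r"^spec:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def pipeline_spec(pipeline_yaml: str) -> Optional[str]:
    """Return the value of the top-level `spec:` key, or None if absent."""
    match = SPEC_LINE.search(pipeline_yaml)
    return match.group(1).strip("'\"") if match else None


def spec_exists(package_dir: Path, pipeline_yaml: str) -> bool:
    """True if the spec path exists relative to package_dir (an assumption)."""
    spec = pipeline_spec(pipeline_yaml)
    return spec is not None and (package_dir / spec).is_file()


# Demo on a throwaway package containing the spec file from the example.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"
    (pkg / "src").mkdir(parents=True)
    (pkg / "src" / "mypipelinecode.q").write_text("/ pipeline code\n")
    yaml_text = "name: my-pipeline\nspec: src/mypipelinecode.q\n"
    print(pipeline_spec(yaml_text), spec_exists(pkg, yaml_text))
```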
Pipeline Fields
Fields for Pipeline
=======================
field                       required    type                        class                       description
--------------------------  ----------  --------------------------  --------------------------  -----------------------------------------------------------
uuid                        False       UUID                        UUID                        None
base                        False       Base                        Base                        Image base
config                      False       Dict                        str                         Additional configuration to be applied to SP Pipeline
                                                                                                Assembly element
configMaps                  False       List                        ConstrainedStrValue         Pre-configured Kubernetes config maps to inject into
                                                                                                pipeline
controller                  False       Controller                  Controller                  Configure Pipeline Controller
destination                 False       ConstrainedStrValue         ConstrainedStrValue         Sequencer Bus to publish to
env                         False       List                        EnvItem8                    Environment Variables
group                       False       ConstrainedStrValue         ConstrainedStrValue         Groups a pipeline into a set of replicas that have a
                                                                                                matching group id
id                          False       str                         str                         SP Pipeline ID
imagePullSecrets            False       List                        ConstrainedStrValue         Pre-configured Kubernetes imagePullSecrets to inject into
                                                                                                pipeline
maxWorkers                  False       ConstrainedIntValue         ConstrainedIntValue         Maximum worker instances
minWorkers                  False       ConstrainedIntValue         ConstrainedIntValue         Minimum worker instances
monitoring                  False       bool                        bool                        Enable monitoring on Pipeline pods
name                        False       ConstrainedStrValue         ConstrainedStrValue         SP Pipeline name
protectedExecution          False       bool                        bool                        Enable Protected Execution
replicaAffinityTopologyKey  False       ReplicaAffinityTopologyKey  ReplicaAffinityTopologyKey  The key of node labels. If two Nodes are labelled with this
                                                                                                key and have identical values for that label, the scheduler
                                                                                                treats both Nodes as being in the same topology. Used for
                                                                                                affinity and anti-affinity rules related to replicas.
replicas                    False       ConstrainedIntValue         ConstrainedIntValue         Number of pipeline replicas
secrets                     False       List                        ConstrainedStrValue         Pre-configured Kubernetes secrets to inject into pipeline
source                      False       ConstrainedStrValue         ConstrainedStrValue         Sequencer Bus to subscribe to
spec                        True        str                         str                         Worker spec
type                        False       Type                        Type                        "graph" or "spec" pipeline deployment
volumes                     False       List                        Volume1                     List of volumes to attach to Pipeline
worker                      False       Worker                      Worker                      Configure Pipeline Worker
Router
Routers define the query environment and resource coordinator used when deploying your package.
To add a new router to your package you can run:
kxi package add --to mypkg router
This will add a new directory with the following structure:
mypkg/databases/mydb/router/router.yaml
Router Fields
Fields for Router
=======================
field    required    type              class             description
-------  ----------  ----------------  ----------------  -----------------------------------------
uuid     False       UUID              UUID              None
name     False       str               str               Name of the router
rc       False       Rc                Rc                Resource Coordinator object
agg      False       Agg               Agg               Aggregator object
qe       False       QueryEnvironment  QueryEnvironment  Configure Query Environments for Assembly
Entitlements
Entitlements are for defining read/write permissions to your package; however, they are currently unused.