Storing
The ML Registry allows users to persist a variety of versioned entities to disk and cloud storage applications. The ML Registry provides this persistence functionality across a number of namespaces, namely, .ml.registry.[new/set/log/update]
. All supported functionality within these namespaces is described below.
.ml.registry.new.registry
Generate a new registry
.ml.registry.new.registry[folderPath;config]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
config |
dictionary | :: |
Any additional configuration needed for initialising the registry. |
Returns:
Type | Description |
---|---|
dictionary |
Updated config dictionary containing relevant registry paths |
When generating a new registry within the context of cloud vendor interactions the folderPath
variable is unused and a new registry will be created at the storage location provided.
Examples:
Example 1: Generate a registry in 'pwd'
q).ml.registry.new.registry[::;::];
Example 2: Create a folder and generate a registry in that location
q)system"mkdir -p test/folder/location"
q).ml.registry.new.registry["test/folder/location";::];
Example 3: Generate registry in cloud storage location which is different from current .ml.registry.location
q).ml.registry.location
local| .
q).ml.registry.new.registry[enlist[`aws]!enlist"s3://ml-registry-test";::];
.ml.registry.new.experiment
Generate a new experiment within an existing registry. If the registry doesn't exist it will be created.
.ml.registry.new.experiment[folderPath;experimentName;config]
Where:
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string |
The name of the experiment to be located under the namedExperiments folder which can be populated by new models associated with the experiment. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
config |
dictionary | :: |
Any additional configuration needed for initialising the experiment. |
Returns:
Type | Description |
---|---|
dictionary | Updated config dictionary containing relevant registry paths |
Examples:
Example 1: Create an experiment 'test' in a registry location in 'pwd'
q).ml.registry.new.experiment[::;"test";::];
Example 2: Create an experiment 'new_test' in a registry located at a different location
q)system"mkdir -p test/folder/location"
q).ml.registry.new.experiment["test/folder/location";"new_test";::];
Example 3: Create a sub-experiment 'sub_exp' under 'new_test' in the above registry
q).ml.registry.new.experiment["test/folder/location";"new_test/sub_exp";::];
Example 4: Generate experiment in a cloud storage location which is different from current .ml.registry.location
q).ml.registry.location
local| .
q).ml.registry.new.experiment[enlist[`aws]!enlist"s3://ml-registry-test";"my_test";::];
.ml.registry.set.model
Add a new model to the ML Registry. If the registry doesn't exist it will be created.
.ml.registry.set.model[folderPath;experimentName;model;modelName;modelType;config]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
model |
embedpy | dictionary | function | projection | symbol | string |
The model to be saved to the registry. |
modelName |
string |
The name to be associated with the model. |
modelType |
string |
The type of model that is being saved, namely "q" , "graph" , "sklearn" , "keras" , "python" , "torch" or "theano" . |
config |
dictionary |
Any additional configuration needed for setting the model. |
Returns:
Type | Description |
---|---|
guid |
Returns the unique id for the model |
Model Parameter:
The model variable defines the item that is to be saved to the registry and used as the model
when retrieved. This can be an embedPy object defining an underlying Python model, a q function/projection/dictionary or a symbol pointing to a model saved to disk.
Models can be added under the following qualifying conditions
Model Type | Saved File Type | Qualifying Conditions |
---|---|---|
q | q-binary | Model must be a q projection, function or dictionary with a predict and or update key. |
Python | pickled file | The model must be saved using joblib.dump . |
Sklearn | pickled file | The model must be saved using joblib.dump and contain a predict method i.e. is a fit scikit-learn model. |
Keras | HDF5 file | The model must be saved using the save method provided by Keras and contain a predict method i.e. is a fit Keras model. |
PyTorch | pickled file/jit | The model must be saved using the torch.save functionality. |
Theano | pickled file | The model must be saved using joblib.dump . Users should also save any files used in the generation of the model using the code option within config described below in order to ensure all methods required for the model are provided. |
When adding a model from disk the ability for the model to be loaded into the current process will be validated in order to ensure that the model can be loaded into a q process and it is not being added in a manner that will corrupt the registry.
If setting a q model to the registry the following conditions are important:
- When passed as a function/projection a model is expected to require one parameter only, namely the data to be passed to the model for it to be used as a prediction entity
- If the model is a dictionary
- It is expected to have a
predict
key which contains a model meeting the conditions of1
above. - Optionally it can have an
update
key which defines a function/projection taking feature and target data used to update the model, retrieval of the update functions can be configured for use in supervised and unsupervised use-cases as outlined here.
- It is expected to have a
When setting any of the Python
/Sklearn
/Keras
/PyTorch
or Theano
models to the registry the following conditions are important:
- All functions when used for prediction should accept one parameter, namely the data to be passed to the model to perform a prediction. A breakdown of expectations around how these models are stored is provided in the table above.
- Scikit-learn models are also supported for use as
updating
models, namely on retrieval of the models using.ml.registry.get.update
when this model has been fit and contains thepartial_fit
method for example: sklearn.linear_model.SGDClassifier.
Configuration Parameter:
The config
variable within the .ml.registry.set.model
function is used extensively within the code to facilitate advanced options within the registry code. The following keys in particular are supported for more advanced functionality, usage of these is outlined within the examples section here.
key | type | Description |
---|---|---|
data |
any |
If provided with data as a key the addition of the model to the registry will also attempt to parse out relevant statistical information associated with the data for use within deployment of the model. |
requirements |
boolean | string[][] | symbol |
Add Python requirements information associated with a model, this can either be a boolean 1b indicating use of pip freeze , a symbol indicating the path to a requirements.txt file or a list of strings defining the requirements to be added. |
major |
boolean |
Is the incrementing of a version to be 'major' i.e. should the model be incremented from 1 0 to 2 0 rather than 1 0 to 1 1 as is default. |
majorVersion |
long |
What major version is to be incremented? By default we increment major versions based on the maximal version within the registry, however users can define the major version to be incremented using this option. |
code |
symbol | symbol[] |
Reference to the location of any files *.py /*.p /*.k or *.q files. These files are then loaded automatically on retrieval of the models using the *.get.* functionality. |
axis |
boolean |
Should the data when passed to the model be 'vertical' or 'horizontal' i.e. should the data be retrieved from a table in flip value flip (0b ) or value flip (1b ) format. This allows flexibility in model design. |
supervise |
string[] |
List of metrics to be used for supervised monitoring of the model. |
Examples:
Example 1: Add a vanilla model to a registry in 'pwd'
q).ml.registry.set.model[::;::;{x};"model";"q";::]
440482bb-5404-b22d-6c53-c847f09acf0a
Example 2: Add a vanilla model to a registry in 'pwd' under experiment EXP1
q).ml.registry.set.model[::;"EXP1";{x};"model";"q";::]
440482bb-5404-b22d-6c53-c847f09acf0a
Example 3: Add a vanilla model to a registry in 'pwd' under sub-experiment EXP1/SUBEXP1
q).ml.registry.set.model[::;"EXP1/SUBEXP1";{x};"model";"q";::]
440482bb-5404-b22d-6c53-c847f09acf0a
Example 4: Add an sklearn model to a registry
q)skldata:.p.import`sklearn.datasets
q)blobs:skldata[`:make_blobs;<]
q)dset:blobs[`n_samples pykw 1000;`centers pykw 2;`random_state pykw 500]
q)skmdl :.p.import[`sklearn.cluster][`:AffinityPropagation][`damping pykw 0.8][`:fit]dset 0
q).ml.registry.set.model[::;::;skmdl;"skmodel";"sklearn";::]
6048775b-01e9-33b7-302a-8307ff8e132c
Example 5: Generate a major version of the "model" within the registry
q).ml.registry.set.model[::;::;{x+1};"model";"q";enlist[`major]!enlist 1b]
95ed27df-072d-6bd6-713d-c49fae255840
Example 6: Associate some Python requirements with the next version of the sklearn model
q)requirements:enlist[`requirements]!enlist ("scikit-learn";"numpy")
q).ml.registry.set.model[::;::;skmdl;"skmodel";"sklearn";requirements]
440482bb-5404-b22d-6c53-c847f09acf0a
Example 7: Add a q model saved to disk (this assumes running from the root of the registry repo)
q).ml.registry.set.model[::;::;`:examples/models/qModel;"qModel";"q";::]
bea225d4-f8e5-dd3a-32da-51ecc91a6d9e
.ml.registry.set.parameters
Generate a JSON file containing parameters to be associated with a model. These parameters define any information that a user believes to be important to the models generation, it may include hyperparameter sets used when fitting or information about training.
.ml.registry.set.parameters[folderPath;experimentName;modelName;version;paramName;params]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to which the parameters are to be set. If this is null, the newest model associated with the experiment is used. |
version |
long[] | :: |
The specific version of a named model to set the parameters to, a list of length 2 with (major;minor) version number. If this is null the newest model is used. |
paramName |
string | symbol |
The name of the parameter to be saved. |
params |
dictionary | table | string |
The parameters to save to file. |
Returns:
Type | Description |
---|---|
:: |
When adding new parameters associated with a model within the context of cloud vendor interactions the folderPath
variable is unused and the registry location is assumed to be the storage location provided on initialisation.
Examples:
Example 1: Save a dictionary parameter associated with a model 'mymodel'
// Add a model to the registry
q).ml.registry.set.model[::;::;{x+2};"mymodel";"q";::]
// Save a dictionary parameter associated with a model 'mymodel'
q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile";`param1`param2!1 2]
Example 2: Save a list of strings as parameters associated with a model 'mymodel'
q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile2";("value1";"value2")]
.ml.registry.log.metric
Log metric values associated with a model
.ml.registry.log.metric[folderPath;experimentName;modelName;version;metricName;metricValue]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to which the metrics are to be associated. If this is null, the newest model associated with the experiment is used. |
version |
long[] | :: |
The specific version of a named model to be used, a list of length 2 with (major;minor) version number. If this is null the newest model is used. |
metricName |
symbol | string |
The name of the metric to be persisted. In the case when this is a string, it is converted to a symbol. |
metricValue |
float |
The value of the metric to be persisted. |
Returns:
Type | Description |
---|---|
:: |
When logging metrics a persisted binary table is generated within the model registry containing the following information
- The time the metric value was added
- The name of the persisted metric
- The value of the persisted metric
When adding metrics associated with a model within the context of cloud vendor interactions the folderPath
variable is unused and the registry location is assumed to be the storage location provided on initialisation.
Examples:
Example 1: Log metric values associated with various metric names
// Create a model within the registry
q).ml.registry.set.model[::;::;{x+1};"metricModel";"q";::]
// Log metric values associated with various metric names
q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;2.4]
q).ml.registry.log.metric[::;::;"metricModel";1 0;`func1;3]
q).ml.registry.log.metric[::;::;"metricModel";1 0;`func2;10.2]
q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;9]
q).ml.registry.log.metric[::;::;"metricModel";1 0;`func3;11.2]
.ml.registry.update.latency
Update monitoring config with new latency information
.ml.registry.update.latency[cli;folderPath;experimentName;modelName;version;model;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
model |
fn |
The function whos latency is to be monitored. |
data |
table |
Sample data on which to evaluate the function. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model latency config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Get predict function
q)p:.ml.registry.get.predict[::;::;"configModel";::]
// Update model latency config
q).ml.registry.update.latency[::;::;"configModel";::;p;([]1000?1f)]
.ml.registry.update.nulls
Update monitoring config with new null information
.ml.registry.update.nulls[cli;folderPath;experimentName;modelName;version;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
data |
table |
Sample data on which to evaluate the median value. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model nulls config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model nulls config
q).ml.registry.update.nulls[::;::;"configModel";::;([]1000?1f)]
.ml.registry.update.infinity
Update monitoring config with new infinity information
.ml.registry.update.infinity[cli;folderPath;experimentName;modelName;version;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
data |
table |
Sample data on which to evaluate the min/max value. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model infinity config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model infinity config
q).ml.registry.update.infinity[::;::;"configModel";::;([]1000?1f)]
.ml.registry.update.csi
Update monitoring config with new csi information
.ml.registry.update.csi[cli;folderPath;experimentName;modelName;version;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
data |
table |
Sample data on which to evaluate the historical distributions. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model csi config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model csi config
q).ml.registry.update.csi[::;::;"configModel";::;([]1000?1f)]
.ml.registry.update.psi
Update monitoring config with new psi information
.ml.registry.update.psi[cli;folderPath;experimentName;modelName;version;model;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
model |
fn |
Prediction function. |
data |
table |
Sample data on which to evaluate the historical predictions. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model psi config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Get predict function
q)p:.ml.registry.get.predict[::;::;"configModel";::]
// Update model psi config
q).ml.registry.update.psi[::;::;"configModel";::;p;([]1000?1f)]
.ml.registry.update.type
Update monitoring config with new type information
.ml.registry.update.type[cli;folderPath;experimentName;modelName;version;format]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
format |
string |
Model type. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model type config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model type config
q).ml.registry.update.type[::;::;"configModel";::;"sklearn"]
.ml.registry.update.supervise
Update monitoring config with new supervise information
.ml.registry.update.supervise[cli;folderPath;experimentName;modelName;version;metrics]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
metrics |
string[] |
Metrics to monitor. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model supervise config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model supervise config
q).ml.registry.update.supervise[::;::;"configModel";::;enlist[".ml.mse"]]
.ml.registry.update.schema
Update monitoring config with new schema information
.ml.registry.update.schema[cli;folderPath;experimentName;modelName;version;data]
Parameters:
Name | Type | Description |
---|---|---|
folderPath |
dictionary | string | :: |
A folder path indicating the location of the registry. Can be one of 3 options: a dictionary containing the vendor and location as a string, e.g. enlist[`local]!enlist"myReg" ; a string indicating the local path; or a generic null to use the current .ml.registry.location pulled from CLI/JSON. |
experimentName |
string | :: |
The name of the experiment associated with the model or generic null if none. This may contain details of a subexperiment Eg. EXP1/SUBEXP1. |
modelName |
string | :: |
The name of the model to be used. If this is null, the newest model associated with the experiment is retrieved. |
version |
long[] | :: |
The specific version of a named model to use, a list of length 2 with (major;minor) version number. If this is null the newest model is retrieved. |
data |
table |
Table from which to retreive schema. |
Returns:
Type | Description |
---|---|
:: |
Examples:
Example 1: Update model supervise config
// Create a model within the registry
q).ml.registry.set.model[::;::;{x};"configModel";"q";::]
// Update model supervise config
q).ml.registry.update.schema[::;::;"configModel";::;([]til 7)]