Storing
The ML Registry allows users to persist a variety of versioned entities to disk and to cloud storage. This persistence functionality is provided across three Python submodules: kxi.ml.registry.new, kxi.ml.registry.set and kxi.ml.registry.log. All supported functionality within these submodules is described below.
kxi.ml.registry.new.registry
Create a new registry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder_path | Union[str, dict] | Either a string denoting the folder path in which to create the registry, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://my-bucket/'} | None |
config | dict | A registry configuration as a dictionary, or None | None |
Returns:
Type | Description |
---|---|
dict | Registry configuration as a dictionary |
Examples:
Create a registry in various locations depending on the type of folder_path.
- Default arguments create a registry in the current working directory:
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.new.registry()
{'storage': 'local',
'folderPath': b'.',
'registryPath': b'./KX_ML_REGISTRY',
'modelStorePath': ':./KX_ML_REGISTRY/modelStore'}
- Passing the folder path as a string will create a registry in the specified local folder:
>>> ml.registry.new.registry(folder_path="/tmp")
{'storage': 'local',
'folderPath': b'/tmp',
'registryPath': b'/tmp/KX_ML_REGISTRY',
'modelStorePath': ':/tmp/KX_ML_REGISTRY/modelStore'}
- Passing a dictionary creates a registry in the specified cloud storage location:
>>> ml.registry.new.registry(folder_path={'aws':'s3://my-bucket/'})
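The local examples above show how the returned configuration is laid out. The registry lives in a KX_ML_REGISTRY folder under folder_path. modelStorePath is the ':'-prefixed (q-style) path to that registry's modelStore subfolder. The sketch below reproduces this pattern in plain Python. expected_local_config is a hypothetical helper and is not part of kxi. It returns str rather than the bytes seen in the output, and it models only local string paths, because the configuration returned for cloud storage is not shown here.

```python
import posixpath
from typing import Optional

# Folder name observed in the example output of ml.registry.new.registry
REGISTRY_DIR = "KX_ML_REGISTRY"


def expected_local_config(folder_path: Optional[str] = None) -> dict:
    """Hypothetical helper (not part of kxi): rebuild the local registry
    configuration pattern shown in the examples, using str instead of bytes."""
    if folder_path is None:
        folder_path = "."  # default: current working directory
    if not isinstance(folder_path, str):
        raise TypeError("this sketch only models local string paths")
    registry_path = posixpath.join(folder_path, REGISTRY_DIR)
    return {
        "storage": "local",
        "folderPath": folder_path,
        "registryPath": registry_path,
        # The leading ':' mirrors the q-style path in the example output
        "modelStorePath": ":" + posixpath.join(registry_path, "modelStore"),
    }


print(expected_local_config("/tmp"))
```

Comparing this output with the registry's real return value is a quick way to check where a registry was created. The sketch makes no claim about how kxi builds these paths internally.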
kxi.ml.registry.new.experiment
Create a new experiment within the specified registry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | The name of the experiment as a string | required |
folder_path | Union[str, dict] | Either a string denoting the folder path in which to create the registry, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://my-bucket/'} | None |
config | dict | An experiment configuration as a dictionary, or None | None |
Returns:
Type | Description |
---|---|
dict | Experiment configuration as a dictionary |
Examples:
Create an experiment named "day0" in the registry located under /tmp:
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.new.experiment(experiment_name="day0", folder_path="/tmp")
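The path returned by the set.parameters example at the end of this page has the form <folder>/KX_ML_REGISTRY/namedExperiments/<experiment>/<model>/<major>.<minor>/.... This suggests where an experiment such as "day0" is stored. The helper below builds that directory path. experiment_dir is hypothetical and not a kxi function, and the layout is inferred from that single output.

```python
import posixpath


def experiment_dir(folder_path: str, experiment_name: str) -> str:
    """Hypothetical helper (not part of kxi): experiment folder layout inferred
    from the path returned by ml.registry.set.parameters in this page."""
    if not experiment_name:
        raise ValueError("experiment_name must be a non-empty string")
    return posixpath.join(folder_path, "KX_ML_REGISTRY", "namedExperiments", experiment_name)


print(experiment_dir("/tmp", "day0"))  # /tmp/KX_ML_REGISTRY/namedExperiments/day0
```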
kxi.ml.registry.log.metric
Associate a timestamp and named metric value with a model stored in the Registry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric_name | str | Name of the metric to be associated with a model, as a string | required |
metric_value | Any | Value of the metric to be stored with the model | required |
folder_path | Union[str, dict] | Either a string denoting the folder path where the registry exists, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://my-bucket/'} | None |
experiment_name | str | The name of the experiment the model is associated with, as a string | None |
model_name | str | The name of the model with which the metric is to be associated; if None, the latest model added to the registry is used | None |
version | list | A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used | None |
Examples:
Log a validation metric along with the corresponding model:
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> import numpy as np
>>> regressor = LinearRegression().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="linear_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
>>> X = np.random.randn(5, 1)
>>> y_true = np.random.rand(5)
>>> y_pred = regressor.predict(X)
>>> mse = mean_squared_error(y_true, y_pred)
>>> ml.registry.log.metric(metric_name="mse",
...                        metric_value=mse,
...                        folder_path="/tmp",
...                        experiment_name="day0",
...                        model_name="linear_regression")
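Conceptually, each logged metric is a timestamped name/value pair attached to one model version. The sketch below models that idea with an in-memory list. MetricRecord and log_metric are hypothetical names, and the Registry's real storage format for metrics is not described here. The defaults mirror the table: when model_name is None the latest model added is used, and when version is None the latest version of that model is used.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple


@dataclass
class MetricRecord:
    """Hypothetical record: one timestamped metric value for one model version."""
    model_name: str
    version: List[int]  # [major, minor]
    metric_name: str
    metric_value: Any
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def log_metric(log: List[MetricRecord], registered: List[Tuple[str, List[int]]],
               metric_name: str, metric_value: Any,
               model_name: Optional[str] = None,
               version: Optional[List[int]] = None) -> MetricRecord:
    """Append a metric record; `registered` holds (model_name, [major, minor])
    pairs in the order the models were added."""
    if not registered:
        raise ValueError("no models have been added to the registry")
    if model_name is None:
        model_name = registered[-1][0]  # latest model added
    if version is None:
        versions = [v for name, v in registered if name == model_name]
        if not versions:
            raise KeyError(f"unknown model: {model_name}")
        version = max(versions)  # lists compare element-wise, so [1, 10] > [1, 2]
    record = MetricRecord(model_name, list(version), metric_name, metric_value)
    log.append(record)
    return record


metrics: List[MetricRecord] = []
registered = [("linear_regression", [1, 0]), ("linear_regression", [1, 1])]
rec = log_metric(metrics, registered, "mse", 0.25)
print(rec.model_name, rec.version, rec.timestamp.isoformat())
```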
kxi.ml.registry.set.model
Persist a model to the Registry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | | The model to be saved to the Registry. This can be one of: a Python function which can be saved using 'joblib.dump'; a fitted scikit-learn model which contains a 'predict' method; a Keras model containing a 'predict' method; a PyTorch model which can be saved using 'torch.save'; a Theano model which can be saved using 'joblib.dump' (users must additionally include a path to the code needed for running this function via the 'code' key in the 'opts' argument); a q model; or a path to any of the above models saved to disk | required |
model_name | str | Name to be associated with the model when saved, as a string | required |
model_type | str | The type of model being saved, namely 'q', 'sklearn', 'keras', 'python', 'torch' or 'theano', as a string | required |
folder_path | Union[str, dict] | Either a string denoting the folder path where the registry exists, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://my-bucket/'} | None |
experiment_name | str | The name of the experiment the model is associated with, as a string | None |
description | str | User-supplied description of the model to be added to the registry; this should describe important characteristics of the model that indicate its unique nature | None |
data | dict | User-supplied reference data from which statistical information is parsed for monitoring purposes; 'kx.Table', 'np.records', 'pd.DataFrame' and 'pa.Table' objects are supported | None |
requirements | Union[bool, str, List[str]] | Python requirements to be associated with the model. Three options are supported: bool, whether to run 'pip freeze' against the current environment; str, the path to a 'requirements.txt' file; List[str], a list of package names to link to the model. By default, no requirements are saved with the model | None |
major | bool | Boolean indicating whether the addition of this model is a 'major' update, i.e. 1.1 -> 2.0 rather than 1.1 -> 1.2 | None |
major_version | int | The major version of the model to be incremented. By default, the function increments based on the maximal version in the registry; users can override this to increment the minor version of a previous model | None |
code | str | Reference to the location of any files | None |
axis | bool | Boolean indicating how data should be passed into the model: 'horizontally', one column per feature, if True; 'vertically', one row per feature, if False | None |
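The major and major_version parameters determine which version a newly saved model receives. The function below sketches that rule as the table describes it:

- By default, the minor version is bumped (1.1 -> 1.2).
- When major is True, the major version is bumped and the minor version resets (1.1 -> 2.0).
- When major_version is given, the minor version within that earlier major line is bumped.

Starting from 1.0 in an empty registry is an assumption based on the "1.0" folder in the set.parameters example output. next_version is hypothetical, and the real implementation may resolve edge cases differently.

```python
from typing import List, Optional


def next_version(existing: List[List[int]], major: bool = False,
                 major_version: Optional[int] = None) -> List[int]:
    """Hypothetical sketch of the documented versioning rule.

    existing: [major, minor] versions already stored for this model name.
    """
    if not existing:
        return [1, 0]  # assumed first version, matching the '1.0' folder in the examples
    if major_version is not None:
        # Increment the minor version within an earlier major line
        minors = [minor for maj, minor in existing if maj == major_version]
        if not minors:
            raise ValueError(f"no versions with major version {major_version}")
        return [major_version, max(minors) + 1]
    latest_major, latest_minor = max(existing)  # maximal version in the registry
    if major:
        return [latest_major + 1, 0]  # e.g. 1.1 -> 2.0
    return [latest_major, latest_minor + 1]  # e.g. 1.1 -> 1.2


history = [[1, 0], [1, 1]]
print(next_version(history))              # [1, 2]
print(next_version(history, major=True))  # [2, 0]
```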
Returns:
Type | Description |
---|---|
UUID | UUID indicating the unique model added to the Registry |
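The requirements parameter accepts three forms, and each can be normalized into a single list of requirement lines. The sketch below is a hypothetical helper (resolve_requirements is not part of kxi):

- For True, it runs 'pip freeze' through the current interpreter.
- For a string, it reads a requirements.txt file and skips blank and comment lines.
- For a list, it passes the list through unchanged.

How the Registry itself records these requirements is not covered here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def resolve_requirements(requirements: Optional[Union[bool, str, List[str]]]) -> List[str]:
    """Hypothetical helper: normalize the documented 'requirements' options."""
    if requirements is None or requirements is False:
        return []  # default: no requirements saved with the model
    if requirements is True:
        # Equivalent of running 'pip freeze' against the current environment
        result = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                                capture_output=True, text=True, check=True)
        return [line for line in result.stdout.splitlines() if line.strip()]
    if isinstance(requirements, str):
        # Path to a requirements.txt file: keep non-empty, non-comment lines
        lines = Path(requirements).read_text().splitlines()
        return [line.strip() for line in lines
                if line.strip() and not line.strip().startswith("#")]
    return list(requirements)  # explicit list of package names


with tempfile.TemporaryDirectory() as tmp:
    req_file = Path(tmp, "requirements.txt")
    req_file.write_text("# model dependencies\nnumpy\n\nscikit-learn\n")
    from_file = resolve_requirements(str(req_file))

print(from_file)  # ['numpy', 'scikit-learn']
```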
Examples:
Save a trained model to a local registry:
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import LinearRegression
>>> import numpy as np
>>> regressor = LinearRegression().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="linear_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
UUID('1fd69913-2ea8-6436-2f5d-129ae1261070')
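The axis flag changes only how the same data is laid out when it is passed to the model. The sketch below shows both layouts for a small two-feature dataset using plain Python lists:

- With axis=True, each feature is a column and each row is one observation.
- With axis=False, each feature is a row.

orient is a hypothetical helper, and how kxi converts data internally is not described here.

```python
from typing import Dict, List


def orient(features: Dict[str, List[float]], axis: bool) -> List[List[float]]:
    """Hypothetical helper: arrange feature values for a model.

    axis=True  -> 'horizontal': one column per feature (rows are observations)
    axis=False -> 'vertical':   one row per feature
    """
    rows_per_feature = [list(values) for values in features.values()]
    if axis:
        # Transpose so each inner list is one observation across all features
        return [list(obs) for obs in zip(*rows_per_feature)]
    return rows_per_feature


data = {"x1": [1.0, 2.0, 3.0], "x2": [10.0, 20.0, 30.0]}
print(orient(data, axis=True))   # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(orient(data, axis=False))  # [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
```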
kxi.ml.registry.set.parameters
Associate parameters with a model stored in the Registry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_name | str | Name of the parameter to be associated with a model, as a string | required |
params | str | Value of the parameter to be stored as a JSON file | required |
folder_path | Union[str, dict] | Either a string denoting the folder path where the registry exists, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://my-bucket/'} | None |
experiment_name | str | The name of the experiment the model is associated with, as a string | None |
model_name | str | The name of the model with which the parameter is to be associated; if None, the latest model added to the registry is used | None |
version | list | A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used | None |
Returns:
Type | Description |
---|---|
str | A string containing the path of the created parameter file |
Examples:
Save hyperparameters used during training along with the corresponding model:
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import QuantileRegressor
>>> import numpy as np
>>> alpha = 0.0
>>> regressor = QuantileRegressor(alpha=alpha).fit(np.random.randn(10, 1),
...                                                np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="quantile_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
>>> ml.registry.set.parameters(param_name="alpha",
...                            params=alpha,
...                            folder_path="/tmp",
...                            experiment_name="day0",
...                            model_name="quantile_regression")
':/tmp/KX_ML_REGISTRY/namedExperiments/day0/quantile_regression/1.0/params/alpha.json'
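The returned path shows that each parameter is written to params/<param_name>.json inside the model's version folder. The sketch below writes a value the same way into a temporary directory using the standard json module, to show what such a file contains. save_param is hypothetical, and the exact JSON encoding the Registry applies is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_param(version_dir: Path, param_name: str, value: Any) -> Path:
    """Hypothetical helper: store one parameter as <version_dir>/params/<param_name>.json."""
    params_dir = version_dir / "params"
    params_dir.mkdir(parents=True, exist_ok=True)
    path = params_dir / f"{param_name}.json"
    path.write_text(json.dumps(value))
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Mirror the folder layout visible in the returned path above
    version_dir = Path(tmp, "KX_ML_REGISTRY", "namedExperiments", "day0",
                       "quantile_regression", "1.0")
    path = save_param(version_dir, "alpha", 0.0)
    rel = path.relative_to(tmp).as_posix()
    loaded = json.loads(path.read_text())

print(rel)     # KX_ML_REGISTRY/namedExperiments/day0/quantile_regression/1.0/params/alpha.json
print(loaded)  # 0.0
```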