# Storing

The ML Registry allows users to persist a variety of versioned entities to local disk and to cloud storage. This functionality is spread across three Python submodules: `kxi.ml.registry.new`, `kxi.ml.registry.set` and `kxi.ml.registry.log`. All supported functions within these submodules are described below.

## kxi.ml.registry.new.registry

Create a new registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `folder_path` | `Union[str, dict]` | Either a string denoting the folder path in which to create the registry, a dictionary specifying the vendor (as key) and the path (as value), e.g. `{'aws': 's3://kx-ml-registry-bucket'}`, or `None` to default to the local current working directory. | `None` |
| `config` | `dict` | Either a registry configuration as a dictionary, or `None`. | `None` |
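
The three accepted forms of `folder_path` (shared by every function on this page) can be illustrated with a small normalising helper. This is a minimal sketch, not part of the `kxi` API: `parse_folder_path` is a hypothetical name, the `'local'` label is borrowed from the `storage` field in this function's example output, and the rule that a dictionary holds exactly one `{vendor: path}` entry is an assumption.

```python
from typing import Optional, Tuple, Union


def parse_folder_path(folder_path: Optional[Union[str, dict]] = None) -> Tuple[str, str]:
    """Hypothetical helper: normalise a folder_path argument into (storage, path).

    - None -> local storage in the current working directory
    - str  -> local storage at the given path
    - dict -> a single {vendor: path} entry, e.g. {'aws': 's3://bucket'} (assumed)
    """
    if folder_path is None:
        return "local", "."
    if isinstance(folder_path, str):
        return "local", folder_path
    if isinstance(folder_path, dict):
        if len(folder_path) != 1:
            raise ValueError("expected exactly one {vendor: path} entry")
        vendor, path = next(iter(folder_path.items()))
        return vendor, path
    raise TypeError(f"unsupported folder_path type: {type(folder_path).__name__}")


print(parse_folder_path({"aws": "s3://kx-ml-registry-bucket"}))
# ('aws', 's3://kx-ml-registry-bucket')
```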

Returns:

| Type | Description |
| --- | --- |
| `dict` | Registry configuration as a dictionary. |

Examples:

Create a registry in various locations depending on the argument type.

1. Default arguments create a registry in the current working directory:

   ```python
   >>> from kxi import ml
   >>> ml.init()
   >>> ml.registry.new.registry()
   {'storage': 'local',
   'folderPath': b'.',
   'registryPath': b'./KX_ML_REGISTRY',
   'modelStorePath': ':./KX_ML_REGISTRY/modelStore'}
   ```

2. Passing the folder path as a string creates a registry in the specified local folder:

   ```python
   >>> ml.registry.new.registry(folder_path="/tmp")
   {'storage': 'local',
   'folderPath': b'/tmp',
   'registryPath': b'/tmp/KX_ML_REGISTRY',
   'modelStorePath': ':/tmp/KX_ML_REGISTRY/modelStore'}
   ```

3. Passing a dictionary creates a registry in cloud storage (here, an AWS S3 bucket):

   ```python
   >>> ml.registry.new.registry(folder_path={'aws':'s3://my-bucket/'})
   ```
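
The two local examples suggest a simple relationship between the paths in the returned configuration: the registry is a `KX_ML_REGISTRY` folder under `folderPath`, and the model store is a `modelStore` folder beneath that. The sketch below reproduces that shape for local storage only. `local_registry_config` is a hypothetical helper inferred from the example output, not how `new.registry` is implemented; the leading `:` on `modelStorePath` presumably marks a q file handle.

```python
def local_registry_config(folder_path: str = ".") -> dict:
    """Hypothetical sketch of the local registry configuration shown above.

    Paths are byte strings and modelStorePath carries a leading ':',
    matching the example output of ml.registry.new.registry().
    """
    registry_path = f"{folder_path}/KX_ML_REGISTRY"
    return {
        "storage": "local",
        "folderPath": folder_path.encode(),
        "registryPath": registry_path.encode(),
        "modelStorePath": f":{registry_path}/modelStore",
    }


print(local_registry_config("/tmp")["registryPath"])
# b'/tmp/KX_ML_REGISTRY'
```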

## kxi.ml.registry.new.experiment

Create a new experiment within the specified registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `experiment_name` | `str` | The name of the experiment. | required |
| `folder_path` | `Union[str, dict]` | Either a string denoting the folder path in which to create the registry, a dictionary specifying the vendor (as key) and the path (as value), e.g. `{'aws': 's3://kx-ml-registry-bucket'}`, or `None` to default to the local current working directory. | `None` |
| `config` | `dict` | Either an experiment configuration as a dictionary, or `None`. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `dict` | Experiment configuration as a dictionary. |

Examples:

Create an experiment named `day0` in the registry located under `/tmp`:

```python
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.new.experiment(experiment_name="day0", folder_path="/tmp")
```
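
The parameter-file path returned in the `kxi.ml.registry.set.parameters` example suggests that experiments are stored under `KX_ML_REGISTRY/namedExperiments/<experiment_name>`, with each model version in a `<model_name>/<major>.<minor>` subfolder. The sketch below builds those paths. `experiment_dir` and `model_dir` are hypothetical helpers, and the layout is an inference from that single example rather than a documented contract.

```python
from pathlib import PurePosixPath
from typing import Sequence


def experiment_dir(folder_path: str, experiment_name: str) -> PurePosixPath:
    """Hypothetical: folder of a named experiment (layout inferred from example output)."""
    return PurePosixPath(folder_path) / "KX_ML_REGISTRY" / "namedExperiments" / experiment_name


def model_dir(folder_path: str, experiment_name: str, model_name: str,
              version: Sequence[int]) -> PurePosixPath:
    """Hypothetical: folder of one [major, minor] version of a model in an experiment."""
    major, minor = version
    return experiment_dir(folder_path, experiment_name) / model_name / f"{major}.{minor}"


print(model_dir("/tmp", "day0", "quantile_regression", [1, 0]))
# /tmp/KX_ML_REGISTRY/namedExperiments/day0/quantile_regression/1.0
```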

## kxi.ml.registry.log.metric

Associate a timestamp and named metric value with a model stored in the Registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `metric_name` | `str` | Name of the metric to be associated with a model. | required |
| `metric_value` | `Any` | Value of the metric to be stored with the model. | required |
| `folder_path` | `Union[str, dict]` | Either a string denoting the folder path where the registry exists, a dictionary specifying the vendor (as key) and the path (as value), e.g. `{'aws': 's3://kx-ml-registry-bucket'}`, or `None` to default to the local current working directory. | `None` |
| `experiment_name` | `str` | The name of the experiment the model is associated with. | `None` |
| `model_name` | `str` | The name of the model with which the metric is to be associated. If `None`, the latest model added to the registry is used. | `None` |
| `version` | `list` | The major and minor versions of the model as `[major, minor]`. If `None`, the latest version of the model associated with `model_name` is used. | `None` |
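
The fallback for `version` (use the latest version when `None`) amounts to taking the maximum `[major, minor]` pair. A minimal sketch, assuming the stored versions of `model_name` are already available as a list of `[major, minor]` pairs; `resolve_version` is hypothetical. Comparing pairs as tuples, rather than as decimal strings, matters here: version 1.10 is newer than 1.9.

```python
from typing import List, Optional, Sequence


def resolve_version(available: Sequence[Sequence[int]],
                    version: Optional[Sequence[int]] = None) -> List[int]:
    """Hypothetical: return the requested [major, minor], or the latest if None."""
    if not available:
        raise ValueError("model has no stored versions")
    versions = [list(v) for v in available]
    if version is None:
        # Tuple comparison orders by major first, then minor (so 1.10 > 1.9).
        return max(versions, key=tuple)
    if list(version) not in versions:
        raise ValueError(f"version {list(version)} not found")
    return list(version)


print(resolve_version([[1, 9], [1, 10]]))
# [1, 10]
```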

Examples:

Log a validation metric along with the corresponding model:

```python
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> import numpy as np
>>> regressor = LinearRegression().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="linear_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
>>> X = np.random.randn(5, 1)
>>> y_true = np.random.rand(5)
>>> y_pred = regressor.predict(X)
>>> mse = mean_squared_error(y_true, y_pred)
>>> ml.registry.log.metric(metric_name="mse",
...                        metric_value=mse,
...                        folder_path="/tmp",
...                        experiment_name="day0",
...                        model_name="linear_regression")
```

## kxi.ml.registry.set.model

Persist a model to the Registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | | The model to be saved to the Registry. This can be one of: a Python function that can be saved using `joblib.dump`; a fitted scikit-learn model that contains a `predict` method; a Keras model containing a `predict` method; a PyTorch model that can be saved using `torch.save`; a Theano model that can be saved using `joblib.dump` (users must additionally supply the path to the code needed to run it via the `code` parameter); a q model; or a path to any of the above models saved to disk. | required |
| `model_name` | `str` | Name to be associated with the model when saved. | required |
| `model_type` | `str` | The type of model being saved: `'q'`, `'sklearn'`, `'keras'`, `'python'`, `'torch'` or `'theano'`. | required |
| `folder_path` | `Union[str, dict]` | Either a string denoting the folder path where the registry exists, a dictionary specifying the vendor (as key) and the path (as value), e.g. `{'aws': 's3://kx-ml-registry-bucket'}`, or `None` to default to the local current working directory. | `None` |
| `experiment_name` | `str` | The name of the experiment the model is associated with. | `None` |
| `description` | `str` | User-supplied description of the model to be added to the registry. This should describe the important characteristics that make the model unique. | `None` |
| `data` | `dict` | User-supplied reference data from which statistical information is parsed for monitoring purposes. `kx.Table`, `np.records`, `pd.DataFrame` and `pa.Table` objects are supported. | `None` |
| `requirements` | `Union[bool, str, List[str]]` | Python requirements to associate with the model, given as one of: a `bool`, indicating whether to run `pip freeze` against the current environment; a `str`, specifying the path to a `requirements.txt` file; or a `List[str]` of package names to link to the model. By default, no requirements are saved with the model. | `None` |
| `major` | `bool` | Whether adding this model is a major update, i.e. 1.1 -> 2.0 rather than 1.1 -> 1.2. | `None` |
| `major_version` | `int` | The major version whose minor version is to be incremented. By default the function increments based on the maximal version in the registry; set this to increment the minor version of a previous model instead. | `None` |
| `code` | `str` | Location of any `*.py`, `*.p`, `*.k` or `*.q` files required by the model. These are loaded before the model on retrieval, so use this if the model being retrieved has prerequisites. | `None` |
| `axis` | `bool` | Whether data should be passed to the model horizontally (one column per feature, `True`) or vertically (one row per feature, `False`). | `None` |
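
The interaction of `major` and `major_version` can be sketched as a small function over a model's existing `[major, minor]` versions. This illustrates the documented behaviour under assumptions and is not the registry's code: `next_version` is hypothetical, it assumes the first saved model is version 1.0 (consistent with the `1.0` folder in the `set.parameters` example path), and it assumes `major_version` is ignored when `major` is set.

```python
from typing import List, Optional, Sequence


def next_version(existing: Sequence[Sequence[int]], major: bool = False,
                 major_version: Optional[int] = None) -> List[int]:
    """Hypothetical: compute the [major, minor] version for a newly saved model."""
    if not existing:
        return [1, 0]  # assumed version of the first model saved
    versions = [tuple(v) for v in existing]
    if major:
        # Major update: 1.1 -> 2.0 (major_version is ignored in this sketch).
        return [max(v[0] for v in versions) + 1, 0]
    # Minor update of the highest major version, or of the one requested.
    target = max(v[0] for v in versions) if major_version is None else major_version
    minors = [v[1] for v in versions if v[0] == target]
    if not minors:
        raise ValueError(f"no existing model with major version {target}")
    return [target, max(minors) + 1]


print(next_version([[1, 0], [1, 1], [2, 0]], major_version=1))
# [1, 2]
```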

Returns:

| Type | Description |
| --- | --- |
| `UUID` | UUID identifying the model added to the Registry. |

Examples:

Save a trained model to a local registry:

```python
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import LinearRegression
>>> import numpy as np
>>> regressor = LinearRegression().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="linear_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
UUID('1fd69913-2ea8-6436-2f5d-129ae1261070')
```

## kxi.ml.registry.set.parameters

Associate parameters with a model stored in the Registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `param_name` | `str` | Name of the parameter to be associated with a model. | required |
| `params` | `str` | Value of the parameter, to be stored as a JSON file. | required |
| `folder_path` | `Union[str, dict]` | Either a string denoting the folder path where the registry exists, a dictionary specifying the vendor (as key) and the path (as value), e.g. `{'aws': 's3://kx-ml-registry-bucket'}`, or `None` to default to the local current working directory. | `None` |
| `experiment_name` | `str` | The name of the experiment the model is associated with. | `None` |
| `model_name` | `str` | The name of the model with which the parameter is to be associated. If `None`, the latest model added to the registry is used. | `None` |
| `version` | `list` | The major and minor versions of the model as `[major, minor]`. If `None`, the latest version of the model associated with `model_name` is used. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `str` | The path of the created parameter file. |
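
Judging by the path returned in the example for this function, each parameter is written as `<param_name>.json` inside a `params` folder of the model version. A local-only sketch of that step follows. `write_param` is a hypothetical helper: it omits the leading `:` seen in the real return value and does not handle cloud storage.

```python
import json
from pathlib import Path


def write_param(model_version_dir: str, param_name: str, params) -> str:
    """Hypothetical: store a parameter value as <model_version_dir>/params/<param_name>.json."""
    params_dir = Path(model_version_dir) / "params"
    params_dir.mkdir(parents=True, exist_ok=True)
    path = params_dir / f"{param_name}.json"
    path.write_text(json.dumps(params))  # value must be JSON-serialisable
    return str(path)
```

Because the value goes through `json.dumps`, anything JSON-serialisable (such as the float `alpha` in the example) round-trips, while objects such as NumPy arrays would need converting first.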

Examples:

Save hyperparameters used during training along with the corresponding model:

```python
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import QuantileRegressor
>>> import numpy as np
>>> alpha = 0.0
>>> regressor = QuantileRegressor(alpha=alpha).fit(np.random.randn(10, 1),
...                                                np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
...                       model_name="quantile_regression",
...                       model_type="sklearn",
...                       folder_path="/tmp",
...                       experiment_name="day0")
>>> ml.registry.set.parameters(param_name="alpha",
...                            params=alpha,
...                            folder_path="/tmp",
...                            experiment_name="day0",
...                            model_name="quantile_regression")
':/tmp/KX_ML_REGISTRY/namedExperiments/day0/quantile_regression/1.0/params/alpha.json'
```