Machine Learning object retrieval.

Once entities have been saved to the ML Registry by following the instructions for saving to the registry, they should be accessible to any user who has permission to access the registry save location. The kxi.ml.registry.get class provides the callable functions used to retrieve objects from a registry. Each of these functions is described below.

kxi.ml.registry.get.model_store

Get model store table.

Parameters:

Name Type Description Default
folder_path Union[str, dict]

Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local':'./path_to_folder'} or {'aws':'s3://aws_bucket_name'}, or None to default to the local current working directory.

None
config dict

Dictionary containing any additional configuration needed to retrieve the model store.

None

Returns:

Type Description
DataFrame

The model store as a pandas DataFrame.

Examples:

Retrieve the model store from a local registry:

>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.model_store(folder_path="/tmp")
               registrationTime experimentName  ... version description
0 2022-02-07 17:18:23.174058795        b'day0'  ... [1, 0]         b''
1 2022-02-07 17:18:45.035301957        b'day0'  ... [1, 1]         b''
2 2022-02-07 17:24:19.347368947        b'day0'  ... [1, 0]         b''
3 2022-02-07 17:26:22.473878853        b'day0'  ... [1, 2]         b''
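
The folder_path argument described above accepts three shapes: a string, a single-entry dictionary, or None. The sketch below only illustrates how those three shapes can be read as a (vendor, location) pair. The helper describe_folder_path is hypothetical and is not part of kxi.ml, and it makes no claim about how the library resolves the argument internally.

```python
import os
from typing import Optional, Tuple, Union


def describe_folder_path(
    folder_path: Optional[Union[str, dict]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: interpret a folder_path value as (vendor, location).

    This is an illustration of the documented argument shapes only,
    not the registry's own resolution logic.
    """
    if folder_path is None:
        # None defaults to the local current working directory.
        return ("local", os.getcwd())
    if isinstance(folder_path, str):
        # A plain string indicates a local path.
        return ("local", folder_path)
    if isinstance(folder_path, dict) and len(folder_path) == 1:
        # A dictionary maps the vendor (key) to the location (value).
        vendor, location = next(iter(folder_path.items()))
        return (str(vendor), str(location))
    raise TypeError("folder_path must be a str, a single-entry dict, or None")


print(describe_folder_path("/tmp"))
print(describe_folder_path({"aws": "s3://aws_bucket_name"}))
```

Under this reading, passing "/tmp" and passing {'local': '/tmp'} describe the same location.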

kxi.ml.registry.get.metric

Get metrics associated with a specific model.

Parameters:

Name Type Description Default
metric Union[str, List[str]]

Name or list of names of metrics to be retrieved. If None then retrieve all metrics.

None
folder_path Union[str, dict]

Either a string containing the folder path from which to get the metrics, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws':'s3://kx-ml-registry-bucket'}, or None to default to the local current working directory.

None
experiment_name str

Either the name of the experiment under which the metrics reside as a string, or None if unnamed.

None
model_name str

Either the name of the model whose metrics are to be retrieved, as a string, or None to use the latest model.

None
version List[int]

A list of the major and minor versions of the model - [major, minor]. If None, the latest version of the model associated with model_name is used.

None

Returns:

Type Description
DataFrame

Table of metrics associated with the model.

Examples:

Retrieve the Mean Squared Error (MSE) metric values logged for the model called "linear_regression":

>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.metric(folder_path="/tmp",
                           experiment_name="day0",
                           model_name="linear_regression",
                           metric="mse")
                      timestamp metricName  metricValue
0 2022-02-07 18:26:22.488021473        mse     0.071849
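
A metric can be logged more than once for the same model, so the returned table may hold several rows per metric name. The sketch below shows one way to keep only the most recent value of each metric. It works on plain row dictionaries, such as those produced by pandas' DataFrame.to_dict("records"), so that it runs without a registry. The first row mirrors the output above; the other rows and the helper latest_metric_values are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Stand-in for the metric table, one dictionary per row.
records: List[dict] = [
    # Row taken from the example output above (timestamp truncated to seconds).
    {"timestamp": datetime(2022, 2, 7, 18, 26, 22), "metricName": "mse", "metricValue": 0.071849},
    # Hypothetical additional rows, invented for illustration only.
    {"timestamp": datetime(2022, 2, 7, 18, 30, 0), "metricName": "mse", "metricValue": 0.065},
    {"timestamp": datetime(2022, 2, 7, 18, 30, 0), "metricName": "mae", "metricValue": 0.21},
]


def latest_metric_values(rows: List[dict]) -> Dict[str, float]:
    """Return the most recently logged value for each metric name."""
    latest: Dict[str, float] = {}
    # Visiting rows in timestamp order means later rows overwrite earlier ones.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        latest[row["metricName"]] = row["metricValue"]
    return latest


latest = latest_metric_values(records)
print(latest)
```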

kxi.ml.registry.get.parameters

Get parameters associated with a specific model.

Parameters:

Name Type Description Default
param_name str

Name of the parameter to be retrieved.

required
folder_path Union[str, dict]

Either a string containing the folder path from which to get the parameters, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws':'s3://kx-ml-registry-bucket'}, or None to default to the local current working directory.

None
experiment_name str

Either the name of the experiment under which the params reside as a string, or None if unnamed.

None
model_name str

Either the name of the model whose parameters are to be retrieved, as a string, or None to use the latest model.

None
version List[int]

A list of the major and minor versions of the model - [major, minor]. If None, the latest version of the model associated with model_name is used.

None

Returns:

Type Description
Union[str, dict, float, pandas.core.frame.DataFrame]

Parameters associated with the model.

Examples:

Retrieve the "alpha" hyperparameter associated with the model "quantile_regression":

>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.parameters(param_name="alpha",
                               folder_path="/tmp",
                               experiment_name="day0",
                               model_name="quantile_regression")
0.0

kxi.ml.registry.get.model

Retrieve a q/python/sklearn/keras model from the registry.

Parameters:

Name Type Description Default
folder_path Union[str, dict]

Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local':'./path_to_folder'} or {'aws':'s3://aws_bucket_name'}, or None to default to the local current working directory.

None
experiment_name str

Either the name of the experiment under which the model resides as a string, or None if unnamed.

None
model_name str

Either the name of the model to be retrieved as a string, or None to use the latest model.

None
version List[int]

A list of the major and minor versions of the model - [major, minor]. If None, the latest version of the model associated with model_name is used.

None

Returns:

Type Description
dict

The model and information related to the generation of the model.

Examples:

Retrieve the model "linear_regression" from a local registry:

>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.model(folder_path="/tmp",
                          experiment_name="day0",
                          model_name="linear_regression")
{'modelInfo': {'registry': {'description': b'',
   'modelInformation': {'modelName': b'linear_regression',
    'version': [1.0, 2.0],
    'registrationTime': [b'2022-02-07T17:26:22.473878853'],
    'uniqueID': [b'909b8828-e138-8399-0a77-98bdbffef099'],
    'requirements': False},
   'experimentInformation': {'experimentName': b'day0'}},
  'model': {'type': b'sklearn', 'axis': b''},
  'monitoring': {'nulls': {'monitor': True, 'values': {}},
   'infinity': {'monitor': True,
    'values': {'negInfReplace': {}, 'posInfReplace': {}}},
   'schema': {'monitor': False, 'values': {}},
   'latency': {'monitor': False, 'values': {'avg': inf, 'std': inf}},
   'psi': {'monitor': False, 'values': {}},
   'csi': {'monitor': False, 'values': {}},
   'supervised': {'monitor': False, 'values': []}}},
 'model': pykx.Composition(pykx.q('{[f;x]embedPy[f;x]}[foreign]enlist'))}
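
In the output above, string fields are shown as bytes (for example b'linear_regression'). The sketch below shows one way to convert such a nested structure into ordinary strings before reading fields from it. It assumes the 'modelInfo' entry behaves like a plain Python dictionary, as displayed; the helper decode_bytes is hypothetical, and the stand-in dictionary is a trimmed copy of the output above rather than a live registry result.

```python
from typing import Any


def decode_bytes(value: Any) -> Any:
    """Recursively convert bytes to str inside nested dicts and lists."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, dict):
        return {decode_bytes(k): decode_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_bytes(v) for v in value]
    # Numbers, booleans, and other objects are returned unchanged.
    return value


# Trimmed stand-in for the 'modelInfo' entry shown in the output above.
model_info = {
    "registry": {
        "description": b"",
        "modelInformation": {
            "modelName": b"linear_regression",
            "version": [1.0, 2.0],
            "registrationTime": [b"2022-02-07T17:26:22.473878853"],
            "uniqueID": [b"909b8828-e138-8399-0a77-98bdbffef099"],
            "requirements": False,
        },
        "experimentInformation": {"experimentName": b"day0"},
    },
    "model": {"type": b"sklearn", "axis": b""},
}

readable = decode_bytes(model_info)
info = readable["registry"]["modelInformation"]
print(info["modelName"], info["version"], readable["model"]["type"])
```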

Retrieve a specific version:

>>> ml.registry.get.model(folder_path="/tmp",
                          experiment_name="day0",
                          model_name="linear_regression",
                          version=[1, 0])
{'modelInfo': {'registry': {'description': b'',
   'modelInformation': {'modelName': b'linear_regression',
    'version': [1.0, 0.0],
    'registrationTime': [b'2022-02-07T17:18:23.174058795'],
    'uniqueID': [b'4dc4f616-e66d-bd42-ca71-79bc4fe94683'],
    'requirements': False},
   'experimentInformation': {'experimentName': b'day0'}},
  'model': {'type': b'sklearn', 'axis': b''},
  'monitoring': {'nulls': {'monitor': True, 'values': {}},
   'infinity': {'monitor': True,
    'values': {'negInfReplace': {}, 'posInfReplace': {}}},
   'schema': {'monitor': False, 'values': {}},
   'latency': {'monitor': False, 'values': {'avg': inf, 'std': inf}},
   'psi': {'monitor': False, 'values': {}},
   'csi': {'monitor': False, 'values': {}},
   'supervised': {'monitor': False, 'values': []}}},
 'model': pykx.Composition(pykx.q('{[f;x]embedPy[f;x]}[foreign]enlist'))}
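
The two calls above differ only in the version argument: without it, version [1, 2] was returned, and with version=[1, 0] the earlier model was returned. The sketch below illustrates how [major, minor] pairs can be ordered in Python to find the latest one. It assumes that "latest" means the highest [major, minor] pair, which is consistent with the output above but is not confirmed as the registry's internal rule; the helper latest_version and the version lists are illustrative.

```python
from typing import List, Optional


def latest_version(versions: List[List[int]]) -> Optional[List[int]]:
    """Return the highest [major, minor] pair, or None for an empty list.

    Python compares lists element by element, so the major version is
    compared first and the minor version only breaks ties.
    """
    return max(versions, default=None)


# Illustrative versions for a single model.
versions = [[1, 0], [1, 1], [1, 2]]
latest = latest_version(versions)
print(latest)

# Integer comparison avoids the pitfall of comparing "1.10" and "1.9" as text.
print(latest_version([[1, 9], [1, 10], [2, 0]]))
```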

kxi.ml.registry.get.predict

Retrieve a q/python/sklearn/keras model from the registry for prediction.

Parameters:

Name Type Description Default
folder_path Union[str, dict]

Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local':'./path_to_folder'} or {'aws':'s3://aws_bucket_name'}, or None to default to the local current working directory.

None
experiment_name str

Either the name of the experiment under which the model resides as a string, or None if unnamed.

None
model_name str

Either the name of the model to be retrieved as a string, or None to use the latest model.

None
version List[int]

A list of the major and minor versions of the model - [major, minor]. If None, the latest version of the model associated with model_name is used.

None

Returns:

Type Description
Composition

Model retrieved from the registry.

Examples:

Retrieve the trained model "linear_regression" to use for new predictions:

>>> from kxi import ml
>>> ml.init()
>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="linear_regression")
>>> import numpy as np
>>> predict(np.random.randn(5, 1)).np()
array([0.37508146, 0.34456208, 0.37623354, 0.49891433, 0.38987454])

Repeat for a specific version of the trained model:

>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="linear_regression",
                                      version=[1, 0])
>>> import numpy as np
>>> predict(np.random.randn(5, 1)).np()
array([0.56951651, 0.79826611, 0.78192483, 0.65961372, 0.65201045])

kxi.ml.registry.get.update

Retrieve a q/python/sklearn/keras model from the registry for update.

Parameters:

Name Type Description Default
supervised bool

Boolean to specify if the model to update is supervised or not.

required
folder_path Union[str, dict]

Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local':'./path_to_folder'} or {'aws':'s3://aws_bucket_name'}, or None to default to the local current working directory.

None
experiment_name str

Either the name of the experiment under which the model resides as a string, or None if unnamed.

None
model_name str

Either the name of the model to be retrieved as a string, or None to use the latest model.

None
version List[int]

A list of the major and minor versions of the model - [major, minor]. If None, the latest version of the model associated with model_name is used.

None

Returns:

Type Description
Composition

Model retrieved from the registry.

Examples:

When our model supports "partial fitting", we can retrieve it from the registry for additional training using the ml.registry.get.update function. In this example, we train an SGDRegressor and save it to a local registry:

>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import SGDRegressor
>>> import numpy as np
>>> regressor = SGDRegressor().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
                          model_name="sgd_regression",
                          model_type="sklearn",
                          folder_path="/tmp",
                          experiment_name="day0")
UUID('05a68abe-b256-f829-39f1-34229f0f015f')

Then, we can retrieve the fitted model for additional training:

>>> update = ml.registry.get.update(supervised=True,
                                    folder_path="/tmp",
                                    experiment_name="day0",
                                    model_name="sgd_regression")
>>> updated_regressor = update(np.random.randn(5, 1), np.random.rand(5))

The updated model can be saved back to the registry, with the model version incremented accordingly:

>>> ml.registry.set.model(model=updated_regressor,
                          model_name="sgd_regression",
                          model_type="sklearn",
                          folder_path="/tmp",
                          experiment_name="day0")
UUID('dcca8b86-2fa2-889a-67c0-ad2dbb762163')

Finally, we can retrieve the updated model to make predictions:

>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="sgd_regression")
>>> predict(np.random.randn(5, 1)).np()
array([0.26516812, 0.27963551, 0.29606174, 0.25128473, 0.36829261])
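
To make the idea of "partial fitting" in the workflow above concrete, the sketch below uses a toy pure-Python regressor in place of SGDRegressor. It is a stand-in written for illustration, not scikit-learn's implementation and not what the registry executes; the class OnlineLinearRegressor and its data are hypothetical. The point it shows is that a second training call continues from the weights learned so far instead of starting again.

```python
from typing import List, Sequence


class OnlineLinearRegressor:
    """Toy single-feature linear model trained by stochastic gradient descent."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0
        self.samples_seen = 0

    def partial_fit(self, xs: Sequence[float], ys: Sequence[float]) -> "OnlineLinearRegressor":
        """Update the existing weights with one pass over a new batch."""
        for x, y in zip(xs, ys):
            error = (self.weight * x + self.bias) - y
            # Gradient step on the squared error for this single sample.
            self.weight -= self.learning_rate * error * x
            self.bias -= self.learning_rate * error
            self.samples_seen += 1
        return self

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.weight * x + self.bias for x in xs]


# Initial training, analogous to fitting the model before saving it.
model = OnlineLinearRegressor()
model.partial_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
weight_after_first_batch = model.weight

# Additional training, analogous to calling the retrieved update function.
model.partial_fit([3.0, 4.0], [7.0, 9.0])
print(model.samples_seen, weight_after_first_batch, model.weight)
```

Because the state carries over between calls, the model after the second batch reflects all five samples, which is why an updated model is worth saving back to the registry as a new version.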