kxi.ml.registry.get
Machine Learning object retrieval.
Once saved to the ML Registry following the documented save instructions,
persisted entities should be accessible to any user with permission to access
the registry save location. The kxi.ml.registry.get class provides all the
callable functions used to retrieve objects from a registry.
All functionality within this class is described below.
model_store
def model_store(folder_path: Optional[Union[str, dict]] = None,
config: Optional[dict] = None) -> pd.DataFrame
Get model store table.
Arguments:
folder_path
- Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local': './path_to_folder'} or {'aws': 's3://aws_bucket_name'}, or None to default to the local current working directory.
config
- Dictionary containing the additional configuration needed for retrieving the model store.
Returns:
The model store as a pandas DataFrame.
Examples:
Retrieve the model store from a local registry:
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.model_store(folder_path="/tmp")
registrationTime experimentName ... version description
0 2022-02-07 17:18:23.174058795 b'day0' ... [1, 0] b''
1 2022-02-07 17:18:45.035301957 b'day0' ... [1, 1] b''
2 2022-02-07 17:24:19.347368947 b'day0' ... [1, 0] b''
3 2022-02-07 17:26:22.473878853 b'day0' ... [1, 2] b''
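The three accepted shapes of folder_path described above (a string, a single-entry vendor dictionary, or None) can be sketched as a small normalizer. This is only an illustration of the documented convention: normalize_folder_path is a hypothetical helper, not part of kxi.ml, and it makes no claim about how the library resolves paths internally.

```python
import os
from typing import Optional, Tuple, Union


def normalize_folder_path(
    folder_path: Optional[Union[str, dict]] = None,
) -> Tuple[str, str]:
    """Return (vendor, location) for a folder_path-style argument.

    Hypothetical helper mirroring the documented argument shapes only.
    """
    if folder_path is None:
        # None defaults to the local current working directory.
        return ("local", os.getcwd())
    if isinstance(folder_path, str):
        # A bare string is treated as a local path.
        return ("local", folder_path)
    if isinstance(folder_path, dict) and len(folder_path) == 1:
        # A dictionary maps one vendor name to one location.
        vendor, location = next(iter(folder_path.items()))
        return (str(vendor), str(location))
    raise ValueError("folder_path must be None, a string, or a single-entry dict")


print(normalize_folder_path("/tmp"))
print(normalize_folder_path({"aws": "s3://aws_bucket_name"}))
```

Normalizing to one shape up front keeps any later branching on the vendor in a single place.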
metric
def metric(metric: Optional[Union[str, List[str]]] = None,
*,
folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None) -> pd.DataFrame
Get metrics associated with a specific model.
Arguments:
metric
- Name or list of names of the metrics to be retrieved. If None, all metrics are retrieved.
folder_path
- Either a string containing the folder path denoting where to get the metrics, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://kx-ml-registry-bucket'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the metrics reside as a string, or None if unnamed.
model_name
- Either the name of the model with metrics as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
Table of metrics associated with the model.
Examples:
Retrieve the Mean Squared Error (MSE) metric values logged for the model called "linear_regression":
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.metric(folder_path="/tmp",
                           experiment_name="day0",
                           model_name="linear_regression",
                           metric="mse")
timestamp metricName metricValue
0 2022-02-07 18:26:22.488021473 mse 0.071849
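The metric argument accepts a single name, a list of names, or None for everything. A minimal sketch of that selection rule follows, using plain dictionaries in place of the returned table; select_metrics is a hypothetical helper for illustration, and the single row is copied from the example output above.

```python
from typing import Dict, List, Optional, Union

Row = Dict[str, object]


def select_metrics(
    rows: List[Row],
    metric: Optional[Union[str, List[str]]] = None,
) -> List[Row]:
    """Keep rows whose metricName matches; None keeps every row.

    Hypothetical helper: illustrates the documented argument shapes only.
    """
    if metric is None:
        return list(rows)
    wanted = {metric} if isinstance(metric, str) else set(metric)
    return [row for row in rows if row["metricName"] in wanted]


# Row taken from the example output above.
rows = [
    {
        "timestamp": "2022-02-07 18:26:22.488021473",
        "metricName": "mse",
        "metricValue": 0.071849,
    }
]

print(select_metrics(rows, "mse"))
print(select_metrics(rows, ["mse", "r2"]))
print(select_metrics(rows, None))
```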
parameters
def parameters(
param_name: str,
*,
folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None
) -> Union[str, dict, float, pd.DataFrame]
Get parameters associated with a specific model.
Arguments:
param_name
- Name of the parameter to be retrieved.
folder_path
- Either a string containing the folder path denoting where to get the parameters, or a dictionary specifying the vendor (as key) and the path (as value), e.g. {'aws': 's3://kx-ml-registry-bucket'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the parameters reside as a string, or None if unnamed.
model_name
- Either the name of the model with parameters as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
Parameters associated with the model.
Examples:
Retrieve the "alpha" hyperparameter associated with the model "quantile_regression":
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.parameters(param_name="alpha",
                               folder_path="/tmp",
                               experiment_name="day0",
                               model_name="quantile_regression")
0.0
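Each retrieval function takes a version given as [major, minor] and falls back to the latest version when it is None. The sketch below shows one way that fallback could work. It assumes that "latest" means the highest (major, minor) pair, which this document does not state. resolve_version is a hypothetical helper, and the version list is illustrative.

```python
from typing import List, Optional, Sequence


def resolve_version(
    available: Sequence[List[int]],
    requested: Optional[List[int]] = None,
) -> List[int]:
    """Pick a [major, minor] version from those available.

    Assumption for illustration: "latest" is the highest (major, minor)
    pair. The registry's actual resolution logic is not shown here.
    """
    if not available:
        raise ValueError("no versions available")
    if requested is None:
        # Tuples compare element by element: major first, then minor.
        return list(max(available, key=tuple))
    if list(requested) not in [list(v) for v in available]:
        raise LookupError(f"version {requested} not found")
    return list(requested)


versions = [[1, 0], [1, 1], [1, 2]]  # illustrative versions
print(resolve_version(versions))          # [1, 2]
print(resolve_version(versions, [1, 0]))  # [1, 0]
```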
model
def model(folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None) -> dict
Retrieve a q/python/sklearn/keras model from the registry.
Arguments:
folder_path
- Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local': './path_to_folder'} or {'aws': 's3://aws_bucket_name'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the model resides as a string, or None if unnamed.
model_name
- Either the name of the model to retrieve as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
The model and information related to the generation of the model.
Examples:
Retrieve the model "linear_regression" from a local registry:
```python
>>> from kxi import ml
>>> ml.init()
>>> ml.registry.get.model(folder_path="/tmp",
                          experiment_name="day0",
                          model_name="linear_regression")
{'modelInfo': {'registry': {'description': b'',
   'modelInformation': {'modelName': b'linear_regression',
    'version': [1.0, 2.0],
    'registrationTime': [b'2022-02-07T17:26:22.473878853'],
    'uniqueID': [b'909b8828-e138-8399-0a77-98bdbffef099'],
    'requirements': False},
   'experimentInformation': {'experimentName': b'day0'}},
  'model': {'type': b'sklearn', 'axis': b''},
  'monitoring': {'nulls': {'monitor': True, 'values': {}},
   'infinity': {'monitor': True,
    'values': {'negInfReplace': {}, 'posInfReplace': {}}},
   'schema': {'monitor': False, 'values': {}},
   'latency': {'monitor': False, 'values': {'avg': inf, 'std': inf}},
   'psi': {'monitor': False, 'values': {}},
   'csi': {'monitor': False, 'values': {}},
   'supervised': {'monitor': False, 'values': []}}},
 'model': pykx.Composition(pykx.q('{[f;x]embedPy[f;x]}[foreign]enlist'))}
```
Retrieve a specific version:
```python
>>> ml.registry.get.model(folder_path="/tmp",
                          experiment_name="day0",
                          model_name="linear_regression",
                          version=[1, 0])
{'modelInfo': {'registry': {'description': b'',
   'modelInformation': {'modelName': b'linear_regression',
    'version': [1.0, 0.0],
    'registrationTime': [b'2022-02-07T17:18:23.174058795'],
    'uniqueID': [b'4dc4f616-e66d-bd42-ca71-79bc4fe94683'],
    'requirements': False},
   'experimentInformation': {'experimentName': b'day0'}},
  'model': {'type': b'sklearn', 'axis': b''},
  'monitoring': {'nulls': {'monitor': True, 'values': {}},
   'infinity': {'monitor': True,
    'values': {'negInfReplace': {}, 'posInfReplace': {}}},
   'schema': {'monitor': False, 'values': {}},
   'latency': {'monitor': False, 'values': {'avg': inf, 'std': inf}},
   'psi': {'monitor': False, 'values': {}},
   'csi': {'monitor': False, 'values': {}},
   'supervised': {'monitor': False, 'values': []}}},
 'model': pykx.Composition(pykx.q('{[f;x]embedPy[f;x]}[foreign]enlist'))}
```
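The modelInfo dictionary in the output above stores text fields as bytes and version numbers as floats. A minimal sketch of making it easier to read follows, run on an abridged copy of the example output. decode_bytes is a hypothetical helper, not a kxi.ml function.

```python
from typing import Any


def decode_bytes(value: Any) -> Any:
    """Recursively convert bytes to str inside nested dicts and lists."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, dict):
        return {key: decode_bytes(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_bytes(item) for item in value]
    return value


# Abridged copy of the 'modelInfo' entry from the example output above.
model_info = {
    "registry": {
        "description": b"",
        "modelInformation": {
            "modelName": b"linear_regression",
            "version": [1.0, 2.0],
            "registrationTime": [b"2022-02-07T17:26:22.473878853"],
            "uniqueID": [b"909b8828-e138-8399-0a77-98bdbffef099"],
            "requirements": False,
        },
        "experimentInformation": {"experimentName": b"day0"},
    },
    "model": {"type": b"sklearn", "axis": b""},
}

readable = decode_bytes(model_info)
info = readable["registry"]["modelInformation"]
# Versions are returned as floats; cast them for display as [major, minor].
version = [int(part) for part in info["version"]]
print(info["modelName"], version, readable["model"]["type"])
# prints: linear_regression [1, 2] sklearn
```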
predict
def predict(folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None) -> kx.Composition
Retrieve a q/python/sklearn/keras model from the registry for prediction.
Arguments:
folder_path
- Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local': './path_to_folder'} or {'aws': 's3://aws_bucket_name'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the model resides as a string, or None if unnamed.
model_name
- Either the name of the model to retrieve as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
Model retrieved from the registry.
Examples:
Retrieve the trained model "linear_regression" to use for new predictions:
>>> from kxi import ml
>>> ml.init()
>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="linear_regression")
>>> import numpy as np
>>> predict(np.random.randn(5, 1)).np()
array([0.37508146, 0.34456208, 0.37623354, 0.49891433, 0.38987454])
Repeat for a specific version of the trained model:
>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="linear_regression",
                                      version=[1, 0])
>>> import numpy as np
>>> predict(np.random.randn(5, 1)).np()
array([0.56951651, 0.79826611, 0.78192483, 0.65961372, 0.65201045])
update
def update(supervised: bool,
*,
folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None) -> kx.Composition
Retrieve a q/python/sklearn/keras model from the registry for update.
Arguments:
supervised
- Boolean specifying whether the model to update is supervised.
folder_path
- Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local': './path_to_folder'} or {'aws': 's3://aws_bucket_name'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the model resides as a string, or None if unnamed.
model_name
- Either the name of the model to retrieve as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
Model retrieved from the registry.
Examples:
When our model supports "partial fitting", we can retrieve it from the registry for
additional training using the ml.registry.get.update function.
In this example, we train an SGDRegressor and save it to a local registry:
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import SGDRegressor
>>> import numpy as np
>>> regressor = SGDRegressor().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
                          model_name="sgd_regression",
                          model_type="sklearn",
                          folder_path="/tmp",
                          experiment_name="day0")
UUID('05a68abe-b256-f829-39f1-34229f0f015f')
Then, we can retrieve the fitted model for additional training:
>>> update = ml.registry.get.update(supervised=True,
                                    folder_path="/tmp",
                                    experiment_name="day0",
                                    model_name="sgd_regression")
>>> updated_regressor = update(np.random.randn(5, 1), np.random.rand(5))
The updated model can be saved back to the registry,
with the model version incremented accordingly:
>>> ml.registry.set.model(model=updated_regressor,
                          model_name="sgd_regression",
                          model_type="sklearn",
                          folder_path="/tmp",
                          experiment_name="day0")
UUID('dcca8b86-2fa2-889a-67c0-ad2dbb762163')
Finally, we can retrieve the updated model to make predictions:
>>> predict = ml.registry.get.predict(folder_path="/tmp",
                                      experiment_name="day0",
                                      model_name="sgd_regression")
>>> predict(np.random.randn(5, 1)).np()
array([0.26516812, 0.27963551, 0.29606174, 0.25128473, 0.36829261])
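"Partial fitting" means that a model's parameters can be refined with further batches of data instead of being retrained from scratch. The pure-Python sketch below illustrates the idea with a one-feature linear model. TinySGDRegressor is a hypothetical toy class, unrelated to how scikit-learn or kxi.ml implement updates, and the data are synthetic (y = 2x + 1).

```python
from typing import List, Sequence


class TinySGDRegressor:
    """One-feature linear model trained by per-sample gradient descent."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0

    def partial_fit(self, xs: Sequence[float], ys: Sequence[float]) -> "TinySGDRegressor":
        # Each call continues from the current weight and bias.
        for x, y in zip(xs, ys):
            error = (self.weight * x + self.bias) - y
            self.weight -= self.learning_rate * error * x
            self.bias -= self.learning_rate * error
        return self

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.weight * x + self.bias for x in xs]


# Synthetic, noise-free data drawn from y = 2x + 1.
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 1.0 for x in xs]

model = TinySGDRegressor().partial_fit(xs, ys)  # initial fit on one batch
first_error = abs(model.predict([3.0])[0] - 7.0)

for _ in range(200):  # later updates reuse the same model object
    model.partial_fit(xs, ys)
later_error = abs(model.predict([3.0])[0] - 7.0)

print(first_error > later_error)
```

Because the state lives on the model object, an updated model can be saved again as a new version, as in the registry example above.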
version
def version(folder_path: Optional[Union[str, dict]] = None,
experiment_name: Optional[str] = None,
model_name: Optional[str] = None,
version: Optional[List[int]] = None)
Retrieve language/library version information associated with a model stored in the registry.
Arguments:
folder_path
- Either a string indicating the local path, or a dictionary containing the vendor and location as strings, e.g. {'local': './path_to_folder'} or {'aws': 's3://aws_bucket_name'}, or None to default to the local current working directory.
experiment_name
- Either the name of the experiment under which the model resides as a string, or None if unnamed.
model_name
- Either the name of the model to inspect as a string, or None if the latest model is to be used.
version
- A list of the major and minor versions of the model, [major, minor]. If None, the latest version of the model associated with model_name is used.
Returns:
A dictionary containing version information about a model stored in the registry.
Examples:
When adding models to the registry, we include information relating to the Python/q library and language versions used when persisting the model. This gives users retrieving these models the information they need to ensure compatibility with their deployment environment.
In this example, we train an SGDRegressor, save it to a local registry, and interrogate information relating to the environment in which it was persisted:
```python
>>> from kxi import ml
>>> ml.init()
>>> from sklearn.linear_model import SGDRegressor
>>> import numpy as np
>>> regressor = SGDRegressor().fit(np.random.randn(10, 1), np.random.rand(10))
>>> ml.registry.set.model(model=regressor,
                          model_name="sgd_regression",
                          model_type="sklearn",
                          folder_path="/tmp",
                          experiment_name="day0")
UUID('05a68abe-b256-f829-39f1-34229f0f015f')
>>> ml.registry.get.version(model_name="sgd_regression",
                            folder_path="/tmp",
                            experiment_name="day0")
{'q_version': b'Version: 4 | Release Date: 2022.02.16',
 'model_type': b'sklearn',
 'python_version': b'3.9.6 (default, Nov 9 2021, 13:31:27) ...
 'python_library_version': b'1.0.2'}
```
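One possible use of this version information is a compatibility check before deployment. The sketch below compares the persisted Python major.minor version with that of the running interpreter. python_major_minor and same_python are hypothetical helpers, and the sample dictionary copies the example output above, with the elided tail of python_version left out.

```python
import re
import sys
from typing import Optional, Tuple


def python_major_minor(version_info: dict) -> Tuple[int, int]:
    """Extract (major, minor) from the bytes 'python_version' entry."""
    text = version_info["python_version"].decode("utf-8")
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised python_version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def same_python(version_info: dict,
                running: Optional[Tuple[int, int]] = None) -> bool:
    """True when the persisted and running major.minor versions agree."""
    if running is None:
        running = sys.version_info[:2]
    return python_major_minor(version_info) == tuple(running)


# Copied from the example output above; the remainder of python_version
# is elided in the source and is not reconstructed here.
sample = {
    "q_version": b"Version: 4 | Release Date: 2022.02.16",
    "model_type": b"sklearn",
    "python_version": b"3.9.6 (default, Nov 9 2021, 13:31:27)",
    "python_library_version": b"1.0.2",
}

print(python_major_minor(sample))  # (3, 9)
print(same_python(sample, running=(3, 9)))  # True
```

Matching on major.minor only is a design choice made for this sketch; a stricter check could also compare python_library_version with the installed library.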