Cloud integration

Warning

Cloud integration is supported when the registry is used as a standalone application.

Support for cloud integration within Docker images will be provided in the next release.

By default the registry operates entirely as an on-prem solution for storing, managing and serving models. It can also interact with the following cloud storage services:

  • GCP Cloud Storage
  • AWS S3 Storage
  • Azure Blob Storage

This interaction is handled by wrappers around the gsutil (GCP), aws s3 (AWS) and azcopy (Azure) command-line interfaces. The CLI for your chosen vendor must be installed, and the credentials it uses must have permission to read from and write to the target bucket.
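
Before starting the registry it can help to confirm that the relevant CLI is on your PATH. The following is a minimal sketch, assuming a POSIX shell; `check_cli` is a hypothetical helper and not part of the ML Registry. It checks only that the tool is installed, not that its credentials are correctly permissioned.

```shell
# check_cli is a hypothetical helper, not part of the ML Registry.
# It reports whether a command-line tool can be found on the PATH.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Only the tool for the vendor you intend to use is needed:
# gsutil -> GCP, aws -> AWS, azcopy -> Azure
for tool in gsutil aws azcopy; do
  check_cli "$tool" || true
done
```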

You should find minimal differences between the on-prem and cloud storage versions of the registry.

Before calling any function that publishes to or retrieves from the cloud, you must create a storage bucket to which artifacts will be published. At present this must be done outside the functionality provided by this interface: if the bucket does not exist, the ML Registry will not create it.
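
The bucket can typically be created with the same vendor CLIs that the registry wraps. The sketch below assumes the placeholder locations used on this page; `make_bucket_cmd` is a hypothetical helper, not part of the ML Registry, and it only prints the command so that you can review it before running it. Check the exact syntax against the version of the CLI you have installed.

```shell
# make_bucket_cmd is a hypothetical helper, not part of the ML Registry.
# It prints (does not run) the vendor command that creates a bucket or container.
make_bucket_cmd() {
  vendor=$1
  location=$2
  case "$vendor" in
    gcp)   echo "gsutil mb $location" ;;
    aws)   echo "aws s3 mb $location" ;;
    azure) echo "azcopy make \"$location\"" ;;
    *)     echo "unknown vendor: $vendor" >&2; return 1 ;;
  esac
}

# Placeholder locations; substitute your own bucket or blob URL.
make_bucket_cmd gcp gs://path-to-bucket
make_bucket_cmd aws s3://path-to-bucket
make_bucket_cmd azure "https://[SAS-url-to-blob]?[SAS-Token]"
```

Printing rather than executing keeps the sketch safe to run anywhere; paste the printed command into a shell once you are satisfied with the location.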

Once you have a storage bucket available you can run the following examples from any application running this library standalone.

In the examples below, initializing the repository follows the quickstart guide, using the init.q file defined in the guide.

Initialization

There are two ways to tell a q session using the ML Registry which supported cloud-storage location it should interact with:

  1. JSON configuration
  2. command-line definition

JSON configuration

The ML Registry includes a configuration file src/cloud/config.json within the unpacked registry library. This provides a location for users to define variables required for each of the cloud vendors:

{
  "aws":{
    "bucket":"s3://path-to-bucket"
  },
  "gcp":{
    "bucket":"gs://path-to-bucket"
  },
  "azure":{
    "blob":"https://SAS-url-to-blob",
    "token":"SAS-token"
  }
}

In each case, define the bucket or blob location to which the ML Registry will be published and, for Azure, the SAS token generated for interacting with the blob storage location.

To use the default bucket defined in this config, pass the vendor flag on its own, with no accompanying value, when starting your q session:

q init.q -gcp
q init.q -aws
q init.q -azure

Command-line definition

If you regularly change the registry you publish to, or need to avoid defining the bucket location explicitly for security reasons, you can instead specify the bucket or blob when initializing the registry:

q init.q -gcp gs://path-to-bucket
q init.q -aws s3://path-to-bucket
q init.q -azure "https://[SAS-url-to-blob]?[SAS-Token]"

Azure command-line integration requires the double quotes shown above.
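
The quotes matter because a SAS token is a query string that typically contains `&` characters, which an unquoted shell command line would treat as control operators. One way to keep the URL out of the command text is to hold it in a shell variable. This is a sketch assuming a POSIX shell; `MLR_AZURE_URL` and `count_args` are hypothetical names and are not read or provided by the ML Registry.

```shell
# MLR_AZURE_URL is a hypothetical variable name, not read by the ML Registry itself.
# Single quotes stop the shell interpreting ?, & and [ ] inside the value.
MLR_AZURE_URL='https://[SAS-url-to-blob]?[SAS-Token]'

# count_args is a hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

# A quoted expansion passes the flag and the whole URL as exactly two arguments.
count_args -azure "$MLR_AZURE_URL"   # prints 2

# The equivalent registry start-up would be:
#   q init.q -azure "$MLR_AZURE_URL"
```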

Examples

After initialization as above you can run these examples.

Add items to cloud storage:

q).ml.registry.new.registry[::;::]
q).ml.registry.new.experiment[::;"test";::]
q).ml.registry.set.model[::;{x};"mymodel";"q";enlist[`experimentName]!enlist"test"]

Log ancillary information associated with a model:

q).ml.registry.log.metric[::;::;::;::;`metric;2f]
q).ml.registry.set.parameters[::;::;"mymodel";1 0;"paramFile";`param1`param2!1 2]

Retrieve items from the registry:

q).ml.registry.get.model[::;::;"mymodel";1 0]
modelInfo| `registry`model`monitoring!(`description`modelInformation`experime..
model    | {x}
q).ml.registry.get.metric[::;::;::;::;`metric]
timestamp                     metricName metricValue
----------------------------------------------------
2021.04.29D12:24:23.117795000 metric     2
q).ml.registry.get.parameters[::;::;::;::;`paramFile]
param1| 1
param2| 2

Delete the registry:

q).ml.registry.delete.registry[::;::]