Deploying models to production systems with zero downtime, particularly when those models can be updated based on streaming data, poses a number of challenges. Principal among these is ensuring that the models are not polluted by 'bad' data that could affect their results.

To help mitigate this for the stochastic gradient descent models, an updateSecure function is made available for the Linear Regression and Logistic Classification models once they have been fit.

The checks applied by this function are configured by adding the threshFunc and deleteRow keys to the paramDict parameter of each of the .ml.online.sgd fitting functions.

The following outlines the behaviours supported by the threshFunc parameter:

option   format             description
max      max | (max;val)    Sets the maximum allowed value. With max, this defaults to the maximum value in the original fit dataset; with (max;val), the maximum accepted value is val.
min      min | (min;val)    Sets the minimum allowed value. With min, this defaults to the minimum value in the original fit dataset; with (min;val), the minimum accepted value is val.
avg      avg | (avg;dev)    With avg, new data must lie within avg +/- 2 standard deviations; with (avg;dev), new data must lie within avg +/- dev standard deviations.
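
As a rough illustration of the table above, the sketch below derives bounds from a small fit dataset for each option. It is not the toolkit's implementation: fitX, lower, upper and avgBounds are hypothetical names, bounds are assumed to be computed per column, and the use of the population standard deviation (dev) is an assumption, since the source does not say which deviation is used.

```q
// Hypothetical fit data: 3 rows, 2 columns
fitX:(1 2f;3 4f;5 6f)

// min and max: default bounds are the column-wise extremes of the fit data
lower:min each flip fitX    // 1 2f
upper:max each flip fitX    // 5 6f

// avg and (avg;dev): average +/- n standard deviations per column
// n is 2 for plain avg; population deviation is an assumption here
avgBounds:{[X;n] m:avg each flip X; s:dev each flip X; (m-n*s;m+n*s)}

// Returns a pair (lower bounds;upper bounds), one value per column
avgBounds[fitX;2]
avgBounds[fitX;1.5]
```

With (min;val) or (max;val), the user-supplied val would simply take the place of lower or upper in a sketch like this.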

By default, the optional parameter deleteRow is set to 0b; in this case, any values that fall outside the bounds of threshFunc result in an error. If it is set to 1b, rows containing values outside the bounds of threshFunc are removed and the model is updated with the remaining data that conforms to the requirements.
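
The two deleteRow behaviours can be sketched as follows. This is a minimal illustration written as a script, not the toolkit's code: inBounds and checkRows are hypothetical names, the error text is invented for the example, and the bounds are assumed to be supplied per column.

```q
// Hypothetical helper: 1b for each row of X whose values all lie within the bounds
inBounds:{[lower;upper;X] all each (X>=\:lower) and X<=\:upper}

// Hypothetical stand-in for the check described above, not the library code
checkRows:{[deleteRow;lower;upper;X;y]
  keep:inBounds[lower;upper;X];
  if[all keep; :(X;y)];
  // deleteRow=0b: signal an error when any row is out of bounds
  if[not deleteRow; '"values outside of threshold bounds"];
  // deleteRow=1b: drop offending rows from features and target together
  (X where keep;y where keep)}

newX:(2 3f;0 5f;4 7f)
newY:10 20 30f

// deleteRow=1b: only the first row is within bounds, so only it is kept
checkRows[1b;1 2f;5 6f;newX;newY]

// deleteRow=0b: the same data signals an error, trapped here for display
.[checkRows;(0b;1 2f;5 6f;newX;newY);{"error: ",x}]
```

Filtering features and target with the same boolean mask keeps the two aligned, so the rows that remain can be passed on to an update step unchanged.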

Examples:

The example below uses the linear regression model, but the same logic and workflow also apply to the logistic classification model.

Example 1: Apply updateSecure after fitting a linear model

// Generate data
q)X:10 3#30?10f
q)y:10?10f
q)X2:5 3#15?10f
q)y2:5?10f

// Fit linear regression model
q)show mdl:.ml.online.sgd.linearRegression.fit[X;y;1b;enlist[`threshFunc]!enlist (min,max)]
modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(-0.02693591 0.90007..
predict     | {[config;features]
  config:config`modelInfo;
  if[config`trend..
update      | {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..
updateSecure| {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..

// Update the model with new data
q)mdl.updateSecure[X2;y2]
'Input column(s): 0 have values outside of given threshold bounds: 0.8388858 for function: min
[3]  /Users/test/projects/ml/insights/ml-tools/ml-analytics/src/online/sgd/utils.q:530: .ml.online.sgd.util.threshCheck:
"threshold bounds: ",bounds," for function: ",string threshFunc;
$[not deleteRow;'printCol;-1 printCol];
^
rows:asc distinct raze online.sgd.util.findRow[threshFunc;threshBound;X]each idx;
[2]  /Users/test/projects/ml/insights/ml-tools/ml-analytics/src/online/sgd/utils.q:503: .ml.online.sgd.util.checkX:
if[0=count threshFunc;:(::)];
distinct raze online.sgd.util.threshCheck[features;deleteRow]'[threshFunc;threshBound]
^
}

// Set default dictionary to delete rows instead of erroring
q)show mdl1:.ml.online.sgd.linearRegression.fit[X;y;1b;`threshFunc`deleteRow!(min,max;1b)]
modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(-0.05745156 0.97512..
predict     | {[config;features]
  config:config`modelInfo;
  if[config`trend..
update      | {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..
updateSecure| {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..

// Update the model with new data
q)mdl1.updateSecure[X2;y2]
Input column(s): 0,1,2 have values outside of given threshold bounds: 0.8388858 1.780839 2.306385 for function: min
Row(s) 2 3 4 removed from dataset

Input column(s): 2 have values outside of given threshold bounds: 7.111716 for function: max
Row(s) 0 3 removed from dataset

modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(-0.1185931 0.920137..
predict     | {[config;features]
  config:config`modelInfo;
  if[config`trend..
update      | {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..
updateSecure| {[config;secure;features;target]
  modelInfo:config`modelInfo;
  ..

// Explicitly set upper and lower bounds of acceptable values
q)show mdl1:.ml.online.sgd.linearRegression.fit[X;y;1b;`threshFunc`deleteRow!(((min;0.1);(max;9.9));1b)]
modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(-0.02812883 0.41534..
predict     | {[config;X]
  config:config`modelInfo;
  theta:config`theta;
  ..
update      | {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..
updateSecure| {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..

// Update the model with new data
q)mdl1.updateSecure[X2;y2]
Input column(s): 1 have values outside of given threshold bounds: 0.8904193 for function: min
Row(s) 3 removed from dataset

modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(-0.01304417 0.49747..
predict     | {[config;X]
  config:config`modelInfo;
  theta:config`theta;
  ..
update      | {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..
updateSecure| {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..