Linear-regression SGD model

A linear regression model assumes a linear relationship between the input variables \(X\) and the target/output value \(y\). In practice, this means that some linear combination of the input variables \(X\) can be used to predict the value \(y\).

This relationship can be described as:

\[y_i= \sum_{n=1}^{N} \theta_{n} X_{i,n}\]

Where:

  • \(X\) is the set of \(N\)-dimensional features which, in linear combination, represent \(y\)
  • \(y\) is the variable being predicted
  • \(\theta\) is the vector of coefficients (weights) defining the linear combination of \(X\) that best describes \(y\)

The goal of a linear-regression model is to calculate the \(\theta\) coefficients, which can then be combined with input data to make predictions based on this linear relationship.
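As a small worked instance of the relationship above, the following sketch computes \(y\) for three observations of two features. The names `X`, `theta` and `yhat` and all values are illustrative assumptions; nothing here calls the library.

```q
// Illustrative values only: 3 observations, N=2 features
X:(1 2f;3 4f;5 6f)      // each row is one observation X[i]
theta:2 3f              // one coefficient per feature
yhat:X mmu theta        // y[i] is the sum over n of theta[n]*X[i;n]
yhat                    // 8 18 28f
```

Each entry of `yhat` is the dot product of one row of `X` with `theta`, for example 8 = (2*1)+(3*2).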

.ml.online.sgd.linearRegression.fit

Fit a linear regression model using stochastic gradient descent

.ml.online.sgd.linearRegression.fit[X;y;trend;paramDict]

Where

  • X is the input/training data of \(N\) dimensions
  • y is the output/target regression data
  • trend is a boolean indicating whether a trend is to be accounted for
  • paramDict is a dictionary of optional parameters that modify the SGD fitting process

returns a dictionary containing all information collected during the fitting of the model, a function to be used for prediction, and the update functions.

The information collected during the fitting of the model is contained within the modelInfo key and includes:

| parameter | description |
|---|---|
| theta | The weights calculated during the process |
| iter | The number of iterations applied during the process |
| diff | The difference between the final theta values and the preceding values |
| trend | Whether or not a trend value was fitted during the process |
| paramDict | The parameter dictionary used during the process |
| inputType | The data type of each column of the input data |

Prediction functionality is contained within the predict key. The function takes input data of \(N\) dimensions as its argument and returns the predicted values.

The model contains two types of update functions:

  • update, where models are updated assuming that the data given is suitable
  • updateSecure, where additional checks are applied to confirm that the data is in the correct format, so that no ‘model pollution’ occurs

Both functions are binary, with arguments

  • input/training data of \(N\) dimensions
  • output/target regression data

and return a dictionary containing all information collected during the updating of a model, along with a prediction and update function.

If updateSecure is used, an error is signalled when unsuitable data is supplied.

During the update phase, the same model parameters are used as were applied during the fitting process, except that the maximum number of iterations is set to 1.
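To make the distinction between fitting (many iterations) and updating (a single iteration) concrete, here is a conceptual sketch of mini-batch SGD for linear regression in plain q, written as a script rather than at the `q)` prompt. The names `sgdStep`, `iter` and `step`, the squared-error loss and the data are all assumptions chosen for illustration; this is not the toolkit's implementation and it omits penalties, decay and momentum.

```q
// Conceptual sketch only; sgdStep, iter and step are hypothetical names,
// not functions of the ML Toolkit.

// One gradient step on squared-error loss for the batch (Xb;yb)
sgdStep:{[alpha;theta;Xb;yb]
  err:(Xb mmu theta)-yb;               // residuals on the batch
  grad:(flip[Xb] mmu err)%count yb;    // gradient of half the mean squared error
  theta-alpha*grad}

// One iteration: draw k distinct random rows, then take one step on them
iter:{[alpha;k;X;y;theta]
  i:neg[k]?count y;
  sgdStep[alpha;theta;X i;y i]}

X:(1 1f;1 2f;1 3f;1 4f)   // leading column of 1s stands in for the trend
y:3 5 7 9f                // generated as y = 1 + 2x, no noise

// "Fit": many iterations starting from zero weights
step:iter[0.05;2;X;y]
theta:5000 step/2#0f

// "Update": a single iteration on new data, starting from the fitted weights
thetaUpd:iter[0.05;2;(1 5f;1 6f);11 13f;theta]
```

With noiseless data such as this, `theta` should approach `1 2f`, and the single update step should leave it almost unchanged because the new points follow the same line.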

// Create data with strong correlation but also some noise
q)X:8*100?1f
q)y:4+3*X+100?1f
// Fit a linear regression SGD
q)show regMdl:.ml.online.sgd.linearRegression.fit[X;y;1b;`maxIter`alpha!(1000;0.01)]
modelInfo   | `theta`iter`diff`trend`paramDict`inputType!(5.648133 2.934584;1000;..
predict     | {[config;X]
  config:config`modelInfo;
  theta:config`theta;
  tre..
update      | {[config;X;y]
  modelInfo:config`modelInfo;
  theta:modelInfo`thet..
updateSecure| {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..

// Information generated during the fitting of the model
q)regMdl.modelInfo
theta    | 5.648133 2.934584
iter     | 1000
diff     | 0.01827465 0.01503591
trend    | 1b
paramDict| `alpha`maxIter`gTol`theta`k`seed`batchType`gra...
inputTyp | -9h

// Predict on new data
q)Xnew:8*10?1f
q)regMdl.predict[Xnew]
12.26865 19.97765 24.66665 18.05956 13.65026...

// Update the model using new data points
q)Xupd:8*5?1f
q)yUpd:4+3*Xupd+5?1f
q)show regUpd:regMdl.update[Xupd;yUpd]
modelInfo   | `theta`iter`diff`trend`paramDict!(5.627178 2.926986;1;0.02095537 0..
predict     | {[config;X]
  config:config`modelInfo;
  theta:config`theta;
  tre..
update      | {[config;X;y]
  modelInfo:config`modelInfo;
  theta:modelInfo`thet..
updateSecure| {[config;secure;X;y]
  modelInfo:config`modelInfo;
  theta:mode..

// Information generated during the updating of the model
q)regUpd.modelInfo
theta    | 5.627178 2.926986
iter     | 1
diff     | 0.02095537 0.007598042
trend    | 1b
paramDict| `alpha`maxIter`gTol`theta`k`seed`batchType`gradArgs`penalty`lambda.
inputTyp | -9h
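Because updateSecure signals an error when the data is unsuitable, a caller may want to trap that error and keep the existing model. The sketch below assumes `regMdl`, `Xupd` and `yUpd` from the example above are still defined; `regSafe` and the message text are illustrative names, and the content of the error raised by the library is not assumed.

```q
// Assumes regMdl, Xupd and yUpd from the example above.
// Trap any error signalled by updateSecure; on failure, report it
// and fall back to the existing model unchanged.
regSafe:.[regMdl`updateSecure;(Xupd;yUpd);{[err] -1"update rejected: ",err; regMdl}]
```

If the update succeeds, `regSafe` holds the updated model dictionary; otherwise it is the original `regMdl`.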

Configurable parameters

In the above function, the following are the optional configurable entries for paramDict:

| key | type | default | description |
|---|---|---|---|
| alpha | float | 0.01 | The learning rate applied |
| maxIter | integer | 100 | The maximum number of iterations before the run is terminated. Reaching this limit does not guarantee convergence |
| gTol | float | 1e-5 | The run is terminated if the difference in gradient falls below this value |
| theta | float | 0 | The initial starting weights |
| k | integer | *n | The number of batches used or random points chosen each iteration |
| seed | integer | random | The random seed |
| batchType | symbol | shuffle | The batch type (`single`shuffle`shuffleRep`nonShuffle`noBatch) |
| penalty | symbol | l2 | The penalty/regularization term (`l1`l2`elasticNet) |
| lambda | float | 0.001 | The penalty term coefficient |
| l1Ratio | float | 0.5 | The elastic net mixing parameter (only used if penalty type `elasticNet is applied) |
| decay | float | 0 | The decay coefficient |
| p | float | 0 | The momentum coefficient |
| verbose | boolean | 0b | Whether information about the fitting process is printed after every epoch |
| accumulation | boolean | 0b | Whether the theta value after each epoch is returned as the output |
| thresholdFunc | list | () | The threshold function and value (optional) to apply when using updateSecure |

In the above table *n is the length of the dataset.
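As in the fitting example earlier, only the entries to be changed need to appear in paramDict. The sketch below overrides several defaults at once; it assumes `X` and `y` from that example are defined and that the ML Toolkit is loaded. The names `params` and `mdl` and the chosen values are illustrative, not recommendations.

```q
// Assumes X and y from the fitting example, and the ML Toolkit loaded.
// Keys come from the table above; values are illustrative only.
params:`alpha`maxIter`batchType`k`penalty`lambda!(0.005;500;`shuffle;10;`l2;0.01)
mdl:.ml.online.sgd.linearRegression.fit[X;y;1b;params]
mdl[`modelInfo;`iter]    // iterations actually run (at most 500 here)
```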

A number of batchTypes can be applied when fitting a model using SGD:

| option | description |
|---|---|
| noBatch | No batching occurs and all data points are used (regular gradient descent). |
| nonShuffle | The data is split into k batches with no shuffling applied. |
| shuffle | The data is shuffled into k batches. Each data point appears once. |
| shuffleRep | The data is shuffled into k batches. Data points can appear more than once and not all data points may be used. |
| single | k random points are chosen each iteration. |
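The batch types can be pictured at the level of row indices. The sketch below follows the wording of the descriptions above for a dataset of 12 points and k of 3; the variable names are hypothetical and the library's internal batching may differ in detail.

```q
// Index-level sketch only; not the library's implementation
n:12
k:3
idxNoBatch:enlist til n          // one batch holding every index
idxNonShuffle:(k;0N)#til n       // (0 1 2 3;4 5 6 7;8 9 10 11)
idxShuffle:(k;0N)#neg[n]?n       // random permutation cut into k batches, each index once
idxShuffleRep:(k;0N)#n?n         // drawn with replacement: repeats and omissions possible
idxSingle:k?n                    // k random indices for one iteration
```

Indexing the data with any one batch, for example `X idxShuffle 0`, selects the rows used for that gradient step.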