Logistic Classifier SGD Model
Logistic classification builds on logistic regression to generate models for boolean and multi-class classification. The model calculates the probability that a given datapoint belongs to a specified category. A threshold is then applied: a datapoint whose probability exceeds the threshold is assigned to that category.
Similar to linear regression, the `y` values are assumed to depend on a linear combination of the input `X` values. The probability is calculated using the following formula:
\[ P(y = 1 \mid X) = \frac{1}{1 + e^{-z}} \]
where `z` is the linear combination of the `X` variables in `N` dimensions and their associated weights \(\theta\), defined by the function:
\[ z = \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_N X_N \]
Stochastic gradient descent (SGD) can be used to fit the `X` data to the target variable `y` in order to determine the coefficient weights \(\theta\) that best represent this combination.
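The probability calculation described above can be sketched in plain q. This is a minimal illustration only: `sigmoid`, the weights in `theta` and the sample data are assumptions made for the example and are not part of the toolkit.

```q
// Hypothetical helper, not part of the ML Toolkit API
sigmoid:{1%1+exp neg x}

// Assumed weights for a two-dimensional input
theta:0.5 -0.25
// Two features (one row per feature), three datapoints
X:(1 2 3f;4 0 2f)

// z is the weighted sum of the features for each datapoint
z:sum theta*X
// Probability that each datapoint belongs to the positive class
p:sigmoid z
// Apply a threshold of 0.5 to obtain class labels
cls:p>0.5
```

Here `z` is `-0.5 1 1`, so only the second and third datapoints have a probability above 0.5 and `cls` is `011b`.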
.ml.online.sgd.logClassifier.fit
Fit a logistic classification stochastic gradient descent model
.ml.online.sgd.logClassifier.fit[X;y;trend;paramDict]
Parameters:
name | type | description |
---|---|---|
`X` | any | Input/training data of N dimensions. |
`y` | any | Output/target classification data. |
`trend` | boolean | Whether a trend is to be accounted for. |
`paramDict` | dictionary | Any modifications to be applied during the fitting process of SGD (See here for more details). |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction and update functionality. |
The information collected during the fitting of the model is contained within `modelInfo` and includes:
name | description |
---|---|
`theta` | The weights calculated during the process. |
`iter` | The number of iterations applied during the process. |
`diff` | The difference between the final theta values and the preceding values. |
`trend` | Whether or not a trend value was fitted during the process. |
`paramDict` | The parameter dictionary used during the process. |
Prediction functionality is contained within the `predict` key. The function takes the following input and returns the predicted classes:

- `X` is the input data of N dimensions
Prediction probability functionality is contained within the `predictProba` key. The function takes the following input:

- `X` is the input data of N dimensions

It returns the predicted probability of each class. For binary classification, a single probability is returned, indicating the probability of the positive class. For multiclass models, a one-vs-rest approach is used.
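As an illustration of how one-vs-rest probabilities can be turned into class labels, the sketch below picks the class with the highest probability for each datapoint. The class names, the probability values and the layout of `probs` are assumptions made for the example; the layout actually returned by `predictProba` may differ.

```q
// Hypothetical class labels (assumed for illustration)
classes:`a`b`c
// One row per datapoint, one column per class; each value is assumed to
// come from a separate binary "this class vs the rest" model
probs:(0.2 0.7 0.4;0.9 0.3 0.1;0.1 0.2 0.6)
// For each datapoint, select the class whose binary model is most confident
pred:classes@{x?max x} each probs
```

With these values `pred` is `` `b`a`c ``. Note that one-vs-rest probabilities in a row need not sum to 1, because each comes from an independent binary model.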
The model contains two types of update function:

- `update`, where models are updated assuming that the data given is suitable.
- `updateSecure`, where additional checks are applied to the data to ensure that it is in the correct format, so that no 'model pollution' occurs.
Both functions return a dictionary containing all information collected during the updating of the model, along with prediction and update functions. They take the following inputs:

- `X` is the input/training data of N dimensions
- `y` is the output/target classification data
If `updateSecure` is used, an error is returned when the data supplied is not in the expected format. See here for more information.
During the `update` phase, the model parameters applied during the fitting process are reused, except that the maximum number of iterations is set to 1.
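To give a feel for what a single iteration does, the following is a minimal sketch of one plain gradient-descent step on the logistic loss. It is not the toolkit's implementation: `sigmoid` and `step` are hypothetical helpers, the row of ones standing in for a trend term is an assumption, and the configured batching, penalty, decay and momentum are all omitted.

```q
// Hypothetical helpers; batching, penalty, decay and momentum are not modelled
sigmoid:{1%1+exp neg x}

// One gradient-descent step on the logistic loss
// theta: weights, X: one row per feature, y: boolean targets, alpha: learning rate
step:{[theta;X;y;alpha]
  p:sigmoid sum theta*X;
  grad:avg each X*\:p-y;
  theta-alpha*grad}

// Assumed data: the first row of ones acts as an intercept term
X:(1 1 1 1f;0.5 1.5 2.5 3.5)
y:0011b
theta0:0 0f
theta1:step[theta0;X;y;0.01]
```

Starting from zero weights every predicted probability is 0.5, so the gradient is `0 -0.5` and `theta1` is `0 0.005`: the weight on the feature moves in the direction that raises the probability for the larger values of `X`, which are the positive class here.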
Examples:
Example 1: Fit, predict and update a model
// Create data with strong correlation but also some noise
q)X:8*100?1f
q)y:4+3*X+100?1f
q)yClass:y<avg y
// Fit a logistic classification SGD model
q)show logMdl:.ml.online.sgd.logClassifier.fit[X;yClass;1b;`seed`k!(42;5)]
modelInfo | `theta`iter`diff`trend`paramDict`inputType!(0.05981966 -0.2055255;100..
predict | {[config;X]
yhat:online.sgd.logClassifier.predict[config;X];
proba:..
predictProba| {[config;X]
yhat:online.sgd.logClassifier.predict[config;X];
proba:..
update | {[config;X;y]
modelInfo:config`modelInfo;
theta:modelInfo`t..
updateSecure| {[config;secure;X;y]
modelInfo:config`modelInfo;
theta:mode..
// Information generated during the fitting of the model
q)logMdl.modelInfo
theta | 0.05981966 -0.2055255
iter | 100
diff | -0.00124056 -0.0009483654
trend | 1b
paramDict| `alpha`maxIter`gTol`theta`k`seed`batchType`....
// Predict on new data
q)Xnew:8*10?1f
q)logMdl.predict[Xnew]
0 0 0 0 0 0...
q)logMdl.predictProba[Xnew]
0.2065456 0.3266713 0.2207807 0.2183085 0.3717741 0..
// Update the fitted model
q)Xupd:8*5?1f
q)yUpd:4+3*Xupd+5?1f
q)yClassUpd:yUpd<avg yUpd
q)show logUpd:logMdl.update[Xupd;yClassUpd]
modelInfo | `theta`iter`diff`trend`paramDict!(0.06008984 -0.2086289;1;-0.00027..
predict | {[config;X]
yhat:online.sgd.logClassifier.predict[config;X];
proba:1%(..
update | {[config;X;y]
modelInfo:config`modelInfo;
theta:modelInfo`thet..
updateSecure| {[config;secure;X;y]
modelInfo:config`modelInfo;
theta:mode..
q)logUpd.modelInfo
theta | 0.06008984 -0.2086289
iter | 1
diff | -0.0002701815 0.003103383
trend | 1b
paramDict| `alpha`maxIter`gTol`theta`k`seed`batchType...
inputType| -9h
Configurable parameters
In the above function, the following are the optional configurable entries for `paramDict`:
name | type | default | description |
---|---|---|---|
`alpha` | float | 0.01 | Applied learning rate. |
`maxIter` | integer | 100 | Maximum number of iterations before the run is terminated; reaching this limit does not guarantee convergence. |
`gTol` | float | 1e-5 | If the difference in gradient falls below this value, the run is terminated. |
`theta` | float | 0 | Initial starting weights. |
`k` | integer | `*n` | Number of batches used or random points chosen each iteration. |
`seed` | integer | random | Random seed. |
`batchType` | symbol | shuffle | Batch type: one of `single`, `shuffle`, `shuffleRep`, `nonShuffle` or `noBatch`. |
`penalty` | symbol | l2 | Penalty/regularization term: one of `l1`, `l2` or `elasticNet`. |
`lambda` | float | 0.001 | Penalty term coefficient. |
`l1Ratio` | float | 0.5 | Elastic net mixing parameter, only used if the penalty type is `elasticNet`. |
`decay` | float | 0 | Decay coefficient. |
`p` | float | 0 | Momentum coefficient. |
`verbose` | boolean | 0b | Whether information about the fitting process is printed after every epoch. |
`accumulation` | boolean | 0b | Whether the theta value after each epoch is returned as the output. |
`thresholdFunc` | list | () | Threshold function and value to apply when using `updateSecure`. |
In the above table, `*n` is the length of the dataset.
Several values of `batchType` can be applied when fitting a model using SGD. The supported types, and how each uses the `k` parameter, are explained below:
options:
name | description |
---|---|
`noBatch` | No batching occurs and all data points are used (regular gradient descent). |
`nonShuffle` | Data split into k batches with no shuffling applied. |
`shuffle` | Data shuffled into k batches. Each data point appears once. |
`shuffleRep` | Data shuffled into k batches. Data points can appear more than once and not all data points may be used. |
`single` | k random points are chosen each iteration. |
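The batch types above can be pictured as different ways of choosing row indices from a dataset of length `n`. The functions below are hypothetical index generators written to match the descriptions in the table; they are a sketch under that assumption and may differ from how the toolkit selects batches internally.

```q
// k: the k parameter, n: number of datapoints (both assumed to be positive,
// with n divisible by k so that batches are of equal size)

// One batch holding every point
noBatch:{[k;n]enlist til n}
// k batches in the original order
nonShuffle:{[k;n](k;0N)#til n}
// k batches drawn from a random permutation, so each point appears once
shuffle:{[k;n](k;0N)#neg[n]?n}
// k batches sampled with replacement, so points may repeat or be missed
shuffleRep:{[k;n](k;0N)#n?n}
// k random points with no batching
single:{[k;n]k?n}

// Example: 10 datapoints split into 5 batches
b1:nonShuffle[5;10]
b2:shuffle[5;10]
```

Here `b1` is `(0 1;2 3;4 5;6 7;8 9)`, while `b2` holds the same ten indices in a random order, again as five batches of two.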