Clustering
This page outlines the variadic function definitions provided with the kdb Insights ML Analytics library for clustering. Full breakdowns of the algorithms can be found here; these include worked examples showing how to use the values returned by each function for prediction and update, which are not covered explicitly below.
Note
All arguments marked with an asterisk are optional and can be input using the notation defined in the function calls section of the model monitoring documentation.
K-Means
.ml.kxi.clust.kmeans.fit
Fit a K-Means model
.ml.kxi.clust.kmeans.fit[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function used in clustering. | e2dist |
k | long | The number of clusters. | 8 |
centers | dictionary or null | Initial cluster centers. If null, initial centers are calculated using k++ or random initialisation. If a dictionary, it must contain num and centroids, which define the number of points in each cluster and the cluster locations; these are often calculated in a previous 'fit' phase. | :: |
config | dictionary | Any additional configuration required for application of clustering; supported options are defined here. | :: |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction and update functionality. |
Examples:
Example 1: Fit a model in default configuration using only required arguments
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit model
q)show mdl1:.ml.kxi.clust.kmeans.fit data
modelInfo| `repPts`clust`data`inputs!((0.8139576 0.09132079 0.2219031;0.67699..
predict | {[config;data]
config:config[`modelInfo];
data:clust.util.floatCo..
update | {[config;data]
modelConfig:config[`modelInfo];
data:clust.util.fl..
q)mdl1[`modelInfo;`inputs]
df | `e2dist
k | 8
iter| 100
kpp | 1b
Example 2: Fit model modifying the default behaviour using additional arguments
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit model
q)show mdl2:.ml.kxi.clust.kmeans.fit[data;.var.kwargs`df`k!(`edist;3)]
modelInfo| `repPts`clust`data`inputs!((0.8148896 0.3256995 0.5313307;0.236372..
predict | {[config;data]
config:config[`modelInfo];
data:clust.util.floatCo..
update | {[config;data]
modelConfig:config[`modelInfo];
data:clust.util.fl..
q)mdl2[`modelInfo;`inputs]
df | `edist
k | 3
iter| 100
kpp | 1b
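The `df` option selects the distance function, and the two examples above differ only in `e2dist` versus `edist`. The following is a minimal sketch in plain q, assuming `edist` is the Euclidean distance and `e2dist` is its square. The helper definitions and the `nearest` function are hypothetical illustrations, not the library's internals.

```q
// Hypothetical local helpers for illustration; not the library's own definitions
edist:{[a;b] d:a-b; sqrt sum d*d}    // Euclidean distance
e2dist:{[a;b] d:a-b; sum d*d}        // squared Euclidean distance

a:0 0f
b:3 4f
edist[a;b]     // 5f
e2dist[a;b]    // 25f

// Assign each point to its nearest centroid, the core step of K-Means
centroids:(0 0f;10 10f)
pts:(1 1f;9 8f;2 0f)
nearest:{[cs;p] ds:e2dist[p] each cs; ds?min ds}
nearest[centroids] each pts    // 0 1 0
```

Because squaring preserves ordering, both metrics pick the same nearest centroid; the squared form simply avoids the square root.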
Affinity Propagation
.ml.kxi.clust.ap.fit
Fit an Affinity Propagation model
.ml.kxi.clust.ap.fit[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function used in clustering. | nege2dist |
damp | float | Damping coefficient. | 0.5 |
diag | function | Preference function for the diagonal of the similarity matrix. | med |
iter | dictionary | Maximum allowed iterations and the maximum iterations without a change in clusters. When null is passed in, `total`noChange!200 15 is used. | :: |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction functionality. |
Examples:
Example 1:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model in default configuration using only required arguments
q)show mdl1:.ml.kxi.clust.ap.fit data
modelInfo| `data`inputs`clust`exemplars!((0.8599461 0.2452222 0.6070236 0.686..
predict | {[config;data]
config:config`modelInfo;
data:clust.util.floatConv..
q)mdl1[`modelInfo;`inputs]
df | `nege2dist
damp| 0.5
diag| k){avg x(<x)@_.5*-1 0+#x,:()}
iter| `run`total`noChange!0 200 15
Example 2:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model modifying the default behaviour using a mix of positional and keyword arguments
q)damp:.75
q)show mdl2:.ml.kxi.clust.ap.fit[data;.var.kw[`damp;damp];.var.kw[`diag;max]]
modelInfo| `data`inputs`clust`exemplars!((0.8599461 0.2452222 0.6070236 0.686..
predict | {[config;data]
config:config`modelInfo;
data:clust.util.floatConv..
q)mdl2[`modelInfo;`inputs]
df | `nege2dist
damp| 0.75
diag| max
iter| `run`total`noChange!0 200 15
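To make the `df`, `diag` and `damp` options more concrete, the sketch below builds a small similarity matrix in plain q, sets its diagonal with a preference function, and applies a damped update. It assumes `nege2dist` is the negative squared Euclidean distance, that the preference function is applied to all similarities, and that damping is the usual weighted blend of old and new values. All helper names are hypothetical and the library's internals may differ.

```q
// Hypothetical helper: negative squared Euclidean distance between two points
nege2dist:{[a;b] d:a-b; neg sum d*d}

pts:(0 0f;1 0f;5 5f)

// Pairwise similarity matrix (larger value means more similar)
sim:nege2dist/:\:[pts;pts]

// Preference: apply the diag function (median here) to all similarities
// and place the result on the diagonal. Assumed behaviour, for illustration.
pref:med raze sim                                        // -1f
sim:{[p;row;i] @[row;i;:;p]}[pref]'[sim;til count sim]

// Damped update: blend the previous value with the newly computed one
damped:{[damp;oldVal;newVal] (damp*oldVal)+(1-damp)*newVal}
damped[0.5;0f;-4f]     // -2f
damped[0.75;0f;-4f]    // -1f
```

A higher damping coefficient keeps more of the previous value, which generally slows convergence but reduces oscillation.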
DBSCAN
.ml.kxi.clust.dbscan.fit
Fit a DBSCAN model
.ml.kxi.clust.dbscan.fit[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function used in clustering. | e2dist |
minPts | long | Minimum number of points required in a given neighborhood to define a cluster. | 5 |
eps | float | Epsilon radius. | 0.5 |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction and update functionality. |
Examples:
Example 1:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model in default configuration using only required arguments
q)show mdl1:.ml.kxi.clust.dbscan.fit data
modelInfo| `data`inputs`clust`tab!((0.8599461 0.2452222 0.6070236 0.6868635 0..
predict | {[config;data]
config:config[`modelInfo];
data:clust.util.floatCo..
update | {[config;data]
modelConfig:config[`modelInfo];
data:clust.util.fl..
q)mdl1[`modelInfo;`inputs]
df | `e2dist
minPts| 5
eps | 0.5
Example 2:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model modifying the default behaviour using a mix of positional and keyword arguments
q)df:`edist
q)eps:.75
q)show mdl2:.ml.kxi.clust.dbscan.fit[data;df;.var.kw[`eps;eps]]
modelInfo| `data`inputs`clust`tab!((0.8599461 0.2452222 0.6070236 0.6868635 0..
predict | {[config;data]
config:config[`modelInfo];
data:clust.util.floatCo..
update | {[config;data]
modelConfig:config[`modelInfo];
data:clust.util.fl..
q)mdl2[`modelInfo;`inputs]
df | `edist
minPts| 5
eps | 0.75
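The roles of `eps` and `minPts` can be sketched in plain q by counting the neighbours of each point. This is an illustration under stated assumptions: the `e2dist` helper is a hypothetical local definition, `eps` is compared directly against the value returned by the distance function, and each point counts itself as a neighbour. The library may handle these details differently.

```q
// Hypothetical helper: squared Euclidean distance (the documented default df)
e2dist:{[a;b] d:a-b; sum d*d}

pts:(0 0f;0 0.1;0.1 0;0.1 0.1;5 5f)
eps:0.5
minPts:3

// Pairwise distances, then count neighbours within eps for each point
// (each point counts itself here; the library may treat this differently)
dists:e2dist/:\:[pts;pts]
nbrs:sum each dists<=eps     // 4 4 4 4 1i

// Points with at least minPts neighbours are core points
core:nbrs>=minPts            // 11110b
```

The isolated point at (5, 5) has no neighbours other than itself, so it would not seed a cluster.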
CURE
.ml.kxi.clust.cure.fit
Fit a CURE model
.ml.kxi.clust.cure.fit[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function used in clustering. | e2dist |
n | long | Number of representative points. | 5 |
c | float | Compression ratio. | 0 |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction functionality. |
Examples:
Example 1:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model in default configuration using only required arguments
q)show mdl1:.ml.kxi.clust.cure.fit data
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict | {[config;data;cutDict]
data:clust.util.floatConversion util.tabConvert..
q)mdl1[`modelInfo;`inputs]
df| `e2dist
n | 5
c | 0
Example 2:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model modifying the default behaviour using additional arguments
q)show mdl2:.ml.kxi.clust.cure.fit[data;.var.kwargs`n`c!(4;.1)]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict | {[config;data;cutDict]
data:clust.util.floatConversion util.tabConvert..
q)mdl2[`modelInfo;`inputs]
df| `e2dist
n | 4
c | 0.1
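The compression ratio `c` can be illustrated in plain q. In the standard description of CURE, each representative point is moved toward the cluster centroid by a fraction `c` of its distance from it. The sketch below assumes that interpretation; `compress` is a hypothetical helper, not a library function.

```q
// Four representative points of a single hypothetical cluster
cluster:(0 0f;4 0f;0 4f;4 4f)
centroid:avg cluster                     // 2 2f

// Move a representative point toward the centroid by fraction c
compress:{[c;ctr;rep] rep+c*ctr-rep}

compress[0f;centroid] each cluster       // unchanged: c of 0 applies no compression
compress[0.5;centroid] each cluster      // (1 1f;3 1f;1 3f;3 3f)
```

With the documented default of 0 the representative points stay where they are; larger values pull them inward, which makes the clustering less sensitive to outliers.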
.ml.kxi.clust.cure.fitPredict
Fit and predict on a CURE model
.ml.kxi.clust.cure.fitPredict[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function used in clustering. | e2dist |
n | long | Number of representative points. | 5 |
c | float | Compression ratio. | 0 |
cutDict | dictionary | Cutting algorithm to use when splitting the data into clusters (`k or `dist), along with a value defining the cutting threshold. | enlist[`k]!enlist 5 |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with predicted clusters and prediction functionality. |
Examples:
Example 1:
// Generate feature data
q)show data:2 10#20?10.
1.473702 4.080537 3.03448 9.659883 7.874197 4.734442 8.423141 2.7..
0.72077 5.450964 4.625792 0.6486378 6.951865 9.674697 7.26315 2.4..
// Fit a CURE model and cut the dendrogram into 3 clusters
// Use a mix of positional and keyword arguments
q).ml.kxi.clust.cure.fitPredict[data;.var.kw[`df;`edist];.var.kw[`cutDict;enlist[`k]!enlist 3]]
modelInfo| `data`inputs`dgram!((1.473702 4.080537 3.03448 9.659883 ..
predict | {[config;data;cutDict]
updConfig:clust.i.prepPred[config;cutDict..
clust | 0 0 0 1 1 2 1 0 1 0
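The two `cutDict` styles can be compared with a small plain-q sketch. It assumes that a `k` cut requests a fixed number of clusters, while a `dist` cut keeps every merge up to a distance threshold. The merge heights and both helper functions are hypothetical; the library's exact threshold semantics are not confirmed here.

```q
// Hypothetical merge heights of a dendrogram built from 6 points,
// listed in the order the merges happen (ascending distance)
mergeDist:0.5 0.8 1.2 3 7.5
nPts:1+count mergeDist

// `k style: request k clusters, so apply the first nPts-k merges
mergesForK:{[n;k] n-k}
mergesForK[nPts;3]                 // 3

// `dist style: apply merges up to the threshold, then count clusters
clustersAtDist:{[md;d] 1+sum md>d}
clustersAtDist[mergeDist;2f]       // 3
```

In this example, cutting at a distance of 2 and asking for 3 clusters describe the same cut of the dendrogram.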
Hierarchical Clustering
.ml.kxi.clust.hc.fit
Fit a hierarchical clustering model
.ml.kxi.clust.hc.fit[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function. | e2dist |
lf | symbol | Linkage function. | ward |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with prediction functionality. |
Examples:
Example 1:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model in default configuration using only required arguments
q)show mdl1:.ml.kxi.clust.hc.fit[data]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict | {[config;data;cutDict]
data:clust.util.floatConversion util.tabConvert..
q)mdl1[`modelInfo;`inputs]
df| e2dist
lf| ward
Example 2:
// Generate feature data
q)data:([]100?1f;100?1f;100?1f)
// Fit a model modifying the default behaviour using only positional arguments
q)df:`mdist
q)lf:`complete
q)show mdl2:.ml.kxi.clust.hc.fit[data;df;lf]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict | {[config;data;cutDict]
data:clust.util.floatConversion util.tabConvert..
q)mdl2[`modelInfo;`inputs]
df| mdist
lf| complete
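The linkage function `lf` decides how the distance between two clusters is derived from the distances between their points. The sketch below shows single, complete and average linkage in plain q under their textbook definitions. The `edist` helper and the variable names are hypothetical, and `ward` linkage is not covered here.

```q
// Hypothetical helper: Euclidean distance between two points
edist:{[a;b] d:a-b; sqrt sum d*d}

c1:(0 0f;1 0f)
c2:(4 0f;6 0f)

// All pairwise distances between points of the two clusters
pair:raze edist/:\:[c1;c2]    // 4 6 3 5f

single:min pair               // 3f, closest pair of points
complete:max pair             // 6f, farthest pair of points
average:avg pair              // 4.5, mean over all pairs
```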
.ml.kxi.clust.hc.fitPredict
Fit and predict on a hierarchical clustering model
.ml.kxi.clust.hc.fitPredict[X]
Parameters:
name | type | description |
---|---|---|
X | any | Input/training data of N dimensions. |
options:
name | type | description | default |
---|---|---|---|
df | symbol | Distance function. | e2dist |
lf | symbol | Linkage function. | ward |
cutDict | dictionary | Cutting algorithm to use when splitting the data into clusters (`k or `dist), along with a value defining the cutting threshold. | enlist[`k]!enlist 5 |
Returns:
type | description |
---|---|
dictionary | All information collected during the fitting of a model, along with predicted clusters and prediction functionality. |
Examples:
Example 1:
// Generate feature data
q)show data:2 10#20?10.
6.01551 9.775468 9.809354 4.237163 5.424916 1.994707 2.496307 2.599..
1.046143 7.154895 8.098937 2.546309 6.298331 0.249301 5.341463 4.106..
// Fit a HC model and cut the dendrogram into 4 clusters
// Use only keyword arguments
q).ml.kxi.clust.hc.fitPredict[data;.var.kwargs`lf`cutDict!(`single;enlist[`k]!enlist 4)]
modelInfo| `data`inputs`dgram!((6.01551 9.775468 9.809354 4.237163 5..
predict | {[config;data;cutDict]
updConfig:clust.i.prepPred[config;cutDict..
clust | 0 2 2 0 1 3 0 0 0 1