Clustering

Variadic function definitions in the KX Insights ML-Analytics library for Clustering

.ml.kxi.clust.
  kmeans.fit   fit a K-means model
  ap.fit       fit an Affinity Propagation model
  dbscan.fit   fit a DBSCAN model
  cure.fit     fit a CURE model
  hc.fit       fit a Hierarchical Clustering model

See the ML Toolkit clustering algorithms documentation for examples of what the predict and update functions return; these are not covered below.

.ml.kxi.clust.ap.fit

Fit an Affinity Propagation model

.ml.kxi.clust.ap.fit[X]
.ml.kxi.clust.ap.fit[X;config]

Where

  • X is the input/training data
  • config is an optional dictionary containing modifications to default behavior, with keys

    key  | type       | default   | description
    df   | symbol     | nege2dist | Distance function used in calculating distance between points
    damp | float      | 0.5       | The damping coefficient to be applied to the availability and responsibility matrices
    diag | function   | med       | Function applied to the similarity matrix diagonal
    iter | dictionary | ::        | A dictionary containing the maximum allowed iterations and the maximum iterations without a change in cluster centers

returns a dictionary

modelInfo | all information needed to fit the original model
predict   | a projection allowing for predictions on new input data

q)data:([]100?1f;100?1f;100?1f)

// Fit a model using default configuration
q)show mdl1:.ml.kxi.clust.ap.fit[data]
modelInfo| `data`inputs`clust`exemplars!((0.8599461 0.2452222 0.6070236 0.686..
predict  | {[config;data]
  config:config`modelInfo;
  data:clust.i.floatConv..
q)mdl1[`modelInfo;`inputs]
df  | `nege2dist
damp| 0.5
diag| k){avg x(<x)@_.5*-1 0+#x,:()}
iter| `run`total`noChange!0 200 15

// Fit a model modifying the default behaviour
q)show mdl2:.ml.kxi.clust.ap.fit[data;`damp`diag!(0.75;max)]
modelInfo| `data`inputs`clust`exemplars!((0.8599461 0.2452222 0.6070236 0.686..
predict  | {[config;data]
  config:config`modelInfo;
  data:clust.i.floatConv..
q)mdl2[`modelInfo;`inputs]
df  | `nege2dist
damp| 0.75
diag| max
iter| `run`total`noChange!0 200 15
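
The effect of the damp setting can be pictured with a small sketch that does not use the library. It assumes the conventional affinity propagation update, in which each newly computed matrix is blended with the previous one. The names dampUpdate, oldM and newM are hypothetical and used for illustration only; the library's internal implementation may differ.

```q
// Hypothetical helper, not part of the library:
// keep a fraction damp of the previous matrix and take the rest from the new one
dampUpdate:{[damp;oldM;newM](damp*oldM)+(1-damp)*newM}

oldM:(0 1f;2 3f)   // matrix from the previous iteration (made-up values)
newM:(4 5f;6 7f)   // matrix proposed by the current iteration (made-up values)

show dampUpdate[0.5;oldM;newM]    // halfway between oldM and newM
show dampUpdate[0.75;oldM;newM]   // stays closer to oldM
```

Under this assumption a larger damp value makes each iteration change the matrices less, which tends to trade speed of convergence for stability.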

.ml.kxi.clust.cure.fit

Fit a CURE model

.ml.kxi.clust.cure.fit[X]
.ml.kxi.clust.cure.fit[X;config]

Where

  • X is the input/training data
  • config is an optional dictionary containing modifications to default behavior, with keys

    key | type    | default | description
    df  | symbol  | e2dist  | Distance function used in calculating distance between data points
    n   | integer | 5       | The number of representative points used per cluster
    c   | float   | 0.0     | The compression ratio, which determines how close to the center of a cluster the representative points are placed

returns a dictionary

modelInfo | all information needed to fit the original model
predict   | a projection allowing for predictions on new input data

q)data:([]100?1f;100?1f;100?1f)

// Fit a model using default configuration
q)show mdl1:.ml.kxi.clust.cure.fit[data]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict  | {[config;data;cutDict]
  data:clust.i.floatConversion i.tabConvert..
q)mdl1[`modelInfo;`inputs]
df| `e2dist
n | 5
c | 0

// Fit a model modifying the default behaviour
q)show mdl2:.ml.kxi.clust.cure.fit[data;`n`c!(4;0.1)]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict  | {[config;data;cutDict]
  data:clust.i.floatConversion i.tabConvert..
q)mdl2[`modelInfo;`inputs]
df| `e2dist
n | 4
c | 0.1
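
The role of the compression ratio c can be sketched without the library. The sketch assumes the usual CURE convention that each representative point is moved a fraction c of the way towards the cluster centroid; shrink, centre and pts are hypothetical names, and the points are stored one per row for readability, which need not match the library's internal layout.

```q
// Four made-up points forming one cluster, one point per row
pts:(0 0f;4 0f;0 4f;4 4f)
centre:avg pts   // centroid of the cluster

// Hypothetical helper, not part of the library:
// move every representative point a fraction c towards the centroid
shrink:{[c;centre;reps]reps+c*centre-/:reps}

show shrink[0f;centre;pts]    // c=0 leaves the points where they are
show shrink[0.5;centre;pts]   // c=0.5 moves each point halfway to the centroid
```

With the default c of 0.0 the representative points stay on the original data points under this assumption; larger values pull them towards the center.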

.ml.kxi.clust.dbscan.fit

Fit a DBSCAN model

.ml.kxi.clust.dbscan.fit[X]
.ml.kxi.clust.dbscan.fit[X;config]

Where

  • X is the input/training data
  • config is an optional dictionary containing modifications to default behavior, with keys

    key    | type    | default | description
    df     | symbol  | e2dist  | Distance function used in calculating distance between data points
    minPts | integer | 5       | Minimum number of data points within a radius required to define a cluster
    eps    | float   | 0.5     | The epsilon radius: the distance from a point within which neighbouring points are said to be in the same cluster

returns a dictionary

modelInfo | all information needed to fit the original model
predict   | a projection allowing for predictions on new input data
update    | a projection allowing new data to be used to update cluster centers
            such that the model can react to new data

q)data:([]100?1f;100?1f;100?1f)

// Fit a model using default configuration
q)show mdl1:.ml.kxi.clust.dbscan.fit[data]
modelInfo| `data`inputs`clust`tab!((0.8599461 0.2452222 0.6070236 0.6868635 0..
predict  | {[config;data]
  config:config[`modelInfo];
  data:clust.i.floatCo..
update   | {[config;data]
  modelConfig:config[`modelInfo];
  data:clust.i.fl..
q)mdl1[`modelInfo;`inputs]
df    | `e2dist
minPts| 5
eps   | 0.5

// Fit a model modifying the default behaviour
q)show mdl2:.ml.kxi.clust.dbscan.fit[data;`df`eps!(`edist;0.75)]
modelInfo| `data`inputs`clust`tab!((0.8599461 0.2452222 0.6070236 0.6868635 0..
predict  | {[config;data]
  config:config[`modelInfo];
  data:clust.i.floatCo..
update   | {[config;data]
  modelConfig:config[`modelInfo];
  data:clust.i.fl..
q)mdl2[`modelInfo;`inputs]
df    | `edist
minPts| 5
eps   | 0.75
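
How minPts and eps interact can be illustrated with a short sketch that does not use the library. It assumes plain Euclidean distance and assumes that a point counts itself among its neighbours; the library may treat either detail differently. The names dist, dists and core are hypothetical.

```q
// Three made-up points close together and one far away, one point per row
pts:(0 0f;0.1 0f;0 0.1f;5 5f)

// Hypothetical helper: Euclidean distance between two points
dist:{sqrt sum d*d:x-y}

// Row i holds the distances from point i to every point
dists:pts dist/:\:pts

eps:0.5
minPts:3

// A point can seed a cluster when at least minPts points lie within eps of it
core:minPts<=sum each dists<=eps
show core
```

Here the three neighbouring points each have three points within eps and so qualify, while the isolated point has only itself and does not.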

.ml.kxi.clust.hc.fit

Fit a Hierarchical Clustering model

.ml.kxi.clust.hc.fit[X]
.ml.kxi.clust.hc.fit[X;config]

Where

  • X is the input/training data
  • config is an optional dictionary containing modifications to default behavior, with keys

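    key | type   | default | description
    df  | symbol | e2dist  | Distance function used in calculating distance between data points
    lf  | symbol | ward    | Linkage function used to calculate the 'distance' between two clusters

returns a dictionary

modelInfo | all information needed to fit the original model
predict   | a projection allowing for predictions on new input data

q)data:([]100?1f;100?1f;100?1f)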

// Fit a model using default configuration
q)show mdl1:.ml.kxi.clust.hc.fit[data]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict  | {[config;data;cutDict]
  data:clust.i.floatConversion i.tabConvert..
q)mdl1[`modelInfo;`inputs]
df| e2dist
lf| ward

// Fit a model modifying the default behaviour
q)show mdl2:.ml.kxi.clust.hc.fit[data;`df`lf!`e2dist`complete]
modelInfo| `data`inputs`dgram!((0.8599461 0.2452222 0.6070236 0.6868635 0.837..
predict  | {[config;data;cutDict]
  data:clust.i.floatConversion i.tabConvert..
q)mdl2[`modelInfo;`inputs]
df| e2dist
lf| complete
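
What a linkage function measures can be sketched without the library. The example below assumes the textbook definition of complete linkage, the largest pairwise distance between two clusters, and shows single linkage, the smallest pairwise distance, purely for contrast; single linkage is not mentioned above and its availability in the library is an assumption. The names a, b, dist and pair are hypothetical.

```q
// Two made-up clusters, one point per row
a:(0 0f;1 0f)
b:(3 0f;5 0f)

// Hypothetical helper: Euclidean distance between two points
dist:{sqrt sum d*d:x-y}

// Distance from every point in a to every point in b
pair:a dist/:\:b

show max raze pair   // complete linkage: largest pairwise distance
show min raze pair   // single linkage: smallest pairwise distance
```

The same pair of clusters is 5 apart under complete linkage and 2 apart under single linkage, which is why the choice of lf can change the order in which clusters are merged.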

.ml.kxi.clust.kmeans.fit

Fit a K-means model

.ml.kxi.clust.kmeans.fit[X]
.ml.kxi.clust.kmeans.fit[X;config]

Where

  • X is the input/training data
  • config is an optional dictionary containing modifications to default behavior, with keys

    key    | type       | default | description
    df     | symbol     | e2dist  | Distance function used in calculating distance between cluster centers and data points
    k      | integer    | 8       | Number of cluster centers to be calculated
    config | dictionary | ::      | Any additional configuration required for application of clustering

returns a dictionary:

modelInfo | all information needed to fit the original model
predict   | a projection allowing for predictions on new input data
update    | a projection allowing new data to be used to update cluster centers
            such that the model can react to new data

q)data:([]100?1f;100?1f;100?1f)

// Fit a model using default configuration
q)show mdl1:.ml.kxi.clust.kmeans.fit data
modelInfo| `repPts`clust`data`inputs!((0.8139576 0.09132079 0.2219031;0.67699..
predict  | {[config;data]
  config:config[`modelInfo];
  data:clust.i.floatCo..
update   | {[config;data]
  modelConfig:config[`modelInfo];
  data:clust.i.fl..
q)mdl1[`modelInfo;`inputs]
df  | `e2dist
k   | 8
iter| 100
kpp | 1b

// Fit a model modifying the default behaviour
q)show mdl2:.ml.kxi.clust.kmeans.fit[data;`df`k!(`edist;3)]
modelInfo| `repPts`clust`data`inputs!((0.8148896 0.3256995 0.5313307;0.236372..
predict  | {[config;data]
  config:config[`modelInfo];
  data:clust.i.floatCo..
update   | {[config;data]
  modelConfig:config[`modelInfo];
  data:clust.i.fl..
q)mdl2[`modelInfo;`inputs]
df  | `edist
k   | 3
iter| 100
kpp | 1b
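
When only one setting is overridden, the configuration dictionary has a single key, and in q a one-item dictionary has to be built with enlist on both sides. The sketch below uses only the call shown above and assumes the library is already loaded; cfg and mdl3 are hypothetical names.

```q
data:([]100?1f;100?1f;100?1f)

// A one-key dictionary needs enlist on both the key and the value
cfg:enlist[`k]!enlist 3

// Pass it as the second argument, as in the examples above
// (assumes the ML-Analytics library providing .ml.kxi.clust is loaded)
mdl3:.ml.kxi.clust.kmeans.fit[data;cfg]

// Inspect the inputs the model was fitted with
show mdl3[`modelInfo;`inputs]
```

Settings left out of cfg should fall back to their defaults, as df does in the examples above when only other keys are supplied.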