Similarity Metrics

This section contains details on the different similarity metrics available to use with KDB.AI.

Distance metrics measure how similar two vectors are. Several metrics are available; in general, choose the same metric your embedding model uses. The metric is assigned as part of the table schema, so set it before you create your table. The examples below show how.

KDB.AI supports the following distance metrics:

Euclidean distance: Represented as L2

Euclidean distance is the straight-line distance between two points in Euclidean space. For two vectors, it is calculated as the square root of the sum of squared differences between their corresponding elements. A smaller distance means the vectors are more similar. This metric takes both the magnitude and the direction of the vectors into account.
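
To make the calculation concrete, here is a minimal pure-Python sketch of the formula described above. The function name `euclidean_distance` is hypothetical and used only for illustration; this is not how KDB.AI computes the metric internally.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

# Differences are (3, 4, 0), so the distance is sqrt(9 + 16 + 0).
print(euclidean_distance(a, b))  # 5.0
print(euclidean_distance(a, a))  # 0.0 (identical vectors)
```

Because the differences are taken element by element, scaling one vector changes the result even when its direction stays the same.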

The following example uses 'metric': 'L2' to assign the Euclidean metric to the documents table.

# Assumes `session` is an existing KDB.AI session.
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'vectors',
                       'vectorIndex': {'dims': 8, 'metric': 'L2', 'type': 'flat'}}]}
table = session.create_table('documents', schema)

Dot product: Represented as IP

The Dot Product measures the similarity or alignment between two vectors. It quantifies how much the vectors point in the same direction. For two vectors A and B in n-dimensional space, the Dot Product is calculated as the sum of the products of their corresponding elements.

The dot product is positive when the angle between the vectors is less than 90 degrees (they point in broadly the same direction), negative when the angle is greater than 90 degrees (they point in broadly opposite directions), and zero when the vectors are orthogonal. This metric takes both the magnitude and the direction of the vectors into account.
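
The three cases can be illustrated with a short pure-Python sketch. The function name `dot_product` is hypothetical and chosen for illustration only; it is not part of the KDB.AI client.

```python
def dot_product(a, b):
    """Sum of the products of corresponding elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


print(dot_product([1.0, 2.0], [2.0, 4.0]))    # 10.0 (same direction)
print(dot_product([1.0, 2.0], [-1.0, -2.0]))  # -5.0 (opposite direction)
print(dot_product([1.0, 0.0], [0.0, 3.0]))    # 0.0 (orthogonal)
```

In the first call the second vector is twice as long as the first, which doubles the result compared with `dot_product([1.0, 2.0], [1.0, 2.0])`. That is the magnitude sensitivity in practice.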

The following example uses 'metric': 'IP' to assign the dot product metric to the movies table.

# Assumes `session` is an existing KDB.AI session.
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'vectors',
                       'vectorIndex': {'dims': 8, 'metric': 'IP', 'type': 'flat'}}]}
table = session.create_table('movies', schema)

Cosine similarity: Represented as CS

Cosine similarity measures the cosine of the angle between two vectors. It assesses similarity in terms of the direction of the vectors rather than their magnitude. Values range from -1 (the vectors point in exactly opposite directions) to 1 (the vectors point in exactly the same direction), with 0 indicating that the vectors are orthogonal. This metric uses only the direction of the vectors.
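
As a rough illustration, cosine similarity can be written as the dot product divided by the product of the two vector lengths. The sketch below uses a hypothetical function name, `cosine_similarity`, and is not KDB.AI's internal implementation. Rejecting zero-length vectors is a choice made for this example, since the angle is undefined for them.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [5.0, 0.0]))   # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))   # 0.0 (orthogonal)
print(cosine_similarity([1.0, 0.0], [-3.0, 0.0]))  # -1.0 (opposite direction)
```

The first call returns 1.0 even though the second vector is five times longer, which shows that only direction affects the result.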

The following example uses 'metric': 'CS' to assign the cosine similarity metric to the imaging table.

# Assumes `session` is an existing KDB.AI session.
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'vectors',
                       'vectorIndex': {'dims': 8, 'metric': 'CS', 'type': 'flat'}}]}
table = session.create_table('imaging', schema)