Fuzzy Matching

This page describes the parameters of the fuzzy matching functions in the AI libs.

Fuzzy filters improve vector search efficiency and recall by narrowing down the candidates to those that approximately match the given criteria. As a result, searches return more accurate results even for imprecise or "fuzzy" queries, such as those containing typos.

.ai.fuzzy.dist

The .ai.fuzzy.dist function calculates the fuzzy distance between a query q and each entry in data, typically strings or sequences.

The distance represents how similar or dissimilar the inputs are, often based on edit operations such as insertions, deletions, or substitutions. It is useful for approximate matching scenarios where exact equality is not required, such as handling typos, variations in text, or noisy input data.
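To make the edit operations concrete, here is a minimal sketch of the classic Levenshtein dynamic programme in plain q. The name `lev` and its local variables are hypothetical, introduced only for illustration; the function is not part of the AI libs and is not necessarily how `.ai.fuzzy.dist` computes its result.

```q
/ lev: hypothetical helper, not part of the AI libs
/ returns the minimum number of single-character insertions,
/ deletions and substitutions needed to turn string a into string b
lev:{[a;b]
  prevRow:til 1+count b;              / distances from the empty prefix of a
  i:0;
  while[i<count a;
    curRow:enlist i+1;                / deleting the first i+1 characters of a
    j:0;
    while[j<count b;
      cost:$[a[i]=b[j];0;1];          / 0 if the characters match, else 1
      curRow,:min(prevRow[j+1]+1;last[curRow]+1;prevRow[j]+cost);
      j+:1];
    prevRow:curRow;
    i+:1];
  last prevRow}

lev["kitten";"sitting"]               / 3: two substitutions and one insertion
```

The sketch keeps only two rows of the distance table at a time, so memory grows with the length of `b` rather than with the product of both lengths.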

Parameters

Name	Type(s)	Description
`data`	string \| string[] \| enum	The strings to search against
`q`	string \| string[]	The query string(s)
`metric`	symbol	The distance type, one of `.ai.fuzzy.utils.fuzzyDistances`

Returns

Type	Description
float \| float[]	The fuzzy distance between each entry of `data` and `q`

Example

q).ai:use`kx.ai
q)genString:{[length] length?.Q.an," "};
q)batch:genString each 5 + 10?10;
q)q:genString[5 + rand[10]];
q).ai.fuzzy.dist[batch;q;`levenshtein]
14 12 12 14 13 14 14 14 14 13f

The example generates a batch of random strings (batch) and a query string (q), then calculates the Levenshtein distance between q and each string in batch using .ai.fuzzy.dist. The output holds one edit distance per string in batch, where smaller values indicate greater similarity. This demonstrates how the function measures the "closeness" of sequences even when they are not identical.

.ai.fuzzy.search

The .ai.fuzzy.search function returns the k best matches for q from a dataset using fuzzy matching.

It identifies the closest candidates by computing similarity scores between the query and each data entry, ranking them based on their fuzzy distance.
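The ranking step can be sketched in plain q once the distances are known. This is only an illustration, assuming that ranking amounts to an ascending sort on distance; the names `d`, `k` and `idx` are introduced here, and the library may order tied distances differently.

```q
/ distances taken from the .ai.fuzzy.dist example output above
d:14 12 12 14 13 14 14 14 14 13f
k:3
idx:k#iasc d      / positions of the k smallest distances: 1 2 4
(d idx;idx)       / (12 12 13f;1 2 4)
```

`iasc` returns the indices that would sort the list in ascending order and keeps tied entries in their original order, which is why index 1 comes before index 2 in this sketch.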

Parameters

Name	Type(s)	Description
`data`	string[] \| symbol[] \| enum	The list of strings/symbols to search from
`q`	string \| string[] \| symbol \| symbol[]	The pattern to search for
`k`	short \| int \| long	The number of best matches to return
`metric`	symbol	The distance type, one of `.ai.fuzzy.utils.fuzzyDistances`

Returns

Type	Description
(float[]; int[]; (char \| string \| symbol)[])	A three-item list holding the distances, the indices into `data`, and the matching values

Example

q).ai:use`kx.ai
q)genString:{[length] length?.Q.an," "};
q)batch:genString each 5 + 10?10;
q)q:genString[5 + rand[10]];
q).ai.fuzzy.search[batch;q;3;`levenshtein]
12               12       13         
2                1        4          
"6SmcKLSCuEI9n8" "ynEr5x" "TtNEiYH_c"

Here, the same random string batch and query are used, but .ai.fuzzy.search retrieves the top 3 closest matches ranked by Levenshtein distance. The first row lists the distances, the second row the indices in the dataset, and the third row the actual matching strings. This example shows how fuzzy search can quickly find approximate matches in a noisy or variable dataset.
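Because `data` and `q` also accept symbols, the same call can be sketched against a symbol list. The variable `tickers` and its values are illustrative choices made for this sketch; the call itself follows the parameter table above, and no output is shown.

```q
.ai:use`kx.ai
tickers:`AAPL`MSFT`GOOG`AMZN`APLE                 / illustrative symbol list
.ai.fuzzy.search[tickers;`APPL;2;`levenshtein]    / two closest symbols to the query
```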

Next steps

Read the "Fuzzy filters for symbol changes with KDB-X AI-Libs" tutorial on Medium.