Fuzzy Matching
This page describes the parameters of the fuzzy matching functions in AI libs.
Fuzzy filters improve vector search efficiency and recall by narrowing down candidates based on specific criteria. Using fuzzy filters leads to more accurate results, even when dealing with imprecise or "fuzzy" queries.
.ai.fuzzy.dist
The .ai.fuzzy.dist function calculates the fuzzy distance between its data and q arguments, typically strings or sequences.
The distance represents how similar or dissimilar the inputs are, often based on edit operations such as insertions, deletions, or substitutions. It is useful for approximate matching scenarios where exact equality is not required, such as handling typos, variations in text, or noisy input data.
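To make the idea of an edit-based distance concrete, here is a minimal sketch of the classic Levenshtein distance. It is written in Python purely for illustration, and the function name `levenshtein` is hypothetical. It is not the implementation behind .ai.fuzzy.dist and says nothing about how the library computes its metrics.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when the characters match)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

A distance of 0 means the inputs are identical, and each additional edit needed to align them adds 1.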
Parameters
| Name | Type(s) | Description |
|---|---|---|
| data | string \| string[] \| enum | The strings to search against |
| q | string \| string[] | The query string(s) |
| metric | symbol | The distance type, one of .ai.fuzzy.utils.fuzzyDistances |
Returns
| Type | Description |
|---|---|
| float | The fuzzy distance between data and q |
Example
q).ai:use`kx.ai
q)genString:{[length] length?.Q.an," "};
q)batch:genString each 5 + 10?10;
q)q:genString[5 + rand[10]];
q).ai.fuzzy.dist[batch;q;`levenshtein]
14 12 12 14 13 14 14 14 14 13f
The example generates a batch of random strings (batch) and a query string (q), then uses .ai.fuzzy.dist to calculate the Levenshtein distance between q and each string in batch. The output holds one edit distance per string in batch, where smaller values indicate greater similarity. This demonstrates how the function measures the "closeness" of sequences even when they are not identical.
.ai.fuzzy.search
The .ai.fuzzy.search function returns the k best matches for q from a dataset using fuzzy matching.
It identifies the closest candidates by computing similarity scores between the query and each data entry, ranking them based on their fuzzy distance.
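The ranking step described above can be sketched as follows: score every entry against the query, sort by distance, and keep the k smallest. This is a Python illustration only. The names `fuzzy_search` and `levenshtein` and the sample tickers are hypothetical, and the tie-breaking rule (lower index first) is an assumption rather than documented behaviour of .ai.fuzzy.search.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_search(data, query, k):
    """Return (distances, indices, matches) for the k entries closest to query."""
    distances = [levenshtein(item, query) for item in data]
    # sorted() is stable, so entries with equal distance keep their original order.
    best = sorted(range(len(data)), key=lambda i: distances[i])[:k]
    return ([distances[i] for i in best], best, [data[i] for i in best])


symbols = ["AAPL", "MSFT", "GOOG", "AMZN", "APLE"]
dists, idxs, matches = fuzzy_search(symbols, "APPL", 3)
print(dists)    # [1, 2, 3]
print(idxs)     # [0, 4, 3]
print(matches)  # ['AAPL', 'APLE', 'AMZN']
```

The three returned lists line up position by position, so the i-th distance, index, and match all describe the same candidate.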
Parameters
| Name | Type(s) | Description |
|---|---|---|
| data | string[] \| symbol[] \| enum | The list of strings/symbols to search from |
| q | string \| string[] \| symbol \| symbol[] | The pattern to search for |
| k | short \| int \| long | The number of best matches to return |
| metric | symbol | The distance type, one of .ai.fuzzy.utils.fuzzyDistances |
Returns
| Type | Description |
|---|---|
| (float[]; int[]; (char \| string \| symbol)[]) | A three-item list: the distances, the indices of the matches in data, and the matched values |
Example
q).ai:use`kx.ai
q)genString:{[length] length?.Q.an," "};
q)batch:genString each 5 + 10?10;
q)q:genString[5 + rand[10]];
q).ai.fuzzy.search[batch;q;3;`levenshtein]
12 12 13
2 1 4
"6SmcKLSCuEI9n8" "ynEr5x" "TtNEiYH_c"
Here, the same random string batch and query are used, but .ai.fuzzy.search retrieves the top 3 closest matches ranked by Levenshtein distance. The first row lists the distances, the second row the indices in the dataset, and the third row the actual matching strings. This example shows how fuzzy search can quickly find approximate matches in a noisy or variable dataset.
Next steps
- Read the Fuzzy filters for symbol changes with KDB-X AI-Libs tutorial on Medium.