Inverted File (IVF)
This page describes the parameters for Inverted File (IVF) calls as part of AI libs.
An IVF index is a data structure for Approximate Nearest Neighbor (ANN) search. It partitions vectors into clusters and maps each cluster centroid to the vectors assigned to it, so a search only needs to scan the clusters closest to the query rather than the whole dataset. This narrows the scope of each search and significantly improves search speed on large datasets.
.ai.ivf.del
The .ai.ivf.del function removes one or more points from an existing IVF (Inverted File) index.
Deletion helps maintain index accuracy by discarding outdated or irrelevant vectors without requiring a full retraining. It ensures that the index reflects only the current and valid dataset.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| ivf | dict | The existing IVF index to delete from |
| ids | long \| long[] | The IDs of vectors to delete |
Returns
| Type | Description |
|---|---|
| dict | IVF index with points deleted |
Refer also to .ai.ivf.train.
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)ivf:.ai.ivf.put[();repPts;vecs;`L2];
q)ivf
clusters | `s#0 1 2 3
ids | (0 2 5 8 9 13 14 17 21 29 31 51 52 5..
vectors | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric | `L2
q)ivf:.ai.ivf.del[ivf;0 2];
q)ivf
clusters | `s#0 1 2 3
ids | (5 8 9 13 14 17 21 29 31 51 52 53 54..
vectors | ((0.6203014 0.9326316 0.2747066 0.05..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric | `L2
The example first trains cluster centroids, builds an IVF index, and then deletes vectors with IDs 0 and 2. After deletion, the ids list no longer contains those entries, while the cluster structure and centroids remain intact. This demonstrates how .ai.ivf.del removes specific vectors from the index while preserving the overall indexing structure.
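To make the effect of deletion concrete, the following is a conceptual sketch in plain Python, not the code behind .ai.ivf.del. The dictionary layout (`centroids`, `ids`, `vectors`) and the helper name `ivf_delete` are assumptions chosen to mirror the keys shown in the output above.

```python
# Conceptual sketch only: a plain-Python stand-in for an IVF index.
# The layout and the name ivf_delete are hypothetical, not the library's API.

def ivf_delete(index, ids_to_remove):
    """Return a copy of index without the given IDs; centroids are untouched."""
    drop = set(ids_to_remove)
    new_ids, new_vectors = [], []
    for cluster_ids, cluster_vecs in zip(index["ids"], index["vectors"]):
        # Keep only the (id, vector) pairs whose ID is not being deleted.
        kept = [(i, v) for i, v in zip(cluster_ids, cluster_vecs) if i not in drop]
        new_ids.append([i for i, _ in kept])
        new_vectors.append([v for _, v in kept])
    return {**index, "ids": new_ids, "vectors": new_vectors}

index = {
    "centroids": [[0.0, 0.0], [1.0, 1.0]],
    "ids": [[0, 2, 5], [1, 3, 4]],
    "vectors": [
        [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1]],
        [[0.9, 1.0], [1.0, 0.8], [1.1, 1.0]],
    ],
}
pruned = ivf_delete(index, [0, 2])
print(pruned["ids"])  # [[5], [1, 3, 4]]
```

Only the per-cluster ID and vector lists change. The centroids are reused as they are, which is why no retraining is needed.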
.ai.ivf.predict
The .ai.ivf.predict function predicts the cluster a vector belongs to by comparing its features to the centroids of predefined clusters.
By determining which centroid each vector is closest to, it allows searches to be routed to the relevant partitions. This reduces the search space and accelerates nearest-neighbor retrieval.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| repPts | real[][] | Cluster centroids from .ai.ivf.train |
| vecs | real[][] \| real[] | The vector(s) to assign a cluster to |
| metric | symbol | The metric for distance calculation, one of (L2; CS; IP) |
Returns
| Type | Description |
|---|---|
| long[] | The ID of the cluster each vector is assigned to |
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2]
q).ai.ivf.predict[repPts;vecs;`L2]
0 1 0 1 1 0 2 2 0 ..
This example trains cluster centroids for an IVF index and then predicts the cluster assignment for each vector in the dataset. The output shows which cluster (0-3) each vector is mapped to. It demonstrates how .ai.ivf.predict assigns vectors to the closest centroid, enabling partitioned indexing.
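The nearest-centroid rule described above can be sketched in a few lines of plain Python. This is an illustration of the idea under the L2 metric only, not the library's implementation; the function names `l2_sq` and `predict` are hypothetical.

```python
# Conceptual sketch only: nearest-centroid assignment under the L2 metric.

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(centroids, vecs):
    """For each vector, return the position of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: l2_sq(centroids[c], v))
        for v in vecs
    ]

centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
vecs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]
print(predict(centroids, vecs))  # [0, 1, 2]
```

Squared distance is enough here because it preserves the ordering of the true L2 distances.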
.ai.ivf.put
The .ai.ivf.put function creates an IVF index or inserts new vectors into an existing one.
Each vector is assigned to its nearest cluster centroid during insertion, allowing the index to grow dynamically. It supports incremental updates, making it suitable for datasets that evolve over time.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| ivf | dict | The existing IVF index to upsert to, or () to create a new index |
| repPts | real[][] | Cluster centroids from .ai.ivf.train |
| vecs | real[][] | The vector(s) to insert; each is assigned to its nearest centroid |
| metric | symbol | The metric for distance calculation, one of (L2; CS; IP) |
Returns
| Type | Description |
|---|---|
| dict | The IVF index |
Refer also to .ai.ivf.train.
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)index:.ai.ivf.put[();repPts;vecs;`L2];
q)index
clusters | `s#0 1 2 3
ids | (0 2 5 8 9 13 14 17 21 29 ..
vectors | ((0.3927524 0.5170911 0.51..
centroids| (0.5303572 0.5351732 0.562..
metric | `L2
After training cluster centroids, this example inserts vectors into the IVF index. Each vector is routed to its nearest cluster, and the index tracks cluster membership, IDs, and centroids. The example shows how .ai.ivf.put builds a working IVF index ready for search.
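The create-or-insert behavior can be illustrated with a small Python sketch. It is a simplified model, not the library's code: the dictionary layout, the `next_id` counter, and sequential ID assignment are all assumptions made for the illustration, and `None` stands in for the empty `()` passed in the q example.

```python
# Conceptual sketch only: building and growing an inverted-file structure.

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def put(index, centroids, vecs):
    """Create an index (index is None) or append vecs to an existing one."""
    if index is None:
        index = {
            "centroids": centroids,
            "ids": [[] for _ in centroids],
            "vectors": [[] for _ in centroids],
            "next_id": 0,  # assumption: IDs are sequential in insertion order
        }
    for v in vecs:
        # Route the vector to the list of its nearest centroid.
        nearest = min(range(len(centroids)), key=lambda c: l2_sq(centroids[c], v))
        index["ids"][nearest].append(index["next_id"])
        index["vectors"][nearest].append(v)
        index["next_id"] += 1
    return index

centroids = [[0.0, 0.0], [1.0, 1.0]]
index = put(None, centroids, [[0.1, 0.1], [0.9, 1.0], [0.2, 0.0]])
print(index["ids"])  # [[0, 2], [1]]
index = put(index, centroids, [[1.0, 0.9]])
print(index["ids"])  # [[0, 2], [1, 3]]
```

The second call shows the incremental case: the centroids stay fixed and only the per-cluster lists grow.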
.ai.ivf.search
The .ai.ivf.search function performs a similarity search against an IVF index, returning the nearest neighbors for a query vector.
By restricting the search to relevant clusters, it achieves faster lookups compared to brute-force methods. It is a core operation for scalable vector search.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| ivf | dict | The existing IVF index to search |
| q | real[] \| real[][] | The query vector(s) |
| k | short \| int \| long | The number of nearest neighbors to return |
| nprobe | int \| long | The number of clusters to search |
Returns
| Type | Description |
|---|---|
| (real; long)[] | The distances to the nearest neighbors under the index's metric, and the IDs of those neighbors |
Refer also to .ai.ivf.train.
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)index:.ai.ivf.put[();repPts;vecs;`L2];
q).ai.ivf.search[index;10?1e;5;2]
0.1686666 0.1898834 0.3466181 0.368607 0.3952734
152 993 180 220 495
Here, a query vector (10?1e) is searched against the IVF index, requesting the top 5 nearest neighbors while searching 2 clusters. The output shows distances (top row) and the IDs of the matching vectors (bottom row). This demonstrates how .ai.ivf.search accelerates retrieval by limiting comparisons to the most relevant clusters.
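The role of nprobe is easiest to see in a small model of the search. The Python sketch below is an illustration under the L2 metric, not the library's implementation; the index layout and the function names are hypothetical. It ranks clusters by centroid distance, scans only the closest nprobe of them, and returns the k best candidates.

```python
# Conceptual sketch only: IVF search restricted to the nprobe closest clusters.
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(index, query, k, nprobe):
    """Return (distances, ids) of the k nearest vectors in the probed clusters."""
    centroids = index["centroids"]
    # Rank clusters by how close their centroid is to the query.
    order = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))
    candidates = []
    for c in order[:nprobe]:
        for i, v in zip(index["ids"][c], index["vectors"][c]):
            candidates.append((l2(v, query), i))
    top = sorted(candidates)[:k]
    return [d for d, _ in top], [i for _, i in top]

index = {
    "centroids": [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],
    "ids": [[0, 1], [2, 3], [4]],
    "vectors": [
        [[1.0, 0.0], [0.0, 2.0]],
        [[6.0, 0.0], [10.0, 1.0]],
        [[0.0, 9.0]],
    ],
}
query = [4.0, 0.0]
print(search(index, query, 2, 1)[1])  # [0, 1]
print(search(index, query, 2, 2))     # ([2.0, 3.0], [2, 0])
```

With nprobe set to 1 the true nearest neighbor (ID 2) is missed because it lives in the second-closest cluster. Raising nprobe to 2 finds it at the cost of scanning more vectors, which is the accuracy and speed trade-off that nprobe controls.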
.ai.ivf.topq
The .ai.ivf.topq function converts an IVF index into Product Quantization (PQ) cluster centroids for each partition.
By combining IVF with PQ, it further compresses vector representations while maintaining efficient search. This hybrid approach is particularly valuable for large-scale, high-dimensional datasets.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| ivf | dict | The existing IVF index to convert |
| nsplits | long \| int | The number of columnar splits to make on the matrices |
| nbits | long \| int | The number of bits used to encode each PQ subvector |
| metric | symbol | The metric for centroid calculation, one of (L2; CS; IP) |
| ntrain | long \| int | The number of vectors in the IVF index to train PQ on |
Returns
| Type | Description |
|---|---|
| dict | The IVFPQ index |
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)ivf:.ai.ivf.put[();repPts;vecs;`L2];
q).ai.ivf.topq[ivf;2;8;`L2;500]
clusters | `s#0 1 2 3
ids | (0 2 5 8 9 13 14 17 21 29 31 51 52..
centroids | (0.5303572 0.5351732 0.5627522 0.4..
metric | `L2
pqCentroids| ((-0.2639576 -0.4129897 0.07042291..
encodings | ((76 216 136 246 7 16 6 149 217 14..
The example converts an existing IVF index into an IVFPQ structure using .ai.ivf.topq, with 2 splits, 8 bits per PQ subvector (so each encoded value lies in the range 0-255), and 500 vectors used to train PQ. The result holds PQ centroids and encodings in place of the raw vectors, giving compressed vector storage. This illustrates how IVF can be combined with PQ for memory-efficient large-scale search.
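As a rough illustration of what a PQ encoding is, the Python sketch below splits a vector into subvectors and replaces each one with the position of its nearest entry in a per-split codebook. It is a generic PQ sketch, not the library's implementation: the codebooks are hand-written rather than trained, the function names are hypothetical, and whether .ai.ivf.topq encodes raw vectors or some transformed form is not covered here.

```python
# Conceptual sketch only: product quantization encode/decode with fixed codebooks.

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(v, nsplits):
    """Cut v into nsplits equal-width subvectors (len(v) must divide evenly)."""
    size = len(v) // nsplits
    return [v[i * size:(i + 1) * size] for i in range(nsplits)]

def pq_encode(v, codebooks):
    """Replace each subvector with the position of its nearest codebook entry."""
    codes = []
    for sub, codebook in zip(split(v, len(codebooks)), codebooks):
        codes.append(min(range(len(codebook)), key=lambda j: l2_sq(codebook[j], sub)))
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen entries."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

# Two splits, two entries per codebook (1 bit per subvector in this toy case).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # entries for the first half of the vector
    [[0.0, 1.0], [1.0, 0.0]],  # entries for the second half
]
v = [0.9, 0.8, 0.1, 0.9]
codes = pq_encode(v, codebooks)
print(codes)                        # [1, 0]
print(pq_decode(codes, codebooks))  # [1.0, 1.0, 0.0, 1.0]
```

The four-number vector is stored as two small integers. With 8 bits per subvector, each code can address up to 2**8 = 256 codebook entries, which matches the 0-255 range of the encodings in the example output.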
.ai.ivf.train
The .ai.ivf.train function calculates cluster centroids for building or retraining an IVF index.
Training is performed on a representative dataset to define the partitions that are used for insertion and search. The quality of these centroids directly impacts the accuracy and speed of the index.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| nlist | long | The number of centroids to compute |
| vecs | real[][] | The training vectors |
| metric | symbol | The metric for centroid calculation, one of (L2; CS; IP) |
Returns
| Type | Description |
|---|---|
| real[][] | The cluster centroids, one vector per cluster |
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q).ai.ivf.train[4;vecs;`L2]
0.5303572 0.5351732 0.5627522 0.4228943 0.535001 0.2002861 0.4314032 0.3511047 0.5948122 0.5000019
0.6322692 0.4451934 0.4216797 0.523856 0.360434 0.5268008 0.3773872 0.751892 0.4982377 0.5712299
0.2963423 0.5678515 0.6497793 0.4968773 0.4498042 0.6334871 0.6088817 0.5445439 0.3068971 0.4350893
0.5868517 0.4527895 0.3803186 0.5102138 0.6851335 0.7158532 0.5852287 0.3580849 0.6611753 0.4312862
This example computes 4 cluster centroids from the dataset vecs using the L2 distance metric. The output shows the learned centroids, one row per cluster. It demonstrates the training step required before vectors can be inserted into an IVF index.
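A common way to compute centroids like these is k-means clustering. The Python sketch below shows a minimal version of that idea under the L2 metric. Whether .ai.ivf.train uses this exact procedure is not stated on this page, so treat it as an assumption; the naive initialization and the `iters` parameter are illustrative choices only.

```python
# Conceptual sketch only: a minimal k-means (Lloyd's algorithm) under L2.

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train(nlist, vecs, iters=10):
    """Return nlist centroids after a fixed number of refinement passes."""
    centroids = [list(v) for v in vecs[:nlist]]  # naive init: first nlist vectors
    for _ in range(iters):
        groups = [[] for _ in range(nlist)]
        for v in vecs:
            nearest = min(range(nlist), key=lambda j: l2_sq(centroids[j], v))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids

vecs = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
print(train(2, vecs))  # [[0.0, 1.0], [10.0, 11.0]]
```

Each centroid ends up at the mean of the vectors assigned to it, which is why a representative training set matters: the centroids can only describe the data they were trained on.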
.ai.ivf.upd
The .ai.ivf.upd function updates existing points in an IVF index with new vector representations.
Updates ensure that indexed data remains current without requiring full deletion and reinsertion. It is useful for dynamic datasets where vectors may change over time.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| ivf | dict | The existing IVF index to update |
| ids | long \| long[] | The IDs of the vectors to update |
| vecs | real[][] | The replacement vectors |
Returns
| Type | Description |
|---|---|
| dict | The IVF index with updated points |
Refer also to .ai.ivf.train.
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)ivf:.ai.ivf.put[();repPts;vecs;`L2];
q)ivf
clusters | `s#0 1 2 3
ids | (0 2 5 8 9 13 14 17 21 29 31 51 52 5..
vectors | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric | `L2
q)ivf:.ai.ivf.upd[ivf;2 5;2#enlist (first vecs)];
q)ivf
clusters | `s#0 1 2 3
ids | (0 8 9 13 14 17 21 29 31 51 52 53 54 ..
vectors | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric | `L2
The example creates an IVF index and then updates the vectors at IDs 2 and 5, replacing both with a copy of the first vector in vecs. After the update, the index holds the new vectors under the same IDs, while the clusters and centroids are unchanged. This demonstrates how .ai.ivf.upd refreshes specific vectors, ensuring the index remains current without a full rebuild.
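One way to picture an update is as a removal of the old entry followed by a re-insertion of the new vector under the same ID. The Python sketch below models that idea. It is an illustration only: the index layout and the name `update` are hypothetical, and re-routing the vector to the centroid nearest its new value is an assumption about the behavior rather than a documented fact.

```python
# Conceptual sketch only: update modeled as remove-then-reinsert under the same ID.

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def update(index, ids, new_vecs):
    """Replace the vectors stored under ids, modifying index in place."""
    replacements = dict(zip(ids, new_vecs))
    centroids = index["centroids"]
    # Step 1: drop the old entries for the IDs being updated.
    for c in range(len(centroids)):
        kept = [
            (i, v)
            for i, v in zip(index["ids"][c], index["vectors"][c])
            if i not in replacements
        ]
        index["ids"][c] = [i for i, _ in kept]
        index["vectors"][c] = [v for _, v in kept]
    # Step 2: re-insert each new vector under its original ID,
    # in the list of whichever centroid is now nearest (assumed behavior).
    for i, v in replacements.items():
        nearest = min(range(len(centroids)), key=lambda j: l2_sq(centroids[j], v))
        index["ids"][nearest].append(i)
        index["vectors"][nearest].append(v)
    return index

index = {
    "centroids": [[0.0, 0.0], [1.0, 1.0]],
    "ids": [[0, 2], [1]],
    "vectors": [[[0.1, 0.1], [0.2, 0.0]], [[0.9, 1.0]]],
}
update(index, [2], [[1.0, 0.9]])
print(index["ids"])  # [[0], [1, 2]]
```

ID 2 keeps its identity but moves to the second cluster because its new vector lies closer to that centroid.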