Inverted File (IVF)

This page describes the parameters for Inverted File (IVF) calls as part of AI libs.

An IVF index is a data structure for Approximate Nearest Neighbor (ANN) search. It partitions vectors into clusters around trained centroids and records which vectors belong to each cluster. At query time only the clusters closest to the query are searched, which significantly reduces the number of comparisons on large datasets.

.ai.ivf.del

The .ai.ivf.del function removes one or more points from an existing IVF (Inverted File) index.

Deletion helps maintain index accuracy by discarding outdated or irrelevant vectors without requiring a full retraining. It ensures that the index reflects only the current and valid dataset.

Parameters

Name  Type(s)        Description
ivf   dict           The existing IVF index to delete from
ids   long | long[]  The IDs of vectors to delete

Returns

Type  Description
dict  IVF index with points deleted

Refer also to .ai.ivf.train.

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)show ivf:.ai.ivf.put[();repPts;vecs;`L2]
clusters | `s#0 1 2 3
ids      | (0 2 5 8 9 13 14 17 21 29 31 51 52 5..
vectors  | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric   | `L2
q)show ivf:.ai.ivf.del[ivf;0 2]
clusters | `s#0 1 2 3
ids      | (5 8 9 13 14 17 21 29 31 51 52 53 54..
vectors  | ((0.6203014 0.9326316 0.2747066 0.05..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric   | `L2

The example first trains cluster centroids, builds an IVF index, and then deletes vectors with IDs 0 and 2. After deletion, the ids list no longer contains those entries, while the cluster structure and centroids remain intact. This demonstrates how .ai.ivf.del removes specific vectors from the index while preserving the overall indexing structure.
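As a rough illustration of what deletion means for the per-cluster ID lists, the following plain q sketch removes IDs from a toy set of lists. The data and layout are hypothetical, and this is not the .ai.ivf.del implementation, which also has to drop the corresponding vectors.

```q
/ hypothetical ID lists, one per cluster (mirrors the ids entry shown above)
ids:(0 2 5 8;1 3 4 6)

/ remove IDs 0 and 2 from every cluster; the number of clusters is unchanged
ids:ids except\:0 2
show ids          / (5 8;1 3 4 6)
```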

.ai.ivf.predict

The .ai.ivf.predict function predicts the cluster a vector belongs to by comparing its features to the centroids of predefined clusters.

Knowing the nearest centroid for a vector lets a search be routed to the relevant partitions only. This reduces the search space and accelerates nearest-neighbor retrieval.

Parameters

Name    Type(s)            Description
repPts  real[][]           The cluster centroids returned by .ai.ivf.train
vecs    real[][] | real[]  The vector(s) to assign to a cluster
metric  symbol             The metric for distance calculation, one of (L2; CS; IP)

Returns

Type    Description
long[]  The cluster ID assigned to each vector

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2]
q).ai.ivf.predict[repPts;vecs;`L2]
0 1 0 1 1 0 2 2 0 ..

This example trains cluster centroids for an IVF index and then predicts the cluster assignment for each vector in the dataset. The output shows which cluster (0-3) each vector is mapped to. It demonstrates how .ai.ivf.predict assigns vectors to the closest centroid, enabling partitioned indexing.
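The nearest-centroid rule can be sketched in plain q without the library. This is a minimal illustration under assumptions: the toy data, the helper name nearest, and the use of squared L2 distance are hypothetical and are not taken from the .ai.ivf.predict implementation.

```q
/ toy centroids and vectors in 2 dimensions (hypothetical data)
cents:(0.5 0.5;9.5 9.5)
pts:(0 0f;0 1f;9 9f;10 9f;1 0f)

/ index of the centroid closest to v under squared L2 distance
nearest:{[c;v] d:c-\:v; s:sum each d*d; s?min s}

/ one cluster ID per vector
show nearest[cents] each pts      / 0 0 1 1 0
```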

.ai.ivf.put

The .ai.ivf.put function creates an IVF index or inserts new vectors into an existing one.

Each vector is assigned to its nearest cluster centroid during insertion, so the index can grow incrementally. This makes it suitable for datasets that evolve over time.

Parameters

Name    Type(s)   Description
ivf     dict      The existing IVF index to insert into, or () to create a new index
repPts  real[][]  The cluster centroids returned by .ai.ivf.train
vecs    real[][]  The vector(s) to insert; each is assigned to its nearest centroid
metric  symbol    The metric for distance calculation, one of (L2; CS; IP)

Returns

Type  Description
dict  The IVF index

Refer also to .ai.ivf.train.

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)show index:.ai.ivf.put[();repPts;vecs;`L2]
clusters | `s#0 1 2 3
ids      | (0 2 5 8 9 13 14 17 21 29 ..
vectors  | ((0.3927524 0.5170911 0.51..
centroids| (0.5303572 0.5351732 0.562..
metric   | `L2

After training cluster centroids, this example inserts vectors into the IVF index. Each vector is routed to its nearest cluster, and the index tracks cluster membership, IDs, and centroids. The example shows how .ai.ivf.put builds a working IVF index ready for search.
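The routing step described above amounts to maintaining one ID list per cluster. The following plain q sketch builds such lists and then appends one new vector. The toy data and the helper nearest are assumptions made for illustration, and the real index stores more than this (vectors, centroids, metric).

```q
cents:(0.5 0.5;9.5 9.5)
pts:(0 0f;0 1f;9 9f;10 9f;1 0f)
nearest:{[c;v] d:c-\:v; s:sum each d*d; s?min s}

/ build: vector IDs are row numbers, grouped by their nearest centroid
lists:group nearest[cents] each pts       / 0 -> 0 1 4, 1 -> 2 3

/ insert: a new vector gets the next ID and joins its nearest cluster's list
new:9 10f
lists:@[lists;nearest[cents;new];,;count pts]
pts,:enlist new
show lists                                / 0 -> 0 1 4, 1 -> 2 3 5
```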

.ai.ivf.search

The .ai.ivf.search function performs a similarity search against an IVF index, returning the nearest neighbors for a query vector.

By restricting the search to relevant clusters, it achieves faster lookups compared to brute-force methods. It is a core operation for scalable vector search.

Parameters

Name    Type(s)             Description
ivf     dict                The existing IVF index to search
q       real[] | real[][]   The query vector(s)
k       short | int | long  The number of nearest neighbors to return
nprobe  int | long          The number of clusters to search

Returns

Type            Description
(real; long)[]  The distances to the nearest neighbors under the index metric, and the IDs of those neighbors

Refer also to .ai.ivf.train.

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)index:.ai.ivf.put[();repPts;vecs;`L2];
q).ai.ivf.search[index;10?1e;5;2]
0.1686666 0.1898834 0.3466181 0.368607 0.3952734
152       993       180       220      495

Here, a query vector (10?1e) is searched against the IVF index, requesting the 5 nearest neighbors while probing 2 of the 4 clusters. The output shows distances (top row) and vector IDs (bottom row). This demonstrates how .ai.ivf.search accelerates retrieval by limiting comparisons to the most relevant clusters.
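The role of nprobe can be illustrated with a plain q sketch of a probe-limited search. Everything here is hypothetical (toy data, the helper l2, the function search, squared L2 distance); it shows the general IVF search pattern, not the internals of .ai.ivf.search.

```q
/ toy index: centroids, vectors, and ID lists per cluster (hypothetical data)
cents:(0.5 0.5;9.5 9.5;0.5 9.5)
pts:(0 0f;0 1f;9 9f;10 9f;1 0f;0 9f;1 10f)
l2:{[m;v] d:m-\:v; sum each d*d}            / squared L2 from v to each row of m
lists:group {[c;v] s:l2[c;v]; s?min s}[cents] each pts

/ search only the nprobe closest clusters, then keep the k best candidates
search:{[qv;k;nprobe]
  probe:nprobe#iasc l2[cents;qv];           / IDs of the nprobe nearest centroids
  cand:raze lists probe;                    / vector IDs stored in those clusters
  dist:l2[pts cand;qv];                     / exact distances, candidates only
  o:(k&count cand)#iasc dist;               / positions of the k smallest
  (dist o;cand o)}

show search[0.2 0.2;2;1]
```

The result lists distances first and IDs second, the same layout as the .ai.ivf.search output above. Raising nprobe examines more clusters, which costs more comparisons but makes it less likely that a true neighbor in an unprobed cluster is missed.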

.ai.ivf.topq

The .ai.ivf.topq function converts an IVF index into Product Quantization (PQ) cluster centroids for each partition.

By combining IVF with PQ, it further compresses vector representations while maintaining efficient search. This hybrid approach is particularly valuable for large-scale, high-dimensional datasets.

Parameters

Name     Type(s)     Description
ivf      dict        The existing IVF index to convert
nsplits  long | int  The number of columnar splits to make on the matrices (one PQ subvector per split)
nbits    long | int  The number of bits used to encode each PQ subvector
metric   symbol      The metric for centroid calculation, one of (L2; CS; IP)
ntrain   long | int  The number of vectors in the IVF index to train PQ on

Returns

Type  Description
dict  The IVFPQ index

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)ivf:.ai.ivf.put[();repPts;vecs;`L2];
q).ai.ivf.topq[ivf;2;8;`L2;500]
clusters   | `s#0 1 2 3
ids        | (0 2 5 8 9 13 14 17 21 29 31 51 52..
centroids  | (0.5303572 0.5351732 0.5627522 0.4..
metric     | `L2
pqCentroids| ((-0.2639576 -0.4129897 0.07042291..
encodings  | ((76 216 136 246 7 16 6 149 217 14..

The example converts an existing IVF index into an IVFPQ structure using .ai.ivf.topq, with 2 splits and 8 bits per PQ subvector (up to 256 codes per split), trained on 500 vectors. The result includes PQ centroids and encodings in place of the raw vectors. This illustrates how IVF can be combined with PQ for memory-efficient large-scale search.
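A back-of-the-envelope calculation shows why the encodings are smaller than the raw vectors. The sketch below uses the values from the example and assumes that each vector is stored as nsplits codes of nbits bits each; the actual in-memory layout used by the library is not documented here.

```q
dims:10; nsplits:2; nbits:8          / values from the example above
rawBytes:4*dims                      / 10 reals at 4 bytes each
codeBytes:(nsplits*nbits)%8          / 2 codes of 8 bits each
codebook:`long$2 xexp nbits          / distinct codes available per split
subDims:dims div nsplits             / columns covered by each subvector
show `rawBytes`codeBytes`codebook`subDims!(rawBytes;codeBytes;codebook;subDims)
```

Under these assumptions a 40-byte vector is represented by 2 bytes of codes, each code selecting one of 256 PQ centroids for a 5-column subvector.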

.ai.ivf.train

The .ai.ivf.train function calculates cluster centroids for building or retraining an IVF index.

Training is performed on a representative dataset to define the partitions that are used for insertion and search. The quality of these centroids directly impacts the accuracy and speed of the index.

Parameters

Name    Type(s)   Description
nlist   long      The number of centroids to compute
vecs    real[][]  The training vectors
metric  symbol    The metric for centroid calculation, one of (L2; CS; IP)

Returns

Type      Description
real[][]  The cluster centroids, one vector per cluster

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q).ai.ivf.train[4;vecs;`L2]
0.5303572 0.5351732 0.5627522 0.4228943 0.535001  0.2002861 0.4314032 0.3511047 0.5948122 0.5000019
0.6322692 0.4451934 0.4216797 0.523856  0.360434  0.5268008 0.3773872 0.751892  0.4982377 0.5712299
0.2963423 0.5678515 0.6497793 0.4968773 0.4498042 0.6334871 0.6088817 0.5445439 0.3068971 0.4350893
0.5868517 0.4527895 0.3803186 0.5102138 0.6851335 0.7158532 0.5852287 0.3580849 0.6611753 0.4312862

This example computes 4 cluster centroids from the dataset vecs using the L2 distance metric. The output shows the learned centroids, one row per cluster. It demonstrates the training step required before vectors can be inserted into an IVF index.
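This page does not state which clustering algorithm .ai.ivf.train uses. As an illustration only, the sketch below uses k-means (Lloyd's algorithm), a common choice for computing IVF centroids, on toy data. The names nearest and step, the data, and the starting centroids are all hypothetical, and empty clusters are not handled.

```q
/ toy training vectors and starting centroids (hypothetical)
pts:(0 0f;0 2f;10 10f;10 12f)
cents:(1 1f;9 9f)
nearest:{[c;v] d:c-\:v; s:sum each d*d; s?min s}

/ one Lloyd step: assign vectors to their nearest centroid, then average each cluster
step:{[v;c] g:group nearest[c] each v; avg each v g[asc key g]}

/ repeat the step until the centroids stop changing
show step[pts]/[cents]            / converges to (0 1f;10 11f)
```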

.ai.ivf.upd

The .ai.ivf.upd function updates existing points in an IVF index with new vector representations.

Updates ensure that indexed data remains current without requiring full deletion and reinsertion. It is useful for dynamic datasets where vectors may change over time.

Parameters

Name  Type(s)        Description
ivf   dict           The existing IVF index to update
ids   long | long[]  The IDs of the vectors to update
vecs  real[][]       The replacement vectors

Returns

Type  Description
dict  The IVF index with updated points

Refer also to .ai.ivf.train.

Example

q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q)repPts:.ai.ivf.train[4;vecs;`L2];
q)show ivf:.ai.ivf.put[();repPts;vecs;`L2]
clusters | `s#0 1 2 3
ids      | (0 2 5 8 9 13 14 17 21 29 31 51 52 5..
vectors  | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric   | `L2
q)show ivf:.ai.ivf.upd[ivf;2 5;2#enlist (first vecs)]
clusters | `s#0 1 2 3
ids      | (0 8 9 13 14 17 21 29 31 51 52 53 54 ..
vectors  | ((0.3927524 0.5170911 0.5159796 0.40..
centroids| (0.5303572 0.5351732 0.5627522 0.422..
metric   | `L2

The example creates an IVF index and then replaces the vectors stored under IDs 2 and 5 with copies of the first vector in vecs. The clusters and centroids are unchanged, while in the truncated output IDs 2 and 5 no longer appear at their original positions in the first cluster's ID list. This demonstrates how .ai.ivf.upd refreshes specific vectors in place, keeping the index current without a full rebuild.
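One way to picture an update is as a replacement of the stored vector followed by a reassignment to whichever cluster is now nearest. The plain q sketch below does this on toy data; the data, the helper nearest, and the remove-then-append strategy are assumptions for illustration, not a description of how .ai.ivf.upd is implemented.

```q
cents:(0.5 0.5;9.5 9.5)
pts:(0 0f;0 1f;9 9f;10 9f)
nearest:{[c;v] d:c-\:v; s:sum each d*d; s?min s}
ids:(0 1;2 3)                        / ID lists per cluster, as built at insertion

/ replace the vector stored under ID 1, then recompute its cluster
pts[1]:9 10f
new:nearest[cents;pts 1]

/ remove ID 1 from every list and append it to its new cluster's list
ids:@[ids except\:1;new;,;1]
show ids                             / first cluster holds 0, second holds 2 3 1
```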