cuVS CAGRA reference card

This page covers the KDB-X cuVS CAGRA module APIs, including inputs, outputs, and examples for each function.

Overview

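The cuVS module exposes a cagra namespace with the following functions. With the module loaded as .cuvs (see Load the module), call each function with the .cuvs prefix, for example .cuvs.cagra.insert.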

Function	Description
`.cuvs.cagra.init`	Create a new CAGRA index
`cagra.insert`	Build or extend the index with vectors
`cagra.count`	Count vectors in the index
`cagra.search`	Nearest-neighbor search
`cagra.filter`	Nearest-neighbor search with a boolean mask
`cagra.write`	Serialize the index to disk
`cagra.read`	Deserialize an index from disk
`cagra.normalize`	L2-normalize a list of vectors

Load the module

.cuvs:use`kx.cuvs

Note

The cuVS CAGRA module operates on GPU-resident indexes represented as foreign objects returned by .cuvs.cagra.init or cagra.read.

.cuvs.cagra.init

Create a new CAGRA index

.cuvs.cagra.init params

Where params is a dictionary of index parameters (or :: for all defaults), returns a CAGRA index foreign object.

The index is empty after creation – use cagra.insert to build it with vectors.

Note

Indexes created with .cuvs.cagra.init are in-memory and are not persisted automatically. They are lost when the process exits unless explicitly saved using cagra.write.

Index parameters

Parameter	Type	Default	Description
`gpuid`	long	`0`	GPU device ID to use
`dims`	long	–	Dimensionality of the vectors. Must match the vectors passed to `cagra.insert`.
`metric`	symbol	`L2	Distance metric. One of `L2, `CS, `IP.
`graph_degree`	long	`64`	Edges per node in the final graph. Core recall/memory trade-off.
`intermediate_graph_degree`	long	`128`	Graph degree before pruning. Must be ≥ `graph_degree`.
`build_algo`	symbol	`AUTO_SELECT	Graph build algorithm. One of `AUTO_SELECT, `IVF_PQ, `nn_descent.
`nn_descent_niter`	long	`20`	Number of iterations for `nn_descent` build. Ignored for other algorithms.

Warning

compression and graph_build_params are not currently available – passing either will return a 'nyi error.
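
The constraints in the table above can be checked before calling .cuvs.cagra.init. The following is a minimal sketch in plain q, not part of the cuVS module: the names defaults and checkParams are hypothetical, and the default values are copied from the table.

```q
/ Hypothetical helper: not part of the cuVS module.
/ Defaults copied from the index parameters table (dims has no default).
defaults:`gpuid`metric`graph_degree`intermediate_graph_degree`build_algo`nn_descent_niter!(0;`L2;64;128;`AUTO_SELECT;20)

checkParams:{[p]
  p:defaults,p;                                   / user values override defaults
  if[not `dims in key p; '"dims is required"];
  if[not p[`metric] in `L2`CS`IP; '"unknown metric"];
  if[p[`intermediate_graph_degree]<p`graph_degree;
    '"intermediate_graph_degree must be >= graph_degree"];
  p}
```

For example, checkParams `dims`metric!(128;`CS) returns the merged dictionary, while checkParams `dims`graph_degree!128 256 signals an error because the default intermediate_graph_degree of 128 is below the requested graph_degree.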

.cuvs:use`kx.cuvs

/ Create with all defaults
idx:.cuvs.cagra.init[::]

/ Create with explicit params
idx:.cuvs.cagra.init[`gpuid`dims`metric`graph_degree`intermediate_graph_degree`build_algo`nn_descent_niter!(0;128;`L2;64;128;`IVF_PQ;20)]

/ Cosine similarity – vectors are L2-normalized internally at insert and search time
idx:.cuvs.cagra.init[`gpuid`dims`metric!(0;128;`CS)]

cagra.insert

Build or extend the index with vectors

cagra.insert[index;vectors]

Where:

index is a CAGRA index foreign object returned by .cuvs.cagra.init or cagra.read
vectors is a list of real (8h) vectors, each of length dims

Returns the count of vectors inserted as a long.

On the first call, builds the full CAGRA graph. On subsequent calls, extends the existing graph incrementally.

Minimum dataset size

CAGRA requires at least intermediate_graph_degree + 1 vectors before the index can be built (default minimum: 129). Inserting fewer vectors on the first call will cause a GPU memory fault and corrupt the CUDA context – all subsequent GPU operations will fail until the process is restarted. Accumulate a sufficient batch before the first insert, or use a lower intermediate_graph_degree.

Note

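The q wrapper itself only enforces a minimum of 2 vectors on the first insert, signalling a 'Cagra index requires at least 2 vectors error. It does not check the intermediate_graph_degree + 1 requirement described above, so validate the batch size before the first insert.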
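
One way to avoid the GPU memory fault described above is to guard the first insert in q. The sketch below is illustrative only: enoughForFirstBuild and safeInsert are hypothetical names, and igd must be the intermediate_graph_degree the index was created with (128 by default). It uses only the documented .cuvs.cagra.count and .cuvs.cagra.insert calls.

```q
/ Hypothetical helpers: not part of the cuVS module.
/ True when n vectors are enough for a first build (n >= igd+1)
enoughForFirstBuild:{[igd;n] n>igd}

safeInsert:{[idx;igd;vecs]
  empty:0=.cuvs.cagra.count idx;                  / count is 0 for an unbuilt index
  if[empty and not enoughForFirstBuild[igd;count vecs];
    '"first insert needs at least ",string[igd+1]," vectors"];
  .cuvs.cagra.insert[idx;vecs]}
```

With the defaults, safeInsert[idx;128;data] signals an error instead of reaching the GPU when idx is empty and data has fewer than 129 rows.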

dims:128
idx:.cuvs.cagra.init[`gpuid`dims`metric`build_algo!(0;dims;`L2;`IVF_PQ)]

/ Generate random float vectors
N:10000
vecs:{(x;y)#(x*y)?1e};
data:vecs[N;dims];

/ Build the index
.cuvs.cagra.insert[idx;data]
10000

/ Extend with additional vectors
data2:vecs[1000;dims];
.cuvs.cagra.insert[idx;data2]
1000

cagra.count

Count vectors in the index

cagra.count index

Where index is a CAGRA index foreign object, returns the number of vectors currently in the index as a long. Returns 0 for an empty (unbuilt) index.

idx:.cuvs.cagra.init[`gpuid`dims!(0;128)]
.cuvs.cagra.count idx
0

vecs:{(x;y)#(x*y)?1e};
data:vecs[10000;128];
.cuvs.cagra.insert[idx;data]
10000
.cuvs.cagra.count idx
10000

cagra.search

Nearest-neighbor search

cagra.search[index;query;k;params]

Where:

index is a CAGRA index foreign object
query is a single float vector or a list of float vectors of length dims
k is a long – the number of nearest neighbors to return
params is a dictionary of search parameters (or :: for all defaults)

Returns a table with columns distances and neighbors, where neighbors holds 0-based row indices into the index. For a single query vector, returns one table. For a batch of queries, returns a list of tables, one per query.

Note

k must not exceed itopk_size (default 64). Passing k > itopk_size returns a 'value error. To retrieve more than 64 neighbors, set itopk_size accordingly in params.
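
The rule above can be applied automatically by raising itopk_size whenever k is larger. This is a minimal sketch in plain q: withK is a hypothetical helper, and the fallback of 64 is the documented default. It does not enforce the 512 limit that the parameter table gives for SINGLE_CTA.

```q
/ Hypothetical helper: not part of the cuVS module.
/ Returns search params whose itopk_size is at least k.
withK:{[k;params]
  p:$[params~(::); ()!(); params];                / treat :: as no overrides
  cur:$[`itopk_size in key p; p`itopk_size; 64];  / 64 is the documented default
  $[k>cur; p,(enlist `itopk_size)!enlist k; params]}
```

A call such as .cuvs.cagra.search[idx;q;100;withK[100;::]] then passes itopk_size 100, while withK[10;::] returns :: unchanged.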

Warning

Searching an empty (unbuilt) index returns an 'empty error.

Search parameters

Parameter	Type	Default	Description
`max_queries`	long	`0`	Max concurrent queries (batch size). `0` = auto-select.
`itopk_size`	long	`64`	Internal candidate list size. Primary recall/speed trade-off. Max 512 for `SINGLE_CTA`.
`max_iterations`	long	`0`	Upper limit on search iterations. `0` = auto.
`algo`	symbol	`SINGLE_CTA	Search parallelism strategy. One of `SINGLE_CTA, `MULTI_CTA, `MULTI_KERNEL, `AUTO.
`team_size`	long	`0`	CUDA threads per distance calculation. `0` = auto. Valid values: 4, 8, 16, 32.
`search_width`	long	`1`	Graph nodes explored in parallel per iteration.
`min_iterations`	long	`0`	Minimum search iterations to perform.
`thread_block_size`	long	`0`	CUDA thread block size. `0` = auto.
`hashmap_mode`	symbol	`HASH	Hashmap allocation strategy. One of `HASH, `SMALL, `AUTO_HASH.
`hashmap_min_bitlen`	long	`0`	Minimum bit length for hashmap.
`hashmap_max_fill_rate`	float	`0.5`	Maximum fill rate before hashmap rehash.
`num_random_samplings`	long	`1`	Number of random seed samples for graph traversal start points.
`rand_xor_mask`	long	`0x128394`	XOR mask for random seed generation.
`persistent`	boolean	`0b`	Enable persistent kernel mode.
`persistent_lifetime`	float	`2.0`	Lifetime (seconds) for persistent kernel.
`persistent_device_usage`	float	`1.0`	Fraction of GPU to dedicate to persistent kernel.

Note

SINGLE_CTA is the default and suits small workloads. AUTO is recommended for general use and larger batch sizes.
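
Because parameters such as itopk_size trade recall for speed, it can help to measure recall against an exact search on a small sample. The sketch below is plain q and does not use the GPU. bruteKnn and recall are hypothetical helpers, and squared L2 distance is assumed as the metric.

```q
/ Hypothetical helpers: not part of the cuVS module.
/ Exact k nearest rows of data to query q under squared L2 distance
bruteKnn:{[data;q;k] k#iasc {sum d*d:x-y}[q] each data}

/ Fraction of the exact neighbours that an approximate result recovered
recall:{[exact;approx] (count exact inter approx)%count exact}

data:(0 0e;1 0e;0 1e;5 5e)
bruteKnn[data;0.1 0e;2]       / 0 1
recall[0 1;0 3]               / 0.5
```

With a built index, the neighbors column of a single-query result from .cuvs.cagra.search can be passed as approx, and the measurement repeated for different search parameters.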

dims:128;
vecs:{(x;y)#(x*y)?1e};
data:vecs[10000;dims];
idx:.cuvs.cagra.init[`gpuid`dims`metric`build_algo!(0;dims;`L2;`IVF_PQ)]
.cuvs.cagra.insert[idx;data]
10000

/ Single query – returns a flat table
q:dims?1e
.cuvs.cagra.search[idx;q;10;::]
distances  neighbors
--------------------
0.1234     42
0.2341     817
...

/ Batch query – returns a list of tables
qs:vecs[5;dims];
.cuvs.cagra.search[idx;qs;10;::]

/ Custom search params – higher recall
params:`itopk_size`algo!(128;`MULTI_CTA);
.cuvs.cagra.search[idx;q;10;params]

/ Search params set explicitly to their defaults (rand_xor_mask and persistent options omitted)
params:`max_queries`itopk_size`max_iterations`algo`team_size`search_width`min_iterations`thread_block_size`hashmap_mode`hashmap_min_bitlen`hashmap_max_fill_rate`num_random_samplings!(0;64;0;`SINGLE_CTA;0;1;0;0;`HASH;0;0.5;1);
.cuvs.cagra.search[idx;q;10;params]

cagra.filter

Nearest-neighbor search with a boolean mask

cagra.filter[index;query;k;params;ids]

Where:

index is a CAGRA index foreign object
query is a single float vector or a list of float vectors of length dims
k is a long – the number of nearest neighbors to return
params is a dictionary of search parameters (or :: for all defaults) – same parameters as cagra.search
ids is a list of longs: the 0-based row indices of the vectors allowed in the results

Returns the same structure as cagra.search, restricted to vectors that match the ids. Results with negative distances are filtered out automatically.

Note

Every entry in ids must be the row index of a vector already in the index.
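
In practice the allowed rows often come from a condition on metadata rather than a fixed range. The sketch below assumes a hypothetical table tags with one row per indexed vector, in insert order. It shows how q's where turns a boolean mask into the list of long row indices that ids expects.

```q
/ Hypothetical metadata: row i describes the i-th vector inserted
tags:([] category:`a`b`a`c`a; price:10 20 30 40 50)

/ Boolean mask, one entry per vector
mask:(tags[`category]=`a) and tags[`price]>15

/ where converts the mask to long row indices
ids:where mask
ids                           / 2 4
```

The resulting ids can then be passed as the last argument of .cuvs.cagra.filter.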

dims:10;
N:10000;
vecs:{(x;y)#(x*y)?1e};
data:vecs[N;dims];
idx:.cuvs.cagra.init[`gpuid`dims`metric`build_algo!(0;dims;`L2;`IVF_PQ)];
.cuvs.cagra.insert[idx;data]
10000

/ Allow only the first 5000 vectors
ids:til 5000;
q:dims?1e
.cuvs.cagra.filter[idx;q;10;::;ids]
distances  neighbors
--------------------
0.1891     312
0.2104     1204
...

/ All results will have neighbors < 5000

cagra.write

Serialize the index to disk

cagra.write[index;path]

Where:

index is a CAGRA index foreign object
path is a symbol or string file path (without extension)

Writes two files to disk: <path>.cagra (the binary index) and <path>.kdb (index metadata including build parameters and GPU ID).

Note

Both files are required to reload the index with cagra.read. Do not delete or rename one without the other.

.cuvs.cagra.write[idx;`:myindex]
/ Writes: myindex.cagra and myindex.kdb

.cuvs.cagra.write[idx;"myindex"]
/ Equivalent using a string path

cagra.read

Deserialize an index from disk

cagra.read[path;gpuid]

Where:

path is a symbol or string file path used when saving (without extension)
gpuid is a long specifying the GPU device to load onto, or :: to use the GPU ID stored in the index metadata

Returns a CAGRA index foreign object ready for search.

Note

Both <path>.cagra and <path>.kdb must exist. If either is missing, an 'os error is returned.
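
A pre-flight check can report which file is missing before the read is attempted. This is a minimal sketch in plain q, assuming path is given as a string without extension. indexFiles and missing are hypothetical helpers that rely on q's key returning the file symbol itself when the file exists.

```q
/ Hypothetical helpers: not part of the cuVS module.
/ File symbols for the two files written by cagra.write
indexFiles:{[path] hsym each `$path,/:(".cagra";".kdb")}

/ Those of the two files that do not exist on disk
missing:{[path] f where not {x~key x} each f:indexFiles path}

indexFiles "myindex"          / `:myindex.cagra`:myindex.kdb
```

An empty result from missing "myindex" indicates that both files are present.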

Warning

Specifying a gpuid that does not exist on the current system returns an error. If the stored GPU ID is no longer valid (e.g. loading on a different machine), pass an explicit gpuid override.

/ Load onto the GPU stored in metadata
idx:.cuvs.cagra.read[`:myindex;::]

/ Load onto a specific GPU
idx:.cuvs.cagra.read[`:myindex;0]

/ Use immediately after loading
.cuvs.cagra.count idx
10000
.cuvs.cagra.search[idx;dims?1e;10;::]

cagra.normalize

L2-normalize a list of vectors

cagra.normalize x

Where x is a list of numeric vectors (types 5h–9h), returns a list of real (8h) vectors each L2-normalized to unit length.

Note

When metric:`CS (Cosine Similarity) is passed to .cuvs.cagra.init, vectors are normalized automatically at insert and search time. Use cagra.normalize explicitly only when you need to pre-normalize vectors for other purposes or when using metric:`IP.
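
The reason pre-normalizing matters for `IP is that the inner product of two unit-length vectors equals their cosine similarity. The sketch below illustrates this in plain q. l2norm is a hypothetical reference function for a single vector, not the module's implementation, and it yields nulls for an all-zero vector.

```q
/ Hypothetical pure-q reference: not the module's implementation
l2norm:{x%sqrt sum x*x}       / scale one vector to unit length

a:l2norm 3 4f                 / 0.6 0.8
b:l2norm 1 0f                 / 1 0
sum a*b                       / inner product of unit vectors: 0.6
```

Here 0.6 is also the cosine of the angle between the original vectors 3 4 and 1 0.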

vecs:(1 0 0e; 0 1 0e; 1 1 0e)
.cuvs.cagra.normalize vecs
1          0          0e
0          1          0e
0.7071068  0.7071068  0e

/ Each result vector has unit length
sqrt sum each {x*x} each .cuvs.cagra.normalize vecs
1 1 1f

Next steps

Explore the cuVS Examples page for end-to-end usage including index tuning, VRAM planning, and search performance.
Visit Troubleshooting Errors if you encounter issues.
Check the cuVS Release Notes for version history and fixes.