cuVS Module in KDB-X
This page explains the cuVS module in KDB-X – what it is, how CAGRA works, and when to use GPU-accelerated vector search over CPU-based alternatives.
The cuVS module integrates Nvidia's cuVS (CUDA Vector Search) library with KDB-X, providing GPU-accelerated similarity search over large-scale vector datasets. It exposes the CAGRA (CUDA ANNS GRAph-based) algorithm – a graph-based nearest neighbor search method that builds and traverses a directed k-nearest neighbor graph entirely on the GPU, delivering higher throughput than CPU-based alternatives. The cuVS module introduces CAGRA as a new vector index type, configured and managed in the same way as other similarity search indexes such as qFlat and qHnsw.
How CAGRA works
CAGRA builds a directed k-nearest neighbor graph (k-NNG) across your vector dataset entirely on the GPU, then runs a parallelized beam search at query time. Graph construction has two phases:
- Initial graph build – seeds the graph using either `IVF-PQ` (default) or `NN-Descent`.
- Graph pruning and optimization – removes redundant edges and improves connectivity.
At query time, CAGRA traverses this graph rather than scanning the full dataset, delivering significantly higher throughput than CPU-based alternatives such as HNSW.
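The traversal idea can be sketched in pure Python with toy 2-D points standing in for embedding vectors. This is an illustration only, not the cuVS API: it uses simple greedy descent over a brute-force-built k-NN graph, whereas CAGRA seeds the graph with IVF-PQ or NN-Descent and runs a parallel beam search that tracks many candidates at once on the GPU. All names below are made up for the example.

```python
import random, math

random.seed(0)

# Toy dataset: 2-D points standing in for embedding vectors.
data = [(random.random(), random.random()) for _ in range(200)]

def dist(a, b):
    return math.dist(a, b)

# Build a k-NN graph by brute force (CAGRA builds and prunes its
# graph on the GPU; this is just to have a graph to traverse).
K = 8
graph = []
for i, p in enumerate(data):
    nbrs = sorted(range(len(data)), key=lambda j: dist(p, data[j]))
    graph.append([j for j in nbrs if j != i][:K])

def greedy_search(query, start=0):
    """Greedy descent: hop to whichever neighbor is closer to the
    query until no neighbor improves. CAGRA's beam search generalizes
    this by keeping a whole frontier of candidates in parallel."""
    cur, visited = start, 1
    while True:
        best = min(graph[cur], key=lambda j: dist(query, data[j]))
        visited += len(graph[cur])          # rough count of points inspected
        if dist(query, data[best]) >= dist(query, data[cur]):
            return cur, visited             # local minimum reached
        cur = best

q = (0.25, 0.75)
exact = min(range(len(data)), key=lambda j: dist(q, data[j]))
approx, visited = greedy_search(q)
print(approx, visited)  # graph search inspects far fewer than all 200 points
```

The key property this demonstrates is that traversal cost depends on path length and graph degree, not on dataset size – which is why the full-dataset scan is avoided.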
Key features
The cuVS module exposes CAGRA as a vector index type in KDB-X with the following capabilities:
- GPU-accelerated search – runs similarity search entirely on the GPU, supporting millions of vectors with high throughput and low latency.
- Configurable graph construction – builds the `k-NNG` using `IVF_PQ`, `nn_descent`, or `AUTO` build strategies, with tunable parameters including `graph_degree` and `intermediate_graph_degree`.
- Flexible distance metrics – supports Euclidean distance (`L2`), cosine similarity (`CS`), and inner product (`IP`).
- Fine-grained search tuning – exposes search parameters including `itopk_size`, `search_width`, and `algo` to balance recall and latency.
- Native KDB-X integration – CAGRA indexes are created and queried through the same table and schema interface as existing index types.
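The three supported distance metrics differ in what "similar" means: `L2` rewards small geometric distance, `IP` rewards large raw dot products, and `CS` rewards alignment regardless of magnitude. A minimal pure-Python sketch of the three formulas (illustrative only – cuVS computes these on the GPU):

```python
import math

# Two toy embedding vectors.
a = [1.0, 2.0, 2.0]
b = [3.0, 2.0, 2.0]

def l2(u, v):
    """Euclidean distance (metric `L2`); smaller = more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def inner(u, v):
    """Inner product (metric `IP`); larger = more similar."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity (metric `CS`); magnitude-invariant, in [-1, 1]."""
    return inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))

print(l2(a, b))      # 2.0
print(inner(a, b))   # 11.0
print(cosine(a, b))  # ≈ 0.889
```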
When to use cuVS
Use the cuVS module whenever you need to:
- Run high-throughput similarity search over datasets of 1 million vectors or more where CPU-based indexing is a bottleneck.
- Power semantic search, recommendation, or anomaly detection pipelines that require low-latency nearest neighbor lookup at scale.
- Offload vector search to the GPU while keeping KDB-X's memory-efficient data structures on the host.
- Manage a large static vector index, taking advantage of CAGRA's fast GPU rebuild times for periodic refresh.
Key terms
The following terms describe how cuVS and CAGRA operate within KDB-X:
- cuVS (CUDA Vector Search): Nvidia's GPU-accelerated library for large-scale similarity search on vector data.
- CAGRA: A graph-based approximate nearest neighbor algorithm that builds a directed k-nearest neighbor graph on the GPU and performs parallel beam search.
- k-NNG (k-nearest neighbor graph): A graph where each node connects to its k closest neighbors, used by CAGRA for efficient traversal during search.
- Build algorithm (`build_algo`): Determines how the initial graph is constructed. Options include:
  - `IVF_PQ` – fast, GPU-native, production default
  - `nn_descent` – higher recall, higher VRAM usage
  - `AUTO` – automatically selected based on dataset and GPU
- Graph degree (`graph_degree`): Number of edges per node in the final graph; controls the trade-off between recall and memory usage.
- Intermediate graph degree (`intermediate_graph_degree`): Temporary graph connectivity before pruning; must be greater than or equal to `graph_degree`.
- Static index: An index whose structure is fixed at build time and cannot be modified incrementally.
- VRAM: GPU memory required to store the vector dataset and graph index. Peak usage during build exceeds the final index size and scales with batch size.
- Batched queries: Multiple query vectors submitted to the index in a single call.
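Since the resident index holds both the vector data and a fixed-degree neighbor list per node, a rough lower bound on its VRAM footprint can be estimated from the terms above. The formula below is a back-of-envelope sketch under assumed float32 vectors and 4-byte neighbor IDs – it is not the documented cuVS memory layout, and (as noted above) peak usage during build is higher than this figure:

```python
def cagra_index_bytes(n_vectors, dim, graph_degree=64,
                      vector_bytes=4, neighbor_bytes=4):
    """Rough lower bound on resident index size: the float32 dataset
    plus one fixed-degree neighbor list per node. Illustrative only;
    build-time peak VRAM exceeds this."""
    dataset = n_vectors * dim * vector_bytes
    graph = n_vectors * graph_degree * neighbor_bytes
    return dataset + graph

# 10M 768-dimensional float32 vectors with graph_degree=64:
gb = cagra_index_bytes(10_000_000, 768) / 2**30
print(round(gb, 1))  # ≈ 31 GB
```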
Limitations
The cuVS module has the following limitations:
- The index must fit in GPU memory. CAGRA loads the full index into VRAM – refer to VRAM planning.
- Best suited for batched queries. For single-query workloads, review the search algorithm settings – refer to Search performance tuning.
- Minimum dataset size required. At least `intermediate_graph_degree + 1` rows are needed before the index can build. Use brute-force search for small datasets.
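For datasets below that threshold, an exhaustive scan is both exact and cheap. A minimal pure-Python sketch of such a brute-force fallback (illustrative only – in KDB-X you would use a flat index rather than hand-rolled code; the function name here is made up):

```python
import math

def brute_force_knn(queries, data, k):
    """Exact k-NN by exhaustive scan -- a reasonable fallback when the
    dataset is too small (fewer than intermediate_graph_degree + 1 rows)
    for a CAGRA index to be built."""
    out = []
    for q in queries:
        order = sorted(range(len(data)), key=lambda i: math.dist(q, data[i]))
        out.append(order[:k])   # indices of the k closest rows
    return out

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(brute_force_knn([(0.1, 0.1)], data, k=2))  # [[0, 1]]
```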
Performance considerations
When using the cuVS module, consider the following:
- Batch queries where possible: CAGRA works efficiently with batched queries. Increasing batch size improves GPU utilization and overall throughput. For concurrent workloads with many threads, increasing batch size per thread is more effective than increasing thread count alone.
- Tune `itopk_size` for recall vs speed: This is the primary search trade-off parameter. Higher values improve recall at the cost of search latency. The maximum is 512 when using the `SINGLE_CTA` search algorithm.
- Choose `algo` based on workload scale: At 1M+ vectors, `SINGLE_CTA` can show measurably lower recall than `MULTI_CTA` because it exhausts available search steps on larger graphs. Use `AUTO` for general workloads, or set `algo=1` (`MULTI_CTA`) explicitly for recall-sensitive workloads at scale.
- Plan rebuilds explicitly: Since indexes are static, consider establishing a rebuild cadence (hourly or daily) or use a tiered approach: maintain CAGRA for the bulk static dataset and buffer recent writes in a secondary index such as HNSW, merging periodically.
- Pre-allocate scratch buffers: Set `max_queries` to your expected batch size to avoid per-call allocation overhead during search.
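The batching advice above amounts to a simple client-side pattern: group incoming queries into fixed-size chunks (matching your configured `max_queries`) and submit each chunk in one call, so per-call overhead is amortized across the batch. A pure-Python sketch of the pattern – `run_batched` and `toy_search` are hypothetical names for illustration, not KDB-X or cuVS functions:

```python
def run_batched(queries, max_queries, search_fn):
    """Submit queries in fixed-size batches: each call to search_fn
    amortizes its per-call overhead across the whole batch instead of
    paying it once per query."""
    results = []
    for start in range(0, len(queries), max_queries):
        batch = queries[start:start + max_queries]
        results.extend(search_fn(batch))    # one call per batch
    return results

# Hypothetical stand-in for an index search: classifies each scalar
# query against two toy "centroids" at 0.0 and 1.0.
def toy_search(batch):
    return [0 if q < 0.5 else 1 for q in batch]

print(run_batched([0.1, 0.9, 0.4, 0.6, 0.2], 2, toy_search))  # [0, 1, 0, 1, 0]
```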
Next steps
- To get started with cuVS, refer to the Quickstart.