cuVS Module in KDB-X
This page explains the cuVS module in KDB-X – what it is, how CAGRA works, and when to use GPU-accelerated vector search over CPU-based alternatives.
The cuVS module integrates Nvidia's cuVS (CUDA Vector Search) library with KDB-X, providing GPU-accelerated similarity search over large-scale vector datasets. It exposes the CAGRA (CUDA ANNS GRAph-based) algorithm – a graph-based nearest neighbor search method that builds and traverses a directed k-nearest neighbor graph entirely on the GPU, delivering higher throughput than CPU-based alternatives. The cuVS module introduces CAGRA as a new vector index type, configured and managed in the same way as other similarity search indexes such as qFlat and qHnsw.
How CAGRA works
CAGRA builds a directed k-nearest neighbor graph (k-NNG) across your vector dataset entirely on the GPU, then runs a parallelized beam search at query time. Graph construction has two phases:
- Initial graph build – seeds the graph using either `IVF-PQ` (default) or `NN-Descent`.
- Graph pruning and optimization – removes redundant edges and improves connectivity.
At query time, CAGRA traverses this graph rather than scanning the full dataset, delivering significantly higher throughput than CPU-based alternatives such as HNSW.
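The traversal idea can be sketched in pure Python with toy 2-D points standing in for embedding vectors. This is an illustration only, not the cuVS API: it uses simple greedy descent over a brute-force-built k-NN graph, whereas CAGRA seeds the graph with IVF-PQ or NN-Descent and runs a parallel beam search that tracks many candidates at once on the GPU. All names below are made up for the example.

```python
import random, math

random.seed(0)

# Toy dataset: 2-D points standing in for embedding vectors.
data = [(random.random(), random.random()) for _ in range(200)]

def dist(a, b):
    return math.dist(a, b)

# Build a k-NN graph by brute force (CAGRA builds and prunes its
# graph on the GPU; this is just to have a graph to traverse).
K = 8
graph = []
for i, p in enumerate(data):
    nbrs = sorted(range(len(data)), key=lambda j: dist(p, data[j]))
    graph.append([j for j in nbrs if j != i][:K])

def greedy_search(query, start=0):
    """Greedy descent: hop to whichever neighbor is closer to the
    query until no neighbor improves. CAGRA's beam search generalizes
    this by keeping a whole frontier of candidates in parallel."""
    cur, visited = start, 1
    while True:
        best = min(graph[cur], key=lambda j: dist(query, data[j]))
        visited += len(graph[cur])          # rough count of points inspected
        if dist(query, data[best]) >= dist(query, data[cur]):
            return cur, visited             # local minimum reached
        cur = best

q = (0.25, 0.75)
exact = min(range(len(data)), key=lambda j: dist(q, data[j]))
approx, visited = greedy_search(q)
print(approx, visited)  # graph search inspects far fewer than all 200 points
```

The key property this demonstrates is that traversal cost depends on path length and graph degree, not on dataset size – which is why the full-dataset scan is avoided.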
Key features
The cuVS module exposes CAGRA as a vector index type in KDB-X with the following capabilities:
- GPU-accelerated search – runs similarity search entirely on the GPU, supporting millions of vectors with high throughput and low latency.
- Configurable graph construction – builds the `k-NNG` using `IVF_PQ`, `nn_descent`, or `AUTO` build strategies, with tunable parameters including `graph_degree` and `intermediate_graph_degree`.
- Flexible distance metrics – supports Euclidean distance (`L2`), cosine similarity (`CS`), and inner product (`IP`).
- Fine-grained search tuning – exposes search parameters including `itopk_size`, `search_width`, and `algo` to balance recall and latency.
- Native KDB-X integration – CAGRA indexes are created and queried through the same table and schema interface as existing index types.
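The three supported distance metrics differ in what "similar" means: `L2` rewards small geometric distance, `IP` rewards large raw dot products, and `CS` rewards alignment regardless of magnitude. A minimal pure-Python sketch of the three formulas (illustrative only – cuVS computes these on the GPU):

```python
import math

# Two toy embedding vectors.
a = [1.0, 2.0, 2.0]
b = [3.0, 2.0, 2.0]

def l2(u, v):
    """Euclidean distance (metric `L2`); smaller = more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def inner(u, v):
    """Inner product (metric `IP`); larger = more similar."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity (metric `CS`); magnitude-invariant, in [-1, 1]."""
    return inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))

print(l2(a, b))      # 2.0
print(inner(a, b))   # 11.0
print(cosine(a, b))  # ≈ 0.889
```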
When to use cuVS
Use the cuVS module whenever you need to:
- Run high-throughput similarity search over datasets of 1 million vectors or more where CPU-based indexing is a bottleneck.
- Power semantic search, recommendation, or anomaly detection pipelines that require low-latency nearest neighbor lookup at scale.
- Offload vector search to the GPU while keeping KDB-X's memory-efficient data structures on the host.
- Manage a large static vector index, taking advantage of CAGRA's fast GPU rebuild times for periodic refresh.
Key terms
The following terms describe how cuVS and CAGRA operate within KDB-X:
- cuVS (CUDA Vector Search): Nvidia's GPU-accelerated library for large-scale similarity search on vector data.
- CAGRA: A graph-based approximate nearest neighbor algorithm that builds a directed k-nearest neighbor graph on the GPU and performs parallel beam search.
- k-NNG (k-nearest neighbor graph): A graph where each node connects to its k closest neighbors, used by CAGRA for efficient traversal during search.
- Build algorithm (`build_algo`): Determines how the initial graph is constructed. Options include:
  - `IVF_PQ` – fast, GPU-native, production default
  - `nn_descent` – higher recall, higher VRAM usage
  - `AUTO` – automatically selected based on dataset and GPU
- Graph degree (`graph_degree`): Number of edges per node in the final graph; controls the trade-off between recall and memory usage.
- Intermediate graph degree (`intermediate_graph_degree`): Temporary graph connectivity before pruning; must be greater than or equal to `graph_degree`.
- Static index: An index whose structure is fixed at build time and cannot be modified incrementally.
- VRAM: GPU memory required to store the vector dataset and graph index. Peak usage during build exceeds the final index size and scales with batch size.
- Batched queries: Multiple query vectors submitted to the index in a single call.
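Since the resident index holds both the vector data and a fixed-degree neighbor list per node, a rough lower bound on its VRAM footprint can be estimated from the terms above. The formula below is a back-of-envelope sketch under assumed float32 vectors and 4-byte neighbor IDs – it is not the documented cuVS memory layout, and (as noted above) peak usage during build is higher than this figure:

```python
def cagra_index_bytes(n_vectors, dim, graph_degree=64,
                      vector_bytes=4, neighbor_bytes=4):
    """Rough lower bound on resident index size: the float32 dataset
    plus one fixed-degree neighbor list per node. Illustrative only;
    build-time peak VRAM exceeds this."""
    dataset = n_vectors * dim * vector_bytes
    graph = n_vectors * graph_degree * neighbor_bytes
    return dataset + graph

# 10M 768-dimensional float32 vectors with graph_degree=64:
gb = cagra_index_bytes(10_000_000, 768) / 2**30
print(round(gb, 1))  # ≈ 31 GB
```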
Limitations
The cuVS module has the following limitations:
- The index must fit in GPU memory. CAGRA loads the full index into VRAM – refer to VRAM planning.
- Best suited for batched queries. For single-query workloads, review the search algorithm settings – refer to Search performance tuning.
- Minimum dataset size required. At least `intermediate_graph_degree + 1` rows are needed before the index can build. Use brute-force search for small datasets.
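For datasets below that threshold, an exhaustive scan is both exact and cheap. A minimal pure-Python sketch of such a brute-force fallback (illustrative only – in KDB-X you would use a flat index rather than hand-rolled code; the function name here is made up):

```python
import math

def brute_force_knn(queries, data, k):
    """Exact k-NN by exhaustive scan -- a reasonable fallback when the
    dataset is too small (fewer than intermediate_graph_degree + 1 rows)
    for a CAGRA index to be built."""
    out = []
    for q in queries:
        order = sorted(range(len(data)), key=lambda i: math.dist(q, data[i]))
        out.append(order[:k])   # indices of the k closest rows
    return out

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(brute_force_knn([(0.1, 0.1)], data, k=2))  # [[0, 1]]
```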
Performance considerations
When using the cuVS module, consider the following:
- Batch queries where possible: CAGRA works efficiently with batched queries. Increasing batch size improves GPU utilization and overall throughput. For concurrent workloads with many threads, increasing batch size per thread is more effective than increasing thread count alone.
- Tune `itopk_size` for recall vs speed: This is the primary search trade-off parameter. Higher values improve recall at the cost of search latency. The maximum is 512 when using the `SINGLE_CTA` search algorithm.
- Choose `algo` based on workload scale: At 1M+ vectors, `SINGLE_CTA` can show measurably lower recall than `MULTI_CTA` because it exhausts available search steps on larger graphs. Use `AUTO` for general workloads, or set `algo=1` (`MULTI_CTA`) explicitly for recall-sensitive workloads at scale.
- Plan rebuilds explicitly: Since indexes are static, consider establishing a rebuild cadence (hourly or daily) or use a tiered approach: maintain CAGRA for the bulk static dataset and buffer recent writes in a secondary index such as HNSW, merging periodically.
- Pre-allocate scratch buffers: Set `max_queries` to your expected batch size to avoid per-call allocation overhead during search.
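The batching advice above amounts to a simple client-side pattern: group incoming queries into fixed-size chunks (matching your configured `max_queries`) and submit each chunk in one call, so per-call overhead is amortized across the batch. A pure-Python sketch of the pattern – `run_batched` and `toy_search` are hypothetical names for illustration, not KDB-X or cuVS functions:

```python
def run_batched(queries, max_queries, search_fn):
    """Submit queries in fixed-size batches: each call to search_fn
    amortizes its per-call overhead across the whole batch instead of
    paying it once per query."""
    results = []
    for start in range(0, len(queries), max_queries):
        batch = queries[start:start + max_queries]
        results.extend(search_fn(batch))    # one call per batch
    return results

# Hypothetical stand-in for an index search: classifies each scalar
# query against two toy "centroids" at 0.0 and 1.0.
def toy_search(batch):
    return [0 if q < 0.5 else 1 for q in batch]

print(run_batched([0.1, 0.9, 0.4, 0.6, 0.2], 2, toy_search))  # [0, 1, 0, 1, 0]
```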
Next steps
- To get started with cuVS, refer to the Quickstart.