Troubleshooting cuVS
This guide helps you diagnose and resolve common issues when using the cuVS module in KDB-X, including VRAM errors, index build failures, and performance problems.
How to use this guide
- Identify the issue you are encountering.
- Locate the matching scenario in the Issue index.
- Review the summary to confirm it matches your case.
- Check the likely causes.
- Follow the mitigation steps.
Issue index
| Category | Issue | Summary |
|---|---|---|
| Memory | High VRAM when using `nn_descent` | Excessive VRAM usage during index build |
| Memory | VRAM data retention | Higher-than-expected VRAM usage during search |
| Memory | `nn_descent` out of memory | Out-of-memory errors on large datasets |
| Index build | Minimum dataset size | Index build fails with small datasets |
| General | Misleading VRAM | Reported free VRAM does not reflect actual availability |
Memory issues
High VRAM when using nn_descent
Summary: Excessive VRAM usage during index build
The `nn_descent` build algorithm consumes significantly more VRAM than other build strategies, especially on large datasets.
Likely causes
- Using `nn_descent` on large datasets
- Building indexes on shared GPUs with limited available memory
Mitigation steps
- Switch to `IVF_PQ`, which has significantly lower VRAM requirements.
- Use `AUTO` to allow cuVS to select the appropriate build strategy.
- Run index builds on a dedicated GPU if `nn_descent` is required.
VRAM data retention
Summary: Higher-than-expected VRAM usage during search
CAGRA retains additional memory during search due to internal data structures, resulting in higher-than-expected VRAM usage.
Likely causes
- Internal float16 copies retained during search
- Large datasets and index structures placed in GPU memory during search
Mitigation steps
- Account for approximately 1.8× the raw dataset size when planning VRAM.
- Use `IVF_PQ` to reduce memory overhead where possible.
- Reduce dataset size or dimensionality if VRAM is constrained.
This behavior is expected and may be improved in future cuVS releases.
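The 1.8× planning factor above can be turned into a quick back-of-the-envelope calculation. The sketch below is a hypothetical helper (not part of cuVS or KDB-X); it assumes float32 vectors at 4 bytes per dimension, which is the common case for CAGRA datasets.

```python
def estimate_search_vram_bytes(n_rows, n_dims, bytes_per_value=4, overhead_factor=1.8):
    """Rough VRAM estimate for CAGRA search: raw dataset size multiplied by
    the ~1.8x retention factor described above. Assumes float32 by default."""
    raw_bytes = n_rows * n_dims * bytes_per_value
    return int(raw_bytes * overhead_factor)

# 10M rows of 128-dim float32 vectors: ~5.12 GB raw, so plan for ~9.2 GB
# of VRAM during search.
print(estimate_search_vram_bytes(10_000_000, 128) / 1e9)
```

Compare the result against the free VRAM reported across all processes (see the Misleading VRAM issue below) before committing to CAGRA on a shared GPU.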
nn_descent out of memory
Summary: Out-of-memory errors on large datasets
The `nn_descent` algorithm may fail with out-of-memory errors as dataset size increases.
Likely causes
- VRAM requirements scaling with dataset size
- GPU already partially occupied by other processes
Mitigation steps
- Use `IVF_PQ` for datasets larger than ~5M vectors.
- Avoid running `nn_descent` on shared GPUs.
- Monitor VRAM usage with `nvidia-smi` during the index build.
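Monitoring can be scripted rather than watched by hand. The sketch below is a minimal, hypothetical helper: it shells out to `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` (a real flag combination) and parses one used-memory value per GPU. The function names are assumptions, not cuVS or KDB-X APIs.

```python
import subprocess

def parse_used_mib(csv_output):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB of used VRAM) per GPU line."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Query used VRAM per GPU. Requires the NVIDIA driver and nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Polling this in a loop during an `nn_descent` build shows how close the GPU is to its limit before an out-of-memory error occurs.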
Index build issues
Minimum dataset size
Summary: Index build fails with small datasets
CAGRA requires a minimum number of rows before the index can be built.
Likely causes
- Dataset size is smaller than `intermediate_graph_degree + 1`
- Index build triggered too early during ingestion
Mitigation steps
- Ensure at least `intermediate_graph_degree + 1` rows are inserted before building the index.
- Buffer data until sufficient rows are available.
- Use brute-force search or defer indexing for very small datasets.
A failed build may leave the CUDA context in an invalid state, requiring a container or process restart.
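A guard like the following can avoid triggering a build too early. This is a hypothetical sketch (the function name and the default value of 128 for `intermediate_graph_degree` are assumptions, not confirmed KDB-X settings); it only encodes the `intermediate_graph_degree + 1` rule stated above.

```python
def can_build_cagra(n_rows, intermediate_graph_degree=128):
    """CAGRA needs at least intermediate_graph_degree + 1 rows before an
    index can be built; below that, buffer the data or fall back to
    brute-force search instead of attempting a build."""
    return n_rows >= intermediate_graph_degree + 1

# With intermediate_graph_degree = 128, at least 129 rows are required.
```

Checking this condition during ingestion, before calling the index build, avoids the failed build and the invalid CUDA context it can leave behind.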
General issues
Misleading VRAM
Summary: Reported free VRAM does not reflect actual availability
GPU memory reporting may appear misleading when using shared GPUs.
Likely causes
- `cudaMemGetInfo()` reports per-process memory rather than system-wide availability
- Memory reported by CUDA APIs may not reflect total GPU usage across processes
- Other processes holding VRAM not visible to the current process
Mitigation steps
- Use `nvidia-smi` to check total GPU memory usage across all processes.
- Avoid relying solely on per-process memory reports.
- Prefer `IVF_PQ` in shared GPU environments.
- Use `nvtop` for a live, per-process GPU memory and utilisation view when `nvidia-smi` snapshots are insufficient.
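To see how much VRAM other processes are actually holding, `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits` lists per-process usage. The parser below is a hypothetical helper sketch for summing those rows; the function name is an assumption.

```python
def total_compute_vram_mib(csv_output):
    """Sum per-process GPU memory (MiB) from the output of
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`.
    Each line looks like: "<pid>, <used_memory_mib>"."""
    total = 0
    for line in csv_output.splitlines():
        if not line.strip():
            continue
        _pid, used = (field.strip() for field in line.split(","))
        total += int(used)
    return total
```

Comparing this total against the device's capacity gives a truer picture of availability than a per-process free-memory query alone.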