
Troubleshooting cuVS

This guide helps you diagnose and resolve common issues when using the cuVS module in KDB-X, including VRAM errors, index build failures, and performance problems.

How to use this guide

  1. Identify the issue you are encountering.
  2. Locate the matching scenario in the Issue index.
  3. Review the summary to confirm it matches your case.
  4. Check the likely causes.
  5. Follow the mitigation steps.

Issue index

Category     Issue                            Summary
Memory       High VRAM when using nn_descent  Excessive VRAM usage during index build
Memory       VRAM data retention              Higher-than-expected VRAM usage during search
Memory       nn_descent out of memory         Out-of-memory errors on large datasets
Index build  Minimum dataset size             Index build fails with small datasets
General      Misleading VRAM                  Reported free VRAM does not reflect actual availability

Memory issues

High VRAM when using nn_descent

Summary: Excessive VRAM usage during index build

The nn_descent build algorithm consumes significantly more VRAM than other build strategies, especially on large datasets.

Likely causes

  • Using nn_descent on large datasets
  • Building indexes on shared GPUs with limited available memory

Mitigation steps

  1. Switch to IVF_PQ, which has significantly lower VRAM requirements.
  2. Use AUTO to allow cuVS to select the appropriate build strategy.
  3. Run index builds on a dedicated GPU if nn_descent is required.

VRAM data retention

Summary: Higher-than-expected VRAM usage during search

CAGRA retains additional memory during search due to internal data structures, resulting in higher-than-expected VRAM usage.

Likely causes

  • Internal float16 copies retained during search
  • Large datasets and index structures placed in GPU memory during search

Mitigation steps

  1. Account for approximately 1.8× the raw dataset size when planning VRAM.
  2. Use IVF_PQ to reduce memory overhead where possible.
  3. Reduce dataset size or dimensionality if VRAM is constrained.

This behavior is expected and may be improved in future cuVS releases.
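The 1.8× planning rule from step 1 can be sketched as simple arithmetic. The function name and defaults below are illustrative, not part of the cuVS API; it assumes float32 vectors (4 bytes per value).

```python
# Rough VRAM planning helper for CAGRA search, based on the ~1.8x guideline
# above. All names here are illustrative; this is not part of the cuVS API.

def estimated_search_vram_gib(n_rows: int, n_dims: int,
                              bytes_per_value: int = 4,
                              overhead_factor: float = 1.8) -> float:
    """Estimate peak VRAM (GiB) to search n_rows float32 vectors of n_dims
    dimensions, applying the ~1.8x overhead guideline."""
    raw_bytes = n_rows * n_dims * bytes_per_value
    return raw_bytes * overhead_factor / 2**30

# Example: 10M rows of 128-dimensional float32 vectors
# raw dataset ~= 4.77 GiB, so plan for roughly 8.6 GiB of VRAM
print(round(estimated_search_vram_gib(10_000_000, 128), 2))
```

Comparing this estimate against the free memory reported by nvidia-smi before a search helps decide whether to fall back to IVF_PQ (step 2) or shrink the dataset (step 3).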


nn_descent out of memory

Summary: Out-of-memory errors on large datasets

The nn_descent algorithm may fail with out-of-memory errors as dataset size increases.

Likely causes

  • VRAM requirements scaling with dataset size
  • GPU already partially occupied by other processes

Mitigation steps

  1. Use IVF_PQ for datasets larger than ~5M vectors.
  2. Avoid running nn_descent on shared GPUs.
  3. Monitor VRAM usage using nvidia-smi during index build.
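The dataset-size guideline from step 1 can be expressed as a small selection helper. The ~5M threshold is the rough figure quoted above, not a hard cuVS limit, and the function and algorithm names are assumptions for this sketch:

```python
# Illustrative helper applying the mitigation steps above: prefer IVF_PQ
# beyond roughly 5M vectors or on shared GPUs. The threshold is a rough
# guideline from this guide, not a cuVS constant.

NN_DESCENT_ROW_LIMIT = 5_000_000

def pick_build_algo(n_rows: int, shared_gpu: bool = False) -> str:
    """Return a build-algorithm name following the guidance above."""
    if shared_gpu or n_rows > NN_DESCENT_ROW_LIMIT:
        return "ivf_pq"       # significantly lower VRAM footprint
    return "nn_descent"       # acceptable on a dedicated GPU at smaller scales
```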

Index build issues

Minimum dataset size

Summary: Index build fails with small datasets

CAGRA requires a minimum number of rows before the index can be built.

Likely causes

  • Dataset size is smaller than intermediate_graph_degree + 1
  • Index build triggered too early during ingestion

Mitigation steps

  1. Ensure at least intermediate_graph_degree + 1 rows are inserted before building the index.
  2. Buffer data until sufficient rows are available.
  3. Use brute-force search or defer indexing for very small datasets.

A build attempted with too few rows may leave the CUDA context in an invalid state, requiring a container or process restart.
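A pre-build guard for the row-count requirement above can be sketched as follows. The function is hypothetical, not part of cuVS; the default of 128 for intermediate_graph_degree is an assumption here, so substitute whatever value your index configuration actually uses:

```python
# Minimal pre-build guard for the CAGRA row-count requirement described
# above: the dataset must contain at least intermediate_graph_degree + 1
# rows. Hypothetical helper; the default degree of 128 is an assumption.

def can_build_cagra(n_rows: int, intermediate_graph_degree: int = 128) -> bool:
    """True if the dataset is large enough to build a CAGRA index."""
    return n_rows >= intermediate_graph_degree + 1
```

Checking this before triggering a build avoids the failure path entirely, rather than recovering from it afterwards.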


General issues

Misleading VRAM

Summary: Reported free VRAM does not reflect actual availability

GPU memory reporting may appear misleading when using shared GPUs.

Likely causes

  • cudaMemGetInfo() reports per-process memory rather than system-wide availability
  • Memory reported by CUDA APIs may not reflect total GPU usage across processes
  • Other processes holding VRAM not visible to the current process

Mitigation steps

  1. Use nvidia-smi to check total GPU memory usage across all processes.
  2. Avoid relying solely on per-process memory reports.
  3. Prefer IVF_PQ in shared GPU environments.
  4. Use nvtop for a live, per-process GPU memory and utilisation view when nvidia-smi snapshots are insufficient.
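For step 1, per-process usage can be totalled from nvidia-smi's CSV output, e.g. from a command along the lines of nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits (check your driver's nvidia-smi --help-query-compute-apps for the exact field names). The parser and the sample values below are illustrative:

```python
# Sketch: sum VRAM held by all compute processes from nvidia-smi CSV output,
# where each line looks like "<pid>, <MiB used>". The sample values are
# made up for illustration.

def total_vram_mib(csv_text: str) -> int:
    """Sum the per-process used-memory column (MiB) across all lines."""
    total = 0
    for line in csv_text.strip().splitlines():
        _pid, used = (field.strip() for field in line.split(","))
        total += int(used)
    return total

sample = "1234, 2048\n5678, 6144\n"   # two processes holding VRAM
print(total_vram_mib(sample))          # total MiB across both processes
```

Comparing this cross-process total with the per-process figure your own process sees makes it obvious when another tenant of a shared GPU is holding the memory.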