Troubleshooting cuVS
This guide helps you diagnose and resolve common issues when using the cuVS module in KDB-X, including VRAM errors, index build failures, and performance problems.
How to use this guide
- Identify the issue you are encountering.
- Locate the matching scenario in the Issue index.
- Review the summary to confirm it matches your case.
- Check the likely causes.
- Follow the mitigation steps.
Issue index
| Category | Issue | Summary |
|---|---|---|
| Memory | High VRAM when using `nn_descent` | Excessive VRAM usage during index build |
| Memory | VRAM data retention | Higher-than-expected VRAM usage during search |
| Memory | `nn_descent` out of memory | Out-of-memory errors on large datasets |
| Index build | Minimum dataset size | Index build fails with small datasets |
| General | Misleading VRAM | Reported free VRAM does not reflect actual availability |
Memory issues
High VRAM when using nn_descent
Summary: Excessive VRAM usage during index build
The `nn_descent` build algorithm consumes significantly more VRAM than other build strategies, especially on large datasets.
Likely causes
- Using `nn_descent` on large datasets
- Building indexes on shared GPUs with limited available memory
Mitigation steps
- Switch to `IVF_PQ`, which has significantly lower VRAM requirements.
- Use `AUTO` to allow cuVS to select the appropriate build strategy.
- Run index builds on a dedicated GPU if `nn_descent` is required.
VRAM data retention
Summary: Higher-than-expected VRAM usage during search
CAGRA retains additional memory during search due to internal data structures, resulting in higher-than-expected VRAM usage.
Likely causes
- Internal float16 copies retained during search
- Large datasets and index structures placed in GPU memory during search
Mitigation steps
- Account for approximately 1.8× the raw dataset size when planning VRAM.
- Use `IVF_PQ` to reduce memory overhead where possible.
- Reduce dataset size or dimensionality if VRAM is constrained.
This behavior is expected and may be improved in future cuVS releases.
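The 1.8× planning factor above can be turned into a quick back-of-the-envelope calculation. The sketch below is a hypothetical helper (not part of cuVS or KDB-X); it assumes float32 vectors at 4 bytes per dimension, which is the common case for CAGRA datasets.

```python
def estimate_search_vram_bytes(n_rows, n_dims, bytes_per_value=4, overhead_factor=1.8):
    """Rough VRAM estimate for CAGRA search: raw dataset size multiplied by
    the ~1.8x retention factor described above. Assumes float32 by default."""
    raw_bytes = n_rows * n_dims * bytes_per_value
    return int(raw_bytes * overhead_factor)

# 10M rows of 128-dim float32 vectors: ~5.12 GB raw, so plan for ~9.2 GB
# of VRAM during search.
print(estimate_search_vram_bytes(10_000_000, 128) / 1e9)
```

Compare the result against the free VRAM reported across all processes (see the Misleading VRAM issue below) before committing to CAGRA on a shared GPU.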
nn_descent out of memory
Summary: Out-of-memory errors on large datasets
The `nn_descent` algorithm may fail with out-of-memory errors as dataset size increases.
Likely causes
- VRAM requirements scaling with dataset size
- GPU already partially occupied by other processes
Mitigation steps
- Use `IVF_PQ` for datasets larger than ~5M vectors.
- Avoid running `nn_descent` on shared GPUs.
- Monitor VRAM usage with `nvidia-smi` during the index build.
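Monitoring can be scripted rather than watched by hand. The sketch below is a minimal, hypothetical helper: it shells out to `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` (a real flag combination) and parses one used-memory value per GPU. The function names are assumptions, not cuVS or KDB-X APIs.

```python
import subprocess

def parse_used_mib(csv_output):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB of used VRAM) per GPU line."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Query used VRAM per GPU. Requires the NVIDIA driver and nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Polling this in a loop during an `nn_descent` build shows how close the GPU is to its limit before an out-of-memory error occurs.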
Index build issues
Minimum dataset size
Summary: Index build fails with small datasets
CAGRA requires a minimum number of rows before the index can be built.
Likely causes
- Dataset size is smaller than `intermediate_graph_degree + 1`
- Index build triggered too early during ingestion
Mitigation steps
- Ensure at least `intermediate_graph_degree + 1` rows are inserted before building the index.
- Buffer data until sufficient rows are available.
- Use brute-force search or defer indexing for very small datasets.
A failed build may leave the CUDA context in an invalid state, requiring a container or process restart.
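A guard like the following can avoid triggering a build too early. This is a hypothetical sketch (the function name and the default value of 128 for `intermediate_graph_degree` are assumptions, not confirmed KDB-X settings); it only encodes the `intermediate_graph_degree + 1` rule stated above.

```python
def can_build_cagra(n_rows, intermediate_graph_degree=128):
    """CAGRA needs at least intermediate_graph_degree + 1 rows before an
    index can be built; below that, buffer the data or fall back to
    brute-force search instead of attempting a build."""
    return n_rows >= intermediate_graph_degree + 1

# With intermediate_graph_degree = 128, at least 129 rows are required.
```

Checking this condition during ingestion, before calling the index build, avoids the failed build and the invalid CUDA context it can leave behind.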
General issues
Misleading VRAM
Summary: Reported free VRAM does not reflect actual availability
GPU memory reporting may appear misleading when using shared GPUs.
Likely causes
- `cudaMemGetInfo()` reports per-process memory rather than system-wide availability
- Memory reported by CUDA APIs may not reflect total GPU usage across processes
- Other processes holding VRAM not visible to the current process
Mitigation steps
- Use `nvidia-smi` to check total GPU memory usage across all processes.
- Avoid relying solely on per-process memory reports.
- Prefer `IVF_PQ` in shared GPU environments.
- Use `nvtop` for a live, per-process GPU memory and utilisation view when `nvidia-smi` snapshots are insufficient.
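To see how much VRAM other processes are actually holding, `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits` lists per-process usage. The parser below is a hypothetical helper sketch for summing those rows; the function name is an assumption.

```python
def total_compute_vram_mib(csv_output):
    """Sum per-process GPU memory (MiB) from the output of
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`.
    Each line looks like: "<pid>, <used_memory_mib>"."""
    total = 0
    for line in csv_output.splitlines():
        if not line.strip():
            continue
        _pid, used = (field.strip() for field in line.split(","))
        total += int(used)
    return total
```

Comparing this total against the device's capacity gives a truer picture of availability than a per-process free-memory query alone.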