Troubleshooting Errors
This guide helps you diagnose and resolve errors returned by the KDB-X GPU module. Each error includes its likely causes and step-by-step remediation.
How to use this guide
- Locate the error code in the Error index.
- Review the summary to confirm it matches your issue.
- Check the likely causes to narrow down the problem.
- Follow the remediation steps in order.
- If unresolved, escalate with logs and a reproducible example.
Error index
| Category | Error code | Summary |
|---|---|---|
| Internal | GPU_INTERNAL_LOGIC_ERROR | Unexpected internal control flow |
| Internal | GPU_SELECT_DEADLOCK | Interpreter deadlock detected |
| Internal | GPU_LAUNCH_FAIL | GPU kernel launch failed |
| Internal | GPU_CALCULATE_SCRATCH_ERR | Scratch memory calculation failed |
| Memory | GPU_ARENA_MAP_FAILED | Host allocator creation failed |
| Memory | GPU_HOST_ALLOC_FAILED | Host memory allocation failed |
| Memory | GPU_CUDA_ALLOC_FAILED | GPU VRAM allocation failed |
| Memory | GPU_CREATE_GLOBAL_SET | Global set creation failed |
| Type/parse | GPU_PARSE_FAIL | Input parsing failed |
| Type/parse | GPU_PARSE_NOT_IMPLEMENTED | Input parsing not implemented |
| Type/parse | GPU_INVALID_STATEMENT | Statement rejected during parsing |
| Type/parse | GPU_TYPE_UNSUPPORTED | Type not supported for operation |
| Type/parse | GPU_INVALID_DOMAIN | Enumeration domain not found |
| Data and column | GPU_EXPECTED_FOREIGN | Expected a GPU foreign object |
| Data and column | GPU_WRONG_DEVICE | Object belongs to a different GPU device |
| Data and column | GPU_NOT_A_COLUMN | Symbol not found in column list |
| Data and column | GPU_COLUMN_OF_LISTS_UNSUPPORTED | Columns of lists not supported in select |
| Data and column | GPU_SYMBOL_SORTING_NOT_IMPLEMENTED | Sorting on symbols or character arrays not supported |
| Other | GPU_NOT_IMPLEMENTED | Feature not available |
| Other | GPU_UNKNOWN | Unclassified error |
Internal errors
GPU_INTERNAL_LOGIC_ERROR
Summary: Unexpected internal control flow
The GPU module reached a code path that should be unreachable under normal operation. This indicates a bug within the module itself rather than incorrect user input.
Remediation steps
- Note the exact query or operation that triggered the error.
- Check the KDB-X release notes for known issues in your current version.
- Try downgrading to the previous stable release to check whether the error is a regression.
- Report the issue to KDB-X with a reproducible test case.
This indicates a system-level issue. Escalate to KDB-X support.
GPU_SELECT_DEADLOCK
Summary: Interpreter deadlock detected
The GPU interpreter has reached a deadlock state where two or more operations are waiting on each other and can no longer proceed. This is caused by a bug in the module's concurrency logic and should not occur under normal operation.
Remediation steps
- Terminate the GPU process.
- Upgrade to the latest module version, in which the bug may already be fixed.
- Report to KDB-X support - this error should not occur.
This indicates a system-level issue. Escalate to KDB-X support.
GPU_LAUNCH_FAIL
Summary: GPU kernel launch failed
An attempt to launch a CUDA kernel on the GPU device failed. The kernel could not be dispatched for execution.
Likely causes
- CUDA driver or device in an error state
- Invalid internal kernel configuration
Remediation steps
- Check `nvidia-smi` for GPU health and error state.
- Run `compute-sanitizer` to detect device errors.
- Reboot the host to reset the CUDA driver and GPU state.
- Verify the CUDA toolkit and driver versions are compatible.
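The first and last checks above can be sketched as a small shell function. This is a minimal sketch, assuming a host where the NVIDIA driver tools may or may not be installed; the function name `gpu_health_report` is hypothetical and not part of the KDB-X GPU module.

```shell
# Sketch: print one CSV row per GPU with its driver version, temperature,
# and memory use. gpu_health_report is a hypothetical helper name.
gpu_health_report() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: NVIDIA driver tools are not on PATH"
        return 1
    fi
    # A non-zero exit here usually means the driver or device is unhealthy.
    nvidia-smi \
        --query-gpu=index,name,driver_version,temperature.gpu,memory.used,memory.total \
        --format=csv
}

gpu_health_report || echo "GPU health check did not complete on this host"
```

The reported driver version can then be compared against the driver requirement listed for your installed CUDA toolkit.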
GPU_CALCULATE_SCRATCH_ERR
Summary: Scratch memory calculation failed
The module failed to determine how much scratch (temporary working) memory a GPU kernel requires before launching it. This is a pre-launch validation step.
Likely causes
- A bug in the scratch-size calculation logic for a specific kernel
- An unsupported parameter combination passed to the calculation routine
Remediation steps
- Note the specific operation that triggered this error.
- Check for a known issue in the current module version's changelog.
- Report to KDB-X support with full context - this error should not occur.
This indicates a system-level issue. Escalate to KDB-X support.
Memory errors
GPU_ARENA_MAP_FAILED
Summary: Host allocator creation failed
The module failed to create the memory arena (allocator) used to manage intermediate data structures on the host (CPU-side memory). This is a prerequisite for any subsequent GPU operation.
Likely causes
- Insufficient host RAM available at the time of the call
- Memory fragmentation preventing a large contiguous allocation
- OS-level memory limits (for example, `ulimit`) being exceeded
Remediation steps
- Check host memory usage with `free -h` or a system monitor.
- Terminate other memory-heavy processes to free up RAM.
- Check `ulimit -v` and increase virtual memory limits if required.
- Reduce query complexity or data volume to lower peak memory demand.
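The memory and limit checks above could be run together as follows. This is a minimal sketch for a Linux shell; it only reports values and leaves the decision to raise a limit to you.

```shell
# Sketch: show host memory headroom and the virtual memory limit of this shell.

# free(1) ships with procps on Linux; skip it quietly where it is missing.
if command -v free >/dev/null 2>&1; then
    free -h
fi

# ulimit -v prints the soft virtual memory limit in KiB, or "unlimited".
vm_limit=$(ulimit -v)
echo "Virtual memory limit: ${vm_limit}"

if [ "${vm_limit}" != "unlimited" ]; then
    echo "A limit is set. To lift it for this shell (up to the hard limit):"
    echo "  ulimit -v unlimited"
fi
```

The limit applies per shell, so check it in the same shell (or service environment) that launches the process reporting the error.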
GPU_HOST_ALLOC_FAILED
Summary: Host memory allocation failed
An attempt to allocate an intermediate data structure on the host (CPU RAM) failed. The allocator was created successfully, but the specific allocation request could not be fulfilled.
Likely causes
- System RAM is fully exhausted
- The requested allocation size exceeds available contiguous memory
- A memory leak in a prior operation has consumed available RAM
Remediation steps
- Monitor memory usage during the operation with `top` or `htop`.
- Reduce the size of the input dataset or split it into smaller batches.
- Restart the process to clear any leaked allocations.
- Add more physical RAM or increase available swap space.
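Where an interactive monitor is impractical, available memory can be sampled from a second terminal while the operation runs. This is a minimal sketch, assuming Linux (`/proc/meminfo`); the function name `sample_mem` is hypothetical.

```shell
# Sketch: print a timestamped MemAvailable reading (KiB) COUNT times,
# INTERVAL seconds apart. Usage: sample_mem COUNT INTERVAL
sample_mem() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s %s\n' "$(date +%H:%M:%S)" \
            "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
        sleep "$2"
        i=$((i + 1))
    done
}

# Three samples, one second apart.
sample_mem 3 1
```

A reading that falls steadily across repeated runs of the same operation is consistent with the leak scenario listed under likely causes.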
GPU_CUDA_ALLOC_FAILED
Summary: GPU VRAM allocation failed
A CUDA memory allocation call (cudaMalloc or cudaMallocAsync) returned cudaErrorMemoryAllocation, meaning the GPU does not have enough free VRAM to satisfy the request.
Likely causes
- GPU VRAM is exhausted by the current workload
- Other processes are consuming VRAM (for example, display driver or other CUDA applications)
- Memory fragmentation on the GPU leaving no contiguous free block of sufficient size
- Requesting more memory than the GPU physically has
Remediation steps
- Run `nvidia-smi` to inspect current VRAM usage.
- Kill or pause other GPU-bound processes to free VRAM.
- Reduce dataset size or enable chunked/streaming processing.
- Switch to a GPU with more VRAM if the workload fundamentally requires it.
- Set `CUDA_VISIBLE_DEVICES` to isolate the process to a less-loaded GPU.
This is the most common error in production GPU workloads. Monitor VRAM headroom proactively.
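Choosing a less-loaded GPU can be scripted. The sketch below asks `nvidia-smi` for free memory per device and exports `CUDA_VISIBLE_DEVICES` for the device with the most free VRAM. The function name `pick_least_loaded_gpu` is hypothetical, and the sketch assumes the GPU process is started from the same shell afterwards.

```shell
# Sketch: read rows of "index, free_MiB" and print the index of the GPU
# with the most free memory (the first one wins on a tie).
pick_least_loaded_gpu() {
    awk -F', *' '
        NF >= 2 && (idx == "" || $2 + 0 > best) { best = $2 + 0; idx = $1 }
        END { if (idx != "") print idx }
    '
}

if command -v nvidia-smi >/dev/null 2>&1; then
    gpu=$(nvidia-smi --query-gpu=index,memory.free \
                     --format=csv,noheader,nounits | pick_least_loaded_gpu)
    if [ -n "$gpu" ]; then
        # Make CUDA number devices in the same order as nvidia-smi.
        export CUDA_DEVICE_ORDER=PCI_BUS_ID
        export CUDA_VISIBLE_DEVICES="$gpu"
        echo "Pinned to GPU ${gpu}"
    fi
fi
```

Both variables must be set before the process starts; changing them in a running process has no effect on devices CUDA has already enumerated.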
GPU_CREATE_GLOBAL_SET
Summary: Global set creation failed
The module failed to create a global set, most likely due to a CUDA memory allocation failure on the device.
Likely causes
- Insufficient GPU VRAM available at the time of the call
- A preceding `cudaMalloc` failure that has not been handled
- GPU device in an error state
Remediation steps
- Run `nvidia-smi` to inspect current VRAM usage.
- Kill or pause other GPU-bound processes to free VRAM.
- Check for a preceding GPU_CUDA_ALLOC_FAILED error in the same session – resolve that first.
- Reboot the host to reset the CUDA driver and GPU state if the device appears unhealthy.
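To see which processes hold VRAM before deciding what to stop, the per-process view of `nvidia-smi` can be sorted by memory use. This is a minimal sketch; the function name `rank_vram_consumers` is hypothetical.

```shell
# Sketch: read rows of "used_MiB, pid, process_name" and print them
# with the largest VRAM consumer first.
rank_vram_consumers() {
    sort -t, -k1,1nr
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=used_memory,pid,process_name \
               --format=csv,noheader,nounits | rank_vram_consumers
else
    echo "nvidia-smi not found; cannot list GPU processes"
fi
```

Only compute (CUDA) processes appear in this listing, so memory held by a display server may not be shown.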
Type/parse errors
GPU_PARSE_FAIL
Summary: Input parsing failed
The GPU module was unable to parse the provided input. The input did not conform to the expected format.
Likely causes
- Malformed parse tree or expression
- Unexpected parse tree structure
Remediation steps
- Validate the query syntax against the KDB-X GPU module documentation.
- Simplify the query to isolate which part is causing the parse failure.
- Ensure the input is within documented size and nesting limits.
GPU_PARSE_NOT_IMPLEMENTED
Summary: Input parsing not implemented
The GPU module recognised the input but parsing for this construct has not yet been implemented.
Likely causes
- Using a query construct that is planned but not yet supported by the GPU parser
- A syntax variant that the CPU parser accepts but the GPU parser does not yet handle
Remediation steps
- Check the KDB-X GPU module changelog for the feature's planned release version.
- Rewrite the query to avoid the unimplemented construct.
- Use the CPU fallback path in the interim.
- Contact KDB-X to request prioritisation or a workaround.
GPU_INVALID_STATEMENT
Summary: Statement rejected during parsing
The input was parseable but the resulting statement was determined to be semantically invalid. The statement structure violates the module's rules.
Likely causes
- Type mismatch between operands in an expression
- Use of a keyword in an unsupported context
- Missing required clauses or arguments in a statement
Remediation steps
- Review the exact statement flagged in the error output.
- Cross-reference the statement against valid syntax examples in the docs.
- Check operand types - ensure all types are compatible with the operation.
- Break complex statements into smaller parts to locate the invalid construct.
- Check if the same query works on the CPU with functional selects.
GPU_TYPE_UNSUPPORTED
Summary: Type not supported for operation
The data type of an operand is not supported for the requested GPU operation. The operation exists but does not handle this specific type.
Likely causes
- Using a data type that has not yet been implemented for this operation on the GPU
- Implicit type coercion producing an unexpected intermediate type
- Mixing types in an operation that requires homogeneous inputs
Remediation steps
- Check the KDB-X docs for supported types for the operation.
- Cast the data to a supported type before passing it to the operation.
- Use the CPU fallback path if a GPU-compatible type cast is not feasible.
GPU_INVALID_DOMAIN
Summary: Enumeration domain not found
The named domain for an enumeration could not be located.
Likely causes
- Referencing an enum domain that has not been defined
- Corrupted or incomplete enum metadata
Remediation steps
- Verify that the enumeration domain is correctly defined in your schema.
- Re-register or re-initialise the enum domain if it has been removed.
- Validate enum metadata integrity in the underlying data files.
Data and column errors
GPU_EXPECTED_FOREIGN
Summary: Expected a GPU foreign object
The module expected the input object to be a GPU foreign (a table or array resident on the GPU) but received a non-foreign object.
Likely causes
- Passing a CPU-resident table directly to a GPU operation without first moving it with `.gpu.to`
- A prior `.gpu.to` call that failed silently, leaving a CPU object in place
Remediation steps
- Ensure the table or array has been moved to the GPU with `.gpu.to` before passing it to GPU operations.
- Confirm the return value of `.gpu.to` – it should display as a composite of `foreign` objects.
- Check for errors earlier in the session that may have interrupted the data transfer.
GPU_WRONG_DEVICE
Summary: Object belongs to a different GPU device
The object being accessed was allocated on a different GPU device than the one currently active.
Likely causes
- Using a multi-GPU setup where data was moved to one device but the operation is running on another
- The active device changed between the `.gpu.to` call and the operation
Remediation steps
- Check the active device with `.gpu.gdev`.
- Verify which device the object was allocated on – it must match the active device.
- Use `.gpu.sdev` to set the correct active device before running the operation.
- If using multiple devices, ensure data and operations are consistently routed to the same device.
GPU_NOT_A_COLUMN
Summary: Symbol not found in column list
A symbol used in the query could not be matched to any column in the table.
Likely causes
- A column name is misspelled in the query
- Referencing a column that does not exist in the GPU table
- Case mismatch in the column name
Remediation steps
- Check the column names in the table with `cols` or `key .gpu.to table`.
- Verify the spelling and case of all column references in the query.
- Confirm the correct table is being queried.
GPU_COLUMN_OF_LISTS_UNSUPPORTED
Summary: Columns of lists are not supported in select
The query produced or referenced a column of lists, which the GPU select operation does not support.
Likely causes
- A `group by` select without an aggregate function, which produces list columns
- A table that already contains list-typed columns being passed to a GPU select
Remediation steps
- Add an aggregate function (for example, `sum`, `avg`, `max`) to any `group by` clause.
- Check the table schema for existing list-typed columns and flatten them before passing to the GPU.
- Use the CPU fallback path for queries that inherently produce list columns.
GPU_SYMBOL_SORTING_NOT_IMPLEMENTED
Summary: Sorting on symbols or character arrays is not supported
The query attempted to sort on a symbol or character array column, which is not currently supported by the GPU module.
Likely causes
- Using `xasc`, `xdesc`, or an `order by` clause on a symbol or character array column
- An implicit sort triggered by a keyed table operation on a symbol column
Remediation steps
- Convert the symbol column to an enum before sorting – for example, ``update sym:`sym$sym from table``.
- Perform the sort on the CPU and then move the result to the GPU.
- Check the KDB-X GPU module changelog for when symbol sorting support is planned.
Other errors
GPU_NOT_IMPLEMENTED
Summary: Feature not available
The requested operation or feature has not yet been implemented in this version of the GPU module.
Likely causes
- Using a query syntax or function that is planned but not yet released
- Calling an operation on a data type that does not yet have GPU support
Remediation steps
- Review the KDB-X GPU module changelog for the feature's planned release version.
- Check whether a CPU fallback is available for your operation.
- Rewrite the query to avoid the unimplemented construct.
- Contact KDB-X to request prioritisation or a workaround.
GPU_UNKNOWN
Summary: Unclassified error
An unexpected error occurred that does not match any known error category.
Likely causes
- A bug in the error reporting code
Remediation steps
- Capture the full stack trace.
- Check whether earlier API calls in the same session returned a more specific error code that may indicate the root cause.
- Attempt to reproduce with a minimal input to isolate the trigger.
- File a bug report with KDB-X support, including the trace and reproduction steps.
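Host-side context for a bug report can be gathered in one pass. The sketch below writes basic environment details to a temporary file that can be attached alongside the trace and reproduction steps. The function name `collect_gpu_diagnostics` and the file layout are hypothetical, not a format that KDB-X support prescribes.

```shell
# Sketch: write host, limit, and GPU details to the file named in $1.
collect_gpu_diagnostics() {
    {
        echo "== collected: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "== host"
        uname -a
        echo "== limits"
        ulimit -a
        echo "== gpu"
        if command -v nvidia-smi >/dev/null 2>&1; then
            nvidia-smi 2>&1 || echo "nvidia-smi exited with an error"
        else
            echo "nvidia-smi not available"
        fi
    } > "$1"
}

report=$(mktemp "${TMPDIR:-/tmp}/gpu-diag.XXXXXX")
collect_gpu_diagnostics "$report"
echo "Diagnostics written to ${report}"
```

Review the file before sending it, since hostnames and process details may be sensitive in some environments.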