Troubleshooting Errors

This guide helps you diagnose and resolve errors returned by the KDB-X GPU module. Each entry gives a summary, the likely causes where they are known, and step-by-step remediation.

How to use this guide

  1. Locate the error code in the Error index.
  2. Review the summary to confirm it matches your issue.
  3. Check the likely causes to narrow down the problem.
  4. Follow the remediation steps in order.
  5. If the error persists, escalate to KDB-X support with logs and a reproducible example.

Error index

| Category | Error code | Summary |
|---|---|---|
| Internal | GPU_INTERNAL_LOGIC_ERROR | Unexpected internal control flow |
| Internal | GPU_SELECT_DEADLOCK | Interpreter deadlock detected |
| Internal | GPU_LAUNCH_FAIL | GPU kernel launch failed |
| Internal | GPU_CALCULATE_SCRATCH_ERR | Scratch memory calculation failed |
| Memory | GPU_ARENA_MAP_FAILED | Host allocator creation failed |
| Memory | GPU_HOST_ALLOC_FAILED | Host memory allocation failed |
| Memory | GPU_CUDA_ALLOC_FAILED | GPU VRAM allocation failed |
| Memory | GPU_CREATE_GLOBAL_SET | Global set creation failed |
| Type/parse | GPU_PARSE_FAIL | Input parsing failed |
| Type/parse | GPU_PARSE_NOT_IMPLEMENTED | Input parsing not implemented |
| Type/parse | GPU_INVALID_STATEMENT | Statement rejected during parsing |
| Type/parse | GPU_TYPE_UNSUPPORTED | Type not supported for operation |
| Type/parse | GPU_INVALID_DOMAIN | Enumeration domain not found |
| Data and column | GPU_EXPECTED_FOREIGN | Expected a GPU foreign object |
| Data and column | GPU_WRONG_DEVICE | Object belongs to a different GPU device |
| Data and column | GPU_NOT_A_COLUMN | Symbol not found in column list |
| Data and column | GPU_COLUMN_OF_LISTS_UNSUPPORTED | Columns of lists not supported in select |
| Data and column | GPU_SYMBOL_SORTING_NOT_IMPLEMENTED | Sorting on symbols or character arrays not supported |
| Other | GPU_NOT_IMPLEMENTED | Feature not available |
| Other | GPU_UNKNOWN | Unclassified error |

Internal errors

GPU_INTERNAL_LOGIC_ERROR

Summary: Unexpected internal control flow

The GPU module reached a code path that should be unreachable under normal operation. This indicates a bug within the module itself rather than incorrect user input.

Remediation steps

  1. Note the exact query or operation that triggered the error.
  2. Check the KDB-X release notes for known issues in your current version.
  3. If possible, rerun the operation on the previous stable release to check whether the error is a regression.
  4. Report the issue to KDB-X support with a reproducible test case.

This indicates a system-level issue. Escalate to KDB-X support.


GPU_SELECT_DEADLOCK

Summary: Interpreter deadlock detected

The GPU interpreter has reached a deadlock state where two or more operations are waiting on each other and can no longer proceed. This is caused by a bug in the module's concurrency logic and should not occur under normal operation.

Remediation steps

  1. Terminate the GPU process.
  2. Upgrade to the latest module version, in which the bug may already be fixed.
  3. Report the error to KDB-X support; it should not occur under normal operation.

This indicates a system-level issue. Escalate to KDB-X support.


GPU_LAUNCH_FAIL

Summary: GPU kernel launch failed

An attempt to launch a CUDA kernel on the GPU device failed. The kernel could not be dispatched for execution.

Likely causes

  • CUDA driver or device in an error state
  • Invalid internal kernel configuration

Remediation steps

  1. Check nvidia-smi for GPU health and error state.
  2. Run the failing workload under compute-sanitizer to detect device-side errors.
  3. Reboot the host to reset the CUDA driver and GPU state.
  4. Verify the CUDA toolkit and driver versions are compatible.
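
The first and last steps above can be sketched as a short shell check. This is a minimal sketch that assumes a Linux host with nvidia-smi on the PATH; the query fields are standard nvidia-smi options, while the function name gpu_health_report is purely illustrative.

```shell
#!/bin/sh
# Sketch: report GPU health and driver version before retrying a failed launch.
# Assumes nvidia-smi is on PATH; gpu_health_report is a hypothetical helper name.

gpu_health_report() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: is the NVIDIA driver installed?"
    return 1
  fi
  # One CSV row per GPU: index, model, driver version, temperature, performance state.
  nvidia-smi --query-gpu=index,name,driver_version,temperature.gpu,pstate \
             --format=csv,noheader || return 1
  # The banner of plain nvidia-smi output also shows the highest CUDA version
  # that the installed driver supports.
  nvidia-smi | grep -i "cuda version" \
    || echo "CUDA version line not found in nvidia-smi banner"
}

gpu_health_report || echo "GPU health check did not complete"
```

If nvidia-smi itself fails or hangs, the device or driver is probably in an error state. On Linux, the NVIDIA driver also writes Xid messages to the kernel log, so searching the output of dmesg for "xid" may give further detail (this can require elevated privileges).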

GPU_CALCULATE_SCRATCH_ERR

Summary: Scratch memory calculation failed

The module failed to determine how much scratch (temporary working) memory a GPU kernel requires before launching it. This is a pre-launch validation step.

Likely causes

  • A bug in the scratch-size calculation logic for a specific kernel
  • An unsupported parameter combination passed to the calculation routine

Remediation steps

  1. Note the specific operation that triggered this error.
  2. Check for a known issue in the current module version's changelog.
  3. Report the error to KDB-X support with full context; it should not occur under normal operation.

This indicates a system-level issue. Escalate to KDB-X support.


Memory errors

GPU_ARENA_MAP_FAILED

Summary: Host allocator creation failed

The module failed to create the memory arena (allocator) used to manage intermediate data structures on the host (CPU-side memory). This is a prerequisite for any subsequent GPU operation.

Likely causes

  • Insufficient host RAM available at the time of the call
  • Memory fragmentation preventing a large contiguous allocation
  • OS-level memory limits (for example, ulimit) being exceeded

Remediation steps

  1. Check host memory usage with free -h or a system monitor.
  2. Terminate other memory-heavy processes to free up RAM.
  3. Check ulimit -v and increase virtual memory limits if required.
  4. Reduce query complexity or data volume to lower peak memory demand.
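
The memory and limit checks above can be combined in a few lines of shell. This is a sketch that assumes a Linux host where /proc/meminfo is readable; mem_available_kib is a hypothetical helper name, not part of any KDB-X tooling.

```shell
#!/bin/sh
# Sketch: show available host memory and the virtual-memory limit for this shell.
# Assumes Linux (/proc/meminfo); mem_available_kib is a hypothetical helper name.

mem_available_kib() {
  # Read meminfo-formatted text on stdin and print the MemAvailable value in KiB.
  awk '$1 == "MemAvailable:" { print $2 }'
}

if [ -r /proc/meminfo ]; then
  echo "MemAvailable (KiB): $(mem_available_kib < /proc/meminfo)"
fi

# "unlimited" means no cap is set; a number is the cap in KiB.
echo "Virtual memory limit (ulimit -v): $(ulimit -v)"

# To lift the cap for processes started from this shell (only possible up to
# the hard limit, which "ulimit -H -v" prints):
#   ulimit -v unlimited
```

The limit applies to the shell and to the processes it starts, so run the check from the same shell or service environment that launches the GPU workload.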

GPU_HOST_ALLOC_FAILED

Summary: Host memory allocation failed

An attempt to allocate an intermediate data structure on the host (CPU RAM) failed. The allocator was created successfully, but the specific allocation request could not be fulfilled.

Likely causes

  • System RAM is fully exhausted
  • The requested allocation size exceeds available contiguous memory
  • A memory leak in a prior operation has consumed available RAM

Remediation steps

  1. Monitor memory usage during the operation with top or htop.
  2. Reduce the size of the input dataset or split it into smaller batches.
  3. Restart the process to clear any leaked allocations.
  4. Add more physical RAM or increase available swap space.
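
As a non-interactive alternative to top or htop, the monitoring step can be scripted. The sketch below samples a process's resident memory with ps and reports the peak; sample_rss and max_of are hypothetical helper names, and the sampling interval and count are arbitrary.

```shell
#!/bin/sh
# Sketch: sample a process's resident memory while a query runs, then report the peak.
# sample_rss and max_of are hypothetical helper names; ps must be available.

sample_rss() {
  # $1 = PID, $2 = seconds between samples, $3 = number of samples.
  pid=$1; interval=$2; remaining=$3
  while [ "$remaining" -gt 0 ]; do
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    [ -n "$rss" ] || break            # the process has exited
    echo "$rss"                       # resident set size in KiB
    remaining=$((remaining - 1))
    if [ "$remaining" -gt 0 ]; then sleep "$interval"; fi
  done
}

max_of() {
  # Print the largest integer read from stdin (one value per line).
  awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR) print max }'
}

# Example: watch this shell itself for two samples, one second apart.
# Replace $$ with the PID of the process under investigation.
echo "Peak RSS (KiB): $(sample_rss $$ 1 2 | max_of)"
```

A peak that keeps growing across repeated runs of the same operation would be consistent with the leak described in the likely causes.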

GPU_CUDA_ALLOC_FAILED

Summary: GPU VRAM allocation failed

A CUDA memory allocation call (cudaMalloc or cudaMallocAsync) returned cudaErrorMemoryAllocation, meaning the GPU does not have enough free VRAM to satisfy the request.

Likely causes

  • GPU VRAM is exhausted by the current workload
  • Other processes are consuming VRAM (for example, display driver or other CUDA applications)
  • Memory fragmentation on the GPU leaving no contiguous free block of sufficient size
  • Requesting more memory than the GPU physically has

Remediation steps

  1. Run nvidia-smi to inspect current VRAM usage.
  2. Kill or pause other GPU-bound processes to free VRAM.
  3. Reduce dataset size or enable chunked/streaming processing.
  4. Switch to a GPU with more VRAM if the workload fundamentally requires it.
  5. Set CUDA_VISIBLE_DEVICES to isolate the process to a less-loaded GPU.

This is the most common error in production GPU workloads. Monitor VRAM headroom proactively.
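
The VRAM inspection and device isolation steps can be sketched as follows. The nvidia-smi query options are standard; most_free_gpu is a hypothetical helper, and choosing the device with the most free memory is only one possible policy.

```shell
#!/bin/sh
# Sketch: choose the GPU with the most free VRAM and expose only that device.
# most_free_gpu is a hypothetical helper name.

most_free_gpu() {
  # stdin: CSV rows "index, memory.free" in MiB; prints the index with the most free memory.
  awk -F', *' 'NR == 1 || $2 + 0 > best { best = $2 + 0; idx = $1 } END { if (NR) print idx }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Per-GPU usage, for the record.
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
  gpu=$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | most_free_gpu)
  if [ -n "$gpu" ]; then export CUDA_VISIBLE_DEVICES="$gpu"; fi
fi
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
```

CUDA_VISIBLE_DEVICES must be set before the GPU process starts, because CUDA reads it at initialisation. Inside that process the selected GPU is then enumerated as device 0.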


GPU_CREATE_GLOBAL_SET

Summary: Global set creation failed

The module failed to create a global set, most likely due to a CUDA memory allocation failure on the device.

Likely causes

  • Insufficient GPU VRAM available at the time of the call
  • A preceding cudaMalloc failure that has not been handled
  • GPU device in an error state

Remediation steps

  1. Run nvidia-smi to inspect current VRAM usage.
  2. Kill or pause other GPU-bound processes to free VRAM.
  3. Check for a preceding GPU_CUDA_ALLOC_FAILED error in the same session – resolve that first.
  4. Reboot the host to reset the CUDA driver and GPU state if the device appears unhealthy.
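
Before killing or pausing anything, it helps to see which processes actually hold VRAM. The sketch below lists compute processes sorted by memory use; it relies on the standard --query-compute-apps option of nvidia-smi, and top_vram_users is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the processes holding VRAM, largest first, before deciding what to stop.
# top_vram_users is a hypothetical helper name.

top_vram_users() {
  # stdin: CSV rows "pid, process_name, used_memory" (MiB);
  # sorts numerically on the third field, descending.
  sort -t, -k3,3nr
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory \
             --format=csv,noheader,nounits | top_vram_users
else
  echo "nvidia-smi not found"
fi
```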

Type/parse errors

GPU_PARSE_FAIL

Summary: Input parsing failed

The GPU module was unable to parse the provided input. The input did not conform to the expected format.

Likely causes

  • Malformed parse tree or expression
  • Unexpected parse tree structure

Remediation steps

  1. Validate the query syntax against the KDB-X GPU module documentation.
  2. Simplify the query to isolate which part is causing the parse failure.
  3. Ensure the input is within documented size and nesting limits.

GPU_PARSE_NOT_IMPLEMENTED

Summary: Input parsing not implemented

The GPU module recognised the input but parsing for this construct has not yet been implemented.

Likely causes

  • Using a query construct that is planned but not yet supported by the GPU parser
  • A syntax variant that the CPU parser accepts but the GPU parser does not yet handle

Remediation steps

  1. Check the KDB-X GPU module changelog for the feature's planned release version.
  2. Rewrite the query to avoid the unimplemented construct.
  3. Use the CPU fallback path in the interim.
  4. Contact KDB-X support to request prioritisation or a workaround.

GPU_INVALID_STATEMENT

Summary: Statement rejected during parsing

The input parsed successfully, but the resulting statement is semantically invalid because its structure violates the module's rules.

Likely causes

  • Type mismatch between operands in an expression
  • Use of a keyword in an unsupported context
  • Missing required clauses or arguments in a statement

Remediation steps

  1. Review the exact statement flagged in the error output.
  2. Cross-reference the statement against valid syntax examples in the docs.
  3. Check operand types and ensure they are all compatible with the operation.
  4. Break complex statements into smaller parts to locate the invalid construct.
  5. Check whether the same query works on the CPU when written as a functional select.

GPU_TYPE_UNSUPPORTED

Summary: Type not supported for operation

The data type of an operand is not supported for the requested GPU operation. The operation exists but does not handle this specific type.

Likely causes

  • Using a data type that has not yet been implemented for this operation on the GPU
  • Implicit type coercion producing an unexpected intermediate type
  • Mixing types in an operation that requires homogeneous inputs

Remediation steps

  1. Check the KDB-X docs for supported types for the operation.
  2. Cast the data to a supported type before passing it to the operation.
  3. Use the CPU fallback path if a GPU-compatible type cast is not feasible.

GPU_INVALID_DOMAIN

Summary: Enumeration domain not found

The named domain for an enumeration could not be located.

Likely causes

  • Referencing an enum domain that has not been defined
  • Corrupted or incomplete enum metadata

Remediation steps

  1. Verify that the enumeration domain is correctly defined in your schema.
  2. Re-register or re-initialise the enum domain if it has been removed.
  3. Validate enum metadata integrity in the underlying data files.

Data and column errors

GPU_EXPECTED_FOREIGN

Summary: Expected a GPU foreign object

The module expected the input object to be a GPU foreign (a table or array resident on the GPU) but received a non-foreign object.

Likely causes

  • Passing a CPU-resident table directly to a GPU operation without first moving it with .gpu.to
  • A prior .gpu.to call that failed silently, leaving a CPU object in place

Remediation steps

  1. Ensure the table or array has been moved to the GPU with .gpu.to before passing it to GPU operations.
  2. Confirm the return value of .gpu.to – it should display as a composite of foreign objects.
  3. Check for errors earlier in the session that may have interrupted the data transfer.

GPU_WRONG_DEVICE

Summary: Object belongs to a different GPU device

The object being accessed was allocated on a different GPU device than the one currently active.

Likely causes

  • Using a multi-GPU setup where data was moved to one device but the operation is running on another
  • The active device changed between the .gpu.to call and the operation

Remediation steps

  1. Check the active device with .gpu.gdev.
  2. Verify which device the object was allocated on – it must match the active device.
  3. Use .gpu.sdev to set the correct active device before running the operation.
  4. If using multiple devices, ensure data and operations are consistently routed to the same device.

GPU_NOT_A_COLUMN

Summary: Symbol not found in column list

A symbol used in the query could not be matched to any column in the table.

Likely causes

  • A column name is misspelled in the query
  • Referencing a column that does not exist in the GPU table
  • Case mismatch in the column name

Remediation steps

  1. Check the column names in the table with cols or key .gpu.to table.
  2. Verify the spelling and case of all column references in the query.
  3. Confirm the correct table is being queried.

GPU_COLUMN_OF_LISTS_UNSUPPORTED

Summary: Columns of lists are not supported in select

The query produced or referenced a column of lists, which the GPU select operation does not support.

Likely causes

  • A group by select without an aggregate function, which produces list columns
  • A table that already contains list-typed columns being passed to a GPU select

Remediation steps

  1. Add an aggregate function (for example, sum, avg, max) to any group by clause.
  2. Check the table schema for existing list-typed columns and flatten them before passing to the GPU.
  3. Use the CPU fallback path for queries that inherently produce list columns.

GPU_SYMBOL_SORTING_NOT_IMPLEMENTED

Summary: Sorting on symbols or character arrays is not supported

The query attempted to sort on a symbol or character array column, which is not currently supported by the GPU module.

Likely causes

  • Using xasc, xdesc, or an order by clause on a symbol or character array column
  • An implicit sort triggered by a keyed table operation on a symbol column

Remediation steps

  1. Convert the symbol column to an enum before sorting – for example, update sym:`sym$sym from table.
  2. Perform the sort on the CPU and then move the result to the GPU.
  3. Check the KDB-X GPU module changelog for when symbol sorting support is planned.

Other errors

GPU_NOT_IMPLEMENTED

Summary: Feature not available

The requested operation or feature has not yet been implemented in this version of the GPU module.

Likely causes

  • Using a query syntax or function that is planned but not yet released
  • Calling an operation on a data type that does not yet have GPU support

Remediation steps

  1. Review the KDB-X GPU module changelog for the feature's planned release version.
  2. Check whether a CPU fallback is available for your operation.
  3. Rewrite the query to avoid the unimplemented construct.
  4. Contact KDB-X support to request prioritisation or a workaround.

GPU_UNKNOWN

Summary: Unclassified error

An unexpected error occurred that does not match any known error category.

Likely causes

  • A bug in the error reporting code

Remediation steps

  1. Capture the full stack trace.
  2. Check whether earlier API calls in the same session returned a more specific error code that may indicate the root cause.
  3. Attempt to reproduce with a minimal input to isolate the trigger.
  4. File a bug report with KDB-X support, including the trace and reproduction steps.
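
Several entries in this guide end with an escalation to KDB-X support, which is easier to act on when host and GPU state are captured together with the error. The sketch below gathers that state into a single text file; collect_diagnostics and the output file name are hypothetical, and the set of commands is a suggestion rather than a list required by KDB-X support.

```shell
#!/bin/sh
# Sketch: gather host and GPU state into one text file to attach to a support ticket.
# collect_diagnostics and the default file name are hypothetical.

collect_diagnostics() {
  out=${1:-gpu-diagnostics.txt}
  {
    echo "== date =="; date
    echo "== uname =="; uname -a
    echo "== ulimit -a =="; ulimit -a
    echo "== memory =="
    if command -v free >/dev/null 2>&1; then free -h; else echo "free not available"; fi
    echo "== nvidia-smi =="
    if command -v nvidia-smi >/dev/null 2>&1; then nvidia-smi; else echo "nvidia-smi not available"; fi
  } > "$out" 2>&1
  echo "$out"    # print the path of the report that was written
}

collect_diagnostics "${TMPDIR:-/tmp}/gpu-diagnostics.txt"
```

Attach the resulting file alongside the error code, the query that triggered it, and the minimal reproduction.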