Troubleshooting Errors¶

This guide helps you diagnose and resolve errors returned by the KDB-X GPU module. Each error includes its likely causes and step-by-step remediation.

How to use this guide¶

Locate the error code in the Error index.
Review the summary to confirm it matches your issue.
Check the likely causes to narrow down the problem.
Follow the remediation steps in order.
If unresolved, escalate with logs and a reproducible example.

Error index¶

Category	Error code	Summary
Internal	GPU_INTERNAL_LOGIC_ERROR	Unexpected internal control flow
	GPU_SELECT_DEADLOCK	Interpreter deadlock detected
	GPU_LAUNCH_FAIL	GPU kernel launch failed
	GPU_CALCULATE_SCRATCH_ERR	Scratch memory calculation failed
	GPU_Q_ERROR	A q C API call signalled an error
Memory	GPU_ARENA_MAP_FAILED	Host allocator creation failed
	GPU_HOST_ALLOC_FAILED	Host memory allocation failed
	GPU_CUDA_ALLOC_FAILED	GPU VRAM allocation failed
	GPU_CREATE_GLOBAL_SET	Global set creation failed
Type/parse	GPU_PARSE_FAIL	Input parsing failed
	GPU_PARSE_NOT_IMPLEMENTED	Input parsing not implemented
	GPU_INVALID_STATEMENT	Statement rejected during parsing
	GPU_TYPE_UNSUPPORTED	Type not supported for operation
	GPU_INVALID_DOMAIN	Enumeration domain not found
Data and column	GPU_SYMBOL_SORTING_NOT_IMPLEMENTED	Sorting on symbols or character arrays is not supported
	GPU_EXPECTED_FOREIGN	Expected a GPU foreign object
	GPU_WRONG_DEVICE	Object belongs to a different GPU device
	GPU_NOT_A_COLUMN	Symbol not found in column list
	GPU_INVALID_VALUE	Input value invalid for the operation
Other	GPU_UNKNOWN	Unclassified error
	GPU_NOT_IMPLEMENTED	Feature not available

Internal errors¶

`GPU_INTERNAL_LOGIC_ERROR`¶

Summary: Unexpected internal control flow

The GPU module reached a code path that should be unreachable under normal operation. This indicates a bug within the module itself rather than incorrect user input.

Remediation steps

Note the exact query or operation that triggered the error.
Downgrade to the previous stable release to check whether the error is a regression.
Report the issue to KDB-X support with a reproducible test case.

This indicates a system-level issue. Escalate to KDB-X support.

`GPU_SELECT_DEADLOCK`¶

Summary: Interpreter deadlock detected

The GPU interpreter has reached a deadlock state where two or more operations are waiting on each other and can no longer proceed. This is caused by a bug in the module's concurrency logic and should not occur under normal operation.

Remediation steps

Terminate the GPU process.
Upgrade to the latest module version, in which the bug may already be fixed.
Report to KDB-X support - this error should not occur.

This indicates a system-level issue. Escalate to KDB-X support.

`GPU_LAUNCH_FAIL`¶

Summary: GPU kernel launch failed

An attempt to launch a CUDA kernel on the GPU device failed. The kernel could not be dispatched for execution.

Likely causes

CUDA driver or device in an error state
Invalid internal kernel configuration

Remediation steps

Check nvidia-smi for GPU health and error state.
Run compute-sanitizer to detect device errors.
Reboot the host to reset the CUDA driver and GPU state.
Verify the CUDA toolkit and driver versions are compatible.
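
The health and version checks above can be scripted. The following is a minimal sketch rather than an official KDB-X tool: `version_ge` is a hypothetical helper, and `MIN_DRIVER` is a placeholder that should be replaced with the minimum driver version documented for your CUDA toolkit release.

```shell
# Hypothetical helper: succeed if dotted version $1 >= $2,
# comparing each component numerically.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = x[i] + 0; yi = y[i] + 0   # missing components count as 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Placeholder only: look up the real minimum for your CUDA toolkit release.
MIN_DRIVER="${MIN_DRIVER:-535.0}"

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU: index, name, driver version, temperature.
  nvidia-smi --query-gpu=index,name,driver_version,temperature.gpu \
             --format=csv,noheader \
    || echo "nvidia-smi failed: the driver may be in an error state"
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  if [ -n "$driver" ] && ! version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver is older than the assumed minimum $MIN_DRIVER"
  fi
else
  echo "nvidia-smi not found on PATH"
fi
```

If `nvidia-smi` itself fails or hangs, that already points to a driver or device problem, and the reboot step is usually the next thing to try.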

`GPU_CALCULATE_SCRATCH_ERR`¶

Summary: Scratch memory calculation failed

The module failed to determine how much scratch (temporary working) memory a GPU kernel requires before launching it. This is a pre-launch validation step.

Likely causes

A bug in the scratch-size calculation logic for a specific kernel
An unsupported parameter combination passed to the calculation routine

Remediation steps

Note the specific operation that triggered this error.
Report to KDB-X support with full context - this error should not occur.

This indicates a system-level issue. Escalate to KDB-X support.

`GPU_Q_ERROR`¶

Summary: A q C API call signalled an error

A call into the q C API (for example, building the result table) signalled an error. The GPU stages of the operation completed, but the result could not be handed back to q.

Likely causes

q workspace exhaustion (wsfull) while materialising the result
A result structure that q rejects, indicating a bug in the module

Remediation steps

Check q's workspace usage (for example, with .Q.w[]) against the -w limit, and raise the limit if the result is large.
Reduce the result size (fewer rows or columns) and retry.
If memory is not the issue, report to KDB-X support with a reproducible example.
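
To confirm which workspace limit a running q process was started with, its command line can be inspected from the shell. This is a sketch under assumptions: `q_w_limit` is a hypothetical helper and `Q_PID` is an assumed variable holding the process ID of the q session. The value that follows `-w` is the workspace limit in MB.

```shell
# Hypothetical helper: print the value following -w in a q command line,
# or "unset" when no -w option is present.
q_w_limit() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i < NF; i++)
      if ($i == "-w") { print $(i + 1); found = 1; exit }
  }
  END { if (!found) print "unset" }'
}

# Q_PID is an assumed variable; set it to the PID of the q process to inspect.
if [ -n "${Q_PID:-}" ] && command -v ps >/dev/null 2>&1; then
  q_w_limit "$(ps -o args= -p "$Q_PID")"
fi
```

A result of `unset` means the process was started without `-w`, so q is not enforcing a workspace limit and the failure is more likely to come from the host running out of memory.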

Memory errors¶

`GPU_ARENA_MAP_FAILED`¶

Summary: Host allocator creation failed

The module failed to create the memory arena (allocator) used to manage intermediate data structures on the host (CPU-side memory). This is a prerequisite for any subsequent GPU operation.

Likely causes

Insufficient host RAM available at the time of the call
Memory fragmentation preventing a large contiguous allocation
OS-level memory limits (for example, ulimit) being exceeded

Remediation steps

Check host memory usage with free -h or a system monitor.
Terminate other memory-heavy processes to free up RAM.
Check ulimit -v and increase virtual memory limits if required.
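
The three checks above can be run together. The following sketch assumes a Linux host; `mem_available_mib` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the MemAvailable figure converted from kB to MiB.
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }'
}

# Human-readable overview of RAM and swap.
if command -v free >/dev/null 2>&1; then
  free -h || echo "free -h is not supported on this system"
fi

# Memory the kernel estimates is available for new allocations.
if [ -r /proc/meminfo ]; then
  echo "available_mib=$(mem_available_mib < /proc/meminfo)"
fi

# Per-process virtual memory cap in KiB; "unlimited" means no cap is set.
echo "ulimit -v: $(ulimit -v)"

# To lift the soft cap for this shell and the processes it starts
# (only possible up to the hard limit):
#   ulimit -v unlimited
```

Run this in the same shell, and as the same user, that launches the q process, because `ulimit` values are per shell session.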

`GPU_HOST_ALLOC_FAILED`¶

Summary: Host memory allocation failed

An attempt to allocate an intermediate data structure on the host (CPU RAM) failed. The allocator was created successfully, but the specific allocation request could not be fulfilled.

Likely causes

System RAM is fully exhausted
The requested allocation size exceeds available contiguous memory

Remediation steps

Monitor memory usage during the operation with top or htop.
Reduce the size of the input dataset or split it into smaller batches.
Restart the process to clear any leaked allocations.
Add more physical RAM or increase available swap space.
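
As an alternative to watching top interactively, resident memory can be sampled from /proc on Linux. This is a sketch under assumptions: `rss_from_status` is a hypothetical helper and `Q_PID` is an assumed variable holding the process ID of the q session. `VmRSS` is the current resident set size and `VmHWM` is its peak.

```shell
# Hypothetical helper: read /proc/<pid>/status-style text on stdin and print
# current (VmRSS) and peak (VmHWM) resident memory in MiB.
rss_from_status() {
  awk '/^VmRSS:/ { rss = $2 }
       /^VmHWM:/ { hwm = $2 }
       END { printf "rss_mib=%d peak_mib=%d\n", rss / 1024, hwm / 1024 }'
}

# Q_PID is an assumed variable; set it to the PID of the q process.
# Samples every 5 seconds until the process exits.
if [ -n "${Q_PID:-}" ]; then
  while [ -r "/proc/$Q_PID/status" ]; do
    rss_from_status < "/proc/$Q_PID/status"
    sleep 5
  done
fi
```

A resident size that keeps growing across repeated runs of the same operation suggests leaked allocations, in which case restarting the process is the relevant step.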

`GPU_CUDA_ALLOC_FAILED`¶

Summary: GPU VRAM allocation failed

A CUDA memory allocation call (cudaMalloc or cudaMallocAsync) returned cudaErrorMemoryAllocation, meaning the GPU does not have enough free VRAM to satisfy the request.

Likely causes

GPU VRAM is exhausted by the current workload
Other processes are consuming VRAM (for example, display driver or other CUDA applications)
Memory fragmentation on the GPU leaving no contiguous free block of sufficient size
Requesting more memory than the GPU physically has

Remediation steps

Run nvidia-smi to inspect current VRAM usage.
Kill or pause other GPU-bound processes to free VRAM.
Reduce dataset size.
Switch to a GPU with more VRAM if the workload fundamentally requires it.
Set CUDA_VISIBLE_DEVICES to isolate the process to a less-loaded GPU.

This is the most common error in production GPU workloads. Monitor VRAM headroom proactively.
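
VRAM headroom can be checked with a short script. The sketch below is illustrative only: `vram_report` and `least_loaded_gpu` are hypothetical helpers, and the 90 percent threshold is an arbitrary example value. Both helpers read the CSV rows that `nvidia-smi` prints with `--format=csv,noheader,nounits`.

```shell
# Hypothetical helper: read "index, used_MiB, total_MiB" rows on stdin and
# flag any GPU whose VRAM usage is at or above the given percentage.
vram_report() {
  awk -F', *' -v warn="${1:-90}" '
    NF >= 3 && ($3 + 0) > 0 {
      pct = int(($2 * 100) / $3)
      printf "gpu=%s used_mib=%d total_mib=%d used_pct=%d headroom=%s\n",
             $1, $2, $3, pct, (pct >= warn ? "LOW" : "OK")
    }'
}

# Hypothetical helper: read "index, used_MiB" rows on stdin and print the
# index of the GPU with the least VRAM in use.
least_loaded_gpu() {
  awk -F', *' '
    NF >= 2 && (best == "" || ($2 + 0) < min) { min = $2 + 0; best = $1 }
    END { if (best != "") print best }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.used,memory.total \
             --format=csv,noheader,nounits | vram_report 90
  gpu="$(nvidia-smi --query-gpu=index,memory.used \
                    --format=csv,noheader,nounits | least_loaded_gpu)"
  # Expose only that device to processes started from this shell.
  if [ -n "$gpu" ]; then
    export CUDA_VISIBLE_DEVICES="$gpu"
  fi
fi
```

When `CUDA_VISIBLE_DEVICES` lists a single device, CUDA applications started from that shell see it as device 0, whatever its physical index.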

`GPU_CREATE_GLOBAL_SET`¶

Summary: Global set creation failed

The module failed to create a global set, most likely due to a CUDA memory allocation failure on the device.

Likely causes

Insufficient GPU VRAM available at the time of the call
A preceding cudaMalloc failure that has not been handled
GPU device in an error state

Remediation steps

Run nvidia-smi to inspect current VRAM usage.
Kill or pause other GPU-bound processes to free VRAM.
Check for a preceding GPU_CUDA_ALLOC_FAILED error in the same session - resolve that first.
Reboot the host to reset the CUDA driver and GPU state if the device appears unhealthy.
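
To decide which GPU-bound processes to stop, it helps to list them by VRAM usage. A minimal sketch, assuming `nvidia-smi` is installed; `top_vram_procs` is a hypothetical helper that sorts the rows `nvidia-smi` reports for compute processes.

```shell
# Hypothetical helper: read "pid, process_name, used_MiB" rows on stdin and
# print the N largest VRAM consumers (default 5), largest first.
top_vram_procs() {
  sort -t, -k3,3nr | head -n "${1:-5}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory \
             --format=csv,noheader,nounits | top_vram_procs 5
fi
```

The first column is the process ID, which can be passed to `kill` once you have confirmed that the process is safe to stop.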

Type/parse errors¶

`GPU_PARSE_FAIL`¶

Summary: Input parsing failed

The GPU module was unable to parse the provided input. The input did not conform to the expected format.

Likely causes

Malformed parse tree or expression
Unexpected parse tree structure

Remediation steps

Validate the query syntax against the KDB-X GPU module documentation.
Simplify the query to isolate which part is causing the parse failure.

`GPU_PARSE_NOT_IMPLEMENTED`¶

Summary: Input parsing not implemented

The GPU module recognised the input but parsing for this construct has not yet been implemented.

Likely causes

Using a query construct that is planned but not yet supported by the GPU parser
A syntax variant that the CPU parser accepts but the GPU parser does not yet handle

Remediation steps

Rewrite the query to avoid the unimplemented construct.
Use the CPU fallback path in the interim.
Contact KDB-X support to request prioritisation or a workaround.

`GPU_INVALID_STATEMENT`¶

Summary: Statement rejected during parsing

The input was parseable but the resulting statement was determined to be semantically invalid. The statement structure violates the module's rules.

Likely causes

Type mismatch between operands in an expression
Use of a keyword in an unsupported context
Missing required clauses or arguments in a statement

Remediation steps

Break complex statements into smaller parts to locate the invalid construct.
Check operand types - ensure all types are compatible with the operation.
Check if the same query works on the CPU with functional selects.

`GPU_TYPE_UNSUPPORTED`¶

Summary: Type not supported for operation

The data type of an operand is not supported for the requested GPU operation. The operation exists but does not handle this specific type.

Likely causes

Using a data type that has not yet been implemented for this operation on the GPU
Mixing types in an operation that requires homogeneous inputs

Remediation steps

Check the KDB-X docs for supported types for the operation.
Cast the data to a supported type before passing it to the operation.

`GPU_INVALID_DOMAIN`¶

Summary: Enumeration domain not found

The named domain for an enumeration could not be located.

Likely causes

Referencing an enum domain that has not been defined
Corrupted or incomplete enum metadata

Remediation steps

Verify that the enumeration domain is correctly defined.
Re-register or re-initialise the enum domain if it has been removed.
Validate enum metadata integrity in the underlying data files.

Data and column errors¶

`GPU_SYMBOL_SORTING_NOT_IMPLEMENTED`¶

Summary: Sorting on symbols or character arrays is not supported

The query attempted to sort on a symbol or character array column, which is not currently supported by the GPU module.

Likely causes

Using xasc, xdesc, or an order by clause on a symbol or character array column
An implicit sort triggered by a keyed table operation on a symbol column

Remediation steps

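Convert the symbol column to an enum before sorting, for example: update sym:`sym$sym from table. This cast assumes that an enumeration domain named sym is already defined and contains every value in the column.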
Perform the sort on the CPU and then move the result to the GPU.

`GPU_EXPECTED_FOREIGN`¶

Summary: Expected a GPU foreign object

The module expected the input object to be a GPU foreign (a table or array resident on the GPU) but received a non-foreign object.

Likely causes

Passing a CPU-resident table directly to a GPU operation without first moving it with .gpu.to
A prior .gpu.to call that failed silently, leaving a CPU object in place

Remediation steps

Ensure the table or array has been moved to the GPU with .gpu.to before passing it to GPU operations.
Confirm the return value of .gpu.to - it should display as a composite of foreign objects.

`GPU_WRONG_DEVICE`¶

Summary: Object belongs to a different GPU device

The object being accessed was allocated on a different GPU device than the one currently active.

Likely causes

Using a multi-GPU setup where data was moved to one device but the operation is running on another
The active device changed between the .gpu.to call and the operation

Remediation steps

Check the active device with .gpu.getDev.
Verify which device the object was allocated on - it must match the active device.
Use .gpu.setDev to set the correct active device before running the operation.
If using multiple devices, ensure data and operations are consistently routed to the same device.

`GPU_NOT_A_COLUMN`¶

Summary: Symbol not found in column list

A symbol used in the query could not be matched to any column in the table.

Likely causes

A column name is misspelled in the query
Referencing a column that does not exist in the GPU table
Case mismatch in the column name

Remediation steps

Check the column names in the table with cols or key .gpu.to table.
Verify the spelling and case of all column references in the query.
Confirm the correct table is being queried.

`GPU_INVALID_VALUE`¶

Summary: Input value invalid for the operation

An input value was rejected by the operation. The value's type may be supported, but the specific value could not be processed.

Likely causes

An argument outside the range or form the operation accepts
A failure in q when the operation is applied to a host-resident column

Remediation steps

Review the arguments passed to the operation and confirm they match the documented form.
Simplify the call to isolate which argument is being rejected.
Check whether the same operation succeeds on the CPU - if it fails there too, correct the input data.

Other errors¶

`GPU_UNKNOWN`¶

Summary: Unclassified error

An unexpected error occurred that does not match any known error category.

Likely causes

A bug in the error reporting code

Remediation steps

Capture the full stack trace.
Attempt to reproduce with a minimal input to isolate the trigger.
File a bug report with KDB-X support, including the trace and reproduction steps.
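
When filing the report, a snapshot of the host and GPU state gives support useful context alongside the trace. The following is a sketch of one way to gather it, not an official KDB-X support tool: `collect_gpu_diagnostics` and the file names it writes are hypothetical.

```shell
# Hypothetical helper: write basic host and GPU diagnostics into a directory
# (default: ./gpu-diagnostics) and print the directory name.
collect_gpu_diagnostics() {
  diag_dir="${1:-gpu-diagnostics}"
  mkdir -p "$diag_dir" || return 1

  # Timestamp and kernel/host details.
  { date; uname -a; } > "$diag_dir/system.txt" 2>&1

  # Host memory state (Linux only).
  if [ -r /proc/meminfo ]; then
    cat /proc/meminfo > "$diag_dir/meminfo.txt"
  fi

  # Resource limits of the current shell.
  ulimit -a > "$diag_dir/ulimit.txt" 2>&1

  # Full GPU and driver report, or a note that the tool is missing.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -q > "$diag_dir/nvidia-smi.txt" 2>&1 || true
  else
    echo "nvidia-smi not found on PATH" > "$diag_dir/nvidia-smi.txt"
  fi

  echo "$diag_dir"
}

# Example call (left commented out so that sourcing this file has no side effects):
#   collect_gpu_diagnostics ./gpu-diagnostics
```

Attach the resulting directory to the report together with the stack trace and the minimal reproduction.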

`GPU_NOT_IMPLEMENTED`¶

Summary: Feature not available

The requested operation or feature has not yet been implemented in this version of the GPU module.

Likely causes

Using a query syntax or function that is planned but not yet released
Calling an operation on a data type that does not yet have GPU support

Remediation steps

Rewrite the query to avoid the unimplemented construct.
Contact KDB-X support to request prioritisation or a workaround.
