GPU reference card¶

This page covers the KDB-X GPU module APIs, including inputs, outputs, and examples for each function.

Keywords¶

aj	append	asc	attr	bin	cntDev	count	drop	from	gather
getDev	getMemRelThres	group	iasc	memDev	meta	nvtx	profiler	select	setDev
setMemRelThres	sublist	sync	take	to	type	ver	xasc	xgroup	xto

By category¶

data transfer	from, to, xto
device management	cntDev, getDev, getMemRelThres, memDev, setDev, setMemRelThres, sync
list	append, count, drop, gather, group, sublist, take
join	aj
meta	attr, type
query	select
sort	asc, bin, iasc
table	meta, xasc, xgroup

Note

Examples use the following naming convention:

Lowercase names indicate CPU-resident objects (for example, t, a).
Uppercase names indicate GPU or mixed-residency objects (for example, T, A).

q).gpu:use`kx.gpu
q)\S 42

aj¶

As-of join

.gpu.aj  [c; t1; t2]
.gpu.aj0 [c; t1; t2]
.gpu.ajf [c; t1; t2]
.gpu.ajf0[c; t1; t2]

Where:

t1 is a CPU, GPU or mixed-residency table
t2 is a CPU, GPU or mixed-residency table
c is a symbol vector of column names c₀…cₙ, common to t1 and t2, and of matching type
the last column cₙ is of a sortable type (typically time)

returns a table with records from the left-join of t1 and t2. In the join, columns c₀…cₙ₋₁ are matched for equality, and the last value of cₙ (most recent time) is taken. For each record in t1, the result has one record with the items in t1, and

if there are matching records in t2, the items of the last (in row order) matching record are appended to those of t1;
otherwise the remaining columns are null.

The .gpu.aj and .gpu.bin APIs are currently limited to one or two column names in c.

The .gpu.aj, .gpu.bin and .gpu.group APIs are currently limited to long, timestamp or timespan columns.

q)t:([]time:2026.01.01D10:01:01 2026.01.01D10:01:03 2026.01.01D10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
q)q:([]time:2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:02;sym:`g#`ibm`msft`msft`ibm;px:100 99 101 98)
q)T:.gpu.to t
q)Q:.gpu.to q

Join on a single column - every row in t is matched against the most recent row overall in q:

q).gpu.from .gpu.aj[`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:01.000000000 msft 100 101
2026.01.01D10:01:03.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ibm  150 98

Join on two columns - each row in t is matched against the most recent row in q with the same sym. The `g# attribute on the exact-match column in t2 is required for two-column joins.

q).gpu.from .gpu.aj[`sym`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:01.000000000 msft 100 101
2026.01.01D10:01:03.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ge   150

The ge row has a null px because there is no ge in q.
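One way to sanity-check a GPU join is to run the same as-of join with the standard aj on the CPU copies and compare the two results. The following is a minimal sketch, assuming the module is loaded as .gpu; the names cpu and gpu are illustrative, and an exact match (including attributes) is something to verify rather than a documented guarantee.

```q
.gpu:use`kx.gpu
t:([]time:2026.01.01D10:01:01 2026.01.01D10:01:03 2026.01.01D10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
q:([]time:2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:02;sym:`g#`ibm`msft`msft`ibm;px:100 99 101 98)
cpu:aj[`sym`time;t;q]                                 / standard CPU as-of join
gpu:.gpu.from .gpu.aj[`sym`time;.gpu.to t;.gpu.to q]  / same join on GPU copies
cpu~gpu                                               / 1b only if the two results agree exactly
```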

aj vs. aj0¶

aj and aj0 return different times in their results:

aj    boundary time from t1
aj0   actual time from t2

q).gpu.from .gpu.aj0[`sym`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:00.000000000 msft 100 101
2026.01.01D10:01:02.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ge   150

ajf, ajf0¶

ajf and ajf0 fill from t1 if the corresponding value in t2 is null.

q)t0:([]time:2#2026.01.01D00:00:01;sym:`a`b;p:1 1;n:`r`s)
q)t1:([]time:2#2026.01.01D00:00:01;sym:`a`b;p:0 1)
q)t2:([]time:2#2026.01.01D00:00:00;sym:`g#`a`b;p:1 0N;n:`r`s)
q)T1:.gpu.to t1
q)T2:.gpu.to t2

With aj, the null p value for b in t2 is preserved in the result:

q).gpu.from .gpu.aj[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:01.000000000 a   1 r
2026.01.01D00:00:01.000000000 b     s

With ajf, null values in t2 are filled from t1:

q).gpu.from .gpu.ajf[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:01.000000000 a   1 r
2026.01.01D00:00:01.000000000 b   1 s
q)t0 ~ .gpu.from .gpu.ajf[`sym`time;T1;T2]
1b

ajf0 combines both: fills nulls from t1 and uses the actual time from t2:

q).gpu.from .gpu.ajf0[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:00.000000000 a   1 r
2026.01.01D00:00:00.000000000 b   1 s

append¶

Append two lists

.gpu.append[x;y]

Where x and y are CPU or GPU-resident lists, dictionaries or tables, returns x joined to y.

q)A:.gpu.to a:til 5
q)B:.gpu.to b:a+10
q).gpu.from .gpu.append[A;b]
0 1 2 3 4 10 11 12 13 14

q)D:.gpu.to d:`a`b!(a;b)
q).gpu.from .gpu.append[D;d]
a| 0  1  2  3  4  0  1  2  3  4
b| 10 11 12 13 14 10 11 12 13 14

q)T:.gpu.to t:2#flip d
q)t
a b
----
0 10
1 11
q).gpu.from .gpu.append[T;t]
a b
----
0 10
1 11
0 10
1 11

Only the `g attribute is preserved on GPU-resident appends.
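The attribute behaviour can be inspected with .gpu.attr. The sketch below only shows how one might check it; the vectors G and S are illustrative, and the source does not state which operand's attribute carries over, so the comments describe what the note above suggests rather than confirmed output.

```q
.gpu:use`kx.gpu
G:.gpu.group .gpu.to 3 1 2 1   / GPU vector with `g applied
S:.gpu.asc .gpu.to 3 1 2 1     / GPU vector with `s applied
.gpu.attr .gpu.append[G;G]     / per the note above, `g should survive
.gpu.attr .gpu.append[S;S]     / per the note above, `s should not survive
```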

.gpu.append falls back to CPU join given entirely CPU-resident arguments.

q).gpu.from .gpu.append[a;b]
0 1 2 3 4 10 11 12 13 14
q).gpu.append[a;b]
0 1 2 3 4 10 11 12 13 14

asc¶

Ascending sort

.gpu.asc x

Where x is a GPU-resident vector, returns its items in ascending order of value, with the `s attribute set, indicating the list is sorted.

Sorting on GPU-resident symbols is not yet implemented; consider an enumeration instead.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q).gpu.attr .gpu.asc A
`s
q).gpu.from .gpu.asc A
`s#0 1 2 4 5 7 7 9 9 9

iasc¶

Ascending grade

.gpu.iasc x

Where x is a GPU-resident vector, returns the indices needed to sort x in ascending order.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q).gpu.from .gpu.iasc A
4 3 8 1 5 0 2 6 7 9
q)a .gpu.from .gpu.iasc A
0 1 2 4 5 7 7 9 9 9

xasc¶

Sort a table in ascending order of specified columns

.gpu.xasc[x;y]

Where x is a symbol vector of column names defined in table y, returns y sorted in ascending order by x. The sort is by the first column specified, then by the second column within the first, and so on. All columns referenced in x must be GPU-resident.

q)T:.gpu.to t:0N?([]sym:raze 2#/:`a`b`c; date:6#2025.01.01+til 2; val:50+6?10f)
q)t
sym date       val
-----------------------
c   2025.01.02 51.1578
b   2025.01.01 57.52659
b   2025.01.02 59.49526
a   2025.01.01 59.74317
c   2025.01.01 53.74586
a   2025.01.02 50.81106
q).gpu.from .gpu.xasc[`date;T]
sym date       val     
-----------------------
b   2025.01.01 57.52659
a   2025.01.01 59.74317
c   2025.01.01 53.74586
c   2025.01.02 51.1578 
b   2025.01.02 59.49526
a   2025.01.02 50.81106
q).gpu.from .gpu.xasc[`date`val;T]
sym date       val     
-----------------------
c   2025.01.01 53.74586
b   2025.01.01 57.52659
a   2025.01.01 59.74317
a   2025.01.02 50.81106
c   2025.01.02 51.1578 
b   2025.01.02 59.49526

q)/ Use a symbol enum to sort on symbols
q).gpu.from .gpu.xasc[`sym`date;T]
'GPU_SYMBOL_SORTING_NOT_IMPLEMENTED
q)syms:exec distinct sym from t
q)T:.gpu.to t:update `syms$sym from t
q).gpu.from .gpu.xasc[`sym`date;T]
sym date       val     
-----------------------
a   2025.01.01 59.74317
a   2025.01.02 50.81106
b   2025.01.01 57.52659
b   2025.01.02 59.49526
c   2025.01.01 53.74586
c   2025.01.02 51.1578

attr¶

Attributes of a GPU or CPU object

.gpu.attr x

Where x is any CPU or GPU-resident object, returns its attribute as a symbol atom.

The possible attributes are:

code	attribute
`s	sorted
`u	unique
`p	parted
`g	grouped

A null symbol result ` means no attributes are set on x.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

Vanilla attr returns ` for GPU-resident foreigns regardless of their true attributes.

q)G:.gpu.to g:`g#a:10?10
q)a
9 9 2 1 5 0 9 9 6 1
q)g
`g#9 9 2 1 5 0 9 9 6 1
q)attr g
`g
q)attr G        / vanilla attr insufficient on gpu foreign objects
`
q).gpu.attr G
`g
q).gpu.attr g   / .gpu.attr falls back to vanilla for CPU-resident objects
`g

q).gpu.attr .gpu.to til 10
`
q).gpu.attr .gpu.asc .gpu.to til 10
`s

q)attr .gpu.from .gpu.to `s#1 2 3
`s
q)attr .gpu.from .gpu.to `p#1 2 3
`               / only `s preserved on .gpu.from

bin¶

Binary search

.gpu.bin[x;y]

Vectors¶

Where:

x is a sorted GPU-resident vector.
y is a GPU-resident vector of the same type.

returns the index of the last item in x which is ≤ each y[i]. The result is -1 where y[i] is less than the first item of x.

.gpu.aj, .gpu.bin and .gpu.group are currently limited to long, timestamp or timespan columns.

q)A:.gpu.to a:0 2 4 6 8 10
q)B:.gpu.to b:5 6 7
q).gpu.from .gpu.bin[A;B]
2 3 3

q)B:.gpu.to b:-10 0 4 5 6 20
q).gpu.from .gpu.bin[A;B]
-1 0 2 2 3 5

.gpu.bin uses a binary search algorithm, which is generally more efficient on large data. The items of x must be sorted ascending, although .gpu.bin does not verify this property.
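Because the sort order is not checked, a caller who cannot guarantee it may prefer to sort on the device first. A minimal sketch, assuming .gpu.bin accepts the output of .gpu.asc directly; sortedBin is a hypothetical helper, not part of the module.

```q
.gpu:use`kx.gpu
/ sortedBin is a hypothetical helper, not part of the module:
/ it sorts X on the GPU so that the precondition of .gpu.bin holds
sortedBin:{[X;Y] .gpu.bin[.gpu.asc X;Y]}
A:.gpu.to 8 0 4 10 2 6    / unsorted GPU vector
B:.gpu.to 5 6 7
.gpu.from sortedBin[A;B]  / indices refer to the sorted copy of A, not to A itself
```

The extra sort has a cost, so this only makes sense when the input order is genuinely unknown.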

Tables¶

Where:

x is a GPU-resident table of n columns
y is a GPU-resident table with the same schema

returns the indices of the last row of x for which

the first n-1 columns each match the first n-1 columns of the corresponding row of y, and
the last column is not greater than the last column of the corresponding row of y.

If no items match the criteria, either because there are no rows that match in the first n-1 columns, or because the last value is smaller than the last value in the first such row, 0N is returned.

.gpu.aj and .gpu.bin are currently limited to one or two columns.

.gpu.bin is significantly faster when GPU `g is applied to the exact-match column; refer to .gpu.group.

q)A:.gpu.to a:([]a:`p`p`p`q`q`q;b:0 2 4 0 2 4)
q)B:.gpu.to b:([]a:`q;b:-1 1 3 5)
q).gpu.from .gpu.bin[A;B]
0N 3 4 5

To use bin with a table, the last column need not be sorted overall, but it needs to be sorted within the equivalence classes defined by the first n-1 columns (as shown in the previous example).
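The grouped attribute mentioned above can be applied to the exact-match column with .gpu.xgroup before searching. This is a sketch under assumptions: the tables src and qry are made up for illustration, and long columns are used throughout to stay within the documented type limits.

```q
.gpu:use`kx.gpu
src:([]id:1 1 1 2 2 2;v:0 2 4 0 2 4)  / v is sorted within each id
qry:([]id:2;v:-1 1 3 5)
SRC:.gpu.xgroup[`id] .gpu.to src      / apply GPU `g to the exact-match column
QRY:.gpu.to qry
.gpu.meta SRC                         / the a column should show g for id
.gpu.from .gpu.bin[SRC;QRY]
```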

count¶

Count of a CPU or GPU-resident object

.gpu.count x

Where x is:

a vector, returns the number of elements
a dictionary, returns the number of keys
a table, returns the number of rows
anything else, 1

Vanilla count returns 1 for GPU-resident foreigns regardless of their true size.

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.count each (a;d;t)
5 2 5
q).gpu.count each (A;D;T)
5 2 5

q)count each (a;d;t)
5 2 5
q)count each (A;D;T)
1 2 1

drop¶

Drop specified keys or columns

.gpu.drop[x;y]

Where x is a symbol atom or vector, returns a dictionary/table y without the specified keys/columns.

Note

Unlike the standard drop, .gpu.drop does not currently support dropping by count; it only supports dropping by key or column name.

q)T:.gpu.to t:([]a:`x`y`z;b:0 2 4;c:1 2 3)
q)T
+`a`b`c!(foreign;foreign;foreign)
q).gpu.drop[`b] T
+`a`c!(foreign;foreign)
q).gpu.drop[`b] t
a c
---
x 1
y 2
z 3
q).gpu.drop[`a`b] t
c
-
1
2
3
q).gpu.drop[`a`b] T
+(,`c)!,foreign
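Until dropping by count is supported, a similar effect on a GPU-resident vector can be approximated with .gpu.sublist and .gpu.count. The following is a sketch, assuming 0 ≤ n ≤ count of the vector; the name n is illustrative.

```q
.gpu:use`kx.gpu
A:.gpu.to a:2 3 5 7 11
n:2
/ drop the first n items: slice of count-n items starting at index n (compare n _ a)
.gpu.from .gpu.sublist[(n;.gpu.count[A]-n);A]
/ drop the last n items: keep the first count-n items (compare neg[n] _ a)
.gpu.from .gpu.sublist[.gpu.count[A]-n;A]
```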

from¶

Retrieve data from the GPU

.gpu.from x

Where x is any CPU, GPU or mixed-residency object, returns x moved to the CPU.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.from A
0 1 2 3 4
q).gpu.from a
0 1 2 3 4
q).gpu.from D
a| 0  1  2  3  4
b| 10 11 12 13 14
q).gpu.from T
a b
----
0 10
1 11
2 12
3 13
4 14

gather¶

Gather specified indices of a list

.gpu.gather[x;y]

Where:

x is a GPU-resident vector
y is a vector of indices into x, either CPU-resident or GPU-resident.

returns the elements of x at each index y.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q)B:.gpu.to b:2 4 6 8
q)a b
7 0 9 2
q).gpu.from .gpu.gather[A;B]
7 0 9 2

q).gpu.from .gpu.asc[A]
`s#0 1 2 4 5 7 7 9 9 9
q).gpu.from .gpu.gather[A] .gpu.iasc A
0 1 2 4 5 7 7 9 9 9

Null indices produce null values in the output.

q)A:.gpu.to a:1 2 3
q).gpu.from .gpu.gather[A;.gpu.to 1 2 0N]
2 3 0N

getDev¶

Get current GPU

.gpu.getDev x

Where x is:

null, returns the currently selected device index
a GPU-resident object, returns the device index on which it resides

q).gpu.setDev 0
q).gpu.getDev[]
0
q)A:.gpu.to 1 2 3
q).gpu.setDev 2
q).gpu.getDev[]
2
q).gpu.getDev A
0

getMemRelThres¶

Get the memory release threshold

.gpu.getMemRelThres[]

Returns the memory release threshold, in bytes, for the GPU memory pool of the active device. Refer to .gpu.setMemRelThres.

q).gpu.getMemRelThres[]
0
q).gpu.setMemRelThres[1024]
q).gpu.getMemRelThres[]
1024
q).gpu.setDev 2
q).gpu.getMemRelThres[]
0

group¶

Apply GPU grouped attribute

.gpu.group x

Where x is a GPU-resident vector, returns x with the `g attribute set.

.gpu.aj, .gpu.bin and .gpu.group are currently limited to long, timestamp or timespan columns.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q)G:.gpu.group A
q).gpu.attr G
`g
q).gpu.from G
7 4 7 1 0 5 9 9 2 9     / `g not preserved on .gpu.from

xgroup¶

Apply GPU grouped attribute to specific columns/keys

.gpu.xgroup[x;y]

Where x is a symbol atom/list specifying GPU-resident keys/columns of a dictionary/table y, returns y with specified columns/keys grouped.

q)T:.gpu.to t:flip `a`b!(1 2 3 4;10 20 30 40)
q).gpu.meta T
c| t f a r
-| ---------
a| j     gpu
b| j     gpu
q).gpu.meta .gpu.xgroup[`a] T
c| t f a r
-| ---------
a| j   g gpu
b| j     gpu

memDev¶

Get memory info for current GPU

.gpu.memDev[]

Returns memory utilisation for the currently selected device in bytes, similar to .Q.w[].

q).gpu.memDev[]
used| 512
heap| 33554432
peak| 33554432
mphy| 23661248512
free| 23426301952

q)a:.gpu.memDev[]
q).gpu.memDev[]-a
used| 0
heap| 0
peak| 0
mphy| 0
free| 0
q)A:.gpu.to til 1000
q).gpu.memDev[]-a
used| 8064
heap| 0
peak| 0
mphy| 0
free| 0
q).gpu.setDev 1
q).gpu.memDev[]
used| 0
heap| 0
peak| 0
mphy| 23661248512
free| 23459856384

meta¶

Metadata of a table, including GPU-residency info

.gpu.meta x

Where x is a CPU, GPU or mixed-residency table, returns the metadata including a residency column.

Vanilla meta returns no metadata for GPU-resident foreigns regardless of their true type or attributes.

q)syms:-10?`3
q)q:`time xasc([]time:2026.07.10D12:32+5?00:00:00;sym:5?syms;px:95+5?10)
q).gpu.meta q
c   | t f a r  
----| ---------
time| p   s cpu
sym | s     cpu
px  | j     cpu
q).gpu.meta .gpu.to q
c   | t f a r  
----| ---------
time| p   s gpu
sym | s     gpu
px  | j     gpu
q).gpu.meta .gpu.xto[`time`sym] q
c   | t f a r  
----| ---------
time| p   s gpu
sym | s     gpu
px  | j     cpu

q)/ vanilla meta returns no information for GPU-resident columns
q)meta .gpu.xto[`time`sym] q
c   | t f a
----| -----
time|      
sym |      
px  | j

cntDev¶

Count of available GPUs

.gpu.cntDev[]

Returns the number of available GPUs. A process may be limited by setting CUDA_VISIBLE_DEVICES in its environment.

q).gpu.cntDev[]
4
q){.gpu.setDev x;.gpu.getDev[]} peach til 4
0 1 2 3

q) / in another process limited to devices 1 and 3
q)getenv `CUDA_VISIBLE_DEVICES
"1,3"
q).gpu.cntDev[]
2
q){.gpu.setDev x;.gpu.memDev[]} peach til .gpu.cntDev[]
used heap peak mphy        free
--------------------------------------
0    0    0    23661248512 23459856384
0    0    0    23661248512 23459856384

nvtx¶

Use the NVIDIA NVTX tool to record ranges that can be viewed in Nsight Systems.

end¶

Mark the end of a record range

.gpu.nvtx.end[range_id]

Mark the end of the range with the id range_id. Refer to .gpu.nvtx.start.

range_id:.gpu.nvtx.start["big select statement"]
.gpu.from .gpu.select[...]
.gpu.nvtx.end[range_id]

start¶

Mark the beginning of a record range

range_id:.gpu.nvtx.start[desc_str]

Mark the beginning of a range with a descriptive string. .gpu.nvtx.start returns a long id that should be passed to .gpu.nvtx.end to close the range. Viewed in Nsight Systems, named ranges can help locate bottlenecks.

range_id:.gpu.nvtx.start["big select statement"]
.gpu.from .gpu.select[...]
.gpu.nvtx.end[range_id]
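One possible pattern is to wrap each stage of a workload in its own range, so the stages show up as separate spans. A minimal sketch; inRange is a hypothetical helper, not part of the module, and it does not close the range if f signals an error.

```q
.gpu:use`kx.gpu
/ inRange is a hypothetical helper, not part of the module:
/ it runs the niladic function f inside a named NVTX range and returns its result
inRange:{[desc;f] id:.gpu.nvtx.start desc; r:f[]; .gpu.nvtx.end id; r}
/ two named ranges, so transfer and sort appear as separate spans in the timeline
A:inRange["copy to device";{.gpu.to 1000000?100}]
r:inRange["sort and copy back";{.gpu.from .gpu.asc A}]
```

Since .gpu.to can return before the copy has completed, the first range may end before the transfer does; calling .gpu.sync[] inside f would include the wait in the range.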

profiler¶

Access to the CUDA profiler control API

Use it to selectively profile a specific section of your application.

start¶

Calls cudaProfilerStart

.gpu.profiler.start[]

stop¶

Calls cudaProfilerStop

.gpu.profiler.stop[]
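A typical use is to bracket only the section of interest, keeping setup outside the profiled region. The sketch below assumes the profiling tool is configured to honour these calls (for Nsight Systems, for example, a capture range of cudaProfilerApi); the workload itself is a placeholder.

```q
.gpu:use`kx.gpu
A:.gpu.to 1000000?100   / setup outside the profiled region
.gpu.sync[]             / make sure the copy is finished before profiling starts
.gpu.profiler.start[]   / calls cudaProfilerStart
r:.gpu.from .gpu.asc A  / the section of interest
.gpu.sync[]             / wait for outstanding GPU work before stopping
.gpu.profiler.stop[]    / calls cudaProfilerStop
```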

setDev¶

Set current GPU

.gpu.setDev x

Where x is a long from 0 to .gpu.cntDev[]-1, selects the device with that index.

.gpu.setDev takes an index into the available device list. For a process with CUDA_VISIBLE_DEVICES='3,1', .gpu.setDev 0 selects the device with system ID 3.

q)getenv `CUDA_VISIBLE_DEVICES
"3,1"
q).gpu.getDev[]
0
q)A:.gpu.to til 1000
q).gpu.setDev 1
q).gpu.getDev[]
1
q)B:.gpu.to til 1000
q).gpu.getDev each (A;B)
0 1

select¶

Functional select

The GPU module provides a select function that mirrors the functional‑select API ?[t;c;b;a], except that t is a table whose columns reside on the GPU.

.gpu.select[t;c;b;a]

Where:

t is a table with all columns on device
c is the Where phrase, a list of constraints
b is the By phrase
a is the Select phrase

Currently all columns of t must be resident on the GPU.

Run parse on a qSQL statement to see how to represent it as a functional select:

q)parse "select p_min:min p, p_max:max p, p_avg:avg p by sym from t"
?
`t
()
(,`sym)!,`sym
`p_min`p_max`p_avg!((min;`p);(max;`p);(avg;`p))

q)t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
q)T:.gpu.to t

q)/ select all columns
q).gpu.from .gpu.select[T;();0b;()]
sym  p   size
-------------
AAPL 150 100
MSFT 300 200
AAPL 152 150
MSFT 298 300
AAPL 155 250

q)/ select specific columns
q).gpu.from .gpu.select[T;();0b;`sym`p!(`sym;`p)]
sym  p
--------
AAPL 150
MSFT 300
AAPL 152
MSFT 298
AAPL 155

q)/ where sym = `AAPL - enlist the symbol to distinguish it from a column name
q).gpu.from .gpu.select[T;enlist(=;`sym;enlist`AAPL);0b;()]
sym  p   size
-------------
AAPL 150 100
AAPL 152 150
AAPL 155 250

q)/ multiple where conditions are ANDed
q).gpu.from .gpu.select[T;((=;`sym;enlist`AAPL);(>;`p;151));0b;()]
sym  p   size
-------------
AAPL 152 150
AAPL 155 250

q)/ aggregation by group
q).gpu.from .gpu.select[T;();enlist[`sym]!enlist`sym;`p_min`p_max`p_avg!((min;`p);(max;`p);(avg;`p))]
sym  p_min p_max p_avg
-------------------------
AAPL 150   155   152.3333
MSFT 298   300   299

q)/ computed column: average of p*size by sym
q).gpu.from .gpu.select[T;();enlist[`sym]!enlist`sym;enlist[`avgpxsz]!enlist(avg;(*;`p;`size))]
sym  avgpxsz
-------------
AAPL 25516.67
MSFT 74700

q)/ unary ops in the select phrase
q).gpu.from .gpu.select[T;();0b;enlist[`logp]!enlist(log;`p)]
logp
--------
5.010635
5.703782
5.023881
5.697093
5.043425

Where clause¶

c is a list of parse-tree expressions, each evaluating to a boolean vector.
Expressions can use all supported operations.
() (empty list) means no constraints.
Multiple conditions are ANDed: ((>;`p;100);(<;`p;200)).
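Because the constraint list has the same shape as in the functional select ?[t;c;b;a], the same c can be run on the CPU table for comparison. A sketch, assuming in and within are supported for these column types; the names c, cpu and gpu are illustrative.

```q
.gpu:use`kx.gpu
t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
T:.gpu.to t
/ literal symbol lists are enlisted so they are not read as column names
c:((in;`sym;enlist`AAPL`MSFT);(within;`p;151 299))
cpu:?[t;c;0b;()]                       / standard functional select on the CPU table
gpu:.gpu.from .gpu.select[T;c;0b;()]   / same constraints on the GPU table
cpu~gpu                                / 1b only if the two results agree exactly
```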

By clause¶

b is a dictionary mapping output key names to input column names: enlist[`sym]!enlist`sym.
0b means no grouping.

The output is usually sorted by the key columns; this is currently not true for symbol key columns.

Select clause¶

a is a dictionary mapping output column names to input columns or parse-tree expressions.
() selects all columns.
Bare symbols in dictionary values are column references. To reference a literal symbol, enlist it.

Supported operations¶

Not all types are supported for all operations, but in general the supported types are scalars and 1D vectors of basic atomic types.

Binary ops: =, <>, <, >, <=, >=, +, -, *, %, |, &, in, within, xbar, xexp
Unary ops: abs, log, exp, sin, asin, cos, acos, tan, atan, floor, ceiling, sqrt
Aggregate ops: sum, min, max, avg, count, first, last, var, dev, wavg
Scan ops: sums, prds, mins, maxs
Windowed scan ops: mavg, msum, mmin, mmax, mdev
Sorting ops: iasc
Casting

In parse-tree form, <> is (';~:;=), >= is (';~:;<), and <= is (';~:;>).

Results may deviate slightly from base kdb+. For more information, refer to CUDA Mathematical Functions.

Performance note¶

Depending on the operations present, the Where clause may not filter rows until the final stage of the select operation. If the Where clause will considerably reduce the size of the table, consider applying it in a separate select statement first. This keeps allocation sizes predictable and allows Where conditions to be evaluated concurrently.
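The two-stage approach might look like the following sketch. It assumes the result of .gpu.select is a GPU-resident table that can be passed straight back in as t; the names F and r are illustrative.

```q
.gpu:use`kx.gpu
t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
T:.gpu.to t
/ stage 1: filter only, the result stays on the GPU
F:.gpu.select[T;enlist(=;`sym;enlist`AAPL);0b;()]
/ stage 2: aggregate the smaller table
r:.gpu.from .gpu.select[F;();enlist[`sym]!enlist`sym;enlist[`p_avg]!enlist(avg;`p)]
```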

setMemRelThres¶

Set the memory release threshold

.gpu.setMemRelThres num_bytes

Set the memory release threshold for the GPU memory pool of the active device to num_bytes bytes. When the memory pool has reserved more than num_bytes bytes, it will release unused memory down to the threshold. Memory will be released at synchronization points, such as .gpu.sync or .gpu.from. This does not set a limit on the amount of memory that can be allocated.

q)/ set the current active device to device 1
q).gpu.setDev 1
q)/ set memory release threshold of device 1 to 10GB
q).gpu.setMemRelThres 10*1024*1024*1024

The peak field of .gpu.memDev reports the peak reserved memory and can be used to size the threshold.

Using .gpu.setMemRelThres can be a significant performance win and is recommended.
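One way to choose a value is to run a representative workload, read the peak from .gpu.memDev, and use it as the threshold. This is a heuristic sketch rather than a recommended setting; the workload line is a stand-in.

```q
.gpu:use`kx.gpu
r:.gpu.from .gpu.asc .gpu.to 1000000?100  / stand-in for a representative workload
peak:.gpu.memDev[]`peak                    / peak reserved bytes on the active device
.gpu.setMemRelThres peak                   / keep up to that much reserved in the pool
.gpu.getMemRelThres[]                      / read the threshold back
```

The threshold applies to the active device only, so the same steps would need repeating after .gpu.setDev for each device in use.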

sublist¶

Head, tail or slice of a list

.gpu.sublist[x;y]

Where x is:

a positive long atom, returns the first x items of y.
a negative long atom, returns the last |x| items of y.
a long vector, returns up to x[1] items from y, starting at item x[0]. The result contains no more items than are available in y.

Negative starting points are not yet implemented for GPU-resident vectors.

q)/ Head or tail, similar to .gpu.take but limited to the list length
q)A:.gpu.to a:2 3 5 7 11
q).gpu.sublist[3;A]
foreign
q).gpu.from .gpu.sublist[3;A]
2 3 5
q).gpu.from .gpu.sublist[10;A]
2 3 5 7 11
q).gpu.from .gpu.take[10;A]
2 3 5 7 11 2 3 5 7 11

q)syms:-10?`3
q)T:.gpu.to t:`time xasc([]time:2026.07.10D12:32+5?00:00:00;sym:5?syms;px:95+5?10)
q)t
time                          sym px 
-------------------------------------
2026.07.10D18:48:49.000000000 adl 104
2026.07.11D03:15:53.000000000 cgl 97 
2026.07.11D04:57:05.000000000 hmc 104
2026.07.11D06:32:25.000000000 fjk 96 
2026.07.11D09:26:01.000000000 cgl 104
q).gpu.from .gpu.sublist[-2] T
time                          sym px 
-------------------------------------
2026.07.11D06:32:25.000000000 fjk 96 
2026.07.11D09:26:01.000000000 cgl 104


q)/ Slice
q)A:.gpu.to a:2 3 5 7 11
q).gpu.from .gpu.sublist[1 2;A]
3 5

q) / Negative start for GPU nyi
q)a
2 3 5 7 11
q)sublist[-2 5;a]
0N 0N 2 3 5
q).gpu.sublist[-2 5;a]
0N 0N 2 3 5
q).gpu.sublist[-2 5;A]
'nyi

sync¶

Wait for outstanding GPU work

.gpu.sync[]

Blocks the calling CPU thread until all work on the currently selected GPU has completed. Returns null.

.gpu.sync is useful to avoid 'wsfull when the CPU provides work faster than the GPU can consume it, or when more explicit control of memory is needed.

q)a:til 1000000
q)-16!a
1i                  / only 1 reference to a
q)A:.gpu.to a       / returns before the copy to device is completed
q)-16!a             / the gpu stream owns a reference to a to facilitate this
2i
q)r:.gpu.from A     / implicit sync before return to q, but .gpu.from does no destructor work.
q)-16!a             / gpu stream still owns a reference to a, despite the copy having completed
2i

q).gpu.sync[]       / explicit sync, waits for the stream to complete then completes any destructor work
q)-16!a
1i

q)/ Most other APIs will run any outstanding destructor work, but without syncing.
q)a:til 1000000; b:til 1000000  / Get new objects, both rc=1
q)A:.gpu.to a; B:.gpu.to b      / Both return A & B before either copy is complete:
q)-16!a
2i
q)-16!b
2i
q)B:.gpu.to b                   / Both copies have completed, doing another doesn't sync but does clean up the refs.
q)-16!a
1i
q)-16!b
2i
q).gpu.sync[]                   / And a final sync cleans up the latest ref.
q)-16!a
1i
q)-16!b
1i

q)a:til 1000000; b:til 1000000  / Get new objects, both rc=1
q)A:.gpu.to a; .gpu.sync[]; B:.gpu.to b / Ensures A is done first, second copy can clean up its ref.
q)-16!a
1i
q)-16!b
2i

take¶

Select leading or trailing items from a list or dictionary, named entries from a dictionary, or named columns from a table

.gpu.take[x;y]

Where x is:

a positive long atom, returns the first x items of y
a negative long atom, returns the last |x| items of y
a symbol list, returns the specified keys/columns of a dictionary/table y

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.from .gpu.take[2;A]
0 1
q).gpu.from .gpu.take[2;D]
a| 0  1  2  3  4
b| 10 11 12 13 14
q).gpu.from .gpu.take[`b`a;D]
b| 10 11 12 13 14
a| 0  1  2  3  4 
q).gpu.from .gpu.take[2;T]
a b
----
0 10
1 11
q).gpu.from .gpu.take[-2;T]
a b
----
3 13
4 14
q).gpu.from .gpu.take[`b;T]
b
--
10
11
12
13
14

to¶

Send data to the GPU

.gpu.to x

Where x is any CPU, GPU or mixed-residency object, returns a copy of x with any non-atom moved to the GPU.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q).gpu.to 10
10
q).gpu.to 10 20 30
foreign
q).gpu.to `a`b!(10 20;30 40)
a| foreign
b| foreign
q).gpu.to flip `a`b!(10 20;30 40)
+`a`b!(foreign;foreign)
q).gpu.from .gpu.to flip `a`b!(10 20;30 40)
a  b
-----
10 30
20 40

xto¶

Send specific columns/keys to the GPU

.gpu.xto[x;y]

Where x is a symbol atom/list specifying keys/columns of a dictionary/table y, returns y with specified non-atoms moved to the GPU.

q).gpu.xto[`a] `a`b!(10 20;30 40)
a| foreign
b| 30 40
q).gpu.xto[`a`c] `a`b!(10 20;30 40)
a| foreign
b| 30 40
q).gpu.xto[`a] flip `a`b!(10 20;30 40)
+`a`b!(foreign;30 40)
q).gpu.meta .gpu.xto[`a] flip `a`b!(10 20;30 40)
c| t f a r
-| ---------
a| j     gpu
b| j     cpu

type¶

Type of a GPU-resident object

.gpu.type x

Where x is any CPU or GPU-resident object, returns its type as a short.

Vanilla type returns 112h for GPU-resident foreigns regardless of their true underlying type.

q)A:.gpu.to a:1 2 3
q).gpu.type A
7h
q).gpu.type a
7h
q)type a
7h
q)type A
112h

q)B:.gpu.to b:1 2 3f
q).gpu.type B
9h
q)type B
112h

ver¶

Module version string

.gpu.ver[]

Returns the version string of the module.

q).gpu.ver[]
"gpulib version: 2.0.0-ada-cuda13.1"
