GPU reference card

This page covers the KDB-X GPU module APIs, including inputs, outputs, and examples for each function.

Keywords

aj append asc attr bin count drop from gather gdev
group iasc mdev meta ndev sdev select sublist sync take
to type xasc xgroup xto

By category

data transfer      from, to, xto
device management  gdev, mdev, ndev, sdev, sync
list               append, count, drop, gather, group, sublist, take
join               aj
meta               attr, type
query              select
sort               asc, bin, iasc
table              meta, xasc, xgroup

Note

Examples use the following naming convention:

  • Lowercase names indicate CPU-resident objects (for example, t, a).
  • Uppercase names indicate GPU or mixed-residency objects (for example, T, A).

q).gpu:use`kx.gpu
q)\S 42

aj

As-of join

.gpu.aj  [c; t1; t2]
.gpu.aj0 [c; t1; t2]
.gpu.ajf [c; t1; t2]
.gpu.ajf0[c; t1; t2]

Where:

  • t1 is a CPU, GPU or mixed-residency table
  • t2 is a CPU, GPU or mixed-residency table
  • c is a symbol vector of n column names, common to t1 and t2, and of matching type
  • the last column named in c is of a sortable type (typically time)

returns a table with records from the left-join of t1 and t2. In the join, the first n-1 columns of c are matched for equality, and the last value of the final column (most recent time) is taken. For each record in t1, the result has one record with the items in t1, and

  • if there are matching records in t2, the items of the last (in row order) matching record are appended to those of t1;
  • otherwise the remaining columns are null.

The .gpu.aj and .gpu.bin APIs are currently limited to one or two column names in c.

The .gpu.aj, .gpu.bin and .gpu.group APIs are currently limited to long, timestamp or timespan columns.

q)t:([]time:2026.01.01D10:01:01 2026.01.01D10:01:03 2026.01.01D10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
q)q:([]time:2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:02;sym:`g#`ibm`msft`msft`ibm;px:100 99 101 98)
q)T:.gpu.to t
q)Q:.gpu.to q

Join on a single column - every row in t is matched against the most recent row overall in q:

q).gpu.from .gpu.aj[`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:01.000000000 msft 100 101
2026.01.01D10:01:03.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ibm  150 98

Join on two columns - each row in t is matched against the most recent row in q with the same sym. The `g# attribute on the exact-match column in t2 is required for two-column joins.

q).gpu.from .gpu.aj[`sym`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:01.000000000 msft 100 101
2026.01.01D10:01:03.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ge   150

The ge row has a null px because there is no ge in q.
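
The `g# requirement can also be met on a table that was built without the attribute, because .gpu.to preserves attributes. The following is a minimal sketch, assuming the module is loaded as .gpu as above; the names q2 and Q2 are illustrative only.

```q
/ trades table, as in the example above
t:([]time:2026.01.01D10:01:01 2026.01.01D10:01:03 2026.01.01D10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
T:.gpu.to t
/ q2 (illustrative name): the same quotes as q, built without the `g# attribute
q2:([]time:2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:00 2026.01.01D10:01:02;sym:`ibm`msft`msft`ibm;px:100 99 101 98)
/ set `g# on the exact-match column while the table is still CPU-resident
q2:update `g#sym from q2
/ .gpu.to preserves attributes, so the GPU copy should keep `g on sym
Q2:.gpu.to q2
/ inspect the attribute column (a) for sym before joining
.gpu.meta Q2
/ two-column join against the prepared table
.gpu.from .gpu.aj[`sym`time;T;Q2]
```

Setting the attribute on the CPU side avoids relying on GPU-side grouping of a symbol column, which the type limits noted above may not permit.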

aj vs. aj0

aj and aj0 return different times in their results:

aj    boundary time from t1
aj0   actual time from t2

q).gpu.from .gpu.aj0[`sym`time;T;Q]
time                          sym  qty px
------------------------------------------
2026.01.01D10:01:00.000000000 msft 100 101
2026.01.01D10:01:02.000000000 ibm  200 98
2026.01.01D10:01:04.000000000 ge   150

ajf, ajf0

ajf and ajf0 fill from t1 if the corresponding value in t2 is null.

q)t0:([]time:2#2026.01.01D00:00:01;sym:`a`b;p:1 1;n:`r`s)
q)t1:([]time:2#2026.01.01D00:00:01;sym:`a`b;p:0 1)
q)t2:([]time:2#2026.01.01D00:00:00;sym:`g#`a`b;p:1 0N;n:`r`s)
q)T1:.gpu.to t1
q)T2:.gpu.to t2

With aj, the null p value for b in t2 is preserved in the result:

q).gpu.from .gpu.aj[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:01.000000000 a   1 r
2026.01.01D00:00:01.000000000 b     s

With ajf, null values in t2 are filled from t1:

q).gpu.from .gpu.ajf[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:01.000000000 a   1 r
2026.01.01D00:00:01.000000000 b   1 s
q)t0 ~ .gpu.from .gpu.ajf[`sym`time;T1;T2]
1b

ajf0 combines both: fills nulls from t1 and uses the actual time from t2:

q).gpu.from .gpu.ajf0[`sym`time;T1;T2]
time                          sym p n
-------------------------------------
2026.01.01D00:00:00.000000000 a   1 r
2026.01.01D00:00:00.000000000 b   1 s

append

Append two lists

.gpu.append[x;y]

Where x and y are CPU or GPU-resident lists, dictionaries or tables, returns x joined to y.

q)A:.gpu.to a:til 5
q)B:.gpu.to b:a+10
q).gpu.from .gpu.append[A;b]
0 1 2 3 4 10 11 12 13 14

q)D:.gpu.to d:`a`b!(a;b)
q).gpu.from .gpu.append[D;d]
a| 0  1  2  3  4  0  1  2  3  4
b| 10 11 12 13 14 10 11 12 13 14

q)T:.gpu.to t:2#flip d
q)t
a b
----
0 10
1 11
q).gpu.from .gpu.append[T;t]
a b
----
0 10
1 11
0 10
1 11

Only the `g attribute is preserved on GPU-resident appends.

.gpu.append falls back to CPU join given entirely CPU-resident arguments.

q).gpu.from .gpu.append[a;b]
0 1 2 3 4 10 11 12 13 14
q).gpu.append[a;b]
0 1 2 3 4 10 11 12 13 14

asc

Ascending sort

.gpu.asc x

Where x is a GPU-resident vector, returns its items in ascending order of value, with the `s attribute set, indicating the list is sorted.

Sorting on GPU-resident symbols is not yet implemented; consider an enum with a sorted domain.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q).gpu.attr .gpu.asc A
`s
q).gpu.from .gpu.asc A
`s#0 1 2 4 5 7 7 9 9 9

iasc

Ascending grade

.gpu.iasc x

Where x is a GPU-resident vector, returns the indices needed to sort x in ascending order.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q).gpu.from .gpu.iasc A
4 3 8 1 5 0 2 6 7 9
q)a .gpu.from .gpu.iasc A
0 1 2 4 5 7 7 9 9 9

xasc

Sort a table in ascending order of specified columns

.gpu.xasc[x;y]

Where x is a symbol vector of column names defined in table y, returns y sorted in ascending order by x. The sort is by the first column specified, then by the second column within the first, and so on.

q)T:.gpu.to t:0N?([]sym:raze 2#/:`a`b`c; date:6#2025.01.01+til 2; val:50+6?10f)
q)t
sym date       val
-----------------------
c   2025.01.02 51.1578
b   2025.01.01 57.52659
b   2025.01.02 59.49526
a   2025.01.01 59.74317
c   2025.01.01 53.74586
a   2025.01.02 50.81106
q).gpu.from .gpu.xasc[`date;T]
sym date       val     
-----------------------
b   2025.01.01 57.52659
a   2025.01.01 59.74317
c   2025.01.01 53.74586
c   2025.01.02 51.1578 
b   2025.01.02 59.49526
a   2025.01.02 50.81106
q).gpu.from .gpu.xasc[`date`val;T]
sym date       val     
-----------------------
c   2025.01.01 53.74586
b   2025.01.01 57.52659
a   2025.01.01 59.74317
a   2025.01.02 50.81106
c   2025.01.02 51.1578 
b   2025.01.02 59.49526

q)/ Use a sorted symbol enum to sort on symbols
q).gpu.from .gpu.xasc[`sym`date;T]
'GPU_SYMBOL_SORTING_NOT_IMPLEMENTED
q)syms:exec asc distinct sym from t
q)T:.gpu.to t:update `syms$sym from t
q).gpu.from .gpu.xasc[`sym`date;T]
sym date       val     
-----------------------
a   2025.01.01 59.74317
a   2025.01.02 50.81106
b   2025.01.01 57.52659
b   2025.01.02 59.49526
c   2025.01.01 53.74586
c   2025.01.02 51.1578

attr

Attributes of a GPU or CPU object

.gpu.attr x

Where x is any CPU or GPU-resident object, returns its attribute as a symbol atom.

The possible attributes are:

code attribute
`s sorted
`u unique
`p parted
`g grouped

A null symbol result ` means no attributes are set on x.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

Vanilla attr returns ` for GPU-resident foreigns regardless of their true attributes.

q)G:.gpu.to g:`g#a:10?10
q)a
9 9 2 1 5 0 9 9 6 1
q)g
`g#9 9 2 1 5 0 9 9 6 1
q)attr g
`g
q)attr G        / vanilla attr insufficient on gpu foreign objects
`
q).gpu.attr G
`g
q).gpu.attr g   / .gpu.attr falls back to vanilla for CPU-resident objects
`g

q).gpu.attr .gpu.to til 10
`
q).gpu.attr .gpu.asc .gpu.to til 10
`s

q)attr .gpu.from .gpu.to `s#1 2 3
`s
q)attr .gpu.from .gpu.to `p#1 2 3
`               / only `s preserved on .gpu.from

bin

Binary search

.gpu.bin[x;y]

Vectors

Where:

  • x is a sorted GPU-resident vector.
  • y is a GPU-resident vector of the same type.

returns, for each y[i], the index of the last item in x that is ≤ y[i]. The result is -1 where y[i] is less than the first item of x.

.gpu.aj, .gpu.bin and .gpu.group are currently limited to long, timestamp or timespan columns.

q)A:.gpu.to a:0 2 4 6 8 10
q)B:.gpu.to b:5 6 7
q).gpu.from .gpu.bin[A;B]
2 3 3

q)B:.gpu.to b:-10 0 4 5 6 20
q).gpu.from .gpu.bin[A;B]
-1 0 2 2 3 5

.gpu.bin uses a binary search algorithm, which is generally more efficient on large data than a linear search. The items of x must be sorted ascending, although .gpu.bin does not verify this property.
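
Because the sort order is not checked, one option is to sort on the GPU first and test for the `s attribute before searching. This is a sketch under the assumption that .gpu.asc and .gpu.attr behave as described on this page; sbin is a hypothetical helper, not part of the module.

```q
a:8 2 10 0 6 4                 / unsorted CPU vector (illustrative data)
S:.gpu.asc .gpu.to a           / sorted GPU copy; .gpu.asc sets `s
/ sbin (hypothetical): refuse to search unless x carries the `s attribute
sbin:{[x;y] if[not `s~.gpu.attr x; '"x is not flagged as sorted"]; .gpu.bin[x;y]}
.gpu.from sbin[S;.gpu.to 5 6 7]
/ CPU cross-check with vanilla bin
(asc a) bin 5 6 7              / 2 3 3
```

The guard only inspects the attribute flag, so a vector that happens to be sorted but does not carry `s is rejected as well.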

Tables

Where:

  • x is a GPU-resident table of n columns
  • y is a GPU-resident table with the same schema

returns, for each row of y, the index of the last row of x for which

  • the first n-1 columns each match the first n-1 columns of the corresponding row of y, and
  • the last column is not greater than the last column of the corresponding row of y.

If no items match the criteria, either because there are no rows that match in the first n-1 columns, or because the last value is smaller than the last value in the first such row, 0N is returned.

.gpu.aj and .gpu.bin are currently limited to one or two columns.

.gpu.bin is significantly faster when GPU `g is applied to the exact-match column; see .gpu.group.

q)A:.gpu.to a:([]a:`p`p`p`q`q`q;b:0 2 4 0 2 4)
q)B:.gpu.to b:([]a:`q;b:-1 1 3 5)
q).gpu.from .gpu.bin[A;B]
0N 3 4 5

To use bin with a table, the last column need not be sorted overall, but it needs to be sorted within the equivalence classes defined by the first n-1 columns (as shown in the previous example).

count

Count of a CPU or GPU-resident object

.gpu.count x

Where x is:

  • a vector, returns the number of elements
  • a dictionary, returns the number of keys
  • a table, returns the number of rows
  • anything else, returns 1

Vanilla count returns 1 for GPU-resident foreigns regardless of their true size.

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.count each (a;d;t)
5 2 5
q).gpu.count each (A;D;T)
5 2 5

q)count each (a;d;t)
5 2 5
q)count each (A;D;T)
1 2 1

drop

Drop specified keys or columns

.gpu.drop[x;y]

Where x is a symbol atom or vector, returns a dictionary/table y without the specified keys/columns.

Note

Unlike the standard drop, .gpu.drop does not currently support dropping by count; it only supports dropping by key or column name.

q)T:.gpu.to t:([]a:`x`y`z;b:0 2 4;c:1 2 3)
q)T
+`a`b`c!(foreign;foreign;foreign)
q).gpu.drop[`b] T
+`a`c!(foreign;foreign)
q).gpu.drop[`b] t
a c
---
x 1
y 2
z 3
q).gpu.drop[`a`b] t
c
-
1
2
3
q).gpu.drop[`a`b] T
+(,`c)!,foreign

from

Retrieve data from the GPU

.gpu.from x

Where x is any CPU, GPU or mixed-residency object, returns x moved to the CPU.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.from A
0 1 2 3 4
q).gpu.from a
0 1 2 3 4
q).gpu.from D
a| 0  1  2  3  4
b| 10 11 12 13 14
q).gpu.from T
a b
----
0 10
1 11
2 12
3 13
4 14

gather

Gather specified indices of a list

.gpu.gather[x;y]

Where:

  • x is a GPU-resident vector
  • y is a GPU-resident vector of indices into x

returns the elements of x at each index y.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q)B:.gpu.to b:2 4 6 8
q)a b
7 0 9 2
q).gpu.from .gpu.gather[A;B]
7 0 9 2

q).gpu.from .gpu.asc[A]
`s#0 1 2 4 5 7 7 9 9 9
q).gpu.from .gpu.gather[A] .gpu.iasc A
0 1 2 4 5 7 7 9 9 9

gdev

Get current GPU

.gpu.gdev x

Where x is:

  • null, returns the currently selected device index
  • a GPU-resident object, returns the device index on which it resides

q).gpu.sdev 0
q).gpu.gdev[]
0
q)A:.gpu.to 1 2 3
q).gpu.sdev 2
q).gpu.gdev[]
2
q).gpu.gdev A
0

group

Apply GPU grouped attribute

.gpu.group x

Where x is a GPU-resident vector, returns x with the `g attribute set.

.gpu.aj, .gpu.bin and .gpu.group are currently limited to long, timestamp or timespan columns.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q)A:.gpu.to a:10?10
q)a
7 4 7 1 0 5 9 9 2 9
q)G:.gpu.group A
q).gpu.attr G
`g
q).gpu.from G
7 4 7 1 0 5 9 9 2 9     / `g not preserved on .gpu.from
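
If CPU-side code relies on the grouped attribute after the data comes back, it can be reapplied with vanilla `g#. A minimal sketch, assuming the module is loaded as .gpu:

```q
G:.gpu.group .gpu.to 7 4 7 1 0 5 9 9 2 9
r:.gpu.from G      / per the note above, `g is not carried back
r:`g#r             / reapply the attribute on the CPU
attr r             / `g
```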

xgroup

Apply GPU grouped attribute to specific columns/keys

.gpu.xgroup[x;y]

Where x is a symbol atom/list specifying GPU-resident keys/columns of a dictionary/table y, returns y with specified columns/keys grouped.

q)T:.gpu.to t:flip `a`b!(1 2 3 4;10 20 30 40)
q).gpu.meta T
c| t f a r
-| ---------
a| j     gpu
b| j     gpu
q).gpu.meta .gpu.xgroup[`a] T
c| t f a r
-| ---------
a| j   g gpu
b| j     gpu

mdev

Get memory info for current GPU

.gpu.mdev[]

Returns memory utilisation for the currently selected device in bytes, similar to .Q.w[].

q).gpu.mdev[]
used| 512
heap| 33554432
peak| 33554432
mphy| 23661248512
free| 23426301952

q)a:.gpu.mdev[]
q).gpu.mdev[]-a
used| 0
heap| 0
peak| 0
mphy| 0
free| 0
q)A:.gpu.to til 1000
q).gpu.mdev[]-a
used| 8064
heap| 0
peak| 0
mphy| 0
free| 0
q).gpu.sdev 1
q).gpu.mdev[]
used| 0
heap| 0
peak| 0
mphy| 23661248512
free| 23459856384

meta

Metadata of a table, including GPU-residency info

.gpu.meta x

Where x is a CPU, GPU or mixed-residency table, returns the metadata including a residency column.

Vanilla meta returns no metadata for GPU-resident foreigns regardless of their true type or attributes.

q)syms:-10?`3
q)q:`time xasc([]time:2026.07.10D12:32+5?00:00:00;sym:5?syms;px:95+5?10)
q).gpu.meta q
c   | t f a r  
----| ---------
time| p   s cpu
sym | s     cpu
px  | j     cpu
q).gpu.meta .gpu.to q
c   | t f a r  
----| ---------
time| p   s gpu
sym | s     gpu
px  | j     gpu
q).gpu.meta .gpu.xto[`time`sym] q
c   | t f a r  
----| ---------
time| p   s gpu
sym | s     gpu
px  | j     cpu

q)/ vanilla meta returns no information for GPU-resident columns
q)meta .gpu.xto[`time`sym] q
c   | t f a
----| -----
time|      
sym |      
px  | j  

ndev

Count of available GPUs

.gpu.ndev[]

Returns the number of available GPUs. A process can be restricted to a subset of devices by setting CUDA_VISIBLE_DEVICES in its environment.

q).gpu.ndev[]
4
q){.gpu.sdev x;.gpu.gdev[]} peach til 4
0 1 2 3

q) / in another process limited to devices 1 and 3
q)getenv `CUDA_VISIBLE_DEVICES
"1,3"
q).gpu.ndev[]
2
q){.gpu.sdev x;.gpu.mdev[]} peach til .gpu.ndev[]
used heap peak mphy        free
--------------------------------------
0    0    0    23661248512 23459856384
0    0    0    23661248512 23459856384

sdev

Set current GPU

.gpu.sdev x

Where x is a long from 0 to .gpu.ndev[]-1, selects the device with that index.

.gpu.sdev takes an index into the available device list. For a process with CUDA_VISIBLE_DEVICES='3,1', .gpu.sdev 0 selects the device with system ID 3.

q)getenv `CUDA_VISIBLE_DEVICES
"3,1"
q).gpu.gdev[]
0
q)A:.gpu.to til 1000
q).gpu.sdev 1
q).gpu.gdev[]
1
q)B:.gpu.to til 1000
q).gpu.gdev each (A;B)
0 1
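
Since valid indices run from 0 to .gpu.ndev[]-1, a small wrapper can check the range before switching device. This is only a sketch; setdev is a hypothetical helper and the error text is arbitrary.

```q
/ setdev (hypothetical): validate the index, select the device, return the current index
setdev:{[i] if[not i within (0;.gpu.ndev[]-1); '"device index out of range"]; .gpu.sdev i; .gpu.gdev[]}
setdev 0
```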

select

Functional select

The GPU module provides a select function that mirrors the functional-select API ?[t;c;b;a], except that t is a table whose columns reside on the GPU.

.gpu.select[t;c;b;a]

Where:

  • t is a table with all columns on device
  • c is the Where phrase, a list of constraints
  • b is the By phrase
  • a is the Select phrase

Currently all columns of t must be resident on the GPU.

Run parse on a qSQL statement to see how to represent it as a functional select:

q)parse "select p_min:min p, p_max:max p, p_avg:avg p by sym from t"
?
`t
()
(,`sym)!,`sym
`p_min`p_max`p_avg!((min;`p);(max;`p);(avg;`p))

q)t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
q)T:.gpu.to t

q)/ select all columns
q).gpu.from .gpu.select[T;();0b;()]
sym  p   size
-------------
AAPL 150 100
MSFT 300 200
AAPL 152 150
MSFT 298 300
AAPL 155 250

q)/ select specific columns
q).gpu.from .gpu.select[T;();0b;`sym`p!(`sym;`p)]
sym  p
--------
AAPL 150
MSFT 300
AAPL 152
MSFT 298
AAPL 155

q)/ where sym = `AAPL - enlist the symbol to distinguish it from a column name
q).gpu.from .gpu.select[T;enlist(=;`sym;enlist`AAPL);0b;()]
sym  p   size
-------------
AAPL 150 100
AAPL 152 150
AAPL 155 250

q)/ multiple where conditions are ANDed
q).gpu.from .gpu.select[T;((=;`sym;enlist`AAPL);(>;`p;151));0b;()]
sym  p   size
-------------
AAPL 152 150
AAPL 155 250

q)/ aggregation by group
q).gpu.from .gpu.select[T;();enlist[`sym]!enlist`sym;`p_min`p_max`p_avg!((min;`p);(max;`p);(avg;`p))]
sym  p_min p_max p_avg
-------------------------
AAPL 150   155   152.3333
MSFT 298   300   299

q)/ computed column: avg price * size by sym
q).gpu.from .gpu.select[T;();enlist[`sym]!enlist`sym;enlist[`avgpxsz]!enlist(avg;(*;`p;`size))]
sym  avgpxsz
-------------
AAPL 25516.67
MSFT 74700

q)/ unary ops in the select phrase
q).gpu.from .gpu.select[T;();0b;enlist[`logp]!enlist(log;`p)]
logp
--------
5.010635
5.703782
5.023881
5.697093
5.043425

Where clause

  • c is a list of parse-tree expressions, each evaluating to a boolean vector.
  • Expressions can use supported binary ops, unary ops, and aggregates.
  • () (empty list) means no constraints.
  • Multiple conditions are ANDed: ((>;`p;100);(<;`p;200)).

By clause

  • b is a dictionary mapping output key names to input column names: enlist[`sym]!enlist`sym.
  • 0b means no grouping.

The result of a select with a By clause is currently a simple table, but will likely change to a keyed table in the future.

The output is usually sorted by the key columns; this is currently not true for symbol key columns.

When a By clause is used, the Select phrase must contain one or more aggregates.
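
Code that expects the keyed, sorted result of a qSQL select ... by can normalise the output on the CPU after retrieval. The sketch below assumes the By result is returned as a simple table, as described above; the names r and k are illustrative.

```q
t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
T:.gpu.to t
/ grouped aggregate on the GPU, copied back as a simple table
r:.gpu.from .gpu.select[T;();enlist[`sym]!enlist`sym;enlist[`p_max]!enlist(max;`p)]
/ sort by the By column, then key on it, using vanilla xasc and xkey
k:`sym xkey `sym xasc r
k`MSFT        / look up one group by key
```

Sorting explicitly means the result does not depend on the order in which symbol keys come back from the GPU.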

Select clause

  • a is a dictionary mapping output column names to input columns or parse-tree expressions.
  • () selects all columns.
  • Bare symbols in dictionary values are column references. To reference a literal symbol, enlist it.

Supported operations

Not all types are supported for all operations, but in general the supported types are scalars and 1D vectors of basic atomic types.

  • Binary ops: =, <>, <, >, <=, >=, +, -, *, %, |, &
  • Unary ops: abs, log, exp, sin, asin, cos, acos, tan, atan, floor, ceiling
  • Aggregate ops: sum, min, max, avg, count

In parse-tree form, <> is (';~:;=), >= is (';~:;<), and <= is (';~:;>).

Results may deviate slightly from base kdb+. For more information see CUDA Mathematical Functions.

There is a known bug in integer summation causing divergence from base kdb+ when large values or infinities cause overflow.

Performance note

The Where clause does not filter rows until the final stage of the select operation. If the Where clause will considerably reduce the size of the table, consider applying it in a separate select statement first. This keeps allocation sizes predictable and allows Where conditions to be evaluated concurrently.
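
The two-stage approach can be sketched as follows. It assumes that a select with only a Where clause returns a GPU-resident table that can be passed straight to a second .gpu.select; F is an illustrative name.

```q
t:([]sym:`AAPL`MSFT`AAPL`MSFT`AAPL;p:150 300 152 298 155;size:100 200 150 300 250)
T:.gpu.to t
/ stage 1: apply the Where clause on its own to shrink the table
F:.gpu.select[T;enlist(=;`sym;enlist`AAPL);0b;()]
/ stage 2: aggregate over the smaller, filtered table
.gpu.from .gpu.select[F;();enlist[`sym]!enlist`sym;enlist[`p_avg]!enlist(avg;`p)]
```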

sublist

Head, tail or slice of a list

.gpu.sublist[x;y]

Where x is:

  • a positive long atom, returns the first x items of y, or all of y if it has fewer than x items.
  • a negative long atom, returns the last abs x items of y, or all of y if it has fewer.
  • a long vector, returns up to x[1] items from y, starting at item x[0]. The result contains no more items than are available in y.

Negative starting points are not yet implemented for GPU-resident vectors.

q)/ Head or tail, similar to .gpu.take but limited to the list length
q)A:.gpu.to a:2 3 5 7 11
q).gpu.sublist[3;A]
foreign
q).gpu.from .gpu.sublist[3;A]
2 3 5
q).gpu.from .gpu.sublist[10;A]
2 3 5 7 11
q).gpu.from .gpu.take[10;A]
2 3 5 7 11 2 3 5 7 11

q)syms:-10?`3
q)T:.gpu.to t:`time xasc([]time:2026.07.10D12:32+5?00:00:00;sym:5?syms;px:95+5?10)
q)t
time                          sym px 
-------------------------------------
2026.07.10D18:48:49.000000000 adl 104
2026.07.11D03:15:53.000000000 cgl 97 
2026.07.11D04:57:05.000000000 hmc 104
2026.07.11D06:32:25.000000000 fjk 96 
2026.07.11D09:26:01.000000000 cgl 104
q).gpu.from .gpu.sublist[-2] T
time                          sym px 
-------------------------------------
2026.07.11D06:32:25.000000000 fjk 96 
2026.07.11D09:26:01.000000000 cgl 104


q)/ Slice
q)A:.gpu.to a:2 3 5 7 11
q).gpu.from .gpu.sublist[1 2;A]
3 5

q) / Negative start for GPU nyi
q)a
2 3 5 7 11
q)sublist[-2 5;a]
0N 0N 2 3 5
q).gpu.sublist[-2 5;a]
0N 0N 2 3 5
q).gpu.sublist[-2 5;A]
'nyi

sync

Wait for outstanding GPU work

.gpu.sync[]

Blocks the calling CPU thread until all work on the currently selected GPU has completed. Returns null.

.gpu.sync is useful to avoid 'wsfull when the CPU provides work faster than the GPU can consume it, or when more explicit control of memory is needed.

q)a:til 1000000
q)-16!a
1i                  / only 1 reference to a
q)A:.gpu.to a       / returns before the copy to device is completed
q)-16!a             / the gpu stream owns a reference to a to facilitate this
2i
q)r:.gpu.from A     / implicit sync before return to q, but .gpu.from does no destructor work.
q)-16!a             / gpu stream still owns a reference to a, despite the copy having completed
2i

q).gpu.sync[]       / explicit sync, waits for the stream to complete then completes any destructor work
q)-16!a
1i

q)/ Most other apis will run any outstanding destructor work but without syncing.
q)a:til 1000000; b:til 1000000  / Get new objects, both rc=1
q)A:.gpu.to a; B:.gpu.to b      / Both return A & B before either copy is complete:
q)-16!a
2i
q)-16!b
2i
q)B:.gpu.to b                   / Both copies have completed, doing another doesn't sync but does clean up the refs.
q)-16!a
1i
q)-16!b
2i
q).gpu.sync[]                   / And a final sync cleans up the latest ref.
q)-16!a
1i
q)-16!b
1i

q)a:til 1000000; b:til 1000000  / Get new objects, both rc=1
q)A:.gpu.to a; .gpu.sync[]; B:.gpu.to b / Ensures A is done first, second copy can clean up its ref.
q)-16!a
1i
q)-16!b
2i
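
In a batch loop, the same idea can be applied by syncing after each unit of work, so that the stream's references to source data are released before the next transfer is queued. The following is a sketch only; the chunk size and the helper name step are arbitrary, and whether a sync per chunk is the right granularity depends on the workload.

```q
/ illustrative workload: ten chunks of 100000 random longs
chunks:0N 100000#1000000?1000
/ step (hypothetical): transfer, sort on the GPU, copy back, then sync
step:{[c] r:.gpu.from .gpu.asc .gpu.to c; .gpu.sync[]; r}
res:step each chunks
```

Syncing less often, for example every few chunks, trades some memory headroom for fewer stalls on the CPU thread.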

take

Select leading or trailing items from a list or dictionary, named entries from a dictionary, or named columns from a table

.gpu.take[x;y]

Where x is:

  • a positive long atom, returns the first x items of y
  • a negative long atom, returns the last abs x items of y
  • a symbol list, returns the specified keys/columns of a dictionary/table y

q)A:.gpu.to a:til 5
q)D:.gpu.to d:`a`b!(a;a+10)
q)T:.gpu.to t:flip d

q).gpu.from .gpu.take[2;A]
0 1
q).gpu.from .gpu.take[2;D]
a| 0  1  2  3  4
b| 10 11 12 13 14
q).gpu.from .gpu.take[`b`a;D]
b| 10 11 12 13 14
a| 0  1  2  3  4 
q).gpu.from .gpu.take[2;T]
a b
----
0 10
1 11
q).gpu.from .gpu.take[-2;T]
a b
----
3 13
4 14
q).gpu.from .gpu.take[`b;T]
b
--
10
11
12
13
14

to

Send data to the GPU

.gpu.to x

Where x is any CPU, GPU or mixed-residency object, returns a copy of x with any non-atom moved to the GPU.

All attributes are preserved on .gpu.to, while all but `s are lost on .gpu.from.

q).gpu.to 10
10
q).gpu.to 10 20 30
foreign
q).gpu.to `a`b!(10 20;30 40)
a| foreign
b| foreign
q).gpu.to flip `a`b!(10 20;30 40)
+`a`b!(foreign;foreign)
q).gpu.from .gpu.to flip `a`b!(10 20;30 40)
a  b
-----
10 30
20 40

xto

Send specific columns/keys to the GPU

.gpu.xto[x;y]

Where x is a symbol atom/list specifying keys/columns of a dictionary/table y, returns y with specified non-atoms moved to the GPU.

q).gpu.xto[`a] `a`b!(10 20;30 40)
a| foreign
b| 30 40
q).gpu.xto[`a`c] `a`b!(10 20;30 40)
a| foreign
b| 30 40
q).gpu.xto[`a] flip `a`b!(10 20;30 40)
+`a`b!(foreign;30 40)
q).gpu.meta .gpu.xto[`a] flip `a`b!(10 20;30 40)
c| t f a r
-| ---------
a| j     gpu
b| j     cpu

type

Type of a GPU-resident object

.gpu.type x

Where x is any CPU or GPU-resident object, returns its type as a short.

Vanilla type returns 112h for GPU-resident foreigns regardless of their true underlying type.

q)A:.gpu.to a:1 2 3
q).gpu.type A
7h
q).gpu.type a
7h
q)type a
7h
q)type A
112h

q)B:.gpu.to b:1 2 3f
q).gpu.type B
9h
q)type B
112h