Function reference

These functions are exposed within the .arrowkdb namespace, allowing users to convert data between Arrow/Parquet and kdb+.

KxSystems/arrowkdb

Datatype constructors

  • dt.na - Create a NULL datatype
  • dt.boolean - Create a boolean datatype
  • dt.int8 - Create an int8 datatype
  • dt.int16 - Create an int16 datatype
  • dt.int32 - Create an int32 datatype
  • dt.int64 - Create an int64 datatype
  • dt.uint8 - Create a uint8 datatype
  • dt.uint16 - Create a uint16 datatype
  • dt.uint32 - Create a uint32 datatype
  • dt.uint64 - Create a uint64 datatype
  • dt.float16 - Create a float16 (represented as uint16_t) datatype
  • dt.float32 - Create a float32 datatype
  • dt.float64 - Create a float64 datatype
  • dt.time32 - Create a 32-bit time (units since midnight with specified granularity) datatype
  • dt.time64 - Create a 64-bit time (units since midnight with specified granularity) datatype
  • dt.timestamp - Create a 64-bit timestamp (units since UNIX epoch with specified granularity) datatype
  • dt.date32 - Create a 32-bit date (days since UNIX epoch) datatype
  • dt.date64 - Create a 64-bit date (milliseconds since UNIX epoch) datatype
  • dt.month_interval - Create a 32-bit interval (described as a number of months, similar to YEAR_MONTH in SQL) datatype
  • dt.day_time_interval - Create a 64-bit interval (described as a number of days and milliseconds, similar to DAY_TIME in SQL) datatype
  • dt.duration - Create a 64-bit duration (measured in units of specified granularity) datatype
  • dt.binary - Create a variable length bytes datatype
  • dt.utf8 - Create a UTF8 variable length string datatype
  • dt.large_binary - Create a large (64-bit offsets) variable length bytes datatype
  • dt.large_utf8 - Create a large (64-bit offsets) UTF8 variable length string datatype
  • dt.fixed_size_binary - Create a fixed width bytes datatype
  • dt.decimal128 - Create a 128-bit integer (with precision and scale in two's complement) datatype
  • dt.list - Create a list datatype, specified in terms of its child datatype
  • dt.large_list - Create a large (64-bit offsets) list datatype, specified in terms of its child datatype
  • dt.fixed_size_list - Create a fixed size list datatype, specified in terms of its child datatype
  • dt.map - Create a map datatype, specified in terms of its key and item child datatypes
  • dt.struct - Create a struct datatype, specified in terms of the field identifiers of its children
  • dt.sparse_union - Create a sparse union datatype, specified in terms of the field identifiers of its children
  • dt.dense_union - Create a dense union datatype, specified in terms of the field identifiers of its children
  • dt.dictionary - Create a dictionary datatype specified in terms of its value and index datatypes, similar to pandas categorical
  • dt.inferDatatype - Infer and construct a datatype from a kdb+ list

Datatype inspection

  • dt.datatypeName - Return the base name of a datatype, ignoring any parameters or child datatypes/fields
  • dt.getTimeUnit - Return the TimeUnit of a time32/time64/timestamp/duration datatype
  • dt.getByteWidth - Return the byte_width of a fixed_size_binary datatype
  • dt.getListSize - Return the list_size of a fixed_size_list datatype
  • dt.getPrecisionScale - Return the precision and scale of a decimal128 datatype
  • dt.getListDatatype - Return the child datatype identifier of a list/large_list/fixed_size_list datatype
  • dt.getMapDatatypes - Return the key and item child datatype identifiers of a map datatype
  • dt.getDictionaryDatatypes - Return the value and index child datatype identifiers of a dictionary datatype
  • dt.getChildFields - Return the list of child field identifiers of a struct/sparse_union/dense_union datatype

Datatype management

  • dt.printDatatype - Display user-readable information for a datatype, including parameters and nested child datatypes
  • dt.listDatatypes - Return the list of identifiers for all datatypes held in the DatatypeStore
  • dt.removeDatatype - Remove a datatype from the DatatypeStore
  • dt.equalDatatypes - Check if two datatypes are logically equal, including parameters and nested child datatypes

Field constructor

  • fd.field - Create a field instance from its name and datatype

Field inspection

  • fd.fieldName - Return the name of a field
  • fd.fieldDatatype - Return the datatype of a field

Field management

  • fd.printField - Display user-readable information for a field, including name and datatype
  • fd.listFields - Return the list of identifiers for all fields held in the FieldStore
  • fd.removeField - Remove a field from the FieldStore
  • fd.equalFields - Check if two fields are logically equal, including names and datatypes

Schema constructors

  • sc.schema - Create a schema instance from a list of field identifiers
  • sc.inferSchema - Infer and construct a schema based on a kdb+ table

Schema inspection

  • sc.schemaFields - Return the list of field identifiers used by a schema

Schema management

  • sc.printSchema - Display user-readable information for a schema, including its fields and their order
  • sc.listSchemas - Return the list of identifiers for all schemas held in the SchemaStore
  • sc.removeSchema - Remove a schema from the SchemaStore
  • sc.equalSchemas - Check if two schemas are logically equal, including their fields and the fields' order

Array data

  • ar.prettyPrintArray - Convert a kdb+ list to an Arrow array and pretty print the array
  • ar.prettyPrintArrayFromList - Convert a kdb+ list to an Arrow array and pretty print the array, inferring the datatype from the kdb+ list type

Table data

  • tb.prettyPrintTable - Convert a kdb+ mixed list of array data to an Arrow table and pretty print the table
  • tb.prettyPrintTableFromTable - Convert a kdb+ table to an Arrow table and pretty print the table, inferring the schema from the kdb+ table structure

Parquet files

  • pq.writeParquet - Convert a kdb+ mixed list of array data to an Arrow table and write to a Parquet file
  • pq.writeParquetFromTable - Convert a kdb+ table to an Arrow table and write to a Parquet file, inferring the schema from the kdb+ table structure
  • pq.readParquetSchema - Read the schema from a Parquet file
  • pq.readParquetData - Read an Arrow table from a Parquet file and convert to a kdb+ mixed list of array data
  • pq.readParquetColumn - Read a single column from a Parquet file and convert to a kdb+ list
  • pq.readParquetToTable - Read an Arrow table from a Parquet file and convert to a kdb+ table

Arrow IPC files

  • ipc.writeArrow - Convert a kdb+ mixed list of array data to an Arrow table and write to an Arrow file
  • ipc.writeArrowFromTable - Convert a kdb+ table to an Arrow table and write to an Arrow file, inferring the schema from the kdb+ table structure
  • ipc.readArrowSchema - Read the schema from an Arrow file
  • ipc.readArrowData - Read an Arrow table from an Arrow file and convert to a kdb+ mixed list of array data
  • ipc.readArrowToTable - Read an Arrow table from an Arrow file and convert to a kdb+ table

Arrow IPC streams

  • ipc.serializeArrow - Convert a kdb+ mixed list of array data to an Arrow table and serialize to an Arrow stream
  • ipc.serializeArrowFromTable - Convert a kdb+ table to an Arrow table and serialize to an Arrow stream, inferring the schema from the kdb+ table structure
  • ipc.parseArrowSchema - Parse the schema from an Arrow stream
  • ipc.parseArrowData - Parse an Arrow table from an Arrow stream and convert to a kdb+ mixed list of array data
  • ipc.parseArrowToTable - Parse an Arrow table from an Arrow stream and convert to a kdb+ table

Utilities

  • util.buildInfo - Return build information regarding the Arrow library in use

Datatype constructors

dt.na

Create a NULL datatype

.arrowkdb.dt.na[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.na[]]
null
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.na[];(();();());::]
3 nulls

dt.boolean

Create a boolean datatype

.arrowkdb.dt.boolean[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.boolean[]]
bool
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.boolean[];(010b);::]
[
  false,
  true,
  false
]

dt.int8

Create an int8 datatype

.arrowkdb.dt.int8[]
kdb+ type 10h can be written to an int8 array

This is supported on the writing path only. Reading from an int8 array returns a 4h list

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.int8[]]
int8
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.int8[];(0x102030);::]
[
  16,
  32,
  48
]
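
As an illustration of the 10h note above, a char list should also be accepted on the writing path, with each character converted to its int8 code (a hedged sketch, not taken from the library's own examples):

q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.int8[];"abc";::]
[
  97,
  98,
  99
]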

dt.int16

Create an int16 datatype

.arrowkdb.dt.int16[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.int16[]]
int16
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.int16[];(11 22 33h);::]
[
  11,
  22,
  33
]

dt.int32

Create an int32 datatype

.arrowkdb.dt.int32[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.int32[]]
int32
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.int32[];(11 22 33i);::]
[
  11,
  22,
  33
]

dt.int64

Create an int64 datatype

.arrowkdb.dt.int64[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.int64[]]
int64
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.int64[];(11 22 33j);::]
[
  11,
  22,
  33
]

dt.uint8

Create a uint8 datatype

.arrowkdb.dt.uint8[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.uint8[]]
uint8
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.uint8[];(0x102030);::]
[
  16,
  32,
  48
]

dt.uint16

Create a uint16 datatype

.arrowkdb.dt.uint16[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.uint16[]]
uint16
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.uint16[];(11 22 33h);::]
[
  11,
  22,
  33
]

dt.uint32

Create a uint32 datatype

.arrowkdb.dt.uint32[]

Returns the datatype identifier

uint32 datatype is supported by Parquet v2.0 only, being changed to int64 otherwise

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.uint32[]]
uint32
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.uint32[];(11 22 33i);::]
[
  11,
  22,
  33
]

dt.uint64

Create a uint64 datatype

.arrowkdb.dt.uint64[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.uint64[]]
uint64
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.uint64[];(11 22 33j);::]
[
  11,
  22,
  33
]

dt.float16

Create a float16 (represented as uint16_t) datatype

.arrowkdb.dt.float16[]

Returns the datatype identifier

float16 datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.float16[]]
halffloat
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.float16[];(11 22 33h);::]
[
  11,
  22,
  33
]

dt.float32

Create a float32 datatype

.arrowkdb.dt.float32[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.float32[]]
float
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.float32[];(1.1 2.2 3.3e);::]
[
  1.1,
  2.2,
  3.3
]

dt.float64

Create a float64 datatype

.arrowkdb.dt.float64[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.float64[]]
double
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.float64[];(1.1 2.2 3.3f);::]
[
  1.1,
  2.2,
  3.3
]

dt.time32

Create a 32-bit time (units since midnight with specified granularity) datatype

.arrowkdb.dt.time32[time_unit]

Where time_unit is the time unit string: SECOND or MILLI

returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.time32[`MILLI]]
time32[ms]
q).arrowkdb.dt.getTimeUnit[.arrowkdb.dt.time32[`MILLI]]
`MILLI
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.time32[`MILLI];(01:00:00.100 02:00:00.200 03:00:00.300);::]
[
  01:00:00.100,
  02:00:00.200,
  03:00:00.300
]

dt.time64

Create a 64-bit time (units since midnight with specified granularity) datatype

.arrowkdb.dt.time64[time_unit]

Where time_unit is the time unit string: MICRO or NANO

returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.time64[`NANO]]
time64[ns]
q).arrowkdb.dt.getTimeUnit[.arrowkdb.dt.time64[`NANO]]
`NANO
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.time64[`NANO];(0D01:00:00.100000001 0D02:00:00.200000002 0D03:00:00.300000003);::]
[
  01:00:00.100000001,
  02:00:00.200000002,
  03:00:00.300000003
]

dt.timestamp

Create a 64-bit timestamp (units since UNIX epoch with specified granularity) datatype

.arrowkdb.dt.timestamp[time_unit]

Where time_unit is the time unit string: SECOND, MILLI, MICRO or NANO

returns the datatype identifier

timestamp(nano) datatype is supported by Parquet v2.0 only, being mapped to timestamp(milli) otherwise

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.timestamp[`NANO]]
timestamp[ns]
q).arrowkdb.dt.getTimeUnit[.arrowkdb.dt.timestamp[`NANO]]
`NANO
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.timestamp[`NANO];(2001.01.01D00:00:00.100000001 2002.02.02D00:00:00.200000002 2003.03.03D00:00:00.300000003);::]
[
  2001-01-01 00:00:00.100000001,
  2002-02-02 00:00:00.200000002,
  2003-03-03 00:00:00.300000003
]

dt.date32

Create a 32-bit date (days since UNIX epoch) datatype

.arrowkdb.dt.date32[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.date32[]]
date32[day]
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.date32[];(2001.01.01 2002.02.02 2003.03.03);::]
[
  2001-01-01,
  2002-02-02,
  2003-03-03
]

dt.date64

Create a 64-bit date (milliseconds since UNIX epoch) datatype

.arrowkdb.dt.date64[]

Returns the datatype identifier

date64 datatype is changed to date32(days) by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.date64[]]
date64[ms]
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.date64[];(2001.01.01D00:00:00.000000000 2002.02.02D00:00:00.000000000 2003.03.03D00:00:00.000000000);::]
[
  2001-01-01,
  2002-02-02,
  2003-03-03
]

dt.month_interval

Create a 32-bit interval (described as a number of months, similar to YEAR_MONTH in SQL) datatype

.arrowkdb.dt.month_interval[]

Returns the datatype identifier

month_interval datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.month_interval[]]
month_interval
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.month_interval[];(2001.01m,2002.02m,2003.03m);::]
[
  12,
  25,
  38
]

dt.day_time_interval

Create a 64-bit interval (described as a number of days and milliseconds, similar to DAY_TIME in SQL) datatype

.arrowkdb.dt.day_time_interval[]

Returns the datatype identifier

day_time_interval datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.day_time_interval[]]
day_time_interval
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.day_time_interval[];(0D01:00:00.100000000 0D02:00:00.200000000 0D03:00:00.300000000);::]
[
  0d3600100ms,
  0d7200200ms,
  0d10800300ms
]

dt.duration

Create a 64-bit duration (measured in units of specified granularity) datatype

.arrowkdb.dt.duration[time_unit]

Where time_unit is the time unit string: SECOND, MILLI, MICRO or NANO

returns the datatype identifier

duration datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.duration[`NANO]]
duration[ns]
q).arrowkdb.dt.getTimeUnit[.arrowkdb.dt.duration[`NANO]]
`NANO
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.duration[`NANO];(0D01:00:00.100000000 0D02:00:00.200000000 0D03:00:00.300000000);::]
[
  3600100000000,
  7200200000000,
  10800300000000
]

dt.binary

Create a variable length bytes datatype

.arrowkdb.dt.binary[]

Returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.binary[]]
binary
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.binary[];(enlist 0x11;0x2222;0x333333);::]
[
  11,
  2222,
  333333
]

dt.utf8

Create a UTF8 variable length string datatype

.arrowkdb.dt.utf8[]

Returns the datatype identifier

kdb+ type 11h can be written to a utf8 array

This is supported on the writing path only. Reading from a utf8 array returns a mixed list of 10h

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.utf8[]]
string
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.utf8[];(enlist "a";"bb";"ccc");::]
[
  "a",
  "bb",
  "ccc"
]
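
Following the 11h note above, a symbol list should likewise be accepted on the writing path, with each symbol converted to a UTF8 string (a hedged sketch):

q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.utf8[];`aa`bb`cc;::]
[
  "aa",
  "bb",
  "cc"
]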

dt.large_binary

Create a large (64-bit offsets) variable length bytes datatype

.arrowkdb.dt.large_binary[]

Returns the datatype identifier

large_binary datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.large_binary[]]
large_binary
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.large_binary[];(enlist 0x11;0x2222;0x333333);::]
[
  11,
  2222,
  333333
]

dt.large_utf8

Create a large (64-bit offsets) UTF8 variable length string datatype

.arrowkdb.dt.large_utf8[]

Returns the datatype identifier

large_utf8 datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.large_utf8[]]
large_string
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.large_utf8[];(enlist "a";"bb";"ccc");::]
[
  "a",
  "bb",
  "ccc"
]

dt.fixed_size_binary

Create a fixed width bytes datatype

.arrowkdb.dt.fixed_size_binary[byte_width]

Where byte_width is the int32 fixed size byte width (each value in the array occupies the same number of bytes).

returns the datatype identifier

kdb+ type 2h can be written to a fixed_size_binary(16) array

This is supported on the writing path only. Reading from a fixed_size_binary array returns a mixed list of 4h

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.fixed_size_binary[2i]]
fixed_size_binary[2]
q).arrowkdb.dt.getByteWidth[.arrowkdb.dt.fixed_size_binary[2i]]
2i
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.fixed_size_binary[2i];(0x1111;0x2222;0x3333);::]
[
  1111,
  2222,
  3333
]
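
Per the 2h note above, a GUID list should convert to a fixed_size_binary(16) array on the writing path. A hedged sketch (output omitted since the GUIDs are generated randomly):

q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.fixed_size_binary[16i];3?0Ng;::]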

dt.decimal128

Create a 128-bit integer (with precision and scale in two's complement) datatype

.arrowkdb.dt.decimal128[precision;scale]

Where:

  • precision is the int32 precision width
  • scale is the int32 scaling factor

returns the datatype identifier

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.decimal128[38i;2i]]
decimal(38, 2)
q).arrowkdb.dt.getPrecisionScale[.arrowkdb.dt.decimal128[38i;2i]]
38
2
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.decimal128[38i;2i];(0x00000000000000000000000000000000; 0x01000000000000000000000000000000; 0x00000000000000000000000000000080);::]
[
  0.00,
  0.01,
  -1701411834604692317316873037158841057.28
]
q) // With little endian two's complement the decimal128 values are 0, the minimum positive value and the maximum negative value

dt.list

Create a list datatype, specified in terms of its child datatype

.arrowkdb.dt.list[child_datatype_id]

Where child_datatype_id is the identifier of the list’s child datatype

returns the datatype identifier

q)list_datatype:.arrowkdb.dt.list[.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype[list_datatype]
list<item: int64>
q).arrowkdb.dt.printDatatype[.arrowkdb.dt.getListDatatype[list_datatype]]
int64
q).arrowkdb.ar.prettyPrintArray[list_datatype;((enlist 1);(2 2);(3 3 3));::]
[
  [
    1
  ],
  [
    2,
    2
  ],
  [
    3,
    3,
    3
  ]
]

dt.large_list

Create a large (64-bit offsets) list datatype, specified in terms of its child datatype

.arrowkdb.dt.large_list[child_datatype_id]

Where child_datatype_id is the identifier of the list’s child datatype

returns the datatype identifier

q)list_datatype:.arrowkdb.dt.large_list[.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype[list_datatype]
large_list<item: int64>
q).arrowkdb.dt.printDatatype[.arrowkdb.dt.getListDatatype[list_datatype]]
int64
q).arrowkdb.ar.prettyPrintArray[list_datatype;((enlist 1);(2 2);(3 3 3));::]
[
  [
    1
  ],
  [
    2,
    2
  ],
  [
    3,
    3,
    3
  ]
]

dt.fixed_size_list

Create a fixed size list datatype, specified in terms of its child datatype

.arrowkdb.dt.fixed_size_list[child_datatype_id;list_size]

Where:

  • child_datatype_id is the identifier of the list’s child datatype
  • list_size is the int32 fixed size of each of the child lists

returns the datatype identifier

fixed_size_list datatype is changed to list by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q)list_datatype:.arrowkdb.dt.fixed_size_list[.arrowkdb.dt.int64[];2i]
q).arrowkdb.dt.printDatatype[list_datatype]
fixed_size_list<item: int64>[2]
q).arrowkdb.dt.printDatatype[.arrowkdb.dt.getListDatatype[list_datatype]]
int64
q).arrowkdb.dt.getListSize[list_datatype]
2i
q).arrowkdb.ar.prettyPrintArray[list_datatype;((1 1);(2 2);(3 3));::]
[
  [
    1,
    1
  ],
  [
    2,
    2
  ],
  [
    3,
    3
  ]
]

dt.map

Create a map datatype, specified in terms of its key and item child datatypes

.arrowkdb.dt.map[key_datatype_id;item_datatype_id]

Where:

  • key_datatype_id is the identifier of the map key child datatype
  • item_datatype_id is the identifier of the map item child datatype

returns the datatype identifier

q)map_datatype:.arrowkdb.dt.map[.arrowkdb.dt.int64[];.arrowkdb.dt.float64[]]
q).arrowkdb.dt.printDatatype[map_datatype]
map<int64, double>
q).arrowkdb.dt.printDatatype each .arrowkdb.dt.getMapDatatypes[map_datatype]
int64
double
::
::
q).arrowkdb.ar.prettyPrintArray[map_datatype;((enlist 1)!(enlist 1f);(2 2)!(2 2f);(3 3 3)!(3 3 3f));::]
[
  keys:
  [
    1
  ]
  values:
  [
    1
  ],
  keys:
  [
    2,
    2
  ]
  values:
  [
    2,
    2
  ],
  keys:
  [
    3,
    3,
    3
  ]
  values:
  [
    3,
    3,
    3
  ]
]

dt.struct

Create a struct datatype, specified in terms of the field identifiers of its children

.arrowkdb.dt.struct[field_ids]

Where field_ids is the list of field identifiers of the struct’s children

returns the datatype identifier

q)field_one:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)field_two:.arrowkdb.fd.field[`utf8_field;.arrowkdb.dt.utf8[]]
q)struct_datatype:.arrowkdb.dt.struct[field_one,field_two]
q).arrowkdb.dt.printDatatype[struct_datatype]
struct<int_field: int64 not null, utf8_field: string not null>
q).arrowkdb.fd.fieldName each .arrowkdb.dt.getChildFields[struct_datatype]
`int_field`utf8_field
q).arrowkdb.dt.printDatatype each .arrowkdb.fd.fieldDatatype each .arrowkdb.dt.getChildFields[struct_datatype]
int64
string
::
::
q).arrowkdb.ar.prettyPrintArray[struct_datatype;((1 2 3);("aa";"bb";"cc"));::]
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    2,
    3
  ]
-- child 1 type: string
  [
    "aa",
    "bb",
    "cc"
  ]
q) // By slicing across the lists the logical struct values are: (1,"aa"); (2,"bb"); (3,"cc")

dt.sparse_union

Create a sparse union datatype, specified in terms of the field identifiers of its children

.arrowkdb.dt.sparse_union[field_ids]

Where field_ids is the list of field identifiers of the union’s children

returns the datatype identifier

An Arrow union array is similar to a struct array except that it has an additional type_id array which identifies the live field in each union value set.

sparse_union datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q)field_one:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)field_two:.arrowkdb.fd.field[`utf8_field;.arrowkdb.dt.utf8[]]
q)union_datatype:.arrowkdb.dt.sparse_union[field_one,field_two]
q).arrowkdb.dt.printDatatype[union_datatype]
sparse_union<int_field: int64 not null=0, utf8_field: string not null=1>
q).arrowkdb.fd.fieldName each .arrowkdb.dt.getChildFields[union_datatype]
`int_field`utf8_field
q).arrowkdb.dt.printDatatype each .arrowkdb.fd.fieldDatatype each .arrowkdb.dt.getChildFields[union_datatype]
int64
string
::
::
q).arrowkdb.ar.prettyPrintArray[union_datatype;((1 0 1h);(1 2 3);("aa";"bb";"cc"));::]
-- is_valid: all not null
-- type_ids:   [
    1,
    0,
    1
  ]
-- child 0 type: int64
  [
    1,
    2,
    3
  ]
-- child 1 type: string
  [
    "aa",
    "bb",
    "cc"
  ]
q) // Looking up the type_id array the logical union values are: "aa", 2, "cc"

dt.dense_union

Create a dense union datatype, specified in terms of the field identifiers of its children

.arrowkdb.dt.dense_union[field_ids]

Where field_ids is the list of field identifiers of the union’s children

returns the datatype identifier

An Arrow union array is similar to a struct array except that it has an additional type_id array which identifies the live field in each union value set.

dense_union datatype is not supported by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q)field_one:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)field_two:.arrowkdb.fd.field[`utf8_field;.arrowkdb.dt.utf8[]]
q)union_datatype:.arrowkdb.dt.dense_union[field_one,field_two]
q).arrowkdb.dt.printDatatype[union_datatype]
dense_union<int_field: int64 not null=0, utf8_field: string not null=1>
q).arrowkdb.fd.fieldName each .arrowkdb.dt.getChildFields[union_datatype]
`int_field`utf8_field
q).arrowkdb.dt.printDatatype each .arrowkdb.fd.fieldDatatype each .arrowkdb.dt.getChildFields[union_datatype]
int64
string
::
::
q).arrowkdb.ar.prettyPrintArray[union_datatype;((1 0 1h);(1 2 3);("aa";"bb";"cc"));::]
-- is_valid: all not null
-- type_ids:   [
    1,
    0,
    1
  ]
-- value_offsets:   [
    0,
    0,
    0
  ]
-- child 0 type: int64
  [
    1,
    2,
    3
  ]
-- child 1 type: string
  [
    "aa",
    "bb",
    "cc"
  ]
q) // Looking up the type_id array the logical union values are: "aa", 2, "cc"

dt.dictionary

Create a dictionary datatype specified in terms of its value and index datatypes, similar to pandas categorical

.arrowkdb.dt.dictionary[value_datatype_id;index_datatype_id]

Where:

  • value_datatype_id is the identifier of the dictionary value datatype, must be a scalar type
  • index_datatype_id is the identifier of the dictionary index datatype, must be a signed int type

returns the datatype identifier

Only the categorical interpretation of a dictionary datatype array is saved by Parquet

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q)dict_datatype:.arrowkdb.dt.dictionary[.arrowkdb.dt.utf8[];.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype[dict_datatype]
dictionary<values=string, indices=int64, ordered=0>
q).arrowkdb.dt.printDatatype each .arrowkdb.dt.getDictionaryDatatypes[dict_datatype]
string
int64
::
::
q).arrowkdb.ar.prettyPrintArray[dict_datatype;(("aa";"bb";"cc");(2 0 1 0 0));::]
-- dictionary:
  [
    "aa",
    "bb",
    "cc"
  ]
-- indices:
  [
    2,
    0,
    1,
    0,
    0
  ]
q) // The categorical interpretation of the dictionary (looking up the values set at each index) would be: "cc", "aa", "bb", "aa", "aa"

dt.inferDatatype

Infer and construct a datatype from a kdb+ list

.arrowkdb.dt.inferDatatype[list]

Where list is a kdb+ list

returns the datatype identifier

The kdb+ list type is mapped to an Arrow datatype as described here.

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.inferDatatype[(1 2 3j)]]
int64
q).arrowkdb.dt.printDatatype[.arrowkdb.dt.inferDatatype[("aa";"bb";"cc")]]
string

Datatype inspection

dt.datatypeName

Return the base name of a datatype, ignoring any parameters or child datatypes/fields

.arrowkdb.dt.datatypeName[datatype_id]

Where datatype_id is the identifier of the datatype

returns a symbol containing the base name of the datatype

q).arrowkdb.dt.datatypeName[.arrowkdb.dt.int64[]]
`int64
q).arrowkdb.dt.datatypeName[.arrowkdb.dt.fixed_size_binary[4i]]
`fixed_size_binary

dt.getTimeUnit

Return the TimeUnit of a time32/time64/timestamp/duration datatype

.arrowkdb.dt.getTimeUnit[datatype_id]

Where datatype_id is the identifier of the datatype

returns a symbol containing the time unit string: SECOND/MILLI/MICRO/NANO

q).arrowkdb.dt.getTimeUnit[.arrowkdb.dt.timestamp[`NANO]]
`NANO

dt.getByteWidth

Return the byte_width of a fixed_size_binary datatype

.arrowkdb.dt.getByteWidth[datatype_id]

Where datatype_id is the identifier of the datatype

returns the int32 byte width

q).arrowkdb.dt.getByteWidth[.arrowkdb.dt.fixed_size_binary[4i]]
4i

dt.getListSize

Return the list_size of a fixed_size_list datatype

.arrowkdb.dt.getListSize[datatype_id]

Where datatype_id is the identifier of the datatype

returns the int32 list size

q).arrowkdb.dt.getListSize[.arrowkdb.dt.fixed_size_list[.arrowkdb.dt.int64[];4i]]
4i

dt.getPrecisionScale

Return the precision and scale of a decimal128 datatype

.arrowkdb.dt.getPrecisionScale[datatype_id]

Where datatype_id is the identifier of the datatype

returns the int32 precision and scale

q).arrowkdb.dt.getPrecisionScale[.arrowkdb.dt.decimal128[38i;2i]]
38
2

dt.getListDatatype

Return the child datatype identifier of a list/large_list/fixed_size_list datatype

.arrowkdb.dt.getListDatatype[datatype_id]

Where datatype_id is the identifier of the datatype

returns the list’s child datatype identifier

q)list_datatype:.arrowkdb.dt.list[.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype[.arrowkdb.dt.getListDatatype[list_datatype]]
int64

dt.getMapDatatypes

Return the key and item child datatype identifiers of a map datatype

.arrowkdb.dt.getMapDatatypes[datatype_id]

Where datatype_id is the identifier of the datatype

returns the map’s key and item child datatype identifiers

q)map_datatype:.arrowkdb.dt.map[.arrowkdb.dt.int64[];.arrowkdb.dt.float64[]]
q).arrowkdb.dt.printDatatype each .arrowkdb.dt.getMapDatatypes[map_datatype]
int64
double
::
::

dt.getDictionaryDatatypes

Return the value and index child datatype identifiers of a dictionary datatype

.arrowkdb.dt.getDictionaryDatatypes[datatype_id]

Where datatype_id is the identifier of the datatype

returns the dictionary’s value and index child datatype identifiers

q)dict_datatype:.arrowkdb.dt.dictionary[.arrowkdb.dt.utf8[];.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype each .arrowkdb.dt.getDictionaryDatatypes[dict_datatype]
string
int64
::
::

dt.getChildFields

Return the list of child field identifiers of a struct/sparse_union/dense_union datatype

.arrowkdb.dt.getChildFields[datatype_id]

Where datatype_id is the identifier of the datatype

returns the list of child field identifiers

q)field_one:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)field_two:.arrowkdb.fd.field[`utf8_field;.arrowkdb.dt.utf8[]]
q)struct_datatype:.arrowkdb.dt.struct[field_one,field_two]
q).arrowkdb.fd.printField each .arrowkdb.dt.getChildFields[struct_datatype]
int_field: int64 not null
utf8_field: string not null
::
::

Datatype management

dt.printDatatype

Display user-readable information for a datatype, including parameters and nested child datatypes

.arrowkdb.dt.printDatatype[datatype_id]

Where datatype_id is the identifier of the datatype,

  1. prints datatype information to stdout
  2. returns generic null
For debugging use only

The information is generated by the arrow::DataType::ToString() functionality and displayed on stdout to preserve formatting and indentation.

q).arrowkdb.dt.printDatatype[.arrowkdb.dt.fixed_size_list[.arrowkdb.dt.int64[];4i]]
fixed_size_list<item: int64>[4]

dt.listDatatypes

Return the list of identifiers for all datatypes held in the DatatypeStore

.arrowkdb.dt.listDatatypes[]

Returns list of datatype identifiers

q).arrowkdb.dt.int64[]
1i
q).arrowkdb.dt.float64[]
2i
q).arrowkdb.dt.printDatatype each .arrowkdb.dt.listDatatypes[]
int64
double
::
::

dt.removeDatatype

Remove a datatype from the DatatypeStore

.arrowkdb.dt.removeDatatype[datatype_id]

Where datatype_id is the identifier of the datatype

returns generic null on success

q).arrowkdb.dt.int64[]
1i
q).arrowkdb.dt.float64[]
2i
q).arrowkdb.dt.listDatatypes[]
1 2i
q).arrowkdb.dt.removeDatatype[1i]
q).arrowkdb.dt.listDatatypes[]
,2i

dt.equalDatatypes

Check if two datatypes are logically equal, including parameters and nested child datatypes

.arrowkdb.dt.equalDatatypes[first_datatype_id;second_datatype_id]

Where:

  • first_datatype_id is the identifier of the first datatype
  • second_datatype_id is the identifier of the second datatype

returns boolean result

Internally the DatatypeStore uses the equalDatatypes functionality to prevent a new datatype identifier being created when an equal datatype is already present in the DatatypeStore, returning the existing datatype identifier instead.

q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.int64[];.arrowkdb.dt.int64[]]
1b
q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.int64[];.arrowkdb.dt.float64[]]
0b
q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.fixed_size_binary[4i];.arrowkdb.dt.fixed_size_binary[4i]]
1b
q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.fixed_size_binary[2i];.arrowkdb.dt.fixed_size_binary[4i]]
0b
q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.list[.arrowkdb.dt.int64[]];.arrowkdb.dt.list[.arrowkdb.dt.int64[]]]
1b
q).arrowkdb.dt.equalDatatypes[.arrowkdb.dt.list[.arrowkdb.dt.int64[]];.arrowkdb.dt.list[.arrowkdb.dt.float64[]]]
0b
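
The identifier reuse described above can be observed directly. In a fresh session the constructor should hand back the same identifier when invoked twice (a hedged sketch, assuming an empty DatatypeStore):

q).arrowkdb.dt.int64[]
1i
q).arrowkdb.dt.int64[] / an equal datatype is already stored, so its identifier is returned
1i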

Field constructor

fd.field

Create a field instance from its name and datatype

.arrowkdb.fd.field[field_name;datatype_id]

Where:

  • field_name is a symbol containing the field’s name
  • datatype_id is the identifier of the field’s datatype

returns the field identifier

q).arrowkdb.fd.printField[.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]]
int_field: int64 not null

Field inspection

fd.fieldName

Return the name of a field

.arrowkdb.fd.fieldName[field_id]

Where field_id is the field identifier

returns a symbol containing the field’s name

q)field:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q).arrowkdb.fd.fieldName[field]
`int_field

fd.fieldDatatype

Return the datatype of a field

.arrowkdb.fd.fieldDatatype[field_id]

Where field_id is the field identifier

returns the datatype identifier

q)field:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q).arrowkdb.dt.printDatatype[.arrowkdb.fd.fieldDatatype[field]]
int64

Field management

fd.printField

Display user-readable information for a field, including name and datatype

.arrowkdb.fd.printField[field_id]

Where field_id is the identifier of the field,

  1. prints field information to stdout
  2. returns generic null
For debugging use only

The information is generated by the arrow::Field::ToString() functionality and displayed on stdout to preserve formatting and indentation.

q).arrowkdb.fd.printField[.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]]
int_field: int64 not null

fd.listFields

Return the list of identifiers for all fields held in the FieldStore

.arrowkdb.fd.listFields[]

Returns list of field identifiers

q).arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
1i
q).arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
2i
q).arrowkdb.fd.printField each .arrowkdb.fd.listFields[]
int_field: int64 not null
float_field: double not null
::
::

fd.removeField

Remove a field from the FieldStore

.arrowkdb.fd.removeField[field_id]

Where field_id is the identifier of the field

returns generic null on success

q).arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
1i
q).arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
2i
q).arrowkdb.fd.listFields[]
1 2i
q).arrowkdb.fd.removeField[1i]
q).arrowkdb.fd.listFields[]
,2i

fd.equalFields

Check if two fields are logically equal, including names and datatypes

.arrowkdb.fd.equalFields[first_field_id;second_field_id]

Where:

  • first_field_id is the identifier of the first field
  • second_field_id is the identifier of the second field

returns boolean result

Internally the FieldStore uses the equalFields functionality to prevent a new field identifier being created when an equal field is already present in the FieldStore, returning the existing field identifier instead.

q)int_dt:.arrowkdb.dt.int64[]
q)float_dt:.arrowkdb.dt.float64[]
q).arrowkdb.fd.equalFields[.arrowkdb.fd.field[`f1;int_dt];.arrowkdb.fd.field[`f1;int_dt]]
1b
q).arrowkdb.fd.equalFields[.arrowkdb.fd.field[`f1;int_dt];.arrowkdb.fd.field[`f2;int_dt]]
0b
q).arrowkdb.fd.equalFields[.arrowkdb.fd.field[`f1;int_dt];.arrowkdb.fd.field[`f1;float_dt]]
0b

Schema constructors

sc.schema

Create a schema instance from a list of field identifiers

.arrowkdb.sc.schema[field_ids]

Where field_ids is a list of field identifiers

returns the schema identifier

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q).arrowkdb.sc.printSchema[.arrowkdb.sc.schema[(f1,f2)]]
int_field: int64 not null
float_field: double not null

sc.inferSchema

Infer and construct a schema based on a kdb+ table

.arrowkdb.sc.inferSchema[table]

Where table is a kdb+ table or dictionary

returns the schema identifier

Inferred schemas support only a subset of the Arrow datatypes and are considerably less flexible than schemas created with the datatype/field/schema constructors

Each column in the table is mapped to a field in the schema. The column name is used as the field name and the column’s kdb+ type is mapped to an Arrow datatype as described here.

q)schema_from_table:.arrowkdb.sc.inferSchema[([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))]
q).arrowkdb.sc.printSchema[schema_from_table]
int_field: int64
float_field: double
str_field: string

Schema inspection

sc.schemaFields

Return the list of field identifiers used by a schema

.arrowkdb.sc.schemaFields[schema_id]

Where schema_id is the schema identifier

returns list of field identifiers used by the schema

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)schema:.arrowkdb.sc.schema[(f1,f2)]
q).arrowkdb.fd.printField each .arrowkdb.sc.schemaFields[schema]
int_field: int64 not null
float_field: double not null
::
::

Schema management

sc.printSchema

Display user-readable information for a schema, including its fields and their order

.arrowkdb.sc.printSchema[schema_id]

Where schema_id is the identifier of the schema,

  1. prints schema information to stdout
  2. returns generic null
For debugging use only

The information is generated by the arrow::Schema::ToString() functionality and displayed on stdout to preserve formatting and indentation.

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q).arrowkdb.sc.printSchema[schema]
int_field: int64 not null
float_field: double not null
str_field: string not null

sc.listSchemas

Return the list of identifiers for all schemas held in the SchemaStore

.arrowkdb.sc.listSchemas[]

Returns list of schema identifiers

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q).arrowkdb.sc.schema[(f1,f2)]
1i
q).arrowkdb.sc.schema[(f2,f1)]
2i
q).arrowkdb.sc.listSchemas[]
1 2i

sc.removeSchema

Remove a schema from the SchemaStore

.arrowkdb.sc.removeSchema[schema_id]

Where schema_id is the identifier of the schema

returns generic null on success

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q).arrowkdb.sc.schema[(f1,f2)]
1i
q).arrowkdb.sc.schema[(f2,f1)]
2i
q).arrowkdb.sc.listSchemas[]
1 2i
q).arrowkdb.sc.removeSchema[1i]
q).arrowkdb.sc.listSchemas[]
,2i

sc.equalSchemas

Check if two schemas are logically equal, including their fields and the fields' order

.arrowkdb.sc.equalSchemas[first_schema_id;second_schema_id]

Where:

  • first_schema_id is the identifier of the first schema
  • second_schema_id is the identifier of the second schema

returns boolean result

Internally the SchemaStore uses the equalSchemas functionality to prevent a new schema identifier being created when an equal schema is already present in the SchemaStore, returning the existing schema identifier instead.

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q).arrowkdb.sc.schema[(f1,f2)]
1i
q).arrowkdb.sc.schema[(f2,f1)]
2i
q).arrowkdb.sc.equalSchemas[.arrowkdb.sc.schema[(f1,f2)];.arrowkdb.sc.schema[(f1,f2)]]
1b
q).arrowkdb.sc.equalSchemas[.arrowkdb.sc.schema[(f1,f2)];.arrowkdb.sc.schema[(f1,f1)]]
0b
q).arrowkdb.sc.equalSchemas[.arrowkdb.sc.schema[(f1,f2)];.arrowkdb.sc.schema[(f2,f1)]]
0b

Array data

ar.prettyPrintArray

Convert a kdb+ list to an Arrow array and pretty print the array

.arrowkdb.ar.prettyPrintArray[datatype_id;list;options]

Where:

  • datatype_id is the datatype identifier of the array
  • list is the kdb+ list data to be displayed
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

the function

  1. prints array contents to stdout
  2. returns generic null

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
For debugging use only

The information is generated by the arrow::PrettyPrint() functionality and displayed on stdout to preserve formatting and indentation.

q)int_datatype:.arrowkdb.dt.int64[]
q).arrowkdb.ar.prettyPrintArray[int_datatype;(1 2 3j);::]
[
  1,
  2,
  3
]
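
The options dictionary follows the key/value format described above. For example, a hedged sketch enabling DECIMAL128_AS_DOUBLE so that decimal128 array data can be passed as a 9h list of doubles (output omitted):

q)options:(enlist `DECIMAL128_AS_DOUBLE)!(enlist 1)
q).arrowkdb.ar.prettyPrintArray[.arrowkdb.dt.decimal128[38i;2i];(0 1.23 -4.56);options]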

ar.prettyPrintArrayFromList

Convert a kdb+ list to an Arrow array and pretty print the array, inferring the datatype from the kdb+ list type

.arrowkdb.ar.prettyPrintArrayFromList[list;options]

Where:

  • list is the kdb+ list data to be displayed
  • options is reserved for future use - specify generic null (::)

the function

  1. prints array contents to stdout
  2. returns generic null

The kdb+ list type is mapped to an Arrow datatype as described here.

For debugging use only

The information is generated by the arrow::PrettyPrint() functionality and displayed on stdout to preserve formatting and indentation.

q).arrowkdb.ar.prettyPrintArrayFromList[(1 2 3j);::]
[
  1,
  2,
  3
]

Table data

tb.prettyPrintTable

Convert a kdb+ mixed list of array data to an Arrow table and pretty print the table

.arrowkdb.tb.prettyPrintTable[schema_id;array_data;options]

Where:

  • schema_id is the schema identifier of the table
  • array_data is a mixed list of array data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

the function

  1. prints table contents to stdout
  2. returns generic null

The mixed list of Arrow array data should be ordered in schema field number and each list item representing one of the arrays must be structured according to the field’s datatype.

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
For debugging use only

The information is generated by the arrow::Table::ToString() functionality and displayed on stdout to preserve formatting and indentation.

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q).arrowkdb.tb.prettyPrintTable[schema;((1 2 3j);(4 5 6f);("aa";"bb";"cc"));::]
int_field: int64 not null
float_field: double not null
str_field: string not null
----
int_field:
  [
    [
      1,
      2,
      3
    ]
  ]
float_field:
  [
    [
      4,
      5,
      6
    ]
  ]
str_field:
  [
    [
      "aa",
      "bb",
      "cc"
    ]
  ]

tb.prettyPrintTableFromTable

Convert a kdb+ table to an Arrow table and pretty print the table, inferring the schema from the kdb+ table structure

.arrowkdb.tb.prettyPrintTableFromTable[table;options]

Where:

  • table is a kdb+ table
  • options is reserved for future use - specify generic null (::)

the function

  1. prints table contents to stdout
  2. returns generic null

Inferred schemas support only a subset of the Arrow datatypes and are considerably less flexible than schemas created with the datatype/field/schema constructors

Each column in the table is mapped to a field in the schema. The column name is used as the field name and the column’s kdb+ type is mapped to an Arrow datatype as described here.

For debugging use only

The information is generated by the arrow::Table::ToString() functionality and displayed on stdout to preserve formatting and indentation.

q).arrowkdb.tb.prettyPrintTableFromTable[([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"));::]
int_field: int64
float_field: double
str_field: string
----
int_field:
  [
    [
      1,
      2,
      3
    ]
  ]
float_field:
  [
    [
      4,
      5,
      6
    ]
  ]
str_field:
  [
    [
      "aa",
      "bb",
      "cc"
    ]
  ]

Parquet files

pq.writeParquet

Convert a kdb+ mixed list of array data to an Arrow table and write to a Parquet file

.arrowkdb.pq.writeParquet[parquet_file;schema_id;array_data;options]

Where:

  • parquet_file is a string containing the Parquet file name
  • schema_id is the schema identifier to use for the table
  • array_data is a mixed list of array data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns generic null on success

The mixed list of Arrow array data should be ordered in schema field number and each list item representing one of the arrays must be structured according to the field’s datatype.

Supported options:

  • PARQUET_CHUNK_SIZE - Controls the approximate size of encoded data pages within a column chunk. Long, default 1MB.
  • PARQUET_VERSION - Select the Parquet format version, either V1.0 or V2.0. V2.0 is more fully featured but may be incompatible with older Parquet implementations. String, default V1.0
  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
The Parquet format is compressed and designed for maximum space efficiency, which may cause a performance overhead compared to Arrow. Parquet is also less fully featured than Arrow, which can result in schema limitations

The Parquet file format is less fully featured compared to Arrow and consequently the Arrow/Parquet file writer currently does not support some datatypes or represents them using a different datatype as described here

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquet["file.parquet";schema;array_data;::]
q)read_data:.arrowkdb.pq.readParquetData["file.parquet";::]
q)array_data~read_data
1b
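
Since some datatypes (such as uint32 and timestamp with nano granularity) are fully supported by Parquet v2.0 only, the format version can be selected through the options dictionary. A minimal sketch reusing the schema and array data above; the option values are illustrative:

q)options:(`PARQUET_VERSION`PARQUET_CHUNK_SIZE)!("V2.0";2000000)
q).arrowkdb.pq.writeParquet["file.parquet";schema;array_data;options]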

pq.writeParquetFromTable

Convert a kdb+ table to an Arrow table and write to a Parquet file, inferring the schema from the kdb+ table structure

.arrowkdb.pq.writeParquetFromTable[parquet_file;table;options]

Where:

  • parquet_file is a string containing the Parquet file name
  • table is a kdb+ table
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns generic null on success

Supported options:

  • PARQUET_CHUNK_SIZE - Controls the approximate size of encoded data pages within a column chunk. Long, default 1MB.
  • PARQUET_VERSION - Select the Parquet format version, either V1.0 or V2.0. V2.0 is more fully featured but may be incompatible with older Parquet implementations. String, default V1.0
Inferred schemas support only a subset of the Arrow datatypes and are considerably less flexible than schemas created with the datatype/field/schema constructors

Each column in the table is mapped to a field in the schema. The column name is used as the field name and the column’s kdb+ type is mapped to an Arrow datatype as described here.

q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquetFromTable["file.parquet";table;::]
q)read_table:.arrowkdb.pq.readParquetToTable["file.parquet";::]
q)read_table~table
1b

pq.readParquetSchema

Read the schema from a Parquet file

.arrowkdb.pq.readParquetSchema[parquet_file]

Where parquet_file is a string containing the Parquet file name

returns the schema identifier

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquet["file.parquet";schema;array_data;::]
q).arrowkdb.sc.equalSchemas[schema;.arrowkdb.pq.readParquetSchema["file.parquet"]]
1b

pq.readParquetData

Read an Arrow table from a Parquet file and convert to a kdb+ mixed list of array data

.arrowkdb.pq.readParquetData[parquet_file;options]

Where:

  • parquet_file is a string containing the Parquet file name
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the array data

Supported options:

  • PARQUET_MULTITHREADED_READ - Flag indicating whether the Parquet reader should run in multithreaded mode. This can improve performance by processing multiple columns in parallel. Long, default 0.
  • USE_MMAP - Flag indicating whether the Parquet file should be memory mapped. This can improve performance on systems which support mmap. Long, default 0.
  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquet["file.parquet";schema;array_data;::]
q)read_data:.arrowkdb.pq.readParquetData["file.parquet";::]
q)array_data~read_data
1b
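
The reader options can be combined in a single dictionary. A hedged sketch enabling both multithreaded reading and memory mapping for the file written above:

q)options:(`PARQUET_MULTITHREADED_READ`USE_MMAP)!1 1
q)read_data:.arrowkdb.pq.readParquetData["file.parquet";options]
q)array_data~read_data
1b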

pq.readParquetColumn

Read a single column from a Parquet file and convert to a kdb+ list

.arrowkdb.pq.readParquetColumn[parquet_file;column_index;options]

Where:

  • parquet_file is a string containing the Parquet file name
  • column_index is the index of the column to read, relative to the schema field order
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the array’s data

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquet["file.parquet";schema;array_data;::]
q)col1:.arrowkdb.pq.readParquetColumn["file.parquet";1i;::]
q)col1~array_data[1]
1b

pq.readParquetToTable

Read an Arrow table from a Parquet file and convert to a kdb+ table

.arrowkdb.pq.readParquetToTable[parquet_file;options]

Where:

  • parquet_file is a string containing the Parquet file name
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the kdb+ table

Each schema field name is used as the column name and the Arrow array data is used as the column data.

Supported options:

  • PARQUET_MULTITHREADED_READ - Flag indicating whether the Parquet reader should run in multithreaded mode. This can improve performance by processing multiple columns in parallel. Long, default 0.
  • USE_MMAP - Flag indicating whether the Parquet file should be memory mapped. This can improve performance on systems which support mmap. Long, default 0.
  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q).arrowkdb.pq.writeParquetFromTable["file.parquet";table;::]
q)read_table:.arrowkdb.pq.readParquetToTable["file.parquet";::]
q)read_table~table
1b

Arrow IPC files

ipc.writeArrow

Convert a kdb+ mixed list of array data to an Arrow table and write to an Arrow file

.arrowkdb.ipc.writeArrow[arrow_file;schema_id;array_data;options]

Where:

  • arrow_file is a string containing the Arrow file name
  • schema_id is the schema identifier to use for the table
  • array_data is a mixed list of array data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. Dictionary keys must be an 11h list. The values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns generic null on success

The mixed list of Arrow array data should be ordered in schema field number and each list item representing one of the arrays must be structured according to the field’s datatype.

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.ipc.writeArrow["file.arrow";schema;array_data;::]
q)read_data:.arrowkdb.ipc.readArrowData["file.arrow";::]
q)read_data~array_data
1b

ipc.writeArrowFromTable

Convert a kdb+ table to an Arrow table and write to an Arrow file, inferring the schema from the kdb+ table structure

.arrowkdb.ipc.writeArrowFromTable[arrow_file;table;options]

Where:

  • arrow_file is a string containing the Arrow file name
  • table is a kdb+ table
  • options is reserved for future use - specify generic null (::)

returns generic null on success

Inferred schemas support only a subset of the Arrow datatypes and are considerably less flexible than schemas created with the datatype/field/schema constructors

Each column in the table is mapped to a field in the schema. The column name is used as the field name and the column’s kdb+ type is mapped to an Arrow datatype as described here.

q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q).arrowkdb.ipc.writeArrowFromTable["file.arrow";table;::]
q)read_table:.arrowkdb.ipc.readArrowToTable["file.arrow";::]
q)read_table~table
1b
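
To check which Arrow datatypes will be inferred for each column, the schema can be constructed explicitly with sc.inferSchema and displayed with sc.printSchema (described elsewhere in this reference). A sketch continuing from the example above, with the printed form following Arrow's schema display:

q).arrowkdb.sc.printSchema[.arrowkdb.sc.inferSchema[table]]
int_field: int64
float_field: double
str_field: string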

ipc.readArrowSchema

Read the schema from an Arrow file

.arrowkdb.ipc.readArrowSchema[arrow_file]

Where arrow_file is a string containing the Arrow file name

returns the schema identifier

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.ipc.writeArrow["file.arrow";schema;array_data;::]
q).arrowkdb.sc.equalSchemas[schema;.arrowkdb.ipc.readArrowSchema["file.arrow"]]
1b
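
The returned schema identifier can be inspected like any other schema, for example listing its field names with the sc.schemaFields and fd.fieldName functions described elsewhere in this reference. A sketch continuing from the example above:

q)fields:.arrowkdb.sc.schemaFields[.arrowkdb.ipc.readArrowSchema["file.arrow"]]
q).arrowkdb.fd.fieldName each fields
`int_field`float_field`str_field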

ipc.readArrowData

Read an Arrow table from an Arrow file and convert to a kdb+ mixed list of array data

.arrowkdb.ipc.readArrowData[arrow_file;options]

Where:

  • arrow_file is a string containing the Arrow file name
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. The dictionary key must be an 11h list; the values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the array data

Supported options:

  • USE_MMAP - Flag indicating whether the Arrow file should be memory mapped. This can improve performance on systems which support mmap. Long, default 0.
  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q).arrowkdb.ipc.writeArrow["file.arrow";schema;array_data;::]
q)read_data:.arrowkdb.ipc.readArrowData["file.arrow";::]
q)read_data~array_data
1b

ipc.readArrowToTable

Read an Arrow table from an Arrow file and convert to a kdb+ table

.arrowkdb.ipc.readArrowToTable[arrow_file;options]

Where:

  • arrow_file is a string containing the Arrow file name
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. The dictionary key must be an 11h list; the values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the kdb+ table

Each schema field name is used as the column name and the Arrow array data is used as the column data.

Supported options:

  • USE_MMAP - Flag indicating whether the Arrow file should be memory mapped. This can improve performance on systems which support mmap. Long, default 0.
  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q).arrowkdb.ipc.writeArrowFromTable["file.arrow";table;::]
q)read_table:.arrowkdb.ipc.readArrowToTable["file.arrow";::]
q)read_table~table
1b

Arrow IPC streams

ipc.serializeArrow

Convert a kdb+ mixed list of array data to an Arrow table and serialize to an Arrow stream

.arrowkdb.ipc.serializeArrow[schema_id;array_data;options]

Where:

  • schema_id is the schema identifier to use for the table
  • array_data is a mixed list of array data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. The dictionary key must be an 11h list; the values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns a byte list containing the serialized stream data

The mixed list of Arrow array data should be ordered by schema field number, and each list item (representing one of the arrays) must be structured according to the field’s datatype.

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q)serialized:.arrowkdb.ipc.serializeArrow[schema;array_data;::]
q)read_data:.arrowkdb.ipc.parseArrowData[serialized;::]
q)read_data~array_data
1b
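
The serialized stream is an ordinary kdb+ byte list, so it can be stored or transmitted like any other byte data:

q)type serialized
4h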

ipc.serializeArrowFromTable

Convert a kdb+ table to an Arrow table and serialize to an Arrow stream, inferring the schema from the kdb+ table structure

.arrowkdb.ipc.serializeArrowFromTable[table;options]

Where:

  • table is a kdb+ table
  • options is reserved for future use - specify generic null (::)

returns a byte list containing the serialized stream data

Inferred schemas support only a subset of the Arrow datatypes and are considerably less flexible than schemas created with the datatype/field/schema constructors.

Each column in the table is mapped to a field in the schema. The column name is used as the field name and the column’s kdb+ type is mapped to an Arrow datatype as described here.

q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q)serialized:.arrowkdb.ipc.serializeArrowFromTable[table;::]
q)new_table:.arrowkdb.ipc.parseArrowToTable[serialized;::]
q)new_table~table
1b
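
A common use of the serialized form is shipping it to another kdb+ process over IPC and parsing it on the remote side. A sketch, assuming a listener on port 5000 which has also loaded arrowkdb (the port number is illustrative):

q)h:hopen `::5000
q)remote_table:h(`.arrowkdb.ipc.parseArrowToTable;serialized;::)
q)remote_table~table
1b
q)hclose h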

ipc.parseArrowSchema

Parse the schema from an Arrow stream

.arrowkdb.ipc.parseArrowSchema[serialized]

Where serialized is a byte list containing the serialized stream data

returns the schema identifier

q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q)serialized:.arrowkdb.ipc.serializeArrow[schema;array_data;::]
q).arrowkdb.sc.equalSchemas[schema;.arrowkdb.ipc.parseArrowSchema[serialized]]
1b

ipc.parseArrowData

Parse an Arrow table from an Arrow stream and convert to a kdb+ mixed list of array data

.arrowkdb.ipc.parseArrowData[serialized;options]

Where:

  • serialized is a byte list containing the serialized stream data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. The dictionary key must be an 11h list; the values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the array data

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)f1:.arrowkdb.fd.field[`int_field;.arrowkdb.dt.int64[]]
q)f2:.arrowkdb.fd.field[`float_field;.arrowkdb.dt.float64[]]
q)f3:.arrowkdb.fd.field[`str_field;.arrowkdb.dt.utf8[]]
q)schema:.arrowkdb.sc.schema[(f1,f2,f3)]
q)array_data:((1 2 3j);(4 5 6f);("aa";"bb";"cc"))
q)serialized:.arrowkdb.ipc.serializeArrow[schema;array_data;::]
q)read_data:.arrowkdb.ipc.parseArrowData[serialized;::]
q)read_data~array_data
1b

ipc.parseArrowToTable

Parse an Arrow table from an Arrow stream and convert to a kdb+ table

.arrowkdb.ipc.parseArrowToTable[serialized;options]

Where:

  • serialized is a byte list containing the serialized stream data
  • options is a kdb+ dictionary of options or generic null (::) to use defaults. The dictionary key must be an 11h list; the values list can be 7h, 11h, or a mixed list of -7|-11|4h.

returns the kdb+ table

Each schema field name is used as the column name and the Arrow array data is used as the column data.

Supported options:

  • DECIMAL128_AS_DOUBLE - Flag indicating whether to override the default type mapping for the Arrow decimal128 datatype and instead represent it as a double (9h). Long, default 0.
q)table:([] int_field:(1 2 3); float_field:(4 5 6f); str_field:("aa";"bb";"cc"))
q)serialized:.arrowkdb.ipc.serializeArrowFromTable[table;::]
q)new_table:.arrowkdb.ipc.parseArrowToTable[serialized;::]
q)new_table~table
1b

Utilities

util.buildInfo

Return build information regarding the Arrow library in use

.arrowkdb.util.buildInfo[]

Returns a dictionary detailing various aspects of the Arrow build, including: Arrow version, shared object version, git description and the compiler used.

q).arrowkdb.util.buildInfo[]
version         | 3000000i
version_string  | `3.0.0-SNAPSHOT
full_so_version | `300.0.0
compiler_id     | `MSVC
compiler_version| `19.26.28806.0
compiler_flags  | `/DWIN32 /D_WINDOWS  /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEP..
git_id          | `c8c2110cd7d01d2f4420079c450997ef5fa89029
git_description | `apache-arrow-2.0.0-194-gc8c2110cd
package_kind    | `
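
Since the result is an ordinary kdb+ dictionary, individual entries can be picked out programmatically, for example to check the Arrow version at startup:

q)info:.arrowkdb.util.buildInfo[]
q)info`version_string
`3.0.0-SNAPSHOT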