Type mapping between kdb+ and Protobuf¶
We describe how q numeric and string types are represented in Protobuf, how Protobuf structures are mapped to q.
Protobuf/q mapping¶
A Protobuf message is composed of a number of fields where each field can be scalar, repeated scalar, sub-message, repeated sub-message, enum or map. For example:
message MyMessage {
int32 scalar_int = 1;
repeated double repeated_double = 2;
MyMessage sub_message = 3;
repeated MyMessage repeated_msg = 4;
enum EnumType {
ZERO = 0;
ONE = 1;
}
EnumType enum_field = 5;
map<int64, string> int_str = 6;
}
A Protobuf message is represented in q using a mixed list with length equal to the number of fields in the message. The fields are iterated in the order they are declared in the message definition, which may differ from the field number tags.
The q type used for each list item depends on the field type:
field type kdb+ structure
--------------------------------------------
Scalar Atom
Repeated scalar Simple list
Sub-message Mixed list
Repeated Sub-message List of mixed lists
Enum Integer of enum value
Map Dictionary
Where on serialization you do not wish to explicitly set a field, a generic null (::)
can be specified in the mixed list for that field’s value.
Furthermore, an additional ::
can be included at the end of the mixed list (past the number of message fields). Such ::
field values are ignored and this can be used to prevent q from changing the message’s mixed list to a simple list, e.g. where all fields have the same q type. (Parsing will return a mixed list without the additional trailing ::
since it is being created directly, outside of q.)
Scalar fields¶
Scalar fields are represented as q atoms with their type mapping based on the underlying C++ representation.
scalar C++ kdb+
----------------------------------------------
int32, sint32, sfixed32 int32 -6h
uint32, fixed32 int32 -6h
int64, sint64, sfixed64 int64 -7h
uint64, fixed64 int64 -7h
double double -9h
float float -8h
bool bool -1h
enum int32 -6h
string string 10h
bytes string 4h
When parsing from Protobuf to kdb+, for any field which has not been explicitly set in the encoded message, Protobuf will return the default value for that field type which is then populated as usual into the corresponding q element.
Similarly when serializing from q to Protobuf, any q element set to its field-specific default value is equivalent to not explicitly setting that field in the encoded message. It is necessary to pad unspecified message fields with their default value or ::
in q in order to maintain the one-to-one positional mapping between any given message field and its corresponding q element.
Repeated fields¶
Repeated fields are represented as singularly typed q lists with the underlying q type based on the repeated type:
repeated C++ kdb+
-----------------------------------------
int32, sint32, sfixed32 int32 6h
uint32, fixed32 int32 6h
int64, sint64, sfixed64 int64 7h
uint64, fixed64 int64 7h
double double 9h
float float 8h
bool bool 1h
enum int32 6h
string string 0h (of 10h)
bytes string 0h (of 4h)
Map fields¶
Protobuf maps are represented as kdb+ dictionaries with keys and values defined as follows
The Protobuf wire format and map iteration ordering of map items is not deterministic
You cannot rely upon it to return a particular dictionary ordering on conversion to q.
Protobuf map keys can only be integer, boolean or string types and are therefore limited to the following type mappings.
map-key C++ kdb+
-----------------------------------------
int32, sint32, sfixed32 int32 6h
uint32, fixed32 int32 6h
int64, sint64, sfixed64 int64 7h
uint64, fixed64 int64 7h
bool bool 1h
string string 11h
Protobuf map values can be any type other than repeated fields or maps and therefore are defined by the following type mapping.
map-value C++ kdb+
-----------------------------------------------
int32, sint32, sfixed32 int32 6h
uint32, fixed32 int32 6h
int64, sint64, sfixed64 int64 7h
uint64, fixed64 int64 7h
double double 9h
float float 8h
bool bool 1h
enum int32 6h
string string 0h (of 10h)
bytes string 0h (of 4h)
message [message class] 0h
Oneof fields¶
Oneof fields are similar to regular fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members.
The q representation of a oneof field is therefore dependent on whether that field is currently the active member of the oneof:
Oneof field state kdb+ representation
---------------------------------------
Set As per regular field
Unset Empty mixed list
When serializing from kdb+ to Protobuf it is possible to specify values for multiple oneof fields
This is valid usage of oneof and does not produce an error; rather the oneof will be set to the value of the last specified field.
KdbTypeSpecifier
field option¶
To support the use of the q temporal types and GUIDs which do not have an equivalent representation in Protobuf, a field option extension is provided in src/kdb_type_specifier.proto
(for compiled in message definitions) and proto/kdb_type_specified.proto
(for dynamically imported message definitions). This allows a q-specific context to be applied to fields, map-keys and map-values:
syntax = "proto2";
import "google/protobuf/descriptor.proto";
enum KdbTypeSpecifier {
DEFAULT = 0;
TIMESTAMP = 1;
MONTH = 2;
DATE = 3;
DATETIME = 4;
TIMESPAN = 5;
MINUTE = 6;
SECOND = 7;
TIME = 8;
GUID = 9;
// Internal use only
// Must be at the end since it is used to determine the enum size for lookup arrays
KDBTYPE_LEN = 10;
}
message MapKdbTypeSpecifier {
optional KdbTypeSpecifier key_type = 1;
optional KdbTypeSpecifier value_type = 2;
}
extend google.protobuf.FieldOptions {
optional KdbTypeSpecifier kdb_type = 756866;
optional MapKdbTypeSpecifier map_kdb_type = 756867;
}
Protobuf versions
For the purpose of compatibility across different versions of Protobuf the above has been defined in proto2. This definition is equally valid in proto3.
To apply a KdbTypeSpecifier
to a field, import kdb_type_specifier.proto
into your .proto
file, then specify the KdbTypeSpecifier
field option.
For example:
syntax = "proto3";
import "kdb_type_specifier.proto";
message SpecifierExample {
int32 date = 1 [(kdb_type) = DATE]; // Scalar
repeated int32 time = 2 [(kdb_type) = TIME]; // Repeated
map<string, int64> guid_timespan = 3 [(map_kdb_type).key_type = GUID, // Map-key
(map_kdb_type).value_type = TIMESPAN]; // Map-value
}
The following is the mapping between the KdbTypeSpecifier
and associated q types.
KdbTypeSpecifier kdb+ type
---------------------------
GUID 2h
TIMESTAMP 12h
MONTH 13h
DATE 14h
DATETIME 15h
TIMESPAN 16h
MINUTE 17h
SECOND 18h
TIME 19h
KdbTypeSpecifier=DEFAULT
is equivalent to the specifier not being present, i.e. the q type is determined from the Proto field type.
The KdbTypeSpecifier
must be compatible with the defined Proto field type. The following outlines the compatible combinations of types:
KdbTypeSpecifier Compatible Field Type
----------------------------------------------------------
GUID string, bytes
TIMESTAMP int64, sint64, sfixed64, uint64, fixed64
MONTH int32, sint32, sfixed32, uint32, fixed32
DATE int32, sint32, sfixed32, uint32, fixed32
DATETIME double
TIMESPAN int64, sint64, sfixed64, uint64, fixed64
MINUTE int32, sint32, sfixed32, uint32, fixed32
SECOND int32, sint32, sfixed32, uint32, fixed32
TIME int32, sint32, sfixed32, uint32, fixed32
Type checking¶
When serializing from q to Protobuf, the q structure must conform to the mappings above with respect to that particular message definition. This is important both in terms of the type of each message field (scalar, repeated, map, etc.) and the Proto type specified to represent that field (int32, double, string, etc.)
The interface will type-check each field as appropriate and return an error if a mismatch is detected detailing:
- The type of failure
- The message/field where it occurred
- The expected q type
- The received q type
For example:
q).protobufkdb.displayMessageSchema[`ScalarExample]
message ScalarExample {
int32 scalar_int32 = 1;
double scalar_double = 2;
string scalar_string = 3;
}
// Correct function invocation
q)array:.protobufkdb.serializeArrayFromList[`ScalarExample;(12i;55f;"str")]
q)array:.protobufkdb.serializeArrayFromList[`ScalarExample;(12i;55f)]
'Incorrect number of fields, message: 'ScalarExample', expected: 3, received: 2
[0] array:.protobufkdb.serializeArrayFromList[`ScalarExample;(12i;55f)]
^
q)array:.protobufkdb.serializeArrayFromList[`ScalarExample;(12j;55f;"str")]
'Invalid scalar type, field: 'ScalarExample.scalar_int32', expected: -6, received: -7
[0] array:.protobufkdb.serializeArrayFromList[`ScalarExample;(12j;55f;"str")]
^
q)array:.protobufkdb.serializeArrayFromList[`ScalarExample;(enlist 12i;55f;"str")]
'Invalid scalar type, field: 'ScalarExample.scalar_int32', expected: -6, received: 6
[0] array:.protobufkdb.serializeArrayFromList[`ScalarExample;(enlist 12i;55f;"str")]
^
Representing messages as q dictionaries¶
As described above, protobufkdb represents a message in q as a mixed list of field values in field-positional order. This more closely ties in with how protobuf serializes messages, and is recommended for performance reasons.
However, you may prefer the q mapping for messages to be a dictionary from symbol field names to a mixed list of field values, similarly to how JSON is represented. This style is supported through separate APIs for both the parsing and serialization functions.
Rather than the field-positional order:
- Each field name is looked up in the message and its corresponding field value applied
- Any message fields without a field name/value pair in the dictionary are not explicitly set
- Where is field value is set to the generic null (
::
) that field name/value pair is ignored - Where a field name is specified (and its value is not
::
) but that field is not present is the message, a type-check error is returned
In addition the dictionary style is used recursively when fields contain other messages:
field type kdb+ structure
------------------------------------------------------------------------------------
Sub-message Field name dictionary
Repeated sub-message List of field name dictionaries or a flip table
Map to message Map-value is a list of field name dictionaries or a flip table
All other field types use the same q mapping for the field value as is used by the mixed-list message style.