Python Objects¶
This page lists Python objects that are referenced from elsewhere in the documentation.
Pipeline Objects¶
class Pipeline()
Thin Python wrapper around a q Stream Processor pipeline.
This class provides special handling for the | operator. When a Pipeline instance
appears on either side, the operator produces a new Pipeline instance composed of the
operators from the pipeline on the left connected in series with those from the pipeline on
the right.
The | operator does not modify a pipeline definition in-place. Pipelines are immutable.
See also: splitting a pipeline, merging two pipelines with a joining function, and unifying two pipelines.
When the right argument of | is a list of pipelines, and the left argument is a single
pipeline, the result is a list where each element is a pipeline with the left argument joined
to one of the elements from the list on the right.
When the left argument of | is a list of pipelines, and the right argument is a single
pipeline, the result is a single pipeline obtained by taking the union
of every pipeline from the list with the pipeline on the right.
Examples:
Join a reader to a map operation, then join that pipeline to a writer, and run the resulting pipeline:
>>> from kxi import sp
>>> sp.run(sp.read.from_expr(lambda: range(10, 30, 2))
...     | sp.map(lambda x: x*x)
...     | sp.write.to_console(timestamp='none'))
100 144 196 256 324 400 484 576 676 784
Join a reader to multiple map operations, and then have them all output to the same writer:
>>> from kxi import sp
>>> reader = sp.read.from_expr(lambda: range(10))
>>> maps = (sp.map(lambda x: x), sp.map(lambda x: x ** 2), sp.map(lambda x: x ** 3))
>>> writer = sp.write.to_console(timestamp='none')
>>> sp.run(reader | maps | writer)
0 1 2 3 4 5 6 7 8 9
0 1 4 9 16 25 36 49 64 81
0 1 8 27 64 125 216 343 512 729
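The list-handling rules of | described above can be modelled without the Stream Processor installed. The following is a minimal sketch, not the kxi.sp implementation: ToyPipeline is a hypothetical stand-in that represents a pipeline as a set of operator-name paths, and it only illustrates how a pipeline joined to a list fans out and how a list joined to a pipeline is unified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: joining never mutates an existing pipeline
class ToyPipeline:
    """Hypothetical stand-in: a pipeline as a set of operator-name paths."""
    paths: frozenset

    @classmethod
    def of(cls, name):
        # A pipeline that contains a single named operator.
        return cls(frozenset({(name,)}))

    def __or__(self, other):
        if isinstance(other, (list, tuple)):
            # pipeline | [p1, p2] -> [pipeline | p1, pipeline | p2]
            return [self | p for p in other]
        if not isinstance(other, ToyPipeline):
            return NotImplemented
        # pipeline | pipeline -> every left path continued by every right path
        return ToyPipeline(frozenset(a + b for a in self.paths for b in other.paths))

    def __ror__(self, other):
        if isinstance(other, (list, tuple)):
            # [p1, p2] | pipeline -> union of (p1 | pipeline) and (p2 | pipeline)
            return ToyPipeline(frozenset().union(*((p | self).paths for p in other)))
        return NotImplemented


reader = ToyPipeline.of('reader')
maps = (ToyPipeline.of('identity'), ToyPipeline.of('square'))
writer = ToyPipeline.of('writer')

fanned_out = reader | maps      # a list with one pipeline per map
merged = fanned_out | writer    # a single pipeline ending in the shared writer
```

Because lists and tuples do not define | themselves, Python falls back to the right operand's `__ror__`, which is one way a class can support a list on the left-hand side.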
as_dot¶
@cached_property
def as_dot() -> str
Provides the graph structure of the pipeline in the DOT format.
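For readers unfamiliar with DOT, the sketch below shows what a small directed graph looks like in that format. It is illustrative only: edges_to_dot is a hypothetical helper, and the string actually returned by as_dot may use different node names and attributes.

```python
def edges_to_dot(edges, graph_name='pipeline'):
    """Render (source, target) pairs as a DOT digraph (hypothetical helper)."""
    lines = [f'digraph {graph_name} {{']
    for source, target in edges:
        # Quote node names so that names with spaces or dots stay valid DOT.
        lines.append(f'    "{source}" -> "{target}";')
    lines.append('}')
    return '\n'.join(lines)


dot_text = edges_to_dot([('reader', 'map'), ('map', 'writer')])
print(dot_text)
```

Text in this format can be rendered by Graphviz tools such as the dot command.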
validate¶
def validate() -> None
Validate the structure of the pipeline graph.
Raises:
pykx.QError - If the pipeline is not valid; the error message will explain why.
SPModule Objects¶
class SPModule(ModuleType)
mode¶
@cached_property
def mode()
The mode of the Stream Processor.
'local' if operating independently as a local deployment.
'cluster' if operating as part of a larger cluster with a parent controller.
OperatorParams Objects¶
class OperatorParams(AutoNameEnum)
Specifies a parameter that will be passed to the function you supply to an operator.
operator¶
Provide the operator's dictionary, which may be required by other SP functions.
metadata¶
Provide the message's metadata, which may be required by other SP functions.
data¶
Provide the message's data.
Operator Objects¶
class Operator()
Stream Processor operator interface.
An operator is a first-class building block in the stream processor API. Operators can be
strung together to form a kxi.sp.Pipeline instance,
which can then be run.
Pipeline and operator objects can be joined together using the | operator. Operators can also
be joined to lists/tuples of pipelines or operators.
Attributes:
id (str) - The unique ID of this operator.
as_pipeline¶
@property
def as_pipeline()
A new pipeline that only contains this operator.
OperatorFunction¶
Custom Python type defined as:
OperatorFunction = Union[Callable, str]
OperatorSpecifier¶
Custom Python type defined as:
OperatorSpecifier = Union[str, kx.SymbolAtom, kx.Dictionary, Operator]
CharString¶
Custom Python type defined as:
CharString = Union[str, bytes, kx.CharVector]
Metadata¶
Custom Python type defined as:
Metadata = kx.Dictionary
TimedeltaSpec¶
Custom Python type defined as:
TimedeltaSpec = namedtuple('TimedeltaSpec', ('magnitude', 'unit'))
Timedelta¶
Custom Python type defined as:
Timedelta = Union[timedelta, np.timedelta64, kx.TimespanAtom, TimedeltaSpec]
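A TimedeltaSpec pairs a magnitude with a unit. The sketch below shows one way such a pair could be normalised to a datetime.timedelta. It assumes NumPy-style unit codes ('s', 'ms', and so on), which this page does not confirm, and to_timedelta and the unit table are hypothetical names used only for illustration.

```python
from collections import namedtuple
from datetime import timedelta

TimedeltaSpec = namedtuple('TimedeltaSpec', ('magnitude', 'unit'))

# Assumed unit codes (NumPy-style); not taken from the kxi.sp source.
_UNIT_TO_KWARG = {
    'D': 'days',
    'h': 'hours',
    'm': 'minutes',
    's': 'seconds',
    'ms': 'milliseconds',
    'us': 'microseconds',
}


def to_timedelta(spec):
    """Normalise a timedelta or a (magnitude, unit) pair to a timedelta."""
    if isinstance(spec, timedelta):
        return spec
    magnitude, unit = spec  # works for TimedeltaSpec and plain tuples alike
    return timedelta(**{_UNIT_TO_KWARG[unit]: magnitude})


five_seconds = to_timedelta(TimedeltaSpec(5, 's'))
quarter_second = to_timedelta((250, 'ms'))
```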
TimestampSpec¶
Custom Python type defined as:
TimestampSpec = Union[kx.TimestampAtom, datetime]
DictSpec¶
Custom Python type defined as:
DictSpec = Union[dict[CharString, CharString], kx.Dictionary[CharString, CharString]]
FileChunking Objects¶
class FileChunking(IntEnum)
Enum for file chunking options.
Chunking a file splits the file into smaller batches, and streams the batches through the pipeline.
These enum values can be provided as True or False for enabled and disabled
respectively.
disabled¶
Do not split the file into chunks.
enabled¶
Split the file into chunks.
auto¶
Determine the size of the target file automatically; if it is sufficiently large (more than a few megabytes), read it in chunks.
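An IntEnum can accept True and False because Python booleans compare and hash equal to 1 and 0. The sketch below demonstrates that behaviour with a hypothetical ToyFileChunking class; the numeric values are assumptions chosen so that the boolean mapping described above holds, not values taken from kxi.sp.

```python
from enum import IntEnum


class ToyFileChunking(IntEnum):
    # Assumed values: picked so that False -> disabled and True -> enabled.
    disabled = 0
    enabled = 1
    auto = 2


def resolve_chunking(option):
    """Accept a member, an int, or a bool and return the enum member."""
    return ToyFileChunking(option)


from_true = resolve_chunking(True)
from_false = resolve_chunking(False)
from_member = resolve_chunking(ToyFileChunking.auto)
```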
FileMode Objects¶
class FileMode(AutoNameEnum)
Enum for file mode options.
These enum values can be provided as enum member objects (e.g. FileMode.binary), or as
strings matching the names of members (e.g. 'binary').
binary¶
Read the content of the file into a byte vector.
text¶
Read the content of the file into strings, and split on newlines.
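Several enums on this page accept either a member object or a string matching a member name. One common way to get that behaviour is an Enum whose automatic values are the member names, as sketched below. The AutoNameEnum defined here is a stand-in written for this example and may differ from the class used by kxi.sp; ToyFileMode and resolve_mode are hypothetical.

```python
from enum import Enum, auto


class AutoNameEnum(Enum):
    """Stand-in base class: auto() yields the member's own name as its value."""

    def _generate_next_value_(name, start, count, last_values):
        return name


class ToyFileMode(AutoNameEnum):
    binary = auto()
    text = auto()


def resolve_mode(mode):
    """Accept a member or a string naming one, and return the member."""
    return ToyFileMode(mode)


from_string = resolve_mode('binary')
from_member = resolve_mode(ToyFileMode.text)
```

Passing a string that names no member raises ValueError, which gives callers an early and explicit failure.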
ParquetMode Objects¶
class ParquetMode(AutoNameEnum)
Enum for Parquet file mode options.
These enum values can be provided as enum member objects (e.g. ParquetMode.table), or as
strings matching the names of members (e.g. 'table').
table¶
Read the content of the Parquet file into a table.
lists¶
Read the content of the Parquet file into a list of arrays.
KafkaOffset Objects¶
class KafkaOffset(IntEnum)
Where to start consuming a Kafka partition.
beginning¶
Start consumption at the beginning of the partition.
end¶
Start consumption at the end of the partition.
CSVHeader Objects¶
class CSVHeader(AutoNameEnum)
Enum for CSV header options.
These enum values can be provided as enum member objects (e.g. CSVHeader.always), or as
strings matching the names of the members (e.g. 'always').
none¶
Encoded data never starts with a header row.
always¶
Encoded data always starts with a header row.
first¶
Encoded data initially starts with a header row.
CSVEncoding Objects¶
class CSVEncoding(AutoNameEnum)
Enum for CSV encoding formats.
These enum values can be provided as enum member objects (e.g. CSVEncoding.UTF8), or as
strings matching the names of the members (e.g. 'UTF8').
UTF8¶
Data is expected to be encoded in UTF8 format.
ASCII¶
Data is expected to be encoded in ASCII format.
PayloadType Objects¶
class PayloadType(AutoNameEnum)
Enum to specify the payload type for a protobuf encoding.
table¶
table.
dict¶
dictionary.
array¶
array.
arrays¶
arrays.
ArrowPayloadType Objects¶
class ArrowPayloadType(AutoNameEnum)
Enum to specify the payload type for Arrow encoding.
table¶
table.
arrays¶
arrays.
CSVHeader Objects¶
class CSVHeader(AutoNameEnum)
Enum for CSV header options.
These enum values can be provided as enum member objects (e.g. CSVHeader.always), or as
strings matching the names of the members (e.g. 'always').
none¶
Encoded data never starts with a header row.
always¶
Encoded data always starts with a header row.
first¶
Only the first batch starts with a header row.
InputType Objects¶
class InputType(AutoNameEnum)
Input type of data to schema plugin.
arrays¶
Data is always of array type.
table¶
Data is always of table type.
auto¶
Data can be of mixed type.
Parse Objects¶
class Parse(AutoNameEnum)
Parse string data to other types.
on¶
on
off¶
off
auto¶
auto
ConsoleTimestamp Objects¶
class ConsoleTimestamp(AutoNameEnum)
Enum for to_console timestamp options.
These enum values can be provided as enum member objects (e.g. ConsoleTimestamp.utc), or as
strings matching the names of the members (e.g. 'utc').
local¶
Prefix each output line with a local timestamp.
utc¶
Prefix each output line with a UTC timestamp.
none¶
Do not prefix any output lines with a timestamp.
default¶
Equivalent to 'none' if using qlog, and equivalent to 'utc' otherwise. The default
option allows qlog to use its own timestamps, instead of ones provided by the writer.
AmazonS3Teardown Objects¶
class AmazonS3Teardown(AutoNameEnum)
Enum for to_amazon_s3 teardown options.
These enum values can be provided as enum member objects (e.g. AmazonS3Teardown.complete),
or as strings matching the names of the members (e.g. 'complete').
none¶
Leave any partial uploads in a pending state to be resumed by a future pipeline.
abort¶
Abort any pending partial uploads. This means any processed data that is still pending will be lost on teardown.
complete¶
Mark any partial uploads as complete. This flushes any partial data buffers to S3 to ensure that any in-flight data is saved. However, once the data is saved, it cannot be appended to.
TargetMode Objects¶
class TargetMode(AutoNameEnum)
The kind of object a specified target in a kdb+ process is.
These enum values can be provided as enum member objects (e.g. TargetMode.table), or as
strings matching the names of members (e.g. 'table').
function¶
The target is a function defined in the kdb+ process. It will be called with the data being written to the process.
table¶
The target is a table defined in the kdb+ process. The data being written to the process will be upserted into it.
VariableMode Objects¶
class VariableMode(AutoNameEnum)
How to set/update the specified kdb+ variable.
These enum values can be provided as enum member objects (e.g. VariableMode.upsert), or as
strings matching the names of members (e.g. 'upsert').
append¶
The data from the stream will be appended to the variable.
overwrite¶
The variable will be set to the last output of the pipeline.
upsert¶
The tabular data from the stream will be upserted to the variable, which must be a table.
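The difference between the three modes can be illustrated with plain Python data. This is a toy model, not the writer's implementation: rows are dicts, apply_mode is a hypothetical helper, and the upsert branch assumes a table keyed on a single column, which this page does not state.

```python
def apply_mode(current, batch, mode, key='id'):
    """Return the new variable value after one batch arrives (toy model)."""
    if mode == 'overwrite':
        # The variable holds only the most recent output.
        return list(batch)
    if mode == 'append':
        # New rows are added after the existing ones.
        return list(current) + list(batch)
    if mode == 'upsert':
        # Assumption: rows sharing a key are updated, new keys are added.
        merged = {row[key]: row for row in current}
        for row in batch:
            merged[row[key]] = {**merged.get(row[key], {}), **row}
        return list(merged.values())
    raise ValueError(f'unknown mode: {mode!r}')


current = [{'id': 1, 'px': 10}, {'id': 2, 'px': 20}]
batch = [{'id': 2, 'px': 25}, {'id': 3, 'px': 30}]

appended = apply_mode(current, batch, 'append')
overwritten = apply_mode(current, batch, 'overwrite')
upserted = apply_mode(current, batch, 'upsert')
```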
BatchType Objects¶
class BatchType(AutoNameEnum)
Enum for the type of batches to use when training a model.
single¶
Single batch of k data points.
shuffle¶
Shuffle the dataset and split it into k batches.
shuffle_rep¶
Shuffle the dataset and create k batches with potential repeated data points.
non_shuffle¶
Keep the natural order of the dataset and take k batches.
no_batch¶
Take the whole dataset with its natural order.
Distance Objects¶
class Distance(AutoNameEnum)
Enum to specify the distance metric to use with a model.
edist¶
Euclidean distance.
e2dist¶
Squared Euclidean distance.
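The two metrics differ only by a square root. The sketch below gives the standard definitions for points with the same number of coordinates; the function names reuse the enum member names for readability and are not the library's own functions.

```python
import math


def e2dist(a, b):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def edist(a, b):
    """Euclidean distance: the square root of the squared distance."""
    return math.sqrt(e2dist(a, b))


squared = e2dist((0, 0), (3, 4))
plain = edist((0, 0), (3, 4))
```

Both metrics rank neighbours identically, so the squared form is often chosen to avoid the square root.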
Metric Objects¶
class Metric(AutoNameEnum)
Enum to specify the metric to use to evaluate a model.
f1¶
F1 score.
accuracy¶
Accuracy score.
mse¶
Mean squared error.
rmse¶
Root mean squared error.
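For reference, the standard definitions of these four metrics can be written in a few lines of Python. This is a sketch of the textbook formulas, not the code used by the Stream Processor; the positive parameter of f1 is a hypothetical name marking which label counts as the positive class.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


labels_true = [1, 0, 1, 1]
labels_pred = [1, 0, 0, 1]
values_true = [1.0, 2.0, 3.0]
values_pred = [1.0, 2.0, 5.0]
```

f1 and accuracy apply to classification labels, while mse and rmse apply to numeric predictions.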
ModelType Objects¶
class ModelType(AutoNameEnum)
Enum to specify the type of ML model.
q¶
Native q model.
sklearn¶
Scikit-learn model.
Penalty Objects¶
class Penalty(AutoNameEnum)
Enum to specify the penalty/regularization to use when training a model.
l1¶
L1 regularization.
l2¶
L2 regularization.
elastic_net¶
Elastic Net regularization.
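The three penalties can be expressed as functions of a weight vector. The sketch below uses one common convention and is not necessarily the scaling used when training a model here: libraries differ on constant factors, and l1_ratio is a hypothetical mixing parameter introduced for this example.

```python
def l1_penalty(weights):
    """L1: the sum of absolute weights, which encourages sparse solutions."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2: the sum of squared weights, which shrinks weights towards zero."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, l1_ratio=0.5):
    """Elastic Net: a weighted mix of the L1 and L2 penalties (assumed form)."""
    return l1_ratio * l1_penalty(weights) + (1 - l1_ratio) * l2_penalty(weights)


weights = [1, -2, 3]
mixed = elastic_net_penalty(weights, l1_ratio=0.5)
```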
format_dict_string¶
def format_dict_string(folder_path: Union[str, dict] = None)
Parameters:
| name | type | description |
|---|---|---|
| folder_path | string or dictionary | The folder_path as a str or appropriate dict. |
Returns:
The dict or str formatted as an appropriate k object.
format_string¶
def format_string(name: str = None)
Convert a str object to a q character vector.
Parameters:
| name | type | description |
|---|---|---|
| name | string | The name of the item to be converted. |
Returns:
The name object converted from a str to a pykx.CharVector or unchanged.
sp.schema¶
sp.schema is a PyKX Table containing the column details of a schema specified in the argument to Get Schema. It is defined as follows:
| key | type | description | default |
|---|---|---|---|
| name | pykx.SymbolAtom | Schema name | b'' |
| datatype | pykx.ShortAtom | | 0 |
| tokenize | pykx.BooleanAtom | | False |
| primary | pykx.BooleanAtom | | False |
Example:
sp.get_schema('iceTrade')
name datatype tokenize primary
----------------------------------------
instrumentID -11 0 0
eventTimestamp -12 0 0
price -9 0 0
quantity -9 0 0
tCond1 -7 0 0
tTime -12 0 0
exchTime -7 0 0
uid -7 0 0
idCode -7 0 0
eVenueID -7 0 0
tMode -11 0 0
mmtuID -11 0 0
sourceID -7 0 0
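The datatype values in the example output look like q type numbers, where a negative number denotes an atom of that type. Under that assumption, the sketch below translates a few of the rows into type names. Q_ATOM_TYPES is a hypothetical lookup table covering only some common q types, and the rows are copied from the example above rather than fetched from a live schema.

```python
# Partial table of q atom type numbers (negative numbers denote atoms).
Q_ATOM_TYPES = {
    -1: 'boolean',
    -5: 'short',
    -6: 'int',
    -7: 'long',
    -8: 'real',
    -9: 'float',
    -10: 'char',
    -11: 'symbol',
    -12: 'timestamp',
    -14: 'date',
}

# (name, datatype) pairs copied from the example output above.
schema_rows = [
    ('instrumentID', -11),
    ('eventTimestamp', -12),
    ('price', -9),
    ('tCond1', -7),
]

described = {name: Q_ATOM_TYPES[code] for name, code in schema_rows}
for name, type_name in described.items():
    print(f'{name}: {type_name}')
```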