Skip to content

Convert text in PyKX

This page provides details on how to represent, handle, and convert text in PyKX.

In PyKX, text can be represented in various ways. Here are the basic building blocks for handling text within the library:

Type Description Example Generation
pykx.SymbolAtom A symbol atom in PyKX is an irreducible atomic entity storing an arbitrary number of characters. pykx.q('`test')
pykx.SymbolVector A symbol vector is a collected list of symbol atoms. pykx.q('`test`vector')
pykx.CharAtom A char atom holds a single ASCII or 8-but unicode character stored as 1 byte. pykx.q('"a"')
pykx.CharVector A char vector is a collected list of char vectors. pykx.q('"test"')

Head to our Text data section for a deeper dive into the underlying text representation.

Convert text to/from PyKX

To convert Pythonic text data to PyKX objects, use the pykx.SymbolAtom and pykx.CharVector functions as shown below:

>>> import pykx as kx
>>> pystring = 'test string'
>>> kx.SymbolAtom(pystring)
pykx.SymbolAtom(pykx.q('`test string'))
>>> kx.CharVector(pystring)
pykx.CharVector(pykx.q('"test string"'))

Alternatively, you use the automatic conversion function pykx.toq which takes an incoming Python type and converts it to its analogous PyKX type. The following table shows the mapping between the two types:

Python Type PyKX Type
str pykx.SymbolAtom
byte pykx.CharAtom/pykx.CharVector
>>> import pykx as kx
>>> kx.toq('string')
pykx.SymbolAtom(pykx.q('`string'))
>>> kx.toq(b'bytes')
pykx.CharVector(pykx.q('"bytes"'))
>>> kx.toq(b'a')
pykx.CharAtom(pykx.q('"a"'))

When using the pykx.toq function, you can specify the target type for your data as shown below. This can be useful when selectively converting data:

>>> import pykx as kx
>>> kx.toq('string', kx.CharVector)
pykx.CharVector(pykx.q('"string"'))
>>> kx.toq(b'bytes', kx.SymbolAtom)
pykx.SymbolAtom(pykx.q('`bytes'))

The pykx.toq conversion is used by default when passing Python data to PyKXfunctions, for example:

>>> import pykx as kx
>>> kx.q('{(x;y)}', 'string', b'bytes')
pykx.List(pykx.q('
`string
"bytes"
'))

Differences between Symbol and Char data objects

While there may appear to be limited differences between Symbol and Char representations of objects, the choice of underlying representation can have an impact on the performance and memory profile of many applications of PyKX. This section will describe a number of these differences and their impact in various scenarios.

Although Symbol and Charrepresentations of objects might seem similar, the choice between them can significantly affect the performance and memory usage of many PyKX applications. This section exploreS the impact of these differences in various scenarios.

Text access and mutability

The individual characters which comprise a pykx.SymbolAtom object are not directly accessible by a user; this limitation does not exist for pykx.CharVector objects. For example, it's possible to retrieve slices of a pykx.CharVector:

>>> import pykx as kx
>>> charVector = kx.CharVector('test')
>>> charVector
pykx.CharVector(pykx.q('"test"'))
>>> charVector[1:]
pykx.CharVector(pykx.q('"est"'))
>>> symbolAtom = kx.SymbolAtom('test')
>>> symbolAtom
pykx.SymbolAtom(pykx.q('`test'))
>>> symbolAtom[1:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'SymbolAtom' object is not subscriptable

Similarly pykx.CharVector type objects are mutable while pykx.SymbolAtom type objects are not:

>>> import pykx as kx
>>> charVector = kx.CharVector('test')
>>> kx.q('{x[0]:"r";x}', charVector)
pykx.CharVector(pykx.q('"rest"'))

Memory considerations

When dealing with Symbol type objects, note that they are never deallocated once generated. You can notice this through growth of the syms key of kx.q.Q.w as follows:

>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2790'))
>>> kx.SymbolAtom('test')
pykx.SymbolAtom(pykx.q('`test'))
>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2791'))
>>> kx.SymbolAtom('testing')
pykx.SymbolAtom(pykx.q('`testing'))
>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2792'))

This is important as overuse of symbols can result in increased memory requirements for your processes. Symbols as such are best used when dealing with highly repetitive text data.