Text representation in PyKX

Text in PyKX can be represented in several ways, all of which you will encounter when using the library. A deeper dive into the underlying text representation can be found here. The following are the basic building blocks for text within PyKX:

| Type | Description | Example Generation |
| pykx.SymbolAtom | A symbol atom in PyKX is an irreducible atomic entity storing an arbitrary number of characters. | pykx.q('`test') |
| pykx.SymbolVector | A symbol vector is a collected list of symbol atoms. | pykx.q('`test`vector') |
| pykx.CharAtom | A char atom holds a single ASCII or 8-bit Unicode character stored as 1 byte. | pykx.q('"a"') |
| pykx.CharVector | A char vector is a collected list of char atoms. | pykx.q('"test"') |

Converting text to/from PyKX

Python text data can be converted to PyKX objects directly by calling pykx.SymbolAtom and pykx.CharVector, as shown below:

>>> import pykx as kx
>>> pystring = 'test string'
>>> kx.SymbolAtom(pystring)
pykx.SymbolAtom(pykx.q('`test string'))
>>> kx.CharVector(pystring)
pykx.CharVector(pykx.q('"test string"'))

Alternatively, you can use the automatic conversion function pykx.toq, which takes an incoming Python object and converts it to its analogous PyKX type. The following table shows the mapping used for text:

| Python Type | PyKX Type |
| str | pykx.SymbolAtom |
| bytes | pykx.CharAtom/pykx.CharVector |

>>> import pykx as kx
>>> kx.toq('string')
pykx.SymbolAtom(pykx.q('`string'))
>>> kx.toq(b'bytes')
pykx.CharVector(pykx.q('"bytes"'))
>>> kx.toq(b'a')
pykx.CharAtom(pykx.q('"a"'))
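
The default mapping above can be restated as a small pure-Python sketch. The helper `default_pykx_text_type` is hypothetical and not part of PyKX; it only mirrors the table and examples shown here. Its treatment of empty bytes as a char vector is an assumption.

```python
def default_pykx_text_type(value):
    """Return the name of the PyKX type the mapping table lists for a Python text value.

    Hypothetical helper for illustration only; it is not part of PyKX.
    """
    if isinstance(value, str):
        return "pykx.SymbolAtom"
    if isinstance(value, bytes):
        # A single byte maps to a char atom, anything else to a char vector.
        # Treating empty bytes as a char vector is an assumption made here.
        return "pykx.CharAtom" if len(value) == 1 else "pykx.CharVector"
    raise TypeError(f"not a text type: {type(value).__name__}")


for sample in ("string", b"bytes", b"a"):
    print(repr(sample), "->", default_pykx_text_type(sample))
```

Because a Python str maps to a symbol by default, encoding it to bytes first (for example `'string'.encode()`) is one way to obtain char data instead. Note that a single-character input would then become a char atom rather than a char vector.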

When using the pykx.toq function you can specify the target type for your data, as shown below. This can be useful when selectively converting data:

>>> import pykx as kx
>>> kx.toq('string', kx.CharVector)
pykx.CharVector(pykx.q('"string"'))
>>> kx.toq(b'bytes', kx.SymbolAtom)
pykx.SymbolAtom(pykx.q('`bytes'))

An important consequence of the above is that the pykx.toq conversion is applied by default whenever Python data is passed to a PyKX function, for example:

>>> import pykx as kx
>>> kx.q('{(x;y)}', 'string', b'bytes')
pykx.List(pykx.q('
`string
"bytes"
'))

Differences between Symbol and Char data objects

While there may appear to be limited differences between Symbol and Char representations of objects, the choice of underlying representation can have an impact on the performance and memory profile of many applications of PyKX. This section will describe a number of these differences and their impact in various scenarios.

Text access and mutability

The individual characters that make up a pykx.SymbolAtom object are not directly accessible to a user. This limitation does not exist for pykx.CharVector objects. For example, it is possible to retrieve slices of a pykx.CharVector but not of a pykx.SymbolAtom:

>>> import pykx as kx
>>> charVector = kx.CharVector('test')
>>> charVector
pykx.CharVector(pykx.q('"test"'))
>>> charVector[1:]
pykx.CharVector(pykx.q('"est"'))
>>> symbolAtom = kx.SymbolAtom('test')
>>> symbolAtom
pykx.SymbolAtom(pykx.q('`test'))
>>> symbolAtom[1:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'SymbolAtom' object is not subscriptable

Similarly, pykx.CharVector objects are mutable while pykx.SymbolAtom objects are not:

>>> import pykx as kx
>>> charVector = kx.CharVector('test')
>>> kx.q('{x[0]:"r";x}', charVector)
pykx.CharVector(pykx.q('"rest"'))

Memory considerations

An important point to note when dealing with Symbol type objects is that they are never deallocated once generated. This can be seen in the growth of the syms key of kx.q.Q.w, as follows:

>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2790'))
>>> kx.SymbolAtom('test')
pykx.SymbolAtom(pykx.q('`test'))
>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2791'))
>>> kx.SymbolAtom('testing')
pykx.SymbolAtom(pykx.q('`testing'))
>>> kx.q.Q.w()['syms']
pykx.LongAtom(pykx.q('2792'))

This is important because overuse of symbols can increase the memory requirements of your processes. Symbols are therefore best used when dealing with highly repetitive text data.
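
To illustrate why repetitive data suits symbols, here is a toy pure-Python model of an append-only intern table. It is only a sketch of the general idea: the `InternPool` class is hypothetical, it is not how kdb+ or PyKX store symbols internally, and the sample values are made up.

```python
class InternPool:
    """Toy model of an append-only intern table (hypothetical, not kdb+ internals)."""

    def __init__(self):
        self._ids = {}

    def intern(self, text):
        # Reuse the existing entry if the text was seen before, otherwise add one.
        # Entries are never removed, mirroring the "never deallocated" behaviour.
        return self._ids.setdefault(text, len(self._ids))

    def __len__(self):
        return len(self._ids)


# Highly repetitive data: 1000 values drawn from only two distinct strings.
repetitive = InternPool()
for value in ["AAPL", "MSFT", "AAPL", "MSFT"] * 250:
    repetitive.intern(value)

# Unique data: 1000 values, every one of them distinct.
unique = InternPool()
for i in range(1000):
    unique.intern(f"order-{i}")

print(len(repetitive))  # 2 entries for 1000 values
print(len(unique))      # 1000 entries for 1000 values
```

In this model the repetitive column costs two permanent entries, while the unique column adds a permanent entry for every value. That is the pattern in which char data may be the better choice.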