Database interaction and management functionality

Functionality for the interaction with and management of databases.

Warning

This functionality is provided in its present form as a BETA Feature and is subject to change. To enable this functionality for testing please following configuration instructions here setting PYKX_BETA_FEATURES='true'

`DB`

DB(*, path=None)

Bases: _TABLES

Singleton class used for the management of kdb+ Databases

create

create(
    table,
    table_name,
    partition,
    *,
    by_field=None,
    sym_enum=None,
    log=True,
    compress=None,
    encrypt=None
)

Create an on-disk partitioned table within a kdb+ database from a supplied pykx.Table object. Once generated this table will be accessible as an attribute of the DB class or a sub attribute of DB.table.

Parameters:

Name	Description	Default
`table`	The `pykx.Table` object which is to be persisted to disk	required
`table_name`	The name with which the table will be persisted and accessible once loaded and available as a `pykx.PartitionedTable`	required
`partition`	The name of the column which is to be used to partition the data if supplied as a `str` or if supplied as non string object this will be used as the partition to which all data is persisted	required
`by_field`	A field of the table to be used as a by column, this column will be the second column in the table (the first being the virtual column determined by the partitioning column)	`None`
`sym_enum`	The name of the symbol enumeration table to be associated with the table	`None`
`log`	Print information about status of partitioned datab	`True`
`compress`	`pykx.Compress` initialized class denoting the compression settings to be used when persisting a partition/partitions	`None`
`encrypt`	`pykx.Encrypt` initialized class denoting the encryption setting to be used when persisting a partition/partitions	`None`

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated to contain attributes associated with the available created table

Examples:

Generate a partitioned table from a table containing multiple partitions

>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 1000
>>> qtab = kx.Table(data = {
...     'date': kx.q.asc(kx.random.random(N, kx.q('2020.01 2020.02 2020.03m'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0),
...     'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', 'date', by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month   sym  price     size
---------------------------
2020.01 AAPL 7.979004  85
2020.01 AAPL 5.931866  55
2020.01 AAPL 5.255477  49
2020.01 AAPL 8.15255   74
2020.01 AAPL 4.771067  80
..
'))

Add a table as a partition to an on-disk database, in the example below we are adding a partition to the table generated above

>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0),
...     'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month   sym  price     size
---------------------------
2020.01 AAPL 7.979004  85
2020.01 AAPL 5.931866  55
2020.01 AAPL 5.255477  49
2020.01 AAPL 8.15255   74
2020.01 AAPL 4.771067  80
..
'))

Add a table as a partition to an on-disk database, in the example below we are additionally applying gzip compression to the persisted table

>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0),
...     'size': kx.random.random(N, 100)
... })
>>> compress = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), compress=compress)
>>> kx.q('{-21!hsym x}', '/tmp/newDB/2020.04/stocks/price')
pykx.Dictionary(pykx.q('
compressedLength  | 2064
uncompressedLength| 2680
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 2i
'))

load

load(path, *, overwrite=False, encrypt=None)

Load the tables associated with a kdb+ Database, once loaded a table is accessible as an attribute of the DB class or a sub attribute of DB.table. Note that can alternatively be called when providing a path on initialisation of the DB class.

Parameters:

Name	Type	Description	Default
`path`	`Union[Path, str]`	The file system path at which your database is located	required
`overwrite`		Should loading of the database overwrite any currently loaded databases	`False`

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated to contain attributes associated with available tables

Examples:

Load an on-disk database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testData')
>>> db.tables
['testData']
>>> db.testData
pykx.PartitionedTable(pykx.q('
month   sym  time         price    size
---------------------------------------
2020.01 FDP  00:00:00.004 90.94738 12
2020.01 FDP  00:00:00.005 33.81127 15
2020.01 FDP  00:00:00.027 88.89853 16
2020.01 FDP  00:00:00.035 78.33244 9
2020.01 JPM  00:00:00.055 68.65177 1
..
'))
>>> db.table.testData
pykx.PartitionedTable(pykx.q('
month   sym  time         price    size
---------------------------------------
2020.01 FDP  00:00:00.004 90.94738 12
2020.01 FDP  00:00:00.005 33.81127 15
2020.01 FDP  00:00:00.027 88.89853 16
2020.01 FDP  00:00:00.035 78.33244 9
2020.01 JPM  00:00:00.055 68.65177 1
..
'))

Load an on-disk database when initialising the class

>>> import pykx as kx
>>> db = kx.DB(path = 'testData')
>>> db.tables
['testData']
>>> db.testData
pykx.PartitionedTable(pykx.q('
month   sym  time         price    size
---------------------------------------
2020.01 FDP  00:00:00.004 90.94738 12
2020.01 FDP  00:00:00.005 33.81127 15
2020.01 FDP  00:00:00.027 88.89853 16
2020.01 FDP  00:00:00.035 78.33244 9
2020.01 JPM  00:00:00.055 68.65177 1
..
'))
>>> db.table.testData
pykx.PartitionedTable(pykx.q('
month   sym  time         price    size
---------------------------------------
2020.01 FDP  00:00:00.004 90.94738 12
2020.01 FDP  00:00:00.005 33.81127 15
2020.01 FDP  00:00:00.027 88.89853 16
2020.01 FDP  00:00:00.035 78.33244 9
2020.01 JPM  00:00:00.055 68.65177 1
..
'))

rename_column

rename_column(table, original_name, new_name)

Rename a column within a loaded kdb+ Database

Parameters:

Name	Description	Default
`table`	The name of the table within which a column is to be renamed	required
`original_name`	Name of the column which is to be renamed	required
`new_name`	Column name which will be used as the new column name	required

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated and column rename actioned.

Examples:

Rename the column of a table

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.rename_column('testTable', 'sym', 'symbol')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price    size
---------------------------------------
2020.01 FDP     00:00:00.004 90.94738 12
2020.01 FDP     00:00:00.005 33.81127 15
2020.01 FDP     00:00:00.027 88.89853 16
2020.01 FDP     00:00:00.035 78.33244 9
2020.01 JPM     00:00:00.055 68.65177 1
..
'))

delete_column

delete_column(table, column)

Delete the column of a loaded kdb+ Database

Parameters:

Name	Type	Description	Default
`table`		The name of the table within which a column is to be deleted	required
`column`		Column which is to be deleted from the database	required

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated and specified column deleted

Examples:

Delete the column of a table

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.delete_column('testTable', 'size')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price
-------------------------------------
2020.01 FDP     00:00:00.004 90.94738
2020.01 FDP     00:00:00.005 33.81127
2020.01 FDP     00:00:00.027 88.89853
2020.01 FDP     00:00:00.035 78.33244
2020.01 JPM     00:00:00.055 68.65177
..
'))

rename_table

rename_table(original_name, new_name)

Rename a table within a loaded kdb+ Database

Parameters:

Name	Type	Description	Default
`original_name`		The name of the table which is to be renamed	required
`new_name`		Updated table name	required

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated, original table name deleted from q memory and new table accessible

Examples:

Rename a database table

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.rename_table('testTable', 'updated_table')
>>> db.tables
['updated_table']

list_columns

list_columns(table)

List the columns of a table within a loaded kdb+ Database

Parameters:

Name	Type	Description	Default
`table`		The name of the table whose columns are listed	required

Returns:

Type	Description
	A list of strings defining the columns of a table

Examples:

List the columns of a table in a database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']

add_column

add_column(table, column_name, default_value)

Add a column to a table within a loaded kdb+ Database

Parameters:

Name	Description	Default
`table`	The name of the table to which a column is to be added	required
`column_name`	Name of the column to be added	required
`default_value`	The default value to be used for all existing partitions	required

Returns:

Type	Description
	A `None` object on successful invocation, the database class will be updated and the new column available for use/access

Examples:

Add a column to a table within a partitioned database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>>  db.add_column('testTable', 'test', kx.IntAtom.null)
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']

find_column

find_column(table, column_name)

Functionality for finding a column across partitions within a loaded kdb+ Database

Parameters:

Name	Type	Description	Default
`table`		The name of the table within which columns are to be found	required
`column_name`		The name of the column to be found within a table	required

Returns:

Type	Description
	A `None` object on successful invocation printing search status per partition,
	if a column does not exist in a specified partition an error will be raised
	and the logs will indicate which columns did not have the specified column.

Examples:

Find a column that exists

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.find_column('price')
2023.11.10 16:48:57 column price (type 0) in `:/usr/pykx/db/2015.01.01/testTable
2023.11.10 16:48:57 column price (type 0) in `:/usr/pykx/db/2015.01.02/testTable

Attempt to find a column that does not exist

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.find_column('side')
2023.11.10 16:49:02 column side *NOT*FOUND* in `:/usr/pykx/db/2015.01.01/testTable
2023.11.10 16:49:02 column side *NOT*FOUND* in `:/usr/pykx/db/2015.01.02/testTable
Traceback (most recent call last):
...
pykx.exceptions.QError: Requested column not found in all partitions, see log output above

reorder_columns

reorder_columns(table, new_order)

Reorder the columns of a persisted kdb+ database

Parameters:

Name	Type	Description	Default
`table`		The name of the table within which columns will be rearranged	required
`new_order`		The ordering of the columns following update	required

Returns:

Type	Description
	A `None` object on successfully updating the columns of the database

Examples:

Update the order of columns for a persisted kdb+ database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> col_list = db.list_columns('testTable')
>>> col_list
['month', 'sym', 'time', 'price', 'size']
>>> col_list.reverse()
>>> col_list
['size', 'price', 'time', 'sym', 'month']
>>> db.reorder_columns('testTable', col_list)
2023.11.13 17:56:17 reordering columns in `:/usr/pykx/2015.01.01/testTable
2023.11.13 17:56:17 reordering columns in `:/usr/pykx/2015.01.02/testTable
['month', 'sym', 'time', 'price', 'size']

set_column_attribute

set_column_attribute(table, column_name, new_attribute)

Set an attribute associated with a column for an on-disk database

Parameters:

Name	Description	Default
`table`	The name of the table within which an attribute will be set	required
`column_name`	Name of the column to which the attribute will be applied	required
`new_attribute`	The attribute which is to be applied, this can be one of 'sorted'/'u', 'partitioned'/'p', 'unique'/'u' or 'grouped'/'g'.	required

Returns:

Type	Description
	A `None` object on successfully setting the attribute for a column

Examples:

Add an attribute to a column of a persisted database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| j
p   | f
sym | s
'))
>>> db.set_column_attribute('testTable', 'sym', 'grouped')
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| j
p   | f
sym | s   g
'))

set_column_type

set_column_type(table, column_name, new_type)

Convert/set the type of a column to a specified type

Parameters:

Name	Description	Default
`table`	The name of the table within which a column is to be converted	required
`column_name`	Name of the column which is to be converted	required
`new_type`	PyKX type to which a column is to be converted	required

Returns:

Type	Description
	A `None` object on successfully updating the type of the column

Examples:

Convert the type of a column within a database table

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| j
p   | f
sym | s
'))
>>> db.set_column_type('testTable', 'test', kx.FloatAtom)
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| f
p   | f
sym | s
'))

clear_column_attribute

clear_column_attribute(table, column_name)

Clear an attribute associated with a column of an on-disk database

Parameters:

Name	Type	Description	Default
`table`		The name of the table within which the attribute of a column will be removed	required
`column_name`		Name of the column from which an attribute will be removed	required

Returns:

Type	Description
	A `None` object on successful removal of the attribute of a column

Examples:

Remove an attribute of a column of a persisted database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| j
p   | f
sym | s    g
'))
>>> db.clear_column_attribute('testTable', 'sym')
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c   | t f a
----| -----
date| d
test| j
p   | f
sym | s
'))

copy_column

copy_column(table, original_column, new_column)

Create a copy of a column within a table

Parameters:

Name	Description	Default
`table`	Name of the table	required
`original_column`	Name of the column to be copied	required
`new_column`	Name of the copied column	required

Returns:

Type	Description
	A `None` object on successful column copy, reloading the
	database following column copy

Examples:

Copy a column within a kdb+ database

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.copy_column('testTable', 'size', 'dup_size')
['month', 'sym', 'time', 'price', 'size', 'dup_size']

apply_function

apply_function(table, column_name, function)

Apply a function per partition on a column of a persisted kdb+ database

Parameters:

Name	Description	Default
`table`	Name of the table	required
`column_name`	Name of the column on which the function is to be applied	required
`function`	Callable function to be applied on a column vector per column	required

Returns:

Type	Description
	A `None` object on successful application of a function to the column
	and the reloading of the database

Examples:

Apply a q function to a specified column per partition

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price
-------------------------------------
2020.01 FDP     00:00:00.004 90.94738
2020.01 FDP     00:00:00.005 33.81127
2020.01 FDP     00:00:00.027 88.89853
2020.01 FDP     00:00:00.035 78.33244
2020.01 JPM     00:00:00.055 68.65177
..
'))
>>> db.apply_function('testTable', 'price', kx.q('2*'))
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price
-------------------------------------
2020.01 FDP     00:00:00.004 181.8948
2020.01 FDP     00:00:00.005 67.62254
2020.01 FDP     00:00:00.027 177.7971
2020.01 FDP     00:00:00.035 156.6649
2020.01 JPM     00:00:00.055 137.3035
..
'))

Apply a Python function to the content of a specified column per partition

>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price
-------------------------------------
2020.01 FDP     00:00:00.004 90.94738
2020.01 FDP     00:00:00.005 33.81127
2020.01 FDP     00:00:00.027 88.89853
2020.01 FDP     00:00:00.035 78.33244
2020.01 JPM     00:00:00.055 68.65177
..
'))
>>> db.apply_function('testTable', 'price', lambda x:2*x.np())
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month   symbol  time         price
-------------------------------------
2020.01 FDP     00:00:00.004 181.8948
2020.01 FDP     00:00:00.005 67.62254
2020.01 FDP     00:00:00.027 177.7971
2020.01 FDP     00:00:00.035 156.6649
2020.01 JPM     00:00:00.055 137.3035
..
'))

fill_database

fill_database()

Fill missing tables from partitions within a database using the most recent partition as a template, this will report the partitions but not the tables which are being filled.

Returns:

Type	Description
	A `None` object on successful filling of missing tables in partitioned database

Examples:

Fill missing tables from a database

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.fill_database()
Successfully filled missing tables to partition: :/usr/newDB/2020.04
Successfully filled missing tables to partition: :/usr/newDB/2020.03
Successfully filled missing tables to partition: :/usr/newDB/2020.02
Successfully filled missing tables to partition: :/usr/newDB/2020.01

partition_count

partition_count(*, subview=None)

Count the number of rows per partition for the presently loaded database. Use of the parameter subview can allow users to count only the rows in specifies partitions.

Parameters:

Name	Type	Description	Default
`subview`		An optional list of partitions from which to retrieve the per partition count	`None`

Returns:

Type	Description
	A `pykx.Dictionary` object showing the count of data in each table within the presently loaded partioned database.

Warning

Using this function will result in any specified subview of the data being reset, if you require the use of a subview for queries please reset using the database subview command.

Examples:

Copy a column within a kdb+ database

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.partition_count()
pykx.Dictionary(pykx.q('
       | trades quotes
-------| -------------
2020.01| 334    0
2020.02| 324    0
2020.03| 342    1000
'))

Copy a column within a kdb+ database

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.partition_count(sub_view = kx.q('2020.01 2020.02m'))
pykx.Dictionary(pykx.q('
       | trades quotes
-------| -------------
2020.01| 334    0
2020.02| 324    0
'))

subview

subview(view=None)

Specify the subview to be used when querying a partitioned table

Parameters:

Name	Type	Description	Default
`view`		A list of partition values which will serve as a filter for all queries against any partitioned table within the database. If view is supplied as `None` this will reset the query view to all partitions	`None`

Returns:

Type	Description
	A `None` object on successful setting of the view state

Examples:

Set the subview range to include only 2020.02 and 2020.03

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.subview(kx.q('2020.02 2020.03m')
>>> kx.q.qsql.select(db.trades, 'month')
pykx.Table(pykx.q('
month
-------
2020.02
2020.03
'))

Reset the database subview to include a fully specified range

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.subview()
>>> kx.q.qsql.select(db.trades, 'month')
pykx.Table(pykx.q('
month
-------
2020.01
2020.02
2020.03
2020.04
2020.05
'))

enumerate

enumerate(table, *, sym_file=None)

Perform an enumeration on a user specified table against the current sym files associated with the database

Parameters:

Name	Description	Default
`path`	The folder location to which your table will be persisted	required
`table`	The `pykx.Table` object which is to be persisted to disk and which is to undergo enumeration	required
`sym_file`	The name of the sym file contained in the folder specified by the `path` parameter against which enumeration will be completed	`None`

Returns:

Type	Description
	The supplied table with enumeration applied

Examples:

Enumerate the symbol columns of a table without specifying the sym file

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> N = 1000
>>> tab = kx.Table(data = {
...     'x': kx.random.random(N, ['a', 'b', 'c']),
...     'x1': kx.random.random(N, 1.0),
...     'x2': kx.random.random(N, 10)
... }
>>> tab = db.enumerate(tab)
>>> tab['x']
pykx.EnumVector(pykx.q('`sym$`a`b`a`c`b..'))

Enumerate the symbol columns of a table specifying the sym file used

>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> N = 1000
>>> tab = kx.Table(data = {
...     'x': kx.random.random(N, ['a', 'b', 'c']),
...     'x1': kx.random.random(N, 1.0),
...     'x2': kx.random.random(N, 10)
... }
>>> tab = db.enumerate(tab, sym_file = 'mysym')
>>> tab['x']
pykx.EnumVector(pykx.q('`mysym$`a`b`a`c`b..'))